Predictive Modeling: The Basics

What is Predictive Modeling?

Predictive modeling builds a mathematical description of a process to make accurate, data-driven predictions about future outcomes. This contributes to:

  • better patient treatment and care,
  • improved clinical decision-making,
  • risk models that assist with cancer prevention, and
  • a deeper fundamental understanding of cancer etiology.

Why is Predictive Modeling Important for Cancer Research?

Predictive modeling uses advanced numerical methods, mathematics, and computer science to help researchers like you anticipate what might happen in cancer care. Predictive modeling is essential in oncology because many early-stage cancers don't show symptoms. Consequently, doctors often rely on predictions to decide whether a patient should undergo treatment. Fortunately, this prediction process has evolved to the point that a model can consider an individual's genomics without compromising personal privacy. (Try your own data in this tool that NCI staff members helped develop.)

Examples of predictive modeling used in the cancer research field include, but are not limited to, the following:

  • Drug discovery and development models identify potential anticancer compounds and drug candidates.
  • Risk prediction models incorporate factors such as age, genetics, lifestyle, and medical history to estimate an individual's risk of developing a particular cancer.
  • Researchers use predictive modeling to refine radiation therapy planning.
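To make the risk prediction idea concrete, here is a minimal sketch of one common approach, a logistic model that combines weighted risk factors into a probability. The factor names, intercept, and coefficients below are hypothetical and chosen only for illustration; they do not come from any published or NCI risk model, and real models are fitted to and validated on large cohorts.

```python
import math

# Hypothetical intercept and coefficients for illustration only;
# they are NOT taken from any published cancer risk model.
INTERCEPT = -7.0
COEFFICIENTS = {
    "age_years": 0.05,
    "family_history": 0.9,  # 1 if a first-degree relative had the cancer, else 0
    "current_smoker": 0.7,  # 1 if the person currently smokes, else 0
    "prior_biopsy": 0.4,    # 1 if a prior biopsy was performed, else 0
}


def estimate_risk(person: dict) -> float:
    """Return an estimated probability (0 to 1) from a logistic model.

    Missing factors are treated as 0 (absent).
    """
    linear_score = INTERCEPT + sum(
        weight * person.get(name, 0) for name, weight in COEFFICIENTS.items()
    )
    # The logistic (sigmoid) function maps any score onto the 0-1 range.
    return 1.0 / (1.0 + math.exp(-linear_score))


if __name__ == "__main__":
    lower_risk = {"age_years": 40}
    higher_risk = {"age_years": 65, "family_history": 1,
                   "current_smoker": 1, "prior_biopsy": 1}
    print(f"Lower-risk profile:  {estimate_risk(lower_risk):.3f}")
    print(f"Higher-risk profile: {estimate_risk(higher_risk):.3f}")
```

Each coefficient raises or lowers the score in proportion to how strongly that factor is associated with the outcome, which is why adding risk factors to a profile increases the estimate.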

These predictive models are like powerful calculators that help us better understand a patient by considering factors such as patient information, genetics, and treatment history. Broadly, there are two types of models: mechanistic and non-mechanistic. The former relies on mathematical descriptions of the disease process, which are tested by how accurately they predict outcomes. The latter includes a wide variety of techniques, ranging from training artificial intelligence models to learn the relationships between variables to forecasting based entirely on past occurrences.
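The difference between the two types can be sketched with a toy forecast of tumor volume. The weekly measurements below are hypothetical. The mechanistic model assumes exponential growth, V(t) = V0 · exp(r·t), a simple description of cells dividing at a constant rate, so its fitted parameter r has a biological meaning. The non-mechanistic stand-in is a straight-line trend fitted to past values with no biological assumptions; real non-mechanistic methods (for example, machine learning) are far richer, and this is only an illustration.

```python
import math

# Hypothetical weekly tumor-volume measurements (mm^3); illustrative only.
days = [0, 7, 14, 21]
volumes = [100.0, 142.0, 201.0, 286.0]


def least_squares(xs, ys):
    """Return (intercept, slope) of the ordinary least-squares line y = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def mechanistic_forecast(t):
    """Exponential growth V(t) = V0 * exp(r * t).

    Taking logs gives ln V = ln V0 + r * t, so a straight-line fit to
    ln V estimates ln V0 (intercept) and the growth rate r (slope).
    """
    log_v0, rate = least_squares(days, [math.log(v) for v in volumes])
    return math.exp(log_v0 + rate * t)


def empirical_forecast(t):
    """Straight-line trend fitted to past volumes, with no model of the biology."""
    intercept, slope = least_squares(days, volumes)
    return intercept + slope * t


if __name__ == "__main__":
    print(f"Day 28, mechanistic: {mechanistic_forecast(28):.0f} mm^3")
    print(f"Day 28, empirical:   {empirical_forecast(28):.0f} mm^3")
```

The two models fit the past data reasonably well but disagree when extrapolated to day 28; comparing such predictions against a new measurement is how a mechanistic hypothesis gets "put to the test."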

What Do I Need to Know? 

Fundamental Tips for Practicing Predictive Modeling

  • Have an interest in data analysis.
    • Predictive modeling involves working with large and complex data sets. You’ll enjoy this stage of the data science lifecycle if you appreciate digging into data, finding patterns, and drawing insights from numbers.
  • Develop a programming habit.
    • Coding doesn't have to be intimidating, and it is worth learning: predictive modeling often involves writing and implementing algorithms. Proficiency in JavaScript, Python, or R can be highly beneficial.
  • Don’t be afraid to learn new mathematical methods and devise novel statistical procedures.
    • Mathematics is central to understanding algorithms and interpreting their results. Galileo Galilei, astronomer, physicist, and engineer, is often credited with saying, "Mathematics is the language of nature." It really is!
  • Be ready for a team-driven, hands-on challenge.
    • Predicting real-world outcomes often requires creativity in adapting and combining methods to suit the specific problem. Approaching the work collaboratively, with team members relying on one another's expertise, is key.
  • Find an open-source repository/community.
    • Share what you do and discover what others are attempting through data science communities. For example, GitHub lets you host your live applications, serve packages without loss of attribution, manage your projects efficiently, collaborate effectively, showcase your skills, and be a part of a lively community of data scientists. NCI has multiple GitHub webpages with hundreds of repositories to search through.
  • Stay nimble and keep current.
    • Predictive modeling is constantly evolving with new algorithms, tools, and data sources. Find a computational environment where you are comfortable; computing skills move you from being a consumer of data to someone who creates opportunities for secondary data analysis.

NCI Predictive Modeling Resources and Initiatives

Now that you have a sense of the basics, use the following resources to discover more about the topic and understand NCI’s investment in this stage of the data science lifecycle.

Resources and Tools

  • Genomic Data Commons (GDC): The NCI GDC uses predictive modeling techniques to analyze vast amounts of genomic data from various cancers. This helps identify genetic mutations and alterations contributing to cancer development, leading to insights for targeted therapies and precision medicine.
  • Imaging Data Commons (IDC): NCI’s IDC makes predictive modeling tools available for analyzing and interpreting medical images.
  • SEER Cancer Statistics: NCI’s Surveillance, Epidemiology, and End Results (SEER) program uses predictive modeling to estimate cancer incidence, mortality, and survival rates. These data help researchers and policymakers understand cancer trends, allocate resources, and develop effective prevention and treatment strategies.
  • Predictive Oncology Model and Data Clearinghouse (MoDaC): MoDaC is a data repository and model clearinghouse, which consists of predictive oncology data sets and mathematical models (such as machine learning and deep learning models) developed within NCI and in collaborative programs.
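Survival rates like those mentioned for SEER are often estimated from follow-up data in which some patients leave a study before the event of interest occurs (censoring). As a generic illustration only, and not a description of SEER's own methods (which are more involved), here is a minimal sketch of the Kaplan–Meier estimator applied to hypothetical follow-up data.

```python
def kaplan_meier(times, events):
    """Return [(time, survival_probability)] at each time an event occurred.

    times  -- follow-up time for each patient (e.g., months)
    events -- 1 if the event (e.g., death) was observed, 0 if censored
    """
    # Sort patients by follow-up time so we can walk forward in time.
    records = sorted(zip(times, events))
    at_risk = len(records)
    survival = 1.0
    curve = []
    i = 0
    while i < len(records):
        t = records[i][0]
        deaths = 0
        leaving = 0
        # Group all patients who share this follow-up time.
        while i < len(records) and records[i][0] == t:
            deaths += records[i][1]
            leaving += 1
            i += 1
        if deaths > 0:
            # Multiply by the conditional probability of surviving past t.
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        # Everyone with this time (event or censored) leaves the risk set.
        at_risk -= leaving
    return curve


if __name__ == "__main__":
    # Hypothetical follow-up data for eight patients; illustrative only.
    months = [5, 8, 8, 12, 15, 20, 24, 30]
    observed = [1, 1, 0, 1, 0, 1, 0, 0]
    for t, s in kaplan_meier(months, observed):
        print(f"month {t:>2}: estimated survival {s:.3f}")
```

Censored patients still count in the risk set until they leave it, which is what lets the estimator use incomplete follow-up without biasing the survival estimate downward.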

