Cancer Data Science Pulse

Complete Your Research Project with Tips from a Cancer Data Scientist

When we begin a research project, we focus on identifying and testing a hypothesis that could help us better understand a natural phenomenon. No matter what level of experience one may have, pursuing an innovative research project takes a lot of time and effort.

If you’re completing academic studies, a research project may be an opportunity to engage the cancer research community and use data science tools and methods. This could boost your skills and help guide your career toward future projects in support of cancer research.

It starts with developing a research topic or question. It’s important to learn where existing research stands on the topic of interest. This enables a researcher to either build new ideas on what has already been discovered or try scientific approaches that previous studies didn’t attempt. Some examples of topics, in the context of cancer, include:

  • studying how a cancer progresses in certain patients.
  • identifying new biomarkers for early detection/screening.
  • comparing how patient groups respond to a treatment.
  • developing new therapies.
  • repurposing existing therapies.

A realistic, testable hypothesis is one of the most important aspects of a project. The method should be scientifically sound and validated by a scientific team. The team can include:

  • patients,
  • biostatisticians,
  • epidemiologists,
  • clinicians,
  • informaticians, and
  • social scientists (depending on the problem).

Based on my experience as a past professor and current chief of CBIIT’s Clinical and Translational Research Informatics Branch, I’ve gathered some lessons that can help you when including data science in your research project.

  1. Pick a realistic project and ask yourself if it is going to address or help an actual problem. If a solution is produced, will it help people with cancer or clinicians?
  2. Understand the problem being studied. Despite the benefits of data science, the solutions it produces may not be scientifically sound if you don’t understand the problem. In my experience, if you ask the wrong question, data science will give you the wrong answer.
  3. Recognize the limitations of data. It’s important to know if the data in hand is sufficient to answer your question and if it is believable (regarded as true, real, and credible data). Also, respect the preferences of the people with cancer who provided the data.
  4. Know what biostatistics or artificial intelligence (AI) approaches can and cannot do. You cannot expect miracles from AI when going into a project. Still, AI methods (and some biostatistics foundations) offer capabilities that may save time or change the way you structure your problem.
  5. Keep unintended harm in mind and avoid it. A data scientist should ask if the model is going to help all the patients. For example, is the data representative of the target patient group? See how you can eliminate or minimize bias against underrepresented groups.
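
One simple way to start on the representativeness question in lesson 5 is to compare each group's share of the data set with its share of the target patient population. The sketch below is only an illustration: the group labels, the reference shares, the function name `representation_gaps`, and the 0.8 tolerance ratio are all hypothetical assumptions, not values from any NCI data set or a recommended threshold.

```python
from collections import Counter


def representation_gaps(sample_groups, target_shares, min_ratio=0.8):
    """Flag groups whose share of the sample falls below
    min_ratio times their share of the target population.

    sample_groups: one group label per record in the data set.
    target_shares: dict of group label -> expected share (0 to 1).
    min_ratio: illustrative tolerance (an assumption, not a standard).
    """
    total = len(sample_groups)
    if total == 0:
        raise ValueError("sample_groups is empty")
    counts = Counter(sample_groups)
    gaps = {}
    for group, target in target_shares.items():
        observed = counts.get(group, 0) / total
        if observed < min_ratio * target:
            gaps[group] = {"observed": observed, "target": target}
    return gaps


# Hypothetical data: 100 records across three made-up groups.
sample = ["A"] * 70 + ["B"] * 25 + ["C"] * 5
target = {"A": 0.60, "B": 0.25, "C": 0.15}

flagged = representation_gaps(sample, target)
for group, shares in flagged.items():
    print(f"{group}: {shares['observed']:.0%} of data vs "
          f"{shares['target']:.0%} of target population")
```

A check like this only shows whether a group is undercounted. It says nothing about data quality within a group or about whether a model performs equally well for each group, so it is a first step rather than a full bias assessment.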

Data science-focused research projects can bring opportunities to cancer research that were not previously observed or possible. For example, new AI approaches could reveal undiscovered insights because they can process amounts of data that people could not review in a reasonable time. Additionally, there are many large data networks and data sets, especially within NCI, that enable research teams to access hundreds of thousands of data points to test their hypotheses.

With the advent of real-world data and real-world evidence, combining data collected during standard-of-care practice with patient-reported outcomes now presents immense opportunities for new discoveries. This wouldn’t be possible without data science.

If you have any questions about incorporating data science into your research project, leave your comments below!
Clinical and Translational Research Informatics Branch Chief, Informatics and Data Science Program, Center for Biomedical Informatics and Information Technology


The insights provided by the cancer data scientist are invaluable for anyone embarking on a research project, particularly in the field of cancer research. The emphasis on selecting a realistic project that addresses a genuine problem and benefits cancer patients or clinicians is crucial. It's essential to ensure that the research question aligns with the existing knowledge and scientific understanding of the problem.

Understanding the limitations of the available data and acknowledging the preferences of the individuals who provided the data are essential aspects of conducting responsible and ethical research. Data scientists should also be aware of the capabilities and limitations of biostatistics and AI approaches, setting realistic expectations and avoiding overreliance on technology.

Considering potential unintended harm and addressing biases in the data are critical for ensuring fair and inclusive research outcomes. It is imperative to evaluate whether the model or methodology being used represents the target patient group accurately and to minimize any bias that may affect underrepresented groups.

The integration of data science in cancer research opens up new opportunities for discovery and analysis. AI approaches, in particular, enable the processing of vast amounts of data that would be challenging for humans to handle within a reasonable timeframe. The availability of large data networks and datasets, coupled with real-world evidence, provides researchers with access to a wealth of information to test hypotheses and uncover novel insights.

Considering all these factors, my question for further exploration would be: How can researchers effectively balance the power and potential of data science with the ethical considerations and human-centric approach necessary for impactful and responsible cancer research?
Thank you for your comment and for your interest. The responsible and effective use of data science tools, including artificial intelligence and machine learning, is an important consideration in cancer research. It’s an evolving dynamic, and one we know readers want to learn more about! We will continue to share relevant blogs and news articles, and hope you’ll be sure to subscribe to our weekly email so that you don’t miss the latest: