Cancer Data Science Pulse
Complete Your Research Project with Tips from a Cancer Data Scientist
When we begin a research project, we focus on identifying and testing a hypothesis that aims to better understand a natural phenomenon. Whatever your level of experience, pursuing an innovative research project takes a lot of time and effort.
If you're completing academic studies, a research project may be an opportunity to engage the cancer research community and to use data science tools and methods. This could boost your skills and help guide your career toward future projects in support of cancer research.
It starts with developing a research topic or question. It's important to learn what existing research has already found on the topic of interest. This lets a researcher either build new ideas on what has already been discovered or try scientific approaches that previous studies didn't attempt. Some examples of topics, in the context of cancer, include:
- studying cancer progression in specific patient populations.
- identifying new biomarkers for early detection/screening.
- comparing treatment response across patient groups.
- developing new therapies.
- repurposing existing therapies.
A realistic, testable hypothesis is one of the most important aspects of a project. The methods should be scientifically sound and validated by a scientific team. This team includes:
- informaticians, and
- other social scientists (depending on the problem).
Drawing on my experience as a former professor and as current chief of CBIIT's Clinical and Translational Research Informatics Branch, I've gathered some lessons that can help you include data science in your research project.
- Pick a realistic project and ask yourself whether it will address a real problem. If a solution is produced, will it help people with cancer or clinicians?
- Understand the problem being studied. Whatever the benefits of data science, the solutions it produces may not be scientifically sound if you don't understand the problem. In my experience, if you ask the wrong question, data science will give you the wrong answer.
- Recognize the limitations of data. It's important to know whether the data in hand is sufficient to answer your question and whether it is believable (that is, real and credible). Also, respect the preferences of the people with cancer who provided the data.
- Know what biostatistics and artificial intelligence (AI) approaches can and cannot do. Don't go into a project expecting miracles from AI. Still, AI methods (and some foundational biostatistics methods) may save time or change the way you structure your problem.
- Keep unintended harm in mind and avoid it. A data scientist should ask whether the model is going to help all patients. For example, is the data representative of the target patient group? Look for ways to eliminate or minimize bias against underrepresented groups.
Data science-focused research projects can open opportunities in cancer research that were not previously possible. For example, new AI approaches could reveal undiscovered insights because they can process volumes of data that humans could not review in a reasonable time. Additionally, there are many large data networks and data sets, especially within NCI, that give research teams access to hundreds of thousands of data points to test their hypotheses.
With the advent of real-world data and real-world evidence, combining data collected during standard-of-care practice with patient-reported outcomes presents immense opportunities for new discoveries. These discoveries wouldn't be possible without data science.
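One simple, concrete way to start answering "is the data representative of the target patient group?" is to compare each group's share of your cohort with its share of the population you intend the model to serve. The sketch below is a minimal illustration under stated assumptions: the group labels, the reference proportions, and the 0.8 flagging threshold are all hypothetical placeholders, not a standard. A real bias assessment would go further (for example, checking model performance separately for each group).

```python
from collections import Counter


def representation_report(cohort_groups, target_shares, min_ratio=0.8):
    """Compare each group's share of the cohort with its share of the target population.

    cohort_groups: one group label per patient record in the data set.
    target_shares: hypothetical reference proportions for the target patient group.
    min_ratio: illustrative threshold (an assumption, not a standard); a group is
               flagged when its cohort share is below min_ratio * its target share.
    """
    total = len(cohort_groups)
    if total == 0:
        raise ValueError("cohort is empty")
    counts = Counter(cohort_groups)
    report = {}
    for group, target in target_shares.items():
        observed = counts.get(group, 0) / total
        # Ratio of observed share to expected share; 1.0 means proportional representation.
        ratio = observed / target if target > 0 else float("inf")
        report[group] = {
            "observed_share": round(observed, 3),
            "target_share": target,
            "ratio": round(ratio, 2),
            "underrepresented": ratio < min_ratio,
        }
    return report


if __name__ == "__main__":
    # Hypothetical cohort of 100 patients and hypothetical target-population shares.
    cohort = ["A"] * 70 + ["B"] * 25 + ["C"] * 5
    target = {"A": 0.60, "B": 0.25, "C": 0.15}
    for group, row in representation_report(cohort, target).items():
        print(group, row)
```

In this toy example, group C makes up 5% of the cohort but 15% of the target population, so it is flagged; a model trained on this data may perform worse for that group, which is exactly the kind of unintended harm worth checking before deployment.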
Ayushi sharma on May 15, 2023 at 11:13 p.m.
Understanding the limitations of the available data and acknowledging the preferences of the individuals who provided the data are essential aspects of conducting responsible and ethical research. Data scientists should also be aware of the capabilities and limitations of biostatistics and AI approaches, setting realistic expectations and avoiding overreliance on technology.
Considering potential unintended harm and addressing biases in the data are critical for ensuring fair and inclusive research outcomes. It is imperative to evaluate whether the model or methodology being used represents the target patient group accurately and to minimize any bias that may affect underrepresented groups.
The integration of data science in cancer research opens up new opportunities for discovery and analysis. AI approaches, in particular, enable the processing of vast amounts of data that would be challenging for humans to handle within a reasonable timeframe. The availability of large data networks and datasets, coupled with real-world evidence, provides researchers with access to a wealth of information to test hypotheses and uncover novel insights.
Considering all these factors, my question for further exploration would be: How can researchers effectively balance the power and potential of data science with the ethical considerations and human-centric approach necessary for impactful and responsible cancer research?