Cancer Data Science Pulse
For the Love of. . . Data! The Conversation Continues with NCI Staff
In celebration of NCI’s 50th anniversary and recognizing the Power of Data, NCI and CBIIT staff are blogging about what data means to them and to the field of cancer research. In this post, we feature Eytan Ruppin, M.D., Ph.D., chief, Cancer
What made you fall in love with data?
Eytan: This was simple. I recognized that analyzing data is one way to really advance our understanding of biomedicine, and it offers a huge return on the investment. In my research in the Center for Cancer Research, we’re using multi-omics data approaches to find new and better ways of predicting and testing novel drug targets and biomarkers to treat cancer more effectively.
Subhashini: I would say the opportunity to work as a Program Officer on a large-scale translational research program like NCI’s Cancer Target Discovery and Development (CTD2) Network helped me realize the importance of data, and especially data sharing. The Network developed its own data sharing policy before the NIH Genomic Data Sharing policy came out in 2015. To accelerate the discovery process, CTD2 set up an area for sharing data within the Network before the results were published. As a bench scientist, I used to think that publishing manuscripts was the only method for data sharing. I didn’t realize the importance of broad data sharing and, most importantly, of sharing the results from failed experiments. Sharing these results along with the experimental approaches is important for moving the science forward. It helps other scientists avoid repeating time-consuming experiments and keeps them from making the same mistakes, ultimately saving time, effort, and valuable resources.
Emily: I agree. Becoming a program officer for NCI’s Clinical Proteomic Tumor Analysis Consortium (CPTAC) helped me fully appreciate the power of data. Collaborating with precisionFDA on a data challenge using CPTAC’s multi-omics cancer data sets also showed me the importance of data and data integration. Before coming back to NCI’s CBIIT, I worked for the health informatics office at the U.S. Food and Drug Administration, where I managed data-related projects mostly focused on ensuring next-gen sequencing and clinical data in the context of informing regulatory decisions. Now back at NCI, I have an opportunity to see the full picture of how integrating multimolecular data with clinical annotations is advancing what we know about the pathophysiology of cancer.
Personally, the impact of data really hit home several years ago when my mom was diagnosed with early-stage breast cancer at age 75. Since our family had no cancer history, it was the first time I witnessed standard-of-care cancer diagnosis and treatment in real time with data. Data are information. Information is power. With precision oncology approaches in practice, personalized treatment is bound to improve patient survival and quality of life.
What do you think has been the single greatest accomplishment in data science over the past 50 years of cancer research?
Eytan: Without a doubt, I think it was the assembly of the human genome. That’s the event that truly launched the power of data.
Emily: I’d add to that a few more recent advances: integrating multi-modality data; developing machine learning and artificial intelligence methodologies for cancer research (e.g., imaging data analysis); developing and adopting the FAIR (Findable, Accessible, Interoperable, Reusable) principles; building a federated data ecosystem for interoperability and cancer data harmonization; and high-performance computing in the cloud.
Subhashini: I’d also add The Cancer Genome Atlas (TCGA) program as one of the greatest accomplishments in the history of cancer research data science. That landmark program generated more than 2.5 petabytes of genomic, epigenomic, transcriptomic, and proteomic data. Those data are publicly available for anyone in the research community to use.
As NCI embarks on the next 50 years, can you offer any practical tips or advice that should be considered?
Eytan: Everyone talks about “Big Data,” but in reality, the availability of big data with clinical relevance in cancer research is still much too small. To advance on that, we first need to take steps to ensure that all data generated with public grant money are made available to all. Second, we need to invest more in collecting detailed molecular and clinical data from patients, both before and during treatment, in clinical settings and in clinical trials. In my opinion, these two steps have the potential to revolutionize cancer treatment and care.
Emily: I think we need to engage more people, right from the start. We need a variety of stakeholders to brainstorm and come up with innovative ideas for new programs, infrastructure, and frameworks in a coordinated way, so that research outputs aren’t developed in silos and will benefit the broader research community.
Subhashini: I’d like to see us focus on education and outreach, especially around data sharing. We need to alert the field to new policies and help everyone involved in research understand the benefits of data sharing. For example, we should encourage the use of digital object identifiers, adopt data citation methods, and give guidance on obtaining patient consent for data usage.