Cancer Data Science Pulse

Training

We asked eight data scientists from across the National Cancer Institute to share their advice and career journeys to help you answer the question: What should I know to start my career in cancer data science?

Meet the people who are breaking new ground in the data science field, whether it’s a new tool, a new model, or a completely new way of using data. Here, we’re featuring Svitlana Volkova, Ph.D., chief scientist at Pacific Northwest National Laboratory. She’ll describe how she’s using “foundation models” to give scientists and analysts a new tool for unleashing the power of artificial intelligence (AI).

Whether you are in the data science field, interested in developing computational solutions for clinical oncology, or a clinical researcher, we’ve curated a list of data sets, tools, and learning resources to showcase how these disciplines can work, and are working, together to empower cancer research.

CBIIT’s NIH Data and Technology Advancement (DATA) Scholar, Dr. Jay G. Ronquillo, offers a bird’s-eye view of cloud computing, including tips for managing costs, access, and training to help advance precision medicine and cancer research.

In this blog, Dr. Elana J. Fertig describes how she is using artificial intelligence, blended with spatial and single-cell technologies, to better understand how cancer will respond to treatment. Predicting the changes that occur in the tumor during treatment may someday enable us to select therapies in advance, essentially stopping the disease in its tracks before it reaches the next stage in its evolution.

Staying afloat in today’s torrent of data sets, tools, and applications calls for both a deep knowledge of cancer and the know-how to apply highly specialized technological solutions. A new resource from NCI’s Informatics Technology for Cancer Research program gives researchers with varying skills and experience the training they need to manage technology-driven approaches to cancer research and care.

Converting the many petabytes of cancer data available on the cloud from information to answers is a complex task. In this blog, Deena Bleich shares how the ISB Cancer Gateway in the Cloud (ISB-CGC), an NCI Cloud Resource, hosts large quantities of cancer data in easily accessible Google BigQuery tables, expediting the process.

This blog offers a primer on semantics, a topic that has broad implications for the biomedical informatics and data science fields. Here, Gilberto Fragoso, Ph.D., describes the structures that serve as a foundation for data science semantics. Those systems help improve data interoperability, allowing researchers to query, retrieve, and combine very different data sets for more extensive analysis.

In this blog, University of Maryland's Mrs. Aya Abdelsalam Ismail examines the use of deep learning in medical applications, especially as a means for following a disease or disorder over time. She’ll describe how a “wrong turn” in her research on forecasting Alzheimer’s disease led her to question her model’s performance. Her findings are particularly relevant for deep learning models in the cancer field, which use images obtained from patients, often at different points in time.

In this latest Data Science Seminar, Jim Lacey, Ph.D., M.P.H., shares the lessons he learned in transitioning a large cancer epidemiology cohort study to the cloud, including the importance of focusing on people and processes as well as technology. Project managers, principal investigators, co-investigators, data managers, data analysts—really anyone who is part of a team that wants to use the cloud or cloud-based resources for their studies—should attend.