Cancer Data Science Pulse

The Cancer Data Science Pulse blog provides insights on trends, policies, initiatives, and innovation in the data science and cancer research communities from professionals dedicated to building a national cancer data ecosystem that enables new discoveries and reduces the burden of cancer.

In this blog, Dr. Elana J. Fertig describes how she is using artificial intelligence, blended with spatial and single-cell technologies, to better understand how cancer will respond to treatment. Predicting the changes that occur in the tumor during treatment may someday enable us to select therapies in advance, essentially stopping the disease in its tracks before it reaches the next stage in its evolution.

To celebrate the “International Day of Women and Girls in Science,” we asked CBIIT’s Associate Director for Informatics and Data Science, Dr. Jill Barnholtz-Sloan, to share her experience as a woman in data science. She describes her journey to CBIIT and underscores the value of adding women’s perspectives to the data science field.

Staying afloat in today’s torrent of data sets, tools, and applications calls for both a deep knowledge of cancer and the know-how to apply highly specialized technological solutions. A new resource from NCI’s Informatics Technology for Cancer Research program gives researchers with varying skills and experience the training they need to manage technology-driven approaches to cancer research and care.

Converting the many petabytes of cancer data available on the cloud from information to answers is a complex task. In this blog, Deena Bleich shares how the ISB Cancer Gateway in the Cloud (ISB-CGC), an NCI Cloud Resource, hosts large quantities of cancer data in easily accessible Google BigQuery tables, expediting the process.

This blog offers a primer on semantics, a topic that has broad implications for the biomedical informatics and data science fields. Here, Gilberto Fragoso, Ph.D., describes the structures that serve as a foundation for data science semantics. Those systems help improve data interoperability, allowing researchers to query, retrieve, and combine very different data sets for more extensive analysis.

In this blog, University of Maryland's Mrs. Aya Abdelsalam Ismail examines the use of Deep Learning in medical applications, especially as a means for following a disease or disorder over time. She describes how a “wrong turn” in her research on forecasting Alzheimer’s Disease led her to question her model’s performance. Her findings are particularly relevant for Deep Learning models in the cancer field, which use images obtained from patients, often at different points in time.

Allen Dearry, Ph.D., will retire from NCI’s CBIIT on December 31. Here Dr. Dearry reflects on his 31 years at NIH, including his role in helping to establish the Cancer Research Data Commons. He also offers advice for people just entering the field and describes what he’s planning to do next.

In this latest Data Science Seminar, Jim Lacey, Ph.D., M.P.H., shares the lessons he learned in transitioning a large cancer epidemiology cohort study to the cloud, including the importance of focusing on people and processes as well as technology. Project managers, principal investigators, co-investigators, data managers, data analysts—really anyone who is part of a team that wants to use the cloud or cloud-based resources for their studies—should attend.

The diversity, complexity, and distribution of data sets present an ongoing challenge to cancer researchers looking to perform advanced analyses. Here we describe the Cancer Genomics Cloud, powered by Seven Bridges, an NCI Cloud Resource that’s helping to bring together data and computational power to further advance cancer research and discovery.

Get to know David Kepplinger, Ph.D., who will present the next Data Science Seminar, “Robust Prediction of Stenosis from Protein Expression Data.” In this Q&A, he describes who should attend the talk, how his topic relates to cancer, and why it’s important to delve into unexpected data values when conducting biostatistical analysis.