Cancer Data Science Pulse

Data Sets

Whether you work in data science, want to develop computational solutions for clinical oncology, or are a clinical researcher, we’ve curated a list of data sets, tools, and learning resources to show how these disciplines can work, and already are working, together to empower cancer research.

Data and Artificial Intelligence (AI) are a match seemingly made in heaven. By joining data and AI, scientists are able to shift a lot of the burden associated with using data from human to machine. See why the data-AI relationship works so well for cancer research in this offbeat blog featuring two fictitious characters—Datum and his pal Aida.

At the start of the COVID-19 outbreak, NCI’s Frederick National Laboratory for Cancer Research, along with the Centers for Disease Control and Prevention and the National Institute of Allergy and Infectious Diseases, modified an existing tool used for managing NCI’s clinical trials to create a COVID-19 Seroprevalence Hub (SeroHub) to track COVID-19 seroprevalence across the United States. This blog looks at how SeroHub has evolved since the pandemic first began and shows how it could serve as a blueprint for monitoring future infectious disease outbreaks that threaten public health.

On March 23, Dr. Ben Raphael will present the next Data Science Seminar, “Quantifying tumor heterogeneity using single-cell and spatial sequencing.” In this blog, Dr. Raphael describes how he’s using these sequencing technologies to dig deeper into the complexity of cancer.

CBIIT’s NIH Data and Technology Advancement (DATA) Scholar, Dr. Jay G. Ronquillo, offers a bird’s-eye view of cloud computing, including tips for managing costs, access, and training to help advance precision medicine and cancer research.

Turning the many petabytes of cancer data available in the cloud into answers is a complex task. In this blog, Deena Bleich shares how the ISB Cancer Gateway in the Cloud (ISB-CGC), an NCI Cloud Resource, speeds up that process by hosting large quantities of cancer data in easily accessible Google BigQuery tables.

In this latest Data Science Seminar, Jim Lacey, Ph.D., M.P.H., shares the lessons he learned in transitioning a large cancer epidemiology cohort study to the cloud, including the importance of focusing on people and processes as well as technology. Project managers, principal investigators, co-investigators, data managers, and data analysts (really anyone on a team that wants to use the cloud or cloud-based resources for their studies) should attend.

The diversity, complexity, and distribution of data sets present an ongoing challenge to cancer researchers looking to perform advanced analyses. Here we describe the Cancer Genomics Cloud, powered by Seven Bridges, an NCI Cloud Resource that’s helping to bring together data and computational power to further advance cancer research and discovery.

On Wednesday, September 22, 2021, Yanjun Qi, Ph.D., from the University of Virginia, will present “AttentiveChrome: Deep Learning for Predicting Gene Expression from Histone Modifications” to kick off the Fall Data Science Seminar Series. This blog offers insight into Dr. Qi’s research and why this topic is important to her.

To the NCI Cancer Research Data Commons, cloud computing means three words: NCI Cloud Resources. These resources are real-world examples of making data accessible and available to all cancer researchers. In the first post of a four-part blog series, the NCI Cloud Resources share their origin story and the problems that cloud computing could solve in cancer research.