Cancer Data Science Pulse

Data Sets

Learn more about new streamlined access to broad-use data sets within the database of Genotypes and Phenotypes (dbGaP).

In honor of National Lung Cancer Awareness Month, we’re highlighting the “data deets” (details) for the National Lung Screening Trial, a large-scale effort that collected imaging data for more than 53,000 heavy smokers. In this blog, we’ll cover the research that drove this data, specific metrics about the data set, how to access it, and some of the exciting data science projects using the data.

As NCI recognizes Breast Cancer Awareness Month this October, we highlight several data science resources to assist in your breast cancer research.

This year’s theme for World Cancer Research Day is “support research to prevent cancer and catch it early.” How can data science drive progress in international cancer research? Check out our NCI resources and learn what we have to support those global research efforts.

As NCI recognizes Childhood Cancer Awareness Month this September, we highlight a list of data science resources and tools to aid your pediatric cancer research.

Discover how NIH is working to make generalist repositories (GRs) part of the data sharing ecosystem. The goal is to minimize data sharing barriers while still taking advantage of GR convenience and usability.

Whether you are in the data science field, interested in developing computational solutions for clinical oncology, or a clinical researcher, we’ve curated a list of data sets, tools, and learning resources to showcase how these disciplines can work, and are working, together to empower cancer research.

Data and Artificial Intelligence (AI) are a match seemingly made in heaven. By joining data and AI, scientists are able to shift a lot of the burden associated with using data from human to machine. See why the data-AI relationship works so well for cancer research in this offbeat blog featuring two fictitious characters—Datum and his pal Aida.

At the start of the COVID-19 outbreak, NCI’s Frederick National Laboratory for Cancer Research, along with the Centers for Disease Control and Prevention and the National Institute of Allergy and Infectious Diseases, modified an existing tool used for managing NCI’s clinical trials to create a COVID-19 Seroprevalence Hub (SeroHub) to track COVID-19 seroprevalence across the United States. This blog looks at how SeroHub has evolved since the pandemic began and shows how it could serve as a blueprint for monitoring future infectious disease outbreaks that threaten public health.

On March 23, Dr. Ben Raphael will present the next Data Science Seminar, “Quantifying tumor heterogeneity using single-cell and spatial sequencing.” In this blog, Dr. Raphael describes how he’s using this technology to dig deeper into the complexity of cancer.