Cancer Data Science Pulse

Data Sets

Whether you work in data science, are interested in developing computational solutions for clinical oncology, or are a clinical researcher, we’ve curated a list of data sets, tools, and learning resources that showcase how these disciplines can work, and already are working, together to empower cancer research.

Are you familiar with TCGA? This landmark data set maps the genomic profiles of 33 cancer types and subtypes. Learn how this rich data collection helps researchers like you better understand the molecular features associated with cancer.

Are you new to a cancer research lab and realizing how important basic data science knowledge is? See how many of these cancer data science questions you can answer correctly. Afterward, you can use our training resources to improve your score!

We’re celebrating “Love Data Week” by featuring scientists who love data—especially diverse data. In this blog, scientists tell why they love diverse data and offer tips for increasing diversity in your research data.

How can data science support your cancer research? Explore this helpful quick start guide to find out! We’ll give you an overview of how data science enhances cancer research and show you how to start applying it to your work.

Are you researching genomic abnormalities? Bioinformatician Deena Bleich gives an overview of the online tool, “Mitelman Database of Chromosome Aberrations and Gene Fusions in Cancer,” and how it can help you analyze genomic data.

In this blog, we’re spotlighting how researchers can leverage FireCloud, one of NCI’s Cloud Resources, for accessing data, running analyses, and collaborating with others in the cancer research community.

Read the blogs that topped our charts in 2022 and see if your favorite made #1!

Learn more about new streamlined access to broad-use data sets within the database of Genotypes and Phenotypes (dbGaP).

In honor of National Lung Cancer Awareness Month, we’re highlighting the “data deets” (details) for the National Lung Screening Trial, a large-scale effort that collected imaging data for more than 53,000 heavy smokers. In this blog, we’ll cover the research behind the data, specific metrics about the data set, how to access it, and some of the exciting data science projects that use it.