The NCI Data Catalog is a listing of data collections resulting from major NCI initiatives and other widely used data sets. Data collections in the catalog meet the following criteria:

  • Products of NCI intramural researchers or major NCI initiatives, or regularly referenced NCI-funded extramural research data
  • Available to all researchers and may be Open or Controlled Access (requiring approval by a Data Access Committee)
  • Well documented and available for download

Although this is not a comprehensive listing of data sets available from NCI, you can expect quarterly updates.


Categories of data sets include:

Category | Data Collection Name | Description
Biospecimens | Biospecimen Research Database (BRD)

BRD is a free, publicly accessible literature database that contains peer-reviewed primary and review articles and Standard Operating Procedures (SOPs) in the field of human biospecimen science.
Each literature curation captures the following relevant parameters:

  • Biospecimen investigated, analyte(s) of interest, and technology platforms employed
  • Pre-analytical factors investigated
  • An original summary of relevant results

You can find SOPs in a system with Biospecimen Evidence Based Practices (BEBP).

Cancer Screening Trial | Cancer Data Access System (CDAS)

CDAS is a submission and tracking system for the National Lung Screening Trial (NLST) data and the Prostate, Lung, Colorectal and Ovarian (PLCO) Cancer Screening Trial data.
The data includes a:

  • summary of the trial.
  • description of the data collected.
  • searchable list of research projects and publications.
Clinical | Clinical and Translational Data Commons (CTDC)

CTDC provides access to a vast array of clinical and translational data from NCI-funded clinical trials, correlative studies, and interventional studies. While some data sets are publicly accessible, others fall under the registered-access or controlled-access tier.

August 2024 Update:
The CTDC has launched! Currently, you can find the Cancer Moonshot℠ Biobank data set in the CTDC, which includes clinical data shared by participants throughout their standard of care treatment at participating U.S. medical institutions. Stay tuned as CTDC expands its data collection over the coming months!

Clinical | NCI National Clinical Trials Network (NCTN)/Community Oncology Research Program (NCORP) Data Archive

This centralized, controlled-access database is a repository of de-identified, patient-level data from phase III clinical trials conducted by NCI’s NCTN network, NCI’s NCORP network, and the Canadian Cancer Trials Group (CCTG).

After approval of a signed Data Use Agreement (DUA), you can download patient-level clinical data sets and their associated data dictionaries.

Clinical | NCTN Biobanks

Make a request for well-annotated biospecimen samples derived from phase II and phase III NCTN clinical trials. For your secondary research studies, use the newly available “NCTN Biospecimen Catalog” for comprehensive data searches.

Clinical, Genomics | Personalized Cancer Therapy

The Personalized Cancer Therapy website is a tool for physicians and patients to assess potential therapy options based on specific tumor biomarkers. The focus is on the potential therapy strategies for tumors harboring certain genomic alterations regardless of disease site.

The available data includes:

  • study association of a detected genomic alteration with tumor development and growth.
  • association of genomic alteration to increased or decreased response to therapy.
  • availability of relevant Food and Drug Administration (FDA) approved therapies or investigational agents in clinical trials.
Drug Discovery | CellMinerCDB: National Center for Advancing Translational Sciences (NCATS)

The NCI Center for Cancer Research developed this database, which details how 2,600+ different compounds affect cancer cell growth. NIH NCATS tested 183 cancer cell lines, and you can find drug response data from that work.

Drug Discovery | NCI Panel of 60 Human Tumor Cell Lines (NCI-60)

You can find and analyze NCI-60 data in CellMiner. NCI’s Developmental Therapeutics Program has used this panel of 60 diverse human cancer cell lines to screen over 100,000 chemical compounds and natural products since 1990.

You can download gene expression data files from NCI-hosted FTP sites:

  • Gene Expression 1
  • Gene Expression 2
Epidemiology | Surveillance, Epidemiology and End Results (SEER) database

SEER collects and publishes cancer incidence and survival data from population-based cancer registries covering approximately 50% of the U.S. population.

The SEER database includes incidence and population data associated by:

  • Age
  • Sex
  • Race
  • Year of diagnosis
  • Geographic areas

You can access SEER statistics and the Division of Cancer Control and Population Sciences (DCCPS) data resources on the DCCPS website.

Epidemiology | SEER-CAHPS

The SEER-CAHPS data resource links data from NCI’s Surveillance, Epidemiology and End Results (SEER) cancer registry program, the Centers for Medicare & Medicaid Services’ (CMS) Medicare Consumer Assessment of Healthcare Providers and Systems (CAHPS®) patient experience surveys, and longitudinal Medicare claims data on utilization and costs of care for Fee-For-Service beneficiaries.

You can request data access by emailing the required documents to NCISEERCAHPS@nih.gov.

Epidemiology | SEER-MHOS

The SEER-MHOS database links data from NCI’s Surveillance, Epidemiology and End Results (SEER) cancer registry program and the Centers for Medicare & Medicaid Services (CMS) Medicare Health Outcomes Survey (MHOS) that provides information about the health-related quality of life (HRQOL) of Medicare Advantage Organization (MAO) enrollees. NCI and CMS sponsor the database.

You can email the required documents to SEER-MHOS@hcqis.org to request access to the data.

Genomics | All of Us Researcher Workbench

You can register to access data and tools, including whole genome sequencing (WGS) and genome-wide genotyping data. The NIH-wide All of Us initiative collects this data.

Genomics | NCI Genomic Data Sets Available in Database of Genotypes and Phenotypes (dbGaP)

The National Center for Biotechnology Information (NCBI) developed dbGaP to archive and distribute the data and results from studies that investigated the interaction of genotype and phenotype in humans. You can request controlled access to data from over 150 NCI studies by following the instructions.

July 2024 Update: Request access to recently released data referenced in the study, “Childhood Cancer Data Initiative (CCDI): Single-Cell Atlas of NF1 Nerve Sheath Tumors.” The data includes 63 clinically annotated NF1-associated peripheral nerve sheath tumors.

Genomics | Genomic Data Commons (GDC)

GDC provides the cancer research community with a unified data repository that enables data sharing across cancer genomic studies in support of precision medicine. The GDC supports several cancer genome programs at the NCI Center for Cancer Genomics (CCG).

Genomics | Integrated Canine Data Commons (ICDC)

ICDC provides the cancer research community with data that enables a comparative analysis between human and canine cancers. You can explore the open access data within the ICDC portal, and you may analyze the associated data files in the Seven Bridges Cancer Genomics Cloud.

Genomics | Cancer Genome Characterization Initiative (CGCI)

CGCI researchers develop and apply advanced sequencing and other genome-based methods to identify novel genetic abnormalities in both adult and pediatric cancers. The genetic profiles inform better cancer diagnosis and treatment.

  • You can access CGCI data through the project data matrix.
Genomics | Investigation of Serial Studies to Predict Your Therapeutic Response with Imaging and Molecular Analysis, I-SPY1

The I-SPY 1 TRIAL sought to identify indicators of response to neoadjuvant chemotherapy that predict survival in women with high-risk breast cancer.

  • You can download gene expression data files from an NCI-hosted FTP site.
Genomics | Molecular Targets for Cancer

Researchers have measured thousands of molecular targets in the NCI panel of 60 human tumor cell lines. You can search for or browse through a list of targets.
Measurements include:

  • Protein levels
  • RNA measurements
  • Mutation status
  • Enzyme activity levels
Genomics | NCI Brain Neoplasia Data (Rembrandt Database)

NCI Brain Neoplasia Data (Rembrandt Database) integrates clinical and functional genomics data from clinical trials involving brain tumor patients and provides the ability to perform ad hoc querying, reporting and analysis across multiple data domains, including gene expression, gene copy number and clinical data.

  • You can download gene expression files from an NCI-hosted FTP site.
Genomics | TARGET: Therapeutically Applicable Research to Generate Effective Treatments

TARGET applies a comprehensive genomic approach to determine molecular changes that drive childhood cancers. The goal is to facilitate the discovery of therapeutic targets for childhood cancers and catalyze the translation of these discoveries into clinical applications.

The TARGET data matrix includes:

  • genomic data.
  • clinical information that doesn’t identify patients.
Genomics | The Cancer Genome Atlas (TCGA)

The Cancer Genome Atlas (TCGA) is a comprehensive effort to accelerate the understanding of the molecular basis of cancer through the application of genome analysis technologies. Through the TCGA Data Portal, you can search, download, and analyze data from over 30 different types of cancer. It contains:

  • clinical information that doesn’t identify patients.
  • genomic characterization data.
  • high level sequence analysis of the tumor genomes.
Genomics | The NCI Director’s Challenge Adenocarcinoma Lung Study

This large, training-testing, multi-site, blinded validation study characterized the performance of several prognostic models based on gene expression for 442 lung adenocarcinomas. The study looked at whether microarray measurements of gene expression, either alone or combined with basic clinical covariates (stage, age, sex), predicted overall survival in lung cancer subjects.

Imaging | Imaging Data Commons (IDC)

IDC provides the cancer research community with a cloud-based repository of cancer imaging data, image annotations, and analysis results. IDC uses the Digital Imaging and Communications in Medicine (DICOM) standard to harmonize the data. You can find imaging data from NIH/NCI projects.

September 2024 Update: The repository now includes over 25,000 newly curated and harmonized pathology images. These include slides from the Cancer Moonshot Biobank and CCDI’s Molecular Characterization Initiative.

Imaging | The Cancer Imaging Archive (TCIA)

TCIA is a curated archive of medical images that you can download. It includes data from the National Lung Screening Trial (NLST) and many subjects from The Cancer Genome Atlas (TCGA). You can find data divided into collections and grouped by common cancer types or research aims. You can also search these collections by modality, anatomic location, or various acquisition parameters. You can access pathology imaging, patient demographics/outcomes, expert-derived segmentations/annotations, genomics, and other available supporting data.

Imaging | SLICE-3D

Use this data set, which contains more than 400,000 skin lesion image crops extracted from 3D total body photography (TBP), for skin cancer detection. 

Metadata entries include the following:

  • Age
  • Sex
  • General anatomic site
  • Common patient identifier
  • Clinical size
  • Various data fields from the TBP Lesion Visualizer
Multiple | Cancer Data Service (CDS)

CDS provides you with data storage and sharing capabilities for NCI-funded studies that meet particular requirements. You can find a variety of data types in the CDS (currently, most are genomic and imaging data), both open and controlled access. Before requesting access, consider searching and browsing the data (no login required) via the CDS Portal.

Nanomaterial Characterizations | caNanoLab

caNanoLab includes over 1,000 curated nanomaterials relevant in cancer with detailed characterizations and associated nanotechnology protocols and publications.

  • You can perform web-based queries and download reports for re-use and additional analysis.
Networks | The Network Data Exchange (NDEx)

NDEx allows you to share, store, manipulate and publish biological network information. The project maintains a public NDEx server and is a joint effort of the UC San Diego School of Medicine and the Cytoscape Consortium.

  • Access published networks aggregated from several pathway and interaction databases.
  • Create your own networks to use, share, or publish.
Pediatric, Adolescent, and Young Adult (AYA) | Childhood Cancer Data Initiative (CCDI) Childhood Cancer Data Catalog (CCDC)

The CCDI CCDC is an inventory of pediatric oncology data resources. This includes childhood cancer repositories, registries, knowledge bases, and catalogs that either manage or refer to data.

December 2024 Update: The Catalog now includes two new resources—the Specimen Resource Locator and the Pediatric Malignancies: Inventory of DCEG Research—alongside several new data sets from existing resources:

  • Cancer Research Institute iAtlas
  • Pancancer Analysis of Whole Genomes (PCAWG)
  • The Cancer Genome Atlas Program (TCGA)
  • cBioPortal’s Pediatric European MAPPYACTS Trial, Gastrointestinal Stromal Tumors, and Mature B-Cell Neoplasms
  • Imaging Data Commons CCDI Molecular Characterization Initiative (MCI)
  • Single-cell Pediatric Cancer Atlas (ScPCA) Single-cell RNA sequencing of diverse pediatric leukemias
  • Identification of drug-resistance-related cell states in paired pre- and post-treatment neuroblastoma PDXs

Visit the catalog website for full details.

Pediatric, Adolescent, and Young Adult (AYA) | CCDI Hub Resources

CCDI Hub Explore Dashboard: This integrated tool provides you with the search functionality to connect participants with files and samples. It enables you to find data within a single study or across multiple studies, and to create synthetic cohorts based on filtered metrics of interest (e.g., demographics, diagnosis, samples).

December 2024 Update: In addition to both new and updated data sets, discover enhancements to this tool as well as new features both in the dashboard and My Files Cart.

CCDI Molecular Target Platform (MTP): Use this tool to browse and identify associations between molecular targets, diseases, and drugs specific for childhood cancers.

Childhood Cancer Clinical Data Commons (C3DC): This open-access web application allows you to find harmonized demographic and clinical data, create custom cohorts, and download data for local analysis.

November 2024 Update: C3DC’s latest release includes expanded data sets, enhanced tools, and an improved user experience, making pediatric clinical data more accessible and valuable for your childhood cancer research.

CCDI Data Federation Resource: Search for de-identified individual-level data through the API, which provides an open access subset of the metadata and gives you the location of the complete data set. To determine data access, check the policies for the resource that submitted the data (currently includes the Kids First Data Resource Center, the Pediatric Cancer Data Commons, St. Jude Cloud, and the Treehouse Childhood Cancer Data Initiative).

Proteomics | Proteomic Data Commons (PDC)

You can get open-access, highly curated, and standardized biospecimen, clinical, and proteomic data from PDC. You can analyze PDC data files using tools found in the CRDC Cloud Resources.

August 2024 Update: PDC released six new studies on non-clear cell renal cell carcinomas, plus a new metabolomics feature that includes mass spectrometry-based CPTAC metabolomic and lipidomic data. Visit the PDC for more details.

Proteomics | The Clinical Proteomic Tumor Analysis Consortium (CPTAC)

CPTAC analyzes cancer biospecimens by mass spectrometry, characterizing and quantifying their constituent proteins, or proteome.
The CPTAC Data Portal is the centralized repository for the dissemination of proteomic data collected by the Proteome Characterization Centers (PCCs).

Target Discovery | Cancer Target Discovery and Development (CTD2)

CTD2 bridges the gap between the enormous volumes of data generated by genomic characterization studies and the ability to use these data for the development of human cancer therapeutics. It specializes in using computational and functional genomic approaches to translate next-generation sequencing data, and data from high-throughput and high content small molecule and genetic screens, into clinical applications. All data generated are open access.
