NCI Data Catalog

The NCI Data Catalog is a listing of data collections resulting from major NCI initiatives and other widely used data sets. Data collections in the catalog meet the following criteria:

Products of NCI intramural researchers or major NCI initiatives, or regularly referenced NCI-funded extramural research data
Available to all researchers and may be Open or Controlled Access (requiring approval by a Data Access Committee)
Well documented and available for download

This is not a comprehensive listing of data sets available from NCI. You can expect quarterly updates to the catalog.

Categories of data sets include:

Biospecimen
Cancer Screening Trial
Clinical
Drug Discovery
Epidemiology
Genomics
Imaging
Multiple
Nanomaterial Characterizations
Networks
Pediatric, Adolescent, and Young Adult (AYA)
Proteomics
Target Discovery

Category	Data Collection Name	Description
Biospecimen	Biospecimen Research Database (BRD)	BRD is a free, publicly accessible literature database that contains peer-reviewed primary and review articles and Standard Operating Procedures (SOPs) in the field of human biospecimen science. Each literature curation captures the following relevant parameters: the biospecimen investigated, analyte(s) of interest, and technology platforms employed; the pre-analytical factors investigated; and an original summary of relevant results. You can find SOPs in a system with Biospecimen Evidence Based Practices (BEBP).
Cancer Screening Trial	Cancer Data Access System (CDAS)	CDAS is a submission and tracking system for the National Lung Screening Trial (NLST) data and the Prostate, Lung, Colorectal and Ovarian (PLCO) Cancer Screening Trial data. The data includes a summary of the trial, a description of the data collected, and a searchable list of research projects and publications.
Clinical	Clinical and Translational Data Commons (CTDC)	CTDC provides access to a vast array of clinical and translational data from NCI-funded clinical trials, correlative studies, and interventional studies. While some data sets are publicly accessible, others are in the registered-access or controlled-access tier. August 2024 Update: The CTDC has launched! Currently, you can find the Cancer Moonshot℠ Biobank data set in the CTDC, which includes clinical data shared by participants throughout their standard of care treatment at participating U.S. medical institutions. Stay tuned as CTDC expands its data collection over the coming months!
Clinical	NCI National Clinical Trials Network (NCTN)/Community Oncology Research Program (NCORP) Data Archive	This centralized, controlled-access database is a repository of de-identified, patient level data from phase III clinical trials conducted by NCI’s NCTN network, NCI’s NCORP network, and the Canadian Cancer Trials Group (CCTG). After approval of a signed Data Use Agreement (DUA), you can download patient level clinical data sets and their associated data dictionaries.
Clinical	NCTN Biobanks	Make a request for well-annotated biospecimen samples, derived from phase II and phase III NCTN clinical trials. For your secondary research studies, use the newly available “NCTN Biospecimen Catalog” for comprehensive data searches.
Clinical, Genomics	Personalized Cancer Therapy	The Personalized Cancer Therapy website is a tool for physicians and patients to assess potential therapy options based on specific tumor biomarkers. The focus is on the potential therapy strategies for tumors harboring certain genomic alterations regardless of disease site. The available data includes the study association of a detected genomic alteration with tumor development and growth; the association of a genomic alteration with increased or decreased response to therapy; and the availability of relevant Food and Drug Administration (FDA) approved therapies or investigational agents in clinical trials.
Drug Discovery	CellMinerCDB: National Center for Advancing Translational Sciences (NCATS)	The NCI Center for Cancer Research developed this database that details how 2,600+ different compounds affect cancer cell growth. NIH NCATS tested 183 cancer cell lines, and you can find drug response data from that work.
Drug Discovery	NCI Panel of 60 Human Tumor Cell Lines (NCI-60)	You can find and analyze NCI-60 data in CellMiner. NCI’s Developmental Therapeutics Program has used this panel of 60 diverse human cancer cell lines to screen over 100,000 chemical compounds and natural products since 1990. You can download gene expression data files from NCI-hosted FTP sites: Gene Expression 1 and Gene Expression 2.
Epidemiology	DCCPS Cancer Epidemiology Descriptive Cohort Database (CEDCD)	Use this database for information about cohort studies that follow groups of persons over time for cancer incidence, mortality, and other health outcomes. Information includes general study information (e.g., eligibility criteria and size), the type of data collected at baseline, cancer sites, the number of participants diagnosed with cancer, and biospecimen data. You can access CEDCD on the Division of Cancer Control and Population Sciences (DCCPS) website.
Epidemiology	Surveillance, Epidemiology and End Results (SEER) database	SEER collects and publishes cancer incidence and survival data from population-based cancer registries covering approximately 50% of the U.S. population. The SEER database includes incidence and population data associated by age, sex, race, year of diagnosis, and geographic area. You can access SEER statistics and the DCCPS data resources on the DCCPS website.
Epidemiology	SEER - CAHPS	The SEER-CAHPS data resource links data from NCI’s Surveillance, Epidemiology and End Results (SEER) cancer registry program, the Centers for Medicare & Medicaid Services’ (CMS) Medicare Consumer Assessment of Healthcare Providers and Systems (CAHPS®) patient experience surveys, and longitudinal Medicare claims data on utilization and costs of care for Fee-For-Service beneficiaries. You can request data access by emailing the required documents to NCISEERCAHPS@nih.gov.
Epidemiology	SEER - MHOS	The SEER-MHOS database links data from NCI’s Surveillance, Epidemiology and End Results (SEER) cancer registry program and the Centers for Medicare & Medicaid Services (CMS) Medicare Health Outcomes Survey (MHOS) that provides information about the health-related quality of life (HRQOL) of Medicare Advantage Organization (MAO) enrollees. NCI and CMS sponsor the database. You can email the required documents to SEER-MHOS@hcqis.org to request access to the data.
Genomics	All of Us Researcher Workbench	You can register to access data and tools including whole genome sequencing (WGS) and genome-wide genotyping data. The NIH-wide All of Us initiative collects this data.
Genomics	NCI Genomic Data Sets Available in Database of Genotypes and Phenotypes (dbGaP)	The National Center for Biotechnology Information (NCBI) developed dbGaP to archive and distribute the data and results from studies that investigated the interaction of genotype and phenotype in humans. You can request controlled access to data from over 150 NCI studies by following the instructions. July 2024 Update: Request access to recently released data referenced in the study, “Childhood Cancer Data Initiative (CCDI): Single-Cell Atlas of NF1 Nerve Sheath Tumors.” The data includes 63 clinically annotated NF1-associated peripheral nerve sheath tumors.
Genomics	Genomic Data Commons (GDC)	GDC provides the cancer research community with a unified data repository that enables data sharing across cancer genomic studies in support of precision medicine. The GDC supports several cancer genome programs at the NCI Center for Cancer Genomics (CCG), including The Cancer Genome Atlas (TCGA), Therapeutically Applicable Research to Generate Effective Treatments (TARGET), and the Cancer Genome Characterization Initiative (CGCI).
Genomics	Integrated Canine Data Commons (ICDC)	ICDC provides the cancer research community with data that enables a comparative analysis between human and canine cancers. You can explore the open access data within the ICDC portal, and you may analyze the associated data files in the Seven Bridges Cancer Genomics Cloud.
Genomics	Cancer Genome Characterization Initiative (CGCI)	CGCI researchers develop and apply advanced sequencing and other genome-based methods to identify novel genetic abnormalities in both adult and pediatric cancers. The genetic profiles inform better cancer diagnosis and treatment. You can access CGCI data through the project data matrix.
Genomics	Investigation of Serial Studies to Predict Your Therapeutic Response with Imaging and Molecular Analysis, I-SPY1	The I-SPY 1 TRIAL sought to identify indicators of response to neoadjuvant chemotherapy that predict survival in women with high-risk breast cancer. You can download gene expression data files from an NCI-hosted FTP site.
Genomics	Molecular Targets for Cancer	Researchers have measured thousands of molecular targets in the NCI panel of 60 human tumor cell lines. You can search for or browse through a list of targets. Measurements include protein levels, RNA measurements, mutation status, and enzyme activity levels.
Genomics	NCI Brain Neoplasia Data (Rembrandt Database)	NCI Brain Neoplasia Data (Rembrandt Database) integrates clinical and functional genomics data from clinical trials involving brain tumor patients and provides the ability to perform ad hoc querying, reporting and analysis across multiple data domains, including gene expression, gene copy number and clinical data. You can download gene expression files from an NCI-hosted FTP site.
Genomics	TARGET: Therapeutically Applicable Research to Generate Effective Treatments	TARGET applies a comprehensive genomic approach to determine molecular changes that drive childhood cancers. The goal is to facilitate the discovery of therapeutic targets for childhood cancers and catalyze the translation of these discoveries into clinical applications. The TARGET data matrix includes genomic data and clinical information that doesn’t identify patients.
Genomics	The Cancer Genome Atlas (TCGA)	The Cancer Genome Atlas (TCGA) is a comprehensive effort to accelerate the understanding of the molecular basis of cancer through the application of genome analysis technologies. Through the TCGA Data Portal, you can search, download, and analyze data from over 30 different types of cancer. It contains clinical information that doesn’t identify patients, genomic characterization data, and high-level sequence analysis of the tumor genomes.
Genomics	The NCI Director’s Challenge Adenocarcinoma Lung Study	This large, multi-site, blinded training-testing validation study characterized the performance of several prognostic models based on gene expression for 442 lung adenocarcinomas. The study looked at whether microarray measurements of gene expression, either alone or combined with basic clinical covariates (stage, age, sex), predicted overall survival in lung cancer subjects. The DC Lung Study data set is available for analysis in Gene Expression Omnibus (GEO).
Imaging	Imaging Data Commons (IDC)	IDC provides the cancer research community with a cloud-based repository of cancer imaging data, image annotations, and analysis results. IDC uses the Digital Imaging and Communications in Medicine (DICOM) standard to harmonize the data. You can find imaging data from the following NIH/NCI projects: The Cancer Genome Atlas (TCGA), Lung Imaging Database Consortium (LIDC), Clinical Proteomic Tumor Analysis Consortium (CPTAC), Human Tumor Atlas Network (HTAN), NCI Quantitative Imaging Network (QIN), and National Library of Medicine Visible Human Project (VHP). July 2025 Update: The CCDI Molecular Characterization Initiative (CCDI-MCI) collection now includes double the number of digital pathology slides (3,715 total) and patients (3,582 total). Use this notebook to start working with the slides, and use this guide for guidance on accessing patient-level clinical data with images. Functionality updates on the IDC include a “cart” feature (create a collection of data files for individual patients, studies, or series, and then download the selected files directly to your computing environment); a more compact and intuitive display of the Explore section; and the replacement of standing virtual office hours with online appointments. Fill out this short form to request a one-on-one support session with an IDC team member.
Imaging	The Cancer Imaging Archive (TCIA)	TCIA is a curated archive of medical images that you can download. It includes data from the National Lung Screening Trial (NLST) and many subjects from The Cancer Genome Atlas (TCGA). You can find data divided into collections and grouped by common cancer types or research aims. You can also search these collections by modality, anatomic location, or various acquisition parameters. You can access pathology imaging, patient demographics/outcomes, expert-derived segmentations/annotations, genomics, and other available supporting data.
Imaging	SLICE-3D	Use this data set, which contains more than 400,000 skin lesion image crops extracted from 3D total body photography (TBP), for skin cancer detection. Metadata entries include age, sex, general anatomic site, common patient identifier, clinical size, and various data fields from the TBP Lesion Visualizer.
Multiple	General Commons (GC)	GC provides you with data storage and sharing capabilities for NCI-funded studies that meet particular requirements. You can find a variety of data types (the current majority is genomic and imaging data) in the GC that are both open and controlled access. Prior to requesting access though, consider searching and browsing the data via the GC Portal (no login required).
Nanomaterial Characterizations	caNanoLab	caNanoLab includes over 1,000 curated nanomaterials relevant in cancer with detailed characterizations and associated nanotechnology protocols and publications. You can perform web-based queries and download reports for re-use and additional analysis.
Networks	The Network Data Exchange (NDEx)	NDEx allows you to share, store, manipulate and publish biological network information. The project maintains a public NDEx server and is a joint effort of the UC San Diego School of Medicine and the Cytoscape Consortium. Access published networks aggregated from several pathway and interaction databases. Create your own networks to use, share, or publish.
Pediatric, Adolescent, and Young Adult (AYA)	Childhood Cancer Data Initiative (CCDI) Childhood Cancer Data Catalog (CCDC)	The CCDI CCDC is an inventory of pediatric oncology data resources. This includes childhood cancer repositories, registries, knowledge bases, and catalogs that either manage or refer to data. July 2025 Update: The Catalog now includes 13 new data sets from three existing resources: cBioPortal for Cancer Genomics, CCDI, and dbGaP. You’ll find updated counts for eight of the resources, as well as additional updates for the “Patient-Derived Xenograft and Advanced In Vivo Models” resource and the “WHO-International Agency for Research on Cancer” resource. Visit the catalog website for full details.
Pediatric, Adolescent, and Young Adult (AYA)	CCDI Hub Resources	CCDI Hub Explore Dashboard: This integrated tool provides you with the search functionality to connect participants with files and samples. It enables you to find data within a single study or across multiple studies, and create synthetic cohorts based on filtered metrics of interest (e.g., demographics, diagnosis, samples). March 2025 Update: The Hub now features a modified data model and enhanced tools for browsing and selecting cohorts. An updated “Explore Dashboard Participants” table offers customizable columns, a feature for creating cohorts, and a map of associated information from the Cancer Participant Index. New TARGET data sets are available, including Acute Lymphoblastic Leukemia (ALL) Pilot Phase 1 (PHS000463), ALL Expansion Phase 2 (PHS000464), and Acute Myeloid Leukemia (PHS000465). CCDI Molecular Target Platform (MTP): Use this tool to browse and identify associations between molecular targets, diseases, and drugs specific for childhood cancers. Childhood Cancer Clinical Data Commons (C3DC): This open-access web application allows you to find harmonized demographic and clinical data, create custom cohorts, and download data for local analysis. June 2025 Update: C3DC’s latest release features new data sets, updates to the CCDI Participant Index, improved cohort tools, and access to the interactive C3DC Data Model Navigator. Visit the C3DC website for the full release notes. CCDI Data Federation Resource: Search for de-identified individual-level data through the API, which provides an open access subset of the metadata and gives you the location of the complete data set. To determine data access, check the policies for the resource that submitted the data (currently includes the Kids First Data Resource Center, the Pediatric Cancer Data Commons, St. Jude Cloud, and the Treehouse Childhood Cancer Data Initiative).
Proteomics	Proteomic Data Commons (PDC)	You can get open access, highly curated, and standardized biospecimen, clinical, and proteomic data from PDC. You can analyze PDC data files using tools found in the CRDC Cloud Resources. Data sets include the following: NCI’s Clinical Proteomic Tumor Analysis Consortium (CPTAC), Children’s Brain Tumor Tissue Consortium (CBTTC), and International Cancer Proteogenomic Consortium (ICPC). April 2025 Update: PDC released additional CPTAC Pan-Cancer analysis data, including both proteome and phosphoproteome data from the “PTRC Triple-negative Breast Cancer Mitotic Vulnerability Study.”
Proteomics	The Clinical Proteomic Tumor Analysis Consortium (CPTAC)	CPTAC analyzes cancer biospecimens by mass spectrometry, characterizing and quantifying their constituent proteins, or proteome. The CPTAC Data Portal is the centralized repository for the dissemination of proteomic data collected by the Proteome Characterization Centers (PCCs).
Target Discovery	Cancer Target Discovery and Development (CTD²)	CTD² bridges the gap between the enormous volumes of data generated by genomic characterization studies and the ability to use these data for the development of human cancer therapeutics. It specializes in using computational and functional genomic approaches to translate next-generation sequencing data, and data from high-throughput and high content small molecule and genetic screens, into clinical applications. All data generated are open access. The CTD² Data Portal consists of raw and analyzed primary data.
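
If you save the table above as tab-separated text, you can group the data collections by category with a few lines of Python. The following is only a minimal sketch under stated assumptions: the `CATALOG_TSV` excerpt, the shortened descriptions, and the helper names `split_categories` and `group_by_category` are illustrative and are not part of any NCI tool or API. It assumes one row per line, and it treats "Pediatric, Adolescent, and Young Adult (AYA)" as a single category even though it contains commas.

```python
import csv
import io
from collections import defaultdict

# Hypothetical excerpt: four rows taken from the table above, with the
# descriptions shortened for brevity. Columns are tab-separated.
CATALOG_TSV = (
    "Category\tData Collection Name\tDescription\n"
    "Genomics\tGenomic Data Commons (GDC)\tUnified cancer genomics repository.\n"
    "Imaging\tImaging Data Commons (IDC)\tCloud-based cancer imaging repository.\n"
    "Clinical, Genomics\tPersonalized Cancer Therapy\tTherapy options by biomarker.\n"
    "Pediatric, Adolescent, and Young Adult (AYA)\tCCDI Hub Resources\tPediatric tools.\n"
)

# Category names that contain commas and must not be split apart.
COMMA_CATEGORIES = {"Pediatric, Adolescent, and Young Adult (AYA)"}


def split_categories(cell):
    """Return the list of categories named in one Category cell."""
    cell = cell.strip()
    if cell in COMMA_CATEGORIES:
        return [cell]
    return [part.strip() for part in cell.split(",")]


def group_by_category(tsv_text):
    """Map each category to the data collection names listed under it."""
    groups = defaultdict(list)
    reader = csv.DictReader(
        io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE
    )
    for row in reader:
        for category in split_categories(row["Category"]):
            groups[category].append(row["Data Collection Name"])
    return dict(groups)


catalog = group_by_category(CATALOG_TSV)
for category in sorted(catalog):
    print(category, "->", ", ".join(catalog[category]))
```

A row that lists two categories, such as "Clinical, Genomics", appears under both, so a collection can show up in more than one group.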

Updated: Jul 15, 2025
