NCI Data Catalog
- Produced by NCI intramural researchers or major NCI initiatives, or regularly referenced NCI-funded extramural research data
- Available to all researchers and may be Open or Controlled Access (requiring approval by a Data Access Committee)
- Well documented and available for download
Although this is not a comprehensive listing of data sets available from NCI, the list will be updated quarterly.
Categories of data sets include:
- Cancer Screening Trial
- Drug Discovery
- Nanomaterial Characterizations
- Pediatric, Adolescent, and Young Adult (AYA)
- Target Discovery
|Biospecimen Research Database (BRD)
|BRD is a free, publicly accessible literature database that contains peer-reviewed primary and review articles and Standard Operating Procedures (SOPs) in the field of human biospecimen science.
Each literature curation captures the following relevant parameters:
SOPs are organized in a system with Biospecimen Evidence Based Practices (BEBP).
Cancer Screening Trial
|Cancer Data Access System (CDAS)
|CDAS is a submission and tracking system for the National Lung Screening Trial (NLST) data and the Prostate, Lung, Colorectal and Ovarian (PLCO) Cancer Screening Trial data.
The data includes a:
|NCTN/NCORP Data Archive
|This centralized, controlled-access database is a repository of de-identified, patient-level data from phase III clinical trials conducted by NCI's National Clinical Trials Network (NCTN), NCI's Community Oncology Research Program (NCORP), and the Canadian Cancer Trials Group (CCTG).
After approval of a signed Data Use Agreement (DUA), researchers can download patient-level clinical data sets and their associated data dictionaries.
|Personalized Cancer Therapy
|The Personalized Cancer Therapy website was developed as a tool for physicians and patients to assess potential therapy options based on specific tumor biomarkers. The focus is on the potential therapy strategies for tumors harboring certain genomic alterations regardless of disease site.
The available data includes:
|CellMinerCDB: National Center for Advancing Translational Sciences (NCATS)
Developed by CCR scientists, this database details how 2,600+ different compounds affect cancer cell growth. It includes drug response data from 183 cancer cell lines tested by NCATS.
NIH NCATS tackles ongoing challenges in disease research so that new treatments can reach all people faster. The center focuses on commonalities across diseases and develops solutions that reduce, remove, or bypass obstacles in the translational process.
|NCI Panel of 60 Human Tumor Cell Lines (NCI-60)
|NCI-60, available for analysis in CellMiner, is a panel of 60 diverse human cancer cell lines used by the NCI Developmental Therapeutics Program to screen over 100,000 chemical compounds and natural products since 1990.
Gene expression data files can be downloaded from an NCI-hosted FTP site:
|Surveillance, Epidemiology and End Results (SEER) database
|SEER collects and publishes cancer incidence and survival data from population-based cancer registries covering approximately 30 percent of the U.S. population.
The SEER database includes incidence and population data associated by:
SEER statistics and the Division of Cancer Control and Population Sciences (DCCPS) data resources can be accessed on the DCCPS website.
|SEER - CAHPS
|The SEER-CAHPS data resource links data from NCI's Surveillance, Epidemiology and End Results (SEER) cancer registry program, the Centers for Medicare & Medicaid Services' (CMS) Medicare Consumer Assessment of Healthcare Providers and Systems (CAHPS®) patient experience surveys, and longitudinal Medicare claims data on utilization and costs of care for Fee-For-Service beneficiaries.
To request access to the data, email the required documents to NCISEERCAHPS@nih.gov.
|SEER - MHOS
|The SEER-MHOS database links data from NCI’s Surveillance, Epidemiology and End Results (SEER) cancer registry program and the Centers for Medicare & Medicaid Services (CMS) Medicare Health Outcomes Survey (MHOS) that provides information about the health-related quality of life (HRQOL) of Medicare Advantage Organization (MAO) enrollees. The database is sponsored by the NCI and CMS.
To request access to the data, email the required documents to SEER-MHOS@hcqis.org.
|All of Us Researcher Workbench
This cloud-based platform gives registered users access to data and tools, including whole genome sequencing (WGS) and genome-wide genotyping data. Those data are collected as part of the NIH-wide All of Us initiative.
Early 2023 Update: Additions include 245,000+ WGS, 312,000+ arrays, CRAM files, and 1,000 long-read sequences.
|NCI Genomic Data Sets Available in Database of Genotypes and Phenotypes
|dbGaP was developed to archive and distribute the data and results from studies that have investigated the interaction of genotype and phenotype in humans. Over 150 NCI studies are now registered in dbGaP. To request controlled access data in dbGaP, view our instructions on the Access Genomic Data page.
|Genomic Data Commons (GDC)
|GDC provides the cancer research community with a unified data repository that enables data sharing across cancer genomic studies in support of precision medicine. The GDC supports several cancer genome programs at the NCI Center for Cancer Genomics (CCG) including:
|Integrated Canine Data Commons (ICDC)
ICDC provides the cancer research community with data that enables a comparative analysis between human and canine cancers. You can explore the open-access data within the ICDC portal, and you may analyze the associated data files in the Seven Bridges Cancer Genomics Cloud.
May 2023 Update:
ICDC now includes a new data set that links specific DNA methylation patterns to key transcriptional programs in both human and canine osteosarcoma. This study includes 44 added cases and 88 files. Review this ICDC article for full details.
|Cancer Genome Characterization Initiative (CGCI)
|CGCI researchers develop and apply advanced sequencing and other genome-based methods to identify novel genetic abnormalities in both adult and pediatric cancers. The genetic profiles are used to inform better cancer diagnosis and treatment.
|Investigation of Serial Studies to Predict Your Therapeutic Response with Imaging and Molecular Analysis, I-SPY1
|The I-SPY 1 TRIAL sought to identify indicators of response to neoadjuvant chemotherapy that predict survival in women with high-risk breast cancer.
|Molecular Targets for Cancer
|Thousands of molecular targets have been measured in the NCI panel of 60 human tumor cell lines. You can search for a target of interest or you may browse through a list of targets.
|NCI Brain Neoplasia Data (Rembrandt Database)
|NCI Brain Neoplasia Data (Rembrandt Database) integrates clinical and functional genomics data from clinical trials involving brain tumor patients and provides the ability to perform ad hoc querying, reporting and analysis across multiple data domains, including gene expression, gene copy number and clinical data.
|TARGET: Therapeutically Applicable Research to Generate Effective Treatments
|TARGET applies a comprehensive genomic approach to determine molecular changes that drive childhood cancers. The goal is to facilitate the discovery of therapeutic targets for childhood cancers and catalyze the translation of these discoveries into clinical applications.
The TARGET data matrix includes:
|The Cancer Genome Atlas (TCGA)
|The Cancer Genome Atlas (TCGA) is a comprehensive effort to accelerate the understanding of the molecular basis of cancer through the application of genome analysis technologies. The TCGA Data Portal provides a platform for researchers to search, download, and analyze data from over 30 different types of cancer. It contains:
|The NCI Director's Challenge Adenocarcinoma Lung Study
|A large, multi-site, blinded training-testing validation study to characterize the performance of several prognostic models based on gene expression for 442 lung adenocarcinomas. The proposed hypotheses examined whether microarray measurements of gene expression, either alone or combined with basic clinical covariates (stage, age, sex), could be used to predict overall survival in lung cancer subjects.
|Imaging Data Commons (IDC)
IDC provides the cancer research community with a cloud-based repository of cancer imaging data, image annotations, and analysis results. Data is harmonized using the Digital Imaging and Communications in Medicine (DICOM) standard. You can find imaging data from the following NIH/NCI projects:
|The Cancer Imaging Archive (TCIA)
TCIA is a curated archive of medical images accessible for public download and includes the data from the National Lung Screening Trial (NLST) and many subjects from The Cancer Genome Atlas (TCGA). Data are divided into collections grouped by common cancer types or research aims. Users can also search these collections by modality, anatomic location, or various acquisition parameters. Pathology imaging, patient demographics/outcomes, expert-derived segmentations/annotations, genomics, and other supporting data are also provided where available.
June 2023 Update: The Brain-TR-GammaKnife data set has been added, containing brain cancer MRIs with companion GammaKnife treatment planning and clinical data. Visit the TCIA for more details.
|Cancer Data Service (CDS)
|CDS provides the cancer research community with data storage and sharing capabilities for NCI-funded studies that meet particular requirements. The CDS holds a variety of data types (currently mostly genomic and imaging data), both open and controlled access. Before requesting access, consider searching and browsing the data (no login required) via the CDS Portal.
|caNanoLab includes over 1,000 curated nanomaterials relevant to cancer, with detailed characterizations and associated nanotechnology protocols and publications.
|The Network Data Exchange
|NDEx allows researchers to share, store, manipulate and publish biological network information. The project maintains a public NDEx server and is a joint effort of the UC San Diego School of Medicine and the Cytoscape Consortium.
Pediatric, Adolescent, and Young Adult (AYA)
|CCDI Childhood Cancer Data Catalog (CCDC)
The Childhood Cancer Data Initiative’s (CCDI’s) CCDC is an inventory of pediatric oncology data resources. This includes childhood cancer repositories, registries, knowledge bases, and catalogs that either manage or refer to data.
February 2024 Update: The National Clinical Trials Network (NCTN) Navigator and the Cancer Epidemiology Descriptive Cohort Database (CEDCD) are available in the CCDC. The NCTN Navigator allows investigators to query biospecimens and to request biospecimens to validate exploratory correlative analysis hypotheses. The CEDCD is a searchable database containing information about cohort studies that follow groups of persons over time for cancer incidence, mortality, and other health outcomes. Several existing and new data sets are up to date. Visit the catalog website for full details.
Pediatric, Adolescent, and Young Adult (AYA)
|CCDI Molecular Targets Platform (MTP)
|The CCDI MTP allows researchers to browse and identify associations between molecular targets, diseases, and drugs specific for childhood cancers.
|Proteomic Data Commons (PDC)
|PDC provides the cancer research community with open access, highly curated, and standardized biospecimen, clinical, and proteomic data. PDC data files are available for analysis in the CRDC Cloud Resources, and the resource also offers analysis tools. Data sets include:
|The Clinical Proteomic Tumor Analysis Consortium (CPTAC)
|CPTAC analyzes cancer biospecimens by mass spectrometry, characterizing and quantifying their constituent proteins, or proteome.
The CPTAC Data Portal is the centralized repository for the dissemination of proteomic data collected by the Proteome Characterization Centers (PCCs).
|Cancer Target Discovery and Development (CTD2)
|CTD2 bridges the gap between the enormous volumes of data generated by genomic characterization studies and the ability to use these data for the development of human cancer therapeutics. It specializes in using computational and functional genomic approaches to translate next-generation sequencing data, and data from high-throughput and high-content small-molecule and genetic screens, into clinical applications. All data generated are open access.