NCI Data Catalog
The NCI Data Catalog is a listing of data collections resulting from major NCI initiatives and other widely used data sets. Data collections in the catalog meet the following criteria:
- Products of NCI intramural researchers or major NCI initiatives, or regularly referenced NCI-funded extramural research data
- Available to all researchers and may be Open or Controlled Access (requiring approval by a Data Access Committee)
- Well documented and available for download
Although this is not a comprehensive listing of data sets available from NCI, you can expect quarterly updates.
Categories of data sets include:
- Biospecimen
- Cancer Screening Trial
- Clinical
- Drug Discovery
- Epidemiology
- Genomics
- Imaging
- Multiple
- Nanomaterial Characterizations
- Networks
- Pediatric, Adolescent, and Young Adult (AYA)
- Proteomics
- Target Discovery
Category | Data Collection Name | Description |
---|---|---|
Biospecimen | Biospecimen Research Database (BRD) | BRD is a free, publicly accessible literature database that contains peer-reviewed primary and review articles and Standard Operating Procedures (SOPs) in the field of human biospecimen science. You can find SOPs in a system with Biospecimen Evidence Based Practices (BEBP). |
Cancer Screening Trial | Cancer Data Access System (CDAS) | CDAS is a submission and tracking system for the National Lung Screening Trial (NLST) data and the Prostate, Lung, Colorectal and Ovarian (PLCO) Cancer Screening Trial data. |
Clinical | Clinical and Translational Data Commons (CTDC) | CTDC provides access to a vast array of clinical and translational data from NCI-funded clinical trials, correlative studies, and interventional studies. While some data sets are publicly accessible, others are in the registered-access or controlled-access tier. August 2024 Update: |
Clinical | NCTN/NCORP Data Archive | This centralized, controlled-access database is a repository of de-identified, patient level data from phase III clinical trials conducted by NCI’s National Clinical Trials Network (NCTN), NCI’s Community Oncology Research Program (NCORP), and the Canadian Cancer Trials Group (CCTG). After approval of a signed Data Use Agreement (DUA), you can download patient level clinical data sets and their associated data dictionaries. |
Clinical, Genomics | Personalized Cancer Therapy | The Personalized Cancer Therapy website is a tool for physicians and patients to assess potential therapy options based on specific tumor biomarkers. The focus is on the potential therapy strategies for tumors harboring certain genomic alterations regardless of disease site. The available data includes: |
Drug Discovery | CellMinerCDB: National Center for Advancing Translational Sciences (NCATS) | The NCI Center for Cancer Research developed this database that details how 2,600+ different compounds affect cancer cell growth. NIH NCATS tested 183 cancer cell lines, and you can find drug response data from that work. |
Drug Discovery | NCI Panel of 60 Human Tumor Cell Lines (NCI-60) | You can find and analyze NCI-60 data in CellMiner. NCI’s Developmental Therapeutics Program has used this panel of 60 diverse human cancer cell lines to screen over 100,000 chemical compounds and natural products since 1990. You can download gene expression data files from NCI-hosted FTP sites: |
Epidemiology | Surveillance, Epidemiology and End Results (SEER) database | SEER collects and publishes cancer incidence and survival data from population-based cancer registries covering approximately 50% of the U.S. population. The SEER database includes incidence and population data associated by: You can access SEER statistics and the Division of Cancer Control and Population Sciences (DCCPS) data resources on the DCCPS website. |
Epidemiology | SEER-CAHPS | The SEER-CAHPS data resource links data from NCI’s Surveillance, Epidemiology and End Results (SEER) cancer registry program, the Centers for Medicare & Medicaid Services’ (CMS) Medicare Consumer Assessment of Healthcare Providers and Systems (CAHPS®) patient experience surveys, and longitudinal Medicare claims data on utilization and costs of care for Fee-For-Service beneficiaries. You can request data access by emailing the required documents to NCISEERCAHPS@nih.gov. |
Epidemiology | SEER-MHOS | The SEER-MHOS database links data from NCI’s Surveillance, Epidemiology and End Results (SEER) cancer registry program and the Centers for Medicare & Medicaid Services (CMS) Medicare Health Outcomes Survey (MHOS) that provides information about the health-related quality of life (HRQOL) of Medicare Advantage Organization (MAO) enrollees. NCI and CMS sponsor the database. You can email the required documents to SEER-MHOS@hcqis.org to request access to the data. |
Genomics | All of Us Researcher Workbench | You can register to access data and tools including whole genome sequencing (WGS) and genome-wide genotyping data. The NIH-wide All of Us initiative collects this data. |
Genomics | NCI Genomic Data Sets Available in Database of Genotypes and Phenotypes (dbGaP) | dbGaP was developed to archive and distribute the data and results from studies that investigated the interaction of genotype and phenotype in humans. You can request controlled access to data from over 150 NCI studies by following the instructions. July 2024 Update: Request access to recently released data referenced in the study, “Childhood Cancer Data Initiative (CCDI): Single-Cell Atlas of NF1 Nerve Sheath Tumors.” The data includes 63 clinically annotated NF1-associated peripheral nerve sheath tumors. |
Genomics | Genomic Data Commons (GDC) | GDC provides the cancer research community with a unified data repository that enables data sharing across cancer genomic studies in support of precision medicine. The GDC supports several cancer genome programs at the NCI Center for Cancer Genomics (CCG) including: |
Genomics | Integrated Canine Data Commons (ICDC) | ICDC provides the cancer research community with data that enables a comparative analysis between human and canine cancers. You can explore the open access data within the ICDC portal, and you may analyze the associated data files in the Seven Bridges Cancer Genomics Cloud. |
Genomics | Cancer Genome Characterization Initiative (CGCI) | CGCI researchers develop and apply advanced sequencing and other genome-based methods to identify novel genetic abnormalities in both adult and pediatric cancers. The genetic profiles inform better cancer diagnosis and treatment. |
Genomics | Investigation of Serial Studies to Predict Your Therapeutic Response with Imaging and Molecular Analysis (I-SPY 1) | The I-SPY 1 trial sought to identify indicators of response to neoadjuvant chemotherapy that predict survival in women with high-risk breast cancer. |
Genomics | Molecular Targets for Cancer | Researchers have measured thousands of molecular targets in the NCI panel of 60 human tumor cell lines. You can search for or browse through a list of targets. |
Genomics | NCI Brain Neoplasia Data (Rembrandt Database) | NCI Brain Neoplasia Data (Rembrandt Database) integrates clinical and functional genomics data from clinical trials involving brain tumor patients and provides the ability to perform ad hoc querying, reporting and analysis across multiple data domains, including gene expression, gene copy number and clinical data. |
Genomics | TARGET: Therapeutically Applicable Research to Generate Effective Treatments | TARGET applies a comprehensive genomic approach to determine molecular changes that drive childhood cancers. The goal is to facilitate the discovery of therapeutic targets for childhood cancers and catalyze the translation of these discoveries into clinical applications. The TARGET data matrix includes: |
Genomics | The Cancer Genome Atlas (TCGA) | The Cancer Genome Atlas (TCGA) is a comprehensive effort to accelerate the understanding of the molecular basis of cancer through the application of genome analysis technologies. Through the TCGA Data Portal, you can search, download, and analyze data from over 30 different types of cancer. It contains: |
Genomics | The NCI Director’s Challenge Adenocarcinoma Lung Study | This is a large, multi-site, blinded training-testing validation study that characterizes the performance of several prognostic models based on gene expression for 442 lung adenocarcinomas. The study looked at whether microarray measurements of gene expression, either alone or combined with basic clinical covariates (stage, age, sex), predicted overall survival in lung cancer subjects. |
Imaging | Imaging Data Commons (IDC) | IDC provides the cancer research community with a cloud-based repository of cancer imaging data, image annotations, and analysis results. IDC uses the Digital Imaging and Communications in Medicine (DICOM) standard to harmonize the data. You can find imaging data from the following NIH/NCI projects: September 2024 Update: The repository now includes over 25,000 newly curated and harmonized pathology images. These include slides from the Cancer Moonshot Biobank and CCDI’s Molecular Characterization Initiative. |
Imaging | The Cancer Imaging Archive (TCIA) | TCIA is a curated archive of medical images that you can download. It includes data from the National Lung Screening Trial (NLST) and many subjects from The Cancer Genome Atlas (TCGA). You can find data divided into collections and grouped by common cancer types or research aims. You can also search these collections by modality, anatomic location, or various acquisition parameters. You can access pathology imaging, patient demographics/outcomes, expert-derived segmentations/annotations, genomics, and other available supporting data. |
Imaging | SLICE-3D | Use this data set, which contains more than 400,000 skin lesion image crops extracted from 3D total body photography (TBP), for skin cancer detection. Metadata entries include the following: |
Multiple | Cancer Data Service (CDS) | CDS provides you with data storage and sharing capabilities for NCI-funded studies that meet particular requirements. The CDS holds a variety of data types (currently, most are genomic and imaging data), both open and controlled access. Before requesting access, consider searching and browsing the data (no login required) via the CDS Portal. |
Nanomaterial Characterizations | caNanoLab | caNanoLab includes over 1,000 curated nanomaterials relevant to cancer, with detailed characterizations and associated nanotechnology protocols and publications. |
Networks | The Network Data Exchange (NDEx) | NDEx allows you to share, store, manipulate and publish biological network information. The project maintains a public NDEx server and is a joint effort of the UC San Diego School of Medicine and the Cytoscape Consortium. |
Pediatric, Adolescent, and Young Adult (AYA) | Childhood Cancer Data Initiative (CCDI) Childhood Cancer Data Catalog (CCDC) | The CCDI CCDC is an inventory of pediatric oncology data resources. This includes childhood cancer repositories, registries, knowledge bases, and catalogs that either manage or refer to data. September 2024 Update: The Catalog now includes 5 new data sets: Additionally, you may explore recently improved repository links, reference links, and resource URLs. Visit the catalog website for full details. |
Pediatric, Adolescent, and Young Adult (AYA) | CCDI Hub Resources | CCDI Hub Explore Dashboard: This integrated tool provides search functionality that connects participants with files and samples. It enables you to find data within a single study or across multiple studies, and to create synthetic cohorts based on filtered metrics of interest (e.g., demographics, diagnosis, samples). CCDI Molecular Target Platform (MTP): Use this tool to browse and identify associations between molecular targets, diseases, and drugs specific to childhood cancers. Childhood Cancer Clinical Data Commons (C3DC): This open-access web application allows you to find harmonized demographic and clinical data, create custom cohorts, and download data for local analysis. November 2024 Update: C3DC’s latest release includes expanded data sets, enhanced tools, and an improved user experience, making pediatric clinical data more accessible and valuable for your childhood cancer research. CCDI Data Federation Resource: Search for de-identified individual-level data through the API, which provides an open-access subset of the metadata and gives you the location of the complete data set. To determine data access, check the policies for the resource that submitted the data (currently includes the Kids First Data Resource Center, the Pediatric Cancer Data Commons, St. Jude Cloud, and the Treehouse Childhood Cancer Data Initiative). |
Proteomics | Proteomic Data Commons (PDC) | You can get open access, highly curated, and standardized biospecimen, clinical, and proteomic data from PDC. You can analyze PDC data files using tools found in the CRDC Cloud Resources. Data sets include the following: August 2024 Update: PDC released six new studies on non-clear cell renal cell carcinomas, plus a new metabolomics feature that includes mass spectrometry-based CPTAC metabolomic and lipidomic data. Visit the PDC for more details. |
Proteomics | The Clinical Proteomic Tumor Analysis Consortium (CPTAC) | CPTAC analyzes cancer biospecimens by mass spectrometry, characterizing and quantifying their constituent proteins, or proteome. The CPTAC Data Portal is the centralized repository for the dissemination of proteomic data collected by the Proteome Characterization Centers (PCCs). |
Target Discovery | Cancer Target Discovery and Development (CTD2) | CTD2 bridges the gap between the enormous volumes of data generated by genomic characterization studies and the ability to use these data for the development of human cancer therapeutics. It specializes in using computational and functional genomic approaches to translate next-generation sequencing data, and data from high-throughput and high content small molecule and genetic screens, into clinical applications. All data generated are open access. |