Cancer Research Data Commons Repositories
NCI's Cancer Research Data Commons (CRDC) repositories contain data that have been harmonized and stored in an analysis-ready format for the research community. The data are brought together within an infrastructure designed to support security, interoperability, and elastic compute capability.
The CRDC offers researchers, tool developers, clinicians, and, in the future, patients a consolidated network for accessing data, tools, and workflows across scientific domains such as genomics, proteomics, and imaging.
Each CRDC repository features a submission-and-curation process that harmonizes the data and applies standardized metadata to enable sharing and analysis. Each repository centers on a specific scientific domain. Additional analytic and visualization tools will be developed based on the needs and feedback of the scientific community.
Repositories will be populated with data generated by NCI-funded programs. In addition, as part of the Data Commons Framework, CRDC repositories will include data from commons being developed by other NIH institutes and organizations.
Current Status of Repositories in the CRDC
There are currently four accessible repositories within the CRDC:
Genomic Data Commons
The Genomic Data Commons (GDC) is a unified repository for genomic, clinical, and biospecimen data from cancer research programs. The GDC includes data from The Cancer Genome Atlas (TCGA), its pediatric equivalent, the Therapeutically Applicable Research to Generate Effective Treatments (TARGET) program, Foundation Medicine (FMI), the Cancer Cell Line Encyclopedia (CCLE), and a growing number of other sources.
Proteomic Data Commons
The Proteomic Data Commons (PDC) offers access to highly curated and standardized biospecimen, clinical, and proteomic data. The PDC includes data from the Clinical Proteomic Tumor Analysis Consortium (CPTAC) program and will grow to include other sources over time. The data are open access and can be browsed interactively using a series of filters or accessed programmatically through an API.
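As a minimal sketch of programmatic access, the Python snippet below wraps a GraphQL query in a standard JSON POST request. It assumes the PDC API is served over GraphQL at `https://pdc.cancer.gov/graphql`; the endpoint and the query fields (`exampleStudies`, `study_id`, `study_name`) are placeholders rather than the documented PDC schema, so check the PDC API documentation before adapting it. The request is built but not sent.

```python
import json
import urllib.request

# Assumption: PDC GraphQL endpoint; verify the current URL in the PDC API docs.
PDC_GRAPHQL_URL = "https://pdc.cancer.gov/graphql"

# Hypothetical query: field names are placeholders, not taken from the PDC schema.
EXAMPLE_QUERY = "{ exampleStudies { study_id study_name } }"


def build_pdc_request(query: str, url: str = PDC_GRAPHQL_URL) -> urllib.request.Request:
    """Wrap a GraphQL query in a JSON POST request (the request is not sent here)."""
    body = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    req = build_pdc_request(EXAMPLE_QUERY)
    print(req.get_method(), req.full_url)
    # Sending it requires network access and a valid query:
    # with urllib.request.urlopen(req) as resp:
    #     print(json.load(resp))
```

Only the standard library is used so the sketch runs without extra dependencies; a dedicated GraphQL client could add schema validation but is not required.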
Integrated Canine Data Commons
The Integrated Canine Data Commons (ICDC) includes genomic, proteomic, and imaging data from naturally occurring cancers in canine patients. Researchers can explore multiple types and collections of open-access data directly through the portal or through one of NCI's Cloud Resources.
Imaging Data Commons
The Imaging Data Commons (IDC) provides cloud-based access to a wide variety of medical imaging data and metadata from The Cancer Imaging Archive and other NCI projects. Its connections to a range of analytical tools allow researchers and data scientists to explore imaging data and train models without downloading the data.
Future Development of Repositories in the CRDC
Several repositories are currently planned or under development, including those related to immuno-oncology and epidemiological data.
Data available through the CRDC will come from many sources, including:
- NCI grant programs, such as the Human Tumor Atlas Network
- Third-party programs, such as the American Association for Cancer Research (AACR) Genomics Evidence Neoplasia Information Exchange (GENIE), the Applied Proteomics Organizational Learning and Outcomes (APOLLO) network, and the Multiple Myeloma Research Foundation
- NCI labs and intramural programs
The Cancer Data Service
The Cancer Data Service (CDS) is a repository for cancer research data from NCI-funded programs when those data do not meet the submission criteria for a specific CRDC repository.
This repository accommodates:
- Data that do not meet the minimum metadata standards for submission to another CRDC repository
- Studies that are on a waiting list for submission to a specific CRDC Repository (e.g., GDC)
- Studies that are in progress and need a place to store and analyze data during the acquisition phase
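The criteria above can be read as a simple routing rule for NCI-funded data. The sketch below is a hypothetical model, not an NCI tool: the `Submission` fields and the `choose_repository` function are illustrative names, and actual placement decisions are made through each repository's own submission and curation process.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Submission:
    """Hypothetical description of an NCI-funded dataset seeking a CRDC home."""
    meets_metadata_standards: bool           # minimum standards of the target repository
    target_repository: Optional[str] = None  # e.g. "GDC"; None if no domain repository fits
    waitlisted: bool = False                 # queued for submission to the target repository
    acquisition_in_progress: bool = False    # study still acquiring data


def choose_repository(s: Submission) -> str:
    """Return the domain repository if the submission is ready for it, else "CDS"."""
    ready_for_domain_repo = (
        s.target_repository is not None
        and s.meets_metadata_standards
        and not s.waitlisted
        and not s.acquisition_in_progress
    )
    return s.target_repository if ready_for_domain_repo else "CDS"


if __name__ == "__main__":
    print(choose_repository(Submission(True, "GDC")))                   # GDC
    print(choose_repository(Submission(False, "GDC")))                  # CDS
    print(choose_repository(Submission(True, "GDC", waitlisted=True)))  # CDS
```

Each bullet in the list maps to one condition that sends data to the CDS instead of the domain repository.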
Access to Data in CRDC Repositories
Although CRDC repositories are publicly accessible, some data are under controlled access and require prior approval for use, depending on the applicable Data Use Agreements.
Each repository will provide information about the data it hosts and instructions on how to gain access.
Researchers who wish to access controlled data must first obtain authorization through dbGaP (the database of Genotypes and Phenotypes).
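The access rule can be sketched as a small check: open-access data are available to anyone, while controlled data require an approved dbGaP request for the specific study. This is a hypothetical illustration; the names `AccessLevel` and `may_access` and the accession `phs000000` are made up, and real authorization is enforced by each repository and dbGaP, not by client code.

```python
from enum import Enum
from typing import AbstractSet


class AccessLevel(Enum):
    OPEN = "open"
    CONTROLLED = "controlled"


def may_access(level: AccessLevel, study_accession: str,
               approved_accessions: AbstractSet[str]) -> bool:
    """Hypothetical check: controlled data need a dbGaP approval for that study."""
    if level is AccessLevel.OPEN:
        return True
    return study_accession in approved_accessions


if __name__ == "__main__":
    approved = {"phs000000"}  # hypothetical accession, not a real study
    print(may_access(AccessLevel.OPEN, "phs999999", set()))           # True
    print(may_access(AccessLevel.CONTROLLED, "phs000000", approved))  # True
    print(may_access(AccessLevel.CONTROLLED, "phs999999", approved))  # False
```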