NCI Cancer Research Data Commons
The NCI Cancer Research Data Commons (CRDC) is a cloud-based data science infrastructure that connects data sets with analytics tools so that users can share, integrate, analyze, and visualize cancer research data to drive scientific discovery. To learn more, visit datacommons.cancer.gov.
As a major component of the broader National Cancer Data Ecosystem, the CRDC provides access to data-type-specific repositories (genomic, proteomic, comparative oncology, imaging, and others) and to data from NCI programs such as The Cancer Genome Atlas (TCGA), its pediatric counterpart Therapeutically Applicable Research to Generate Effective Treatments (TARGET), and the Clinical Proteomic Tumor Analysis Consortium (CPTAC), through:
- Genomic Data Commons (GDC)
- Proteomic Data Commons (PDC)
- Integrated Canine Data Commons (ICDC)
- Imaging Data Commons (IDC)
- Cancer Data Service (CDS)
- NCI Cloud Resources
The CRDC is growing to include a wider range of data. The fundamental principles of the CRDC include:
- building with the input and collaboration of the broad research community.
- building in an open and modular way to make components extendable and reusable.
- ensuring broad interoperability by basing the Data Commons on standards developed by community coalitions.
- striving for FAIR principles of data stewardship: Findability, Accessibility, Interoperability, and Reusability.
The Data Commons Framework describes the core principles and components on which the CRDC has been built.
Two infrastructure components under development will drive the interoperability and accessibility of data within the CRDC:
- Data Standards Services (DSS): Working with representatives across the CRDC and its communities to develop CRDC data standards that enable interoperability across the CRDC data repositories.
- The Cancer Data Aggregator (CDA): Acting like a search engine, the CDA will help researchers query data across the CRDC's varied repositories. Using the CRDC-H data model, the CDA will aggregate different kinds of data into a harmonized data set, making integrative analysis easier. The CDA is currently in development and is targeted to launch in 2022.
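To make the aggregation idea concrete, here is a minimal Python sketch of harmonizing records from two repositories into one common model and then querying across them. It assumes two hypothetical repositories whose field names differ, and a hypothetical `HarmonizedRecord` type standing in for a model such as CRDC-H. None of the field names, record values, or functions below come from the real CDA, CRDC-H, GDC, or PDC; they are illustrative only.

```python
from dataclasses import dataclass

# Hypothetical records as two repositories might expose them.
# Field names are invented for illustration; they are NOT real GDC/PDC schemas.
GENOMIC_RECORDS = [
    {"case_id": "C1", "primary_site": "Breast", "file_type": "WXS"},
    {"case_id": "C2", "primary_site": "Lung", "file_type": "RNA-Seq"},
]
PROTEOMIC_RECORDS = [
    {"subject": "C1", "tissue": "breast", "assay": "TMT"},
    {"subject": "C3", "tissue": "lung", "assay": "label-free"},
]


@dataclass(frozen=True)
class HarmonizedRecord:
    """Hypothetical common model (a stand-in for something like CRDC-H)."""
    subject_id: str
    anatomic_site: str
    data_type: str
    source: str


def from_genomic(rec: dict) -> HarmonizedRecord:
    # Map the hypothetical genomic schema onto the common model,
    # normalizing the site name to lowercase.
    return HarmonizedRecord(rec["case_id"], rec["primary_site"].lower(),
                            rec["file_type"], "genomic")


def from_proteomic(rec: dict) -> HarmonizedRecord:
    # Map the hypothetical proteomic schema onto the same common model.
    return HarmonizedRecord(rec["subject"], rec["tissue"].lower(),
                            rec["assay"], "proteomic")


def harmonize() -> list[HarmonizedRecord]:
    """Translate every source record into the common model once, up front."""
    return ([from_genomic(r) for r in GENOMIC_RECORDS]
            + [from_proteomic(r) for r in PROTEOMIC_RECORDS])


def query(records: list[HarmonizedRecord], anatomic_site: str) -> list[HarmonizedRecord]:
    """Return records for a site, regardless of which repository they came from."""
    site = anatomic_site.lower()
    return [r for r in records if r.anatomic_site == site]


def subjects_with_both(records: list[HarmonizedRecord]) -> list[str]:
    """List subjects that have both genomic and proteomic data."""
    by_subject: dict[str, set[str]] = {}
    for r in records:
        by_subject.setdefault(r.subject_id, set()).add(r.source)
    return sorted(s for s, srcs in by_subject.items()
                  if {"genomic", "proteomic"} <= srcs)


if __name__ == "__main__":
    records = harmonize()
    for r in query(records, "Breast"):
        print(r.source, r.subject_id, r.data_type)
    print(subjects_with_both(records))
```

Running the script prints the two breast-site records (one from each hypothetical source) and then `['C1']`, the only subject present in both. The design choice sketched here is to translate each source into the common model once, so that queries and integrative steps only need to understand a single schema.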