NCI Cancer Research Data Commons
The vision for the Cancer Research Data Commons (CRDC) is a virtual, expandable infrastructure that provides secure access to many different data types across scientific domains. Users can analyze, share, and store results using the cloud's storage and elastic compute (the ability to scale resources easily). Combining diverse data types and performing cross-domain analysis of large datasets can lead to new discoveries in cancer prevention, treatment, and diagnosis, and supports the goals of precision medicine and the Cancer Moonshot℠.
The CRDC provides access to data from NCI programs such as The Cancer Genome Atlas (TCGA) and its pediatric counterpart, Therapeutically Applicable Research to Generate Effective Treatments (TARGET), and the Clinical Proteomic Tumor Analysis Consortium (CPTAC), through:
The CRDC is growing to include a wider range of data. The fundamental principles of the CRDC include:
- Build with the input and collaboration of the broad research community
- Build in an open and modular way to make components extendable and reusable
- To ensure broad interoperability, base the Data Commons on standards developed by coalitions, such as:
- Adhere to FAIR principles of data stewardship: Findable, Accessible, Interoperable, and Reusable
The Data Commons Framework describes the core principles and components on which the CRDC is being built.
Two developing infrastructure pieces will drive the interoperability and accessibility of data within the CRDC:
- The Center for Cancer Data Harmonization (CCDH): Working with representatives across the CRDC and its communities, the CCDH will develop resources to meet the needs of CRDC users, including a standard data model, CRDC-H, that harmonizes data across the CRDC nodes.
- The Cancer Data Aggregator (CDA): Acting like a search engine, the CDA will help researchers query data across the CRDC’s varied repositories. Using the CRDC-H data model, the CDA will aggregate different kinds of data into a harmonized data set to allow for easier integrative analysis. The CDA is currently in development and is targeted for launch in 2021.
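
The harmonize-then-aggregate idea behind these two pieces can be illustrated with a minimal Python sketch. It does not use the actual CRDC-H model or any CDA interface: the node names, field names, `HarmonizedSubject` class, and helper functions are all hypothetical, chosen only to show how records with different native schemas might be mapped onto one common shape and then searched with a single query.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HarmonizedSubject:
    """Hypothetical common record shape (a stand-in, not the real CRDC-H model)."""
    subject_id: str
    primary_site: str
    source_node: str


# Hypothetical mapping from each node's native field names to the common fields.
FIELD_MAPS = {
    "node_a": {"subject_id": "case_id", "primary_site": "primary_site"},
    "node_b": {"subject_id": "participant", "primary_site": "tumor_site"},
}


def harmonize(node, record):
    """Translate one node-specific record into the common shape."""
    mapping = FIELD_MAPS[node]
    return HarmonizedSubject(
        subject_id=str(record[mapping["subject_id"]]),
        # Normalize case and whitespace so values compare equal across nodes.
        primary_site=str(record[mapping["primary_site"]]).strip().lower(),
        source_node=node,
    )


def aggregate(records_by_node):
    """Combine records from every node into one harmonized list."""
    return [
        harmonize(node, record)
        for node, records in records_by_node.items()
        for record in records
    ]


def query(subjects, primary_site):
    """Return harmonized subjects whose primary site matches, regardless of node."""
    wanted = primary_site.strip().lower()
    return [s for s in subjects if s.primary_site == wanted]
```

With records from both hypothetical nodes passed to `aggregate`, a single call such as `query(subjects, "lung")` returns matches from either node, which is the kind of cross-repository search the CDA description suggests.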