Nodes in the Cancer Research Data Commons
A node is a repository in the NCI Cancer Research Data Commons (CRDC) containing related data that have been harmonized and stored in a format that is ready for analysis by the research community. The data are brought together with infrastructure for security, interoperability, and elastic compute capability.
The CRDC is a network of nodes that researchers, tool developers, clinicians, and patients can use to access and contribute tools and data across scientific domains such as genomics, proteomics, and imaging.
Each CRDC node will have a submission and curation process that harmonizes the data and applies standardized metadata to enable sharing and analysis. Each node will be centered on a scientific domain, and its analytic and visualization tools will be developed based on the needs and feedback of the scientific community.
Nodes will be populated with data generated by NCI-funded programs. Through the Data Commons Framework, CRDC nodes will interoperate with data from commons being developed by other NIH institutes and organizations.
Current Status of Nodes in the CRDC
There are currently two accessible nodes within the CRDC.
The Genomic Data Commons (GDC), developed as a unified repository for cancer genomic data, is currently available. The GDC is populated with data from The Cancer Genome Atlas (TCGA); its pediatric equivalent, the Therapeutically Applicable Research to Generate Effective Treatments (TARGET) program; Foundation Medicine (FMI); the Cancer Cell Line Encyclopedia (CCLE); and a growing number of other sources.
The Proteomic Data Commons (PDC) launched a pilot node in October 2018. The PDC is populated with data from the Clinical Proteomic Tumor Analysis Consortium (CPTAC) program and will grow to include other sources over time. The data, including protein expression data, can be browsed interactively using a series of filters and accessed by API.
Future Development of Nodes in the CRDC
Several nodes are currently planned or under development, including nodes for imaging, canine, immuno-oncology, and epidemiological data.
Data available through the CRDC will come from many sources, including:
- NCI grant programs, such as the Human Tumor Atlas Network
- Third-party programs, such as the American Association for Cancer Research (AACR) Genomics Evidence Neoplasia Information Exchange (GENIE), the Applied Proteogenomics OrganizationaL Learning and Outcomes (APOLLO) network, and the Multiple Myeloma Research Foundation
- NCI labs and intramural programs
- The Cancer Imaging Archive (TCIA)
The Cancer Data Service
Many NCI-funded programs are generating types of cancer genomic data that the Genomic Data Commons does not currently accept, so NCI is establishing a Cancer Data Service to broaden sharing of cancer genomic data.
This node will accommodate data that:
- Do not fit the current data type criteria for GDC submission.
- Do not meet the minimum metadata standards for GDC submission.
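The two criteria above amount to a simple routing rule, which can be sketched in Python. This is only an illustration under stated assumptions: the accepted data types, the required metadata fields, and the names `Submission` and `route_submission` are hypothetical placeholders, not the actual GDC submission criteria or any CRDC interface.

```python
from dataclasses import dataclass, field

# Hypothetical placeholder values; the real GDC criteria are not listed in this text.
GDC_ACCEPTED_DATA_TYPES = {"wgs", "wxs", "rna-seq"}
GDC_REQUIRED_METADATA = {"case_id", "primary_site", "disease_type"}


@dataclass
class Submission:
    """A hypothetical data submission with a data type and metadata fields."""
    data_type: str
    metadata: dict = field(default_factory=dict)


def route_submission(sub: Submission) -> str:
    """Return "GDC" only if both criteria are met; otherwise return "CDS"."""
    fits_type = sub.data_type.lower() in GDC_ACCEPTED_DATA_TYPES
    # issubset() checks the dictionary keys against the required field names.
    has_metadata = GDC_REQUIRED_METADATA.issubset(sub.metadata)
    return "GDC" if fits_type and has_metadata else "CDS"
```

Under these assumptions, a submission that fails either check is directed to the Cancer Data Service, which mirrors the two bullet points above.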
Access to Data in CRDC Nodes
Data in the CRDC nodes will generally be publicly accessible, but some data will be controlled and will require approval for access, depending on the Data Use Agreements in place.
Each node will provide information about the data it hosts and instructions on how to gain access.
Each researcher who wishes to access controlled data must first obtain access through the database of Genotypes and Phenotypes (dbGaP).
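The access rule described in this section can be sketched as a small check: open data are available to anyone, while controlled data require an approval recorded for the individual researcher. The sketch below is a hypothetical model for illustration only; the `Researcher` and `Dataset` classes, the `can_access` function, and the study identifiers are assumptions, not part of any CRDC or dbGaP interface.

```python
from dataclasses import dataclass, field


@dataclass
class Researcher:
    """A hypothetical researcher with the set of studies approved via dbGaP."""
    name: str
    dbgap_approved_studies: set = field(default_factory=set)


@dataclass
class Dataset:
    """A hypothetical dataset, flagged as open or controlled."""
    study_id: str
    controlled: bool


def can_access(researcher: Researcher, dataset: Dataset) -> bool:
    """Open data need no approval; controlled data need a per-researcher approval."""
    if not dataset.controlled:
        return True
    return dataset.study_id in researcher.dbgap_approved_studies
```

For example, a researcher approved for a hypothetical study "study-A" could read that controlled dataset and any open dataset, but not a controlled dataset from "study-B".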