Cancer Research Data Commons (CRDC)

About the CRDC

NCI’s Cancer Research Data Commons (CRDC) is a secure, cloud-based, data science infrastructure that accelerates cancer research by helping researchers like you submit, share, access, and analyze data cost-effectively.

CRDC makes more than 10 petabytes of data accessible from hundreds of NCI-funded programs and studies, as well as data from select external cancer research programs. For example, you can work with any of the following NCI landmark data sets:

  • Applied Proteogenomics OrganizationaL Learning and Outcomes Network (APOLLO)
  • Childhood Cancer Data Initiative (CCDI)
  • Clinical Proteomic Tumor Analysis Consortium (CPTAC)
  • Human Tumor Atlas Network (HTAN)
  • The Cancer Genome Atlas (TCGA)

Additionally, you can leverage CRDC’s cloud computation, analytics, and visualization tools to help make discoveries that accelerate progress to prevent, diagnose, and treat cancer.
 


CRDC has been online for 10 years!

Check out opportunities to hear from the CRDC team as they reflect on a decade of empowering the cancer research community.

CRDC Components

CRDC currently consists of six data commons, three cloud resources, and several infrastructure components.

Data Commons

Explore and seamlessly work with data.

Currently, you can access six data commons through the CRDC:

  • Genomic Data Commons (GDC)—includes DNA methylation and whole genome, whole exome, RNA-seq, miRNA-seq, and ATAC-seq data.
  • Proteomic Data Commons (PDC)—includes mass-spectrometry-based proteomic data.
  • Imaging Data Commons (IDC)—includes de-identified radiology and pathology data.
  • Integrated Canine Data Commons (ICDC)—includes genomic and clinical data from canine patients with spontaneously occurring cancer.
  • Cancer Data Service (CDS)—includes data that do not fit in other CRDC data commons.
  • Clinical and Translational Data Commons (CTDC)—includes clinical, biospecimen, and molecular characterization data from NCI-funded studies.

The PDC now includes over 160 proteomic data sets from global cancer research programs spanning 19 cancer types. With data sets, tools, and capabilities constantly added, the PDC is an integral part of the cancer research ecosystem. Learn more about the PDC and access its resources via the PDC portal.

Introducing the CTDC
NCI’s CRDC has recently launched the CTDC, providing a dedicated resource for storing NCI-funded clinical and translational data. Learn more about the resource!

Each data commons portal allows you to explore the files for a particular data set. For most data commons, you can also add the files you’d like to access to the cart and download a manifest. You can upload the manifest to one of the NCI Cloud Resources for further exploration.
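Before uploading a manifest to a cloud resource, it can help to check what it contains. The following is a minimal sketch, not an official CRDC tool. It assumes a tab-separated manifest with `id`, `filename`, `md5`, `size`, and `state` columns; manifests from some commons may use different column names. The sample rows and the `summarize_manifest` helper are hypothetical and exist only for illustration.

```python
import csv
import io

# Hypothetical manifest contents. A real manifest is downloaded from a
# data commons portal; the column names here are an assumption.
SAMPLE_MANIFEST = (
    "id\tfilename\tmd5\tsize\tstate\n"
    "aaaa-1111\tsample_a.bam\t0123456789abcdef0123456789abcdef\t1048576\treleased\n"
    "bbbb-2222\tsample_b.vcf.gz\tfedcba9876543210fedcba9876543210\t2048\treleased\n"
)


def summarize_manifest(handle):
    """Return (file_count, total_bytes, filenames) for a tab-separated manifest."""
    reader = csv.DictReader(handle, delimiter="\t")
    filenames = []
    total_bytes = 0
    for row in reader:
        filenames.append(row["filename"])
        total_bytes += int(row["size"])  # size is assumed to be in bytes
    return len(filenames), total_bytes, filenames


count, total, names = summarize_manifest(io.StringIO(SAMPLE_MANIFEST))
print(f"{count} files, {total} bytes")
```

To run the same check on a downloaded manifest, pass an open file handle, for example `open("manifest.txt", newline="")`, in place of the `io.StringIO` object.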

Cloud Resources

Analyze data with hundreds of publicly available analytical tools.

CRDC supports three NCI Cloud Resources, each with distinct capabilities and connections to the data sets:

  • Broad Institute FireCloud
  • ISB Cancer Gateway in the Cloud (ISB-CGC)
  • Seven Bridges Cancer Genomics Cloud (SB-CGC), powered by Velsera

Learn more about what you can do within each.

Infrastructure

Secure access and aggregate search with core standards and services.

Behind the scenes, several infrastructure teams ensure the CRDC data are secure, harmonized, and queryable:

  • Data Commons Framework (DCF): This team guides CRDC’s commitment to security, access, scalability, and accessibility.
  • Data Standards Services (DSS): This team enables semantic interoperability across the data commons.
  • Cancer Data Aggregator (CDA): This team provides a search engine to query data across the data commons.

Learn more about the CDA and how you can use it to query data.

NCI’s Role

NCI launched the CRDC in 2014 and continues to enhance it for cancer researchers and data scientists. In 2016, the CRDC became the key component supporting the Cancer Moonshot℠ Blue Ribbon Panel’s call for a national cancer data ecosystem that would allow researchers, clinicians, and even patients to share and analyze cancer data.

Cross-NCI Collaboration

NCI’s Center for Biomedical Informatics and Information Technology (CBIIT) leads the core strategy and infrastructure development for the CRDC. Specifically, CBIIT staff:

  • coordinate teams to determine scientific and data science strategy, which includes (1) developing the CRDC-harmonized standard data model that allows data to interoperate; (2) planning the creation of new, data-specific commons; and (3) implementing a streamlined data submission process that serves as a central intake point for new data, and working with CRDC users to improve functionality.
  • oversee the technical implementation of the commons, including compliance with security, privacy, and other requirements for federal IT systems.
  • advise data commons teams on adhering to NIH data sharing policies and FAIR (Findable, Accessible, Interoperable, and Reusable) data principles.
  • develop and manage the infrastructure and strategy for the IDC, CDS, and CTDC.

CBIIT also collaborates with several other NCI divisions, offices, and centers to lead data management and operations of specific data commons, including:

  • Center for Cancer Genomics – GDC
  • Division of Cancer Treatment and Diagnosis – PDC and ICDC

Cross-Community Collaboration

In addition to collaborating within and outside of NCI, CRDC partners with federal and international organizations interested in developing standards, secure connections, federation, and infrastructure for similar cloud-based data-sharing resources. Some of these projects include:

  • NIH Cloud Platforms Interoperability program (NCPI): CRDC is one of five partnership platforms developing and implementing technical standards to enable interoperability and facilitate a federated data ecosystem.
  • Advanced Research Projects Agency for Health (ARPA-H): CRDC provides the ARPA-H Biomedical Data Fabric toolbox with data, use cases, lessons learned, workflows, and expertise.
  • Global Alliance for Genomics and Health (GA4GH): CRDC serves as one of the driver projects piloting GA4GH standards.

Review other active CRDC collaborations.

Connecting the Cancer Community

If you’re interested in using this resource for your work, there are a few ways you can get started:

  • Explore CRDC data: You can access CRDC’s more than 10 petabytes of NCI- and NIH-funded research data from any of its data commons and NCI’s Cloud Resources.
  • Analyze CRDC data: You can dive deeper into the data using one of hundreds of publicly available analytical tools or upload your own in NCI’s Cloud Resources.
  • Get help with the CRDC: You can get support through a variety of CRDC resources, from training videos to weekly office hours.
  • Read the latest research: You can read and learn from the researchers who have published more than 100 articles about studies made possible by CRDC’s data and resources.

To stay up to date on opportunities for collaborating with and/or supporting the CRDC, subscribe to the weekly data science updates.
