NCI Cloud Resources
The NCI Cloud Resources are components of the NCI Cancer Research Data Commons that bring data and computational power together to enable cancer research and discovery.
These cloud-based platforms eliminate the need for researchers to download and store extremely large data sets. Instead of following the traditional process of bringing the data to the tools on local hardware, researchers bring their analysis tools to the data in the cloud. The Cloud Resources also provide on-demand computational capacity to analyze these data. Users can run best-practice tools and pipelines that are already implemented, or upload their own data and analysis methods to workspaces.
All three Cloud Resources provide support for data access through a web-based user interface in addition to programmatic access to analytic tools and workflows, and the capability of sharing results with collaborators. Each Cloud Resource is continually developing new functionality to improve the user experience and add new tools for researchers.
NCI Cloud Resources all provide access to:
- Genomic data from The Cancer Genome Atlas (TCGA) project and its pediatric equivalent, the Therapeutically Applicable Research to Generate Effective Therapies (TARGET) project
- Radiologic and pathology images from The Cancer Imaging Archive (TCIA)
- Proteomic data from the Clinical Proteomic Tumor Analysis Consortium (CPTAC)
The individual Cloud Resources also provide access to additional data sets including:
- Catalogue of Somatic Mutations in Cancer (COSMIC)
- Genomic Encyclopedia of DNA Elements (GENCODE)
- The Cancer Cell Line Encyclopedia (CCLE)
The NCI Cloud Resources are hosted by The Broad Institute, The Institute for Systems Biology, and Seven Bridges. Each Cloud Resource has developed a unique infrastructure with a variety of tools to access, explore, and analyze molecular data.
Available Cloud Resources
FireCloud is an open, standards-based platform for performing production-scale data analysis in the cloud. Built on the Google Cloud Platform, FireCloud empowers analysts, tool developers, and production managers to run large-scale analysis and to share results with collaborators. Users can upload their own analysis methods and data to workspaces or run the Broad’s best practice tools and pipelines.
The ISB Cancer Genomics Cloud, leveraging many aspects of the Google Cloud Platform (GCP), allows scientists to interactively define and compare cohorts, examine underlying molecular data for specific genes and pathways, and share insights with collaborators. For computational users, Application Programming Interfaces (APIs) and GCP resources such as BigQuery and the Google Pipeline service allow complex queries to be run from R or Python scripts, and Dockerized workflows to be run on data available in Google Cloud Storage.
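As a rough illustration of what a BigQuery query from a Python script might look like, the sketch below builds a parameterized query and runs it with the google-cloud-bigquery client library. The table name and the column names (case_barcode, disease_code) are hypothetical placeholders, not actual ISB Cancer Genomics Cloud tables, and the helper functions are invented for this example. Running the query itself assumes the google-cloud-bigquery package is installed and Google Cloud credentials are configured.

```python
import re

# Hypothetical table name; replace it with a real table you are authorized to query.
EXAMPLE_TABLE = "my-project.my_dataset.clinical"

# Accept only simple project.dataset.table identifiers, so that arbitrary SQL
# cannot be smuggled in through the table name.
_TABLE_RE = re.compile(r"^[A-Za-z0-9_\-]+\.[A-Za-z0-9_]+\.[A-Za-z0-9_]+$")


def build_cohort_query(table: str, limit: int = 10) -> str:
    """Return a standard-SQL query that selects cases for one disease code.

    The disease code is left as the named parameter @disease and is bound
    at execution time. Column names are assumptions made for illustration.
    """
    if not _TABLE_RE.match(table):
        raise ValueError(f"unexpected table identifier: {table!r}")
    if limit < 1:
        raise ValueError("limit must be at least 1")
    return (
        "SELECT case_barcode, disease_code\n"
        f"FROM `{table}`\n"
        "WHERE disease_code = @disease\n"
        f"LIMIT {int(limit)}"
    )


def run_cohort_query(table: str, disease: str, limit: int = 10) -> list:
    """Execute the query and return the rows as a list of dictionaries."""
    # Imported here so the query builder above works without the library.
    from google.cloud import bigquery

    client = bigquery.Client()
    job_config = bigquery.QueryJobConfig(
        query_parameters=[
            bigquery.ScalarQueryParameter("disease", "STRING", disease)
        ]
    )
    rows = client.query(build_cohort_query(table, limit), job_config=job_config).result()
    return [dict(row.items()) for row in rows]


if __name__ == "__main__":
    # Print the SQL only; no network access or credentials are needed for this.
    print(build_cohort_query(EXAMPLE_TABLE, limit=5))
```

Binding the disease code as a query parameter, rather than formatting it into the SQL string, keeps user-supplied values out of the query text.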
The Seven Bridges Cancer Genomics Cloud, hosted on Amazon Web Services, has a rich user interface that allows researchers to find data of interest and combine it with their own private data. Data can be analyzed using more than 200 preinstalled, curated bioinformatics tools and workflows. Researchers can also extend the functionality of the platform by adding their own data and tools via an intuitive software development kit.
Access the Cloud Resources
dbGaP access is required for each researcher who wishes to access controlled data. Learn more about how to access controlled data.
Evolving Cloud Resources
In late 2018, we initiated a project to provide Cloud Resource users direct access to a cloud-based instance of NCI’s Genomic Data Commons (GDC). As this transition continues, genomic data from the GDC may not be completely synchronized with data hosted on the Cloud Resources. This is because each platform downloads data on its own schedule, and because the GDC hosts a broader set of data than the Cloud Resources, such as archived data.
Synchronization issues will be resolved when the Cloud Resources begin to access GDC data through the cloud-based version of the GDC.
Both during and after the transition, the three Cloud Resources will continue to support data access through their web user interfaces and APIs, provide access to analytic tools and workflows, and enable the sharing of results with collaborators.