Cancer Data Science Pulse

Cancer Data and Computation in the Cloud: One Path to Affordable Genomics Research

The cost of DNA sequencing has dropped more than one million-fold over the last decade, making it increasingly possible to discover the genetic basis of cancer and response to treatment. Three challenges, however, impede this goal:

  1. Analysts lack the resources to download, store, and compute on the data;
  2. Existing tools were not designed to scale to petabytes or exabytes of data; and
  3. Sharing and collaboration are hindered by the current model of storing data locally.

Large-scale sequencing efforts such as The Cancer Genome Atlas (TCGA) have begun to elucidate the genetic pathogenesis of cancer, enabling the development of targeted therapies. To enter an era of true precision medicine, however, we need sophisticated information technologies to store, analyze, and share data. FireCloud and other cloud-based analysis platforms offer a solution.

FireCloud, one of three NCI-funded Cancer Genomics Cloud (CGC) Pilots, democratizes data access and facilitates collaboration by providing a robust, scalable platform accessible to the public. Cloud-based analysis platforms like FireCloud provide elastic compute capacity that will enable the cancer research community to perform powerful analyses and facilitate the discovery of new biological findings.

Much of the cost of genomic research lies in the massive computational resources and huge amounts of storage it requires. Some large institutions have the resources to build and fund such an infrastructure, but many do not. FireCloud and the other two pilots, Seven Bridges Genomics and the Institute for Systems Biology, co-locate the data and the computational power in the cloud, so researchers can access both and perform analyses from anywhere they have an internet connection. Eliminating the need for redundant, costly infrastructures drastically reduces the cost of genomics research. While using the Cloud Pilots will still carry some cost, that cost is within reach of a much wider variety of institutions and scientists, democratizing access to the data and the ability to compute on it.

Moving forward, the plan is for FireCloud and the other NCI CGC Pilots to support increasingly large datasets in the cloud, so that users will not need to download and store their own data locally. Currently, FireCloud provides curated TCGA data and will soon include data from the Cancer Cell Line Encyclopedia (CCLE) project, the Cancer Genome Characterization Initiative (CGCI), the Therapeutically Applicable Research To Generate Effective Treatments (TARGET) initiative, and the Genotype-Tissue Expression (GTEx) project.

 

Figure: Cloud bubble diagram of FireCloud for users. Within the cloud, your TCGA data, tools, and workflows flow into a user or team workspace. The workspace (1) securely tracks and manages data, metadata, tools, job execution, and results, and (2) captures provenance for each run (method versions, timestamps, input and output files). The FireCloud platform (1) democratizes access to data and tools, (2) facilitates collaboration, (3) offers elastic compute capacity, and (4) empowers users to perform analysis.

Using the scalable cloud-compute infrastructure of FireCloud, my lab hopes to leverage these large datasets to obtain sufficient power to significantly enhance our understanding of driver genes and pathways, biomarkers associated with clinical outcome, molecular subtypes of cancer, mutational processes, and germline risk alleles.

I hope that other cancer genomics labs will explore how cloud-based analysis platforms like FireCloud can drive breakthroughs in their own research, and that the elastic compute capacity of FireCloud will provide a much more affordable alternative to a lab's internal computing capabilities. Since much of the data will be housed in open-access cloud buckets, researchers will not have to worry about downloading and storing data, and can thus focus more on the science and the discoveries that lead to new cancer treatments.

To learn more:

You can also provide feedback and ask questions in the FireCloud Forum. We are still actively developing FireCloud and would like to hear from you.

In addition, please use the FireCloud Forum to let us know if you are interested in contributing your own tools and pipelines. Let's work together to build a better system!

Gad Getz, Ph.D.
Broad Institute / MGH