Cancer Data Science Pulse

Cancer Research in the Cloud: The New Normal?

Snapshot of creating cohorts on the ISB-CGC platform.

We have all been hearing about the benefits of cloud computing for a long time now. At the Institute for Systems Biology (ISB), we believe that 2016 is the year in which a cloud-based approach will truly become "the new normal" for cancer research. The benefits of cloud computing are a natural fit for academic and research environments. Working in the cloud lets you scale up and down rapidly as your computing and storage needs fluctuate, paying only for what you actually use. You can work from anywhere, and you can collaborate with colleagues around the world.

The ISB Cancer Genomics Cloud (ISB-CGC) is one of three NCI-funded pilot projects designed to democratize access to The Cancer Genome Atlas (TCGA) data. The ISB-CGC is being built by scientists and software engineers at ISB, SRA International (now CSRA), and Google. It sits on top of the Google Cloud Platform and leverages a wide range of cloud technologies and services that provide access to a large-scale data repository, the computational infrastructure, and the interactive exploratory tools needed to support and drive cancer genomics research.

The ISB-CGC serves the needs of a range of cancer researchers, including:

  • scientists or clinicians who prefer to use an interactive, web-based application to explore the rich TCGA data set
  • computational scientists who want to write their own custom scripts using languages such as R or Python and access the data through APIs
  • algorithm developers who wish to spin up thousands of virtual machines to analyze hundreds of terabytes of sequence data.

How can the ISB-CGC benefit researchers? One example is studying copy-number aberrations across the entire TCGA data set. Previously, this would have required downloading and parsing thousands of individual files before you could even begin asking the question you wanted to ask. Those data are now in an open-access BigQuery table and can be queried in seconds, using just a few lines of SQL.
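To give a sense of what "a few lines of SQL" might look like from a Python script, here is a minimal sketch. The table name and the column names (`chromosome`, `sample_barcode`, `segment_mean`) are hypothetical placeholders, not the actual ISB-CGC schema, and the 0.5 threshold is an arbitrary illustration. Running the query assumes the `google-cloud-bigquery` client library is installed and that you have Google Cloud credentials configured.

```python
# Hypothetical table name, for illustration only; the real ISB-CGC
# BigQuery tables and columns may be named differently.
HYPOTHETICAL_TABLE = "my-project.tcga_demo.copy_number_segments"


def build_cnv_query(table: str = HYPOTHETICAL_TABLE) -> str:
    """Return SQL that counts strongly aberrant segments per chromosome.

    The column names used here are assumptions made for this sketch.
    """
    return f"""
        SELECT chromosome,
               COUNT(*) AS n_segments,
               COUNT(DISTINCT sample_barcode) AS n_samples
        FROM `{table}`
        WHERE ABS(segment_mean) >= @threshold
        GROUP BY chromosome
        ORDER BY n_segments DESC
    """


def run_cnv_query(threshold: float = 0.5):
    """Run the query and return the result rows as a list.

    Requires the google-cloud-bigquery package and valid credentials.
    """
    # Imported lazily so the query builder can be used without the library.
    from google.cloud import bigquery

    client = bigquery.Client()
    job_config = bigquery.QueryJobConfig(
        query_parameters=[
            bigquery.ScalarQueryParameter("threshold", "FLOAT64", threshold)
        ]
    )
    query_job = client.query(build_cnv_query(), job_config=job_config)
    return list(query_job.result())


if __name__ == "__main__":
    # Print the SQL only; nothing is sent to BigQuery here.
    print(build_cnv_query())
```

The threshold is passed as a query parameter rather than pasted into the SQL string, so the same query text can be reused with different cutoffs. The point of the sketch is the shape of the workflow: one query against one table replaces the download-and-parse step described above.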

We think that moving cancer research to the cloud will occur in two stages.

  • In the first stage, existing tools will continue to be used on existing file formats: data files are directly accessible in the cloud instead of being downloaded from data repositories, and virtual machines can be configured with the same operating systems and tools as local, on-premise hardware.
  • In the second stage, both the tools and the data storage will become "cloud-aware," meaning that they will be optimized for the massively parallel cloud environment.

The ISB-CGC platform brings a mixture of these two stages to the cancer research community.

To learn more:

Requests for information and feedback are welcome! We are eager to interact with the cancer research community to better understand its needs so that we can continually improve our platform.

Ilya Shmulevich, Ph.D.
Professor at the Institute for Systems Biology, where he directs a Genome Data Analysis Center, part of The Cancer Genome Atlas (TCGA) project