Scalable, Collaborative, Reproducible, and Extensible Analysis of TCGA Data in the Cloud
The Seven Bridges Cancer Genomics Cloud pilot is one of three pilot projects funded by the National Cancer Institute. The overarching goal of the project is to explore how co-locating large genomics datasets, such as The Cancer Genome Atlas (TCGA), with the dynamic compute infrastructure needed to analyze them can accelerate learning from these data and ultimately enable precision medicine.
In this seminar we’ll highlight four guiding principles that have driven the development of the Seven Bridges CGC:
Making data available isn’t enough to make it usable: We’ve built a dynamic query engine that enables fast search across more than 140 clinical and biospecimen properties, making it faster and easier to find relevant TCGA data. Importantly, data are immediately available for analysis at scale using both pre-defined and custom workflows.
The best science happens in teams: A fine-grained permissions model enables transparent collaboration in a secure and compliant manner.
Reproducibility shouldn’t be hard: Each analysis, including all parameters, files, and software versions, is fully logged and can be exactly replicated days or months later.
The impact of TCGA is amplified by new data and tools: Researchers can readily bring their own data and tools to analyze alongside TCGA data. Native implementation of the Common Workflow Language (CWL) specification enables portability of tools and workflows to and from other CWL-compliant systems.
The seminar will include a demo of the system and interested researchers can visit www.cancergenomicscloud.org to get involved.
Brandi Davis-Dusenbery is the Scientific Program Manager for the Seven Bridges CGC. She received her Ph.D. in Biochemistry from Tufts University and completed her postdoctoral studies at Harvard University. She is passionate about enabling the use of biomedical data to ultimately improve patient care.