Application of Genomics Big Data on the Cancer Cloud: Making Use of Difficult Data
The Institute for Systems Biology-Cancer Genomics Cloud (ISB-CGC), one of three NCI Cloud Resources, has enabled researchers to share large genomic data sets stored by the NCI Cancer Research Data Commons. These data sets become too large to handle easily at scale, so a blend of computer science and biology is needed to access them properly and run computations on them appropriately. Their size necessitates use of the cloud, along with custom processes to manage and queue computations and to parallelize and reconstruct them.
Dr. John Torcivia will present an application on the ISB-CGC in which whole genome sequencing (WGS) data were used to generate variant calls for downstream research. Custom processes were put in place to manage and queue the computations and to parallelize and reconstruct them properly. The resulting open source workflow can be adapted to other pipelines, and the WGS variant data are being made available to qualified researchers in the ISB-CGC.
This webinar is part of the Containers and Workflows Interest Group webinar series hosted by NCI CBIIT.
Dr. John Torcivia is the director of artificial intelligence deployment at Clarifai, Inc. and is affiliated with the Department of Biochemistry and Molecular Biology at George Washington University.