Application of Genomics Big Data on the Cancer Cloud: Making Use of Difficult Data
The Institute for Systems Biology-Cancer Genomics Cloud (ISB-CGC), one of three NCI Cloud Resources, has enabled researchers to share large genomic data sets stored in the NCI Cancer Research Data Commons. These data sets are too large to handle with conventional, local computing, so accessing them and running computations on them correctly requires a blend of computer science and biology. Their size makes cloud computing necessary. It also calls for custom processes that manage and queue computations, parallelize the work, and reconstruct the results.
Dr. John Torcivia will present an application on the ISB-CGC in which whole genome sequencing (WGS) data were used to generate variant calls for downstream research. Custom processes were put in place to manage and queue the computations, run them in parallel, and reconstruct the results correctly. The resulting open source workflow can be adapted to other pipelines, and the WGS variant data are being made available to qualified researchers through the ISB-CGC.
This webinar is part of the Containers and Workflows Interest Group webinar series hosted by NCI CBIIT.
Dr. John Torcivia is the director of artificial intelligence deployment at Clarifai, Inc., and is part of the Department of Biochemistry and Molecular Biology at George Washington University.
- HLA Class II Across The Cancer Genome Atlas Cancer Dataset, October 27, 2021
- Cloud Computing Workshop: Cancer Genomics Cloud (CGC), October 28, 2021
- Cloud Computing Workshop: FireCloud, November 04, 2021
- NCI Childhood Cancer Data Initiative Annual Symposium 2021, November 09, 2021