Application of Genomics Big Data on the Cancer Cloud: Making Use of Difficult Data
The Institute for Systems Biology-Cancer Genomics Cloud (ISB-CGC), one of three NCI Cloud Resources, enables researchers to share large genomic data sets stored by the NCI Cancer Research Data Commons. These data sets are too large to process with conventional approaches, so a blend of computer science and biology is needed to access the data properly and run computations on it appropriately. The size of the data necessitates use of the cloud, along with custom processes to manage and queue computations, parallelize them, and reconstruct their results.
Dr. John Torcivia will present an application on the ISB-CGC in which whole genome sequencing (WGS) data were used to generate variant calls for downstream research. Custom processes were put in place to manage and queue the computations, parallelize them, and reconstruct the results correctly. The resulting open-source workflow can be adapted to other pipelines, and the WGS variant data are being made available to qualified researchers in the ISB-CGC.
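As a rough illustration of what "parallelize and reconstruct" can mean in practice, the sketch below shows a generic scatter-gather pattern in Python: split the genome into regions, process each region independently, then merge the per-region results back into genome order. It is not the workflow presented in the webinar. The chromosome names and lengths, the chunk size, and the `call_variants` placeholder are all hypothetical, and a real pipeline would dispatch each region to a cloud job running an actual variant caller rather than a local thread.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical chromosome lengths, for illustration only (not real genome sizes).
CHROM_LENGTHS = {"chrA": 2500, "chrB": 1000}


def split_into_regions(chrom_lengths, chunk_size):
    """Scatter step: cut each chromosome into half-open (chrom, start, end) chunks."""
    regions = []
    for chrom, length in chrom_lengths.items():
        for start in range(0, length, chunk_size):
            regions.append((chrom, start, min(start + chunk_size, length)))
    return regions


def call_variants(region):
    """Placeholder for a real variant caller run on a single region.

    It returns one dummy record at the region midpoint so the merge
    step has something to reassemble; it does no real analysis.
    """
    chrom, start, end = region
    return [(chrom, (start + end) // 2, "placeholder")]


def run_scatter_gather(chrom_lengths, chunk_size, max_workers=4):
    """Process regions in parallel, then merge results into genome order."""
    regions = split_into_regions(chrom_lengths, chunk_size)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() returns results in the order of `regions`,
        # regardless of which task finishes first.
        per_region = list(pool.map(call_variants, regions))

    # Gather step: flatten, then sort by chromosome order and position.
    chrom_order = {chrom: i for i, chrom in enumerate(chrom_lengths)}
    merged = [record for chunk in per_region for record in chunk]
    merged.sort(key=lambda rec: (chrom_order[rec[0]], rec[1]))
    return regions, merged


if __name__ == "__main__":
    regions, merged = run_scatter_gather(CHROM_LENGTHS, chunk_size=1000)
    print(len(regions), "regions processed")
    for record in merged:
        print(record)
```

The explicit sort in the gather step is the "reconstruct" part: even if a job scheduler returned regions out of order, the merged output would still follow chromosome order and position.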
This webinar is part of the Containers and Workflows Interest Group webinar series hosted by NCI CBIIT.
Dr. John Torcivia is the director of artificial intelligence deployment at Clarifai, Inc. and is affiliated with the Department of Biochemistry and Molecular Biology at George Washington University.