Developing Scalable Bioinformatics Workflows on the Cancer Genomics Cloud
Dr. Jeffrey Grover will explain how NCI's Cancer Genomics Cloud (CGC) lets researchers develop efficient workflows using best practices such as version control and continuous integration, which merges code changes from multiple contributors into a single software project.
The CGC is a powerful tool for cancer researchers. It gives them access to many large NCI data sets across the Cancer Research Data Commons, including genomic, proteomic, and multi-species data, and lets them bring in their own data quickly and easily. The CGC also provides a curated library of more than 450 tools and workflows optimized for the cloud. The platform was designed to support the Common Workflow Language (CWL), a standard for describing computational data-analysis workflows in data-intensive fields such as bioinformatics, so that complex computational analyses can run on a variety of public or private cloud-based data sets. In addition to the hundreds of publicly available bioinformatics workflows in the CGC Public Apps Gallery, users can develop their own workflows in several ways: with an integrated graphical user interface, or with an ecosystem of tools that support local development and automated deployment of workflows to the CGC.
This webinar is part of the monthly Containers and Workflow Interest Group (CWIG) webinar series. CWIG brings together data scientists, bioinformaticians, computer scientists, and researchers to learn more about cloud computing and container technologies, workflows, and pipelines that could drive cancer data science.
The webinar series features presenters from across NIH, industry, and academia. Although the series focuses on cancer research, data science and cloud computing topics outside that area are also welcome. Over the past year, CWIG speakers have discussed:
- NIH cloud programs like the CGC, its fellow NCI Cloud Resources, and NIH STRIDES.
- Commercial cloud platforms for biomedical data storage and computing.
- Pipelines and tools for deep learning and various omics analyses.
This event is open to the public.
Dr. Grover is a bioinformatics scientist at Seven Bridges, a biomedical data company under contract with NCI to develop the CGC on its cloud platform. He holds a Ph.D. in molecular and cellular biology and has extensive experience in integrating results from multi-omics data analyses, in data visualization, and in bioinformatics workflow automation. Dr. Grover works to improve bioinformatics workflow offerings, provide technical expertise across public programs, and prototype internal technical solutions.