Addressing Big Data Challenges in the ICGC Project
The Ontario Institute for Cancer Research (OICR) has been leading several large-scale international collaborations in cancer genomics with a focus on big data management and high-throughput computational analyses. These projects include the International Cancer Genome Consortium (ICGC), whose goal is to categorize the genomes of 25,000 tumors by 2018; the Pan-Cancer Analysis of Whole Genomes (PCAWG), which aims to uniformly analyze the whole genomes of over 2,800 ICGC patients; and the Cancer Genome Collaboratory, a newly built compute cloud that facilitates computational analyses on the ICGC data set, estimated to reach 5 PB by project completion. In this presentation, Junjun will describe how OICR addresses the big data challenges in these projects, and how OICR will leverage its established infrastructure, expertise and partnerships to tackle its next challenge: ICGC-ARGO, an international collaboration to catalog cancer genome alterations and link them to therapeutic outcomes in 100,000 patients.
Junjun Zhang leads the bioinformatics and data curation team, an integral part of the software development group at OICR. He has extensive experience in designing and building automated computational workflow systems and integrated biological databases, such as the Database of Genomic Variants (DGV), the International Cancer Genome Consortium (ICGC) data portal, and the NCI Genomic Data Commons (GDC) data portal. Prior to joining OICR in 2008, he worked as a bioinformatics developer at the Centre for Applied Genomics at Toronto's SickKids hospital, developing bioinformatics tools and algorithms for biological data management, assembling and annotating human genomes, and identifying genomic variants from large-scale microarray and NGS datasets.