“Democratize Data-Driven Biology by Tackling Incomplete Data, Unstructured Metadata, and Hidden Curricula”
While there is much enthusiasm about using omics and biomedical data collections to fuel research on complex traits and diseases, well-known fundamental challenges remain in using these data seamlessly and effectively. For instance, more than 1.5 million human gene expression profiles are publicly available. Still, depending on the technology or platform used to record each profile, a different subset of the genes in the genome is measured, leaving thousands of genes unmeasured in many of these profiles.
These gaps in data are major hurdles for integrative analysis. Critical problems also exist with data descriptions: most of the more than 2 million publicly available omics samples lack structured metadata, including information about the tissue of origin, disease status, and environmental conditions. Thus, discovering samples and datasets of interest is not straightforward.
In this seminar, Dr. Arjun Krishnan will present recent work from his group on developing machine learning approaches to address these fundamental challenges. He will also discuss the need for improving advanced research training in biological data analysis by formalizing concepts in statistical procedures, study design, data/code management, critically consuming data-driven findings, and reproducible research.
This webinar is part of the monthly NIH Data Sharing and Reuse Seminar Series hosted by the NIH Office of Data Science and Strategy.
Dr. Arjun Krishnan is an assistant professor in the Department of Computational Mathematics, Science, and Engineering and the Department of Biochemistry and Molecular Biology at Michigan State University.