“Democratize Data-Driven Biology by Tackling Incomplete Data, Unstructured Metadata, and Hidden Curricula”
While there is much enthusiasm about using omics and biomedical data collections to fuel research on complex traits and diseases, some well-known fundamental challenges still stand in the way of using these data seamlessly and effectively. For instance, more than 1.5 million human gene expression profiles are publicly available. However, different technologies and platforms measure different subsets of the genes in the genome, so many of these transcriptomes are missing measurements for thousands of genes.
These gaps in the data are major hurdles to integrative analysis. Critical problems also exist with data descriptions: the majority of the more than 2 million publicly available omics samples lack structured metadata, including information about tissue of origin, disease status, and environmental conditions. As a result, discovering samples and datasets of interest is not straightforward.
In this seminar, Dr. Arjun Krishnan will present recent work from his group on developing machine learning approaches to address these fundamental challenges. He will also discuss the need for improving advanced research training in biological data analysis by formalizing concepts in statistical procedures, study design, data/code management, critically consuming data-driven findings, and reproducible research.
This webinar is part of the monthly NIH Data Sharing and Reuse Seminar Series hosted by the NIH Office of Data Science and Strategy.
Dr. Arjun Krishnan is an assistant professor in the departments of Computational Mathematics, Science, and Engineering and Biochemistry and Molecular Biology at Michigan State University.