Data Quality for LLMs: Building a Reliable Data Foundation
If you use large language models (LLMs) in your cancer research, register for this seminar to hear Elucidata’s Dr. Abhishek Jha discuss how data quality impacts LLM performance.
A reliable data foundation, one that is well annotated and accessible to an LLM, plays a major role in the value of the model's results.
You’ll see examples of LLM-powered artificial intelligence (AI) agents querying three versions of the same gene expression corpus, with differing results:
- unstructured data from the public repository Gene Expression Omnibus.
- structured data from the Crowd Extracted Expression of Differential Signatures project (a tool developed by the Ma’ayan Lab at the Icahn School of Medicine at Mount Sinai).
- clean, linked, and harmonized data.
Dr. Jha will use these examples to discuss how differences in quality among these data sources affect LLM performance.
Dr. Jha is the co-founder and CEO of Elucidata. Previously, he was a senior scientist at Agios Pharmaceuticals, which has brought three first-in-class drugs for patients with acute myeloid leukemia to market. His experience at Agios inspired him to build Elucidata. Elucidata’s mission is to help scientists save time on routine data and machine learning operations tasks so they can shift their focus to high-value research, which ultimately helps patients receive drugs sooner. Elucidata builds technology and solutions that help R&D teams better leverage data and shorten their drug development timelines. Dr. Jha is an alumnus of the Massachusetts Institute of Technology, the University of Chicago, and the Indian Institute of Technology Bombay.