Data Quality for LLMs: Building a Reliable Data Foundation
If you use large language models (LLMs) in your cancer research, register for this seminar to hear Elucidata’s Dr. Abhishek Jha discuss how data quality impacts LLM performance.
A reliable data foundation that is well annotated and accessible to an LLM plays a major role in the value of the model's results.
You’ll see examples of how LLM-powered artificial intelligence (AI) agents query across three versions of the same gene expression corpus with differing results, including:
- unstructured data from the public repository Gene Expression Omnibus.
- structured data from the Crowd Extracted Expression of Differential Signatures project (a tool developed by the Ma’ayan Lab at the Icahn School of Medicine at Mount Sinai).
- clean, linked, and harmonized data.
Dr. Jha will use these examples to discuss how differences in quality among these data sources affect LLM performance.
Dr. Jha is the co-founder and CEO of Elucidata. Previously, he was a senior scientist at Agios Pharmaceuticals, which has successfully brought three first-in-class drugs to market for patients with acute myeloid leukemia. His experience at Agios inspired him to build Elucidata. Elucidata’s mission is to help scientists save time on routine data and machine learning operations tasks so they can shift their focus to high-value research, which ultimately helps patients receive drugs sooner. Elucidata is building technology and solutions for R&D teams to better leverage data and reduce their drug development timelines. Dr. Jha is an alumnus of the Massachusetts Institute of Technology, the University of Chicago, and the Indian Institute of Technology Bombay.
Upcoming Events
- HLA-Arena: Enabling Structure-Guided Pipelines for Personalized Cancer Immunotherapy Design (April 30, 2025)
- AI-Driven Spatial Transcriptomics Unlocks Large-Scale Breast Cancer Biomarker Discovery from Histopathology (May 07, 2025)
- Co-clinical Imaging Research Resource Program (CIRP) Annual Virtual Meeting 2025—Celebrating A Ten-Year Milestone (May 07, 2025 - May 08, 2025)
- Agentic AI in Cancer Research (May 27, 2025)
- BTEC 2025 Annual Conference: The Epidemiology of Brain Metastases for Adult and Pediatric Brain Tumors (May 29, 2025)
- Ctrl+Alt+Cure: Driving Smarter Cancer Care (June 11, 2025)