Big Data Meets Healthcare: The Case for Comparability and Consistency
The well-known phenomenon of "information explosion" has impacted virtually all areas of human enterprise, and healthcare is no exception. While one might quibble whether more information is actually being created, there is no disagreement that vastly more information is being electronically captured and stored. Latent within the proliferation of such machine-readable archives of information lie previously impractical metrics, capabilities for linkage and association, and ultimately new knowledge. The over-used moniker of "big data" is applied to the rise of vast, potentially federated data sources, analytic methods for their interpretation, and emergent findings. Despite this imprecision, most observers agree that there is something new and different emerging in the opportunistic mining of disparate data on an unprecedented scale.
Examples of impressive inferences from big data abound in finance, marketing, education, social sciences, and economics. More focused, "big science" opportunities are self-evident in astronomy, physics, and arguably the discovery of the Higgs boson (which really was inferred from perturbations observed across exabytes of experimental particle-accelerator data). In biology and medicine the sweet spot has historically been in the human genome, where genotype-phenotype associations emerge from "genome-wide association studies" done at massive scale, all the more so in the present era of whole-genome sequencing.
The promise of best-evidence discovery, comparative effectiveness research, new outcomes analyses, adverse event discovery, and improved clinical care in general that might emerge from big-data methods applied to large, federated, clinical data repositories is intriguing. There is "gold in them hills," and the potential benefits of well-conducted data mining must not be lightly dismissed.
However, caution must dominate otherwise unfettered analyses of clinical information, as skewed, biased, spurious, or otherwise "wrong" answers can have serious adverse impact. While most of us are quite content to have a target answer appear "on the page" of a Google search result, somehow having the right answer "on the list" but not chosen for healthcare interventions may be interpreted as malpractice in some litigious countries, not to mention likely sub-optimal outcomes for a patient. Clinical decision support resources may recommend a spectrum of options to a clinician, who presumably has the responsibility of synthesizing such advice and selecting the optimal path. Yet few would dispute that the amount of information and the complexity of its interactions long ago exceeded the unaided human capacity for cognition, reliable processing, or well-balanced interpretation.
Comparable and consistently represented clinical information, whether achieved at entry or through normalization to a canonical form, remains a necessary step before big-data methods can be meaningfully or safely applied to clinical data repositories.
Christopher Chute, M.D. received his undergraduate and medical training at Brown University, internal medicine residency at Dartmouth College, and doctoral training in Epidemiology at Harvard University. He is Board Certified in Internal Medicine and Clinical Informatics, and is a Fellow of the American College of Physicians, the American College of Epidemiology, and the American College of Medical Informatics. He is presently the Bloomberg Distinguished Professor of Health Informatics at Johns Hopkins University, as well as professor of Medicine, Public Health, and Nursing. Additionally, he is Chief Health Research Information Officer for Johns Hopkins Medicine. He also chairs the World Health Organization (WHO) ICD-11 Revision.
Dr. Chute became founding Chair of Biomedical Informatics at Mayo in 1988, and retired as Professor of Biomedical Informatics and Section Head in 2014. He was PI on Mayo’s CTSA Informatics core, the eMERGE cooperative agreement on genotype to phenotype association, the Pharmacogenomics Research Network Ontology Resource, the LexGrid projects, and co-PI on the National Center for Biomedical Ontology. Recent grants as PI include the HHS/Office of the National Coordinator (ONC) SHARP (Strategic Health IT Advanced Research Projects) on Secondary EHR Data Use and the ONC Beacon Community (Co-PI). Dr. Chute chaired Mayo’s Data Governance Committee and served on Mayo’s enterprise IT Oversight Committee and CTSA Executive Committee. Recently held external positions include Chair of the ISO Health Informatics Technical Committee (ISO TC215), index member of the Health Information Technology Standards Committee for the Office of the National Coordinator in the US DHHS, member of the HL7 Advisory Council, and initial Chair of the Biomedical Computing and Health Informatics study section at NIH.