Center for Cancer Data Harmonization (CCDH)

NCI's Center for Cancer Data Harmonization (CCDH) aims to facilitate the harmonization of data available through NCI's Cancer Research Data Commons (CRDC). Harmonization becomes more important as new data sets are added to the CRDC: pooling data from numerous sources makes the data more powerful, but only if the data can be meaningfully connected. The standards promoted by CCDH will ensure that multi-modal cancer research data can be combined, analyzed, and turned into knowledge that is useful to the broader research community.

The objectives of the CCDH are to: 

  • Create a model for retrieving information across all CRDC data commons nodes and data coordinating centers, so that researchers can tap into repositories of genomic, proteomic, clinical, comparative oncology, and imaging data.
  • Offer online technical assistance on models, metadata, terminology, and supporting tools to help users submit, access, and analyze data within the CRDC.
  • Implement a web-based portal to aid in navigating a harmonized data model (CRDC-H), terminologies, and tools.
  • Create, adapt, and disseminate tools for use in data harmonization.
  • As needed, develop new terminology, metadata, mappings, and models to support data aggregation through the CRDC.

Working Groups

Collaboration is at the heart of CCDH's mission. In developing harmonization models, CCDH relies on frequent, ongoing feedback from a wide range of users and subject matter experts. Its working groups are outlined below.

Community Development

Lead: Samuel Volchenboum, M.D., Ph.D., M.S. (University of Chicago, Associate Professor of Pediatrics, Associate Chief Research Informatics Officer, Associate Director, Institute for Translational Medicine) 

Co-lead: Nicole Vasilevsky, Ph.D. (Oregon Health & Science University, Research Assistant Professor, Medical Informatics and Clinical Epidemiology, School of Medicine, Lead Biocurator, Oregon Clinical and Translational Research Institute) 

The CCDH serves a wide variety of communities, including:

  • bioinformaticians and data scientists 
  • biospecimen collectors/aggregators/contributors
  • clinical researchers

The CCDH uses workshops, online forums, issue tracking, and personal communications to collaborate with users. By involving members of the community, the CCDH can better develop models, tools, and services that speak directly to the needs of its users. 

Data Model Harmonization

Lead: Christopher Chute, M.D., Dr.P.H. (Johns Hopkins University, Bloomberg Distinguished Professor of Health Informatics, Professor of Medicine, Public Health and Nursing, Chief Research Information Officer, and Deputy Director, Institute for Clinical and Translational Research) 

Co-lead: Brian Furner, M.Sc. (University of Chicago, Director of Applications Development, Center for Research Informatics)

To help refine the CRDC, the CCDH performs landscape analyses of the data models already in use within repositories and of external models (including FHIR, the Biolink Model, and CDISC/BRIDG). Through this research, the CCDH can identify how to build on those models to harmonize the full range of CRDC data through the CRDC-H harmonized model.

Ontology and Terminology Ecosystem

Lead: Harold Solbrig, M.Sc. (Johns Hopkins University, Assistant Professor of Medicine, School of Medicine) 

The CCDH works to find new and better ways of accessing and navigating data by providing online access to relevant terminologies. It also supports terminology content development and extension as needed, and it provides algorithmic support for matching and mapping existing data to new data sets, in support of the varied communities seeking to use CRDC data.
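
As a rough illustration of what matching and mapping data values to a shared terminology can involve, the Python sketch below normalizes incoming values, looks them up against known labels and synonyms, and falls back to approximate string matching. It is not CCDH's actual tooling. The term identifiers (the "EX:" prefix), the labels, the `map_value` function, and the similarity cutoff are all hypothetical and chosen only for the example.

```python
from difflib import get_close_matches

# Hypothetical target terminology: identifier -> accepted labels and synonyms.
# The "EX:" identifiers are placeholders, not codes from a real terminology.
TARGET_TERMS = {
    "EX:0001": ["primary tumor", "primary neoplasm"],
    "EX:0002": ["metastatic tumor", "metastasis"],
    "EX:0003": ["normal tissue", "solid tissue normal"],
}


def normalize(value: str) -> str:
    """Lowercase, treat '_' and '-' as spaces, and collapse whitespace."""
    cleaned = value.lower().replace("_", " ").replace("-", " ")
    return " ".join(cleaned.split())


# Reverse index from each normalized label to its term identifier.
LABEL_INDEX = {
    normalize(label): term_id
    for term_id, labels in TARGET_TERMS.items()
    for label in labels
}


def map_value(value: str, cutoff: float = 0.85):
    """Return (term_id, method) for a submitted value.

    method is "exact", "fuzzy", or "unmapped"; term_id is None when unmapped.
    """
    key = normalize(value)
    if key in LABEL_INDEX:
        return LABEL_INDEX[key], "exact"
    # Approximate match for spelling variants; the cutoff is an assumption.
    close = get_close_matches(key, list(LABEL_INDEX), n=1, cutoff=cutoff)
    if close:
        return LABEL_INDEX[close[0]], "fuzzy"
    return None, "unmapped"


if __name__ == "__main__":
    for submitted in ["Primary_Tumor", "primary tumour", "Metastasis", "blood"]:
        print(submitted, "->", map_value(submitted))
```

In a sketch like this, values reported as "fuzzy" or "unmapped" would typically be routed to a curator for review rather than accepted automatically.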

Tools and Data Quality

Lead: James Balhoff (University of North Carolina, Senior Research Scientist, Renaissance Computing Institute [RENCI]) 

Building on the work from the "Data Model Harmonization" and "Ontology and Terminology Ecosystem" workstreams, the CCDH aims to develop new tools, or repurpose existing tools and models, to help harmonize data being submitted to the CRDC nodes and apply structured metadata to it.

Program Management

Lead: Melissa Haendel, Ph.D. (Oregon State University, Director, Translational Data Science, Linus Pauling Institute) 

Co-lead: Monica Munoz-Torres, Ph.D. (Oregon State University, Program Manager, Translational and Integrative Sciences Lab)

The Program Management and Operations workstream oversees and supports the team's efforts through project management and cross-workstream communication, coordination, and collaboration.

For more information on CRDC's data harmonization, to ask questions about the CCDH, or to schedule time to speak with one of the working group principal investigators, email the CCDH.