Population Level Pilot: Population Information Integration, Analysis, and Modeling for Precision Surveillance

The population level pilot of the Joint Design of Advanced Computing Solutions for Cancer (JDACS4C) program applies natural language processing (NLP) and deep learning algorithms to cancer statistics collected by the NCI Surveillance, Epidemiology, and End Results (SEER) program.

The goal of this pilot is to transform cancer care by applying advanced computational capabilities to population-based cancer data. This will lead to a better understanding of how new diagnostics, treatments, and other factors affect patient outcomes.

Pilot Leads

With shared expertise across the JDACS4C collaboration, this pilot is jointly led by:

  • Dr. Lynne Penberthy, NCI, Division of Cancer Control and Population Sciences
  • Dr. Georgia Tourassi, Oak Ridge National Laboratory

Aims of the Pilot

  • Information capture from unstructured clinical text to improve the capacity of the NCI’s SEER program
  • Information integration and analysis of extreme-scale graph, visual, and in-memory data to understand the drivers of patterns in cancer outcomes and to predict clinical endpoints
  • Data-driven modeling of patient-specific and population-level health trajectories to guide precision cancer care

Current Progress

Since the launch of the pilot in 2016, the team has:

  • Developed, deployed, and refined pathology report annotation to produce critical training data and validate computational models
  • Established data use agreements to access SEER cancer registry data
  • Applied NLP tools to automatically extract and code key features from pathology reports
  • Established a process for obtaining real-time feedback from SEER state cancer registries through an application programming interface (API)

Future Development

Combined with the knowledge gained from the JDACS4C molecular- and cellular-level pilots, this pilot will offer:

  • Improved, real-time development of new, integrated sources of health and cancer surveillance information
  • Additional insights about the effect of real-world factors on patient health trajectories, eligibility for clinical trials, and prediction of cancer patient outcomes