Modeling Outcomes Using Surveillance Data and Scalable Artificial Intelligence for Cancer (MOSSAIC)

As part of the NCI-DOE Collaboration, the MOSSAIC project applies natural language processing (NLP) and deep learning algorithms to population-based cancer data from NCI's Surveillance, Epidemiology, and End Results (SEER) program.

The goals of the MOSSAIC project are to:

  • deliver advanced computational and informatics solutions needed to support a comprehensive, scalable, and cost-effective national cancer surveillance program.
  • lay the foundation for an integrative data-driven approach to modeling cancer outcomes at scale and in real time.


With this knowledge, scientists may better understand the impact of new diagnostics, treatments, and other factors affecting patient trajectories and outcomes. The MOSSAIC team is developing end-to-end capabilities, from scientific discovery to operationalization, as well as trustworthy, explainable, and secure artificial intelligence (AI) solutions that are extensible across a broad range of data sources.

MOSSAIC is co-led by: 

Aims of the Project

  • Create scalable NLP tools for deep text comprehension of unstructured clinical text to enable automated and accurate capture of reportable cancer surveillance data elements
  • Develop scalable graph, visual, and in-memory heterogeneous data analytics and inference methods for novel hypotheses generation to better understand how the exposome affects precision and population-level cancer outcomes
  • Build a data-driven modeling and simulation paradigm for predictive modeling of patient-specific health trajectories to enable large-scale in silico evaluation and recommendation of precision cancer therapies and prediction of their impact

You can find MOSSAIC resources (and previous “Pilot 3—Population Level Pilot” resources) on the NCI-DOE Collaboration AI/ML Resources page, on GitHub, and through the NCI Predictive Oncology Model and Data Clearinghouse.


