NCI-DOE Collaboration Capabilities

The NCI-Department of Energy (DOE) partnership enables future research by making computational tools, algorithms, data sets, and other capabilities resulting from this collaboration available to the broader research community. NCI has established a Capability Transfer Team to help researchers understand, access, implement, and extend the capabilities offered under this initiative. To explore ways you can collaborate, contact our team at NCI-DOECapabilities@nih.gov.

The Joint Design of Advanced Computing Solutions for Cancer (JDACS4C) program is a focal point of the strategic, interagency collaboration between NCI and DOE to simultaneously accelerate developments in precision oncology and advanced scientific computing.

Based on a multidisciplinary team science approach, JDACS4C’s three research pilots were co-designed by NCI and DOE, align with several existing NCI and DOE programs, and are jointly led by NCI- and DOE-supported scientists. These teams include scientists from NCI and Frederick National Laboratory for Cancer Research (FNLCR), and experts from DOE national laboratories, principally Argonne, Lawrence Livermore, Los Alamos, and Oak Ridge.

Below is a list of NCI-DOE Collaboration capabilities available for public use. This list will be updated as new capabilities are released. For those capabilities not yet transferred, the original location is provided and will be updated with the final location when available. To submit a new capability, please fill out the Capability Sharing and Publication Tracking Form.

A full listing of the latest publications is also available. To share a new publication, please fill out the Publications and Outreach Tracking Form.

 

Capabilities as of April 16, 2021:

Key:
  • Finalized: software, data set, or model has been completed by a research group and, in their opinion, is ready to be seen by the public.
  • Transferred: software, data set, or model has been moved from the research group site to a public site by the Capability Transfer Team of FNLCR, defined as Mirrored, Validated, and Released.
  • Enhanced: software, data set, or model that has been transferred and has additional components that 1) increase visibility and activity, 2) allow it to be adapted for broader use, or 3) enable it to be used by the extramural community.

 

Pilot 1 Predictive Modeling for Pre-Clinical Screening

The Predictive Modeling for Pre-Clinical Screening Pilot aims to develop predictive capabilities of drug responses in pre-clinical models of cancer to improve and expedite the selection and development of new targeted therapies for patients with cancer.

ANS: Autoencoder Node Saliency
  • Status: Finalized
  • Type: Software
  • Description: Explains the unsupervised learning process in autoencoders; measures the distribution of the latent representations generated by an autoencoder; ranks and identifies specialty nodes that separate two given classes.
  • Impact: Allows users to understand the importance of neural network nodes in autoencoders.

Combo: Combination drug response predictor
  • Status: Finalized
  • Type: Model (untrained*)
  • Description: Predicts responses to drug combinations under different experimental configurations.
  • Impact: Enables predictions of drug responses under different experimental configurations.

NT3: Normal-tumor pair classifier
  • Status: Transferred
  • Type: Model (trained**)
  • Description: Classifies tumor type; augments existing data quality control methods.
  • Impact: Offers a 1D-convolutional network for classifying RNA-seq gene expression profiles into normal or tumor tissue categories.

P1B1: Gene expression autoencoder
  • Status: Finalized
  • Type: Model (untrained*)
  • Description: Given a sample of gene expression data, builds a sparse autoencoder that can compress the expression profile into a low-dimensional vector.
  • Impact: Offers an autoencoder to collapse high-dimensional expression profiles into low-dimensional vectors without significant loss of information.

P1B2: Mutation classifier
  • Status: Finalized
  • Type: Model (untrained*)
  • Description: Given patient somatic SNPs, builds a deep learning network that can classify the cancer type.
  • Impact: Offers a means for classifying sparse data.

P1B3: Single Drug Response Predictor
  • Status: Finalized
  • Type: Model (untrained*)
  • Description: Given drug screening results on NCI60 cell lines, builds a deep learning network that can predict the growth percentage from cell line gene expression data, drug concentration, and drug descriptors.
  • Impact: Enables prediction of growth percentage of a cell line treated with a new drug.

TC1: Tissue type classifier
  • Status: Transferred
  • Type: Model (trained**)
  • Description: Classifies tumor type based on sequence data.
  • Impact: Augments existing data quality control methods.

Uno: Unified drug response predictor
  • Status: Finalized
  • Type: Model (untrained*)
  • Description: Predicts tumor dose response across multiple data sources.
  • Impact: Enables predictions of drug responses under different experimental configurations.

*No coefficients (parameter values) established. Trained models will be added as they become available.

**Trained model is defined by combining untrained model + data + weights.
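
To give a rough sense of what a 1D-convolutional layer, as mentioned for NT3, computes on an expression profile, the sketch below slides a small filter along a toy vector and applies a ReLU. The values, the filter, and the function name `conv1d_relu` are illustrative assumptions; this is not the NT3 architecture or its code.

```python
from typing import List


def conv1d_relu(signal: List[float], kernel: List[float], bias: float = 0.0) -> List[float]:
    """Slide `kernel` along `signal` (no padding, stride 1) and apply ReLU."""
    if len(kernel) > len(signal):
        raise ValueError("kernel must not be longer than the signal")
    width = len(kernel)
    out = []
    for start in range(len(signal) - width + 1):
        window = signal[start:start + width]
        # Weighted sum of the window, then clip negatives to zero (ReLU).
        activation = sum(w * x for w, x in zip(kernel, window)) + bias
        out.append(max(0.0, activation))
    return out


# Toy "expression profile" (made-up values, not real RNA-seq data).
profile = [0.0, 1.0, 3.0, 1.0, 0.0, 2.0]
# Hypothetical filter that responds to a local peak.
peak_filter = [-1.0, 2.0, -1.0]

feature_map = conv1d_relu(profile, peak_filter)
print(feature_map)  # [0.0, 4.0, 0.0, 0.0]
```

Only the window centered on the peak at value 3.0 produces a nonzero response. A trained network learns many such filters from data rather than using a hand-picked one.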

Pilot 2 Improving Outcomes for RAS-Related Cancers

Improving Outcomes for RAS-related Cancers aims to deliver a validated multiscale model of RAS biology on a cell membrane by combining the experimental capabilities at the FNLCR with the computational resources of the National Nuclear Security Administration (NNSA), a semi-autonomous DOE agency. The principal challenge in modeling this system is the diverse length and time scales involved.

DynIm
  • Status: Transferred
  • Type: Software
  • Description: This is the first tool to perform “dynamic” sampling where the input distribution can change over time and the sampling adapts itself to the new distribution.
  • Impact: Enables machine learning-based adaptive multiscale simulations for cancer biology.

MemSurfer
  • Status: Finalized
  • Type: Software
  • Description: Computes and analyzes membrane surfaces found in a wide variety of large-scale molecular simulations. MemSurfer works independently of the type of simulation, directly on the 3D point coordinates.
  • Impact: Enables assessment of lipid membrane curvature and density; allows computation of normals and area per lipid. Also provides a simple-to-use Python API to perform other types of analysis.

Crystal structure of KRAS bound with RAF1 RBDCRD
  • Status: Transferred
  • Type: Data Set
  • Description: Crystal structures of wild-type and oncogenic mutants of KRAS complexed with the RAS-binding domain (RBD) and the membrane-interacting cysteine-rich domain (CRD) from the N-terminal regulatory region of RAF1. Three structures related to Pilot 2 are listed: 6XI7, 6XHB, 6VJJ.
  • Impact: This novel structure enables drug discovery of inhibitors against this complex.

Crystal structure of RBDCRD alone or bound to membrane mimetic
  • Status: Transferred
  • Type: Data Set
  • Description: Crystal structures of RBDCRD alone or bound to membrane mimetic. Three structures related to Pilot 2 are listed: 6VC8, 6VJJ, 5TB5.
  • Impact: Detailed structure allows more accurate modeling of protein-membrane interactions.

Pilot 3 Population Information Integration, Analysis, and Modeling for Precision Surveillance

Population Information Integration, Analysis, and Modeling for Precision Surveillance aims to leverage high-performance computing and artificial intelligence to meet the emerging needs of cancer surveillance. Pilot 3 also seeks to develop a fully integrated, data-driven modeling and simulation framework to enable meaningful translation of large-scale data from the Surveillance, Epidemiology, and End Results (SEER) program.

Active Learning for NLP Systems
  • Status: Finalized
  • Type: Software
  • Description: Offers an active learning framework for natural language processing of pathology reports.
  • Impact: Enables rapid annotation of pathology reports via machine learning.

ML Ready Pathology Reports
  • Status: Transferred
  • Type: Data Set
  • Description: Machine learning ready pathology reports with the associated site and histology labels downloaded from the Genomic Data Commons.
  • Impact: Enables users to have a pathology report data set to use with many of the other capabilities.

MT-CNN
  • Status: Transferred
  • Type: Model (trained**)
  • Description: A convolutional neural network for natural language processing and information extraction from free-form texts.
  • Impact: Allows automatic information extraction from free-form pathology report texts. Faster than HiSAN.

HiSAN
  • Status: Transferred
  • Type: Model (trained**)
  • Description: Hierarchical self-attention network for information extraction from cancer pathology reports.
  • Impact: Allows automatic information extraction from free-form pathology report texts. More accurate than MT-CNN.
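
As an illustration of how an active learning loop for pathology reports might choose what to annotate next, the sketch below ranks unlabeled reports by the entropy of a model's predicted class probabilities. The source does not state which query strategy the Active Learning for NLP Systems capability uses; entropy-based uncertainty sampling is assumed here as one common choice, and the report IDs, probabilities, and function names are hypothetical.

```python
import math
from typing import Dict, List


def entropy(probs: List[float]) -> float:
    """Shannon entropy (in nats) of a predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def select_for_annotation(predictions: Dict[str, List[float]], budget: int) -> List[str]:
    """Return the `budget` report IDs whose predictions are most uncertain."""
    ranked = sorted(predictions, key=lambda rid: entropy(predictions[rid]), reverse=True)
    return ranked[:budget]


# Made-up class probabilities for three unlabeled reports (hypothetical IDs).
predictions = {
    "report-a": [0.98, 0.01, 0.01],  # confident prediction
    "report-b": [0.34, 0.33, 0.33],  # nearly uniform, most uncertain
    "report-c": [0.70, 0.20, 0.10],  # in between
}

to_label = select_for_annotation(predictions, budget=2)
print(to_label)  # ['report-b', 'report-c']
```

Sending the most uncertain reports to human annotators first is what lets a small labeling budget improve the model quickly.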

Accelerating Therapeutics for Opportunities in Medicine (ATOM)

The ATOM Consortium is a public-private partnership whose mission is to transform drug discovery by accelerating the development of more effective therapies for patients.

ATOM Modeling PipeLine (AMPL)
  • Status: Transferred
  • Type: Software
  • Description: Offers an open source, modular, extensible software pipeline for building and sharing models to advance in silico drug discovery.
  • Impact: Extends the functionality of DeepChem and supports an array of machine learning and molecular featurization tools. AMPL benchmarks on a wide range of parameters are currently available for several pharmaceutical data sets.

CANcer Distributed Learning Environment (CANDLE)

CANDLE is an open source, collaboratively developed software platform that provides deep learning methodologies. Driven by scientific challenges in cancer research, as defined by JDACS4C pilot efforts, CANDLE capabilities build on advanced computing support from DOE’s Exascale Computing Project (ECP). CANDLE is deployed on NIH's Biowulf supercomputer.

CANDLE Software Stack
  • Status: Enhanced
  • Type: Software
  • Description: Improves machine/deep learning models by performing hyperparameter optimization.
  • Impact: Enables hyperparameter optimization on machine/deep learning models.
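
For readers unfamiliar with hyperparameter optimization, the sketch below shows the basic idea with a plain random search: sample candidate settings, score each one, and keep the best. It does not use the CANDLE API; the function names, the search space, and the toy loss (a stand-in for training a model and measuring validation loss) are all assumptions made for illustration.

```python
import random
from typing import Callable, Dict, Tuple

Params = Dict[str, float]


def random_search(
    objective: Callable[[Params], float],
    space: Dict[str, Tuple[float, float]],
    trials: int,
    seed: int = 0,
) -> Tuple[Params, float]:
    """Sample `trials` settings uniformly from `space`; return the lowest-loss one."""
    if trials < 1:
        raise ValueError("trials must be at least 1")
    rng = random.Random(seed)  # seeded so the search is reproducible
    best_params: Params = {}
    best_loss = float("inf")
    for _ in range(trials):
        params = {name: rng.uniform(lo, hi) for name, (lo, hi) in space.items()}
        loss = objective(params)
        if loss < best_loss:
            best_params, best_loss = params, loss
    return best_params, best_loss


def toy_loss(params: Params) -> float:
    # Stand-in for "train a model and return its validation loss" (assumption).
    return (params["learning_rate"] - 0.01) ** 2 + (params["dropout"] - 0.3) ** 2


space = {"learning_rate": (0.0001, 0.1), "dropout": (0.0, 0.5)}
best_params, best_loss = random_search(toy_loss, space, trials=200, seed=42)
print(best_params, best_loss)
```

In practice each trial is an expensive training run, which is why platforms of this kind distribute the trials across many compute nodes.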

The Predictive Oncology Model and Data Clearinghouse (MoDaC)

MoDaC is a data-sharing repository developed to transition resources to the broader research community. These resources include data sets and software models from computational capabilities developed within NCI and in collaboration with programs such as JDACS4C and ATOM. Annotated data sets stored in the repository are publicly available and can be searched against their metadata and downloaded for analysis.

MoDaC: Predictive Oncology Model and Data Clearinghouse
  • Type: Software Platform
  • Description: Offers a public-facing repository to enable sharing of JDACS4C data sets with the cancer research community. Provides a web-based interface for NCI-DOE researchers to upload large, annotated data sets, which then can be searched by metadata and downloaded. The web application leverages the Data Services API core in the backend to provide access to an S3 object store. Salient features include:
      • Generic, expandable data hierarchy and metadata structure
      • Metadata-based searches of files and collections
      • Multi-level data access policy for open (without user registration), registered, or controlled access
      • Ability to keep data sets private or restricted (group-level access) until ready for sharing (useful for pre-publication data)
      • Support for data transfers to/from Globus and AWS S3 endpoints
  • Impact: Enables storage and sharing of annotated data sets.
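
The idea of a metadata-based search can be sketched with a small in-memory filter. This is not the MoDaC interface or the Data Services API; the record layout, the metadata keys (`program`, `kind`, `access`), and the function name are hypothetical and chosen only to illustrate matching records against key/value criteria.

```python
from typing import Any, Dict, List

Record = Dict[str, Any]


def search_by_metadata(records: List[Record], **criteria: Any) -> List[Record]:
    """Return records whose metadata matches every key/value in `criteria`."""
    return [
        record for record in records
        if all(record["metadata"].get(key) == value for key, value in criteria.items())
    ]


# Hypothetical catalog entries; names and metadata values are made up.
catalog = [
    {"name": "example-expression-set",
     "metadata": {"program": "JDACS4C", "kind": "data set", "access": "open"}},
    {"name": "example-trained-model",
     "metadata": {"program": "JDACS4C", "kind": "model", "access": "registered"}},
    {"name": "example-assay-set",
     "metadata": {"program": "ATOM", "kind": "data set", "access": "open"}},
]

open_jdacs4c = search_by_metadata(catalog, program="JDACS4C", access="open")
print([record["name"] for record in open_jdacs4c])  # ['example-expression-set']
```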