Cancer Data Science Pulse

Data Commons

This new blog installment shines a spotlight on the staff who are working to turn data and IT resources into solutions for data-driven cancer research. This spotlight features Sherri de Coronado, program manager in the CBIIT Cancer Informatics Branch.

NCI initiatives are accumulating a wealth of data from the fields of genomics, proteomics, single-cell, radiology, molecular imaging, clinical findings, and more. The newly awarded Cancer Data Aggregator (CDA) is currently being designed and developed to enable crosstalk among these very diverse data sets, facilitating interoperability not only within the Cancer Research Data Commons but throughout the larger data ecosystem.

Pooling data from numerous sources strengthens the power of the information, but only if it can be meaningfully connected. Dr. Melissa Haendel, Director of the Translational and Integrative Sciences Laboratory, Oregon State University (OSU), and Principal Investigator for the NCI Center for Cancer Data Harmonization, and Julie McMurry, Associate Director of the Translational and Integrative Sciences Laboratory, OSU, describe the basics of harmonization and how it can help in wrangling massive amounts of data to make them more valuable to research.

NCI’s Dr. Erika Kim and Dr. Chris Kinsinger discuss how the Proteomic Data Commons (PDC) aids cancer researchers in accessing and analyzing proteomic data. The PDC is an integral part of NCI’s Cancer Research Data Commons (CRDC) as it gives researchers access to three types of proteomic data: mass spectra, identified peptides, and protein reports, as well as clinical, biospecimen, and other metadata. The PDC is available for queries and analysis of publicly accessible datasets.

The Imaging Data Commons (IDC) has been awarded to a consortium led by investigators from the Department of Radiology at Brigham and Women’s Hospital and Harvard Medical School. The IDC will house multi-modal imaging data and make them available for use by the broader cancer research community.

In an era of unprecedented growth in the size and variety of datasets and the number of software tools, there is an ever-increasing need for frameworks that connect and integrate data and tools within a secure and easy-to-use research ecosystem.

NCI is initiating the development of an Imaging Data Commons (IDC) supported by funding provided through the Cancer Moonshot℠. Imaging plays a pivotal role in studying cancer, from diagnosis to fundamental research. Like the NCI Genomic Data Commons (GDC) and Proteomic Data Commons (PDC), the IDC will be a data node, a domain-specific repository, in the CRDC.

For this interview, the Center for Biomedical Informatics and Information Technology Communications Team interviewed Dr. Robert L. Grossman of the University of Chicago Center for Data Intensive Science to discuss the Data Commons Framework, a component of the NCI Cancer Research Data Commons.

The data science community is awash with "FAIRness." In the past few years, there has been an emerging consensus that scientific data should be archived in open repositories, and that the data should be Findable, Accessible, Interoperable, and Reusable.

I recently joined NCI to help support strategic data sharing and informatics projects within the Center for Biomedical Informatics and Information Technology (CBIIT). Having worked on information management at another Institute for five years and on the trans-NIH Big Data to Knowledge (BD2K) initiative since its inception, I see this as an exciting opportunity to continue contributing to data science across the biomedical community.