Cancer Data Science Pulse

Data Commons

“Count Me In” (CMI) is a unique project that gives patients an opportunity to share their cancer-related data directly with scientists. According to Corrie Painter, associate director of CMI, such patient-contributed data are a largely untapped but vital resource for data science. Here she describes the project and what it could mean for future research efforts.

On October 20th, NCI launched the Imaging Data Commons (IDC), the latest data repository to be offered within the Cancer Research Data Commons (CRDC) infrastructure. Through the IDC, both researchers and clinicians will have access to a wide range of cancer-related images, including radiology and pathology imaging data, as well as their accompanying metadata.

Dr. Tony Kerlavage, director of NCI’s Center for Biomedical Informatics and Information Technology (CBIIT), sat down to discuss health disparities, one key component of racial inequality, as they relate to Big Data. As Dr. Kerlavage noted, representing our diverse U.S. population in research and in the workforce is key, but we also need better data.

Naturally occurring cancers in dogs share similarities with cancer that occurs in humans. The Integrated Canine Data Commons (ICDC), a cloud-based repository of canine cancer data, includes a variety of molecular, clinical, pharmacological, and medical imaging information from pet dogs. Such comparative oncology findings offer researchers greater insight into how best to diagnose, treat, and prevent cancer—in both people and pets.

This new blog installment shines a spotlight on the staff who are working to turn data and IT resources into solutions for data-driven cancer research. This installment features Sherri de Coronado, program manager in the CBIIT Cancer Informatics Branch.

NCI initiatives are accumulating a wealth of data from genomics, proteomics, single-cell studies, radiology, molecular imaging, clinical findings, and more. The newly awarded Cancer Data Aggregator (CDA) is currently being designed and developed to let scientists search across these very diverse data sets, facilitating interoperability not only within the Cancer Research Data Commons but throughout the larger data ecosystem.

Pooling data from numerous sources strengthens the power of the information, but only if the data can be meaningfully connected. Dr. Melissa Haendel, Director of the Translational and Integrative Sciences Laboratory, Oregon State University (OSU), and Principal Investigator for the NCI Center for Cancer Data Harmonization, and Julie McMurry, Associate Director of the Translational and Integrative Sciences Laboratory, OSU, describe the basics of harmonization and how it can help in wrangling massive amounts of data to make them more valuable to research.

NCI’s Dr. Erika Kim and Dr. Chris Kinsinger discuss how the Proteomic Data Commons (PDC) aids cancer researchers in accessing and analyzing proteomic data. The PDC is an integral part of NCI’s Cancer Research Data Commons (CRDC) as it gives researchers access to three types of proteomic data: mass spectra, identified peptides, and protein reports, as well as clinical, biospecimen, and other metadata. The PDC is available for queries and analysis of publicly accessible datasets.

The Imaging Data Commons (IDC) has been awarded to a consortium led by investigators from the Department of Radiology at Brigham and Women’s Hospital and Harvard Medical School. The IDC will house multi-modal imaging data and make them available for use by the broader cancer research community.

In an era of unprecedented growth in the size and variety of datasets and the number of software tools, there is an ever-increasing need for frameworks that connect and integrate data and tools within a secure and easy-to-use research ecosystem.