Cancer Data Science Pulse

The Cancer Data Science Pulse blog provides insights on trends, policies, initiatives, and innovation in the data science and cancer research communities from professionals dedicated to building a national cancer data ecosystem that enables new discoveries and reduces the burden of cancer.

NCI initiatives are accumulating a wealth of data from genomics, proteomics, single-cell analysis, radiology, molecular imaging, clinical findings, and more. The newly awarded Cancer Data Aggregator (CDA) is being designed and developed to let scientists work across these very diverse data sets, facilitating interoperability not only within the Cancer Research Data Commons but throughout the larger data ecosystem.

The quest to harmonize data has ushered in a new way of thinking about standardization. Now, rather than expecting everyone to adopt a particular model or standard, we’re seeking to leverage technology that can do some of this work for us. The DREAM Challenge was designed to make aggregating and mapping data to the correct lexicon of terms and metadata a nearly seamless step for researchers. Read more about the Challenge that’s currently underway and how we hope to address harmonization in the future.

This new blog installment shines a spotlight on the staff who are working to turn data and IT resources into solutions for addressing data-driven cancer research. Here we feature Mervi Heiskanen, Ph.D., program manager in the Cancer Informatics Branch at CBIIT. Much of her work focuses on data sharing and creating the tools and resources that help to make open data a reality.

Pooling data from numerous sources strengthens the power of the information, but only if it can be meaningfully connected. Dr. Melissa Haendel, Director of the Translational and Integrative Sciences Laboratory, Oregon State University (OSU), and Principal Investigator for the NCI Center for Cancer Data Harmonization, and Julie McMurry, Associate Director of the Translational and Integrative Sciences Laboratory, OSU, describe the basics of harmonization and how it can help in wrangling massive amounts of data to make them more valuable to research.

NCI’s Dr. Erika Kim and Dr. Chris Kinsinger discuss how the Proteomic Data Commons (PDC) aids cancer researchers in accessing and analyzing proteomic data. The PDC is an integral part of NCI’s Cancer Research Data Commons (CRDC) as it gives researchers access to three types of proteomic data: mass spectra, identified peptides, and protein reports, as well as clinical, biospecimen, and other metadata. The PDC is available for queries and analysis of publicly accessible datasets.

CBIIT Director Tony Kerlavage discusses his role, CBIIT’s responsibilities and opportunities within the realm of cancer research, and his vision for expanding informatics, IT, collaboration, and data sharing to find treatments, improve outcomes, and make the lives of cancer patients and their families better.

Dr. Nathalie Pochet highlights the Informatics Technology for Cancer Research Program and the support it provides for informatics tool development, including the *AMARETTO framework, which is being leveraged to identify novel mechanisms of viral carcinogenesis.

This post covers the NCI funding opportunity announcement for an Informatics Technology for Cancer Research (ITCR) Education Resource. The ITCR Education Resource will be a new, overarching component of the ITCR program, with the overall mission of conducting activities that engage the research and informatics communities in using and extending the ITCR technologies.