Cancer Data Science Pulse

Data Sharing

This new blog installment shines a spotlight on the staff who are working to turn data and IT resources into solutions for data-driven cancer research. This spotlight features Sherri de Coronado, program manager in the CBIIT Cancer Informatics Branch.

NCI initiatives are accumulating a wealth of data from genomics, proteomics, single-cell studies, radiology, molecular imaging, clinical findings, and more. The newly awarded Cancer Data Aggregator (CDA) is being designed and developed to let scientists work across these highly diverse data sets, facilitating interoperability not only within the Cancer Research Data Commons but throughout the larger data ecosystem.

The quest to harmonize data has ushered in a new way of thinking about standardization. Now, rather than expecting everyone to adopt a particular model or standard, we’re seeking to leverage technology that can do some of this work for us. The DREAM Challenge was designed to make aggregating and mapping data to the correct lexicon of terms and metadata a nearly seamless step for researchers. Read more about the Challenge that’s currently underway and how we hope to address harmonization in the future.

This new blog installment shines a spotlight on the staff who are working to turn data and IT resources into solutions for data-driven cancer research. Here we feature Mervi Heiskanen, Ph.D., program manager in the Cancer Informatics Branch at CBIIT. Much of her work focuses on data sharing and on creating the tools and resources that help make open data a reality.

Pooling data from numerous sources strengthens the power of the information, but only if the data can be meaningfully connected. Dr. Melissa Haendel and Julie McMurry describe the basics of harmonization and how it can help in wrangling massive amounts of data to make them more valuable to research. Dr. Haendel is Director of the Translational and Integrative Sciences Laboratory at Oregon State University (OSU) and Principal Investigator for the NCI Center for Cancer Data Harmonization; Ms. McMurry is Associate Director of the Translational and Integrative Sciences Laboratory at OSU.

Just as numerous role models can shape, foster, and guide a child into adulthood, so can the various stakeholders within the broader cancer research community play a pivotal role in the success of data sharing efforts. Your input is critical as NCI seeks to make the most of the federal investment to collect, analyze, and share data to address the burden of cancer in children, adolescents, and young adults.

Biomedical knowledge is typically organized around a variety of biological entity types, such as genes, genetic variants, drugs, and diseases. Collectively, we refer to them as "BioThings." The volume of biomedical data has grown explosively, thanks to the efforts of many different researchers and consortia. This growth spans many types of data in many formats and standards, making it difficult to unify the disparate sources.

Broad and equitable data sharing can be interpreted in many ways. For NCI's Office of Data Sharing, this means balancing support for exciting science and innovation, and the needs of research and participant communities, against privacy and realistic expectations. This balance is possible when the policies we create acknowledge the benefits and challenges that the public, research, and participant communities experience as they share their information to advance disease knowledge and improve healthcare.

Dr. Jaime M. Guidry Auvil serves as the director of the newly launched NCI Office of Data Sharing (ODS). Headquartered at the Center for Biomedical Informatics and Information Technology, ODS is creating a comprehensive data sharing vision and strategy for NCI and the cancer research community.

I recently joined NCI to help support strategic data sharing and informatics projects within the Center for Biomedical Informatics and Information Technology (CBIIT). Having worked on information management at another Institute for five years and on the trans-NIH Big Data to Knowledge (BD2K) initiative since its inception, I see this as an exciting opportunity to continue contributing to data science across the biomedical community.