Cancer Data Science Pulse

The Cancer Data Science Pulse blog provides insights on trends, policies, initiatives, and innovation in the data science and cancer research communities from professionals dedicated to building a national cancer data ecosystem that enables new discoveries and reduces the burden of cancer.

In an era of unprecedented growth in the size and variety of data sets and the number of software tools, there is an ever-increasing need for frameworks that connect and integrate data and tools within a secure and easy-to-use research ecosystem.

NCI is initiating the development of an Imaging Data Commons (IDC), supported by funding provided through the Cancer Moonshot℠. Imaging plays a pivotal role in studying cancer, from diagnosis to fundamental research. Like the NCI Genomic Data Commons (GDC) and Proteomic Data Commons (PDC), the IDC will be a data node, a domain-specific repository, in the NCI Cancer Research Data Commons (CRDC).

Broad and equitable data sharing can be interpreted in many ways. For NCI's Office of Data Sharing, it means balancing support for exciting science and innovation, and for the needs of research and participant communities, against privacy and realistic expectations. This balance is possible when the policies we create acknowledge the benefits and challenges that the public, research, and participant communities experience as they share their information to advance disease knowledge and improve healthcare.

Dr. Jaime M. Guidry Auvil serves as the director of the newly launched NCI Office of Data Sharing (ODS). Headquartered at the Center for Biomedical Informatics and Information Technology, ODS is creating a comprehensive data sharing vision and strategy for NCI and the cancer research community.

One of the most exciting developments of the past decade has been the success of methods broadly described as deep learning. While the roots of deep learning date back to early machine learning research of the 1950s, recent improvements in specialized computing hardware and the availability of labeled data have led to significant advances and have shattered performance benchmarks in tasks like image classification and language processing.

This blog post, the fifth, concludes our series that discusses the principles underlying the collaborative project "Joint Design of Advanced Computing Solutions for Cancer (JDACS4C)."

NCI continues to identify and link external data sources with SEER data to enable the expansion of longitudinal data to form patient trajectories and to support modeling efforts. To inform the incorporation of those additional sources, NCI compiled an extensive breast cancer recurrence data dictionary to identify recurrence-related data elements across multiple sources, including pathology, radiology, pharmacy, biomarkers, procedures, comorbidities, patient-generated information, and radiation oncology.

The Center for Biomedical Informatics and Information Technology Communications Team interviewed Dr. Robert L. Grossman of the University of Chicago Center for Data Intensive Science about the Data Commons Framework, a component of the NCI Cancer Research Data Commons.

This is the third in a series of posts that discuss the principles underlying the three-year collaborative program "Joint Design of Advanced Computing Solutions for Cancer (JDACS4C)."

This is the second in a series of posts that discuss the principles underlying the three-year collaborative program "Joint Design of Advanced Computing Solutions for Cancer (JDACS4C)."