New Data and Updated Features at NCI's Imaging Data Commons
The newest update from the Imaging Data Commons (IDC), part of NCI’s Cancer Research Data Commons (CRDC), includes new features and more than 16 terabytes of DICOM-standardized medical imaging data for cancer researchers and imaging informaticists.
With this release, researchers can:
- explore the first images in the IDC’s digital pathology collection. The IDC now provides whole slide images and microscopy data from NCI studies, starting with 400 cases of lung cancer from the Clinical Proteomic Tumor Analysis Consortium (CPTAC). To view the slides for a particular case, click the "eye" icon next to the case file to launch the slide microscopy (SliM) viewer.
- use pre-made notebooks to replicate popular IDC use cases. Currently, researchers can explore four use cases; one replicates a 2-year survival score for non-small cell lung cancer patients, and another builds on the data from one of the most cited pathomics publications, “Classification and mutation prediction from non-small cell lung cancer histopathology images using deep learning.” In addition to use cases, the IDC also has demos of tools, including how to use Google BigQuery to access 36 million IDC metadata records for downstream analysis in the cloud.
- connect and pull cohort data through the IDC Application Programming Interface (API). The IDC API complements the BigQuery and cloud storage APIs, which are available to query metadata tables and retrieve files hosted in the IDC.
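
The BigQuery route mentioned above can be sketched roughly as follows. This is a minimal illustration, not the IDC's documented workflow: the table name `bigquery-public-data.idc_current.dicom_all`, the column names, and the collection ID `cptac_luad` are assumptions that do not appear in this announcement, and the helper names `build_cohort_query` and `run_query` are hypothetical. Running the query also assumes the `google-cloud-bigquery` package and a configured Google Cloud project.

```python
import re

# Assumption: a public BigQuery table holding IDC DICOM metadata.
# The announcement does not name the table; check the IDC documentation.
DEFAULT_TABLE = "bigquery-public-data.idc_current.dicom_all"

# Only simple identifiers are accepted, so the values can be embedded in SQL.
_SAFE_VALUE = re.compile(r"[A-Za-z0-9_\-]+")


def build_cohort_query(collection_id, modality, table=DEFAULT_TABLE, limit=10):
    """Build a SQL string listing the series in one collection and modality."""
    for value in (collection_id, modality):
        if not _SAFE_VALUE.fullmatch(value):
            raise ValueError(f"unsupported characters in {value!r}")
    if not isinstance(limit, int) or limit <= 0:
        raise ValueError("limit must be a positive integer")
    return (
        "SELECT DISTINCT PatientID, StudyInstanceUID, SeriesInstanceUID\n"
        f"FROM `{table}`\n"
        f"WHERE collection_id = '{collection_id}' AND Modality = '{modality}'\n"
        f"LIMIT {limit}"
    )


def run_query(sql):
    """Run the query; needs google-cloud-bigquery and cloud credentials."""
    from google.cloud import bigquery  # imported here so the sketch loads without it

    client = bigquery.Client()
    return [dict(row.items()) for row in client.query(sql).result()]


if __name__ == "__main__":
    # "SM" is the DICOM modality code for slide microscopy.
    print(build_cohort_query("cptac_luad", "SM", limit=5))
```

Building the query separately from running it makes it possible to inspect the SQL before any cloud resources are used; the resulting rows could then feed a download or analysis step.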
The IDC’s Discourse forum lists all the new enhancements and improvements.
The IDC is an exciting resource for those leveraging artificial intelligence (AI) and machine learning to grapple with the growing volume of imaging data. “The days when a researcher could download data to the computer under their desk are rapidly fading,” explained Dr. Bradley Erickson, professor of radiology, Medical Director for AI at the Mayo Clinic, and the Chair of the IDC External Advisory Board. “With its connections to the other data types (genomics, proteomics, clinical), IDC provides an efficient means to solve important multimodal AI problems using cloud-scale resources that will advance the care of patients.” As part of the CRDC, the IDC links its imaging data to complementary genomics and proteomics data in the NCI Genomic Data Commons and Proteomic Data Commons, allowing researchers to perform integrative analyses across the components.
With these new enhancements, the IDC Team feels poised to meet the evolving needs of the imaging informatics community. “We invite users to explore the IDC’s advanced data visualization. IDC offers incentives through free cloud credits so that researchers can try exploring and analyzing image cohorts by setting up Google projects,” shared IDC’s federal lead, Dr. Keyvan Farahani. “We also encourage IDC users to share feedback and experience with our team and join the IDC community conversation happening on the IDC Discourse forum. IDC is an advanced medical imaging repository, and through a partnership with the research community, we can truly make it one of the most cutting-edge resources for imaging informatics and AI.”
Interested in using the IDC in your digital radiology and pathology research? To begin, watch the IDC’s short, four-part video series, which walks you through the portal’s features, building a cohort, and analyzing data on the Google Cloud Platform.