NCI’s Imaging Data Commons Helps Address Challenges in Computational Pathology
If you’re conducting computational pathology studies using machine learning (ML), you know that reproducing results can be challenging. Whatever the reasons (e.g., insufficient information on analysis methods and computing environments, or problems accessing and selecting data), this lack of reproducibility is a key obstacle to translating research solutions into clinical practice.
NCI’s Imaging Data Commons (IDC), which is part of NCI’s Cancer Research Data Commons (CRDC), offers access to more than 120 cancer image collections and works with cloud ML services. One of IDC’s central goals is to improve the reproducibility of data-driven cancer imaging research. Now, thanks to findings from a recent study, there’s evidence that the IDC is achieving that goal.
The researchers, who were also involved in building and refining the IDC, used the platform to test the reproducibility of a representative task in computational pathology: classifying tumor tissue.
In two separate experiments, the researchers found that their results were largely reproducible. “What’s important,” said Ms. Daniela Schacherer, “is that the IDC helped us overcome key challenges in computational pathology reproducibility.”
Specifically, the IDC:
- offers imaging data collections that adhere to FAIR principles (i.e., the data are findable, accessible, interoperable, and reusable), so it’s more likely you’ll generate the same results, using the exact same data, in follow-up studies.
- facilitates the use of cloud ML services, which means others can run the same experiment in identically configured computing environments.
These built-in attributes can help you reproduce your own pathology studies.
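To make the first point concrete, here is a minimal Python sketch of one way to record exactly which data a study used, so that a follow-up run can confirm it starts from the same inputs. It does not call any IDC interface; the slide identifiers, function names, and seed value are all hypothetical and chosen only for illustration.

```python
import hashlib
import json
import random


def manifest_fingerprint(file_ids):
    """Return a SHA-256 digest of the sorted identifier list.

    Sorting first makes the digest independent of listing order.
    """
    canonical = json.dumps(sorted(file_ids), separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def split_train_test(file_ids, test_fraction, seed):
    """Split identifiers deterministically with a seeded, local RNG."""
    rng = random.Random(seed)
    shuffled = sorted(file_ids)  # fixed starting order before shuffling
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical identifiers, standing in for the images selected for a study.
slide_ids = ["slide-003", "slide-001", "slide-002", "slide-004"]

fingerprint = manifest_fingerprint(slide_ids)
train_ids, test_ids = split_train_test(slide_ids, test_fraction=0.25, seed=42)

print("manifest fingerprint:", fingerprint)
print("train:", train_ids)
print("test:", test_ids)
```

Publishing the fingerprint and the seed alongside the results lets someone else check that they selected the same images and the same train/test split before comparing outcomes.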
The authors did observe occasional, small outliers. According to the first author, Ms. Schacherer, “There appears to be a practical limit to reproducibility when using cloud ML services because of variations in the underlying host hardware and software, which is simply outside the user’s control.”
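One practical consequence is that repeated runs are better compared within a tolerance than for exact equality. The Python sketch below shows that idea under stated assumptions: the accuracy values are made up for illustration and are not results from the study, and the tolerance is an arbitrary choice that each project would need to set for itself.

```python
import math


def runs_agree(metrics, rel_tol=1e-3):
    """Return True if every run is within rel_tol of the first run."""
    reference = metrics[0]
    return all(math.isclose(m, reference, rel_tol=rel_tol) for m in metrics)


def max_deviation(metrics):
    """Return the spread between the best and worst run."""
    return max(metrics) - min(metrics)


# Illustrative values only, not taken from the study.
consistent_runs = [0.912, 0.912, 0.9121]
runs_with_outlier = [0.912, 0.912, 0.9121, 0.905]

print(runs_agree(consistent_runs))
print(runs_agree(runs_with_outlier))
print(max_deviation(runs_with_outlier))
```

Reporting the spread across repeated runs, rather than a single number, also makes small hardware- or software-driven outliers visible to readers.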
New advances are also underway. As noted by Ms. Schacherer, “NCI is expanding the IDC to better meet today’s computational pathology needs, such as adding images from the Human Tumor Atlas Network, which offers even more detail on subcellular processes. We’re also finding new ways to integrate the IDC with other repositories within the CRDC so researchers can blend their tissue image data with other types of molecular cancer data.”