Cancer Data Science Pulse
The Promise and the Challenge of Deep Learning in Pathology
One of the most exciting developments of the past decade has been the success of methods broadly described as deep learning. While the roots of deep learning date back to early machine learning research of the 1950s, recent improvements in specialized computing hardware and the availability of labeled data have led to significant advances and have shattered performance benchmarks in tasks like image classification and language processing.
The implications of these advances could be profound for how cancer is diagnosed, treated, and understood. The most immediate promise of deep learning is perhaps in diagnostic specialties like radiology and pathology, where visual interpretation is central in practice. Early results have shown algorithms rivaling or surpassing human performance in tasks like detecting lymph node metastases or predicting clinical outcomes from images and/or genomics. While we are still in the early stages of this era, it now seems apparent that these algorithms will achieve the performance needed for clinical utility and will improve diagnostic accuracy and clinical management in many applications. Freeing pathologists from routine and labor-intensive tasks can help them to focus more on exceptional cases and will shift their focus from pattern recognition tasks to reasoning and integrating increasingly complex diagnostic findings. Realizing this future requires addressing new challenges, with emphasis on how we collect and curate datasets that link clinical outcomes, treatment, histology, molecular assays, and imaging, and also improving how we interpret and validate predictive deep learning models.
To understand why deep learning performs so well in prediction tasks, and the future challenges related to deep learning, it's important to understand the paradigm shift towards data-driven learning and how it has transformed the way we think about prediction problems. In the previous feature engineering paradigm, we sought to design algorithms using a priori knowledge to transform raw data into a more meaningful representation where prediction becomes easier. This transformation is necessary because it is difficult to create accurate predictive models using data that is high-dimensional (having a large number of variables per sample, e.g., gene expression profiles), or that is unstructured (e.g., images where an individual pixel lacks intrinsic meaning). A feature engineering approach might address this by using biochemical pathway databases to transform gene expression data into pathway activation scores, or by using image processing to delineate structures of interest within an image so that they can be measured and described with quantitative features. Feature engineering has produced many successes but has important limitations. Our knowledge is imperfect in most problems, and while our clinical collaborators can suggest what features they think are important, they often acknowledge that data likely harbor important latent content. Feature engineering systems are also very specific to an application and are difficult to adapt to new problems and datasets.
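As a rough illustration of the feature engineering idea described above, the following Python sketch collapses per-gene expression values into a handful of per-pathway scores by averaging standardized expression over each pathway's member genes. The gene names, pathway names, and the mean-of-z-scores rule are hypothetical choices made for this example, and it assumes every sample reports every gene. Real pathway databases and scoring methods are considerably more involved.

```python
from statistics import mean, pstdev


def zscore_genes(expression):
    """Standardize each gene across samples.

    expression: {sample_id: {gene: value}}; assumes every sample has every gene.
    """
    genes = sorted({g for profile in expression.values() for g in profile})
    stats = {}
    for gene in genes:
        values = [profile[gene] for profile in expression.values()]
        sd = pstdev(values)
        # Guard against constant genes so we never divide by zero.
        stats[gene] = (mean(values), sd if sd > 0 else 1.0)
    return {
        sample: {g: (profile[g] - stats[g][0]) / stats[g][1] for g in genes}
        for sample, profile in expression.items()
    }


def pathway_scores(expression, pathways):
    """Score each pathway as the mean z-score of its member genes (illustrative rule)."""
    z = zscore_genes(expression)
    return {
        sample: {
            name: mean(zs[g] for g in members if g in zs)
            for name, members in pathways.items()
        }
        for sample, zs in z.items()
    }


# Hypothetical toy data: two samples, three genes, two pathways.
expression = {
    "sample_1": {"GENE_A": 1.0, "GENE_B": 2.0, "GENE_C": 5.0},
    "sample_2": {"GENE_A": 3.0, "GENE_B": 4.0, "GENE_C": 5.0},
}
pathways = {"pathway_x": ["GENE_A", "GENE_B"], "pathway_y": ["GENE_C"]}

scores = pathway_scores(expression, pathways)
for sample, per_pathway in scores.items():
    print(sample, per_pathway)
```

The point of the sketch is the change of representation: a model downstream would see two pathway-level features per sample instead of three (or, realistically, thousands of) gene-level values, at the cost of relying on whatever knowledge is encoded in the pathway definitions.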
In the deep learning paradigm, algorithms also transform raw data into more meaningful representations, but these representations are learned from data in an unbiased manner. The process of "learning" exposes a naive model to the data and iteratively adapts its parameters to maximize prediction accuracy. The ability of these models to rival or surpass human performance is fascinating and seems to embody the concept of learning. The deep learning paradigm is also flexible, and similar models can be easily adapted to address a wide variety of applications or disease domains, provided adequate data. Deep learning models can also be made more robust to data variations, producing models that generalize better across institutions or batches. All of this translates into prediction models that are more accurate and that take less time and energy to develop.
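The iterative "learning" loop described above can be sketched in a few lines of Python. To keep the example self-contained, it swaps in a single-layer logistic regression trained by gradient descent rather than a deep network. The toy data, learning rate, and number of epochs are assumptions made for illustration. The structure is the same in spirit: start from uninformed parameters, measure prediction error on the data, and nudge the parameters to reduce it.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(samples, labels, learning_rate=0.5, epochs=1000):
    """Fit a logistic regression model with full-batch gradient descent."""
    n_features = len(samples[0])
    weights = [0.0] * n_features  # the "naive" starting model
    bias = 0.0
    loss_history = []
    n = len(samples)
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        loss = 0.0
        for x, y in zip(samples, labels):
            p = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
            # Cross-entropy loss; the small constant avoids log(0).
            loss -= y * math.log(p + 1e-12) + (1 - y) * math.log(1 - p + 1e-12)
            error = p - y
            for j, xi in enumerate(x):
                grad_w[j] += error * xi
            grad_b += error
        # Adapt the parameters in the direction that reduces the loss.
        weights = [w - learning_rate * g / n for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / n
        loss_history.append(loss / n)
    return weights, bias, loss_history


def predict(weights, bias, x):
    """Return the predicted class (0 or 1) for one sample."""
    return 1 if sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias) >= 0.5 else 0


# Hypothetical toy data: two features per sample, two well-separated classes.
samples = [[0.0, 0.2], [0.3, 0.1], [0.2, 0.4], [0.9, 0.8], [0.7, 1.0], [1.0, 0.6]]
labels = [0, 0, 0, 1, 1, 1]

weights, bias, loss_history = train_logistic(samples, labels)
print("first and last loss:", loss_history[0], loss_history[-1])
print("predictions:", [predict(weights, bias, x) for x in samples])
```

A deep network replaces the single weighted sum with many stacked layers, and the gradients are computed by backpropagation, but the loop of predicting, scoring, and updating is the part this sketch is meant to convey.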
Realizing the benefits of deep learning typically requires datasets containing many hundreds or thousands of samples. Without the benefit of a priori knowledge to see through the noise, numerous diverse examples are needed to teach models to distinguish generalizable patterns. Rare or exceptional cases are very important in cancer diagnosis, which highlights an important limitation of deep learning and an important difference between machines and human experts who may recognize something that they have only ever encountered in a textbook. Another common critique relates to the role of interpretation in validation and clinical adoption of deep learning models. While many are concerned about challenges in characterizing the failure modes of black box predictive models, others hold the view that it is not necessary to explain prediction mechanisms as long as large validation cohorts are carefully constructed to include examples that may challenge or confuse the model.
How deep learning will impact pathology largely depends on our ability to aggregate, curate, and annotate large and high-quality datasets. This could become the next fundamental challenge for those in the industry who are mobilizing to scale data aggregation and curation. Scalable production of molecular and digital pathology image data has improved greatly, but the annotation of images and the acquisition of clinical and outcomes data still present significant challenges. Annotating medical images requires significant time commitments from experts who are busy with clinical responsibilities. This is especially true for histology images, which are notably difficult to annotate due to their size and complexity. Addressing logistic and user interface challenges can help alleviate some of the annotation burden, and web-enabled software can help us engage broader audiences to scale up annotation efforts. There are no easy solutions to the challenges of collecting clinical outcomes and treatment information, particularly in diseases with long horizons or when patients move between healthcare systems.
The inclusion of whole-slide histology images in The Cancer Genome Atlas (TCGA) had a significant impact on computational pathology research. The size of TCGA cohorts and the ability to link histology to genomic data and clinical outcomes have provided a testbed for the development of new algorithms including deep learning algorithms and will continue to provide new insights into cancer histology. As rich as TCGA is, larger datasets and more complete outcomes are needed to investigate the most important and difficult prediction problems. New data sharing agreements can play an important role in addressing the challenge of scale, and efforts like the Oncology Research Information Exchange Network (ORIEN) could have a similar impact on machine learning research. Predicting treatment response is a critical problem where deep learning can make important contributions, but access to datasets where treatment is standardized is needed. Finally, creating infrastructure to share large datasets with the community, enabled by cloud computing and open benchmarking, will also play an important role in advancing development of the next generation of predictive cancer models.