Cancer Data Science Pulse

Seminar Series

I have been involved in the design and implementation of cancer research information systems throughout my entire 30-year career. My father was the principal designer of the Apollo Lunar Descent Guidance and Navigation software that landed the first men on the moon in the late 1960s. Growing up in the Boston area, I became intensely interested in his work and spent many weekends tagging along with him in the MIT mainframe computer laboratory.

Biomedical knowledge is typically centered on a variety of biological entity types, such as genes, genetic variants, drugs, and diseases. Collectively, we refer to them as "BioThings." The volume of biomedical data has grown explosively, thanks to the efforts of many different researchers and consortia. This growth spans many types of data in many different formats and standards, making it difficult to unify the disparate sources.

One of the most exciting developments of the past decade has been the success of methods broadly described as deep learning. While the roots of deep learning date back to early machine learning research of the 1950s, recent improvements in specialized computing hardware and the availability of labeled data have led to significant advances and have shattered performance benchmarks in tasks like image classification and language processing.

In the past year, the use of Artificial Intelligence (AI) in radiology, also called "radiomics," has been getting a lot of attention, mainly because Deep Learning (DL) has progressed from sub-human performance to performance that is similar, or in some cases superior, to that of humans.

Now is the time for researchers across domains to ideate together, share data, and maximize the utility of those data. This is "the urgency of now" according to former Vice President Joe Biden, who delivered the keynote address to those in attendance at the September 2017 Human Proteome Organization (HUPO) Annual World Congress.

The data science community is awash with "FAIRness." In the past few years, there has been an emerging consensus that scientific data should be archived in open repositories, and that the data should be Findable, Accessible, Interoperable, and Reusable.

Biomedical research is evolving with an increasing emphasis on data science, e.g., data integration and storage, data privacy and security, data analytics and data representation, driven by the transformative technologies that have become the currency of genomics in precision medicine. In spite of numerous "beachhead" successes, however, the gap between data and clinical utility continues to grow.

The Seven Bridges Cancer Genomics Cloud (CGC) is one of three pilot systems funded by the National Cancer Institute with the aim of co-localizing massive genomics datasets, like The Cancer Genome Atlas (TCGA), alongside secure and scalable computational resources for analysis.

Armed with sufficient data across very large populations, it seems plausible that a learning healthcare system can emerge. But what do we have to do to get there?

Researchers are using 3D printing to create models of pathogens, tumors, normal tissues, cells, and biomolecules, gaining insights that contribute to advances in basic biomedical research and the development of precision medical therapies. Dr. Sriram Subramaniam, principal investigator in the Laboratory of Cell Biology at the NCI Center for Cancer Research (CCR), uses 3D printing as both an educational and a research tool.