Cancer Data Science Pulse

Seminar Series

CBIIT’s May 19 Data Science Seminar Series speaker, Dr. Kristen Naegle, took the speed of computational biology, blended it with basic science know-how, and developed an algorithm that is proving to be remarkably effective in predicting kinase activity. Understanding kinases in oncology may help identify people who are more likely to respond (or not respond) to certain medications, further advancing precision medicine.

Dr. Charles Wang offers a sneak peek at his upcoming Data Science Seminar presentation, scheduled for April 7. His recent study provides guidance for choosing an appropriate platform and software tool for a single-cell RNA sequencing (scRNA-seq) study. Using these guidelines, scientists can select the workflow that will yield the most meaningful results.

I have been involved in the design and implementation of cancer research information systems throughout my entire 30-year career. My father was the principal designer of the Apollo Lunar Descent Guidance and Navigation software that landed the first men on the moon in the late 1960s. Growing up in the Boston area, I became intensely interested in his work and spent many weekends tagging along with him in the MIT mainframe computer laboratory.

Biomedical knowledge is typically organized around a variety of biological entity types, such as genes, genetic variants, drugs, and diseases. Collectively, we refer to them as "BioThings." The volume of biomedical data has grown explosively, thanks to the efforts of many different researchers and consortia. This growth spans many types of data in many different formats and standards, making it difficult to unify the disparate sources.

One of the most exciting developments of the past decade has been the success of methods broadly described as deep learning. While the roots of deep learning date back to early machine learning research of the 1950s, recent improvements in specialized computing hardware and the availability of labeled data have led to significant advances and have shattered performance benchmarks in tasks like image classification and language processing.

In the past year, the use of Artificial Intelligence (AI) in radiology, also called "radiomics," has been getting a lot of attention, mainly because Deep Learning (DL) has progressed from sub-human performance to performance that is similar, or in some cases superior, to that of humans.

Now is the time for researchers across domains to ideate together, share data, and maximize the utility of those data. This is "the urgency of now" according to former Vice President Joe Biden, who delivered the keynote address to those in attendance at the September 2017 Human Proteome Organization (HUPO) Annual World Congress.

The data science community is awash with "FAIRness." In the past few years, there has been an emerging consensus that scientific data should be archived in open repositories, and that the data should be Findable, Accessible, Interoperable, and Reusable.

Biomedical research is evolving with an increasing emphasis on data science, e.g.,

The Seven Bridges Cancer Genomics Cloud (CGC) is one of three pilot systems funded by the National Cancer Institute with the aim of co-localizing massive genomics