Cancer Data Science Pulse

Genomics

Dr. Jaime M. Guidry Auvil serves as the director of the newly launched NCI Office of Data Sharing (ODS). Headquartered at the Center for Biomedical Informatics and Information Technology, ODS is creating a comprehensive data sharing vision and strategy for NCI and the cancer research community.

One of the most exciting developments of the past decade has been the success of methods broadly described as deep learning. While the roots of deep learning date back to early machine learning research of the 1950s, recent improvements in specialized computing hardware and the availability of labeled data have led to significant advances and have shattered performance benchmarks in tasks like image classification and language processing.

This fifth blog post concludes our series on the principles underlying the collaborative project "Joint Design of Advanced Computing Solutions for Cancer (JDACS4C)."

In 2016, a Blue Ribbon Panel (BRP) was established, as part of the Beau Biden Cancer Moonshot, to make key recommendations that would support the Moonshot goals of accelerating progress in cancer research and breaking down barriers to developing new treatments.

In the past year, the use of Artificial Intelligence (AI) in radiology, also called "radiomics," has been getting a lot of attention, mainly because Deep Learning (DL) has progressed from sub-human performance to performance that is similar, or in some cases superior, to that of humans.

Now is the time for researchers across domains to ideate together, share data, and maximize the utility of those data. This is "the urgency of now" according to former Vice President Joe Biden, who delivered the keynote address to those in attendance at the September 2017 Human Proteome Organization (HUPO) Annual World Congress.

I recently joined NCI to help support strategic data sharing and informatics projects within the Center for Biomedical Informatics and Information Technology (CBIIT). Having worked on information management at another Institute for five years and on the trans-NIH Big Data to Knowledge (BD2K) initiative since its inception, I see this as an exciting opportunity to continue helping to enhance data science across the biomedical community.

In recent years, genomics has been described as a big data science on par with the likes of Twitter, YouTube, and the scientific pursuit of understanding the universe.

These days there seems to be a lot of talk about atlases for cancer. Most of us are familiar with The Cancer Genome Atlas (TCGA), the long-running effort which, over the past decade, sequenced genomes from thousands of tumor samples covering dozens of cancer types. TCGA catalogued the complex patterns of gene mutations underlying tumors, implicated numerous new cancer genes, and is generally viewed as a resounding success.

In recent years, Challenges have become a popular way to engage and motivate the research and innovation communities to solve difficult problems. Challenges are open competitions in which communities are presented with specific, often difficult problems. Participants receive guidelines and test data and compete to find the best solution. Open competition encourages innovative thinking, provides for broad participation, allows funders to set ambitious goals, and is a cost-effective way to encourage collaboration and generate novel solutions.