Cancer Data Science Pulse

For the Love of . . . Data! Dr. Sharpless Shares His Story


At NCI’s Center for Biomedical Informatics and Information Technology (CBIIT), our love for data runs deep. Here, in celebration of NCI’s 50th anniversary and in recognition of the Power of Data, we asked NCI and CBIIT staff to tell us what data means to them and to the field of cancer research. This is the first in the series, “For the Love of…Data!” Fittingly, we asked Dr. Ned Sharpless to lead us off as we pay tribute to our love of data!

What made you fall in love with data?

Between my second and third year of med school, I took about a year off to work at NIH in the Howard Hughes Medical Institute-NIH Scholars program. I worked in a lab on campus where I used a cell culture model to study HIV infection in the brain. 

In one of the experiments, I added live virus to cultured human brain cells and then harvested and froze media aliquots from the cultures daily over the course of a week. Then I’d take all the aliquots collected from multiple experiments and, in a single day, analyze each one to find the amount of viral protein in the media at each timepoint post-inoculation.

The purpose of the research was to identify which viral strain isolates were able to productively infect the brain, as well as the specific brain cells that were infected. So each experiment would have many different conditions, timepoints, and replicates; and, because I did all the assays in batch, I would get protein levels from hundreds of aliquots all at once.  

Then, by hand using graph paper, I would “crack the code,” matching the viral protein level for each numbered aliquot with the record of when and how that aliquot was collected.  

It was like magic. Some of the viral strains showed high levels of viral protein in the media within a few days of inoculation, and other strains didn’t. We called the brain-infecting strains “neurotropic” and eventually proved they were infecting microglia in the cultures. 

I remember how wonderful it was when I got those protein level results. Even though the aliquots were analyzed in near-random order, the viral protein levels would be high or low in just the right way, according to the experimental condition. This made me appreciate that data could be really powerful and teach you things you didn’t expect.

Figure: Heat map of 743 microRNAs across 10,170 tumors, shown in yellow and blue, with color intensity corresponding to the presence of different microRNA groups.

Today, scientists can examine data “heat maps,” which show genetic activity profiles from thousands of tumors at one time. This one shows microRNAs, a class of non-coding RNAs that are important for regulating gene expression.

What do you think has been the single greatest accomplishment in data science over the past 50 years of cancer research? 

It’s hard to say what has been the greatest accomplishment, but I do know what has really transformed the field of cancer data science: RNA transcriptomics. 

Prior to RNA transcriptional analysis, it was possible to be a cancer data scientist with only a modest understanding of statistics, working in Excel on a 1990s desktop computer. After RNA expression profiling arrived, everything got much harder. We had to learn (or re-learn) complex data visualization (e.g., heat maps), dimensionality reduction (e.g., principal component analysis, or PCA), and much more sophisticated statistics. We also moved our analyses from our desktops to the university server (or the cloud). We quickly went from working with a megabyte of data to working with petabytes.

As NCI embarks on the next 50 years, what practical tips or advice can you offer?

The only constant is change. Cancer data science is moving so fast. There’s no time to get too attached to any specific research approach or methodology. How we analyze data today, and the type of data we analyze today, will be very different in just a few years. So be flexible and stay prepared for what comes next.
 

Share your story! Tell us why YOU fell in love with data too. Use the comment box below to tell us why you love data.
Dr. Norman "Ned" Sharpless
NCI Director

On Oct 4, 2017, I presented to Ned Sharpless the framework for a cancer database, including the rationale and identification of the challenges. Since then, I have started a foundation to advocate for and support a pediatric brain tumor database. I am on the executive council of CBTN, lead sponsor of PCDC's CNS commons initiative, and a board member of St. Baldrick's, and the chair of PBTC, Ira Dunkel, is on my board (Bridge to a Cure Foundation). I am happy to provide more information.