Cancer Data Science Pulse
For the Love of . . . Data! CBIIT Director Tony Kerlavage Looks at Advances in Data and Technology
This blog post by CBIIT Director Dr. Tony Kerlavage wraps up our look at some of the reasons why people love data and concludes NCI’s special feature on the power of data. But we’re not done yet!
If you have a story to tell about why you love data, please let us know in the comment box below. We hope to continue featuring special blogs to show how data have impacted our lives, and how Big Data stands to revolutionize cancer and biomedical research.
What made you fall in love with data?
In 1985, in my first stint at NIH, I worked with the National Institute of Neurological Disorders and Stroke studying the structure and function of neurotransmitter receptors. Before too long, the first of those receptors was cloned and its DNA was sequenced—all using manual methods. Shortly after that, the lab completely shifted its focus, from biochemistry and pharmacology to molecular biology, and we started cloning and sequencing whole families of receptor genes.
Then, in 1987, the first automated DNA sequencer, the ABI 370A, was made commercially available by Applied Biosystems, and our lab got one of the very first units. This greatly accelerated the sequencing process and also started generating lots of data (for that time).
Because I loved to solve puzzles, I was drawn to the task of putting together the short segments of DNA sequences produced by the 370A to form a complete gene. Even though we had some very primitive software tools and computers at the time, I was thrilled that we could go so quickly from a bit of biological material to a digital readout of a complete sequence of a gene. This truly changed my career path and inspired me to move from biochemistry to the relatively young field of bioinformatics.
What do you think has been the single greatest accomplishment in data science over the past 50 years of cancer research?
As for applying data science to biological research, I believe it started with our ability to understand DNA and RNA sequence data. From those first gene sequences obtained with automated instruments, the field accelerated rapidly. The instruments became faster and more accurate, and they were able to work with increasingly small amounts of DNA.
The data science methods also changed to keep pace. Soon, we were comparing sequences of entire families of genes, revealing their similarities and changes in function over time. New algorithms were developed, which greatly accelerated the field of evolutionary biology. We began sequencing thousands of small bits of mRNA from dozens of different cell types, giving us the ability to examine the expression patterns of genes in different tissues. From there, we were able to sequence the entire genomes of small bacteria and viruses and, ultimately, the complete human genome.
Each technological advance in our study of biological materials led to a comparable innovation in the software tools required to understand the data we were generating. That parallel advancement continues today, with high-performance and cloud computing, as well as the software tools and methods needed to deploy them. Technology keeps adapting to our ever-changing needs to better manage, analyze, and visualize increasingly large amounts of biological data.
As NCI embarks on the next 50 years, can you offer any practical tips or advice that should be considered?
My experience in this field shows me there are no limits to the processes and tools that can be developed for tackling new frontiers in science. I think that’s very exciting. And I’m optimistic that there is no problem too hard to solve and no question that we shouldn’t bother asking. Dedicated people will continue to find new ways of using technology to find solutions.