Cancer Data Science Pulse
Founding Director of Genomics Center Shares His Story of "Chasing the Technology" to Advance Cancer Research
At the upcoming Data Science Seminar Series, you'll be discussing your study comparing a variety of bioinformatics methods on single-cell RNA-seq (scRNA-seq) data. Those data were generated with four popular technologies using reference samples (a cancer cell line vs. a matched normal B-lymphocyte line). Can you give some background on the topic and how it relates to cancer research?
RNA sequencing gives us a way to look at genome-wide changes in gene expression (collectively called the transcriptome), which can be linked to cancer's development, progression, and treatment. Its resolution is limited, however, because it typically involves bulk analysis of RNA pooled from highly diverse (heterogeneous) mixtures of cancer cells, so differences between individual cells are averaged out.
Sampling at the single-cell level gives us a much better way to identify and characterize distinct subpopulations and states of cancer cells. This higher resolution allows us to look deeper into the tumor itself, enabling us to better predict clinical outcomes, inform treatment approaches, and identify new ways of treating individual cancers.
The challenge with scRNA-seq is that the field has developed so rapidly, with many different platforms and a wide variety of bioinformatics algorithms. It wasn't clear how reproducible these different scRNA-seq technologies would be. The single-cell community also lacked standard reference materials, including reference scRNA-seq data sets and reference bioinformatics methods. Without such standards, it's difficult to reproduce and compare results obtained in different labs.
I'll be reporting on findings from a large multicenter study benchmarking scRNA-seq technologies, recently published in Nature Biotechnology. We compared some of the most popular bioinformatics methods to show how, and under what circumstances, they work best. Our study provides guidance for the field by establishing well-characterized reference samples, 20 openly available scRNA-seq data sets, and reference methods. We discovered that the characteristics of scRNA-seq data (e.g., sample composition, cellular heterogeneity, and platform) were critical in determining the optimal bioinformatics methods.
Who should attend the webinar?
Anyone working with diverse cell types or studying single-cell technologies, including cancer researchers. Our findings can help scientists identify which of the best-ranked bioinformatics methods will be most useful in reducing the variability typically introduced by technical factors or sampling ambiguity. They also offer guidance on choosing an appropriate scRNA-seq platform and software tool for a given study. Using these guidelines, scientists can select the workflow that will yield the most meaningful results.
You’ve published numerous articles on technological advances in microarray and RNA-seq. And you’ve studied a broad range of research topics, such as cancer, heart disease, aging, and longevity. How did your background help prepare you for this work?
I had a very humble start. I was born Changhong Wang to a poor family in Chibi, in China's Hubei province. My parents were farmers.
Thanks to Deng Xiaoping's reforms, education gave me a way out. I was accepted into a top-notch medical school in China. I received my medical degree, but I really liked mathematics, especially biostatistics. That led me to choose preventive medicine as my major and also to pursue a master's in public health in environmental epidemiology.
I was lucky to be offered a visiting scholar position with the Department of Energy's Argonne National Lab, where I continued my research specializing in environmental health. This experience really opened my eyes to science. Molecular biological technologies were developing rapidly. I wanted to learn more, so I decided to pursue a doctoral degree in the United States. I was accepted by the University of Washington (UW) to study environmental toxicology, a topic that blended my interest in biostatistics with the environment and public health. As luck would have it, UW was one of the few places offering classes in bioinformatics at that time. Those classes gave me skills that were critical to my career development.
Just as I was completing my Ph.D. training at UW, gene chip technology was invented. I was fascinated by the idea of doing biology on a chip. A chance meeting at a conference led to a position at the U.S. Food and Drug Administration's (FDA's) National Center for Toxicological Research, which had a gene chip project aimed at developing a chip for cancer research. I basically jumped into the gene chip field right out of school.
My passion for these biotechnologies is what led me to become deeply involved in the Microarray Quality Control (MAQC) consortium studies. Other positions took me across the country, from the Department of Defense's Air Force Research Laboratory to the Cedars-Sinai Medical Center and the David Geffen School of Medicine at UCLA, then to City of Hope, and, ultimately, to Loma Linda University (LLU). There, I had the unique opportunity to be the founding director of the Center for Genomics.
You mention you helped get the Center for Genomics at LLU off the ground. LLU isn't a name generally associated with the larger bioinformatics efforts in this country. Was this a challenge?
Yes, it was. The Center for Genomics was established in 2014, and I was recruited to set it up. I had experience directing similar efforts at my previous positions. LLU gave me the chance to apply that knowledge to a completely new initiative, essentially building a state-of-the-art center from the ground up. This meant recruiting and training staff, purchasing equipment, and establishing all the components—from genomics to bioinformatics—needed to run a center.
At the same time, I was working to get our new center recognized, both nationally and internationally. Our center was selected as one of the official testing sites for the Sequencing Quality Control Phase 2 (SEQC2), the fourth stage of the FDA's MAQC studies. I was intrigued by the fast-growing single-cell sequencing technologies, so we pushed the envelope and initiated the multicenter scRNA-seq benchmarking study, which is where we quickly found our niche. We were able to make a name for ourselves among the larger genomics and bioinformatics labs.
Looking back, it seems like we did the impossible. Getting the machines ordered and set up, and then, getting the staff up to speed was an incredible effort. Honestly, if asked now to do this again, I probably wouldn’t have the drive to do it. But you never know . . . .
Were there any surprises (things you didn’t expect) that you encountered in your work on this topic?
I think the most surprising thing is how fast this area of research has advanced. I sometimes feel as if I've spent decades just chasing the technology. In 2004, microarray technology started to gain popularity, but it wasn't being used consistently. It was hard to keep up. That's been a common thread across many of the technologies I've worked with over the years: the methods advance so rapidly that it becomes very difficult to know which method is best, and in what situation.
As researchers, we strive for quality, but without strong, reproducible references for comparison, it's not possible to control that quality. Having a benchmark is key. Our consortium studies offer some guidelines to help with that.
Where do you see this field headed in the next 5–10 years? What is your hope for this technology?
I guess I'll continue chasing the technology. I'm excited to see where it might take us as we work to decode the mysteries of life and health. Thanks to advances in microfluidic techniques and growing computational power, single-cell technologies will continue to advance. There's so much more to learn from a single cell that will ultimately help solve the puzzle of life, given that life starts from a single cell. It is foreseeable that massive amounts of genomic, epigenomic, transcriptomic, proteomic, and metabolomic data will be generated simultaneously from a single cell. By that time, we'll understand omics data much better, especially with the advent of new and more powerful computing and bioinformatics tools.
Where can people go for more information?