Cancer Data Science Pulse

Unraveling the Complexity of Cancer Using New Technologies and Algorithms

The March 23 webinar has passed, but a recording is now available on the event page.

On March 23, Dr. Ben Raphael will present the next Data Science Seminar, “Quantifying tumor heterogeneity using single-cell and spatial sequencing.” In this blog, Dr. Raphael describes how he’s using this technology to dig deeper into the complexity of cancer.

You’ll be discussing the topic, “Quantifying tumor heterogeneity using single-cell and spatial sequencing,” in the upcoming webinar. Can you tell us what first interested you in this topic?

Most cancers, particularly sporadic cancers that emerge unexpectedly during a lifetime, are caused by mutations that appear in the genome after an individual is born. These somatic mutations occur spontaneously as cells copy their genomes during cell division, but the rate and type of mutations can also be influenced by environmental exposures, such as cigarette smoke or sunlight. 

We’ve known for decades that cancer genomes contain numerous somatic mutations. They vary in scale from single-nucleotide mutations to large genome rearrangements. Some of these mutations are now routinely measured in cancer patients and used to guide treatment decisions.

Now, thanks to rapid advances in DNA and RNA sequencing technologies over the past 20 years, we’re able to dig even deeper into the mutational complexity of cancer genomes. For example, early analyses identified somatic mutations that were shared by all cancerous cells in a tumor. Today, we can identify how a tumor is a mixture of cells with different collections of somatic mutations. Some of these small populations of cells may contain mutations that lead a tumor to evade treatment. 

My interest in this topic began nearly 20 years ago when I was just starting as a postdoctoral fellow. My postdoc advisor, Dr. Pavel Pevzner, received an email from a trio of cancer researchers (Colin Collins, Stas Volik, and Joe Gray) asking if our team could help identify genome rearrangements in some new cancer sequencing data they had generated. During our next group meeting, Dr. Pevzner asked if anyone was interested in looking at the data. At that time, the Human Genome Project had just completed a draft human genome, so it was quite exciting to hear about cancer sequencing data. I volunteered to help, thinking this would be a one-off project. I had no idea that this collaboration would launch my career in computational cancer biology and cancer genomics.

Who should attend the webinar? What can they expect to learn from this hour with you?

I hope that anyone who is interested in computational biology/bioinformatics, cancer genomes, cancer evolution, and new sequencing technologies will enjoy and learn something new from the talk.

One of the themes I’ll touch on is how computation and new algorithms can help overcome some of the limitations of DNA/RNA measurement technologies. I’ll include data and results from new technologies, including 10X Genomics’ single-cell DNA sequencing and Visium spatial transcriptomics platforms. Single-cell technologies allow researchers to measure molecular changes—including somatic mutations—in thousands of individual cells from a tumor. Spatial transcriptomics technologies measure the expression of genes at many locations in a tumor, recording the coordinates of each measurement in the sample.

I couldn’t imagine having these types of data when I started working in the field nearly 20 years ago. At that time, we were delighted to have a single measurement that contained an aggregate signal from all the cells in the tumor. An aggregate signal is useful to define the “average” cell in a tumor but doesn’t tell us much about the rare population of cells within that tumor. Those rare cells might be the reason therapy isn’t successful or the cancer continues to spread.

With the advances in technologies, we can now address some of these limitations, but many questions remain. Take drug resistance as an example. Is it possible to identify populations of cells within a tumor that are resistant to specific drugs? If so, what are the mechanisms of resistance (e.g., mutations in the genome or some other molecular changes, such as altered expression of some genes)? Because tumors evolve independently in each individual, answering these questions requires that we apply single-cell and spatial technologies to many tumors so that we can find molecular changes that are shared across tumors from many individuals.

Were there other technological challenges that you had to overcome?

Single-cell and spatial sequencing technologies are remarkable innovations that allow us to measure the molecular diversity within a tumor with much greater precision than was available with earlier technologies, which measured a bulk tumor sample in aggregate.

For example, the older technologies might identify two somatic mutations, A and B, each present in approximately 40% of the cancer cells in the tumor. But we wouldn’t know whether mutations A and B were present in the same cancer cells or in different cancer cells within the tumor. Being able to distinguish these two possibilities helps us understand the mutations. If A and B always occur in the same cell, it might indicate that they work together to perform a function. On the other hand, if A and B are always in different cells, this might mean that they operate independently to perform a function, or they may even work against each other.
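
The ambiguity can be illustrated with a minimal Python sketch. The two tumors, their cell counts, and the helper names bulk_frequencies and cooccurrence are hypothetical and chosen only to mirror the 40% example above; they are not data or code from the work discussed here.

```python
from collections import Counter

def bulk_frequencies(cells):
    """Fraction of cells carrying each mutation, as a bulk assay would report it."""
    n = len(cells)
    return {m: sum(m in c for c in cells) / n for m in ("A", "B")}

def cooccurrence(cells):
    """Count cells by the exact set of mutations they carry (single-cell view)."""
    return Counter("".join(sorted(c)) or "none" for c in cells)

# Hypothetical tumor 1: A and B always occur in the same cells.
tumor_together = [{"A", "B"}] * 40 + [set()] * 60
# Hypothetical tumor 2: A and B always occur in different cells.
tumor_apart = [{"A"}] * 40 + [{"B"}] * 40 + [set()] * 20

for name, tumor in [("together", tumor_together), ("apart", tumor_apart)]:
    print(name, bulk_frequencies(tumor), dict(cooccurrence(tumor)))
```

Both tumors yield the same bulk frequencies for A and B, while the per-cell counts separate them immediately.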

Still, these new technologies have their own limitations, most prominently the sparsity of data. Technical and financial limitations often mean that the amount of sequencing data obtained for each cell is very small. Continuing the example above, it might be the case that mutations A and B are always present together in cells. However, in sparse sequencing data, we might fail to measure A in some cells and fail to measure B in other cells. Thus, new analysis tools are needed to maximize the power of these new sequencing technologies. The computational biology and bioinformatics community has been very active in developing new analysis tools in tandem with the new technologies. This combination of new measurement technologies and new analysis methodologies has been propelling the field forward.
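
A small simulation shows how missing measurements can disguise mutations that truly co-occur. This is only a sketch under a simplifying assumption: each mutation in each cell is detected independently with a fixed probability. The detection probability of 0.3, the cell count, and the function names are hypothetical, and real single-cell data have more complicated error patterns.

```python
import random
from collections import Counter

def observe(cells, detect_prob, rng):
    """Simulate sparse sequencing: each true mutation is detected
    independently with probability detect_prob (an assumed dropout model)."""
    return [{m for m in sorted(c) if rng.random() < detect_prob} for c in cells]

def tally(cells):
    """Count cells by the set of mutations observed in them."""
    return Counter("".join(sorted(c)) or "none" for c in cells)

rng = random.Random(0)                # fixed seed so the run is repeatable
true_cells = [{"A", "B"}] * 1000      # ground truth: A and B always together
observed = tally(observe(true_cells, detect_prob=0.3, rng=rng))

# Many cells now appear to carry only A, only B, or neither mutation.
print(dict(observed))
```

Although every simulated cell carries both mutations, the observed data contain many cells with only A or only B, which is why analysis methods for these data have to account for missing measurements.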

Were there any surprises that you encountered in your work on this topic?

We encountered two amazing surprises. First, we didn’t expect to find so much genomic diversity in the breast tumor samples that we analyzed with single-cell DNA sequencing. For example, we were able to observe combinations of mutations that were present in fewer than 1% of the cells in the tumor. Such combinations of mutations would never have been identified without single-cell sequencing.

Second, we were surprised by how much heterogeneity was missed using the existing analysis tools for this type of single-cell data. Specifically, the human genome is diploid, with two copies of each chromosome: one inherited from our mother and one from our father. Somatic mutations can occur on either the maternal or paternal chromosome. We found it was important to track whether somatic mutations (specifically gains and losses of segments of the genome) occurred on the maternal or paternal chromosome. Existing tools didn’t consider the differences between mutations appearing on maternal vs. paternal chromosomes, and as a result, these tools gave an incomplete and inaccurate view of the diversity within the tumor.
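
A toy example shows what is lost when the two chromosome copies are not tracked separately. The cell counts and copy-number states below are hypothetical, and the sketch assumes the maternal and paternal copy numbers of a genome segment are already known for each cell, which in practice has to be inferred from the data.

```python
from collections import Counter

# Each hypothetical cell is (maternal copies, paternal copies) of one genome segment.
cells = (
    [(1, 1)] * 70    # normal: one maternal and one paternal copy
    + [(2, 0)] * 20  # paternal copy lost, maternal copy gained
    + [(0, 2)] * 10  # maternal copy lost, paternal copy gained
)

# View that only counts total copies per cell.
total_view = Counter(m + p for m, p in cells)
# View that keeps the maternal and paternal copies separate.
allele_view = Counter(cells)

print("total copy number:", dict(total_view))
print("allele-specific:  ", dict(allele_view))
```

Every cell has a total of two copies, so the first view reports one uniform population, while the allele-specific view separates three distinct groups of cells.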

Where do you see this topic headed in the next 5–10 years? What is your hope for this technology/discovery/methodology?

I expect rapid progress in the field in the coming years. Single-cell and spatial sequencing technologies will advance in scope, measuring more types of molecular changes from more cells with higher spatial resolution. In addition, technologies for measuring tumor material that’s circulating in blood (e.g., circulating tumor DNA) are also rapidly improving. The combination of these technologies will mean earlier, non-invasive detection of many cancers; better understanding of the vulnerabilities of cancers once they develop; and continuous non-invasive monitoring of cancers, before, during, and after treatment.

Ben Raphael, Ph.D.
Professor of Computer Science, Princeton University