Cancer Data Science Pulse

Visualizing RNA-seq Data—Pro-Tips From an NCI Bioinformatics Engineer

In this new blog series, we're posting samples and tips on visualizing cancer data. To kick us off, Dr. Alida Palmisano, a bioinformatics engineer in the Biometric Research Program in NCI’s Computational and Systems Biology Branch, Division of Cancer Treatment and Diagnosis, shares her ideas for visualizing complex single-cell RNA-sequencing (scRNA-seq) data.

Our hope is that these images will spark your imagination when showcasing your research data.

Samples of ways to visualize RNA-seq data. (A) A two-dimensional t-SNE plot is one of the standards in the field for visualizing high-dimensional scRNA-seq data. Colored clusters show groups of cells that have similar gene expression patterns. (B) Bar plots show how individual samples contribute to different clusters of cells. (C) Coloring the two-dimensional t-SNE plot with scores from well-characterized gene lists allows researchers to assign cell types to the clusters of cells. (D) Using techniques such as dimensionality reduction, clustering, and previously documented gene lists gives us further insight into the tumor's cell-type composition.
Data from Dong R, et al. Single-Cell Characterization of Malignant Phenotypes and Developmental Trajectories of Adrenal Neuroblastoma. Cancer Cell. 9;38(5):716-733, 2020, PMID: 32946775.

 

What type of graphic is it?  

It’s a composite of several plots for single-cell transcriptome sequencing data. scRNA-seq measures the RNA molecules within each cell of a given sample to provide a snapshot of the cells’ transcriptome (i.e., the genes that are being transcribed when the cells are collected).

Pro-Tip: High-dimensional data like scRNA-seq are challenging to show. Using visualization strategies that work together (like a puzzle) can help piece together helpful biological insights.

Why is the graphic important?

With scRNA-seq, we can use a single experiment to capture a moment in time in highly heterogeneous tissues. We can see the genes that are being transcribed (i.e., the transcriptome), which together reflect many dynamic biological processes. The high dimensionality of the data (e.g., large numbers of genes and cells, complex biological processes) is a challenge that requires a variety of visualization strategies, which have to work together (like a puzzle) to reveal helpful biological insights.

Pro-Tip: Use each visualization strategy to generate a hypothesis. Make sure that each “puzzle piece” fits together to fully address your research question.

How did you create it?

I generated the figures with R, using Seurat, a popular package for single-cell data analysis. Additional visualizations, such as the bar plots, were generated using ggplot2.
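For readers who want to see what sits behind a bar plot like panel B, here is a minimal sketch of the underlying calculation. It is not the code used for the figure (that was done in R); it is plain Python with made-up labels. The sample names, cluster names, and the helper functions `cluster_composition` and `text_bars` are all hypothetical and exist only for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical (sample, cluster) label for each cell. In a real analysis these
# labels would come from the clustering step of the analysis package.
cells = [
    ("tumor_1", "c0"), ("tumor_1", "c0"), ("tumor_1", "c1"),
    ("tumor_2", "c0"), ("tumor_2", "c1"), ("tumor_2", "c1"),
    ("tumor_2", "c1"), ("tumor_3", "c2"), ("tumor_3", "c2"),
    ("tumor_3", "c0"),
]


def cluster_composition(cells):
    """Return {cluster: {sample: fraction of that cluster's cells}}."""
    counts = defaultdict(Counter)
    for sample, cluster in cells:
        counts[cluster][sample] += 1
    composition = {}
    for cluster, by_sample in sorted(counts.items()):
        total = sum(by_sample.values())
        composition[cluster] = {
            sample: n / total for sample, n in sorted(by_sample.items())
        }
    return composition


def text_bars(composition, width=20):
    """Render one text bar per (cluster, sample) pair."""
    lines = []
    for cluster, fractions in composition.items():
        for sample, frac in fractions.items():
            bar = "#" * round(frac * width)
            lines.append(f"{cluster} {sample:<8} {bar} {frac:.0%}")
    return lines


composition = cluster_composition(cells)
for line in text_bars(composition):
    print(line)
```

Each cluster's fractions sum to one, so a cluster made up almost entirely of cells from a single sample stands out right away. That is the kind of pattern a stacked bar plot makes easy to spot.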

What should I consider when visualizing this kind of data?

Remember that the underlying data are extremely high dimensional. Using a single visualization approach will give you a limited view of the information, which can be extremely biased! Select a visualization strategy that addresses your hypothesis and make sure that each “piece of the puzzle” fits together in a way that leads to useful information. Also, remember that all the tools have many tunable parameters that may greatly impact the way the figures look and ultimately the hypothesis you derive from them.
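To make the point about tunable parameters concrete, here is a toy sketch in plain Python. It is not how Seurat or any other scRNA-seq tool clusters cells. It groups made-up one-dimensional scores and starts a new group whenever the gap between neighboring values exceeds a threshold. The function `gap_clusters`, the `max_gap` parameter, and the scores are all hypothetical stand-ins for a real tool's settings and data.

```python
def gap_clusters(values, max_gap):
    """Group sorted 1-D values, starting a new group when a gap exceeds max_gap."""
    if not values:
        return []
    ordered = sorted(values)
    groups = [[ordered[0]]]
    for previous, current in zip(ordered, ordered[1:]):
        if current - previous > max_gap:
            groups.append([current])  # gap too large: open a new group
        else:
            groups[-1].append(current)  # close enough: extend the current group
    return groups


# Hypothetical per-cell scores in arbitrary units.
scores = [1, 2, 3, 10, 11, 16, 30, 31]

# The same data, grouped with four different settings of one parameter.
for max_gap in (3, 6, 10, 15):
    groups = gap_clusters(scores, max_gap)
    print(f"max_gap={max_gap}: {len(groups)} group(s) -> {groups}")
```

The same eight values come out as four, three, two, or one group depending on a single setting. Real tools have many such settings, which is why it helps to check whether a pattern survives reasonable changes before building a hypothesis on it.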

Pro-Tip: My favorite visualization type is bar charts. Charts can embed a lot of complexity without affecting their overall intuitive interpretation.

What’s your favorite and why?

In general, my favorite visualization type is bar charts because they can embed a lot of complexity without affecting their overall intuitive interpretation. You can use colors, patterns, order, orientation, and much more to convey both simple and complex messages. However, bar charts, as with any other visualization type, have their limitations. Always remember to visualize the same data in several different ways to see which combination of techniques tells the story you want to share. 

Bioinformatics Engineer, Biometric Research Program, Computational and Systems Biology Branch, Division of Cancer Treatment and Diagnosis, NCI
