Visualizing Data: The Basics
What is Data Visualization?
Data visualization is the representation of data through visual elements such as charts, pictures, and networks.
Whether you’re working with data that you’ve collected yourself (which you want to present in a way that makes sense) or you want to explore existing data to form new hypotheses, data visualization can help!
At this stage in the data science lifecycle, you choose from a variety of available tools to gain a deeper understanding of the data you’re working with and to communicate insights clearly to others.
Why is Data Visualization Important for Cancer Research?
Effectively analyzing data and sharing research results is essential to advancing cancer research. By embracing data visualization, you and many other individuals across the cancer research and care continuum can better analyze and understand diverse data.
Data visualization can:
- clarify complex or large data.
- generate broader interest from the research community.
- improve analysis.
- tell a story.
- identify relationships between data.
- reveal trends.
- communicate results.
- enable insights.
- reveal outliers.
What Do I Need to Know?
Data Visualization Concepts
You’ll want to know the basics of some of the most popular charts and how they’re used in data visualization for cancer research!
Chart Type | Definition | Example of Use in Cancer Research Data Visualization |
---|---|---|
Bar Chart | Uses horizontal or vertical bars to show discrete, numerical comparisons across categories. One axis shows the categories, and the other shows a discrete value scale. | The bar chart is a frequently used chart type. Bar charts work well with genomics data when you want to show whether the expression of a gene is up or down. |
Gantt Chart | Displays a list of activities or tasks with their duration over time for organizational purposes. | Gantt charts are useful when presenting timelines in a grant proposal or funding request. |
Heat Map | Visualizes data through variations in color applied to a tabular format. | Heat maps are good for showing values across multiple variables to reveal patterns. This graphic is common for visualizing genomics data. |
Histogram | Visualizes the distribution of data over a continuous interval by grouping values into ranges (bins). | A histogram may be useful for comparing age range data in your cancer research (e.g., adults 18–25, adults 26–35, etc.). |
Network Diagram | Shows how things are interconnected by linking nodes of data with lines to represent their connections. | A network diagram can help analyze relationships between cancer occurrences in various communities. |
Pie Chart | Breaks a circle into segments to illustrate proportions and percentages between categories. | Pie charts can be helpful for showing population data. |
Scatterplot | Places points on a Cartesian coordinate system to show the relationship between two sets of data. | The scatterplot is a frequently used chart type. Once you make a scatterplot, you can fit a curve to the data points using a mathematical formula. You might use this chart for dose-response curves. |
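To make the histogram example in the table above more concrete, here is a minimal Python sketch of how ages can be grouped into ranges before they are drawn. It uses only the standard library and prints a simple text histogram. The ages, the age ranges beyond the two named in the table, and the helper names `bin_label` and `histogram_counts` are all hypothetical and chosen for illustration; in practice you would likely hand the same counts to a plotting tool of your choice.

```python
from collections import Counter

# Hypothetical participant ages; illustrative values only, not real study data.
ages = [19, 22, 24, 27, 29, 31, 33, 34, 38, 41, 44, 45, 52, 58, 63]

# Age ranges with inclusive lower and upper bounds (assumed for this sketch).
bins = [(18, 25), (26, 35), (36, 45), (46, 55), (56, 65)]


def bin_label(age, bins):
    """Return the label of the bin containing age, or None if no bin matches."""
    for low, high in bins:
        if low <= age <= high:
            return f"{low}-{high}"
    return None


def histogram_counts(values, bins):
    """Count how many values fall into each bin, keeping empty bins at zero."""
    counts = Counter(bin_label(v, bins) for v in values)
    return {f"{low}-{high}": counts.get(f"{low}-{high}", 0) for low, high in bins}


counts = histogram_counts(ages, bins)

# Print a text histogram: one '#' per participant in each age range.
for label, n in counts.items():
    print(f"{label:>5} | {'#' * n} ({n})")
```

Keeping empty ranges at zero matters for a histogram: a gap in the distribution is itself information, so the sketch reports every bin rather than only the ones that contain data.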
Fundamental Tips for Effective Data Visualization
- Be clear about your purpose. What do you want to learn about the data? How will you use visualization to explore or explain data? Does the visual clearly depict your message?
- Know what resources are available. Many platforms and tools can help you create visualizations, such as the tools available in the NCI Cancer Research Data Commons (CRDC).
- Select the right tool. The graph, chart, or image should clearly convey the results to the audience.
- Use more than one visualization approach to fully address your research question, when it makes sense. Make sure all the pieces fit together to give you useful information.
- Be mindful when preparing your data. Prepare the data according to the tool you’ve selected and the type of visual you’re creating.
- Keep it straightforward. Don’t overcomplicate the data or the visual.
NCI Data Visualization Resources and Initiatives
Now that you have a sense of the basics, use the following resources to discover more about the topic and understand NCI’s investment in this stage of the data science lifecycle.
Recurring Events
- DataViz + Cancer: Supported by Cancer Moonshot℠, this event series explores the intersection of data visualization and cancer research. Check out past event recordings, as well as upcoming micro-labs.
- NCI Emerging Technologies Seminar Series: Discover novel technologies supported by NCI awards that seek to transform cancer research and clinical care. These seminars may spotlight additional tools that can be used in visualization efforts.
Resources and Tools
- NCI’s CRDC: This infrastructure gives you access to a comprehensive collection of cancer research data, along with visualization tools for analyzing the data, either within many of the data portals or through cloud resources. For various types of data and relevant visualization tools, look specifically at the Genomic, Imaging, Integrated Canine, and Proteomic Data Commons.
- 3DVizSNP: Use this tool to visually evaluate large numbers of missense mutations in three-dimensional structural context. The tool enables rapid screening of mutations taken from a variant call format (VCF) file using the iCn3D protein structure and sequence viewing platform.
- GDC Data Analysis, Visualization, and Exploration (DAVE) tools: Use this web interface for analyzing genomic data in NCI’s Genomic Data Commons. You can utilize specialized graphs to help visualize genomic signatures of cancer and identify potential drivers of disease as well as create custom cohorts for analysis.
- ProTrack: The NCI Clinical Proteomic Tumor Analysis Consortium (CPTAC) built this user-friendly web application using CPTAC clear cell renal cell carcinoma data. In this interactive heatmap visualization, you can view multi-omics data.
- Minerva: This is a lightweight, narrative image browser for multiplexed tissue images.
- UpSetR: This is an R package for generating UpSet plots, a technique that visualizes set intersections in a matrix layout.
- UCSC Xena: Use this online exploration tool for public and private multi-omic and clinical/phenotype data.
Blogs
- Visualizing Genetic Mutations in Three-Dimensions—Pro-Tips from a Structural Biology Perspective: Learn how the 3DVizSNP tool can help you visualize genetic data in a three-dimensional format.
- Visualizing Data Using Circular Heatmaps and Biplots—Pro-Tips from NCI Researchers: Drs. Arashdeep Singh and Sridhar Hannenhalli explain what circular heatmaps and biplots are and how to use them.
- Visualizing RNA-seq Data—Pro-Tips from an NCI Bioinformatics Engineer: NCI’s Dr. Alida Palmisano shares her ideas and examples for visualizing complex single-cell RNA-sequencing data.
- For the Love of…Data! Drs. Kibbe and Almeida Discuss How Data Help Reveal Our Natural World: Two researchers share why they’re excited about data for cancer research, including advancements in visualization.
Publications
- GenomicSuperSignature Facilitates Interpretation of RNA-seq Experiments Through Robust, Efficient Comparison to Public Databases. Nature Communications, 2022. | Explore this method for interpreting new transcriptomic data sets through comparison to public data sets without high-performance computing requirements.
- Longitudinal Collection of Patient-Reported Outcomes and Activity Data during CAR-T Therapy: Feasibility, Acceptability, and Data Visualization. Cancers, 2022. | See an example of how researchers integrated data visualization into their study.
Additional Data Visualization Resources
Keep an eye on the NIH Library Course Catalog for upcoming courses on data visualization. Here, we’ve highlighted some of the classes in which you may be interested.
- Principles of Effective Data Visualization: This class will provide an overview of how to construct data visualizations and how to create visualizations that are appealing and informative.
- NGS Visualization Tool: By the end of this training, you’ll be able to format data for OmicCircos in R Shiny to construct circular plots with biological features.
- Ready to start your project? Get an overview of the data science lifecycle and what you should do in each stage.
- Want to learn the basic skills for cancer data science? Check out our basic skills video course.
- Need answers to data science questions? Visit our Training Guide Library.