Visualizing Data: The Basics

What is Data Visualization?

Data visualization is the representation of data in graphical forms such as charts, images, and network diagrams.

Whether you’re working with data that you’ve collected yourself (which you want to present in a way that makes sense) or you want to explore existing data to form new hypotheses, data visualization can help! 

At this stage in the data science lifecycle, you choose from a variety of available tools to gain a deeper understanding of the data you’re working with and to communicate insights clearly to others.

Why is Data Visualization Important for Cancer Research? 

Effectively analyzing data and sharing research results is essential to advancing cancer research. By embracing data visualization, you and many other individuals across the cancer research and care continuum can better analyze and understand diverse data.

Data visualization can:

  • clarify complex or large data.
  • generate broader interest from the research community.
  • improve analysis.
  • tell a story.
  • identify relationships between data.
  • reveal trends.
  • communicate results. 
  • enable insights. 
  • reveal outliers. 

What Do I Need to Know?

Data Visualization Concepts

You’ll want to know the basics of some of the most popular charts and how they’re used in data visualization for cancer research!

Each chart type below is listed with its definition and an example of its use in cancer research data visualization.

Bar Chart

Bar Chart: Uses horizontal or vertical bars to compare numerical values across discrete categories. One axis shows the categories, and the other shows the numerical value scale. The bar chart is one of the most frequently used chart types. In genomics, bar charts are useful for showing whether the expression of a gene is up, down, or unchanged.
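As a minimal sketch of the genomics example, the Python snippet below computes the value each bar would show: the log2 fold change of a gene's expression between two conditions. The gene names, expression values, and the cutoff of 1.0 (a twofold change) are hypothetical assumptions for illustration; real analyses typically rely on dedicated differential expression tools with statistical testing.

```python
import math

# Hypothetical mean expression values per gene in two conditions (made up).
control = {"GENE_A": 100.0, "GENE_B": 50.0, "GENE_C": 80.0}
treated = {"GENE_A": 400.0, "GENE_B": 12.5, "GENE_C": 90.0}


def log2_fold_change(treated_value: float, control_value: float) -> float:
    """Return log2(treated / control); positive values mean higher in treated."""
    return math.log2(treated_value / control_value)


def direction(lfc: float, threshold: float = 1.0) -> str:
    """Label a change using an assumed |log2 fold change| cutoff (1.0 = twofold)."""
    if lfc >= threshold:
        return "up"
    if lfc <= -threshold:
        return "down"
    return "unchanged"


# One bar per gene: the bar's height (or length) is its log2 fold change.
bars = {gene: log2_fold_change(treated[gene], control[gene]) for gene in control}

for gene, lfc in bars.items():
    print(f"{gene}: log2FC = {lfc:+.2f} ({direction(lfc)})")
```

Passing `list(bars)` and `list(bars.values())` to a plotting function such as matplotlib's `pyplot.bar` would then draw the chart itself.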

Gantt Chart

Gantt Chart: Displays a list of activities or tasks with their duration over time for organizational purposes. Gantt charts are useful when presenting timelines in a grant proposal or funding request. 

Heat Map

Heat Map: Represents the values in a table (matrix) as colors, so that variation in color shows variation in value. Heat maps are good for showing values across multiple variables to reveal patterns. This graphic is common for visualizing genomics data.
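Genomics heat maps are often drawn after scaling each row (for example, each gene) so that colors reflect relative rather than absolute values. The sketch below, with a made-up matrix, applies row-wise z-scores and then prints a text-only heat map in which a hypothetical `shade` function stands in for a color scale. Row scaling is one common preprocessing choice, not the only one.

```python
from statistics import mean, pstdev

# Hypothetical expression matrix: rows are genes, columns are samples (made up).
genes = ["GENE_A", "GENE_B"]
matrix = [
    [2.0, 4.0, 6.0],
    [10.0, 10.0, 10.0],
]


def zscore_rows(rows):
    """Scale each row to mean 0 and (population) standard deviation 1."""
    scaled_rows = []
    for row in rows:
        mu = mean(row)
        sd = pstdev(row)
        # A constant row has no variation; use zeros instead of dividing by 0.
        scaled_rows.append([0.0 if sd == 0 else (x - mu) / sd for x in row])
    return scaled_rows


def shade(z: float) -> str:
    """Hypothetical text 'color scale': '-' low, '.' middle, '+' high."""
    if z < -0.5:
        return "-"
    if z > 0.5:
        return "+"
    return "."


scaled = zscore_rows(matrix)
for gene, row in zip(genes, scaled):
    print(gene, "".join(shade(z) for z in row))
```

With a plotting library, a call such as matplotlib's `pyplot.imshow(scaled, cmap="coolwarm")` would replace `shade` with a continuous color scale.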

Histogram

Histogram: Visualizes the distribution of a continuous variable by dividing its range into intervals (bins) and showing how many observations fall into each one. A histogram may be useful to compare age range data for your cancer research (e.g., adults 18–25, adults 26–35, etc.).
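The sketch below shows how a histogram groups a continuous variable into bins before drawing bars, using the age ranges above. The ages are made up, the bins after 26–35 are assumptions that continue the same pattern, and ages outside every bin are simply not counted.

```python
from collections import Counter

# Hypothetical participant ages (made up for illustration).
ages = [19, 22, 25, 26, 30, 34, 35, 41, 47, 52]

# (label, lowest age, highest age), inclusive. Bins after 26-35 are assumed.
BINS = [("18-25", 18, 25), ("26-35", 26, 35), ("36-45", 36, 45), ("46-55", 46, 55)]


def histogram(values, bins=BINS):
    """Count how many values fall into each bin; values outside all bins are skipped."""
    counts = Counter()
    for value in values:
        for label, low, high in bins:
            if low <= value <= high:
                counts[label] += 1
                break
    # Return every bin in order, including empty ones, so the bars line up.
    return {label: counts[label] for label, _, _ in bins}


for label, n in histogram(ages).items():
    print(f"{label:>6} {'#' * n}")
```

The choice of bin edges changes the shape of the histogram, so it is worth stating the bins explicitly when you present one.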

Network Diagram

Network Diagram: Shows how things are interconnected by linking nodes of data with lines that represent their connections. A network diagram can help analyze relationships between cancer occurrences in various communities.
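Under the hood, a network diagram is a set of nodes plus a list of the lines (edges) between them. The sketch below, with hypothetical community names and links, builds an undirected adjacency list and counts each node's connections (its degree), a value that plotting tools often use to size or color nodes.

```python
from collections import defaultdict

# Hypothetical edges between communities (names and links are illustrative only).
edges = [
    ("Community A", "Community B"),
    ("Community A", "Community C"),
    ("Community B", "Community C"),
    ("Community C", "Community D"),
]


def build_adjacency(edge_list):
    """Return an undirected adjacency list: node -> set of neighboring nodes."""
    adjacency = defaultdict(set)
    for u, v in edge_list:
        adjacency[u].add(v)
        adjacency[v].add(u)
    return dict(adjacency)


adjacency = build_adjacency(edges)
# Degree = number of connections each node has.
degrees = {node: len(neighbors) for node, neighbors in adjacency.items()}

for node, degree in sorted(degrees.items(), key=lambda item: -item[1]):
    print(f"{node}: {degree} connection(s)")
```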

Pie Chart

Pie Chart: Divides a circle into wedges to show each category’s proportion, or percentage, of a whole. Pie charts can be helpful for showing population data, such as the share of a study population in each group.
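A pie chart only makes sense when the categories are mutually exclusive and add up to a whole. The sketch below uses made-up participant counts to convert each category into a percentage and then into the wedge angle it occupies in a 360-degree circle.

```python
# Hypothetical number of study participants in each group (made up).
counts = {"Group 1": 45, "Group 2": 30, "Group 3": 25}


def percentages(category_counts):
    """Return each category's share of the total, as a percentage."""
    total = sum(category_counts.values())
    if total == 0:
        raise ValueError("counts must not sum to zero")
    return {name: 100 * n / total for name, n in category_counts.items()}


def wedge_angles(category_counts):
    """Convert each category's share into its wedge angle in degrees (of 360)."""
    return {name: pct * 360 / 100 for name, pct in percentages(category_counts).items()}


shares = percentages(counts)
angles = wedge_angles(counts)
for name in counts:
    print(f"{name}: {shares[name]:.1f}% ({angles[name]:.0f} degrees)")
```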

Scatter Plot

Scatter Plot: Places points on a Cartesian coordinate system to show the relationship between two variables. The scatter plot is a frequently used chart type. Once you make a scatter plot, you can draw a curve through the data points using a mathematical formula fitted to the data. You might use this chart for dose-response curves.
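The simplest curve you can draw through a scatter plot is a straight line fitted by ordinary least squares, sketched below in plain Python with made-up data. Dose-response data are usually sigmoidal instead and are commonly fitted with a four-parameter logistic (Hill) model using a nonlinear fitting routine such as SciPy's `scipy.optimize.curve_fit`; this sketch only illustrates the idea of fitting a formula to points.

```python
# Hypothetical paired measurements, e.g., dose (x) and response (y); made up.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.0]


def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need at least two paired points")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


slope, intercept = fit_line(xs, ys)
print(f"fitted line: y = {slope:.2f} * x + {intercept:.2f}")
# Points on the fitted line, which could be drawn over the scatter plot.
fitted = [slope * x + intercept for x in xs]
```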

Fundamental Tips for Effective Data Visualization

  • Be clear about your purpose. What do you want to learn about the data? How will you use visualization to explore or explain data? Does the visual clearly depict your message?
  • Know what resources are available. There are many platforms and tools you can use to perform your visualization, such as the tools available in the NCI Cancer Research Data Commons (CRDC). 
  • Select the right tool. The graph, chart, or image should clearly convey the results to your audience.
  • Use more than one visualization approach to fully address your research question, when it makes sense. Make sure all the pieces fit together to give you useful information.
  • Be mindful when preparing your data. Prepare the data according to the tool you’ve selected and the type of visual you’re creating. 
  • Keep it straightforward. Don’t overcomplicate the data or the visual.

NCI Data Visualization Resources and Initiatives

Now that you have a sense of the basics, use the following resources to discover more about the topic and understand NCI’s investment in this stage of the data science lifecycle.

Recurring Events

  • DataViz + Cancer: Supported by Cancer Moonshot℠, this event series explores the intersection of data visualization and cancer research. Check out past event recordings, as well as upcoming micro-labs.
  • NCI Emerging Technologies Seminar Series: Discover novel technologies supported by NCI awards that seek to transform cancer research and clinical care. These seminars may spotlight additional tools that can be used in visualization efforts. 

Resources and Tools

  • NCI’s CRDC: This infrastructure gives you access to a comprehensive collection of cancer research data, along with visualization tools for analyzing the data within many of the data portals or through cloud resources. For visualization tools specific to different data types, look at the Genomic, Imaging, Integrated Canine, and Proteomic Data Commons. 
  • 3DVizSNP: Use this tool to visually evaluate large numbers of missense mutations in their three-dimensional structural context. The tool enables rapid screening of mutations taken from a Variant Call Format (VCF) file using the iCn3D protein structure and sequence viewing platform. 
  • GDC Data Analysis, Visualization, and Exploration (DAVE) tools: Use this web interface to analyze genomic data in NCI’s Genomic Data Commons. Its specialized graphs help you visualize genomic signatures of cancer and identify potential drivers of disease, and you can also create custom cohorts for analysis.
  • ProTrack: The NCI Clinical Proteomic Tumor Analysis Consortium (CPTAC) built this user-friendly web application using CPTAC clear cell renal cell carcinoma data. Its interactive heat map lets you view multi-omics data.
  • Minerva: This is a lightweight, narrative image browser for multiplexed tissue images. 
  • UpSetR: This is an R package for generating UpSet plots, which visualize set intersections in a matrix layout. 
  • UCSC Xena: Use this online exploration tool for public and private multi-omic and clinical/phenotype data. 
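UpSet plots, such as those UpSetR generates, typically show exclusive intersections: each element is counted once, in the exact combination of sets it belongs to. The Python sketch below (not UpSetR itself) computes those counts for three hypothetical sets of genes flagged by made-up variant callers.

```python
from collections import Counter

# Hypothetical sets: genes flagged by three made-up variant callers.
sets = {
    "Caller1": {"GENE_A", "GENE_B", "GENE_C", "GENE_D", "GENE_E"},
    "Caller2": {"GENE_A", "GENE_B", "GENE_C", "GENE_F"},
    "Caller3": {"GENE_A", "GENE_D", "GENE_F", "GENE_G"},
}


def exclusive_intersections(named_sets):
    """Count elements by the exact combination of sets they belong to."""
    membership = {}
    for set_name, members in named_sets.items():
        for item in members:
            membership.setdefault(item, set()).add(set_name)
    # Each element contributes to exactly one combination (its full membership).
    return Counter(frozenset(names) for names in membership.values())


intersections = exclusive_intersections(sets)
for combo, size in intersections.most_common():
    print(" & ".join(sorted(combo)), size)
```

In an UpSet plot, each of these combinations becomes one bar, and the matrix beneath the bars marks which sets the combination includes. Unlike a Venn diagram, this layout stays readable as the number of sets grows.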

Additional Data Visualization Resources 

Keep an eye on the NIH Library Course Catalog for upcoming courses on data visualization. Here, we’ve highlighted some of the classes in which you may be interested.

  • Principles of Effective Data Visualization: This class provides an overview of how to construct data visualizations that are both appealing and informative.
  • NGS Visualization Tool: By the end of this training, you’ll be able to format data for OmicCircos in R Shiny to construct circular plots with biological features.

 
