Training Guide Library
In a hurry and need quick instructions on the cancer data science lifecycle stages? Browse this list of guides and resources.
Data Generation and Collection
Identify and gather the data you need to address a problem.
- Beginner
- [Article] Generating and Collecting Data: The Basics | Get the fundamentals on what it is, why it matters, and how you can do it effectively.
- [Training Video Recording] GARDE: Open-Source Platform for Population-based Genetic Testing of Hereditary Cancer Syndromes | Learn how to facilitate population-based genetic testing with the GARDE software platform, which harnesses algorithms and chatbots to identify individuals eligible for genetic testing for hereditary cancer syndromes.
- [Training Video Recording] The Cancer Proteome Atlas | Get an introduction to RPPA (a high-throughput, antibody-based technique for protein profiling); receive an overview of the bioinformatics resource, “The Cancer Proteome Atlas” (or TCPA); and learn more about TCPA’s chatbot, “TCPAplus.”
- [Training Video Recording] Unlocking Insights from Clinical Notes with the EMERSE Text Processing Tool | Use this tool to identify key data within free-text clinical notes from electronic health records systems.
- Advanced
- [Training Video Recording] WebMeV | Discover how to use this intuitive, web-based bioinformatics analysis toolkit designed for non-bioinformaticians. This walkthrough shows how to upload data files, run a single-cell analysis (using the tools available within the toolkit), and navigate or create public data sets within WebMeV.
Data Cleaning
Fix discrepancies and handle missing values in your data.
- Beginner
- [Article] Cleaning Data: The Basics | Get the fundamentals on what it is, why it matters, and how you can do it effectively.
Data Exploration and Analysis
Study your data, then form a hypothesis.
- Beginner
- [Article] Exploring and Analyzing Data: The Basics | Get the fundamentals on what it is, why it matters, and how you can do it effectively.
- [Article] How to Use the Cancer Data Aggregator | Read about this resource and how it can help you in your search for data across NCI's Cancer Research Data Commons.
- [Training Video Recording] WebMeV | Watch a demonstration of how to use this web-based software for genomic data analysis to upload data and perform analyses such as normalization, clustering, and principal component analysis.
- [Training Video Recording] XNAT | Learn about this open-source imaging informatics software platform that enables data ingestion, curation, annotation, quality control, and computational workflows using Docker containers.
- [Training Video Recording] User-Friendly Analysis of Spatial Transcriptomics with spatialGE | Learn more about this user-friendly web application that integrates the spatialGE R package. The package is enhanced with additional spatial transcriptomics (ST) analysis methods (such as SpaGCN, STdeconvolve, and InSituType), making the application more valuable for the cancer research community.
- [Training Video Recording] RNAseq Data Analysis in Qlucore | Import and analyze RNA-sequencing data in Qlucore Omics Explorer—software that visualizes the data in 3D plots and can help you identify hidden structures and patterns.
- [Training Video Recording] An Introduction to Bioconductor for Genomic Data Science | Learn the basics of integrative data containers for genome-scale experiments and components of analytic workflows for transcriptomics and epigenetics. You will also discover resources for annotation of genomic data.
- [Training Video Recording] Introduction to FlowJo™ Software | Learn about FlowJo’s workspace—including how to load files, evaluate sample quality, draw gates, and generate tabular and graphical layouts to perform single-cell flow cytometry analysis.
- Advanced
- [Blog] An Introduction to Cloud Computing | Get tips on how to manage platform costs, access, and training. See also how NCI connects researchers to the cloud with the Cancer Research Data Commons.
- [Training Video Recording] An Introduction to Gene Set Enrichment Analysis (GSEA) and the Molecular Signatures Database (MSigDB) | Learn how the GSEA analysis tool operates, how to use MSigDB to compare your data against well-annotated gene sets, and how to run GSEA with MSigDB.
- [Training Video Recording] TCIA Jupyter Learning Lab | Explore a variety of use cases for identifying The Cancer Imaging Archive (TCIA) data sets, and learn how to download them using Jupyter Notebooks. You’ll also learn how to use TCIA for data exploration.
- [Training Video Recording] TumorDecon | Discover how digital cytometry methods and their applications assist in tumor research.
- [Training Video Recording] Decoding Epigenetic Complexity: Modeling Gene Regulation with the Cistrome Data Browser | Discover methods for integrating Cistrome Data Browser data into single cell ATAC-sequencing and RNA-sequencing multimodal analysis.
Predictive Modeling
Use computational tools like machine learning models to make predictions with your data.
- Beginner
- [Article] Predictive Modeling: The Basics | Get the fundamentals on what it is, why it matters, and how you can do it effectively.
Data Visualization
Communicate your data findings using interactive images, plots, and charts.
- Beginner
- [Article] Visualizing Data: The Basics | Get the fundamentals on what it is, why it matters, and how you can do it effectively.
- [Article] How to Use Circle Plots for Visualizing Multi-Omics Data | Get tips on ‘OmicCircos,’ an R-based program that lets you manage, analyze, and visualize your omics data.
- [Training Video Recording] MATLAB: Now What? Post-Processing AI Techniques for Enhanced Accuracy | Familiarize yourself with post-processing AI techniques.
- [Training Video Recording] The GenePattern Ecosystem for Cancer Genomics and Reproducible Research | Discover an environment for accessible and reproducible research that hosts hundreds of genomics analysis and visualization tools. GenePattern offers a web-based interface that requires no programming, along with extensive features for reproducibility and accessibility.
- Advanced
- [Blog] Visualizing Single-Cell RNA-sequencing Data: Pro-Tips From an NCI Bioinformatics Engineer | Discover how to approach visualization strategies and which charts can be useful in this context.
- [Blog] Visualizing Data Using Circular Heatmaps and Biplots—Pro-Tips From NCI Researchers | Discover how to use these plots and why they are valuable.
- [Training Video Recording] Data Visualization with R | Learn how to use the ggplot2 package in the programming language R to create plots that can form the basis of analysis. Note: This video is one of six that make up a course series exclusive to NCI staff and provided by the Bioinformatics Training and Education Program. In this recording, RStudio is accessed via DNAnexus.
- [Training Video Recording] DNASTAR Lasergene Software | Learn about this software and its applications in molecular biology, including topics such as enzyme labels, primer design, cloning processes, construct analysis, and clone verification using Sanger sequencing.
- [Training Video Recording] Next-Generation Clustered Heat Maps (NG-CHMs) | Learn how NG-CHMs can help you navigate large omic databases, zoom in on patterns, access external metadata resources, produce high-resolution graphics, and save metadata for later use. NG-CHMs play a valuable role in NIH projects, encompassing phenotypic and genotypic data at DNA, RNA, protein, and metabolite levels in bulk and single-cell studies.
Data Sharing
Accelerate discovery by making your data available to others.
- Beginner
- [Article] Sharing Data: The Basics | Get the fundamentals on what it is, why it matters, and how you can do it effectively.
- [Article] How to Write a Data Management and Sharing (DMS) Plan | Learn the key elements you need to include in your DMS Plan, as well as tips for making sure your plan adheres to the latest policy.
- Advanced
- [Resources] Metadata Services for Cancer Research | Discover how to approach the Cancer Data Standards Registry and Repository (caDSR II) infrastructure, which helps investigators create and use data standards for cancer research.
- [Blog] Your Guide to the 2023 NIH Data Management and Sharing Policy | Understand how to properly observe sharing policies if you’re an NCI-funded investigator.
- [Blog] Semantics Primer | Get a snapshot of how semantics help create shared and consistent meanings for data so they can be used/reused and shared with researchers.
- [Blog] A Deep Dive into Common Data Elements (CDEs) | Discover how CDEs help researchers like you define, map, use, and share data more efficiently.
Submit Feedback
Help us help you! If you believe content is missing or needs modifying, please let us know. Leave us a comment below, or send an email to NCI CBIIT.