Training Guide Library
In a hurry and need quick instructions on the cancer data science lifecycle stages? Browse this list of guides and resources.
Data Generation and Collection
Identify and gather the data you need to address a problem.
- Beginner
- [Article] Generating and Collecting Data: The Basics | Get the fundamentals on what it is, why it matters, and how you can do it effectively.
- Advanced
- [Training Video Recording] WebMeV | Discover how to use this intuitive, web-based bioinformatics analysis toolkit designed for non-bioinformaticians. This walkthrough covers how to upload data files, run a single-cell analysis using the tools available within the toolkit, and navigate and create public data sets available within WebMeV.
Data Cleaning
Fix discrepancies and handle missing values in your data.
- Beginner
- [Article] Cleaning Data: The Basics | Get the fundamentals on what it is, why it matters, and how you can do it effectively.
Data Exploration and Analysis
Study your data, then form a hypothesis.
- Beginner
- [Article] Exploring and Analyzing Data: The Basics | Get the fundamentals on what it is, why it matters, and how you can do it effectively.
- Advanced
- [Blog] An Introduction to Cloud Computing | Get tips on how to manage platform costs, access, and training. See also how NCI connects researchers to the cloud with the Cancer Research Data Commons.
- [Training Video Recording] An Introduction to Gene Set Enrichment Analysis (GSEA) and the Molecular Signatures Database (MSigDB) | Learn how the GSEA tool operates, how to use MSigDB to compare your data against well-annotated gene sets, and how to run GSEA with MSigDB.
Predictive Modeling
Use computational tools like machine learning models to make predictions with your data.
- Beginner
- [Article] Predictive Modeling: The Basics | Get the fundamentals on what it is, why it matters, and how you can do it effectively.
Data Visualization
Communicate your data findings using interactive images, plots, and charts.
- Beginner
- [Article] Visualizing Data: The Basics | Get the fundamentals on what it is, why it matters, and how you can do it effectively.
- Advanced
- [Blog] Visualizing Single-Cell RNA-sequencing Data—Pro-Tips From an NCI Bioinformatics Engineer | Discover how to approach visualization strategies and which charts can be useful in this context.
- [Blog] Visualizing Data Using Circular Heatmaps and Biplots—Pro-Tips From NCI Researchers | Discover how to use these plots and why they are valuable.
- [Training Video Recording] Data Visualization with R | Learn how to use the ggplot2 package in the programming language R to create plots that can form the basis of analysis. Note: This video is one of six in a course series exclusive to NCI staff and provided by the Bioinformatics Training and Education Program. In this recording, RStudio is accessed via DNAnexus.
Data Sharing
Accelerate discovery by making your data available to others.
- Beginner
- [Article] Sharing Data: The Basics | Get the fundamentals on what it is, why it matters, and how you can do it effectively.
- Advanced
- [Resources] Metadata Services for Cancer Research | Discover how to approach the Cancer Data Standards Registry and Repository (caDSR II) infrastructure, which helps investigators create and use data standards for cancer research.
- [Blog] Your Guide to the 2023 NIH Data Management and Sharing Policy | Understand how to properly observe sharing policies if you’re an NCI-funded investigator.
- [Blog] Semantics Primer | Get a snapshot of how semantics help create shared and consistent meanings for data so they can be used/reused and shared with researchers.
- [Blog] A Deep Dive into Common Data Elements (CDEs) | Discover how CDEs help researchers like you define, map, use, and share data more efficiently.
Submit Feedback
Help us help you! If you believe content is missing or needs modifying, please let us know. Leave us a comment below, or send an email to NCI CBIIT.