Graphic entitled “Overview of ISB-CGC Data Preparation Process.” The image displays a pipeline workflow showing the extraction, transformation, and loading of data through the ISB-CGC:
1. Deploy Custom VMs (Memory, Disk, Network)
2. Write code for data source-specific pipelines
3. Download data via multiple protocols (APIs, HTTPS, SFTP)
4. Cloud Storage/Local VM Disk
5. Convert & standardize file formats (CSV, TSV, XML, JSON)
6. QC & Normalize data (missing values, inconsistent value formats, deduplication)
Overview of ISB-CGC Data Preparation Process, Graphic credit: John Phan, Ph.D., ISB-CGC
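
To make steps 5 and 6 of the workflow more concrete, here is a minimal Python sketch of a format conversion (TSV to CSV) combined with basic QC: mapping missing-value tokens to a single representation, trimming inconsistent whitespace and header casing, and dropping exact duplicate rows. This is not ISB-CGC's pipeline code. The function names, the set of missing-value tokens, and the sample columns are all hypothetical and chosen only for illustration.

```python
import csv
import io

# Hypothetical tokens treated as "missing"; a real pipeline would define its own.
MISSING_TOKENS = {"", "na", "n/a", "null", "none", "--"}


def normalize_value(value):
    """Trim whitespace and map missing-value tokens to None."""
    cleaned = value.strip()
    if cleaned.lower() in MISSING_TOKENS:
        return None
    return cleaned


def prepare_rows(tsv_text):
    """Parse TSV text, normalize headers and fields, and drop exact duplicates."""
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    header_line = next(reader, None)
    if header_line is None:
        return []  # empty input
    header = [name.strip().lower() for name in header_line]

    seen = set()
    rows = []
    for fields in reader:
        if not fields:
            continue  # skip blank lines
        row = dict(zip(header, (normalize_value(f) for f in fields)))
        # Fingerprint the normalized row so duplicates that differ only in
        # whitespace or missing-value spelling are still detected.
        fingerprint = tuple(row.get(name) for name in header)
        if fingerprint in seen:
            continue
        seen.add(fingerprint)
        rows.append(row)
    return rows


def to_csv(rows):
    """Serialize normalized rows as CSV; None is written as an empty field."""
    if not rows:
        return ""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=list(rows[0].keys()), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


if __name__ == "__main__":
    # Hypothetical sample: one missing value and one duplicate record.
    sample = "Case_ID\tGender\tAge\nC1\tFemale\t54\nC2\t N/A \t61\nC1\tFemale\t54\n"
    print(to_csv(prepare_rows(sample)))
```

Normalizing before deduplicating is a deliberate choice in this sketch: rows that differ only in stray whitespace or in how a missing value is spelled collapse to the same fingerprint and are removed together.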