New Notebook Demonstrates Machine Learning in Google BigQuery Using Updated Mitelman Database
A new Mitelman Database Jupyter notebook can help you combine information from multiple databases to run machine learning (ML) experiments using Google BigQuery. In this “Mitelman Gene Fusions in TCGA” notebook, the ISB Cancer Gateway in the Cloud (ISB-CGC) team ran a query to identify the most common gene fusions in prostate adenocarcinoma.
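As a rough illustration of what a "most common gene fusions" query can look like, the Python sketch below builds a BigQuery SQL string that counts rows per fusion and keeps the top few. It is not the notebook's query: the table name `my-project.my_dataset.fusion_cases` and the columns `gene_fusion` and `disease` are hypothetical placeholders, and `@disease` is a named query parameter that would be bound when the query runs.

```python
def top_fusions_sql(table: str, top_n: int = 10) -> str:
    """Build a query that ranks gene fusions by number of reported cases.

    The column names (gene_fusion, disease) are hypothetical placeholders,
    not the actual Mitelman Database schema.
    """
    if "`" in table:
        raise ValueError("table name must not contain backticks")
    if top_n < 1:
        raise ValueError("top_n must be at least 1")
    return (
        "SELECT gene_fusion, COUNT(*) AS n_cases\n"
        f"FROM `{table}`\n"
        "WHERE disease = @disease\n"  # bound at run time as a query parameter
        "GROUP BY gene_fusion\n"
        "ORDER BY n_cases DESC\n"
        f"LIMIT {int(top_n)}"
    )


# Hypothetical table name, used only for illustration.
sql = top_fusions_sql("my-project.my_dataset.fusion_cases", top_n=5)
print(sql)
```

Passing the disease as a parameter rather than formatting it into the string keeps user-supplied values out of the SQL text.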
The team then used this list of gene fusions to analyze gene expression data from similar adenocarcinoma cases available in The Cancer Genome Atlas (TCGA). Using BigQuery's built-in ML capabilities, the team built a random forest classifier on the gene expression data to predict the Primary Gleason Grade of individual cases. The classifier achieved an accuracy of 62%, demonstrating how readily information from multiple databases can be integrated for ML experiments in BigQuery.
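For readers unfamiliar with BigQuery ML, training a random forest classifier is done with a `CREATE MODEL` statement. The Python sketch below only assembles such a statement, plus an `ML.EVALUATE` query that reports metrics such as accuracy. The model name, training table, label column, and feature columns are all hypothetical placeholders; the notebook's actual features and settings may differ.

```python
from __future__ import annotations


def create_rf_model_sql(
    model: str, training_table: str, label_col: str, feature_cols: list[str]
) -> str:
    """Build a BigQuery ML statement that trains a random forest classifier."""
    if not feature_cols:
        raise ValueError("at least one feature column is required")
    # Select the features plus the label; BigQuery ML treats every selected
    # column other than the label as an input feature.
    cols = ", ".join(list(feature_cols) + [label_col])
    return (
        f"CREATE OR REPLACE MODEL `{model}`\n"
        "OPTIONS (\n"
        "  model_type = 'RANDOM_FOREST_CLASSIFIER',\n"
        f"  input_label_cols = ['{label_col}']\n"
        ")\n"
        f"AS SELECT {cols}\n"
        f"FROM `{training_table}`"
    )


def evaluate_model_sql(model: str) -> str:
    """Build a query that returns evaluation metrics for a trained model."""
    return f"SELECT * FROM ML.EVALUATE(MODEL `{model}`)"


# All names below are hypothetical placeholders, not taken from the notebook.
train_sql = create_rf_model_sql(
    model="my-project.my_dataset.gleason_rf",
    training_table="my-project.my_dataset.expression_features",
    label_col="primary_gleason_grade",
    feature_cols=["gene_a_expr", "gene_b_expr"],
)
eval_sql = evaluate_model_sql("my-project.my_dataset.gleason_rf")
print(train_sql)
print(eval_sql)
```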
In addition to this notebook, check out the latest data update to the “Mitelman Database of Chromosome Aberrations and Gene Fusions in Cancer,” released on April 15, 2024. This database, supported in part by NCI and the ISB-CGC Cloud Resource, provides information on cytogenetic changes and their genomic consequences, particularly gene fusions, in relation to tumor characteristics.
The ISB-CGC stores the Mitelman data in Google BigQuery tables, allowing you to access and analyze it programmatically with data science and ML tools. The ISB-CGC team created Jupyter notebooks, written in Python, with examples of how to extract and analyze this backend data.
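A minimal sketch of programmatic access from Python, assuming the `google-cloud-bigquery` package is installed and Google Cloud credentials are configured. The helper names `make_client` and `run_query` and the project ID in the usage comment are hypothetical; the ISB-CGC notebooks may structure this differently.

```python
def make_client(project: str):
    """Create a BigQuery client billed to the given Google Cloud project."""
    # Imported lazily so the rest of this sketch works without the package.
    from google.cloud import bigquery

    return bigquery.Client(project=project)


def run_query(client, sql: str) -> list:
    """Run a SQL query and return the result rows as plain dictionaries."""
    rows = client.query(sql).result()  # blocks until the query job finishes
    return [dict(row.items()) for row in rows]


# Usage (requires credentials; "my-billing-project" is a placeholder):
# client = make_client("my-billing-project")
# for record in run_query(client, "SELECT 1 AS x"):
#     print(record)
```

Taking the client as an argument keeps `run_query` easy to exercise with a stand-in object when no cloud project is available.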