Cancer Data Science Pulse

How the Mitelman Database Can Help You Explore Genomic Abnormalities

A previous blog discussed how one of NCI’s Cloud Resources, the Institute for Systems Biology Cancer Gateway in the Cloud (ISB-CGC), provides researchers with shortcuts to data analysis by making important clinical, genomic, and proteomic data available in Google BigQuery tables.
ISB-CGC is also home to some standalone databases of genomic importance. Today, we highlight one of these—the Mitelman Database of Chromosome Aberrations and Gene Fusions in Cancer and the recent addition of genomic coordinates to its user interface. 

Screenshot of the ISB-CGC Cancer Gateway in the Cloud homepage, showing the Mitelman Database as the third panel in the Data Browser Section, "Chromosomal Aberrations & Gene Fusions DB." Full header text reads, "A Resource of the NCI Cancer Research Data Commons. ISB-CGC Cancer Gateway in the Cloud. Access, Explore and Analyze Large-Scale Cancer Data Through the Google Cloud." Data Browsers. First panel text reads, "BigQuery Table Search." Icon of calendar. "Browse BigQuery tables of metadata and molecular cancer data from the Genomic Data Commons and other sources. Jump directly to a table to perform discovery and computation via SQL." Icon of lightbulb. "Learn." Icon of rocket ship. "Launch." Second panel text reads, "Cancer Data File Browser." Icon of funnel. "Explore a comprehensive selection of cancer related data files in Google Cloud Storage Buckets, such as raw sequencing, cancer nucleotide variation, pathology or radiology images." Icon of lightbulb. "Learn." Icon of rocket ship. "Launch." Third panel text reads, "Chromosomal Aberrations & Gene Fusions DB." Icon of DNA helix. "Browse the Mitelman Database of Chromosome Aberrations and Gene Fusions in Cancer which relates cytogenetic changes, in particular gene fusions, to tumor characteristics." Icon of lightbulb. "Learn." Icon of rocket ship. "Launch." Fourth and final panel text reads, "The TP53 Database." Icon of a gene specimen. "Explore the TP53 Database that compiles various types of data and information from the literature and generalist databases on human TP53 gene variations related to cancer." Icon of lightbulb. "Learn." Icon of rocket ship. "Launch."
The Mitelman Database can be accessed through the ISB-CGC homepage.

The Mitelman Database: A Goldmine of Cytogenetic Data Linked to Cancer

Cytogenetic analysis is the process of examining chromosomes, especially to look for abnormalities such as missing, extra, broken, or rearranged chromosomes. When applied to tumors, cytogenetic analysis can provide crucial information about the genetic mechanisms of cancer.  
The Mitelman Database of Chromosome Aberrations and Gene Fusions in Cancer catalogs over 70,000 such acquired chromosome aberrations for multiple types of cancer. The database relates cytogenetic aberrations and their genomic consequences, in particular gene fusions, to tumor characteristics. The information has served the research community for decades, but it has been available online only since 2000:

  • 1983: Began as a book
  • 2000: NCI made the information available online
  • 2019: ISB-CGC began hosting and supporting the database on its cloud platform 

Dr. Felix Mitelman, in collaboration with Drs. Bertil Johansson and Fredrik Mertens, manually culled all the data from the literature. NCI, the Swedish Cancer Society, and the Swedish Childhood Cancer Foundation support the Mitelman Database. These organizations update the database quarterly, in January, April, July, and October.
You can query the database by parameters such as topography, morphology, gene characteristics, cytogenetic aberrations, and journal references. 

Dashboard showing the search function for finding cases of cytogenetics. Menu shows search options, including Cases Cytogenetics, Gene Functions, Clinical Associations, Recurrent Chromosome Alterations, References, User Guide, About, Contact. Advanced search options include Sole abnormality, Breakpoint, Topography, Morphology, and Special Morphology.
The Mitelman Database Cases Cytogenetics Searcher

Adding Genomic Coordinates Increases Data Collaboration Opportunities

Until recently, the database displayed genetic location information only as karyotypes, which tell where a gene is by referencing its physical location on a band of the arm of a human chromosome.
Nowadays, much genomic research data describes gene locations in molecular terms instead of physical location; that is, by using precise nucleotide start and stop positions on the chromosome. Though the karyotype data in the Mitelman Database has been very useful, mapping it to molecular genomic coordinates increases the range of research data that it can be combined with to make significant scientific discoveries. 
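To make the band-based notation concrete, here is a minimal Python sketch that splits a simple band designation such as 9q34.1 into its parts: chromosome, arm (p for the short arm, q for the long arm), region, band, and optional sub-band. The names `BandLocation` and `parse_band` are hypothetical and introduced only for illustration. This is not how the Mitelman Database or CytoConverter parse karyotypes, and it handles only single band designations, not full karyotype strings.

```python
import re
from typing import NamedTuple, Optional


class BandLocation(NamedTuple):
    """Parts of a simple cytogenetic band designation (hypothetical helper type)."""
    chromosome: str          # "1" to "22", "X", or "Y"
    arm: str                 # "p" (short arm) or "q" (long arm)
    region: int              # first digit after the arm letter
    band: int                # second digit after the arm letter
    sub_band: Optional[str]  # digits after the decimal point, if present


# Chromosome, arm letter, one region digit, one band digit, optional ".sub-band".
_BAND_PATTERN = re.compile(
    r"^([1-9]|1[0-9]|2[0-2]|X|Y)([pq])(\d)(\d)(?:\.(\d+))?$"
)


def parse_band(text: str) -> BandLocation:
    """Split a designation like '9q34.1' into its components."""
    match = _BAND_PATTERN.match(text.strip())
    if match is None:
        raise ValueError(f"not a simple band designation: {text!r}")
    chromosome, arm, region, band, sub_band = match.groups()
    return BandLocation(chromosome, arm, int(region), int(band), sub_band)


if __name__ == "__main__":
    for designation in ("9q34.1", "22q11", "Xp21"):
        print(designation, "->", parse_band(designation))
```

A band designation names a visible region of a chromosome, whereas a genomic coordinate gives nucleotide start and stop positions, so converting between the two requires a reference table of band boundaries that this sketch deliberately leaves out.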
As of June 2022, the Mitelman Database also displays genomic coordinates on the user interface. Thanks to procedures incorporated from the web-based tool CytoConverter, you can view the genomic coordinates translated from the chromosomal imbalances identified in the karyotype nomenclature.
You can view the genomic coordinate information for an individual karyotype or for multiple karyotypes in a search result. The view provides the following information:

  • Corresponding chromosome
  • Start and end position
  • Type of imbalance (i.e., gain or loss)

For multiple karyotypes, net imbalances across the selected group are displayed in chart, ideogram, or tabular format.
Screenshot of the overall chromosomal imbalances for chromosome 1. The view is broken into three tabs showing charts, ideograms, and data. Two charts on the left side plot the frequency of 1 and >1 extra copies at different positions. The chart on the right plots the frequency of Loss of 1 copy and Homozygous deletions at different positions.
An example of the Mitelman Database View Overall Chromosomal Imbalances screen. The abnormalities of the chromosomes and their genomic coordinates have been calculated by CytoConverter.
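As a rough illustration of how imbalances from several cases could be summarized along a chromosome, the Python sketch below counts, for each fixed-size bin, how many distinct cases show a gain or a loss. The record layout (`Imbalance`), the function name `imbalance_frequency`, and the sample intervals are all hypothetical values made up for this example. They are not taken from the Mitelman Database, and this is not the calculation the database or CytoConverter performs.

```python
from typing import Dict, Iterable, NamedTuple, Set


class Imbalance(NamedTuple):
    """One gained or lost segment in one case (hypothetical record layout)."""
    case_id: str
    chromosome: str
    start: int  # 1-based, inclusive
    end: int    # inclusive
    kind: str   # "gain" or "loss"


def imbalance_frequency(
    records: Iterable[Imbalance], chromosome: str, bin_size: int
) -> Dict[str, Dict[int, int]]:
    """Count, per bin index, the distinct cases with a gain or a loss."""
    cases_per_bin: Dict[str, Dict[int, Set[str]]] = {"gain": {}, "loss": {}}
    for rec in records:
        if rec.chromosome != chromosome:
            continue
        if rec.kind not in cases_per_bin:
            raise ValueError(f"unknown imbalance type: {rec.kind!r}")
        first_bin = (rec.start - 1) // bin_size
        last_bin = (rec.end - 1) // bin_size
        for index in range(first_bin, last_bin + 1):
            cases_per_bin[rec.kind].setdefault(index, set()).add(rec.case_id)
    return {
        kind: {index: len(cases) for index, cases in sorted(bins.items())}
        for kind, bins in cases_per_bin.items()
    }


if __name__ == "__main__":
    # Made-up intervals, for illustration only.
    toy_records = [
        Imbalance("case-A", "chr1", 1, 2_000_000, "gain"),
        Imbalance("case-B", "chr1", 1_500_000, 3_000_000, "gain"),
        Imbalance("case-C", "chr1", 2_500_000, 2_900_000, "loss"),
        Imbalance("case-C", "chr2", 1, 1_000_000, "loss"),
    ]
    print(imbalance_frequency(toy_records, "chr1", bin_size=1_000_000))
```

Counting distinct cases per bin, rather than segments, keeps a case with several overlapping segments from being counted more than once at the same position.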

Mitelman Data on the Cloud

Like the rest of the data that ISB-CGC hosts, the Mitelman data, including the CytoConverter-generated genomic coordinates, are publicly available in Google BigQuery, a cloud-based data warehouse that stores data in tables. Researchers can therefore analyze the data using Structured Query Language (SQL) and tools such as Python and R, and combine them with other data sets. The ISB-CGC team has provided examples on GitHub, which researchers can use as templates for their own data exploration.
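For readers who want a starting point, here is a minimal Python sketch of what such a query might look like with the `google-cloud-bigquery` client library. The table name `mitelman-db.prod.CytoConverted` and the column names `Chr` and `Type` are assumptions made for illustration, so confirm the actual names in the ISB-CGC BigQuery Table Search before running anything. The helper names `build_imbalance_query` and `run_imbalance_query` are hypothetical, and running the query requires your own Google Cloud project and credentials.

```python
from typing import Any, Dict, List

# ASSUMPTION: this table name and the column names used below are
# illustrative guesses; verify them in the ISB-CGC BigQuery Table Search.
ASSUMED_TABLE = "mitelman-db.prod.CytoConverted"


def build_imbalance_query(table: str = ASSUMED_TABLE) -> str:
    """Build SQL that counts records per imbalance type on one chromosome."""
    return (
        "SELECT `Type` AS imbalance_type, COUNT(*) AS n_records\n"
        f"FROM `{table}`\n"
        "WHERE `Chr` = @chromosome\n"
        "GROUP BY imbalance_type\n"
        "ORDER BY n_records DESC"
    )


def run_imbalance_query(project_id: str, chromosome: str) -> List[Dict[str, Any]]:
    """Run the query, billing it to the caller's own Google Cloud project."""
    # Imported here so the query builder works without the client installed.
    from google.cloud import bigquery

    client = bigquery.Client(project=project_id)
    job_config = bigquery.QueryJobConfig(
        query_parameters=[
            # The chromosome value is passed as a query parameter, not
            # pasted into the SQL text.
            bigquery.ScalarQueryParameter("chromosome", "STRING", chromosome)
        ]
    )
    rows = client.query(build_imbalance_query(), job_config=job_config).result()
    return [dict(row.items()) for row in rows]


if __name__ == "__main__":
    # Printing the SQL needs no credentials; running it does.
    print(build_imbalance_query())
```

Keeping the SQL builder separate from the function that calls BigQuery lets you inspect or adapt the query text before spending any query quota.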

Deena Bleich
Bioinformatician, Institute for Systems Biology Cancer Gateway in the Cloud