Cancer Data Science Pulse
Performing a CIViC Duty—A Community-Driven Resource for Interpreting Data on Cancer Variants
Perhaps you’re working with a new genetic variant and want to see which drugs are proving most effective and under what circumstances. Or maybe you’re seeking information on how a variant can be used in diagnosis or in predicting outcomes.
Staying on top of mounds of cancer research findings is an ongoing challenge. Interpreting those results within the context of the larger body of research findings is paramount for advancing scientific knowledge.
NCI’s Informatics Technology for Cancer Research supports tools to help address these challenges. One of these tools, CIViC, or “Clinical Interpretation of Variants in Cancer,” is a fully accessible and free resource on cancer-related genetic variants. CIViC offers an open-source knowledgebase and web interface to help researchers connect to the latest published findings on a full range of variant interpretations. You can view an introductory video on CIViC and learn more by visiting the website.
Below, Dr. Obi Griffith of Washington University School of Medicine, a co-investigator on the CIViC tool, gives a brief summary of how this unique resource works and what it means for cancer research.
Can you briefly describe what CIViC is and what it does?
The CIViC application has two primary functions. First, it serves as a dynamic interactive website where expert curators, moderators, and editors capture evidence from the biomedical literature on the clinical relevance of a wide range of cancer variants. These CIViC experts review individual papers to find both structured and unstructured data. They curate and document the findings on individual cancer variants (e.g., whether they are predictive, prognostic, diagnostic, or predisposing). Next, the findings are synthesized into gene summaries, variant summaries, and variant assertions. The underlying evidence represents the “bottom line” findings. Evidence is rated according to type (e.g., preclinical, clinical) as well as strength (with a single star representing the weakest evidence and five stars the strongest support).
Second, CIViC serves as a knowledgebase for disseminating these curated assertions and summaries. Users can access the evidence, along with expert interpretations of the data, through the web interface.
As of today, CIViC has captured literature on nearly 500 genes and more than 3,000 variants, drawn from 9,000 lines of evidence.
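The evidence model described above (a relevance category, a study type, and a one-to-five star rating) can be pictured with a small Python sketch. This is only an illustration: the class names, field names, and placeholder gene and variant labels are assumptions made for the example and are not CIViC's actual schema.

```python
from dataclasses import dataclass
from enum import Enum


class ClinicalRelevance(Enum):
    """Relevance categories named in the interview."""
    PREDICTIVE = "predictive"
    PROGNOSTIC = "prognostic"
    DIAGNOSTIC = "diagnostic"
    PREDISPOSING = "predisposing"


class StudyType(Enum):
    """Kinds of supporting study mentioned in the interview."""
    PRECLINICAL = "preclinical"
    CLINICAL = "clinical"


@dataclass(frozen=True)
class EvidenceItem:
    """Hypothetical record for one curated finding (not CIViC's real schema)."""
    gene: str
    variant: str
    relevance: ClinicalRelevance
    study_type: StudyType
    stars: int  # 1 = weakest support, 5 = strongest

    def __post_init__(self):
        # Reject ratings outside the one-to-five star scale.
        if not 1 <= self.stars <= 5:
            raise ValueError("stars must be between 1 and 5")


def strongest_first(items):
    """Order evidence so the best-supported findings come first."""
    return sorted(items, key=lambda item: item.stars, reverse=True)


# Placeholder gene and variant labels, invented for illustration.
items = [
    EvidenceItem("GENE1", "V1", ClinicalRelevance.PREDICTIVE, StudyType.PRECLINICAL, 2),
    EvidenceItem("GENE1", "V1", ClinicalRelevance.PROGNOSTIC, StudyType.CLINICAL, 5),
]
ranked = strongest_first(items)
```

Keeping the rating as a validated field means a downstream report can sort or filter on evidence strength without re-reading the source papers.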
How is CIViC different from other applications? What does it offer that other applications don’t?
The major distinguishing feature of CIViC is our strong commitment to open-access data and open-source software. The CIViC source code is available under the MIT License (named for the Massachusetts Institute of Technology). This allows developers to reuse or adapt the software for both academic and commercial purposes with virtually no restriction.
Similarly, the expertly curated knowledge is available to the public completely free of charge under a Creative Commons public domain dedication (CC0). This public domain dedication is essential for putting data directly into the hands of scientists, without the need to seek additional rights and permissions.
In addition, because of its open data model, CIViC can be integrated into existing clinical and research workflows. The variant and gene summaries, and supporting evidence, can be automatically incorporated into other resources and applications, such as clinical reports, using the Application Programming Interface (API) or by downloading bulk data releases.
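As a rough illustration of the bulk-download route, the sketch below reads a tab-separated release and attaches curated summaries to the variants observed in a sample. The column names (`gene`, `variant`, `summary`), the sample rows, and the helper functions are all hypothetical; a real CIViC release will have its own layout, so treat this as a pattern rather than a working integration.

```python
import csv
import io

# Stand-in for a downloaded bulk release; column names and rows are
# invented for illustration and are not taken from a real CIViC file.
SAMPLE_RELEASE = (
    "gene\tvariant\tsummary\n"
    "GENE1\tV1\tPlaceholder summary for GENE1 V1.\n"
    "GENE2\tV2\tPlaceholder summary for GENE2 V2.\n"
)


def load_summaries(tsv_text):
    """Index variant summaries by (gene, variant) for quick lookup."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return {(row["gene"], row["variant"]): row["summary"] for row in reader}


def annotate(observed, summaries):
    """Attach a curated summary to each observed variant, if one exists."""
    return [
        {
            "gene": gene,
            "variant": variant,
            "summary": summaries.get((gene, variant), "no curated summary found"),
        }
        for gene, variant in observed
    ]


summaries = load_summaries(SAMPLE_RELEASE)
report = annotate([("GENE1", "V1"), ("GENE3", "V9")], summaries)
```

Variants without a curated entry are kept in the report with an explicit note, so a missing interpretation is visible rather than silently dropped.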
CIViC also features an expert-moderated, crowdsourced curation model. Any member of the research community can make suggestions or submissions to the knowledgebase. Expert editors then assess these submissions and comment on their relevance. We feel this free, open, and public approach helps democratize variant interpretation.
How did you become interested in this topic?
I was fortunate to be involved in two of the earliest real-world applications of high-throughput sequencing in precision oncology. In both cases, patients with very advanced cancers had comprehensive (whole genome and whole transcriptome) sequencing of their tumors performed in a research setting. After extensive bioinformatics analysis and interpretation, we identified potential genetic pathways that could be targeted for treatment. Both patients were treated with experimental therapies under compassionate-use circumstances, and both saw significant responses.
These responses were anecdotal, but they clearly illustrated the potential of genome-guided precision oncology. This made me wonder if we could extend these early successes to help more patients.
A major bottleneck was at the variant interpretation stage. Variants are continuously and redundantly re-interpreted by the community in hundreds of siloed private databases. We created CIViC to serve as a public knowledgebase to respond to that problem.
Were there any surprises that you encountered in developing CIViC?
We have been surprised by how rapidly contributions from the public have outpaced our own. It was remarkable how quickly this idea gained traction. Within a few years, the number of evidence statements submitted by external curators passed our own team’s submissions. Now, nearly all submissions come from the community, and our team of editors is primarily focused on moderating them.
Were there particular challenges that you had to overcome?
The needs of the field are continuously changing. We’ve regularly updated and extended our data model and interface to accommodate new complexities, standards, and guidelines related to variant interpretation.
For example, one problem we identified is the lack of standards in data collection, particularly in variant, disease, or drug naming. There’s overlap in how these are classified in different knowledgebases, and even within the same platforms. This leads to confusion because in many cases we’re talking about the same thing but referring to it by different names. In CIViC we’re standardizing these terms by adopting structured ontologies, such as the Sequence Ontology, Disease Ontology, and the NCI Thesaurus. This makes it easier to identify all the relevant information for each variant, disease, or drug.
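The naming problem can be made concrete with a minimal sketch of term normalization: several free-text names are mapped to one canonical identifier so that records about the same concept can be grouped. The synonym table and the `EXAMPLE:` identifiers are placeholders invented for this illustration; they are not real Disease Ontology or NCI Thesaurus entries, and this is not how CIViC itself implements the mapping.

```python
# Hypothetical synonym table: several free-text names map to one
# canonical ontology-style identifier. The IDs are placeholders, not
# real Disease Ontology or NCI Thesaurus entries.
SYNONYMS = {
    "non-small cell lung cancer": "EXAMPLE:0001",
    "nsclc": "EXAMPLE:0001",
    "non small cell lung carcinoma": "EXAMPLE:0001",
    "melanoma": "EXAMPLE:0002",
}


def normalize(term):
    """Return the canonical ID for a free-text term, or None if unknown."""
    key = " ".join(term.lower().split())  # case-fold, collapse whitespace
    return SYNONYMS.get(key)


def group_by_concept(terms):
    """Group raw terms that refer to the same underlying concept."""
    groups = {}
    for term in terms:
        groups.setdefault(normalize(term), []).append(term)
    return groups


groups = group_by_concept(
    ["NSCLC", "Non-small cell  lung cancer", "Melanoma", "unknown thing"]
)
```

Terms that fail to match are collected under `None`, which gives curators a ready-made list of names that still need to be mapped.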
Other challenges are more people-focused. Similar to peer review in the publication field, we rely on a pool of skilled experts who serve as volunteer editors. Each CIViC entry requires agreement between at least two of these independent editors before it qualifies for acceptance.
Such expert contributors are what make CIViC so valuable. In the beginning, we did much of the curation and assessment internally. Now, with greater outreach, we’re expanding our volunteer workforce. It can be difficult to find, recruit, and engage an interdisciplinary, international team of experts. However, we’re always seeking new ways to foster a greater sense of community, including collaborative publications, hackathons, and badges and leaderboard recognition for people who go above and beyond in their CIViC contributions.
The data also present challenges. To be most relevant, we need studies based on data that reflect the full diversity of the general population. For example, we know that certain variants are unique to different subgroups in our population. We’d like to use CIViC to report on those differences and to ensure that these variants are accurately captured for all members of the general public. This is another area where we are seeing some initial growth, but more needs to be done.
Where do you see this application headed in the next 5–10 years?
I hope that the knowledgebase will continue to grow and that more people (consumers and contributors alike) will see it as a one-stop shop for cancer variant interpretation. CIViC has the potential to truly democratize access to variant interpretation knowledge. We believe that our strong commitment to openness and transparency will lead to widespread adoption. By helping to support wide access to genome-guided precision oncology, we can make a difference in the lives of cancer patients around the world.
We’d also like to better address disparities in the adoption of CIViC and its integration into workflows across the research spectrum—from academic centers to community hospitals. Currently, CIViC is well accepted at major cancer research universities, but in the spirit of democratized data, we’d like to see this resource extend well beyond these centers to smaller hospitals and rural locations.
And finally, we’d like to be able to automate the process more. We’re currently using automation to help with general housekeeping tasks, such as flagging spelling errors and inconsistencies that can be auto-fixed. We’d like to be able to extend this beyond general housekeeping, eventually using machine learning to self-populate some fields and to call attention to problematic entries that might need further review by humans.
What’s been the response from the field? Are people using CIViC?
We have seen a tremendous response from the field. Each month, CIViC receives thousands of web visitors from around the world and millions of API requests. It has been integrated into dozens of other resources and clinical workflows in both academia and industry.
In addition, CIViC is now the primary database and curation interface for the Clinical Genome Resource (ClinGen) for use in interpreting somatic variants (i.e., variants that appear during a person’s lifetime due to exposure to environmental factors or through errors in cell division). Through this partnership, CIViC is able to offer an interface for curation of structured data, link to formal ontologies, and map to additional databases—further strengthening the information available on somatic variants.
What satisfies you the most about this work? What makes you the most proud?
The community contributions are definitely the most satisfying part of this work. Hundreds of scientists around the world have donated their time and expertise to help build a knowledgebase that helps everyone. This project truly reflects the strengths of an open-source and open-access approach to science—that definitely makes me most proud.
ITCR supports informatics tools, like CIViC, to address challenges related to data-driven research. Currently, ITCR has aided in the development or refinement of more than 100 open-access informatics tools. Other recent blogs featuring ITCR tools include NCI’s ITCR Training Network Puts Cancer Research Tools and Training at Your Fingertips, Wrangling Data for Microbiome Research—Focus on QIIME 2, and Using Bioinformatics to Solve the Neoantigen Puzzle. Learn more about ITCR funding.