Vocabulary for Cancer Research
Controlled terminologies and ontologies provide the underlying foundation for data integration, sharing and re-use, and knowledge management. This common vocabulary is an essential part of making data consistent and interoperable. These efforts promote harmonization and shared standards across NCI's informatics infrastructure. NCI manages and distributes cancer vocabulary through NCI Enterprise Vocabulary Services (EVS). The EVS team creates and maintains the vocabulary, made up of several different terminologies and ontologies that are organized and connected in different ways. This process involves working with many partners to develop, license, and publish terminology and to jointly develop software tools.
New terms are added and analysis tools are created in response to the needs of the research community.
We are collecting feedback on the content or structure of existing items and contributions for new terms to be used by NCI, EVS, and partner terminology products.
Terminologies and Ontologies
Consistent vocabulary, in the form of individual pieces of terminology or mapped concept ontologies, is the core of EVS.
NCI EVS provides services and resources, including the NCI Thesaurus and NCI Metathesaurus, that facilitate the use and standardization of terminology across the Institute and the larger biomedical community.
NCI’s core reference terminology and biomedical ontology are collected in the NCI Thesaurus (NCIt). Published monthly by NCI, the NCIt is used in a growing number of NCI and other systems. It covers more than 150,000 concepts used in the coding of clinical care, translational and basic research, and public information and administrative activities. The NCIt provides 120,000 textual definitions, synonyms, and more than 400,000 inter-concept relationships. Its content covers more than 40,000 cancers and related diseases; 25,000 single agents and related substances; combination therapies; a Federal Consolidated Health Informatics (CHI) standard anatomy section; and a wide range of other topics related to cancer and biomedical research. NCIt is a broadly shared coding and semantic infrastructure resource: more than half of NCIt concepts include content explicitly tagged by one or more EVS partners.
NCIt is available for download in a variety of formats. For information on third-party access, visit EVS Web Downloads.
Other terminology sets and standards also are available, with frequent additions. Four commonly used methods are given below, along with information on specific licenses and terms for usage.
NCIt archived material can be found in the NCIt Archive.
The NCI Metathesaurus (NCIm) is a comprehensive biomedical terminology database that provides a broad, concept-based mapping of terms from over 101 biomedical terminologies, with 7,500,000 terms mapped to 3,200,000 concepts representing their shared meanings. NCIm contains most terminologies used by NCI for clinical care, translational and basic research, and public information and administrative activities, including most public domain terminologies from the National Library of Medicine's UMLS Metathesaurus, as well as a growing number of other cancer-related and biomedical terminologies. NCIm is updated twice yearly.
EVS currently maintains more than 22 standalone terminologies and ontologies of special interest to NCI and the research community. EVS has helped create and harmonize several of these terminologies, which originate from other agencies and standards development organizations such as the U.S. Food and Drug Administration and the Clinical Data Interchange Standards Consortium.
Terminology Value Sets
A value set is a pre-curated, standard set of meanings that can be used for biomedical coding. EVS and its partners have developed and now maintain more than 1,000 user-specific and generalized value sets.
Several of the key value sets are:
The Clinical Data Interchange Standards Consortium is an international non-profit organization that develops and supports global data standards for medical research.
EVS works with the U.S. Food and Drug Administration to develop and support controlled terminology in several areas. More than 15,000 FDA terms and codes are stored in NCIt and tagged and published as value sets.
The NCIt Neoplasm Core value set provides a core reference set of NCIt neoplasm classification concepts that are designed to facilitate consistent coding, analysis, and data sharing across a broad range of NCI and related resources. These files provide a comprehensive collection of key terms, definitions, simplified hierarchies, mappings to dozens of other terminologies, and molecular characteristics, all linked to online EVS resources.
EVS provides semantic support to NCI’s Clinical Trials Reporting Program to enable consistent and accurate clinical trial abstraction, coding, reporting, and portfolio management for internal NCI usage and effective clinical trial search for patients and providers. EVS develops, maintains, and publishes more than 20,000 concepts covering diseases, drugs, interventions, biomarkers, and demographics.
The National Council for Prescription Drug Programs creates and promotes the transfer of data related to medications, supplies, and services through the development of standards and industry guidance. It uses NCIt in two of its standards.
The National Institute of Child Health and Human Development and EVS have worked with numerous contributors from national and international academic, clinical, and research institutions to provide standardized terminology for coding pediatric clinical trials and other research activities. This terminology is included in NCIt and tagged as value sets for viewing or download.
Terminology Mappings are curated, paired mappings between several supported terminologies to support data translation and cross-referencing. Mappings continue to be created based on the interests and needs of the research communities.
Terminology Mappings can be searched, browsed, and downloaded through the Mappings tab in the NCI Term Browser.
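As a rough illustration of how paired mappings support data translation, the sketch below indexes a small list of mapping pairs and looks up the target codes for a given source code. The terminology names and codes are placeholders invented for this example, not real EVS mapping data, and the function names are hypothetical.

```python
from collections import defaultdict
from typing import NamedTuple


class Mapping(NamedTuple):
    """One curated pair linking a code in a source terminology to a target code."""
    source_terminology: str
    source_code: str
    target_terminology: str
    target_code: str


# Placeholder entries for illustration only; these are not real EVS mappings.
MAPPINGS = [
    Mapping("SRC_A", "A001", "TGT_B", "B100"),
    Mapping("SRC_A", "A002", "TGT_B", "B200"),
    Mapping("SRC_A", "A002", "TGT_B", "B201"),  # a source code may map to several targets
]


def build_index(mappings):
    """Group target codes by (source terminology, source code, target terminology)."""
    index = defaultdict(list)
    for m in mappings:
        key = (m.source_terminology, m.source_code, m.target_terminology)
        index[key].append(m.target_code)
    return dict(index)


def translate(index, source_terminology, source_code, target_terminology):
    """Return all target codes mapped from the source code, or an empty list."""
    return index.get((source_terminology, source_code, target_terminology), [])


if __name__ == "__main__":
    idx = build_index(MAPPINGS)
    print(translate(idx, "SRC_A", "A002", "TGT_B"))
```

Returning a list rather than a single code reflects the assumption that a mapping need not be one-to-one; an unmapped code yields an empty list instead of an error.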
Accessing NCI Terminology Content
NCI Term Browser
The NCI Term Browser publishes all terminologies hosted by EVS in an integrated environment, providing search support, cross-links, and a user-friendly interface. The NCI Term Browser provides access to the International Classification of Diseases, Ninth Revision, Clinical Modification (ICD-9-CM); the International Classification of Diseases, Tenth Revision, Clinical Modification (ICD-10-CM); the Common Terminology Criteria for Adverse Events (CTCAE); the Medical Dictionary for Regulatory Activities (MedDRA); the Systematized Nomenclature of Medicine Clinical Terms (SNOMED-CT); the National Drug File Reference Terminology (NDF-RT); the Gene Ontology (GO); and many other terminologies and ontologies used by NCI and its partners. Cross-terminology mappings and more than 1,100 terminology value set coding standards are also provided.
Back-end Services and Open Source Software
LexEVS is the EVS terminology server. It comprises a collection of software and services for loading, publishing, and providing access to vocabulary and ontology resources. The Mayo Clinic developed LexEVS as an open-source tool with NCI support. Many NCI and external applications, including the Cancer Data Standards Registry and Repository, use the server's application programming interfaces (APIs).
LexEVS Downloads contains the source code, web services, Java client .jar files, programming interfaces, and documentation needed to use LexEVS services.
The EVS REST API (EVSRESTAPI) is an API offered by EVS that provides access to a terminology server backed by a native triple store with Elasticsearch indexes. It allows searches that capture the complete logical semantics of underlying terminologies.
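A minimal sketch of how a client might build a concept search request against a REST terminology service of this kind. The base URL, the `/concept/search` path, and the `terminology`, `term`, and `pageSize` parameter names are assumptions made for illustration; check the EVS documentation for the actual endpoints before relying on them. The function only constructs the URL and makes no network call.

```python
from urllib.parse import urlencode

# Assumed base URL for illustration; confirm against the EVS documentation.
BASE_URL = "https://api-evsrest.nci.nih.gov/api/v1"


def concept_search_url(term, terminology="ncit", page_size=10):
    """Build a search URL for concepts matching `term` (hypothetical helper)."""
    query = urlencode(
        {
            "terminology": terminology,  # assumed parameter name
            "term": term,                # assumed parameter name
            "pageSize": page_size,       # assumed parameter name
        }
    )
    return f"{BASE_URL}/concept/search?{query}"


if __name__ == "__main__":
    print(concept_search_url("lung cancer"))
```

Using `urlencode` keeps search terms that contain spaces or punctuation safely escaped. The resulting URL could then be passed to any HTTP client.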
NCI Protégé is the primary EVS editing software application. It is based on Stanford University's open-source Protégé tool. NCI developed Protégé plug-ins to meet EVS requirements and business rules, then contributed the code back to the community to further foster Protégé adoption.
For a full list of tools and downloads, visit the EVS website.
The EVS Wiki provides a greater level of technical detail about terminology tools and resources.