Metadata Services for Cancer Data
NCI fosters shared metadata standards for all cancer data, linking semantic meaning to data values. These standards promote sharing, reuse, and aggregation of cancer data among repositories.
To assist with this interoperability, NCI established the cancer Data Standards Registry and Repository (caDSR). The caDSR and its associated applications help the oncology research community manage and use data standards by providing the shared standards in various human and machine-readable contexts. The Metadata Content Development Team supports the oncology research community in developing and maintaining harmonized, standardized metadata for oncology research.
Common Data Elements
Common Data Elements (CDEs) use terminology as the foundation, precisely binding a complex research question and response set together, consistently conveying machine-readable meaning. Utilizing CDEs in a common, standard way promotes structured data collection, facilitating data sharing and aggregation of larger data sets for analysis and validation. NCI primarily derives its CDE registry from data collection forms, templates, and data dictionaries produced in NCI clinical trials.
cancer Data Standards Registry and Repository (caDSR)
caDSR is NCI’s registry and repository for oncology research CDEs and forms. caDSR has a database, application programming interfaces (APIs), and web-based tools for creating and using standards for cancer metadata. caDSR uses the international ISO/IEC 11179 Standard for Metadata Registries to represent metadata in the database. The International Organization for Standardization (ISO) and the International Electrotechnical Commission (IEC) developed the model, which defines fields and relationships for metadata registries so that registries across the world can exchange metadata with each other. caDSR uses Controlled Terminology (CT) provided by NCI Enterprise Vocabulary Services (EVS) from NCI Thesaurus (NCIt) to organize individual CDEs into collections, making them easier to find and use in logical ways. The collections can be specific (e.g., a group of CDEs related to pediatric cancer genomics) or general (e.g., those relating to diagnoses or therapies).
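The ISO/IEC 11179 model pairs a data element concept (an object class plus a property) with a value domain that lists the permissible values. The following is a minimal Python sketch of that structure, not the caDSR schema itself. The class names, field names, public ID, and concept codes are all hypothetical placeholders chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class PermissibleValue:
    value: str         # the stored code or text
    meaning: str       # human-readable value meaning
    concept_code: str  # terminology concept code (placeholder, not a real NCIt code)


@dataclass(frozen=True)
class DataElementConcept:
    object_class: str   # the thing being described, e.g. a tumor
    property_name: str  # the characteristic being recorded, e.g. laterality


@dataclass
class ValueDomain:
    name: str
    datatype: str
    permissible_values: list[PermissibleValue] = field(default_factory=list)

    def is_valid(self, value: str) -> bool:
        # A non-enumerated domain (no permissible values) accepts any value.
        if not self.permissible_values:
            return True
        return any(pv.value == value for pv in self.permissible_values)


@dataclass
class DataElement:
    public_id: str  # hypothetical identifier
    version: str
    concept: DataElementConcept
    value_domain: ValueDomain
    question_text: str

    def validate(self, response: str) -> bool:
        # A response is acceptable only if the value domain allows it.
        return self.value_domain.is_valid(response)


laterality = DataElement(
    public_id="EX-0001",
    version="1.0",
    concept=DataElementConcept(object_class="Tumor", property_name="Laterality"),
    value_domain=ValueDomain(
        name="Laterality Type",
        datatype="CHARACTER",
        permissible_values=[
            PermissibleValue("Left", "Left side", "C-EXAMPLE-1"),
            PermissibleValue("Right", "Right side", "C-EXAMPLE-2"),
        ],
    ),
    question_text="Tumor laterality",
)

print(laterality.validate("Left"))      # True
print(laterality.validate("Sideways"))  # False
```

Keeping the question (the concept) separate from the response set (the value domain) is what lets the same value domain be reused across many data elements.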
The caDSR community of trained and mentored curators registers the use of CDEs in caDSR and promotes standardization and reuse of registered data elements across the community. Registering the use of CDEs enables consistent practices and structures electronic data collection so that data coming out of trials is structured for analysis, making it possible to aggregate like-data across data sets. CDEs can be browsed, searched, and exported with the CDE Browser. CDEs are developed by trained, expert curators using the caDSR Curation Tool.
Forms and Templates
CDEs can be bound together in caDSR to build a data collection form or template, which is then used to collect data for cancer protocols in clinical trials. Research protocols use many Case Report Forms (CRFs) to collect the data researchers are studying. These CRFs and Data Collection Templates (DCTs) are built within caDSR Form Builder and exported as spreadsheets (.xlsx or .csv) or as XML. caDSR forms include CRFs, Patient-Reported Outcome measures, Eligibility Criteria, DCTs, and other research instruments. caDSR forms are built using collections of CDEs that draw on both NCI internal data standards and externally controlled data standards. As with CDEs, forms are stored in caDSR Form Builder for reuse by the community.
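As a rough illustration of a form as an ordered collection of CDE references that can be serialized to CSV or XML, consider the Python sketch below. It does not reproduce the actual Form Builder export layout. The column names, XML element and attribute names, and CDE identifiers are assumptions made for this example.

```python
import csv
import io
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass(frozen=True)
class FormQuestion:
    cde_public_id: str  # hypothetical CDE identifier
    cde_version: str
    question_text: str


def form_to_csv(questions):
    """Serialize the questions to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["cde_public_id", "cde_version", "question_text"])
    for q in questions:
        writer.writerow([q.cde_public_id, q.cde_version, q.question_text])
    return buffer.getvalue()


def form_to_xml(form_name, questions):
    """Serialize the questions to an XML string (element names are illustrative)."""
    root = ET.Element("form", name=form_name)
    for q in questions:
        node = ET.SubElement(
            root,
            "question",
            cdePublicId=q.cde_public_id,
            cdeVersion=q.cde_version,
        )
        node.text = q.question_text
    return ET.tostring(root, encoding="unicode")


questions = [
    FormQuestion("EX-0001", "1.0", "Tumor laterality"),
    FormQuestion("EX-0002", "1.0", "Date of diagnosis"),
]

print(form_to_csv(questions))
print(form_to_xml("Demo Enrollment Form", questions))
```

Because each question carries only a reference to a registered CDE, two forms that cite the same identifier and version collect directly comparable data.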
NCI Data Collection Standard CDEs and Template Forms
NCI data collection standard CDEs and template forms were created in response to the Clinical Trials Working Group’s (CTWG’s) recommendations to improve information sharing among cancer researchers and optimize data requirements in collaboration with the U.S. Food and Drug Administration (FDA). The national community of oncology research stakeholders participated in developing standards that represent the minimum processes required to conduct cancer research. The NCI data collection standard CDEs and template forms are now embedded in all NCI clinical trials to improve the efficiency and accuracy of the routine review of safety, efficacy, and administrative data from ongoing NCI-funded clinical trials. This NCI standard core library allows faster initiation of new trials by reducing the time spent developing a data collection strategy per trial, improving safety and delivery speed of new and improved oncology treatments to cancer patients.
In 2017, the FDA announced national guidance that required all Investigational New Drug (IND) trial data to be submitted using the Clinical Data Interchange Standards Consortium (CDISC) reporting standard: the Study Data Tabulation Model (SDTM). NCI engaged in a complex 5-year harmonization effort that included adopting the CDISC Clinical Data Acquisition Standards Harmonization (CDASH) model for all trials to ease the institutional burden of transformation to CDISC SDTM when submitting trial data to the FDA. As part of that effort, the NCI data collection standards and template forms were aligned with the CDISC CDASH and SDTM models. The primary focus of this activity was not to change the existing NCI data collection standard CDEs and template forms but to create a second version aligned with the CDISC standards, so that study builders can easily map the NCI standard CDEs to CDISC variables for FDA submission of IND trial data sets in SDTM format.
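The mapping step described above can be pictured as a lookup from a CDE identifier to its CDASH collection variable and SDTM submission variable. The Python sketch below is illustrative only. The CDE identifiers are hypothetical, and the pairing with the adverse event variables AETERM, AESTDAT, and AESTDTC is an assumption for this example rather than an actual NCI mapping.

```python
from typing import NamedTuple


class CdiscMapping(NamedTuple):
    cdash_variable: str  # variable name used at data collection
    sdtm_domain: str     # SDTM domain the value is submitted in
    sdtm_variable: str   # variable name used in the submission data set


# Hypothetical CDE identifiers mapped to adverse event variables.
CDE_TO_CDISC = {
    "EX-1001": CdiscMapping("AETERM", "AE", "AETERM"),
    "EX-1002": CdiscMapping("AESTDAT", "AE", "AESTDTC"),
}


def to_sdtm_record(collected: dict[str, str]) -> tuple[dict[str, str], list[str]]:
    """Rename collected values to SDTM variables; report CDEs with no mapping."""
    record: dict[str, str] = {}
    unmapped: list[str] = []
    for cde_id, value in collected.items():
        mapping = CDE_TO_CDISC.get(cde_id)
        if mapping is None:
            unmapped.append(cde_id)
            continue
        record[mapping.sdtm_variable] = value
    return record, unmapped


record, unmapped = to_sdtm_record(
    {"EX-1001": "Nausea", "EX-1002": "2020-01-15", "EX-9999": "not mapped"}
)
print(record)
print(unmapped)
```

A real transformation would also handle value conversions (for example, date formats) and controlled terminology, which this sketch leaves out.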
An information model is a software engineering representation of the concepts about a particular domain, such as cancer research and clinical care. A model describes the concepts and relationships, constraints, rules, and operations that can be performed between formal entities, like a patient and medical history. Developers can use these models to design software systems. The Biomedical Research Integrated Domain Group (BRIDG) Model is an example of an information model providing a shared view of the dynamic and static semantics for basic, pre-clinical, clinical, and translational research and its associated regulatory artifacts.
Browse, search, and export CDEs in XML or Excel (.xls) formats.
Credentialed users can create, edit, and maintain caDSR content. Non-credentialed users can browse and search only.
Search and download forms that use CDEs. Forms can be downloaded in Excel (.xls) or XML.
caDSR Application Programming Interfaces (APIs)
APIs provide another way to access caDSR content.
This interface uses the caDSR HTTP API to retrieve all types of caDSR metadata. End users and application developers can use this interface to view content via a web browser, test HTTP calls, and retrieve content in HTML or XML formats. Java APIs are also available. Learn more about caDSR APIs and REST Examples.
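To give a sense of what retrieving XML over HTTP and reading it programmatically might look like, here is a minimal Python sketch using only the standard library. The base URL is a placeholder, and the resource path, query parameters, and XML element names are hypothetical. Consult the caDSR API documentation for the real endpoints and response schema.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode
from urllib.request import urlopen

# Placeholder only: this is NOT a real caDSR endpoint.
BASE_URL = "https://example.org/cadsr-api"


def build_query_url(resource: str, **filters: str) -> str:
    """Compose a request URL; filters are sorted for a stable query string."""
    query = urlencode(sorted(filters.items()))
    url = f"{BASE_URL}/{resource}"
    return f"{url}?{query}" if query else url


def parse_data_elements(xml_text: str) -> list[dict[str, str]]:
    """Extract identifier and name from each (hypothetical) dataElement node."""
    root = ET.fromstring(xml_text)
    return [
        {
            "public_id": node.get("publicId", ""),
            "long_name": node.findtext("longName", default=""),
        }
        for node in root.iter("dataElement")
    ]


def fetch_data_elements(resource: str, **filters: str) -> list[dict[str, str]]:
    """Perform the HTTP GET and parse the XML body (requires a real endpoint)."""
    with urlopen(build_query_url(resource, **filters), timeout=30) as response:
        return parse_data_elements(response.read().decode("utf-8"))


# Hypothetical response body, used here so the parser runs without a network call.
SAMPLE_XML = """
<results>
  <dataElement publicId="EX-0001"><longName>Tumor Laterality</longName></dataElement>
  <dataElement publicId="EX-0002"><longName>Diagnosis Date</longName></dataElement>
</results>
"""

print(build_query_url("dataElements", longName="laterality"))
print(parse_data_elements(SAMPLE_XML))
```

Separating URL construction and parsing from the network call keeps both pieces testable offline.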
Service and Support
caDSR is a publicly accessible, community-based resource with embedded best practices and governance processes. The Metadata Content Team provides metadata development services to the community and is available to support organizations with the curation process. To request support, contact the team at caDSR.RA@nih.gov.
For support with caDSR tools, submit a request through the NCI Application Support Team.
Trained and credentialed caDSR community curators are the stakeholders who make up the caDSR Content Team User Group. The group holds quarterly meetings to provide updates on practices that impact the caDSR metadata content, share best practices, vet business rules, participate in governance processes, and consider user questions and issues. The group uses a caDSR listserv to communicate urgent issues and promote common projects.