Metadata for Cancer Data
NCI fosters the use of shared metadata standards for all cancer data. Shared standards make it easier to share, reuse, and aggregate cancer data among repositories, giving researchers a larger pool of data to study and greater opportunity to validate their results.
To assist with this interoperability, NCI established the cancer Data Standards Registry and Repository (caDSR). The caDSR and its associated applications help the research community manage and use data standards by providing the shared standards in a variety of human- and machine-readable contexts.
About Cancer Data Standards Registry and Repository (caDSR)
caDSR has a database, application programming interfaces (APIs), and web-based applications for creating and using data standards for cancer metadata.
The metadata in caDSR describe elements used in cancer research and clinical trials:
- Data Elements
- Forms
- Information Models
The caDSR captures data elements, including Common Data Elements (CDEs). Using CDEs promotes consistent practices for collecting and storing data, making it possible to aggregate data across data sets and facilitating the understanding, interpretation, and sharing of cancer research data. NCI’s CDEs primarily derive from data collection forms, spreadsheets, researcher data dictionaries, and protocols used in clinical trials.
Individual CDEs can be grouped into collections to make them easier to find and use in logical ways. The collections can be specific, like a patient blood pressure panel, or general collections relating to demographics or treatment techniques.
For current large collections of CDEs, see the caDSR Hosted Data Standards, Downloads, and Transformations Utilities.
caDSR forms include standard questionnaires, measures, and other standard research instruments, as well as Case Report Forms (CRFs).
CRFs represent a set of questions used to collect data during clinical trials and other research studies. Each individual question can be represented using the details of a Common Data Element (CDE).
NCI worked with the clinical trials community to create Standard CRF Templates to support clinical trials data collection. Data managers can edit and develop CRFs for their specific research aims using the caDSR Form Builder.
An information model is a software engineering representation of the concepts about a particular domain, such as cancer research and clinical care. A model describes the concepts and relationships, constraints, rules, and operations that can be performed between formal entities, like a Patient and Medical History. Developers can use these models to design software systems.
These information models are translated into organized collections of CDEs by specialized caDSR tools.
The Biomedical Research Integrated Domain Group (BRIDG) model is an example of such an information model. It defines the classes of information and attributes in clinical and health care, as well as how these classes of information are related. For example, Biological Entities, such as a Person or an Animal, have specific attributes such as a name, birth order, birth date, and sex genotype that distinguish one entity from another. This model is represented in caDSR as CDEs and was developed collaboratively with clinical research and healthcare stakeholders, the Clinical Data Interchange Standards Consortium (CDISC), the Food and Drug Administration (FDA), Health Level Seven (HL7), and the International Organization for Standardization (ISO).
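The translation of an information model into CDEs can be pictured as flattening each class attribute into one candidate data element. The sketch below is only an illustration and is not the caDSR tooling: the function name, the dictionary layout, and the sample model are assumptions.

```python
def model_to_candidate_cdes(model: dict[str, list[str]]) -> list[dict[str, str]]:
    """Flatten {class name: [attribute names]} into one record per class/attribute pair."""
    candidates = []
    for class_name, attributes in model.items():
        for attribute in attributes:
            candidates.append({
                "object_class": class_name,
                "property": attribute,
                "long_name": f"{class_name} {attribute}",
            })
    return candidates


# Hypothetical two-class model, used only to exercise the function.
model = {
    "Patient": ["Birth Date", "Sex"],
    "Medical History": ["Diagnosis Date"],
}
for cde in model_to_candidate_cdes(model):
    print(cde["long_name"])
```

Each printed name pairs a class with one of its attributes, which mirrors how a model class such as Patient yields several related data elements.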
Organization of Metadata
NCI uses the international ISO/IEC 11179 Standard for Metadata Registries to represent caDSR metadata in the database. The model was developed by the International Organization for Standardization (ISO) and the International Electrotechnical Commission (IEC), and defines a number of fields and relationships for Metadata Registries so that registries across the world can talk to each other. In implementing the ISO/IEC 11179 model, NCI extended it to support representation of two additional types of content: forms and information models.
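In ISO/IEC 11179, a data element pairs a data element concept (an object class plus a property) with a value domain that says how values are represented. A minimal sketch of that structure follows. It is not the caDSR database schema: the class names, field names, identifier, and permissible values are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataElementConcept:
    """What is being described: an object class plus one of its properties."""
    object_class: str   # e.g. "Person"
    property_name: str  # e.g. "Sex"


@dataclass(frozen=True)
class ValueDomain:
    """How a value is represented: a datatype and an optional enumerated list."""
    datatype: str
    permissible_values: tuple[str, ...] = ()  # empty means non-enumerated

    def accepts(self, value: str) -> bool:
        # In this sketch, non-enumerated domains accept any value.
        return not self.permissible_values or value in self.permissible_values


@dataclass(frozen=True)
class DataElement:
    public_id: str  # hypothetical identifier field
    version: str
    concept: DataElementConcept
    value_domain: ValueDomain

    @property
    def long_name(self) -> str:
        return f"{self.concept.object_class} {self.concept.property_name}"


# Hypothetical data element, for illustration only.
sex = DataElement(
    public_id="0000001",
    version="1.0",
    concept=DataElementConcept("Person", "Sex"),
    value_domain=ValueDomain("CHARACTER", ("Female", "Male", "Unknown")),
)
print(sex.long_name, sex.value_domain.accepts("Male"))
```

Separating the concept from the value domain lets two registries agree that they describe the same thing even when they record it with different codes.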
The caDSR metadata repository is a community-based resource and stays useful only if the whole community follows best practices and governance requirements. There are several ways to get support with projects related to the caDSR.
The Metadata Support Team is available to support organizations with the curation process. They work in parallel with the Enterprise Vocabulary Services (EVS) editors to provide guidance and assist with curating data collection forms in a standards-based way. caDSR staff can also provide tools to transform caDSR content into various formats, such as a CSV file in data dictionary format for REDCap™ systems.
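A transformation of this kind can be sketched as mapping each CDE to one row of a REDCap data dictionary. The example below is an assumption-laden illustration, not the caDSR staff's tool: the input keys (`short_name`, `question_text`, `permissible_values`) are hypothetical, and only the first few data dictionary columns are written, so a real import would need the full header from a REDCap template.

```python
import csv
import io

# First columns of a REDCap data dictionary; a real file has additional columns.
COLUMNS = [
    "Variable / Field Name",
    "Form Name",
    "Section Header",
    "Field Type",
    "Field Label",
    "Choices, Calculations, OR Slider Labels",
]


def cdes_to_redcap_csv(cdes: list[dict], form_name: str) -> str:
    """Render CDE-like dictionaries as CSV text in data dictionary layout."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS, lineterminator="\n")
    writer.writeheader()
    for cde in cdes:
        values = cde.get("permissible_values", [])
        # REDCap writes choices as "code, label" pairs separated by " | ".
        choices = " | ".join(f"{i}, {label}" for i, label in enumerate(values, start=1))
        writer.writerow({
            "Variable / Field Name": cde["short_name"].lower(),
            "Form Name": form_name,
            "Section Header": "",
            "Field Type": "radio" if values else "text",
            "Field Label": cde["question_text"],
            "Choices, Calculations, OR Slider Labels": choices,
        })
    return buffer.getvalue()


# Hypothetical CDE records.
sample = [
    {"short_name": "PT_SEX", "question_text": "Sex", "permissible_values": ["Female", "Male"]},
    {"short_name": "PT_BIRTH_DT", "question_text": "Date of birth"},
]
print(cdes_to_redcap_csv(sample, "demographics"))
```

Enumerated CDEs become radio fields with numbered choices, while non-enumerated ones fall back to free-text fields in this sketch.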
For support with a caDSR application or to contact the Metadata Support Team, submit a request through the NCI Application Support Team at NCIAppSupport@nih.gov.
Credentialed users of caDSR are also a part of the caDSR Community User Group. The group holds bi-monthly meetings for updates on content, sharing best practices, and user questions. There is also a listserv to connect to other members.
caDSR provides interactive applications for managing and sharing CDEs, CRFs, and information models. All caDSR tools and interfaces connect to the same central caDSR database. The tools are web-based and publicly accessible.
Some tools require a secure, role-based caDSR login for creating and editing content. To obtain an account, refer to the training requirements and course descriptions on the caDSR Training Wiki.
Required courses are both self-paced and instructor-led. Once you complete the necessary courses, you will be given access to the curation tools and assigned a mentor from the Metadata Support Team.
To see which trainings are required to support your role and take the self-paced courses, visit the caDSR Training Wiki. For questions or to register for courses, contact the NCI Application Support Team.
Browse, search, and export CDEs in XML or Excel formats. Predefined downloads of commonly requested NCI data standards and collections of CDEs are available on the Downloads wiki page.
Create, edit, browse, and customize downloads of caDSR content in Excel.
Build and download forms that use CDEs, ensuring questions are asked and recorded consistently. Forms can be downloaded in Excel or XML.
Perform targeted searches on caDSR content through a user interface or API, such as weighted string matches for specific attributes. Learn more on the caDSR Freestyle Search wiki page.
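A weighted string match can be thought of as scoring each record by which attributes contain the search term, with some attributes counting more than others. The sketch below illustrates the idea only and does not describe how the caDSR search is implemented: the attribute names and weights are hypothetical.

```python
# Hypothetical weights: a hit in a name counts more than a hit in the definition.
WEIGHTS = {"long_name": 3.0, "short_name": 2.0, "definition": 1.0}


def score(record: dict[str, str], term: str) -> float:
    """Sum the weights of every attribute whose text contains the term (case-insensitive)."""
    needle = term.lower()
    return sum(
        weight
        for attribute, weight in WEIGHTS.items()
        if needle in record.get(attribute, "").lower()
    )


def search(records: list[dict[str, str]], term: str) -> list[dict[str, str]]:
    """Return the records that match the term, best score first."""
    scored = [(score(record, term), record) for record in records]
    hits = [(s, record) for s, record in scored if s > 0]
    hits.sort(key=lambda pair: pair[0], reverse=True)
    return [record for _, record in hits]


# Hypothetical records.
records = [
    {"long_name": "Specimen Type", "definition": "Kind of sample collected from a patient"},
    {"long_name": "Patient Birth Date", "definition": "Date of birth"},
    {"long_name": "Tumor Grade", "definition": "Degree of abnormality"},
]
for hit in search(records, "patient"):
    print(hit["long_name"])
```

A record whose name contains the term ranks above one that only mentions it in the definition, and records with no hit are dropped.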
Use the Semantic Integration Workbench (SIW) to add semantic annotations to an information model by matching the model’s classes and attributes with terminology from the NCI Thesaurus (NCIt) or by matching to existing caDSR content. The annotated file is sent to caDSR staff, who use the caDSR UML Loader to extract and transform elements of the model into caDSR content.
Create and manage alerts to monitor changes to caDSR content. Login information is required. Learn more on the caDSR Sentinel Tool wiki page.
Update caDSR passwords and set up security questions. Users must change their passwords every 60 days.
caDSR Application Programming Interfaces (APIs)
Application programming interfaces (APIs) provide another way to access caDSR content.
This interface uses the caDSR HTTP API to retrieve all types of caDSR metadata. End users and application developers can use this interface to view content via a web browser, test HTTP calls, and retrieve content in HTML or XML formats. Java APIs are also provided. Learn more on the caDSR APIs and REST Examples wiki page.
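Content retrieved as XML over HTTP can be processed with a standard parser. The sketch below shows the general pattern only. The base URL, the `longName` query parameter, and the XML layout are all hypothetical placeholders, and the actual endpoints and response formats are documented on the caDSR APIs and REST Examples wiki page.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Hypothetical endpoint; this is not the real caDSR service address.
BASE_URL = "https://example.org/cadsr/api/DataElement"


def build_query_url(long_name: str) -> str:
    """Build a GET URL that filters data elements by name (parameter name is assumed)."""
    return f"{BASE_URL}?{urlencode({'longName': long_name})}"


def parse_data_elements(xml_text: str) -> list[dict[str, str]]:
    """Extract the id and name of each <DataElement> in an assumed response layout."""
    root = ET.fromstring(xml_text)
    return [
        {
            "public_id": element.get("publicId", ""),
            "long_name": element.findtext("longName", default=""),
        }
        for element in root.iter("DataElement")
    ]


# Hypothetical response body, embedded so the sketch runs without network access.
SAMPLE = """<results>
  <DataElement publicId="0000001"><longName>Patient Birth Date</longName></DataElement>
</results>"""

print(build_query_url("Patient Birth Date"))
print(parse_data_elements(SAMPLE))
```

Keeping URL construction separate from parsing makes it easy to test the parsing step against saved responses before issuing live HTTP calls.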