Cancer Data Science Pulse
NCI Listens: Your Feedback on Data Sharing Challenges and Opportunities
Via a Request for Information (RFI), NCI’s Office of Data Sharing (ODS) invited researchers like you to share existing processes and workflows related to managing scientific data. This includes topics such as data sharing, the types of data you collect, data standards, and more.
Read on to learn the questions we asked, the insights we gained, and what we plan to do with your feedback. Email us to request a complete copy of the RFI findings.
Why did we ask about your data sharing processes?
Maximizing data utility is one of the eight goals of the National Cancer Plan, with a specific focus on secure data sharing and use of available data by researchers to achieve rapid progress against cancer. NIH’s 2023 Data Management and Sharing (DMS) Policy sets expectations for data management and sharing for its grantees and investigators. To understand the policy’s impact, we wanted to know the challenges you face when sharing your data and reusing data shared by others. Our goal is to establish a coordinated, sustainable, and efficient infrastructure for data sharing. This infrastructure is essential for storing, preserving, retrieving, and sharing data, as well as for following the guidelines that ensure data are findable, accessible, interoperable, and reusable (FAIR).
What did we ask about data management and sharing?
Through the RFI, we collected information about how you promote data sharing, including:
- the services you use (e.g., guidance on data standards and file formats).
- the technologies you use (e.g., storage, compression, retrieval, archival, analysis tools, and workbenches).
- the processes you execute (e.g., governance, communication, education, and outreach).
We wanted input from a variety of sources, including:
- individual research laboratories.
- personnel from scientific instrumentation core facilities.
- offices of research or sponsored projects.
- offices of provosts.
- libraries.
- information technology and security departments.
- institutional review boards.
- bioinformaticians and data scientists contributing to data curation, formatting, and analysis.
What feedback did we receive from the RFI?
Respondents like you raised concerns about the:
- significant costs associated with data curation, sharing and storing large data files, and egress fees (i.e., charges when transferring data between locations).
- lack of data management resources at many institutions. Many expressed the need for a community approach involving shared resources.
- misinterpretation of data due to the absence of active collaborations.
- inadequate security measures, which pose a major problem for data sharing and collaborations.
- need to rely on journal publications and direct collaborations instead of repositories, due to the lack of searchable metadata in many repositories.
Suggestions included:
- funding for the development of tools that support tracking data provenance and searching for and downloading data.
- addressing the necessity of standardization across repositories, including journal databases.
- instituting strategies to make data accessible and easily discoverable in manuscripts through a structured format (e.g., repository name, title of the data set, URL, etc.). This could enhance the reuse and citation of data.
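As a rough illustration of the structured format suggested above, a data availability entry could be captured as a small record with named fields. This is only a sketch: the class name, field names, and example values are hypothetical and do not come from any NCI or journal specification.

```python
from __future__ import annotations

import json
from dataclasses import asdict, dataclass


@dataclass
class DataAvailability:
    """Hypothetical structured data availability entry for a manuscript."""

    repository_name: str
    dataset_title: str
    url: str

    def missing_fields(self) -> list[str]:
        """Return the names of any fields that were left blank."""
        return [name for name, value in asdict(self).items() if not value.strip()]


# Placeholder values for illustration only; not a real repository or data set.
entry = DataAvailability(
    repository_name="Example Repository",
    dataset_title="Example tumor imaging data set",
    url="https://example.org/datasets/123",
)

# Serialize to JSON so the entry can be indexed or searched by machines.
print(json.dumps(asdict(entry), indent=2))
```

A record like this is easy for a journal or repository to check for completeness (for example, with `entry.missing_fields()`) before the statement is published.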
What are we doing with this information?
Based on the responses we received, we are working in several areas to provide guidance and support that facilitate effective data sharing and management. We’re using your input to help decide what we focus on and to ensure that ongoing initiatives align with the support you need. We are also discussing future projects to address what you shared in the RFI.
Engaging Shared Resources: It’s important to involve shared resources, such as instrumentation facilities, bioinformatic cores, research IT departments, and data librarians, in the process of generating and curating standardized data. We’re discussing your feedback and ways we could support this goal through funding for training and building data management services and infrastructure within institutions.
Federated Network of Shared Resources: Building a federated network of shared resources across institutions, anchored by data type-specific Data Coordination Centers, could contribute to standardizing data formats and creating minimum metadata or data attributes required for data discovery and reuse. We’re taking this collaborative approach to help establish consistent practices and enhance interoperability.
Metadata Guidance: We’re working on including specific metadata requirements in Notices of Funding Opportunity. This could involve requiring the inclusion of Digital Object Identifiers or Persistent Identifiers for data; Research Resource Identifiers for reagents and protocols; funding information; and minimum data attributes based on data types when sharing data.
Support for Data Format Conversion: We’re providing support for the conversion of proprietary data formats to aid in enhancing interoperability. This support will enable researchers to transform their data into standardized formats, facilitating seamless integration and reuse across different platforms and tools.
Landscape Analysis of Data Repositories: Conducting a comprehensive analysis of data repositories and engaging with their staff helps ensure that data within repositories are findable and interoperable. Our efforts involve collaborating with repository staff to establish best practices, promote data discovery, and encourage adherence to standardized metadata attributes.
Strategies for Enhancing Data Findability and Accessibility: It is extremely important to develop supplemental strategies to make data findable and accessible. We’re taking steps to implement requirements that make data accessible and easily discoverable within manuscripts through a structured format and as part of the progress reports. By encouraging researchers to explicitly state the availability and accessibility of their data, data discoverability could be enhanced.