Sharing Data: The Basics
What is Data Sharing?
NIH expects that you make scientific data as widely and freely available as possible to facilitate re-use while safeguarding the privacy of patients and protecting confidential and proprietary data.
Why Is Data Sharing Important for Cancer Research?
Sharing is particularly important for unique data that cannot be readily replicated or are difficult to generate. NCI sees
What Do I Need to Know?
Fundamental Tips for Effective Data Management and Sharing
To practice good data management:
- Organize your data so it can be readily accessed by you, your colleagues, and anyone who may need to use your data in your absence.
- Document the data type and format used when generating data. Be aware of data types and formats relevant to your research area.
- Example data types generated in cancer research include genomics and other omic data; imaging data; epidemiology/population-related data; pre-clinical data; biochemical data; immunological data; and clinical data.
- Save data in a standardized format. Each data type may have different file formats. The NCI Genomic Data Commons (GDC) provides a list of file formats and templates for molecular characterization data types. When possible, save files in a non-proprietary format (such as .txt or .jpeg) rather than a proprietary one (such as .xls or .doc). This gives those who use your data flexibility, because they can use the data independently of any software platform. To find information about data formats and standards, consult resources at your institution, such as the library, other investigators, and shared core facilities, or external resources such as scientific journals, international standards bodies, and data repositories, including the GDC.
- Create informative file names so data users can understand the content and data type. File names should be specific enough to not clash with future or unrelated files. Avoid spaces and special characters.
- Store your data in a safe and secure location, like a server or backup system. Your institution may offer free access to commercial cloud-based backup systems. Avoid using flash drives or desktops/laptops, as these are not easily shareable and can be damaged or lost.
- Record your metadata in a timely manner so anyone interpreting the data can reuse and re-analyze your data with ease. Metadata can include experimental methods or procedures, data labels, variable definitions, and any other information necessary to understand and reproduce the conditions in which your data were generated.
- Plan ahead! You can easily maintain your efforts if plans are made ahead of time to consider data management throughout the life of the research project.
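As a rough illustration of the tips on file names and metadata, here is a minimal Python sketch. It assumes one possible naming convention (project, data type, description, and date joined with underscores) and a plain-text JSON file for metadata stored next to the data file. The function names, the example project, and the metadata fields are hypothetical and not prescribed by NCI or NIH; adapt them to the standards used in your own research area.

```python
import json
import re
from datetime import date
from pathlib import Path


def make_file_name(project, data_type, description, created, extension):
    """Build a descriptive file name with no spaces or special characters.

    The part order and separators are an assumed convention, not a standard.
    """
    parts = [project, data_type, description]
    # Replace each run of characters other than letters and digits with a hyphen.
    cleaned = [re.sub(r"[^A-Za-z0-9]+", "-", p).strip("-").lower() for p in parts]
    # An ISO date helps avoid clashes with future files and sorts chronologically.
    return "_".join(cleaned + [created.isoformat()]) + "." + extension.lstrip(".")


def write_metadata(data_file, metadata):
    """Save metadata beside the data file as non-proprietary, plain-text JSON."""
    data_path = Path(data_file)
    sidecar = data_path.with_name(data_path.stem + ".metadata.json")
    sidecar.write_text(json.dumps(metadata, indent=2), encoding="utf-8")
    return sidecar


# Hypothetical example values, for illustration only.
name = make_file_name(
    "Lung Study", "RNA-seq", "tumor vs. normal counts", date(2024, 3, 15), ".csv"
)
print(name)  # lung-study_rna-seq_tumor-vs-normal-counts_2024-03-15.csv
```

Calling `write_metadata(name, {...})` with a dictionary of methods, data labels, and variable definitions would then create a `.metadata.json` file alongside the data file, so the description travels with the data.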
Data Sharing Expectations
Expectations around data sharing have evolved, and the culture is moving toward a standard for broad sharing of scientific data generated by research activities. You may have already observed or been involved in sharing data among individual collaborators or large collaborative groups or consortiums. However, it is important to engage in broad
In short, keep these definitions in mind:
- Collaborator Sharing: Sharing between investigators upon publication or upon request to an author. This only helps the individual.
- Consortium Sharing: Sharing within large collaborative groups (e.g., collaborative networks/programs). This only benefits a focused group.
- Broad Sharing: Sharing with larger research communities, institutions, and the broader public. This helps the community and ensures fair and equitable data access.
Data Sharing In Practice
What data do I have to share?
You must share all scientific data necessary to reproduce your findings, which can include:
- primary data sets (i.e., generated by original work),
- secondary data sets (i.e., generated by re-use of primary data sets),
- qualitative data (e.g., from social and behavioral data sets), and
- data from fundamental basic science techniques to validate and replicate research findings (e.g., western blots, electrophoresis gels, flow cytometry).
For a list of examples that are not considered scientific data by NIH, see “Research Covered Under the
Your funding opportunities may have additional expectations for what and how data should be shared.
How do I share?
You should share your findings in a public and accessible repository.
Here's what you should consider when selecting a repository:
- Data Type: You should select the repository that is most appropriate for your data type and discipline. If your data set or project includes multiple data types or includes a data type not accepted by data type-specific repositories, you can submit to generalist repositories.
- Data Security: When sharing your data in a public repository, consider factors such as protecting and assuring the confidentiality and privacy of all participants, as well as the size and complexity of the data set.
- Data Access: The two general categories of data shared in repositories are:
- Public access data—Data made publicly available to everyone without access restrictions. NIH examples include Gene Expression Omnibus and GenBank.
- Controlled access data—Data made available for secondary research only after investigators have obtained approval to use the requested data for a particular project. Access to controlled data in the Database of Genotypes and Phenotypes will be granted by an NIH Data Access Committee. Consult this instructional video and tips document to see how to make a request.
- FAIR Data Standards: Share your data in public repositories that adhere to the FAIR (Findable, Accessible, Interoperable, Reusable) data principles. Some repositories provide a unique persistent digital identifier for a submitted data set, such as a DOI, so others can easily find your data set.
- Data Preservation and Availability: You should consider relevant requirements and expectations (e.g., repository, award, journal, and institutional requirements) as guidance for the duration for which scientific data must be preserved and made available.
Please keep all of these factors in mind when selecting a repository to store your data and make it accessible for others to use.
When do I share?
You should share your data as soon as reasonably possible!
However, you will need to coordinate with your principal investigator, institution, and the NIH program officer who oversees your grant funding.
For example, the
- when you publish, or
- when your funding ends (specifically, the funding that supported data generation for your research project).
Data Sharing Resources and Initiatives
Now that you have a sense of the basics, use the following resources to discover more about the topic and understand NCI’s investment in this stage of the data science lifecycle.
Resources and Tools
- National Cancer Plan: Discover how maximizing data utility is one of the eight goals of NCI’s comprehensive framework. Data sharing is central to NCI’s mission to lead, conduct, and support cancer research nationwide to advance scientific knowledge and improve lives.
- Data Sharing: Check out this section of our website for sharing-oriented information on policies, genomic data preparation, and more!
- NCI Bioinformatics Training and Education Program Seminar: Watch a recording and learn how to keep your data FAIR.
- Breaking Down Barriers to Sharing Cancer Data—The NIH Generalist Repository Ecosystem Initiative: Discover how NIH is working to make generalist repositories (GRs) part of the data sharing ecosystem. The goal is to minimize sharing barriers while still taking advantage of GR convenience and usability.
- Data Sharing Advocacy—How a Cancer Survivor Seeks to Enhance Data Sharing to Better the Patient Experience: Read this personal testimony from Mr. Steve Friedman, a cancer survivor and NCI employee who has witnessed firsthand the power of data science and sharing tools.
- Semantics Primer: Get the basics on cancer research “semantic” terminology, its influence on data interoperability, and why that’s so critical.
- Semantics Series—A Deep Dive Into Common Data Elements: Continue to learn about semantic terminology. In this blog, you’ll learn what a CDE is and why researchers need them.
- Your Guide to the 2023 NIH Data Management and Sharing Policy: Read it if you’re an NCI-funded investigator. You’ll learn what has changed from the 2003 policy, what you need to do, and where you can find help.
- Projects/programs that strive for efficient and effective
- NCI’s ITCR Program has training courses available via the ITCR Training Network. The course “Ethical Data Handling for Cancer Research” has tips on data privacy, security, sharing, and ethics.
Data Sharing Resources
- Visit the NIH Scientific Data Sharing Website for a full list of NIH-supported repositories.
- Watch training modules on enhancing data reproducibility from NIH.
- Applying for NIH funding? See what you need to include in your DMS Plan.
- Attend an NIH Data Sharing and Reuse Seminar Series, and see how other researchers are finding ways to reuse their data or generate new findings from other data sets.
- Explore the NIH Common Data Elements (CDEs) Repository where you can browse NIH-endorsed CDEs and Forms for standardizing your data. Live and on-demand trainings on CDEs and how to search this repository are also available.
- Read this blog on health data standards from NIH’s National Library of Medicine.
- Ready to start your project? Get an overview of the data science lifecycle and what you should do in each stage.
- Want to learn the basic skills for cancer data science? Check out our basic skills video course.
- Need answers to data science questions? Visit our Training Guide Library.