Sharing Data: The Basics

What is Data Sharing?

NIH expects you to make scientific data as widely and freely available as possible to facilitate reuse, while safeguarding patient privacy and protecting confidential and proprietary data. Data sharing holds immense value for scientific research, and it can also advance your career by bringing recognition and credit for your work.

Why is Data Sharing Important for Cancer Research?

Sharing is particularly important for unique data that cannot be readily replicated or are difficult to generate. NCI sees data sharing as vital for scientific progress, aiding in research validity and data accessibility, promoting data combination and reuse, and ultimately accelerating the pace of biomedical discoveries.

What Do I Need to Know? 

Fundamental Tips for Effective Data Management and Sharing

You’ll find similar and additional tips in the corresponding article, “Generating and Collecting Data: The Basics.”

Here are some tips for practicing good data management so that sharing goes smoothly when the time comes:

  • Organize your data so it can be readily accessed by you, your colleagues, and anyone who may need to use your data in your absence.
  • Document the data type and format used when generating data. Be aware of data types and formats relevant to your research area. 
    • Example data types generated in cancer research include genomics and other omic data; imaging data; epidemiology/population-related data; pre-clinical data; biochemical data; immunological data; and clinical data.
  • Save data in a standardized format. Each data type may have different file formats. The NCI Genomic Data Commons (GDC) provides a list of file formats and templates for molecular characterization data types. When possible, save files in a non-proprietary format (such as .txt or .jpeg) rather than a proprietary one (such as .xls or .doc). This gives those who use your data flexibility, because they can work with the data independently of any software platform. For information about data formats and standards, consult resources at your institution (such as the library, other investigators, or shared core facilities) or external resources such as scientific journals, data repositories (including the GDC), and international standards bodies.
  • Create informative file names so data users can understand the content and data type. File names should be specific enough not to clash with future or unrelated files. Avoid spaces and special characters.
  • Store your data in a safe and secure location, like a server or backup system. Your institution may offer free access to commercial cloud-based backup systems. Avoid using flash drives or desktops/laptops, as these are not easily shareable and can be damaged or lost.
  • Record your metadata in a timely manner so anyone interpreting the data can reuse and re-analyze your data with ease. Metadata can include experimental methods or procedures, data labels, variable definitions, and any other information necessary to understand and reproduce the conditions in which your data were generated.
  • Plan ahead! Good data management is much easier to sustain if you plan for it from the start and keep it in mind throughout the life of the research project.
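
The file-naming and metadata tips above could be sketched in Python roughly as follows. This is a minimal illustration, not an NIH or NCI tool: the function names, the project/data-type/sample/date naming scheme, and the `.metadata.json` sidecar convention are all assumptions made for this example, and your institution or repository may expect something different.

```python
import json
import re
from datetime import date
from pathlib import Path


def safe_file_name(project: str, data_type: str, sample_id: str,
                   collected: date, extension: str) -> str:
    """Build a descriptive file name with no spaces or special characters.

    The field order (project, data type, sample, date) is a hypothetical
    convention chosen for this sketch.
    """
    parts = [project, data_type, sample_id, collected.isoformat()]
    # Replace each run of characters other than letters, digits, or hyphens
    # with a single hyphen, then trim hyphens left at either end.
    cleaned = [re.sub(r"[^A-Za-z0-9-]+", "-", p.strip()).strip("-") for p in parts]
    return "_".join(cleaned) + "." + extension.lstrip(".")


def write_metadata_sidecar(data_file: Path, metadata: dict) -> Path:
    """Write metadata as a JSON file next to the data file; return its path.

    The ".metadata.json" suffix is an assumption, not a standard.
    """
    sidecar = data_file.with_name(data_file.stem + ".metadata.json")
    sidecar.write_text(json.dumps(metadata, indent=2, sort_keys=True),
                       encoding="utf-8")
    return sidecar
```

For instance, `safe_file_name("lung study", "RNA-seq", "sample 042", date(2024, 3, 5), ".txt")` returns `lung-study_RNA-seq_sample-042_2024-03-05.txt`. The metadata dictionary passed to `write_metadata_sidecar` might hold experimental methods, data labels, and variable definitions, saved in a plain-text format that does not depend on any particular software.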

Data Sharing Expectations

Expectations around data sharing have evolved, and the culture is moving toward a standard of broad sharing of scientific data generated by research activities. You may have already observed or been involved in sharing data among individual collaborators or within large collaborative groups or consortia. However, it is important to also share data broadly with the larger scientific community and the public, as this ensures the maximum benefit for all involved.

In short, keep these definitions in mind:  

  • Collaborator Sharing: Sharing between investigators upon publication or upon request to an author. This helps only the individuals involved.
  • Consortium Sharing: Sharing within large collaborative groups (e.g., collaborative networks/programs). This benefits only a focused group.
  • Broad Sharing: Sharing with larger research communities, institutions, and the broader public. This helps the community and ensures fair and equitable data access.

Data Sharing in Practice

What data do I have to share? 

You must share all scientific data necessary to reproduce your findings, which can include:

  • primary data sets (i.e., generated by original work), 
  • secondary data sets (i.e., generated by re-use of primary data sets), 
  • qualitative data (e.g., from social and behavioral data sets), and 
  • data from fundamental basic science techniques to validate and replicate research findings (e.g., western blots, electrophoresis gels, flow cytometry). 

For a list of examples that are not considered scientific data by NIH, see “Research Covered Under the Data Management & Sharing Policy.” 

Your funding opportunities may have additional expectations for what and how data should be shared.

How do I share?

You should share your data in a publicly accessible repository. For certain programs and data types, NIH/NCI policy may specify designated data repositories for use.

Here's what you should consider when selecting a repository:

  • Data Type: You should select the repository that is most appropriate for your data type and discipline. If your data set or project includes multiple data types or includes a data type not accepted by data type-specific repositories, you can submit to generalist repositories.
  • Data Security: When sharing your data in a public repository, consider factors such as protecting and assuring the confidentiality and privacy of all participants, as well as the size and complexity of the data set.
  • Data Access: The two general categories of data shared in repositories are:
  • FAIR Data Standards: Share your data in public repositories that adhere to the FAIR (Findable, Accessible, Interoperable, Reusable) data principles. Some repositories provide a unique persistent digital identifier for a submitted data set, such as a DOI, so others can easily find your data set.
  • Data Preservation and Availability: You should consider relevant requirements and expectations (e.g., repository, award, journal, and institutional requirements) as guidance for how long scientific data must be preserved and made available. Please keep all of these factors in mind when selecting a repository to store your data and make them accessible for others to use.

When do I share? 

You should share your data as soon as reasonably possible!

However, you will need to coordinate with your principal investigator, institution, and the NIH program officer who oversees your grant funding. 

For example, the Data Management and Sharing policy states that scientific data should be shared by the earlier of two points in your research:

  • when you publish, or
  • when your funding ends (specifically, the funding that supported data generation for your research project).
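
As a rough illustration of the "earlier of two points" rule above, the comparison might be sketched like this. The function name `sharing_deadline` and the handling of unknown dates are assumptions for this example only; the dates that actually apply to you depend on your award, your institution, and the policies governing your project.

```python
from datetime import date
from typing import Optional


def sharing_deadline(publication_date: Optional[date],
                     funding_end_date: Optional[date]) -> Optional[date]:
    """Return the earlier of the two dates that are known.

    Either argument may be None when that date is not yet known
    (for example, no publication is planned yet). Returns None if
    neither date is known.
    """
    known = [d for d in (publication_date, funding_end_date) if d is not None]
    return min(known) if known else None
```

With a hypothetical publication date of June 1, 2025 and a funding end date of January 31, 2026, `sharing_deadline(date(2025, 6, 1), date(2026, 1, 31))` returns the publication date, because it comes first.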

Review the data sharing policies that might impact your timeline.

NCI Data Sharing Resources and Initiatives

Now that you have a sense of the basics, use the following resources to discover more about the topic and understand NCI’s investment in this stage of the data science lifecycle. 

 

Resources and Tools

  • National Cancer Plan: Discover how maximizing data utility is one of the eight goals of NCI’s comprehensive framework. Data sharing is central to NCI’s mission to lead, conduct, and support cancer research nationwide to advance scientific knowledge and improve lives.
  • Data Sharing: Check out this section of our website for sharing-oriented information on policies, genomic data preparation, and more!
  • NCI Bioinformatics Training and Education Program Seminar: Watch a recording and learn how to keep your data FAIR.


 
