Cancer Data Science Pulse

Improving Access to NCI’s Individual-Level Genomics and Other Omics Data

NCI supports scientific exploration through streamlined access to data within the database of Genotypes and Phenotypes (dbGaP). This approach expedites data sharing and may accelerate discovery.

Research using large-scale genomic and other omics data is no longer limited by storage space or slow download speeds for individual data sets. Innovations like cloud infrastructure enable computing over many data sets at multiple locations at once. These developments have increased the need for faster, more efficient processes for data access and sharing.

The dbGaP archives and distributes data and results from studies that illustrate the interaction between human genotypes and phenotypes. The types of data dbGaP stores (genomic, demographic, imaging, etc.) make defining “identifiability” complicated. Even though the data in dbGaP contains none of the typical identifiers, it’s necessary to go a step further to protect data confidentiality and patient privacy. The dbGaP shares potentially sensitive data in a manner consistent with the research participants’ informed consent. The informed consent given at the time of sample or data collection determines how controlled-access data can be shared for secondary research use.

Eighteen Data Access Committees (DACs) across NIH and NCI are responsible for the stewardship of controlled-access data within dbGaP. NCI’s DAC provides guidance and oversight of controlled-access data housed within repositories, including dbGaP, the Genomic Data Commons, and other NCI-maintained data repositories.

In the spirit of data sharing to support scientific exploration, NCI will allow researchers streamlined access to broad-use data sets. This streamlined approach reduces redundancies in submitting data access requests for most controlled-access data sets within dbGaP.

To meet the demand for biomedical data, NCI created two large collections of its broad-use studies. These collections comply with the data use limitations that govern how secondary access to studies is determined.

Investigators can access these broad-use collections by submitting a new project request or modifying an existing approved project.

After registering new genomic studies for release through dbGaP, NCI automatically adds them to these collections. NCI anticipates these broad-use collections will grow as an additional 575 studies are released in the future.

NCI’s Office of Data Sharing is committed to developing improvements to NCI-managed data access processes that allow for faster exploration of scientific questions. Visit dbGaP Collections to explore these new resources.

 

Footnotes

  • phs-accession numbers for collections:
    • NCI’s Collection of Datasets for General Research Use: phs003014
    • NCI’s Collection of Datasets for Health, Medical, and Biomedical Research Purposes: phs003044
  • Instructions for modifying existing projects in dbGaP to add the collections
Health Science Administrator, Office of Data Sharing, Center for Biomedical Informatics and Information Technology, NCI

Thank you for sharing this informative article about NCI's efforts to streamline access to genomic and phenotypic data through dbGaP. It's heartening to see how advancements in data sharing and responsible stewardship are accelerating scientific discovery while respecting patient privacy and consent. These initiatives are crucial for advancing health, medical, and biomedical research. Kudos to NCI's Office of Data Sharing for their commitment to improving data access processes and supporting scientific exploration.
Thank you for sharing your thoughts on the blog, and on NCI’s efforts in data sharing for advancing research. We hope you’ll subscribe to our weekly update email, where we share relevant news, upcoming data science events, and our most recent blogs. If you have anything additional you'd like to share about your experiences with data sharing, you can also contact nciofficeofdatasharing@mail.nih.gov directly!