Cancer Data Science Pulse
Improving Access to NCI’s Individual-Level Genomics and Other Omics Data
NCI supports scientific exploration through streamlined access to data within the database of Genotypes and Phenotypes (dbGaP). This process expedites data sharing and potentially accelerates the discovery process.
Research using large-scale genomic and other omics data is no longer limited by storage space or slow downloading speed for individual data sets. Innovations like cloud infrastructure enable computing over many data sets at multiple locations all at once. These developments have increased the need for faster, more efficient processes for data access and sharing.
The dbGaP archives and distributes data and results from studies that illustrate the interaction between human genotypes and phenotypes. The types of data (genomic, demographic, imaging, etc.) that dbGaP stores make defining “identifiability” complicated. Even though the data in dbGaP contain none of the typical identifiers, additional safeguards are needed to protect data confidentiality and patient privacy. The dbGaP shares potentially sensitive data in a manner consistent with the research participants’ informed consent. The informed consent given at the time of sample or data collection determines how controlled-access data can be shared for secondary research use.
Stewardship of controlled-access data within dbGaP is the responsibility of 18 Data Access Committees (DACs) across NIH and NCI. NCI’s DAC provides guidance and oversight of controlled-access data housed within repositories, including dbGaP, the Genomic Data Commons, and other NCI-maintained data repositories.
In the spirit of data sharing to support scientific exploration, NCI will allow researchers streamlined access to its broad-use data sets. This approach reduces redundancy in submitting data access requests for most controlled-access data sets within dbGaP.
To meet the demand for biomedical data, NCI created two large collections of its broad-use studies. These collections comply with the data use limitations that determine how secondary access to studies is granted.
- NCI’s Collection of Data Sets for General Research Use contains all of NCI’s authorized individual-level data sets that permit approved users to explore broad research interests, including methods and tool development.
- NCI’s Collection of Data Sets for Health, Medical, and Biomedical Research Purposes may be used only for health, medical, or biomedical research, including methods and tool development. Research involving ancestry or population studies must be tied to a health or medical condition.
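The difference between the two collections can be sketched as a simple check. This is a minimal illustration only: the names `Collection`, `Purpose`, and `is_consistent` are hypothetical, the purpose categories are a simplification of the descriptions above, and treating general research use as accepting every category is an assumption. Actual access decisions rest with the Data Access Committee, not with a rule like this.

```python
from enum import Enum, auto


class Collection(Enum):
    """The two broad-use collections described above (hypothetical names)."""
    GENERAL_RESEARCH_USE = auto()
    HEALTH_MEDICAL_BIOMEDICAL = auto()


class Purpose(Enum):
    """Simplified categories for a proposed research use (assumed for illustration)."""
    HEALTH_MEDICAL_BIOMEDICAL = auto()
    METHODS_OR_TOOL_DEVELOPMENT = auto()
    ANCESTRY_OR_POPULATION = auto()
    OTHER = auto()


def is_consistent(collection, purpose, tied_to_health_condition=False):
    """Return True if the proposed purpose fits the collection's stated use."""
    if collection is Collection.GENERAL_RESEARCH_USE:
        # Assumption: "broad research interests" is modeled as any purpose.
        return True
    # Health, medical, and biomedical collection.
    if purpose is Purpose.ANCESTRY_OR_POPULATION:
        # Allowed only when the study depends on a health or medical condition.
        return tied_to_health_condition
    return purpose in (
        Purpose.HEALTH_MEDICAL_BIOMEDICAL,
        Purpose.METHODS_OR_TOOL_DEVELOPMENT,
    )
```

For example, `is_consistent(Collection.HEALTH_MEDICAL_BIOMEDICAL, Purpose.ANCESTRY_OR_POPULATION)` is false under this sketch unless `tied_to_health_condition=True` is passed.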
Investigators can access these broad-use collections by submitting a new project request or modifying an existing approved project.
After registering new genomic studies for release through dbGaP, NCI automatically adds them to these collections. NCI anticipates that these broad-use collections will grow as an additional 575 studies are released in the future.
NCI’s Office of Data Sharing is committed to developing improvements to NCI-managed data access processes that allow for faster exploration of scientific questions. Visit dbGaP Collections to explore these new resources.