Cancer Data Science Pulse

Your Guide to NCI Data Science Resources for Breast Cancer Research

In the United States, breast cancer is the second most common type of cancer for women. In recognition of Breast Cancer Awareness Month, we invite you to explore our list of NCI data science resources designed to help advance your research.

Find Data to Support Your Breast Cancer Research

Get Tools You Can Use

  • Search biomarkers, data sets, and collections with the Early Detection Research Network.
  • Request access to controlled data through the database of Genotypes and Phenotypes, which has 82 breast cancer studies with 112 data sets.
  • Download medical images of breast cancer through The Cancer Imaging Archive.
  • Keep an eye out for the Confluence Project, which is developing a large research resource to uncover breast cancer genetics through genome-wide association studies. Data from the project is set to be available for request in 2023.

Keep Up with NCI’s Data Science Contributions to Breast Cancer Research

Subscribe to receive weekly NCI Data Science updates in your inbox, including upcoming events, the latest blog posts, and recent news releases.

NCI Multi-omic Data Helps Identify Resistance to Chemotherapy in Patients with Breast Cancer

  • Read how researchers studied resistance to chemotherapy in triple-negative breast cancer using NIH’s database of Genotypes and Phenotypes and the Proteomic Data Commons.

Machine Intelligence/Data Science in Medical Imaging of Breast Cancer and COVID-19

  • Listen to the recording of Dr. Maryellen L. Giger at the April 2022 NCI Imaging and Informatics Community Webinar. She spoke on the development, validation, database needs, and future implementation of artificial intelligence in the clinical radiology workflow, including case studies of breast cancer and COVID-19.

Breast Cancer Data Collections Released from the I-SPY2 Clinical Trial

  • Explore two breast cancer imaging data collections that The Cancer Imaging Archive has released and made publicly available. The collections are affiliated with the I-SPY Trial, an ongoing clinical trial.

Updated Data in Proteomic Data Commons: Breast, Ovarian, and Pediatric Cancer Studies

  • Download 1,400 files of open-access proteomic data related to breast, ovarian, and pediatric cancer studies.

Read a CBIIT Spotlight

Daoud Meerzaman, Ph.D., the Computational Genomics and Bioinformatics Branch chief for CBIIT, contributed to a study focusing on bioinformatic benchmarks for genetic variants in two distinct cell lines linked to breast cancer. The study used data from multiple next-generation sequencing platforms to detect and confirm germline and somatic variants in those cell lines.

We’ve reached out to Dr. Meerzaman since the 2021 publication, and he tells us the paper has been cited more than 100 times.

“Like other cancer types, breast cancer is a disease that involves gene mutations. Robust and accurate mutation caller algorithms are of critical importance. Benchmarking of newer algorithms is required to identify and implement the best practices in the identification of mutations. This is not only crucial for computational scientists to identify mutation callers that are accurate and trustworthy, but it is more critical for clinicians to trust these mutations because, eventually, physicians base their treatment decision using these mutations.”

“Previous research projects focused on using a single variable such as DNA, RNA, or protein. Fortunately, the paradigm is shifting, and scientists are using artificial intelligence and machine learning to carry out integrative research approaches utilizing radiological and pathological images with genomic variation to improve cancer diagnosis and treatment,” he adds.

Dr. Meerzaman also works with the Applied Proteogenomics OrganizationaL Learning and Outcomes (APOLLO) network, a collaboration between NCI, the Department of Defense, and the Department of Veterans Affairs. APOLLO’s first phase is focused on the full proteogenomic profiling of cancers of the lung, ovary, endometrium, prostate, and breast.

The APOLLO network is featured in NCI’s Data Science Time Capsule, along with several other themes showcasing the status of cancer research and data.
