Cancer Data Science Pulse

Federated Learning—A Solution for Democratizing Data for Cancer Research?

Federated learning (FL) might well be the next paradigm shift in democratizing vast amounts of data from many data sources for use in cancer research. At its most basic level, FL offers a decentralized, but collective, approach to accessing, analyzing, and interpreting data. Without moving any data from its place of origin, researchers still can conduct advanced analytical modeling, such as artificial intelligence (AI) and machine learning (ML), to better understand cancer and its effects.

Artist's conception of federated learning. Four separate buildings are linked together to form a central "brain," showing how people in different places can still contribute equally to research.

  FL has many advantages, such as:

  • breaking down silos, which means decentralizing data so scientists around the globe have equal opportunities to use data to address novel questions.
  • increasing data diversity and decreasing data bias, giving researchers access to data that represent a larger, more diverse population.
  • expanding data sets, especially in cases of rare cancers, thereby enabling researchers to access government, academia, and industry data that may have been previously inaccessible or difficult to find.
  • maintaining patient privacy, allowing scientists to develop and apply algorithms to anonymized patient-information databases without jeopardizing secure institutional systems or transferring raw data.

This federated approach has the potential to catapult cancer research forward by unlocking previously unavailable data sets.  

Lessons Learned from Federated Learning

I recently had the opportunity to support an FL approach to research on glioblastoma. Glioblastoma is the most common type of primary malignant brain tumor in adults. It’s also rare, impacting about 12,000 people each year in the United States.

In our study, “Federated Learning Enables Big Data for Rare Cancer Boundary Detection,” published in Nature Communications, a network of researchers from 6 continents and 71 geographically distinct sites collaborated to apply the Federated Tumor Segmentation (FeTS) platform to data from 6,314 glioblastoma patients. Our goal was to develop an FL approach for an ML model that could detect tumor boundaries for neurosurgical and radiotherapy planning in patients with glioblastoma. We also wanted to create a blueprint for future FL studies that could be adapted for use in real-world clinical settings, not only for glioblastoma but for other diseases as well.

Setting the Stage for Future Studies

Using an FL approach, we were able to tap large and diverse data sets on glioblastoma without needing to physically move, integrate, or harmonize those data, and without compromising patient privacy. Each institution trained the model independently on its own data, and the results were then pooled to fine-tune the FeTS model. Our final consensus model performed significantly better than earlier models, both on data from collaborators and on completely “out-of-sample” data (i.e., data from sites that did not take part in training).


This FL approach is a departure from current methods. Typically, with ML, the people developing the model are the only ones with access to all the data needed for training and fine-tuning the model, and they perform their work at one location.

In a federated approach, the model's creator sees only the updates that help improve the model’s quality and accuracy, without being able to see the actual individual-level data. This is particularly important as we continue to use data from electronic health records and other patient-owned information held in hard-to-access sources to improve cancer care.
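To make the idea concrete, here is a minimal Python sketch of federated averaging, one common way to pool updates from several sites. It is an illustration under stated assumptions, not the FeTS implementation: the toy linear model, the synthetic data, and the names `local_update`, `federated_average`, and `make_site` are all hypothetical.

```python
import random


def local_update(weights, data, lr=0.1, epochs=5):
    """Train on one site's private data, starting from the shared weights.

    Only the updated weights and the sample count are returned;
    the raw (x, y) records never leave this function.
    """
    w, b = weights
    n = len(data)
    for _ in range(epochs):
        # Gradient of mean squared error for the model y = w * x + b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return (w, b), n


def federated_average(updates):
    """Combine site weights, weighting each site by its sample count."""
    total = sum(n for _, n in updates)
    w = sum(site_weights[0] * n for site_weights, n in updates) / total
    b = sum(site_weights[1] * n for site_weights, n in updates) / total
    return (w, b)


def make_site(n, seed):
    """Create synthetic data for one hypothetical site (true w=3, b=1)."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.uniform(-1.0, 1.0)
        data.append((x, 3.0 * x + 1.0 + rng.gauss(0.0, 0.1)))
    return data


# Three hypothetical sites of different sizes; each keeps its own data.
sites = [make_site(40, 1), make_site(120, 2), make_site(80, 3)]

global_weights = (0.0, 0.0)
for _ in range(50):
    # Each site trains locally, then the server sees only the updates.
    updates = [local_update(global_weights, data) for data in sites]
    global_weights = federated_average(updates)

print(global_weights)
```

In this sketch, the central loop never touches `sites` directly except to hand each one the current model. Weighting by sample count is a design choice that lets larger sites contribute proportionally more to the consensus model.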

Today, FL is being applied to a wide range of cancers. This technology is helping to interpret many types of medical images to aid in cancer diagnosis and track clinical outcomes. Researchers studying diseases such as Alzheimer’s disease and COVID-19 also are using FL to help build models to predict clinical outcomes.


The future looks bright for FL, but there are some limitations to this approach. Communication among the groups that are collaborating on an FL model is paramount. It’s vital that each group communicates its results to the larger collective group in a way that’s timely, accurate, and clear. Computational power and bandwidth also may be a hindrance, as ML training tends to require more from local resources, which could slow down the learning cycle. In addition, the larger collective group needs to comply with agreed-upon guidelines to ensure the model is protected from malicious actors. Last, but not least, is the issue of intellectual property. We strive for open data, data sharing, and open technology, but these concepts aren’t fully embraced across the cancer research spectrum. If FL is to occupy a central role in biomedical research, we’ll need to look for new ways of recognizing and compensating contributions from individuals, industry, and institutions.

In summary, FL has the potential to create a streamlined data infrastructure, giving researchers from both the public and private sectors equal access to data that have been previously unavailable, and without many of the costs related to security, computational power, staffing, and other resources. This “digital expressway” offers a sensible solution for promoting data equity and facilitating scientific discovery in cancer research.

Looking for More FL Resources?

The FeTS tool used in our glioblastoma study was funded by NCI’s Informatics Technology for Cancer Research (ITCR) program. FeTS is an open-source toolkit with a user-friendly graphical user interface (GUI). The Center for Biomedical Image Computing and Analytics at the University of Pennsylvania developed FeTS and is currently maintaining the tool, in collaboration with Intel Labs, Intel AI, and the Intel Internet of Things Group. Learn more about ITCR’s funding opportunities.

Other funding opportunities for researchers working with FL include the National Library of Medicine’s Research Grants in Biomedical Informatics and Data Science (R01 Clinical Trial Optional).

NCI also is seeking applications for projects to support the secondary use of data to predict abdominal cancers. See NCI’s Notice of Special Interest on this topic, which expires in January 2024.

Associate Director for Informatics and Data Science Program, CBIIT Senior Investigator, Division of Cancer Epidemiology and Genetics, National Cancer Institute