Cancer Data Science Pulse
NCI Workshop Addresses Current Challenges in De-Identifying Medical Images
For decades, oncologists have relied on information from non-invasive imaging, not only to help diagnose many forms of cancer, but also to plan and assess cancer treatment and care.
Now, with advanced technologies, such as artificial intelligence (AI) and machine learning, imaging data are rapidly transforming how we may diagnose, treat, and predict outcomes for a variety of cancers. To a large extent, imaging AI depends on the availability of substantial amounts of data to support its development.
Sharing medical imaging data requires “de-identification,” which is the removal of protected health information (e.g., the patient’s name, date of birth, and medical record number) from an image file. Strategies for image de-identification aim to reduce the risk of patient identification and comply with the requirements set forth by the Health Insurance Portability and Accountability Act. There are also de-identification profiles that correspond to the DICOM (Digital Imaging and Communications in Medicine) standard and best practices, which help ensure that we preserve research-critical information that is not a patient identifier.
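To make the idea concrete, here is a minimal sketch of header-level de-identification in Python. It is an illustration under stated assumptions, not a compliant implementation: the header is a plain dictionary with invented values, the keys borrow DICOM attribute keywords, and the list of identifying attributes is a deliberately short, hypothetical one. Real de-identification profiles cover many more attributes and also address identifiers in the pixel data.

```python
# Hypothetical image header for illustration only. The keys follow DICOM
# attribute keywords, but every value here is invented.
header = {
    "PatientName": "DOE^JANE",
    "PatientBirthDate": "19700101",
    "PatientID": "MRN-0012345",
    "Modality": "CT",
    "SliceThickness": "2.5",
}

# Assumed, deliberately incomplete set of identifying attributes.
IDENTIFYING_KEYS = {"PatientName", "PatientBirthDate", "PatientID"}


def deidentify(header, identifying_keys=IDENTIFYING_KEYS):
    """Return a copy of the header without the identifying attributes.

    Attributes that are not identifiers (e.g., modality, slice thickness)
    are kept, because they may be critical for research.
    """
    return {k: v for k, v in header.items() if k not in identifying_keys}


clean = deidentify(header)
print(clean)
```

The function returns a new dictionary rather than editing the original, so the source record stays untouched while the shareable copy keeps only the non-identifying attributes.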
We use imaging data in a variety of domains (e.g., radiology, digital pathology, and multi-spectral fluorescent microscopy), and we create those images using a variety of imaging modalities and equipment from different manufacturers. Patient identifiers may be visible or hidden, which further complicates de-identification. We need to be certain we remove any potential for re-identification, that is, any information that links the image back to the original patient.
We addressed many of these de-identification challenges in an NCI-hosted virtual workshop held in May. The workshop brought together more than 600 developers, researchers, and data scientists from the United States, Canada, and the European Union (EU). Over the course of eight sessions, attendees heard from expert panels on key aspects of de-identification. Although it’s not possible to capture every aspect of these discussions in this short blog, readers can now view the archived meeting.
The bottom line is that there is not a turnkey, off-the-shelf solution that works in every situation and for every image. Still, we have made a lot of progress. De-identification continues to receive substantial research interest, as evidenced by the workshop presentations.
The Medical Imaging De-Identification (MIDI) Workshop Overview
Session 1: Report of the MIDI Task Group—Best Practices and Recommendations
- Summary: David Clunie, MBBS, Chairperson of NCI’s MIDI Task Group, kicked off the event with a summary of best practices and recommendations from the Task Group Report, followed by a Q&A session.
- Key Takeaway: Sharing medical images to stimulate data science and AI research requires due diligence in de-identifying images, both to protect the patient’s privacy and to preserve research-critical information. The MIDI Task Group recommends best practices for image de-identification.
Session 2: Tools for Conventional Approaches to De-Identification
- Summary: Fred Prior, Ph.D., University of Arkansas for Medical Sciences (UAMS), chaired a discussion on current methods for de-identifying images. Michael Rutherford, M.S., also of UAMS, spotlighted the tools used for images in The Cancer Imaging Archive, a repository that adheres to zero-tolerance de-identification. Stephen Moore, M.S., Washington University School of Medicine in St. Louis, presented XNAT, an open-source imaging informatics platform.
- Key Takeaway: Established methods and protocols are available to help you with imaging de-identification.
Session 3: International Approaches to De-Identification
- Summary: William Parker, M.D., University of British Columbia, chaired this panel focusing on de-identification practices outside of the United States. Haridimos Kondylakis, Ph.D., Institute of Computer Science, Foundation of Research & Technology, described his experiences applying AI tools to medical imaging in EU projects. Christian Ludwigs, M.Sc., Aigora GmbH, shared the legal framework and best practices for de-identification in the EU.
- Key Takeaway: The tools, resources, and infrastructure needed for de-identification must comply with local regulations. The EU, in particular, has made significant strides in finding solutions to de-identify data that comply with EU regulations.
Session 4: Industry Panel on Image De-Identification
- Summary: Juergen Klenk, Ph.D., Deloitte Consulting, chaired this panel featuring five industry representatives. Each panelist gave a flash presentation on their approaches to de-identification. Panelists included Abraham Gutman, M.S., AG Mednet; Dan Marcus, Ph.D., Flywheel; Bob Lou, M.D., Google; Lawrence “Tony” O’Sullivan, M.S., IBIS; and Jiri Dobes, Ph.D., John Snow Labs.
- Key Takeaway: De-identifying data is a topic of interest to industry today, and several innovative solutions have come from large and small companies.
Session 5: Pathology Whole Slide Image De-Identification
- Summary: Adam Taylor, Ph.D., Sage Bionetworks, chaired this discussion on de-identifying whole slide images. Tom Bisson, Ph.D., Charité Universitätsmedizin Berlin, showed how whole slide images can be de-identified for research and education. David Gutman, M.D., Ph.D., Emory University, gave examples of open-source tools for de-identifying histology images.
- Key Takeaway: Histology images may be among the easier image types to de-identify, and some open-source tools are readily available.
Session 6: De-Facing
- Summary: Ying Xiao, Ph.D., Hospital of the University of Pennsylvania, chaired a panel on how removing facial information (de-facing) impacts the usefulness of head and neck cross-sectional or brain images. Christopher Schwarz, Ph.D., Mayo Clinic, presented on risks of face recognition and de-identification of head images using the tool, “mri_reface.” Douglas Greve, Ph.D., Massachusetts General Hospital/Harvard, discussed his experience using MIDEFACE, a minimally invasive de-facing approach.
- Key Takeaway: We need to be sure that when we remove facial information, we don’t compromise the usefulness of the image for secondary analyses.
Session 7: The Role of AI in Image De-Identification
- Summary: Judy Wawira Gichoya, M.D., Emory University, chaired this panel examining AI algorithms in de-identification. George Shih, M.D., Weill Cornell Medical College, discussed pixel de-identification using AI. Adrienne Kline, M.D., Ph.D., Northwestern University, gave an example of PyLogik, an open-source resource for medical image de-identification.
- Key Takeaway: AI-enabled tools may offer promising solutions for de-identification, including a new, open-source tool called PyLogik. However, AI may also carry re-identification risks, because it can pick up on certain pseudo-identifiers, such as race and body shape.
Session 8: NCI MIDI Data Sets and Pipeline
- Summary: This session, chaired by Keyvan Farahani, introduced NCI’s MIDI initiative, which aims to develop a cloud-based pipeline for image de-identification, as well as medical image data sets with synthetic (i.e., simulated) patient identifiers. Presenters included Ben Kopchick, Ph.D., Deloitte Consulting, and Fred Prior, Ph.D., University of Arkansas for Medical Sciences.
- Key Takeaway: Although we’ve made a lot of progress, we still need to find more semi-automated and scalable methods for de-identifying medical images. Through the MIDI Pipeline project, we’re working to develop a cloud-based, semi-automated, and scalable workflow for de-identifying images using synthetic and real patient-identifying data.
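One reason data sets with synthetic patient identifiers are useful is that the planted values are known in advance, so a de-identification tool's output can be checked for anything that slipped through. The sketch below illustrates that idea only. It is not the MIDI pipeline, and the function name, the planted values, and the output header are all hypothetical.

```python
def find_leaks(deidentified_header, synthetic_identifiers):
    """Return (attribute, identifier) pairs where a planted value survived.

    The comparison is a case-insensitive substring match, so an identifier
    embedded in free text (e.g., a study description) is also caught.
    """
    leaks = []
    for key, value in deidentified_header.items():
        for ident in synthetic_identifiers:
            if ident.lower() in str(value).lower():
                leaks.append((key, ident))
    return leaks


# Hypothetical synthetic identifiers planted before de-identification.
planted = ["ZZTEST^PATIENT", "SYN-000042"]

# Hypothetical output of a de-identification tool: the patient ID was
# cleared, but the name survived inside a free-text field.
output_header = {
    "Modality": "MR",
    "StudyDescription": "Brain MR for ZZTEST^PATIENT",
    "PatientID": "",
}

leaks = find_leaks(output_header, planted)
print(leaks)  # [('StudyDescription', 'ZZTEST^PATIENT')]
```

An empty result would mean none of the planted values were found in the header. It would not show that the image is free of identifiers elsewhere, such as text burned into the pixels.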