Artificial intelligence (AI) is increasingly being adopted in health care, and maintaining privacy when using patient data for machine learning is a critical factor affecting adoption rates in the industry. Recently, a global research collaboration led by scientists from the Technical University of Munich (TUM) created an open-source AI framework that provides end-to-end privacy for deep learning on multi-institutional health care imaging data; the peer-reviewed study was published in Nature Machine Intelligence.
Machine learning, the engine of modern AI, is gaining traction in many industries, especially health care. Globally, private investment in AI for improving human health totaled USD 13.8 billion in 2020, a 4.5-fold increase from the year prior, according to The AI Index 2021 report by Stanford University’s Human-Centered Artificial Intelligence Institute (HAI). Innovative solutions are needed in health care to preserve patient privacy when training machine learning models without transferring raw data.
A global team of scientists from Germany, the United Kingdom, the United States, France, and Brazil jointly created an open-source software framework for end-to-end privacy-preserving decentralized deep learning using multi-institutional medical imaging data.
Why Data Anonymization Alone is Not Sufficient
“It’s common for patient data to be anonymized or pseudonymized at the originating institution, then transmitted to and stored at the site of analysis and model training (known as centralized data sharing),” wrote the researchers. “However, anonymization has proven to provide insufficient protection against re-identification attacks. Therefore, large-scale collection, aggregation and transmission of patient data is critical from a legal and an ethical viewpoint. Furthermore, it is a fundamental patient right to be in control of the storage, transmission and usage of personal health data.”
The team’s approach is a framework called PriMIA (Privacy-preserving Medical Image Analysis). “Our framework incorporates differentially private federated model training with encrypted aggregation of model updates as well as encrypted remote inference,” the researchers wrote.
According to the researchers, PriMIA is easily configurable by the user, supports many medical imaging data formats, and extends federated training with features that include diverse data augmentation, weighted gradient descent/federated averaging, local early stopping, differentially private exchange of dataset statistics, and federation-wide hyperparameter optimization.
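One of the features listed above, differentially private release of values computed from local data, is commonly implemented by clipping a value's L2 norm and adding calibrated Gaussian noise before it leaves the institution. The sketch below shows this generic DP-SGD-style mechanism; it is illustrative only and is not PriMIA's exact implementation, and the `clip_norm` and `noise_multiplier` parameters are hypothetical hyperparameters.

```python
import numpy as np

def privatize_update(update, clip_norm=1.0, noise_multiplier=1.1, rng=None):
    """Clip an update's L2 norm, then add Gaussian noise (DP-SGD style).

    Generic sketch of differentially private update release, not
    PriMIA's exact mechanism; parameters are illustrative.
    """
    rng = rng or np.random.default_rng(0)
    norm = np.linalg.norm(update)
    # Scale the update down so its L2 norm is at most clip_norm.
    clipped = update * min(1.0, clip_norm / max(norm, 1e-12))
    # Noise scale is proportional to the clipping bound.
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=update.shape)
    return clipped + noise

update = np.array([3.0, 4.0])  # raw update with L2 norm 5
private = privatize_update(update)
```

Clipping bounds each participant's influence on the aggregate, which is what makes the added noise sufficient to mask any single patient's contribution.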
Why Federated Learning
Federated learning is a method that enables AI models to be trained on distributed datasets without participants handing over their data: each site trains locally and shares only model updates with an aggregator. The advantages of federated learning are that it improves privacy, reduces liability, and reduces bandwidth, since massive datasets never have to be uploaded; the training data stays on each participant’s own hardware.
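The aggregation step at the heart of federated learning, and one of the features PriMIA supports, is weighted federated averaging (FedAvg): the server combines each client's parameters in proportion to how much data that client holds. A minimal sketch, with hypothetical client weights and sample counts:

```python
import numpy as np

def federated_average(client_weights, client_sizes):
    """Weighted average of client model parameters (FedAvg).

    client_weights: one list of np.ndarray layers per client
    client_sizes: number of training samples held by each client
    """
    total = sum(client_sizes)
    num_layers = len(client_weights[0])
    averaged = []
    for layer in range(num_layers):
        # Each client's layer contributes in proportion to its data share.
        layer_sum = sum(
            w[layer] * (n / total)
            for w, n in zip(client_weights, client_sizes)
        )
        averaged.append(layer_sum)
    return averaged

# Example: two clients with a single-layer "model"
client_a = [np.array([1.0, 2.0])]  # holds 100 samples
client_b = [np.array([3.0, 4.0])]  # holds 300 samples
global_model = federated_average([client_a, client_b], [100, 300])
# Weights 0.25 and 0.75 → [2.5, 3.5]
```

In a real federation these parameters would be entire neural-network layers, and, as described below, the averaging itself can be performed over encrypted updates so the server never sees any client's model in the clear.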
Deep Convolutional Neural Network on Real-Life Data
The training dataset, a pediatric pneumonia collection of more than 5,100 images, was reviewed by a radiologist for image quality and representativeness. The validation dataset consisted of more than 600 images.
The team trained a ResNet-18 deep convolutional neural network (CNN) on pediatric chest radiographs using federated learning (FL) through PriMIA, which is compatible with a wide variety of medical imaging formats.
PriMIA is an extension to the PySyft and PyGrid open-source privacy-preserving machine learning (PPML) tools. PySyft is a Python framework that enables encrypted deep learning and remote task execution. PyGrid is a peer-to-peer platform that enables data owners to manage their own private data clusters for private data science and federated learning.
To test the AI framework, the team collected more than 490 chest radiographs from two university hospitals. The team reported that the federated model’s classification performance was “on par with locally, non-securely trained models.”
According to the scientists, they demonstrated not only that “the protections provided prevent the reconstruction of usable data by a gradient-based model inversion attack,” but also that they “successfully employ the trained model in an end-to-end encrypted remote inference scenario using secure multi-party computation to prevent the disclosure of the data and the model.”
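A core building block behind secure multi-party computation of the kind described above is additive secret sharing: each party splits its value into random shares, the shares are summed locally, and only the combined total is ever reconstructed. The sketch below illustrates the idea with small integers standing in for quantized model updates; it is a simplified teaching example, not PriMIA's actual protocol, and the field modulus and values are hypothetical.

```python
import random

PRIME = 2**61 - 1  # field modulus; illustrative choice

def share(secret, n_parties, rng=None):
    """Split an integer into n additive shares modulo PRIME."""
    rng = rng or random.Random(42)
    shares = [rng.randrange(PRIME) for _ in range(n_parties - 1)]
    # The last share is chosen so all shares sum to the secret mod PRIME.
    shares.append((secret - sum(shares)) % PRIME)
    return shares

def reconstruct(shares):
    """Recover the secret by summing all shares modulo PRIME."""
    return sum(shares) % PRIME

# Two hospitals' (hypothetical) quantized update values
a_shares = share(1234, 3)
b_shares = share(5678, 3)
# Each of the 3 compute parties adds its two shares locally;
# no single party ever sees either hospital's original value.
sum_shares = [(x + y) % PRIME for x, y in zip(a_shares, b_shares)]
assert reconstruct(sum_shares) == 1234 + 5678
```

Because any subset of fewer than all the shares is uniformly random, an aggregator learns only the final sum, which is what makes encrypted aggregation and encrypted inference possible.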
Copyright © 2021 Cami Rosso All rights reserved.