A GAN Model That Lets You Share Data Anonymously

  • FELICIA allows stakeholders to collaborate in a private and distributed data-sharing environment.

The Goldilocks dilemma is a common challenge in medicine: sharing medical data widely enough for collaborative learning and problem-solving while ensuring that patients’ privacy is not compromised. Several countries have stringent laws and policies to protect patient data, which, although necessary, often make it harder to build better diagnostics and, eventually, to deliver better care.

To overcome this challenge, researchers at Microsoft and the University of British Columbia have developed Federated Learning with a Centralized Adversary, or FELICIA, a framework that combines the generative adversarial network (GAN) model with a ‘centralized adversary’. The framework allows stakeholders to collaborate in a private and distributed data-sharing environment.

Data-Sharing Dilemma

Image analysis is an important part of medical research for building high-end diagnostic models of diseases. Deep learning is increasingly used in modern medical computer vision tasks such as disease detection, classification, and biomedical segmentation.

However, building such tools and models calls for exposure to a large number of medical cases. Images from a single source may carry biases in terms of demographics, equipment, and means of acquisition. A model trained on such a source may perform poorly on other populations. The solution to this challenge is to obtain additional information from other data owners, provided their source data are kept private.

With FELICIA, the researchers showed that a generative mechanism enabling collaborative learning could make this possible. The research showed that a data owner with fewer image samples can generate ‘high-quality synthetic images with high utility’ without any owner sharing its source data.

How Does FELICIA Work?

Considerable work has been done in the recent past to explore sharing private data, and one of the common approaches is the use of GANs. A GAN has two components: a generator G and a discriminator D. The generator creates samples, and the discriminator attempts to differentiate between the generated and the real-world samples.
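The tug-of-war between G and D can be made concrete with a small sketch. It assumes the common binary cross-entropy formulation with a non-saturating generator loss, which is one standard choice and not necessarily the one used in the FELICIA paper. The function names and the probability values are illustrative only.

```python
import math

EPS = 1e-12  # guards against log(0); an implementation detail of this sketch


def discriminator_loss(d_real, d_fake):
    """Cross-entropy loss for D.

    d_real: D's probabilities of "real" on real samples (D wants these near 1).
    d_fake: D's probabilities of "real" on generated samples (D wants these near 0).
    """
    real_term = -sum(math.log(max(p, EPS)) for p in d_real) / len(d_real)
    fake_term = -sum(math.log(max(1.0 - p, EPS)) for p in d_fake) / len(d_fake)
    return real_term + fake_term


def generator_loss(d_fake):
    """Non-saturating loss for G: G wants D to score its samples near 1."""
    return -sum(math.log(max(p, EPS)) for p in d_fake) / len(d_fake)


# A discriminator that separates real from generated samples well...
confident = {"d_real": [0.9, 0.8], "d_fake": [0.1, 0.2]}
# ...and one that cannot tell them apart.
fooled = {"d_real": [0.5, 0.5], "d_fake": [0.5, 0.5]}

print(discriminator_loss(**confident), generator_loss(confident["d_fake"]))
print(discriminator_loss(**fooled), generator_loss(fooled["d_fake"]))
```

When the discriminator is confident, its own loss is low and the generator's loss is high; when it is fooled, the situation reverses. Training alternates between the two so that the generator's samples become progressively harder to distinguish from real ones.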

PrivGAN is a GAN architecture that was originally developed to generate synthetic data while protecting against membership inference attacks. The FELICIA model is heavily motivated by the PrivGAN architecture. It extends the PrivGAN model to a federated learning setting, which refers to training algorithms across decentralised devices that hold the data samples, without actually exchanging the samples.
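One way to picture the role of a central adversary in this setting is the sketch below. It is an illustration under assumptions, not the objective from the PrivGAN or FELICIA papers: it assumes the adversary sees only synthetic samples, outputs a probability for each site that might have generated a sample, and that each generator is penalised in proportion to how well the adversary identifies its site. The weight `lam` and both function names are hypothetical.

```python
import math

EPS = 1e-12  # guards against log(0); an implementation detail of this sketch


def privacy_discriminator_loss(site_probs, true_site):
    """Cross-entropy for the central adversary.

    site_probs: one probability vector per synthetic sample, one entry per site.
    true_site:  index of the site whose generator produced each sample.
    The adversary lowers this loss by naming the correct site.
    """
    total = sum(math.log(max(p[s], EPS)) for p, s in zip(site_probs, true_site))
    return -total / len(true_site)


def generator_privacy_penalty(site_probs, true_site, lam=1.0):
    """Term added to the generators' loss (assumed form).

    It is the negative of the adversary's loss scaled by lam, so it is
    highest when the adversary can tell which site a sample came from.
    """
    return -lam * privacy_discriminator_loss(site_probs, true_site)


# Two sites, two synthetic samples (one from each site).
owners = [0, 1]
identifiable = [[0.9, 0.1], [0.2, 0.8]]   # adversary recognises the source
indistinct = [[0.5, 0.5], [0.5, 0.5]]     # adversary is guessing

print(generator_privacy_penalty(identifiable, owners))
print(generator_privacy_penalty(indistinct, owners))
```

In this sketch only the adversary's outputs on generated samples enter the calculation, which mirrors the idea that real images stay with their owners.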

The FELICIA model involves the creation of synthetic data that allows multiple downstream use cases and exploration. This synthetic data has more utility than any single dataset and can also be used as a benchmark for machine learning in healthcare. For the experiment, the researchers simulated two hospitals with different populations, operating under highly restrictive regulation that prevents the sharing of image data.

The FELICIA mechanism involves:

  • Duplication of the discriminator and generator architecture of the base GAN to each of the component D-G pairs of FELICIA
  • Selection of a privacy discriminator identical in architecture to the other discriminators
  • Training the base GAN on the entire training data to generate synthetic yet realistic images
  • Optimising the model outcomes with the base GAN hyperparameters to obtain ‘good-looking’ samples
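The first two steps above can be sketched structurally as follows. This is a minimal illustration, not the authors' code: `NetSpec`, `SiteModels`, `build_felicia`, the layer widths, and the site names are all hypothetical, and giving the privacy discriminator one output per site is an assumption made here so that it can name a source site.

```python
from dataclasses import dataclass, replace
from typing import List, Tuple


@dataclass(frozen=True)
class NetSpec:
    """Hypothetical description of a network: a name, hidden widths, output size."""
    name: str
    hidden: Tuple[int, ...]
    outputs: int


@dataclass(frozen=True)
class SiteModels:
    """The discriminator-generator pair held by one data owner."""
    site: str
    generator: NetSpec
    discriminator: NetSpec


def build_felicia(base_g: NetSpec, base_d: NetSpec, sites: List[str]):
    """Duplicate the base GAN for every site and add one central privacy discriminator."""
    pairs = [
        SiteModels(
            site=s,
            generator=replace(base_g, name=f"G_{s}"),
            discriminator=replace(base_d, name=f"D_{s}"),
        )
        for s in sites
    ]
    # Same hidden architecture as the other discriminators; the output size
    # (one per site) is an assumption of this sketch.
    privacy_d = replace(base_d, name="D_privacy", outputs=len(sites))
    return pairs, privacy_d


base_generator = NetSpec("G_base", hidden=(128, 256), outputs=784)
base_discriminator = NetSpec("D_base", hidden=(256, 128), outputs=1)

pairs, privacy_d = build_felicia(
    base_generator, base_discriminator, ["hospital_a", "hospital_b"]
)
for pair in pairs:
    print(pair.site, pair.generator.name, pair.discriminator.name)
print(privacy_d)
```

Keeping every component tied to one base specification means that hyperparameters tuned on the base GAN carry over to each site's pair without change.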

Wrapping Up

FELICIA can be implemented with a wide variety of GANs, depending on the type of data and the use case. The researchers said one of its most relevant use cases is during pandemics such as COVID-19. Hospitals could benefit greatly, especially at the beginning of an outbreak, from data gathered in affected regions.

Another application of FELICIA is improving diagnostics, for example on cancer pathology images, by augmenting the image dataset.

The team is now working on implementing FELICIA with progressive GAN. With this, the researchers hope to generate highly complex medical data such as X-rays, CT scans, and histopathology slides in federated learning settings using samples obtained from non-local data owners.

Read the full paper here.


Copyright Analytics India Magazine Pvt Ltd
