Google has introduced a new privacy testing library in TensorFlow that lets developers analyse the privacy properties of classification models. It will become part of TensorFlow Privacy, which was launched in 2019 to bring privacy techniques to AI models. Public awareness of privacy is higher than ever and still growing, as companies come under scrutiny from experts over how they collect and process users’ data. These circumstances have pushed governments across the world to devise privacy protection laws such as GDPR, PDP and CCPA. Consequently, organisations have become more critical of their AI models’ outcomes.
One of the biggest challenges for companies in maintaining privacy is avoiding the leakage of information from AI models. To mitigate such leaks, Google introduced differential privacy, which adds noise to hide individual examples in the training dataset. According to Google’s researchers, however, it was designed for academic worst-case scenarios and can significantly affect model accuracy. Meanwhile, researchers from Cornell University experimented with various approaches to assessing the privacy of ML models and came up with membership inference attacks.
Membership Inference Attack With TensorFlow
According to Google’s researchers, a membership inference attack is a cost-effective method that predicts whether a specific piece of data was used during training. The technique has seen a wide range of applications in recent years, especially in the privacy domain. In April 2020, it inspired work by the University of Edinburgh and the Alan Turing Institute on determining whether a model can forget data to ensure privacy.
After using membership inference tests internally, researchers from Google have now released the technique as a library with TensorFlow. One of its most significant advantages is its simplicity: it does not require any re-training, thereby avoiding disruption to developers’ workflows.
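To make the idea concrete, here is a minimal, self-contained sketch of a loss-threshold membership inference attack in NumPy. This is not the TensorFlow Privacy API; the function name and the synthetic loss distributions are invented for illustration. The premise is that a model tends to assign lower loss to examples it was trained on, so an attacker can guess membership simply by thresholding the loss:

```python
import numpy as np

def threshold_membership_attack(train_losses, test_losses):
    """Conceptual loss-threshold membership inference attack.

    Members (training examples) tend to have lower loss than
    non-members, so the attacker guesses 'member' whenever the loss
    falls below a threshold. Returns the best attack accuracy over
    all candidate thresholds: a score near 0.5 means the model leaks
    little, a score near 1.0 means heavy leakage.
    """
    losses = np.concatenate([train_losses, test_losses])
    # Label 1 = member (seen during training), 0 = non-member.
    labels = np.concatenate([np.ones_like(train_losses),
                             np.zeros_like(test_losses)])
    best_acc = 0.0
    for t in np.unique(losses):
        guesses = (losses <= t).astype(float)
        best_acc = max(best_acc, (guesses == labels).mean())
    return best_acc

rng = np.random.default_rng(0)
# A model that memorises: training losses far below test losses.
leaky = threshold_membership_attack(rng.normal(0.1, 0.05, 1000),
                                    rng.normal(1.0, 0.2, 1000))
# A model that generalises: the two loss distributions overlap.
private = threshold_membership_attack(rng.normal(0.5, 0.2, 1000),
                                      rng.normal(0.55, 0.2, 1000))
```

Note that the attack only needs the model's per-example losses, which is why no re-training is required.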
The researchers tested the membership inference attack on models trained on CIFAR-10 (Canadian Institute For Advanced Research), an object classification dataset. The dataset contains 60,000 32×32 colour images in 10 classes, representing aeroplanes, cars, birds and trucks, among others. “The test produced the vulnerability score that determines whether the model leaks information from the training set. We found that this vulnerability score often decreases with heuristics such as early stopping or using DP-SGD for training,” researchers from Google wrote on the TensorFlow blog.
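The blog post does not spell out how the vulnerability score is computed, so the following is only a plausible sketch of one common choice: scoring the attack by its AUC, where 0.5 means the attacker does no better than chance and 1.0 means members and non-members are perfectly separable by their losses. The function name and formulation are assumptions, not TensorFlow Privacy code:

```python
import numpy as np

def attack_auc(train_losses, test_losses):
    """AUC of a loss-based membership attack, used here as a
    stand-in 'vulnerability score': 0.5 ~ no leakage, 1.0 ~ the
    attacker perfectly separates members from non-members.
    (Illustrative only, not the library's scoring code.)
    """
    # Estimate P(random member loss < random non-member loss)
    # by comparing every (member, non-member) pair via broadcasting.
    train = np.asarray(train_losses)[:, None]
    test = np.asarray(test_losses)[None, :]
    return np.mean(train < test)
```

For example, `attack_auc([0.1, 0.2], [0.3, 0.4])` returns 1.0, because every member loss is below every non-member loss.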
How Will It Help
Determining whether a given example was present in the training data will allow developers to check whether their models preserve privacy before deploying them in production. The researchers believe that with the membership inference attack feature in TensorFlow, data scientists will explore better architecture choices for their models and use regularisation techniques such as early stopping, dropout, weight decay and input augmentation.
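As a small illustration of one of those heuristics, here is a plain-Python sketch of early stopping; the function and its arguments are hypothetical, not library code. Training halts once the validation loss stops improving, which limits how much the model memorises its training set, and memorisation is exactly what a membership inference attack exploits:

```python
def early_stopping(val_losses, patience=3):
    """Return the epoch (index) at which training would stop under
    simple early stopping: halt once the validation loss has failed
    to improve for `patience` consecutive epochs.
    (Illustrative sketch, not TensorFlow code.)
    """
    best, best_epoch, bad = float("inf"), 0, 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best, best_epoch, bad = loss, epoch, 0
        else:
            bad += 1
            if bad >= patience:
                break  # stop before the model starts memorising
    return best_epoch

# Validation loss improves until epoch 2, then degrades:
# training stops and the epoch-2 model is kept.
stop_epoch = early_stopping([1.0, 0.8, 0.7, 0.75, 0.76, 0.77, 0.5])
```

In practice, frameworks provide this behaviour out of the box, e.g. the `EarlyStopping` callback in Keras.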
In addition, the researchers hope that membership inference attacks will become a starting point for the community to introduce new architectures that reduce such leaks and, in turn, preserve privacy.
Currently, the membership inference attack is limited to classifiers; in future, the researchers plan to extend its capabilities so that developers can combine it with other data science techniques.
Privacy is gradually becoming central to machine learning models, as it has drawn concerns from around the world. Taking a different approach, Julia Computing, in late 2019, demonstrated training ML models on homomorphically encrypted data for privacy, and PyTorch introduced CrypTen for processing data privately with secure multi-party computation. With the membership inference attack, however, TensorFlow has opened up new possibilities for developers to better examine their ML models and build trust among users.