With state-of-the-art services now available through APIs a click away, setting up a machine learning shop has become far more accessible. But this rapid democratisation carries a risk: players new to ML who have jumped the gun may find themselves facing a flurry of privacy attacks they have never heard of before.
In a first-of-its-kind survey on ML privacy, a team from the Czech Technical University examines the different ways an ML application can be vulnerable. In privacy-related attacks, the researchers write, the adversary's goal is to gain knowledge that was not intended to be shared, such as knowledge about the training data, information about the model, or properties of the data.
We list a few of the commonly encountered privacy concerns below:
via paper by Maria Rigaki & Sebastian Garcia
Black-box attacks are those in which the adversary does not know the model parameters, architecture or training data. Today, internet companies continuously use personal data to train the models that power their machine learning-based applications. These models are expected not to reveal information about the data used to train them. However, attackers can still exploit information that the model has learned unintentionally.
In the white-box case, the adversary has complete access either to the target model's parameters or to its loss gradients during training. This setting is commonly seen in most distributed modes of training.
Each of the following attacks can be mounted in either the black-box or the white-box setting:
Membership Inference Attacks
According to the survey, this is the most popular category of attacks. Typically mounted as a black-box attack against supervised machine learning models, membership inference tries to determine whether an input sample was part of the training set. When the adversary also has access to model parameters and gradients, the accuracy of white-box membership inference improves. In the case of generative models such as GANs, the goal is to retrieve information about the training data using varying degrees of knowledge of the data-generating components.
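The idea can be illustrated with a minimal sketch of a confidence-thresholding membership inference attack. This is not the survey's method. It is a toy under stated assumptions: the target (`MemorisingModel`), the synthetic data and the threshold of 0.9 are all hypothetical, the target is deliberately built to overfit, and the attacker is assumed to know the true label of each candidate sample and to see only the model's output probabilities.

```python
import math
import random


def make_data(n, rng):
    """Sample n labelled 2-D points from two heavily overlapping Gaussian blobs."""
    data = []
    for _ in range(n):
        label = rng.randint(0, 1)
        centre = 1.0 if label == 1 else -1.0
        point = (rng.gauss(centre, 2.0), rng.gauss(0.0, 2.0))
        data.append((point, label))
    return data


class MemorisingModel:
    """Hypothetical target: a narrow-kernel classifier that overfits its training set."""

    def __init__(self, train, bandwidth=0.05):
        self.train = train
        self.bandwidth = bandwidth

    def predict_proba(self, x):
        """Return [P(class 0), P(class 1)]; this is all the attacker can observe."""
        weights = [0.0, 0.0]
        for (tx, ty) in self.train:
            dist2 = (x[0] - tx[0]) ** 2 + (x[1] - tx[1]) ** 2
            weights[ty] += math.exp(-dist2 / (2 * self.bandwidth ** 2))
        total = weights[0] + weights[1]
        if total == 0.0:  # far from every training point
            return [0.5, 0.5]
        return [w / total for w in weights]


def infer_membership(model, x, label, threshold=0.9):
    """Guess 'member' when the model is very confident in the true label."""
    return model.predict_proba(x)[label] >= threshold


def run_attack(seed=0, n=200):
    """Return the attack's accuracy on a balanced set of members and non-members."""
    rng = random.Random(seed)
    members = make_data(n, rng)      # used to build the target model
    non_members = make_data(n, rng)  # same distribution, never seen by the target
    model = MemorisingModel(members)
    correct = sum(infer_membership(model, x, y) for x, y in members)
    correct += sum(not infer_membership(model, x, y) for x, y in non_members)
    return correct / (2 * n)


if __name__ == "__main__":
    print("attack accuracy:", run_attack())
```

Random guessing would score about 0.5 on this balanced set. The attack does better only because the toy model is far more confident on points it has memorised, which is why overfitting is usually described as a driver of membership leakage.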
Reconstruction Attacks
These attacks try to recreate one or more training samples and/or their respective training labels. A well-documented attack discussed in this context is adversarial reprogramming, in which a model is repurposed to perform a new task. An adversarial program can be thought of as an additive contribution to the network input. An additive offset to a neural network's input is equivalent to a modification of its first-layer biases. In the case of a CNN, new parameters are effectively introduced.
Such tiny changes to the network constitute an adversarial program. An attacker may try to adversarially reprogram a model across tasks with very different datasets. The potential for abuse is considerable: it ranges from theft of computational resources from public-facing services, to abuse of machine learning services for tasks that violate the ethical principles of system providers, to repurposing AI-driven assistants as spies or spambots.
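The claim that an additive input offset is equivalent to a change in the first-layer biases can be checked numerically for a fully connected first layer, since W(x + δ) + b = Wx + (b + Wδ). The sketch below assumes a single dense layer; the weights, input and offset `delta` are hypothetical values chosen only for illustration.

```python
import math


def dense(weights, bias, x):
    """Fully connected layer without activation: returns W @ x + b as a list."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


# Hypothetical first layer (2 units, 3 inputs), input, and adversarial offset.
W = [[0.5, -1.0, 2.0],
     [1.5, 0.25, -0.75]]
b = [0.1, -0.2]
x = [1.0, 2.0, 3.0]
delta = [0.3, -0.4, 0.05]

# Route 1: add the offset to the input and keep the original biases.
out_offset = dense(W, b, [xi + di for xi, di in zip(x, delta)])

# Route 2: leave the input alone and fold the offset into the biases.
shifted_bias = dense(W, b, delta)  # equals b + W @ delta
out_bias = dense(W, shifted_bias, x)

same = all(math.isclose(p, q, abs_tol=1e-9) for p, q in zip(out_offset, out_bias))
print("outputs match:", same)
```

Both routes produce the same pre-activations, so an attacker who can only alter the input gets the same effect as editing the first-layer biases.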
Property Inference Attacks
Property inference is the ability to extract dataset properties that were not explicitly encoded as features or were not correlated with the learning task. One example is extracting the ratio of women to men in a patient dataset in which that information is unlabelled.
The extracted information is usually unrelated to the training task and is learned by the model unintentionally. Even well-generalised models may learn properties that are relevant to the whole input data. From the adversary's perspective, the properties of interest may concern the specific subset of data that was used for training, or a particular individual.
Model Extraction Attacks
The adversary here is interested in creating a substitute that learns the same task as the target model, equally well or better. The objective of the model extraction attack is to create an alternative that replicates the decision boundary of the model as faithfully as possible.
Model extraction attacks can also serve as a doorway to other adversarial attacks. Apart from creating substitute models, there are approaches that focus on recovering information about the target model, such as its hyper-parameters or architectural properties like activation types, the optimisation algorithm and the number of layers.
In all the attacks discussed above, the adversary's knowledge can range from mere access to a machine learning API to knowledge of the full model parameters and training settings. Between these two extremes lies a range of possibilities, such as partial knowledge of the model architecture, its hyper-parameters or its training setup. That said, the authors of this work posit that these attacks can also be leveraged for positive outcomes, such as auditing black-box models to check for the data owner's authorisation.
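For very simple targets, extraction can be exact. The sketch below illustrates an equation-solving style of extraction against a hypothetical prediction API that exposes a logistic regression and returns full-precision confidence scores. Both are assumptions made for illustration; real APIs may round scores or return labels only. All function names and the secret parameters are invented.

```python
import math


def make_target_api(weights, bias):
    """Hypothetical prediction API returning P(y=1 | x) of a hidden logistic regression."""
    def predict_proba(x):
        z = sum(w * xi for w, xi in zip(weights, x)) + bias
        return 1.0 / (1.0 + math.exp(-z))
    return predict_proba


def logit(p):
    """Inverse of the sigmoid: recovers the linear score z from a probability."""
    return math.log(p / (1.0 - p))


def extract_logistic_regression(predict_proba, n_features):
    """Recover (weights, bias) using n_features + 1 queries to the API."""
    origin = [0.0] * n_features
    bias = logit(predict_proba(origin))  # logit(p(0)) = b
    weights = []
    for i in range(n_features):
        unit = origin.copy()
        unit[i] = 1.0
        # logit(p(e_i)) = w_i + b, so subtracting b leaves w_i
        weights.append(logit(predict_proba(unit)) - bias)
    return weights, bias


if __name__ == "__main__":
    api = make_target_api([1.5, -2.0, 0.5], 0.25)  # parameters unknown to the attacker
    stolen_w, stolen_b = extract_logistic_regression(api, n_features=3)
    print("stolen weights:", stolen_w)
    print("stolen bias:", stolen_b)
```

With four queries the substitute reproduces the target's decision boundary up to floating-point error. Larger or non-linear models generally need many more queries and a trained substitute rather than a closed-form solve.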