Adversarial examples are inputs to machine learning models that an attacker has intentionally designed to cause the model to make a mistake.
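The classic recipe for crafting such an input is the fast gradient sign method (FGSM): nudge every input feature slightly in the direction that increases the model's loss. A minimal sketch, using a toy hand-built logistic-regression "model" (all weights and names here are illustrative, not from any real system):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# A toy binary classifier: score = sigmoid(w . x + b)
w = np.array([1.0, -2.0, 0.5])
b = 0.1

def predict(x):
    return sigmoid(w @ x + b)

def fgsm(x, y, eps=0.3):
    """FGSM: move each feature by eps in the sign of the loss gradient.

    For logistic loss with label y, the gradient w.r.t. x is (p - y) * w.
    """
    p = predict(x)
    grad = (p - y) * w
    return x + eps * np.sign(grad)

x = np.array([0.2, 0.1, -0.4])
y = 1.0  # true label
x_adv = fgsm(x, y)
# The adversarial copy scores lower on the true class than the original.
print(predict(x), predict(x_adv))
```

The perturbation is small per feature (bounded by `eps`), yet it reliably pushes the score away from the correct label, which is exactly the "intentionally designed mistake" described above.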
An adversarial attacker could target autonomous vehicles by using stickers or paint to create an adversarial stop sign that the vehicle would interpret as a ‘yield’ or other sign. A confused car on a busy day is a potential catastrophe packed in a 2,000-pound metal box.
In the majority of adversarial attacks so far, the attacker designed small perturbations to change the model’s output for a given input. Most of these were untargeted attacks, which aim only to degrade a model’s performance without forcing it to produce any specific output.
Targeted attacks also exist: an attack against a classifier could target a specific desired output class for each input image, while an attack against a reinforcement learning agent could induce that agent to enter a specific state.
In practice, adversarial attacks need not stick to textbook cases. It is therefore the responsibility of an ML practitioner to identify the blind spots in these systems by being proactive and designing attacks that expose their flaws.
To introduce a novel and more challenging adversarial goal, Ian Goodfellow, the creator of GANs, collaborated with two other researchers from Google on a paper that demonstrates adversarial reprogramming. The objective here is to make the model perform a task chosen by the attacker, without the attacker needing to compute the specific desired output.
What Is Adversarial Reprogramming?
Adversarial Reprogramming is a class of attacks where a model is repurposed to perform a new task.
An adversarial program can be thought of as an additive contribution to the network’s input. An additive offset to a neural network’s input is equivalent to a modification of its first-layer biases, and in the case of a convolutional neural network, new parameters are effectively introduced. These small additions to the network constitute the adversarial program.
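A minimal sketch of this idea (illustrative, not the paper’s exact code): the program is a learnable perturbation painted into the border region around a small task image embedded at the centre of the network-sized input.

```python
import numpy as np

NET_SIZE, TASK_SIZE = 224, 28  # e.g. an ImageNet input vs. an MNIST digit

# The mask is zero over the central patch where the task image sits,
# and one wherever the adversarial program is allowed to write.
mask = np.ones((NET_SIZE, NET_SIZE))
lo = (NET_SIZE - TASK_SIZE) // 2
mask[lo:lo + TASK_SIZE, lo:lo + TASK_SIZE] = 0.0

W = np.random.randn(NET_SIZE, NET_SIZE)  # the program's trainable weights

def adversarial_image(task_img):
    """Embed the task image at the centre and add the masked, bounded program."""
    canvas = np.zeros((NET_SIZE, NET_SIZE))
    canvas[lo:lo + TASK_SIZE, lo:lo + TASK_SIZE] = task_img
    program = np.tanh(W * mask)  # tanh keeps the additive offset in [-1, 1]
    return canvas + program

digit = np.random.rand(TASK_SIZE, TASK_SIZE)
x_adv = adversarial_image(digit)
```

In a real attack, `W` would be optimised by gradient descent so that the frozen network’s outputs on `x_adv` solve the attacker’s task; here it is left random purely to show the construction.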
The attacker may try to adversarially reprogram across tasks with very different datasets. This makes the task of adversarial reprogramming more challenging.
The adversarial reprogramming procedure can be summarised as follows:
(a) The ImageNet labels are mapped to the adversarial task labels.
(b) Images from the adversarial task (left) are embedded at the centre of an adversarial program (middle), which results in adversarial images (right).
(c) The network then predicts ImageNet labels that map to the adversarial task labels when presented with adversarial images.
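Step (a) and step (c) can be sketched as a simple lookup: a hard-coded assignment from a handful of ImageNet class indices to the adversarial task’s labels, and a translation of the network’s ImageNet prediction back through that map. The particular assignment below (ImageNet class i stands for digit i) is a made-up example, not the paper’s mapping:

```python
import numpy as np

# Assumed assignment: the first ten ImageNet classes stand for digits 0-9.
label_map = {imagenet_idx: digit for digit, imagenet_idx in enumerate(range(10))}

def adversarial_prediction(imagenet_logits):
    """Keep only the mapped ImageNet classes, then translate the argmax."""
    mapped = {i: imagenet_logits[i] for i in label_map}
    best_imagenet_class = max(mapped, key=mapped.get)
    return label_map[best_imagenet_class]

logits = np.random.randn(1000)  # stand-in for a network's ImageNet output
logits[7] = 99.0                # the network strongly predicts ImageNet class 7
print(adversarial_prediction(logits))  # → 7, the digit mapped to class 7
```

Whatever ImageNet label the frozen network emits is thus read off as an answer to the attacker’s task.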
To demonstrate the vulnerability of deep neural networks and the feasibility of adversarial reprogramming, the researchers conducted experiments on six architectures trained on ImageNet. In each case, they reprogrammed the network to perform three different adversarial tasks: counting squares, MNIST classification, and CIFAR-10 classification.
The successful demonstration of adversarial reprogramming has exposed a new class of vulnerabilities in ML systems. These attacks are more challenging than repurposing a model through transfer learning and other conventional techniques.
The potential impact of these attacks is considerable. They could enable malpractices ranging from theft of computational resources from public-facing services, to abuse of machine learning services for tasks that violate the ethical principles of system providers, to the repurposing of AI-driven assistants into spies or spambots.
The authors believe that this study will open up further research in the area, as there is a lot left to explore. For instance, adversarial reprogramming of recurrent neural networks is even more challenging. If models like RNNs can be flexibly reprogrammed as described above, then the much-feared theft of computation might escalate into something even more undesirable. This could also lead to a surge in unethical uses of machine learning services.