An adversarial patch is an image crafted to fool machine learning models. Such a patch can be a physical object placed in the scene being photographed, or a pattern generated by an algorithm and inserted into a digital image.
Computer vision models are usually trained on straightforward photos. The training dataset may include different orientations or resolutions, but it rarely contains images with a patch or an unidentified object in them.
Adversarial patch attacks are among the most practical threat models against real-world computer vision systems.
The fact that adversarial patches exist poses two questions:
- How can they be used?
- How can models be defended against being fooled?
How Models Can Be Fooled
As shown above, whenever a digital sticker is placed beside the object (case in point: a banana), the machine learning model fails to identify the main object. Instead, the classifier sees the banana as a toaster!
Researchers at Google ran experiments and worked out how to generate such a patch methodically, which paved the way for work on defending against these attacks. These patches could defeat facial recognition systems that are currently in use, and could even cause trouble for surveillance systems and self-driving cars.
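To make "generating a patch methodically" concrete, here is a minimal sketch on a toy problem. It assumes a linear two-class model, a tiny 8x8 grayscale image, and a 3x3 patch; the weights, sizes, and function names are all made up for illustration and are not the Google researchers' setup. The patch is optimised by gradient ascent on the target-minus-true score margin, averaged over every possible placement, so it never looks at the image it will later be pasted onto.

```python
import random

SIZE, PATCH = 8, 3  # toy image and patch side lengths (illustrative values)
# Every top-left corner at which the patch fits inside the image.
PLACEMENTS = [(r, c) for r in range(SIZE - PATCH + 1) for c in range(SIZE - PATCH + 1)]

def apply_patch(image, patch, row, col):
    """Return a copy of `image` with `patch` pasted at (row, col)."""
    out = [line[:] for line in image]
    for i in range(PATCH):
        for j in range(PATCH):
            out[row + i][col + j] = patch[i][j]
    return out

def margin(image, w_diff):
    """Target-class score minus true-class score for a linear toy model."""
    return sum(image[r][c] * w_diff[r][c] for r in range(SIZE) for c in range(SIZE))

def mean_margin(image, patch, w_diff):
    """Margin averaged over all placements of the patch."""
    total = sum(margin(apply_patch(image, patch, r, c), w_diff) for r, c in PLACEMENTS)
    return total / len(PLACEMENTS)

def train_patch(w_diff, steps=50, lr=0.1):
    """Gradient ascent on the placement-averaged margin, pixels clipped to [0, 1]."""
    patch = [[0.5] * PATCH for _ in range(PATCH)]
    for _ in range(steps):
        for i in range(PATCH):
            for j in range(PATCH):
                # For a linear model the gradient w.r.t. a patch pixel is the
                # average weight it covers; it does not depend on the image.
                grad = sum(w_diff[r + i][c + j] for r, c in PLACEMENTS) / len(PLACEMENTS)
                patch[i][j] = min(1.0, max(0.0, patch[i][j] + lr * grad))
    return patch

rng = random.Random(0)
# ASSUMPTION: random weights stand in for (target weights - true-class weights).
w_diff = [[rng.uniform(-1, 1) for _ in range(SIZE)] for _ in range(SIZE)]
image = [[rng.random() for _ in range(SIZE)] for _ in range(SIZE)]
gray = [[0.5] * PATCH for _ in range(PATCH)]  # untrained patch, for comparison
patch = train_patch(w_diff)
print("gray patch margin:   ", mean_margin(image, gray, w_diff))
print("trained patch margin:", mean_margin(image, patch, w_diff))
```

Real attacks replace the linear model with a deep network, use automatic differentiation for the gradient, and also average over rotations, scales, and many background images. The structure of the loop stays the same.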
Apart from adversarial patches, there is adversarial reprogramming, a newer class of attacks in which a model is repurposed to perform a task chosen by the attacker. In the case of a convolutional neural network, the attacker does not retrain the model; a perturbation added to the input effectively introduces new parameters. This perturbation is called an adversarial program. The attacker may then try to adversarially reprogram the model across tasks with very different datasets.
Even a human-in-the-loop solution may not help, because a human reviewer might not identify the intent behind something as ambiguous as the digital sticker shown above.
Is the digital sticker an art form, a holographic signature, or a patch? One can end up chasing one's own tail in such scenarios.
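The mechanics of adversarial reprogramming can be sketched in a few lines. This assumes the commonly described construction in which the small input for the new task sits at the centre of a larger frame and a learned perturbation, squashed by tanh, fills the border. The function name, frame sizes, weights, and label mapping below are hypothetical, and the training of the weights is omitted.

```python
import math

def reprogram_input(small, weights):
    """Place `small` at the centre of a larger frame and add the program around it.

    `weights` is a square grid the size of the host model's input. Only its
    border cells are used, because the centre is masked out for the data.
    """
    frame, inner = len(weights), len(small)
    off = (frame - inner) // 2
    out = []
    for r in range(frame):
        row = []
        for c in range(frame):
            inside = off <= r < off + inner and off <= c < off + inner
            if inside:
                row.append(small[r - off][c - off])   # mask = 0: data passes through
            else:
                row.append(math.tanh(weights[r][c]))  # mask = 1: adversarial program
        out.append(row)
    return out

# ASSUMPTION: a hypothetical mapping from the host model's labels to the new task.
LABEL_MAP = {0: "new-task class A", 1: "new-task class B"}

small = [[1.0, -1.0], [-1.0, 1.0]]       # toy 2x2 input for the new task
weights = [[0.5] * 6 for _ in range(6)]  # untrained program weights (placeholder)
x_adv = reprogram_input(small, weights)
for row in x_adv:
    print(["%.2f" % v for v in row])
```

In a real attack, the weights would be trained so that the host model's output, read through the label mapping, solves the attacker's task. The host model's own parameters are never changed.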
Is There A Way Out?
This attack is significant because the attacker does not need to know which image they are attacking when constructing the patch. Once generated, an adversarial patch could be widely distributed across the Internet for other attackers to print out and use. Most published defences against patch attacks are based on preprocessing input images to mitigate adversarial noise. However, existing defence techniques that focus on small perturbations may not be robust to larger perturbations.
In a paper under review at ICLR 2020, the unnamed authors proposed certified defences against adversarial patches. To make things more difficult, they also designed white-box attacks that would break the model further. They then presented a solution intended to keep the accuracy of the model intact.
Prior to this work, two other works aimed to thwart adversarial patches:
- Digital watermarking (DW) by Hayes (2018) uses saliency maps to detect unusually dense regions of large gradient entries and then masks them out of the image. Despite a 12% drop in accuracy on clean, non-adversarial images, this defence reportedly achieved an empirical adversarial accuracy of 63% against non-targeted patch attacks.
- Local Gradient Smoothing (LGS) by Naseer et al. (2019) is based on the empirical observation that pixel values tend to change sharply within adversarial patches.
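The observation behind LGS suggests a simple preprocessing step: estimate where pixel values change sharply and damp those regions before classification. The following is a simplified sketch of that idea, not the authors' implementation. It assumes a grayscale image stored as nested lists, non-overlapping windows, and illustrative values for the window size, threshold, and smoothing strength.

```python
def gradient_magnitude(img):
    """Forward-difference gradient magnitude for a 2D grayscale image."""
    h, w = len(img), len(img[0])
    mag = [[0.0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            dx = img[r][c + 1] - img[r][c] if c + 1 < w else 0.0
            dy = img[r + 1][c] - img[r][c] if r + 1 < h else 0.0
            mag[r][c] = (dx * dx + dy * dy) ** 0.5
    return mag

def local_gradient_smoothing(img, window=4, threshold=0.1, strength=2.3):
    """Damp pixels in windows whose mean normalised gradient exceeds `threshold`.

    ASSUMPTION: window, threshold and strength are illustrative defaults.
    """
    h, w = len(img), len(img[0])
    mag = gradient_magnitude(img)
    lo = min(min(row) for row in mag)
    hi = max(max(row) for row in mag)
    span = (hi - lo) or 1.0  # avoid dividing by zero on a flat image
    norm = [[(v - lo) / span for v in row] for row in mag]
    out = [row[:] for row in img]
    for r0 in range(0, h, window):
        for c0 in range(0, w, window):
            cells = [(r, c)
                     for r in range(r0, min(r0 + window, h))
                     for c in range(c0, min(c0 + window, w))]
            mean_g = sum(norm[r][c] for r, c in cells) / len(cells)
            if mean_g > threshold:
                for r, c in cells:
                    # Scale each pixel down in proportion to its local gradient.
                    out[r][c] = img[r][c] * max(0.0, 1.0 - strength * norm[r][c])
    return out

# Toy input: a flat gray image with a high-frequency "patch" in the top-left corner.
noisy = [[0.5] * 8 for _ in range(8)]
for r in range(4):
    for c in range(4):
        noisy[r][c] = float((r + c) % 2)  # checkerboard of 0s and 1s
cleaned = local_gradient_smoothing(noisy)
for row in cleaned:
    print(["%.2f" % v for v in row])
```

On this toy input, the checkerboard corner is suppressed while the flat regions pass through unchanged. The trade-off is that natural high-frequency texture in clean images may be damped as well, which is one way such preprocessing can cost clean accuracy.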
Notably, common classification benchmarks do not naturally provide such protections on their own. Beyond explicitly incorporating this information, they can also reveal whether the learning algorithms are inferring good similarity structure.
In an attempt to fortify a model's defence strategy, researchers at OpenAI have also introduced a metric known as UAR (Unforeseen Attack Robustness), designed to evaluate the robustness of a single model against an unanticipated attack. It can help developers prepare for a more diverse range of unforeseen attacks.
In practice, adversarial attacks need not stick to textbook cases. It is therefore the responsibility of an ML practitioner to identify the blind spots in these systems proactively, by designing attacks that expose the flaws.
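As a rough illustration of how such a score can be computed, the sketch below assumes that UAR normalises a model's adversarial accuracy, summed over several attack strengths, by the accuracy of reference models adversarially trained against that same attack. Both this formula and the accuracy figures are assumptions for illustration and do not come from this article.

```python
def uar(model_accuracies, reference_accuracies):
    """Normalised robustness score on a 0-100+ scale.

    ASSUMPTION: score = 100 * sum(model accuracy at each attack strength)
                            / sum(reference accuracy at each attack strength),
    where each reference model was adversarially trained against the attack.
    """
    if len(model_accuracies) != len(reference_accuracies):
        raise ValueError("need one reference accuracy per attack strength")
    denominator = sum(reference_accuracies)
    if denominator == 0:
        raise ValueError("reference accuracies must not all be zero")
    return 100.0 * sum(model_accuracies) / denominator

# Hypothetical accuracies (%) at six increasing attack strengths.
model_acc = [80.0, 60.0, 40.0, 20.0, 10.0, 5.0]
reference_acc = [90.0, 80.0, 70.0, 60.0, 50.0, 40.0]
score = uar(model_acc, reference_acc)
print(round(score, 2))
```

A score near 100 would mean the model holds up against the unforeseen attack about as well as models trained specifically against it. A low score flags a blind spot.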