
What Are Adversarial Patches & Why Should We Worry

An adversarial patch is a technique devised to fool machine learning models. A patch can be a physical obstruction in the captured photo, or an image generated by an algorithm.

Computer vision models are trained on photos that are usually straightforward. The training dataset may include different orientations or even different resolutions, but it rarely contains an image with a patch or an unidentified object in it.



Adversarial patch attacks are among the most practical threat models against real-world computer vision systems.

The fact that adversarial patches exist raises two questions:

  • How can they be used?
  • How can models be defended against being fooled?

How Models Can Be Fooled

Image via the paper by Tom Brown et al.

In the example from the paper, whenever a digital sticker is placed beside the object, the machine learning model fails to identify the main object (here, a banana). Instead, the classifier sees the banana as a toaster!

Through these experiments, researchers at Google showed how to generate a patch methodically, which paved the way for more work on defending against such attacks. These patches could bring down facial recognition systems that are currently in use, and could even create trouble for surveillance systems and self-driving cars.
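
To make the idea concrete, here is a minimal sketch in Python of how a patch ends up in an image: it is simply pasted over a block of pixels at some location. This is an illustration under stated assumptions, not the researchers' code. The function names apply_patch and apply_patch_randomly, the image size and the all-ones "patch" are hypothetical stand-ins; a real attack would optimise the patch contents against a classifier.

```python
import numpy as np


def apply_patch(image, patch, top, left):
    """Return a copy of `image` with `patch` pasted at row `top`, column `left`."""
    h, w = patch.shape[:2]
    if top < 0 or left < 0 or top + h > image.shape[0] or left + w > image.shape[1]:
        raise ValueError("patch does not fit inside the image at this location")
    patched = image.copy()  # leave the original image untouched
    patched[top:top + h, left:left + w] = patch
    return patched


def apply_patch_randomly(image, patch, rng):
    """Paste `patch` at a uniformly random location that keeps it inside `image`."""
    h, w = patch.shape[:2]
    top = int(rng.integers(0, image.shape[0] - h + 1))
    left = int(rng.integers(0, image.shape[1] - w + 1))
    return apply_patch(image, patch, top, left), (top, left)


# Hypothetical stand-ins: a blank "photo" and a dummy patch.
rng = np.random.default_rng(0)
image = np.zeros((224, 224, 3), dtype=np.float32)
patch = np.ones((50, 50, 3), dtype=np.float32)
patched, (top, left) = apply_patch_randomly(image, patch, rng)
```

Because the location is sampled at random, the same patch can be tried at many positions, which is what makes it usable without knowing the target scene in advance.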

Apart from adversarial patches, there is adversarial reprogramming, a newer class of attacks in which a model is repurposed to perform a new task chosen by the attacker. In the case of a convolutional neural network, the perturbation added to the input effectively introduces new parameters, and this perturbation is known as an adversarial program. The attacker may then try to adversarially reprogram the model across tasks with very different datasets.

Even if a human-in-the-loop solution is considered, the human might not identify the intent behind something as ambiguous as a digital sticker.

Is the digital sticker an art form, a holographic signature, or a patch? One can end up chasing one's own tail in such scenarios.

Is There A Way Out

Most published defences against patch attacks are based on preprocessing input images to mitigate adversarial noise. The patch attack is significant because the attacker does not need to know what image they are attacking while constructing the attack. Once an adversarial patch has been generated, it could be widely distributed across the Internet for other attackers to print out and use. Existing defence techniques, which focus on small perturbations, may not be robust to larger perturbations such as these.
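
The image-agnostic property comes from how the patch is trained. As we read the Brown et al. paper, the patch is optimised in expectation over many images, locations and transformations, roughly as written below. Here A is a patch application operator that places patch p on image x at location l under transformation t, and ŷ is the attacker's target class. The notation is our paraphrase and should be checked against the paper.

```latex
\hat{p} \;=\; \arg\max_{p} \;
\mathbb{E}_{x \sim X,\; t \sim T,\; l \sim L}
\left[ \log \Pr\!\left( \hat{y} \mid A(p, x, l, t) \right) \right]
```

Since the expectation runs over a whole distribution of images X rather than a single input, the resulting patch is not tied to any one photo.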

In a paper under review at ICLR 2020, the anonymous authors proposed certified defences against adversarial patches. To raise the bar, they also designed white-box attacks intended to break the model further. They then present a solution that aims to keep the accuracy of the model intact.

Prior to this work, two other works aimed to thwart adversarial patches:

  • Digital watermarking (DW), by Hayes (2018), detects unusually dense regions of large gradient entries using saliency maps and then masks them out of the image. Despite a 12% drop in accuracy on clean, non-adversarial images, this defence reportedly achieved an empirical adversarial accuracy of 63% against non-targeted patch attacks.
  • Local Gradient Smoothing (LGS), by Naseer et al. (2019), is based on the empirical observation that pixel values tend to change sharply within adversarial patches, and so it suppresses regions of the image with unusually high gradients.
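
The intuition behind gradient-based preprocessing can be sketched in a few lines of Python. This is a simplified illustration in the spirit of LGS, not the authors' implementation: the function name smooth_high_gradient_regions, the non-overlapping blocks, and the block, threshold and strength values are all assumptions chosen for the demo.

```python
import numpy as np


def smooth_high_gradient_regions(gray, block=8, threshold=0.1, strength=2.3):
    """Dampen pixels in blocks whose average gradient magnitude is high.

    `gray` is a 2-D float array with values in [0, 1].
    """
    gy, gx = np.gradient(gray)            # finite differences along rows, columns
    mag = np.sqrt(gx ** 2 + gy ** 2)      # per-pixel gradient magnitude
    span = mag.max() - mag.min()
    if span == 0:                         # perfectly flat image: nothing to do
        return gray.copy()
    mag = (mag - mag.min()) / span        # normalise to [0, 1]

    mask = np.zeros_like(mag)
    for r in range(0, gray.shape[0], block):
        for c in range(0, gray.shape[1], block):
            window = mag[r:r + block, c:c + block]
            if window.mean() > threshold:  # keep only "busy" blocks
                mask[r:r + block, c:c + block] = window

    # Scale pixels down in proportion to the retained gradient magnitude.
    return gray * np.clip(1.0 - strength * mask, 0.0, 1.0)


# Demo: a flat grey image with a noisy square standing in for a patch.
rng = np.random.default_rng(0)
gray = np.full((64, 64), 0.5)
gray[16:32, 16:32] = rng.random((16, 16))
cleaned = smooth_high_gradient_regions(gray)
```

In the demo, the flat background passes through unchanged while the noisy square is dampened. A defence of this kind only sees pixel statistics, which is one reason an attacker with white-box access may be able to craft patches that slip past it.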

Notably, common classification benchmarks often do not naturally provide such protections on their own. Further, beyond explicitly incorporating this information, they reveal whether the learning algorithms are inferring good similarity structure.

In an attempt to fortify models' defences, researchers at OpenAI have also introduced a new metric known as UAR (Unforeseen Attack Robustness), which is designed to evaluate the robustness of a single model against an unanticipated attack. It can help developers prepare for a more diverse range of unforeseen attacks.

In practice, adversarial attacks need not stick to textbook cases. So, it is the responsibility of an ML practitioner to be proactive in identifying the blind spots in these systems, by designing attacks that would expose the flaws.



Ram Sagar
I have a master's degree in Robotics and I write about machine learning advancements.
