
What Are Adversarial Patches & Why Should We Worry


An adversarial patch is a technique devised to fool machine learning models. The patch can be a physical object placed in the captured scene, or a digital pattern generated by an algorithm and inserted into the image.

Computer vision models are trained on photos that are usually straightforward. The training dataset may contain different orientations or resolutions, but it rarely includes an image that carries a patch or an unidentified object.

Adversarial patch attacks are among the most practical threat models against real-world computer vision systems.

The fact that adversarial patches exist poses two questions:

  • How can these patches be used?
  • How can models be defended against being fooled?

How Models Can Be Fooled

via the paper "Adversarial Patch" by Tom Brown et al.

As shown above, when a digital sticker is placed beside the object, the machine learning model fumbles to identify the main object (case in point: a banana); instead, the classifier sees the banana as a toaster.
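
To get a feel for the effect, here is a minimal sketch (not the authors' code) that composites a photo of a printed patch next to an object and runs a pretrained ImageNet classifier over the result. The file names, sticker size and placement are hypothetical.

```python
import torch
from torchvision import models, transforms
from PIL import Image

# Hypothetical file names: any photo of an object plus an image of a printed patch.
scene = Image.open("banana_photo.jpg").convert("RGB")
sticker = Image.open("printed_patch.png").convert("RGB").resize((60, 60))
scene.paste(sticker, (140, 90))          # drop the sticker next to the banana

model = models.resnet50(pretrained=True).eval()
preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
])

with torch.no_grad():
    probs = model(preprocess(scene).unsqueeze(0)).softmax(dim=1)
print(probs.argmax(dim=1).item())        # with a strong patch, this is the attacker's target class
```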

By carrying out experiments and figuring out how to generate a patch methodically, the researchers at Google paved the way for more solutions to defend against these attacks. Such patches can bring down facial recognition systems currently in use, and can even create trouble for surveillance systems and self-driving cars.
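
The generation step itself is essentially an optimisation loop: keep a learnable patch, paste it at random locations into training images, and update it to maximise the probability of a chosen target class. The sketch below is a heavily simplified version of that idea (random placement only, no rotations or scaling, no ImageNet normalisation, random tensors standing in for real photos); the target index and hyperparameters are assumptions.

```python
import torch
import torch.nn.functional as F
from torchvision import models

model = models.resnet50(pretrained=True).eval()
for p in model.parameters():
    p.requires_grad_(False)                       # only the patch is trainable

TARGET = 859                                      # assumed ImageNet index for "toaster"
patch = torch.rand(1, 3, 50, 50, requires_grad=True)
opt = torch.optim.Adam([patch], lr=0.05)

def apply_patch(images, patch):
    """Overlay the patch at one random location using a 0/1 mask."""
    _, _, H, W = images.shape
    _, _, h, w = patch.shape
    y = torch.randint(0, H - h, (1,)).item()
    x = torch.randint(0, W - w, (1,)).item()
    canvas = torch.zeros_like(images)
    mask = torch.zeros_like(images)
    canvas[:, :, y:y + h, x:x + w] = patch.clamp(0, 1)
    mask[:, :, y:y + h, x:x + w] = 1.0
    return images * (1 - mask) + canvas * mask

for step in range(200):                           # toy loop; real attacks use many real scenes
    images = torch.rand(8, 3, 224, 224)           # stand-in for a batch of photos in [0, 1]
    target = torch.full((8,), TARGET, dtype=torch.long)
    loss = F.cross_entropy(model(apply_patch(images, patch)), target)
    opt.zero_grad()
    loss.backward()
    opt.step()
```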

Apart from adversarial patches, there is adversarial reprogramming, a new class of attacks in which a model is repurposed to perform a new task. In the case of a convolutional neural network, new parameters are effectively introduced; these small additions to the network form the adversarial program. The attacker may then try to adversarially reprogram the model across tasks with very different datasets.
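
A hedged sketch of the reprogramming idea, assuming an ImageNet classifier is being repurposed for digit classification: a single learnable "program" image is added around a small embedded input, and a hard-coded label mapping reads the new task's answer off the old output classes. Random tensors stand in for real MNIST digits, and the simple "first ten classes" mapping is an illustrative assumption.

```python
import torch
import torch.nn.functional as F
from torchvision import models

model = models.resnet50(pretrained=True).eval()
for p in model.parameters():
    p.requires_grad_(False)

# One learnable, image-sized "adversarial program", shared by every input.
W = torch.zeros(1, 3, 224, 224, requires_grad=True)
opt = torch.optim.Adam([W], lr=0.01)

def reprogram(digits, W):
    """Embed a 28x28 digit at the centre of a 224x224 canvas and add the
    tanh-squashed program everywhere outside the embedded region."""
    b = digits.shape[0]
    canvas = torch.zeros(b, 3, 224, 224)
    canvas[:, :, 98:126, 98:126] = digits            # broadcast grayscale onto 3 channels
    mask = torch.ones(1, 3, 224, 224)
    mask[:, :, 98:126, 98:126] = 0.0
    return canvas + torch.tanh(W) * mask

for step in range(100):
    digits = torch.rand(16, 1, 28, 28)               # stand-in for real MNIST images
    labels = torch.randint(0, 10, (16,))
    logits = model(reprogram(digits, W))[:, :10]     # map ImageNet classes 0-9 to digits 0-9
    loss = F.cross_entropy(logits, labels)
    opt.zero_grad()
    loss.backward()
    opt.step()
```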

Even a human-in-the-loop solution has been considered, and a human reviewer might not identify the intent behind something as ambiguous as the digital sticker shown above.

Is the digital sticker an art form, a holographic signature, or a patch? One can end up chasing one's tail in such scenarios.

Is There A Way Out

Most published defences against patch attacks are based on preprocessing input images to mitigate adversarial noise. The attack is significant because the attacker does not need to know which image they are attacking while constructing the attack. Once generated, an adversarial patch can be widely distributed across the Internet for other attackers to print out and use. Existing defence techniques, which focus on defending against small perturbations, may not be robust to these larger, localised perturbations.
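
For illustration, a preprocessing-style defence can be as simple as smoothing or re-encoding the input before classification, as in the hedged sketch below (Gaussian blur is chosen arbitrarily as the preprocessing step; it is exactly the kind of small-perturbation defence that a large, high-contrast patch can survive).

```python
import torch
from torchvision import models
import torchvision.transforms.functional as TF

model = models.resnet50(pretrained=True).eval()

def defended_predict(batch):
    """Blur the input before classifying it, hoping to wash out adversarial noise."""
    smoothed = TF.gaussian_blur(batch, kernel_size=[5, 5])
    with torch.no_grad():
        return model(smoothed).argmax(dim=1)

print(defended_predict(torch.rand(4, 3, 224, 224)))   # stand-in batch of images
```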

In a paper under review at ICLR 2020, the (then anonymous) authors proposed certified defences against adversarial patches. To make things harder, they even choreographed white-box attacks that break existing models further, and they also present a defence that keeps the model's clean accuracy largely intact.

Prior to this work, two other works aimed at thwarting adversarial patches:

  • Digital watermarking (DW) by Hayes (2018) detects unusually dense regions of large gradient entries using saliency maps and masks them out of the image. Despite a 12% drop in accuracy on clean, non-adversarial images, this defence reportedly achieved an empirical adversarial accuracy of 63% against non-targeted patch attacks.
  • Local Gradient Smoothing (LGS) by Naseer et al. (2019) is based on the empirical observation that pixel values tend to change sharply within adversarial patches; a rough sketch of this intuition follows the list.
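
A rough, per-pixel sketch of the LGS intuition, assuming images are PyTorch tensors in [0, 1] (the published method works on windowed gradient estimates with its own thresholds, so treat the constants here as placeholders):

```python
import torch
import torch.nn.functional as F

def local_gradient_smoothing(img, threshold=0.1, smoothing=2.3):
    """Estimate where pixel intensities change sharply and scale those
    regions down before the image is passed to the classifier."""
    gray = img.mean(dim=1, keepdim=True)                       # (B, 1, H, W)
    dx = gray[:, :, :, 1:] - gray[:, :, :, :-1]
    dy = gray[:, :, 1:, :] - gray[:, :, :-1, :]
    dx = F.pad(dx, (0, 1, 0, 0))                               # pad back to H x W
    dy = F.pad(dy, (0, 0, 0, 1))
    grad_mag = torch.sqrt(dx ** 2 + dy ** 2)
    grad_mag = grad_mag / (grad_mag.amax(dim=(2, 3), keepdim=True) + 1e-8)
    # Scale factor drops towards 0 where the local gradient is unusually large.
    scale = torch.clamp(1.0 - smoothing * grad_mag * (grad_mag > threshold), 0.0, 1.0)
    return img * scale

cleaned = local_gradient_smoothing(torch.rand(2, 3, 224, 224))  # stand-in batch
```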

Notably, common classification benchmarks often do not provide such protections on their own. Further, beyond explicitly incorporating this information, such evaluations also reveal whether the learning algorithms are inferring a good similarity structure.

In an attempt to fortify models' defence strategies, researchers at OpenAI have also introduced a new metric, UAR (Unforeseen Attack Robustness), designed to evaluate the robustness of a single model against an unanticipated attack. It can help developers prepare for a more diverse range of unforeseen attacks.
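
In spirit (this is a hedged reading of the idea, not the paper's evaluation code), UAR compares a model's accuracy against the new attack at several calibrated distortion sizes with the accuracy of models adversarially trained against that same attack:

```python
def uar(eval_model_acc, adv_trained_acc):
    """Toy sketch: accuracy of the evaluated model against the unforeseen attack,
    summed over calibrated distortion sizes and normalised by the accuracy of
    attack-specific adversarially trained reference models."""
    return 100.0 * sum(eval_model_acc) / sum(adv_trained_acc)

# Hypothetical accuracies at six distortion sizes.
print(uar([0.71, 0.60, 0.44, 0.30, 0.17, 0.08],
          [0.86, 0.80, 0.71, 0.60, 0.47, 0.33]))
```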

In practice, adversarial attacks need not stick to textbook cases. So it is the responsibility of the ML practitioner to identify the blind spots in these systems by being proactive and designing attacks that expose the flaws.

Also, check our analysis of how a state-of-the-art image classification model fumbles in the presence of noise here.
