
How Reliable Are Neural Network Classifiers Against Unforeseen Adversarial Attacks


The reliability of a machine learning model is assessed by how often it errs: the fewer the errors, the better the predictions. In theory, ML models should predict, classify and recommend correctly every single time. When deployed in the real world, however, a model has a very good chance of running into inputs that never appeared during training. To prepare models for such surprises, adversarial techniques have been developed.

Adversarial examples are inputs to machine learning models that an attacker has intentionally designed to cause the model to make a mistake.

What Do Adversarial Examples Look Like

Source: OpenAI

In the example above, a carefully crafted adversarial perturbation is overlaid on an ordinary image of a panda, causing the classifier to miscategorize it as a gibbon.
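Perturbations of this kind are commonly generated with gradient-based methods such as the fast gradient sign method (FGSM). The snippet below is a minimal sketch, assuming a PyTorch image classifier `model` and an input batch `x` with labels `y`; it illustrates the general idea rather than the exact procedure used to produce the image above.

```python
import torch
import torch.nn.functional as F

def fgsm_perturb(model, x, y, eps=0.01):
    # Take one step in the direction of the sign of the loss gradient,
    # then clamp back to the valid pixel range [0, 1].
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    loss.backward()
    x_adv = x + eps * x.grad.sign()
    return x_adv.clamp(0.0, 1.0).detach()
```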

But how critical is it to monitor adversarial attacks on a model?

For instance, attackers could target autonomous vehicles by using stickers or paint to create an adversarial stop sign that the vehicle would interpret as a ‘yield’ or other sign. A confused car on a busy day is a potential catastrophe packed in a 2000 pound metal box.

Nevertheless, extreme cases need not be matters of life and death; they can be as mundane as a flawed facial recognition system installed at a cash machine, or a smart door system that doesn’t recognise its owner.

The whole idea of introducing adversarial examples to a model during training is to ensure AI safety. In short, to avoid misclassification.
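In practice this is usually done through adversarial training: the model is optimised on perturbed inputs rather than (or alongside) clean ones. A minimal sketch is shown below; it assumes the hypothetical `fgsm_perturb` helper from the earlier snippet along with a standard PyTorch `model`, `optimizer` and data `loader`, and the epsilon value is purely illustrative.

```python
import torch.nn.functional as F

def adversarial_training_epoch(model, loader, optimizer, eps=0.01):
    # Train on adversarially perturbed inputs instead of clean ones.
    for x, y in loader:
        x_adv = fgsm_perturb(model, x, y, eps=eps)   # craft the attack on the fly
        loss = F.cross_entropy(model(x_adv), y)      # penalise misclassifying it
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```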

Preparing For Unforeseen Attacks

Notably, common classification benchmarks do not provide such protection on their own. Nor, unless this information is explicitly incorporated, do they reveal whether a learning algorithm is inferring a good similarity structure.

In an attempt to fortify models against such misclassification, researchers at OpenAI introduced a new metric, UAR (Unforeseen Attack Robustness), which evaluates the robustness of a single model against an unanticipated attack and highlights the need to measure performance across a more diverse range of unforeseen attacks.

Source: OpenAI

Looking at the picture above, a person can make an educated guess that it is a black swan. They can also see the distortions in the image, but that’s fine; it doesn’t make much of a difference to a human. The same change in pixel values, however, ends up confusing a machine learning model.

A model can be trained to withstand distortions such as the ones above, but that robustness does not necessarily transfer to future, unforeseen distortions.

The researchers propose a 3-step method to tackle the above challenge, which can be summarized as follows:

  • Evaluate models against adversarial distortions that are not similar to those used in training. Consider the L_1, L_2-JPEG, Elastic, and Fog attacks as a starting point; the researchers provide implementations, pre-trained models, and calibrations for a variety of attacks in their code package
  • Choose a calibrated range of distortion sizes by evaluating against adversarially trained models
  • Compute the UAR scores of adversarially trained models for several different distortion types. A UAR score near 100 against an unforeseen adversarial attack implies performance comparable to a defense with prior knowledge of the attack

UAR (Unforeseen Attack Robustness), which normalizes the accuracy of a model M by the adversarial training accuracy (ATA), is given as:
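Roughly, in the paper’s notation, with A denoting the attack and the sums running over the calibrated distortion sizes $\epsilon_1, \dots, \epsilon_6$:

$$\mathrm{UAR}(A \text{ against } M) \;=\; 100 \times \frac{\sum_{k} \mathrm{Acc}(A, \epsilon_k, M)}{\sum_{k} \mathrm{ATA}(A, \epsilon_k)}$$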

A UAR score greater than 100 means that a defense is outperforming an adversarially trained model on that distortion.
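As a rough illustration of how such a score could be computed, assume the accuracy of model M and of the adversarially trained reference (the ATA) have already been measured at each calibrated distortion size; the numbers below are made up.

```python
def uar(model_accs, ata_accs):
    # Accuracy of model M summed over the calibrated distortion sizes,
    # normalised by the adversarial training accuracy (ATA) at the same sizes.
    return 100.0 * sum(model_accs) / sum(ata_accs)

# Illustrative values only: a score near 100 means performance comparable
# to a defense trained with prior knowledge of the attack.
print(uar([0.61, 0.55, 0.48], [0.63, 0.57, 0.50]))  # ~96.5
```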

Envisioning AI Safety Through Adversity

Since increasing robustness against one distortion type can decrease robustness against others, the authors stress in their paper that measuring performance on different distortions is important to avoid overfitting to a specific type, especially when a defense is constructed with adversarial training.

The researchers also point to “common” (non-adversarial) visual corruptions, such as fog, blur, or pixelation, as another avenue for measuring the robustness of computer vision models. Robustness to such common corruptions is thought to be linked to adversarial robustness, and corruption robustness has been proposed as an easily computed indicator of adversarial robustness.
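For instance, a cheap corruption-robustness check might simply measure accuracy under a blur, along these lines (a sketch assuming a PyTorch classifier and a torchvision-style data loader; the blur parameters are arbitrary):

```python
import torch
from torchvision.transforms import GaussianBlur

blur = GaussianBlur(kernel_size=5, sigma=2.0)  # an arbitrary "common corruption"

@torch.no_grad()
def corruption_accuracy(model, loader):
    correct, total = 0, 0
    for x, y in loader:
        preds = model(blur(x)).argmax(dim=1)  # classify the corrupted images
        correct += (preds == y).sum().item()
        total += y.numel()
    return correct / total
```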

However, the researchers at OpenAI believe that assessing the reliability of a machine learning model shouldn’t stop at robustness. To get a clearer picture, future research should also focus on building a diverse toolbox for understanding machine learning models, including visualization, disentanglement of relevant features, and measuring extrapolation to different datasets or to the long tail of natural but unusual inputs.


Ram Sagar

I have a master's degree in Robotics and I write about machine learning advancements.