How to ensure the robustness of neural networks using Foolbox?


Even the most advanced CNN-based machine learning models can be fooled by very small changes to the samples on which we make a prediction, and such wrong predictions are often made with higher confidence than predictions on normal samples. To guard against this, we should test a model’s robustness before it goes into production. In this article, we will talk about adversarial attacks and Foolbox, a Python-based toolbox that lets us test a model against a set of predefined attacks through a unified and simple API. The following are the major points to be discussed in this article.

Table Of Contents

  1. What Are Adversarial Attacks in Machine Learning?
  2. Attack Strategies
  3. How Can Foolbox Ensure the Robustness of a Model?
  4. Structure of Foolbox
  5. Implementing Foolbox

Let’s start the discussion by understanding adversarial attacks.

What Are Adversarial Attacks in Machine Learning?

Adversarial machine learning studies attacks that try to exploit models, typically by crafting hostile inputs based on information available about the model. The most common goal is simply to make a machine learning model fail.

The great majority of machine learning algorithms are designed for specific problem sets in which training and test data come from the same statistical distribution. When such models are applied to real-world data, adversaries may supply inputs that violate this statistical assumption. Such inputs can be crafted to exploit vulnerabilities in the system and corrupt its results.

An adversarial attack is a method of causing a machine learning model to misclassify objects by making small changes to them. Neural networks (NNs) are known to be particularly vulnerable to such attacks. Historically, research into adversarial methods began in the field of image recognition. It has been demonstrated that minor changes in images, such as the addition of seemingly insignificant noise, can cause significant changes in classifier predictions and even completely confuse ML models.

Consider the following demonstration of adversarial examples: starting from an image of a Stop sign, the attacker adds a small perturbation with the goal of forcing the model to predict a Yield sign, which the model duly does. Both images appear identical to us, but because the model works on numbers, adding such a perturbation noticeably changes the pixel values and results in a false prediction.
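
As a minimal illustration of the idea (a hypothetical NumPy sketch, not the exact attack behind the Stop-sign example), the snippet below adds a small signed perturbation to an image and clips the result back to the valid pixel range; in a real attack, the perturbation direction would typically come from the gradient of the model’s loss with respect to the input:

import numpy as np

def perturb(image, direction, eps=2.0, low=0.0, high=255.0):
    # move each pixel by at most eps along the chosen direction,
    # then clip back into the valid pixel range
    adversarial = image + eps * np.sign(direction)
    return np.clip(adversarial, low, high)

# toy example with a random "image" and a random perturbation direction
image = np.random.uniform(0, 255, size=(224, 224, 3))
direction = np.random.randn(224, 224, 3)
adv = perturb(image, direction)
print(np.abs(adv - image).max())  # the change per pixel is at most eps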


Now let us look at the commonly identified attack strategies.

Attack Strategies

Evasion

The most common type of attack is the evasion attack. Spammers and hackers, for example, frequently try to avoid detection by obscuring the content of spam emails and malware. Samples are tampered with so that they evade detection and are classified as legitimate. Evasion does not require any influence over the training data. Image-based spam, in which the spam content is embedded within an attached image to evade textual analysis by anti-spam filters, is a good example of evasion. Spoofing attacks against biometric verification systems are another example.

Poisoning

Poisoning is the process of contaminating training data in an adversarial way. Data collected during operations can be used to retrain machine learning systems. Intrusion detection systems (IDSs), for example, are frequently retrained using such data. An attacker could contaminate this data by injecting malicious samples into the system during operation, causing retraining to fail.

Model Stealing

Model stealing (also known as model extraction) occurs when an adversary probes a black-box machine learning system in order to reconstruct the model or extract the data it was trained on. This is especially concerning when the training data or the model itself is sensitive and proprietary. Model stealing might, for example, be used to extract a proprietary stock-trading model, which the adversary could then employ for financial gain.

Inference Attacks

Inference attacks exploit overfitting to the training data, a prevalent flaw in supervised machine learning models, to identify the data used during model training. Attackers can do this even without knowing or having access to the parameters of the target model, which poses security risks for models trained on sensitive data.

How Can Foolbox Ensure the Robustness of a Model?

Foolbox is a Python toolbox for creating adversarial perturbations as well as for quantifying and comparing the robustness of machine learning models. Foolbox interfaces with the most prominent deep learning frameworks, including PyTorch, Keras, TensorFlow, Theano, and MXNet, and supports a variety of adversarial criteria, such as targeted misclassification and top-k misclassification, as well as various distance metrics. Let us now briefly look at Foolbox’s structure.
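
To give a flavour of this unified API, here is a minimal, hedged sketch of wrapping a PyTorch model (the TensorFlow case is shown in the implementation section below); it assumes a recent Foolbox 3 installation with torchvision available, and the normalization constants are the usual torchvision ImageNet values:

import foolbox as fb
import torchvision.models as models

# load a pretrained torchvision classifier and switch it to eval mode
model = models.resnet18(pretrained=True).eval()
# torchvision models expect normalized inputs; Foolbox applies this for us
preprocessing = dict(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225], axis=-3)
fmodel = fb.PyTorchModel(model, bounds=(0, 1), preprocessing=preprocessing)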

Structure of Foolbox

Five elements are required to create adversarial examples, and these elements form the five pillars of Foolbox: first, a model that takes an input such as an image and makes a prediction, e.g. class probabilities; second, a criterion that determines what constitutes an adversarial, e.g. misclassification; third, a distance measure that quantifies the size of a perturbation, e.g. the L1-norm; fourth, a reference input together with its label; and finally, an attack algorithm that uses all of the above to generate an adversarial perturbation.
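
To make these pillars concrete, the hedged sketch below names one possible Foolbox 3 object for each of them; the model wrapper and the labels passed to the criterion are assumed to be the ones created in the implementation section that follows:

import foolbox as fb

# 1) model: a framework-specific wrapper such as fb.TensorFlowModel or
#    fb.PyTorchModel (created as `fmodel` in the next section)
# 2) criterion: what counts as adversarial; untargeted misclassification
#    of the true labels is the default, e.g.
#    criterion = fb.criteria.Misclassification(labels)
# 3) distance: how the size of a perturbation is measured
distance = fb.distances.linf
# 4) reference input and label: obtained below via fb.utils.samples
# 5) attack: the algorithm that combines all of the above
attack = fb.attacks.LinfFastGradientAttack()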

Implementing Foolbox

In this section, we will look at some use cases of this toolbox. As discussed earlier, it supports most of the widely used deep learning frameworks; to work with your framework of choice, make sure it is installed, and then install Foolbox with pip (pip install foolbox).

In this example, we’ll use TensorFlow. First, we need a pretrained model from tf.keras.applications (here, ResNet50V2), which we then wrap in Foolbox’s TensorFlowModel class; a similar class is available for the other frameworks. We should also specify the preprocessing expected by the respective model, such as flipping an axis, converting from RGB to BGR, subtracting the mean, or dividing by the standard deviation, as well as the bounds of the input space, which should match the range of values the model expects.

import foolbox as fb
import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt

# TensorFlow-based model: a ResNet50V2 pretrained on ImageNet
model = tf.keras.applications.ResNet50V2(weights="imagenet")
# ResNet50V2 expects inputs already scaled to [-1, 1], so no extra
# preprocessing is needed and the bounds are set accordingly
preprocessing = dict()
bounds = (-1, 1)
fmodel = fb.TensorFlowModel(model, bounds=bounds, preprocessing=preprocessing)

Now that we have initialized both the ResNet and the Foolbox model, we need some sample image data before formulating an attack. This can be obtained directly through Foolbox, whose utils package provides helper functions that return a small set of sample images from different computer vision datasets.

images, labels = fb.utils.samples(fmodel, dataset='imagenet', batchsize=16)
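
Before launching an attack, it is worth verifying the clean accuracy of the wrapped model on these samples. A brief sketch using Foolbox’s accuracy helper, assuming the fmodel, images and labels defined above:

# fraction of clean (unperturbed) samples the model classifies correctly
clean_accuracy = fb.utils.accuracy(fmodel, images, labels)
print(f"clean accuracy: {clean_accuracy:.1%}")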

To launch an attack, we must first create an instance of the corresponding attack class. Foolbox implements a wide range of adversarial attacks. Each attack takes a model on which to find adversarials and a criterion that defines what counts as an adversarial; misclassification is the default criterion.

The attack can then be applied to a reference input, to which the adversarial should stay close, and its corresponding label. Attacks use internal hyperparameter tuning to find the smallest perturbation.

For example, when implementing the famous fast gradient sign method (FGSM), it searches for the smallest step size that turns the input into an adversarial. As a result, manual hyperparameter tuning for attacks like FGSM is no longer required.
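
As a hedged sketch of what running FGSM looks like with this API (using Foolbox’s LinfFastGradientAttack class with an explicit misclassification criterion; the epsilon values here are only illustrative):

# FGSM: fast gradient sign method under the L-infinity norm
fgsm = fb.attacks.LinfFastGradientAttack()
criterion = fb.criteria.Misclassification(labels)
# a list of epsilons evaluates the attack at several perturbation budgets
_, fgsm_clipped, fgsm_is_adv = fgsm(fmodel, images, criterion, epsilons=[0.001, 0.01, 0.1])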

In the main example below, we choose the type of attack (here, DeepFool under the L-infinity norm) and feed it the TensorFlow model. Additionally, we define the epsilons, which are simply the perturbation levels we want to test.

# Initialize the attack: DeepFool under the L-infinity norm
epsilons = np.linspace(0.0, 0.005, num=20)  # perturbation budgets to evaluate
attack = fb.attacks.LinfDeepFoolAttack()
raw, clipped, is_adv = attack(fmodel, images, labels, epsilons=epsilons)

Now we can simply compute the robust accuracy by averaging is_adv over the samples and plotting it against the epsilons.

# robust accuracy: fraction of samples that are NOT adversarial at each epsilon
robust_accuracy = 1 - np.float32(is_adv).mean(axis=-1)
# visualizing the result
plt.plot(epsilons, robust_accuracy)
plt.title('Perturbation Vs Accuracy of the Model')
plt.xlabel('epsilon (perturbation size)')
plt.ylabel('robust accuracy')
plt.show()

Conclusion

As the resulting plot shows, the model’s accuracy is at its highest when there is no perturbation, and as the toolbox tests increasing epsilon values, the accuracy drops very quickly even for very small perturbations. From this, we can conclude that neural network-based models are highly prone to such attacks.

Through this article, we have discussed adversarial attacks on neural networks and the attack strategies that may be encountered. To help ensure the robustness of a model, we looked at a framework called Foolbox, which tests the model against a set of predefined attacks and ultimately lets us check how robust it is.

