Last updated March 12, 2021

Hands-On Guide To Adversarial Robustness Toolbox (ART): Protect Your Neural Networks Against Hacking

The Adversarial Robustness Toolbox (ART) is a Python library that gives developers and researchers a comprehensive set of tools for evaluating the robustness of deep neural networks against adversarial attacks.


Published on January 7, 2021

by Jayita Bhattacharyya

Machine learning models can be attacked in ways that compromise their predictions. Such attacks on deployed models have been seen time and again and need to be addressed carefully. AI security matters most for enterprise AI systems, where data is mostly stored in tabular form and data privacy policies are at stake. The Adversarial Robustness Toolbox (ART) is a Python library that gives developers and researchers a comprehensive set of tools for evaluating the robustness of deep neural networks against adversarial attacks. Open-sourced by IBM, ART supports techniques for defending models written in TensorFlow, Keras, PyTorch, scikit-learn, MXNet, XGBoost, LightGBM, CatBoost and other machine learning frameworks. It can be applied to many kinds of data, including images, video, tables and audio. It is cross-platform and supports machine learning tasks such as classification, speech recognition, object detection, generation and certification.

ART has attracted many developers since its first release. Its latest version, v1.5, can evaluate and defend AI models and applications against four adversarial threats (inference, extraction, poisoning and evasion) with a single unified library.

The new version extends the supported ML tasks beyond simple classification to include object detection, automatic speech recognition (ASR), generative adversarial networks (GANs) and robustness certification. It is also compatible with more of the popular ML frameworks, so users are not locked into a single one. The newly covered threats are extraction, where an attacker steals a model by querying it, and inference, where an attacker acquires private information about a model's training data.

There are three types of inference attack, each compromising a different aspect of training-data privacy. In membership inference, ART lets you reproduce a malicious attacker who tries to determine whether a specific record, e.g. of a person, was part of an ML model's training data. Such attacks can be harmful because they expose sensitive private information from nothing more than access to a trained ML model. An attribute inference attack aims to extract the original attribute values of an existing training record, given only access to the trained model and knowledge of a few of the record's other features. For example, an ML model trained on demographic data and attacked with attribute inference could expose a person's exact date of birth and wages. Lastly, in model inversion, attackers invert a trained ML model by reconstructing representative averages of features from the records.
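To make the membership inference threat concrete, here is a minimal toy sketch. It is not ART's implementation. It assumes a hypothetical model that memorises its training records (a nearest-neighbour scorer) and an attacker who guesses "member" whenever the model's confidence exceeds a cut-off. The names `confidence`, `is_member` and `THRESHOLD` are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: records the model was trained on, and records it never saw.
train_records = rng.normal(size=(50, 4))
unseen_records = rng.normal(size=(50, 4))

def confidence(x, train=train_records):
    """Hypothetical overfitted model: confidence decays with the distance
    to the closest training record (exactly 1.0 for a training record)."""
    dists = np.linalg.norm(train - x, axis=1)
    return float(np.exp(-dists.min()))

THRESHOLD = 0.9  # illustrative, attacker-chosen cut-off

def is_member(x):
    """Attacker's guess: very high confidence suggests a training record."""
    return confidence(x) > THRESHOLD

member_hits = sum(is_member(x) for x in train_records)
false_alarms = sum(is_member(x) for x in unseen_records)
print(f"flagged {member_hits}/50 training records and {false_alarms}/50 unseen records")
```

The gap between the two counts is what the attacker exploits: the more a model's behaviour differs between seen and unseen records, the easier membership is to infer.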

Adversarial Action Recognition Attack

This example demonstrates how to use the ART library to mount an adversarial attack on video action recognition. It uses GluonCV and MXNet for the recognition task, with a pre-trained MXNet model as the classifier, specifically i3d_resnet50_v1_ucf101. The input is a video clip of a basketball action taken from the UCF101 dataset. The first step is to show that the model classifies this short clip correctly.

# Initial working stages

The sample basketball video is downloaded.
The pre-trained action recognition model is loaded.
The model is shown to classify the video action correctly as playing basketball.

# Loading Model and Basketball Sample

 import os
 import tempfile
 import decord
 from gluoncv import utils
 from gluoncv.data.transforms import video
 from gluoncv.model_zoo import get_model
 from gluoncv.utils.filesystem import try_import_decord
 import imageio
 from matplotlib.image import imsave
 import matplotlib.pyplot as plt
 import mxnet as mx
 from mxnet import gluon, nd, image
 from mxnet.gluon.data.vision import transforms
 import numpy as np
 from art.attacks.evasion import FastGradientMethod, FrameSaliencyAttack
 from art import config
 from art.defences.preprocessor import VideoCompression
 from art.estimators.classification import MXClassifier

# setting global variables

 PRETRAINED_MODEL_NAME = 'i3d_resnet50_v1_ucf101'
 VIDEO_SAMPLE_URI = 'https://github.com/bryanyzhu/tiny-ucf101/raw/master/v_Basketball_g01_c01.avi'

# setting seed

 np.random.seed(123)

 def predict_top_k(video_input, model, k=5, verbose=True):
     """Return the indices of the k highest-scoring classes, optionally printing them."""
     pred = model(nd.array(video_input))
     classes = model.classes
     ind = nd.topk(pred, k=k)[0].astype('int')
     if verbose:
         msg = "The video sample clip is classified to be"
         for i in range(k):
             msg += f"\n\t[{classes[ind[i].asscalar()]}], with probability {nd.softmax(pred)[0][ind[i]].asscalar():.3f}."
         print(msg)
     return ind

 def sample_to_gif(sample, output="sample.gif", path=config.ART_DATA_PATH, postprocess=None):
     """Write a (channels, frames, height, width) sample to a GIF and return its path."""
     frame_count = sample.shape[1]
     output_path = os.path.join(path, output)
     with tempfile.TemporaryDirectory() as tmpdir, imageio.get_writer(output_path, mode='I') as writer:
         for frame in range(frame_count):
             file_path = os.path.join(tmpdir, f"{frame}.png")
             # imsave expects (height, width, channels)
             imsave(file_path, np.transpose(sample[:,frame,:,:], (1,2,0)))
             writer.append_data(imageio.imread(file_path))
     return output_path

# downloading sample video

 decord = try_import_decord()
 video_fname = utils.download(VIDEO_SAMPLE_URI, path=config.ART_DATA_PATH)
 video_reader = decord.VideoReader(video_fname)
 # taking every second frame of the first 64 gives a 32-frame clip
 frame_id_list = range(0, 64, 2)
 video_data = video_reader.get_batch(frame_id_list).asnumpy()
 video_sample_lst = [video_data[vid, :, :, :] for vid, _ in enumerate(frame_id_list)]

# preprocessing the benign video sample

 # ImageNet channel statistics, matching the mean and std passed to the ART classifier later on
 transform_fn = video.VideoGroupValTransform(size=224, mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
 sample = np.stack(transform_fn(video_sample_lst), axis=0)
 sample = sample.reshape((-1,) + (32, 3, 224, 224))
 sample = np.transpose(sample, (0, 2, 1, 3, 4))
 print(f"`{video_fname}` has been downloaded and preprocessed.")

`/home/hessel/.art/data/v_Basketball_g01_c01.avi` has been downloaded and preprocessed.
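The reshape and transpose in the preprocessing step change the layout from a stack of frames to the (batch, channels, frames, height, width) layout the model consumes. The following sketch reproduces only that layout change on dummy data. It assumes the transform yields one (channels, height, width) array per frame, as the reshape above implies, and uses a small frame size to stay light.

```python
import numpy as np

T, C, H, W = 32, 3, 8, 8  # the article uses H = W = 224; 8 keeps the sketch small

# Stand-in for the transform output: one (C, H, W) array per frame,
# filled with the frame number so each frame can be tracked.
frames = [np.full((C, H, W), fill_value=t, dtype=np.float32) for t in range(T)]

clip = np.stack(frames, axis=0)               # (T, C, H, W)
clip = clip.reshape((-1,) + (T, C, H, W))     # (1, T, C, H, W): adds a batch axis
clip = np.transpose(clip, (0, 2, 1, 3, 4))    # (1, C, T, H, W): channels before frames

print(clip.shape)  # (1, 3, 32, 8, 8)
```

After the transpose, the frame axis is axis 2 of the batched array, which is why the Frame Saliency Attack later in the article is given a frame index of 2.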

# loading pretrained model

 model = get_model(PRETRAINED_MODEL_NAME, nclass=101, pretrained=True)
 print(f"`{PRETRAINED_MODEL_NAME}` model was successfully loaded.")

`i3d_resnet50_v1_ucf101` model was successfully loaded.

# evaluating model on basketball video sample

_ = predict_top_k(sample, model)

 The video sample clip is classified to be
 [Basketball], with probability 0.725.
 [TennisSwing], with probability 0.212.
 [VolleyballSpiking], with probability 0.036.
 [SoccerJuggling], with probability 0.012.
 [TableTennisShot], with probability 0.007.

The model correctly classifies the given video sample as playing basketball.
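The ranking printed above comes from applying a softmax to the model's raw scores and keeping the highest entries. A framework-free sketch of that step is given here, using NumPy instead of MXNet. The logits are invented for illustration and the helper names `softmax` and `top_k` are hypothetical.

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

def top_k(logits, class_names, k=3):
    """Return the k most likely (class name, probability) pairs."""
    probs = softmax(logits)
    order = np.argsort(probs)[::-1][:k]  # indices by descending probability
    return [(class_names[i], float(probs[i])) for i in order]

# Hypothetical raw scores for four classes (not taken from the real model).
names = ["Basketball", "TennisSwing", "VolleyballSpiking", "SoccerJuggling"]
logits = np.array([2.0, 1.0, -1.0, 0.0])

for name, p in top_k(logits, names):
    print(f"[{name}], with probability {p:.3f}")
```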

# Creating Adversarial Attack

Now we can use the ART library to mount an adversarial attack with the Fast Gradient Method. The attack corrupts the video sample so that the model misclassifies it. The adversarial example is also converted into a GIF.

Adversarial Basketball
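The idea behind the Fast Gradient Method can be shown on a much smaller model. The sketch below applies a single signed-gradient step to a hypothetical logistic regression classifier. The weights, the input and the step size of 0.1 are invented for illustration, and this is a toy stand-in rather than ART's `FastGradientMethod`.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical linear classifier: p(class 1 | x) = sigmoid(w.x + b).
w = np.array([1.5, -2.0, 0.5, 1.0])
b = -0.2

def predict_proba(x):
    return sigmoid(w @ x + b)

def loss_gradient(x, y):
    """Gradient of the binary cross-entropy loss with respect to the input x."""
    return (predict_proba(x) - y) * w

def fast_gradient_method(x, y, eps, clip_min=0.0, clip_max=1.0):
    """One step: shift every feature by eps in the direction that raises the loss."""
    x_adv = x + eps * np.sign(loss_gradient(x, y))
    return np.clip(x_adv, clip_min, clip_max)

x = np.array([0.6, 0.4, 0.5, 0.3])  # benign input in [0, 1], true label 1
y = 1.0
x_adv = fast_gradient_method(x, y, eps=0.1)

print(f"benign      p(class 1) = {predict_proba(x):.3f}")
print(f"adversarial p(class 1) = {predict_proba(x_adv):.3f}")
```

No feature moves by more than `eps`, yet the predicted class changes. The same bounded step is what the attack in this article applies to every pixel of the clip.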

# preprocessing the adversarial sample video input

 transform_fn_unnormalized = video.VideoGroupValTransform(size=224, mean=[0, 0, 0], std=[1, 1, 1])
 adv_sample_input = np.stack(transform_fn_unnormalized(video_sample_lst),  axis=0)
 adv_sample_input = adv_sample_input.reshape((-1,) + (32, 3, 224, 224))
 adv_sample_input = np.transpose(adv_sample_input, (0, 2, 1, 3, 4))

# wrapping model in an ART classifier

 model_wrapper = gluon.nn.Sequential()
 with model_wrapper.name_scope():
     model_wrapper.add(model)

# preparing the mean and std arrays for ART classifier preprocessing

 mean = np.array([0.485, 0.456, 0.406] * (32 * 224 * 224)).reshape((3, 32, 224, 224), order='F')
 std = np.array([0.229, 0.224, 0.225] * (32 * 224 * 224)).reshape((3, 32, 224, 224), order='F')
 classifier_art = MXClassifier(
     model=model_wrapper,
     loss=gluon.loss.SoftmaxCrossEntropyLoss(),
     input_shape=(3, 32, 224, 224),
     nb_classes=101,
     preprocessing=(mean, std),
     clip_values=(0, 1),
     channels_first=True,
 )
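The `order='F'` reshape used for `mean` and `std` can look opaque. The sketch below checks, on a smaller shape, that it produces an array holding one constant per channel, which is the same result broadcasting gives. The shape `(3, 4, 8, 8)` is an arbitrary stand-in for the article's `(3, 32, 224, 224)`.

```python
import numpy as np

channel_mean = [0.485, 0.456, 0.406]
shape = (3, 4, 8, 8)  # (channels, frames, height, width)
n_repeats = int(np.prod(shape[1:]))

# The article's construction: repeat the three values, then fill the array in
# Fortran order, where the first (channel) axis varies fastest.
tiled = np.array(channel_mean * n_repeats).reshape(shape, order='F')

# Equivalent construction: one value per channel, broadcast along the other axes.
broadcast = np.broadcast_to(np.array(channel_mean).reshape(3, 1, 1, 1), shape)

print(np.array_equal(tiled, broadcast))  # True
```

Either form lets the classifier subtract the channel mean and divide by the channel standard deviation element-wise on a `(3, 32, 224, 224)` input.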

# verifying whether the ART classifier predictions are consistent with the original model:

 pred = nd.array(classifier_art.predict(adv_sample_input))
 ind = nd.topk(pred, k=5)[0].astype('int')
 msg = "The video sample clip is classified to be"
 for i in range(len(ind)):
     msg += f"\n\t[{model.classes[ind[i].asscalar()]}], with probability {nd.softmax(pred)[0][ind[i]].asscalar():.3f}."
 print(msg)

 The video sample clip is classified to be
 [Basketball], with probability 0.725.
 [TennisSwing], with probability 0.212.
 [VolleyballSpiking], with probability 0.036.
 [SoccerJuggling], with probability 0.012.
 [TableTennisShot], with probability 0.007.

# crafting adversarial attack with FGM

 epsilon = 8/255
 fgm = FastGradientMethod(
     classifier_art,
     eps=epsilon,
 )
 # %%time is a Jupyter cell magic: it only works as the first line of its own notebook cell
 %%time
 adv_sample = fgm.generate(
     x=adv_sample_input
 )

CPU times: user 2min 44s, sys: 826 ms, total: 2min 45s

Wall time: 2min 45s

# printing results

_ = predict_top_k((adv_sample-mean)/std, model)

 The video sample clip is classified to be
 [ThrowDiscus], with probability 0.266.
 [Hammering], with probability 0.244.
 [TennisSwing], with probability 0.155.
 [HulaHoop], with probability 0.082.
 [JavelinThrow], with probability 0.055.

# saving adversarial example to gif:

 adversarial_gif = sample_to_gif(np.squeeze(adv_sample), "adversarial_basketball.gif")
 print(f"`{adversarial_gif}` has been successfully created.")

`/home/hessel/.art/data/adversarial_basketball.gif` has been successfully created.

# Creating Sparse Adversarial Attack

Next, the Frame Saliency Attack is used to create a sparse adversarial example. The final result is shown in the GIF. Here, only one frame needs to be perturbed to achieve a misclassification.

adversarial_basketball_sparse.gif

# Frame Saliency Attack. Note: we specify here the frame axis, which is 2.

 fsa = FrameSaliencyAttack(
     classifier_art,
     fgm,
     "iterative_saliency",
     frame_index=2
 )
 # %%time is a Jupyter cell magic: it only works as the first line of its own notebook cell
 %%time
 adv_sample_sparse = fsa.generate(
     x=adv_sample_input
 )

CPU times: user 5min 54s, sys: 700 ms, total: 5min 55s

Wall time: 5min 56s
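The sparse attack can be understood as follows: compute a dense perturbation once, rank the frames by how strongly the loss reacts to them, and apply the perturbation one frame at a time until the prediction changes. The sketch below follows that idea loosely on a hypothetical linear classifier. The weights, bias and helper names are invented, and it should not be read as ART's `FrameSaliencyAttack` implementation.

```python
import numpy as np

T, H, W = 6, 4, 4  # frames, height, width (toy sizes, single channel)
frame_scale = np.array([1, 1, 5, 1, 1, 1], dtype=float)  # frame 2 matters most
weights = 0.1 * frame_scale[:, None, None] * np.ones((T, H, W))
bias = -7.5

def predict(x):
    """Hypothetical linear classifier: class 1 if the score is positive."""
    return int(np.sum(weights * x) + bias > 0)

def loss_gradient(x):
    """For true class 1 the loss grows as the score falls, so the gradient
    with respect to x points along -weights (up to a positive factor)."""
    return -weights

def frame_saliency_attack(x, y, eps):
    dense_delta = eps * np.sign(loss_gradient(x))          # dense, FGM-style perturbation
    saliency = np.abs(loss_gradient(x)).mean(axis=(1, 2))  # one score per frame
    x_adv = x.copy()
    perturbed = []
    for t in np.argsort(-saliency):                        # most salient frame first
        if predict(x_adv) != y:
            break                                          # already misclassified
        x_adv[t] = np.clip(x[t] + dense_delta[t], 0.0, 1.0)
        perturbed.append(int(t))
    return x_adv, perturbed

x = np.full((T, H, W), 0.5)
x_adv, perturbed = frame_saliency_attack(x, y=1, eps=0.1)
print(f"prediction {predict(x)} -> {predict(x_adv)}, perturbed frames: {perturbed}")
# prediction 1 -> 0, perturbed frames: [2]
```

In this toy setup, perturbing the single most salient frame is enough to flip the prediction, while every other frame is left untouched.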

# printing the resulting predictions:

_ = predict_top_k((adv_sample_sparse-mean)/std, model)

 The video sample clip is classified to be
 [TennisSwing], with probability 0.497.
 [Basketball], with probability 0.425.
 [VolleyballSpiking], with probability 0.040.
 [SoccerJuggling], with probability 0.014.
 [TableTennisShot], with probability 0.009.

# Again saving the adversarial example to gif:

 adversarial_sparse_gif = sample_to_gif(np.squeeze(adv_sample_sparse), "adversarial_basketball_sparse.gif")
 print(f"`{adversarial_sparse_gif}` has been successfully created.")

`/home/hessel/.art/data/adversarial_basketball_sparse.gif` has been successfully created.

# counting the number of perturbed frames:

 x_diff = adv_sample_sparse - adv_sample_input
 x_diff = np.swapaxes(x_diff, 1, 2)
 x_diff = np.reshape(x_diff, x_diff.shape[:2] + (np.prod(x_diff.shape[2:]), ))
 x_diff_norm = np.sign(np.round(np.linalg.norm(x_diff, axis=-1), decimals=4))
 print(f"Number of perturbed frames: {int(np.sum(x_diff_norm))}")

Number of perturbed frames: 1

# Applying H.264 compression defence

Next, VideoCompression is applied as a simple input preprocessing defence. The defence is intended to yield correct predictions when applied to both the original and the adversarial video input.

# initializing VideoCompression defense

video_compression = VideoCompression(video_format="avi", constant_rate_factor=30, channels_first=True)

# applying defense to the original input

adv_sample_input_compressed = video_compression(adv_sample_input * 255)[0] / 255

# applying defense to the sparse adversarial sample

adv_sample_sparse_compressed = video_compression(adv_sample_sparse * 255)[0] / 255

# printing the resulting predictions on compressed original input

_ = predict_top_k((adv_sample_input_compressed-mean)/std, model)

# printing the resulting predictions on sparse adversarial sample

_ = predict_top_k((adv_sample_sparse_compressed-mean)/std, model)

 The video sample clip is classified to be
 [Basketball], with probability 0.512.
 [TennisSwing], with probability 0.439.
 [VolleyballSpiking], with probability 0.021.
 [TableTennisShot], with probability 0.012.
 [SoccerJuggling], with probability 0.008.
 The video sample clip is classified to be
 [Basketball], with probability 0.711.
 [TennisSwing], with probability 0.223.
 [VolleyballSpiking], with probability 0.028.
 [SoccerJuggling], with probability 0.012.
 [TableTennisShot], with probability 0.009.
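Why a lossy preprocessing step can blunt a small perturbation is easier to see with a simpler transformation than H.264. The sketch below swaps in plain value quantisation, which is a different technique from video compression and is used here only as an analogy. The clip is contrived so that its clean values already sit on the quantisation grid, and the noise stands in for an adversarial perturbation.

```python
import numpy as np

def quantize(x, levels=8):
    """Lossy preprocessing: round values in [0, 1] to `levels` evenly spaced values."""
    return np.round(x * (levels - 1)) / (levels - 1)

rng = np.random.default_rng(0)
clean = quantize(rng.random((3, 4, 8, 8)))          # clip whose values lie on the grid
noise = rng.uniform(-0.03, 0.03, size=clean.shape)  # small stand-in perturbation
perturbed = np.clip(clean + noise, 0.0, 1.0)

recovered = quantize(perturbed)  # the "defence": re-apply the lossy step
print("max distortion before defence:", np.abs(perturbed - clean).max())
print("max distortion after defence: ", np.abs(recovered - clean).max())
```

Because the perturbation is smaller than half the grid spacing, rounding removes it entirely in this toy case. Real compression is far less tidy, so a defence of this kind should be evaluated empirically, as the article does with the compressed clips.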

EndNotes

Here’s a detailed video from IBM Research explaining how the Adversarial Robustness Toolbox works.