How To Secure Deep Learning Models From Adversarial Attacks

With recent advancements in deep learning, improving the robustness of deployed algorithms has become critical. Vulnerability to adversarial samples is a long-standing concern when DL models are used for safety-critical tasks like autonomous driving, fraud detection, and facial recognition. Such adversarial inputs are usually imperceptible to the human eye, yet they can cause AI systems to produce completely wrong outputs.

In two prominent instances, self-driving cars were tricked into increasing their speed or veering into the wrong lane simply by placing stickers on street signs. Deep neural networks used as inverse problem solvers can be immensely beneficial for medical imaging applications like CT and MRI, but adversarial perturbations can exploit these models into reconstructing wrong images for patients.

To reduce the impact of adversaries and make such models robust enough for critical tasks, researchers at the University of Illinois at Urbana-Champaign recently released a paper proposing a new method for training end-to-end deep learning-based inverse problem solvers. Here, the researchers aimed to understand the impact of adversarial attacks in the measurement space, instead of the signal space.

Proposed Method For Securing DL Models

Recovering images from indirect measurement data is central to CT and MRI and therefore must be reliable and accurate. However, adversarial perturbations can degrade both the accuracy and the quality of image reconstruction; in fact, they can deceive the network into reconstructing features that are not part of the data.

According to the researchers, modifying the training strategy can improve the security and robustness of these models. One of the most powerful ways to train a model against adversarial inputs is to train it on adversarial examples, an approach that has proven effective in classification settings. The resulting min-max optimisation formulation is similar to that of a generative adversarial network (GAN), but with a different objective, and thus requires some changes compared to standard GAN training.
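
In generic notation (not necessarily the paper's exact formulation), adversarial training of a reconstruction network \(f_\theta\) that maps measurements \(y = Ax\) back to signals \(x\) can be written as a min-max problem over a norm-bounded perturbation \(\delta\) added in measurement space:

\[ \min_{\theta} \; \mathbb{E}_{(x,\,y)} \Big[ \max_{\|\delta\|_2 \le \epsilon} \; \mathcal{L}\big(f_{\theta}(y + \delta),\, x\big) \Big] \]

The inner maximisation finds the worst-case measurement perturbation within the budget \(\epsilon\), and the outer minimisation trains the network to reconstruct accurately even under that perturbation.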

For this, the researchers introduced an auxiliary network to create examples of adversarial attacks, which is used in a min-max optimisation formulation. The adversarial training would require to solve two optimisation issues of the model, i.e. inner maximisation, which maximises the loss — the adversarial attack, and an outer minimisation, which minimises the loss. This results in a conflict between the two networks — the attackers and the robust, while training. And for a robust system, the researchers solved the optimisation problems by using projected gradient ascent (PGA) with momentum.
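
Below is a minimal PyTorch sketch of such a training loop. For simplicity it optimises a per-sample perturbation directly with PGA plus momentum rather than using the paper's learned auxiliary generator, and the function names, the mean-squared-error loss and the hyperparameters are illustrative assumptions, not the paper's code.

```python
import torch

def pga_attack(recon_net, y, x, epsilon, steps=10, step_size=0.1, momentum=0.9):
    """Inner maximisation: projected gradient ascent with momentum on a
    measurement-space perturbation delta, constrained to ||delta||_2 <= epsilon.
    Assumes y (measurements) and x (ground-truth signals) are batched tensors."""
    delta = torch.zeros_like(y, requires_grad=True)
    velocity = torch.zeros_like(y)
    for _ in range(steps):
        loss = torch.nn.functional.mse_loss(recon_net(y + delta), x)
        grad, = torch.autograd.grad(loss, delta)
        velocity = momentum * velocity + grad            # momentum accumulation
        delta = delta.detach() + step_size * velocity    # gradient *ascent* step
        # Project back onto the epsilon-ball in measurement space.
        norms = delta.flatten(1).norm(dim=1).clamp(min=1e-12)
        scale = (epsilon / norms).clamp(max=1.0).view(-1, *([1] * (delta.dim() - 1)))
        delta = (delta * scale).requires_grad_(True)
    return delta.detach()

def adversarial_training_step(recon_net, optimizer, y, x, epsilon):
    """Outer minimisation: update the reconstruction network on the
    adversarially perturbed measurements."""
    delta = pga_attack(recon_net, y, x, epsilon)
    optimizer.zero_grad()
    loss = torch.nn.functional.mse_loss(recon_net(y + delta), x)
    loss.backward()
    optimizer.step()
    return loss.item()
```

The projection step keeps the perturbation within the attack budget, mirroring the constraint in the min-max objective above.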

Further, the researchers theoretically analysed the special case of a linear reconstruction scheme. They showed that the min-max formulation yields a singular-value-filter regularised solution, which suppresses the adversarial examples that arise from ill-conditioning of the measurement matrix.
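
To make the idea of a singular-value filter concrete, a standard example of such a regularised solution (a Tikhonov-style filter, shown here only as an illustration and not necessarily the exact filter derived in the paper) replaces the naive pseudo-inverse reconstruction with

\[ \hat{x} = \sum_{i} \frac{\sigma_i^2}{\sigma_i^2 + \lambda} \cdot \frac{u_i^{\top} y}{\sigma_i}\, v_i, \]

where \(A = \sum_i \sigma_i u_i v_i^{\top}\) is the singular value decomposition of the measurement matrix and \(\lambda > 0\) a regularisation constant. Directions with small singular values, exactly the ones an adversarial measurement perturbation can exploit, are damped rather than blown up by the inversion.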

To compare the theoretical solution with the one learned by their scheme, the researchers trained a linear reconstruction network, together with a learned adversarial example generator, in a simulated set-up. The results showed that the network indeed converges to the theoretically derived solution.

The researchers further state that for deep non-linear networks used in compressed sensing (CS), the proposed training formulation is more robust than traditional methods. When experimenting with CS on two different measurement matrices, one well-conditioned and one relatively ill-conditioned, they observed that the behaviour in the two cases is vastly different; as in the linear reconstruction scheme, the response depends heavily on the conditioning of the measurement matrix.
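
The conditioning these experiments hinge on can be quantified directly via the condition number of the measurement matrix. The short NumPy sketch below uses hypothetical random matrices purely to illustrate what "well-conditioned" versus "ill-conditioned" means; it is not the paper's experimental set-up.

```python
import numpy as np

rng = np.random.default_rng(0)

# Well-conditioned measurement matrix: i.i.d. Gaussian entries.
A_good = rng.standard_normal((128, 256)) / np.sqrt(128)

# Ill-conditioned matrix: same singular vectors, but a rapidly decaying spectrum.
U, _, Vt = np.linalg.svd(A_good, full_matrices=False)
A_bad = U @ np.diag(np.geomspace(1.0, 1e-3, num=128)) @ Vt

# The condition number (sigma_max / sigma_min) measures how strongly a small
# measurement-space perturbation can be amplified during reconstruction.
print("cond(A_good):", np.linalg.cond(A_good))
print("cond(A_bad): ", np.linalg.cond(A_bad))
```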

Wrapping Up

To test the robustness of the approach, the researchers evaluated their adversarially trained network on the MNIST and CelebA datasets. Although the reconstructions were not perfect, the adversarially trained system recovered the original data better than the other methods evaluated. The researchers add that the technique still needs further refinement.
Read the whole paper here.

Sejuti Das

Sejuti currently works as Associate Editor at Analytics India Magazine (AIM). Reach out at sejuti.das@analyticsindiamag.com