An idiot’s guide to adversarial attacks in machine learning

The end goal of adversarial attacks is to deceive a model into giving away sensitive information or making incorrect predictions, or to corrupt the model itself.
Adversarial machine learning exploits whatever information about a model is accessible in order to launch malicious attacks. These attacks attempt to degrade a classifier’s performance on certain tasks by feeding it deceptive data.

The end goal of such attacks is to deceive the model into giving away sensitive information or making incorrect predictions, or to corrupt the model itself.

Most research into adversarial machine learning has been done in the realm of image recognition, in which images are doctored in a way that causes the classifier to make incorrect predictions. 

Adversarial attacks generate deceptive inputs, known as adversarial examples, that are purposely designed to cause ML models to make a mistake. They are corrupted versions of valid data that work like optical illusions for machines.

When the attacker has access to the target model and knows its architecture and parameters, it is called a white-box attack.

Alternatively, when the attacker has no access to the internals of the targeted model and can only work by observing its outputs, it is called a black-box attack.

Different types of adversarial attacks 

Poisoning attacks occur during the training phase of ML systems. They “contaminate” or “poison” the training data of ML models by manipulating existing data or attaching incorrect labels to it. Such attacks are most likely to work on models that are continuously retrained. For example, reinforcement learning models may be retrained daily or biweekly, giving the attacker multiple opportunities to slip deceptive data into the training set.
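To make the label-flipping idea concrete, here is a minimal sketch in plain Python. Everything in it is an assumption made for illustration: the one-dimensional toy dataset, the nearest-centroid "model", and the helper names train_centroids, predict and flip_labels are all hypothetical, and a real poisoning attack would target a far larger pipeline.

```python
def train_centroids(samples):
    """Fit a 1-D nearest-centroid model: the mean feature value per label."""
    sums, counts = {}, {}
    for value, label in samples:
        sums[label] = sums.get(label, 0.0) + value
        counts[label] = counts.get(label, 0) + 1
    return {label: sums[label] / counts[label] for label in sums}


def predict(centroids, value):
    """Return the label whose centroid is closest to the value."""
    return min(centroids, key=lambda label: abs(value - centroids[label]))


def flip_labels(samples, indices):
    """Attacker step (hypothetical): flip the binary label of chosen rows."""
    return [(v, 1 - y) if i in indices else (v, y)
            for i, (v, y) in enumerate(samples)]


# Toy training set: (feature, label). Low values are class 0, high are class 1.
clean = [(1.0, 0), (2.0, 0), (3.0, 0), (7.0, 1), (8.0, 1), (9.0, 1)]
# The attacker relabels two class-1 rows as class 0 before training.
poisoned = flip_labels(clean, indices={3, 4})

clean_model = train_centroids(clean)
poisoned_model = train_centroids(poisoned)

# The same input is classified differently once the labels are poisoned.
print(predict(clean_model, 6.0), predict(poisoned_model, 6.0))
```

In this toy setup, flipping just two labels drags the class-0 centroid upward, so an input of 6.0 that the clean model assigns to class 1 ends up in class 0 after poisoning.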

Evasion attacks are the most prevalent (and most researched) adversarial attacks and occur after the models have already been trained. They tend to be more practical, as they are performed during the deployment phase. They involve imperceptibly altering the data the models use to make predictions (not the training data), so that the input looks legitimate but causes the model to make an incorrect prediction. The attacks are often launched on a trial-and-error basis, as the attackers don’t know in advance which data manipulation will finally break the ML system.

Evasion attacks are often associated with computer vision. Attackers can modify images and trick the model into making incorrect predictions. This works because image recognition models have been trained to associate certain pixel patterns with target labels. If the pixels are altered in a specific way (such as by adding an imperceptible layer of noise), the model changes its prediction. This poses a threat to medical imaging systems, which could be tricked into classifying a benign mole as malignant.
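One well-known way to craft such noise in a white-box setting is the fast gradient sign method (FGSM). The article does not name a specific technique, so treat the following as an illustrative sketch only: the four-"pixel" input, the hand-picked WEIGHTS and BIAS, and the value of epsilon are all hypothetical, and the model is a tiny logistic classifier rather than a real image network.

```python
import math

# Hypothetical, hand-picked parameters of a tiny linear classifier
# over four "pixels". A real image model has far more inputs.
WEIGHTS = [2.0, -1.5, 0.5, 1.0]
BIAS = -0.25


def predict_proba(x):
    """Probability of class 1 under a logistic model."""
    z = sum(w * xi for w, xi in zip(WEIGHTS, x)) + BIAS
    return 1.0 / (1.0 + math.exp(-z))


def sign(v):
    """Return -1, 0 or 1 depending on the sign of v."""
    return (v > 0) - (v < 0)


def fgsm_perturb(x, true_label, epsilon):
    """Move every feature by epsilon in the direction that raises the loss."""
    p = predict_proba(x)
    # For logistic regression, the gradient of the cross-entropy loss
    # with respect to the input is (p - y) * w.
    grad = [(p - true_label) * w for w in WEIGHTS]
    return [xi + epsilon * sign(g) for xi, g in zip(x, grad)]


x = [0.6, 0.4, 0.5, 0.3]                      # classified as class 1
x_adv = fgsm_perturb(x, true_label=1, epsilon=0.2)

print(round(predict_proba(x), 3), round(predict_proba(x_adv), 3))
```

No feature moves by more than epsilon, yet the prediction crosses the 0.5 decision threshold. In this toy case epsilon has to be fairly large because there are only four inputs; the same per-feature nudge adds up across many more inputs in a real image, which is one reason much smaller perturbations can suffice there.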

Model stealing attacks are aimed at already trained models. The attacker probes a black-box machine learning system, typically by querying it and studying its outputs, to infer its structure or training data. That information can then be used to reconstruct the model or to extract the potentially confidential data the model was trained on. Such attacks are usually motivated by financial gain.
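As a rough illustration of how querying alone can leak a model, the sketch below recovers the hidden decision threshold of a one-feature classifier by binary search and then builds a surrogate from it. The "victim" black_box_predict, its secret threshold, and the helper extract_threshold are hypothetical stand-ins; real extraction attacks face far more complex models and usually query limits.

```python
_SECRET_THRESHOLD = 0.37  # hidden from the attacker in this toy scenario


def black_box_predict(x):
    """Victim model: the attacker can call it but cannot read its internals."""
    return 1 if x >= _SECRET_THRESHOLD else 0


def extract_threshold(query, low=0.0, high=1.0, n_queries=20):
    """Binary-search the decision boundary using only label queries.

    Assumes query(low) == 0 and query(high) == 1.
    """
    for _ in range(n_queries):
        mid = (low + high) / 2
        if query(mid) == 1:
            high = mid
        else:
            low = mid
    return (low + high) / 2


stolen = extract_threshold(black_box_predict)


def surrogate_predict(x):
    """Attacker's copy, built purely from the victim's answers."""
    return 1 if x >= stolen else 0


# Compare victim and surrogate on fresh probe points.
probes = [(i + 0.5) / 100 for i in range(100)]
agreement = sum(black_box_predict(x) == surrogate_predict(x)
                for x in probes) / len(probes)
print(round(stolen, 4), agreement)
```

Twenty label queries are enough here to pin the threshold down to roughly one part in a million, which is why rate limiting and query monitoring are often discussed as partial mitigations.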

How to prevent adversarial attacks

A potential method to counter adversarial attacks is adversarial training: teaching ML systems ahead of time what an adversarial attack might look like by incorporating adversarial examples into their training process.
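The loop below sketches what incorporating adversarial examples into training can look like for a small logistic model: at every epoch, adversarial copies of the training points are crafted against the current weights and added to the batch. The FGSM-style perturbation, the toy dataset, and all function names and hyperparameters (epsilon, lr, epochs) are assumptions chosen for illustration, not a recipe taken from the article.

```python
import math


def predict_proba(w, b, x):
    """Probability of class 1 under a logistic model."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))


def sign(v):
    return (v > 0) - (v < 0)


def fgsm(w, b, x, y, epsilon):
    """FGSM-style perturbation: step epsilon along the sign of the loss gradient."""
    p = predict_proba(w, b, x)
    return [xi + epsilon * sign((p - y) * wi) for xi, wi in zip(x, w)]


def adversarial_train(data, epsilon, lr=0.5, epochs=200):
    """Batch gradient descent on clean points plus freshly crafted adversarial ones."""
    w, b = [0.0] * len(data[0][0]), 0.0
    for _ in range(epochs):
        # Craft adversarial copies against the current model, then train on both.
        batch = data + [(fgsm(w, b, x, y, epsilon), y) for x, y in data]
        grad_w, grad_b = [0.0] * len(w), 0.0
        for x, y in batch:
            err = predict_proba(w, b, x) - y
            grad_w = [g + err * xi for g, xi in zip(grad_w, x)]
            grad_b += err
        w = [wi - lr * g / len(batch) for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / len(batch)
    return w, b


def robust_accuracy(w, b, data, epsilon):
    """Share of points still classified correctly after an FGSM-style attack."""
    hits = 0
    for x, y in data:
        x_adv = fgsm(w, b, x, y, epsilon)
        hits += int((predict_proba(w, b, x_adv) >= 0.5) == (y == 1))
    return hits / len(data)


# Toy, trivially separable data: (features, label).
data = [([1.0, 1.0], 1), ([2.0, 1.0], 1), ([1.0, 2.0], 1),
        ([-1.0, -1.0], 0), ([-2.0, -1.0], 0), ([-1.0, -2.0], 0)]

w, b = adversarial_train(data, epsilon=0.5)
print(robust_accuracy(w, b, data, epsilon=0.5))
```

Because the toy data is so easy to separate, this only demonstrates the mechanics of the loop; it says nothing about how much robustness adversarial training buys on realistic data.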

Another method is to regularly modify the algorithms the ML models use to classify data, thereby creating a “moving target” to retain the secrecy of the algorithms. 

Developers of ML systems should be aware of the risks associated with them and put in place security measures for cross-checking and verifying information. Furthermore, to catch pitfalls preemptively, they should regularly attack their own models to uncover as many weaknesses as possible in advance.

Srishti Mukherjee
Drowned in reading sci-fi, fantasy, and classics in equal measure, Srishti carries her bond with literature head-on into the world of science and tech, learning and writing about the fascinating possibilities in the fields of artificial intelligence and machine learning. Making hyperrealistic paintings of her dog Pickle and going through succession memes are her ideas of fun.
