An idiot’s guide to adversarial attacks in machine learning

The end goal of an adversarial attack is to deceive a model into giving away sensitive information, making incorrect predictions, or corrupting the model outright.
Adversarial machine learning exploits accessible model information to launch malicious attacks. Such adversarial attacks attempt to hamper the performance of classifiers on specific tasks by feeding the models deliberately falsified data. 

Most research into adversarial machine learning has been done in the realm of image recognition, in which images are doctored in a way that causes the classifier to make incorrect predictions. 

Adversarial attacks generate false data to deceive classifiers. Such inputs are purposely designed to cause ML models to make a mistake. They are corrupted versions of valid data that work as optical illusions for machines. 

When the attacker has access to the target model and knows its architecture and parameters, the attack is called a white-box attack. 

Conversely, when the attacker has no access to the target model and can only observe its outputs, the attack is called a black-box attack. 
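To make the white-box setting concrete, here is a minimal sketch of the fast gradient sign method (FGSM) against a toy logistic-regression classifier. The weights, input, and epsilon are all invented for illustration; the key point is that the gradient with respect to the input is only available because the attacker knows the model's parameters.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fgsm_attack(x, w, b, y_true, epsilon):
    """White-box FGSM against a logistic-regression model: step in the
    direction that increases the loss, bounded by epsilon per feature."""
    p = sigmoid(np.dot(w, x) + b)
    # Gradient of the binary cross-entropy loss with respect to the input x.
    grad_x = (p - y_true) * w
    return x + epsilon * np.sign(grad_x)

# Toy target model: predicts class 1 when w.x + b > 0.
w = np.array([2.0, -1.0])
b = 0.0

x = np.array([0.4, 0.2])          # clean input, correctly predicted as class 1
x_adv = fgsm_attack(x, w, b, y_true=1.0, epsilon=0.5)

print(sigmoid(np.dot(w, x) + b) > 0.5)      # → True  (clean prediction)
print(sigmoid(np.dot(w, x_adv) + b) > 0.5)  # → False (prediction flipped)
```

Each feature moves by at most epsilon, yet the prediction flips, which is exactly the "imperceptible change, large effect" property that makes these attacks dangerous.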

Different types of adversarial attacks 

Poisoning attacks occur during the training phase of ML systems. They “contaminate” or “poison” a model's training data by manipulating existing records or attaching incorrect labels. Such attacks are most likely to work on models that are continuously retrained. For example, reinforcement learning models may be retrained daily or biweekly, giving the attacker repeated opportunities to introduce deceptive data into the training set. 
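Label flipping is the simplest form of poisoning. In the illustrative sketch below (the data, the nearest-centroid classifier, and the flip fraction are all made up), the attacker mislabels part of one class before retraining, which drags the learned class centroid toward the wrong cluster:

```python
import numpy as np

def nearest_centroid_fit(X, y):
    """Fit a nearest-centroid classifier: one mean vector per class."""
    return {c: X[y == c].mean(axis=0) for c in np.unique(y)}

def predict(centroids, x):
    return min(centroids, key=lambda c: np.linalg.norm(x - centroids[c]))

rng = np.random.default_rng(0)
# Two well-separated clusters: class 0 near (0, 0), class 1 near (4, 4).
X = np.vstack([rng.normal(0, 0.5, (50, 2)), rng.normal(4, 0.5, (50, 2))])
y = np.array([0] * 50 + [1] * 50)

clean = nearest_centroid_fit(X, y)

# Poisoning: the attacker flips labels on 40% of class 0 before retraining.
y_poisoned = y.copy()
y_poisoned[:20] = 1   # these class-0 points now carry the wrong label

poisoned = nearest_centroid_fit(X, y_poisoned)

# The flipped labels pull the class-1 centroid toward the class-0 cluster,
# so a point that clearly belongs to class 0 is now misclassified.
test_point = np.array([1.8, 1.8])
print(predict(clean, test_point))     # class 0 under the clean model
print(predict(poisoned, test_point))  # class 1 under the poisoned model
```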

Evasion attacks are the most prevalent (and most researched) adversarial attacks and occur after a model has already been trained. They tend to be more practical because they take place during the deployment phase. They involve imperceptibly altering the data the model uses to make predictions (not the training data), so that the input still looks legitimate but causes an incorrect prediction. The attacks are often launched on a trial-and-error basis, as the attackers don't know in advance which data manipulation will finally break the ML system. 
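That trial-and-error process can be sketched as a simple random search. In this black-box toy example (the hidden model, query budget, and perturbation size are all invented), the attacker never sees weights or gradients, only the observed predictions:

```python
import numpy as np

def target_predict(x):
    """Black-box target: the attacker can only observe this output."""
    w, b = np.array([2.0, -1.0]), 0.0   # hidden from the attacker
    return int(np.dot(w, x) + b > 0)

def random_search_evasion(x, budget=500, epsilon=0.6, seed=0):
    """Trial-and-error evasion: sample small random perturbations and keep
    the first one that changes the observed prediction."""
    rng = np.random.default_rng(seed)
    original = target_predict(x)
    for _ in range(budget):
        candidate = x + rng.uniform(-epsilon, epsilon, size=x.shape)
        if target_predict(candidate) != original:
            return candidate
    return None   # attack failed within the query budget

x = np.array([0.3, 0.1])
x_adv = random_search_evasion(x)
print(target_predict(x), target_predict(x_adv))
```

Real black-box attacks use far more query-efficient search strategies, but the structure is the same: probe, observe, adjust.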

Evasion attacks are often associated with computer vision, where attackers modify images to trick the model into making incorrect predictions. This works because image recognition models have been trained to correlate certain patterns of pixels with target variables: if the pixels are altered in a specific way (such as by adding an imperceptible layer of noise), the model changes its prediction. This poses a real threat to medical imaging systems, which could be tricked into classifying a benign mole as malignant. 

Model stealing (or model extraction) attacks target already-trained models. The attacker probes a black-box machine learning system in order to reconstruct the model itself or extract the potentially confidential data it was trained on. Such attacks are usually motivated by financial gain.
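A minimal sketch of extraction, assuming a black-box API that returns raw scores from a hidden linear model (the secret weights and query count are invented): the attacker fits a surrogate purely from query/answer pairs, and because the target really is linear here, ordinary least squares recovers it almost exactly.

```python
import numpy as np

def blackbox_score(X):
    """Deployed model: returns raw scores; the parameters are hidden."""
    w_secret, b_secret = np.array([1.5, -2.0, 0.5]), 0.3
    return X @ w_secret + b_secret

rng = np.random.default_rng(1)
# The attacker sends 200 random queries and records the outputs.
queries = rng.normal(0, 1, (200, 3))
answers = blackbox_score(queries)

# Fit a surrogate by least squares on the (query, answer) pairs.
A = np.hstack([queries, np.ones((200, 1))])   # append a bias column
theta, *_ = np.linalg.lstsq(A, answers, rcond=None)
w_stolen, b_stolen = theta[:3], theta[3]

print(np.round(w_stolen, 3), round(float(b_stolen), 3))
```

Real APIs return less information (labels or truncated probabilities) and real models are nonlinear, so practical extraction needs many more queries, but the principle is the same.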

How to prevent adversarial attacks

A potential countermeasure is adversarial training: incorporating adversarial examples into the training process so that the ML system learns ahead of time what an adversarial attack might look like. 
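A sketch of adversarial training for a tiny logistic-regression model (the data, epsilon, and learning rate are illustrative): each epoch, FGSM-perturbed copies of the training points, crafted against the current weights, are added to the batch.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def adversarial_train(X, y, epsilon=0.3, lr=0.5, epochs=200):
    """Train logistic regression on clean examples plus FGSM-perturbed
    copies generated against the current weights each epoch."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        # Craft adversarial copies against the current model
        # (np.sign(0) == 0, so the first epoch just repeats the clean data).
        p = sigmoid(X @ w + b)
        X_adv = X + epsilon * np.sign(np.outer(p - y, w))
        X_all, y_all = np.vstack([X, X_adv]), np.concatenate([y, y])
        # One gradient-descent step on the augmented batch.
        err = sigmoid(X_all @ w + b) - y_all
        w -= lr * X_all.T @ err / len(y_all)
        b -= lr * err.mean()
    return w, b

# Two linearly separable clusters.
X = np.array([[-2.0, 0.2], [-1.8, -0.1], [-2.2, 0.0],
              [2.0, 0.1], [1.9, -0.2], [2.1, 0.0]])
y = np.array([0.0, 0.0, 0.0, 1.0, 1.0, 1.0])
w, b = adversarial_train(X, y)

# The worst-case epsilon-bounded perturbation of a class-1 point still
# lands on the correct side of the learned boundary.
x_worst = np.array([2.0, 0.0]) - 0.3 * np.sign(w)
print(sigmoid(w @ x_worst + b) > 0.5)
```

The idea scales directly to deep networks, where adversarial training remains one of the more reliable empirical defences, at the cost of extra computation per epoch.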

Another method is to regularly modify the algorithms the ML models use to classify data, thereby creating a “moving target” to retain the secrecy of the algorithms. 
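One way to read the "moving target" idea, purely as a sketch (the model variants below are invented): serve each prediction from one of several independently trained variants chosen at random, so an attacker probing the system never faces a single fixed decision boundary.

```python
import numpy as np

# Several variants of the same linear classifier, e.g. retrained with
# different seeds or slightly different settings (weights are made up).
MODELS = [
    (np.array([2.0, -1.0]), 0.0),
    (np.array([1.8, -1.2]), 0.1),
    (np.array([2.2, -0.9]), -0.1),
]

def moving_target_predict(x, rng):
    """Answer each query using a randomly chosen model variant."""
    w, b = MODELS[rng.integers(len(MODELS))]
    return int(x @ w + b > 0)

rng = np.random.default_rng(0)
# Legitimate inputs far from every variant's boundary get stable answers,
# while a perturbation finely tuned to one variant may fail on the others.
print([moving_target_predict(np.array([1.0, 0.0]), rng) for _ in range(5)])
# → [1, 1, 1, 1, 1]
```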

Developers of ML systems should be aware of these risks and put security measures in place for cross-checking and verifying information. Furthermore, to catch pitfalls preemptively, they should regularly attempt to corrupt their own models so as to uncover as many shortcomings as possible in advance.

Srishti Mukherjee