How To Fool AI With Adversarial Attacks


Research into adversarial attacks has become one of the latest trends in technology, with developers, experts, and scientists trying to trick AI bots by making subtle changes to their inputs. ML models undoubtedly perform poorly when evaluated in a completely different environment, as we are yet to develop an AI that can generalise and deliver superior results in new situations. What has drawn experts' interest, however, is that the outputs of these AI-based solutions can be swayed by even the smallest of changes.

Such flaws show that we are still a long way from achieving the AI we all dream of. In this article, we look at how some researchers have deceived AI bots.


Tricking NLP Bots

Researchers from MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) developed a tool called TextFooler to trick AI bots. The tool crafts adversarial inputs, deliberately created to fool ML algorithms, that push systems such as Alexa and Siri into making wrong predictions.

TextFooler attacks natural language processing (NLP) systems such as Alexa and Siri. The framework takes text as input and determines which words matter most to the NLP system's prediction. It then replaces each of those words with a contextual synonym while ensuring that the grammar and original meaning are not altered.

For example, instead of the input ‘The characters, cast in impossibly contrived situations, are estranged from reality,’ TextFooler used ‘The characters, cast in impossibly engineered circumstances, are fully estranged from reality’ to get a different output.

TextFooler was even used against one of the most popular open-source NLP models, BERT. The researchers brought BERT's accuracy down from over 90% to under 20% by changing only 10% of the input words.

Such techniques can be used by hackers or lawbreakers to bypass ML-based solutions such as email spam filters, social media bots that flag sensitive speech, and other text classification models. However, Di Jin, an MIT PhD student, said that TextFooler’s capabilities could be extended to attack any classification-based NLP model in order to test its robustness and improve the generalisation of deep learning models.
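The rank-then-substitute idea can be illustrated with a minimal Python sketch. This is not the TextFooler implementation: the classifier weights, the bias, and the synonym table below are hypothetical stand-ins for a real NLP model and an embedding-based synonym search, and importance is estimated simply by deleting each word and watching the score move.

```python
# Toy illustration of a TextFooler-style attack: rank words by importance,
# then swap the most important ones for synonyms until the label flips.
# WEIGHTS, BIAS and SYNONYMS are hypothetical, not taken from the real tool.

WEIGHTS = {"contrived": -2.0, "estranged": -1.5, "impossibly": -0.5,
           "engineered": -0.2, "removed": -0.3}
BIAS = 2.0
SYNONYMS = {"contrived": ["engineered"], "estranged": ["removed"],
            "situations": ["circumstances"]}


def score(words):
    """Toy sentiment score: below zero means a 'negative' prediction."""
    return sum(WEIGHTS.get(w, 0.0) for w in words) + BIAS


def predict(words):
    return "negative" if score(words) < 0 else "positive"


def word_importance(words):
    """Importance of a word = change in score when that word is deleted."""
    base = score(words)
    return [abs(base - score(words[:i] + words[i + 1:]))
            for i in range(len(words))]


def attack(words):
    """Greedily swap important words for synonyms until the label flips."""
    original = predict(words)
    # To flip a negative label we push the score up, and vice versa.
    direction = 1.0 if original == "negative" else -1.0
    adversarial = list(words)
    importance = word_importance(words)
    order = sorted(range(len(words)), key=lambda i: importance[i], reverse=True)
    for i in order:
        best = adversarial
        for candidate in SYNONYMS.get(adversarial[i], []):
            trial = adversarial[:i] + [candidate] + adversarial[i + 1:]
            if direction * score(trial) > direction * score(best):
                best = trial
        adversarial = best
        if predict(adversarial) != original:
            break  # stop as soon as the prediction changes
    return adversarial


sentence = ("the characters cast in impossibly contrived situations "
            "are estranged from reality").split()
adversarial = attack(sentence)
print(predict(sentence), "->", predict(adversarial))
print(" ".join(adversarial))
```

In this toy setup, swapping two of the eleven words is enough to flip the prediction. A real attack would also check that each swap preserves grammar and meaning, which this sketch does not attempt.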

Deceiving AI Hiring Bot

South Korean firms are deploying AI bots for hiring to increase their chances of getting the right talent for the right position. The ML models in these solutions analyse facial expressions and evaluate word choice. According to the Korean Economic Research Institute, nearly a quarter of the top 131 organisations in the country plan to use AI for hiring. However, Park Seong-Jung offers lessons to job-seekers on how to dupe ML-based hiring systems. He said that the machines can identify a forced smile made with the lips alone, which can lead to the applicant being rejected. Applicants should instead smile with their eyes, exploiting the system's flaws to improve their chances of a job offer.

Graphic Print

Belgian scientists developed a graphic print that can baffle surveillance technology in real time. Unlike others who attempt to trick facial recognition, Simen Thys, Wiebe Van Ranst, and Toon Goedeme designed this graphic to sway the outputs of object detection models. Fooling facial recognition systems can be easy because the details of the face are of prime importance, so even a simple, natural change to the face is enough to trick the AI bots. Deceiving an object detection model is harder, because small details in objects do not significantly affect the output.

Bamboozling Self-Driving Cars

Tencent conducted experimental security research on Tesla's Autopilot and tricked the AI through adversarial attacks. Self-driving cars are trained to understand road symbols and make decisions accordingly. The car's position is also determined from the dashed lines marked on the road. Tencent deceived the Tesla with only three stickers, which the car interpreted as the lane veering left. As a result, the vehicle took a sudden turn towards oncoming traffic. This is yet another example of how the smallest of changes can completely confuse AI models.

Adversarial Attacks On Facial Recognition Domain

A study published on 30 January 2020 demonstrated how adversarial attacks can fool deep neural network (DNN) classifiers. The researchers applied the fast gradient sign method, which manipulates an image by adding or subtracting a small error to each pixel, to introduce perturbations into the images in the dataset, resulting in misclassification. Such shortcomings in neural networks can be a severe threat, as the technology is being deployed for biometric authentication, malware filtering, and more.
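To make the "add or subtract a small error to each pixel" step concrete, here is a minimal sketch of the fast gradient sign method in plain Python. It assumes a tiny logistic-regression classifier in place of a DNN, and the weights, bias, four-pixel "image", and epsilon are hypothetical values chosen for illustration, not taken from the study.

```python
import math

# Fast gradient sign method (FGSM) on a toy logistic-regression classifier.
# The model and the input below are hypothetical stand-ins for a DNN and an image.


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(weights, bias, x):
    """Probability that input x belongs to class 1."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


def sign(v):
    return (v > 0) - (v < 0)


def fgsm(weights, bias, x, y, epsilon):
    """Shift every pixel by +/- epsilon in the direction that increases the loss.

    For logistic regression with cross-entropy loss, the gradient of the loss
    with respect to the input is (p - y) * w.
    """
    p = predict_proba(weights, bias, x)
    grad = [(p - y) * w for w in weights]
    # Clip so the perturbed pixels stay in the valid [0, 1] range.
    return [min(1.0, max(0.0, xi + epsilon * sign(g)))
            for xi, g in zip(x, grad)]


weights = [1.5, -2.0, 1.0, -0.5]   # hypothetical trained weights
bias = 0.0
x = [0.6, 0.4, 0.5, 0.5]           # hypothetical four-pixel image
y = 1                              # its true label
epsilon = 0.1                      # maximum change allowed per pixel

x_adv = fgsm(weights, bias, x, y, epsilon)
print("clean:", predict_proba(weights, bias, x))
print("adversarial:", predict_proba(weights, bias, x_adv))
```

With these values the clean input is classified as class 1 (probability above 0.5), while the perturbed input, in which no pixel moves by more than epsilon, falls below 0.5 and is misclassified. For a deep network the same update applies, with the input gradient obtained by backpropagation instead of the closed form used here.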

Rohit Yadav
Rohit is a technology journalist and technophile who likes to communicate the latest trends around cutting-edge technologies in a way that is straightforward to assimilate. In a nutshell, he is deciphering technology.
