In a study by IDC, it was found that the global cybersecurity market was worth $107 million in 2019 and is poised to grow up to $151 million by 2023. While most of this expenditure is towards designing software and hardware for protecting systems from hacking or compromising networks, an area which is often overlooked is the integrity of the data being used to train the datasets used by the machine learning algorithms. This is called the poisoning attack, where the intruder injects false training data to corrupt the learning model itself. It could become a significant attack that can undermine the AI systems, businesses and processes built around them.
What is Poisoning Attack?
As per a well-respected and often-cited report on the vulnerabilities of the AI system for the Belfer Center by Marcus Comiter, the attacks can be broadly classified into
- Input attack: It is among the more conventional adversarial attacks where the data fed to the AI system is manipulated to affect the output in a way desired by the attacker.
- Poisoning attack: These attacks occur earlier in the process during the time when the AI system is being developed and trained. It typically involves manipulating the data that is used to train the system itself.
Here we will discuss the poisoning attack in particular. This attack seeks to damage the AI model itself so that it is inherently flawed the output can be explicitly controlled by the attacker. In a poisoning attack, the attacker compromises the learning process in a way that the system fails on the inputs chosen by the attacker and further constructs a backdoor through which he can control the output even in future.
Subscribe to our Newsletter
Join our editors every weekday evening as they steer you through the most significant news of the day, introduce you to fresh perspectives, and provide unexpected moments of joy
Image credit: Informatiomatters.net
There are three ways in which the attacker can ‘poison’ the AI/ML system:
- Dataset poisoning: It is one of the most direct ways with which a model can be corrupted. It depends on the principle, ‘poison the dataset, poison the model’. In this case, the attacker introduces incorrect or mislabeled data into the dataset. Alternatively, the adversary can change its behaviour so that the data collected itself will be wrong.
- Algorithm poisoning: In this type, the attacker takes advantage of the algorithm used to learn the model. There are many ways to poison the algorithm — poison through transfer learning where attackers teach an algorithm poison and then spread it further to new ML algorithms using transfer learning; data injection and manipulation where bad data is introduced to the data pool of the algorithm; and logic corruption where the attacker changes the way the algorithm learns.
- Model poisoning: This type of poisoning is pretty straightforward. Here the attacker simply replaces a functional model with a poisoned one. This model lives within the computer and provides the attacker with a backdoor to either alter this model or replace it completely with a poisoned model.
The implication of poisoning attacks can be pretty fatal for many businesses and industries, and even life-threatening in the cases of the medical sector, aviation sector, or road safety. One of the most popular experiments in this regard was done when a group of researchers added small changes or ‘perturbations’ to an image of a panda, which caused changes in the machine learning algorithm to identify panda (a giant bear belonging to Ursidae family) as gibbon (a small ape belonging to Hylobatidae family.)
Many researchers and experts have referred to poisoning attacks as ‘ticking bombs’ that require immediate attention. As AI/ML systems run along with organisations and which in turn control the economy, it is important that the decision making is done on reliable and trusted data.With increased reliance on web-based resources for AI training models, it is important that one understands and appreciates the authenticity of such resources. Just building a secure data network is not enough as here we are dealing with compromised data even before it enters the system. In such a case, educating the stakeholders about the issue through national and international AI policy is very pertinent.