Algorithmic innovation has entered a phase in which researchers actively probe techniques for flaws using adversarial attacks. In computer vision, adversarial attacks are well established, and several startups now focus exclusively on them.
Any talk of intelligence in machines eventually leads to reinforcement learning (RL), i.e. teaching machines to teach themselves. It is natural to suspect that RL is vulnerable to adversarial attacks as well. Adam Gleave and his colleagues at UC Berkeley had similar suspicions and decided to test RL policies for vulnerabilities.
In their paper, the authors demonstrate the existence of adversarial policies in zero-sum games between simulated humanoid robots (aka stick figures), pitting them against state-of-the-art victims trained via self-play to be robust to opponents.
Testing Deep RL Policies For New Attacks
To demonstrate the effect of adversarial attacks on reinforcement learning policies, Gleave and his colleagues trained stick-like figures to play a handful of two-player games such as football, racing and wrestling. The bots were aware of the position and movement of their own limbs and those of their opponents.
According to the paper, adversarial policies have a high chance of winning against the victims even though they generate seemingly random and uncoordinated behaviour. The authors claim that these policies are more successful in high-dimensional environments, and that they induce substantially different activations in the victim policy network than the victim shows when playing against a normal opponent.
Both agents in a game observe the position, velocity and contact forces of the joints in their own body, along with the position of their opponent’s joints.
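The setup described above, an adversary acting in a shared environment against an already-trained victim, can be sketched in a few lines of Python. This is only a toy illustration under stated assumptions: the game, the names FixedVictimEnv, copycat_victim and table_policy, and the brute-force search are all hypothetical and stand in for the simulated robotics games and deep RL training used in the paper. The point it shows is that once the victim's policy is held fixed, the adversary faces an ordinary single-agent problem and can look for whatever behaviour exploits that particular victim.

```python
from itertools import product
from typing import Callable, Tuple

Obs = Tuple[int, int]          # (adversary's previous action, victim's previous action)
Policy = Callable[[Obs], int]  # maps an observation to an action in {0, 1}


class FixedVictimEnv:
    """Toy zero-sum game seen from the adversary's side (hypothetical, not from the paper).

    Each round both players pick 0 or 1. The adversary scores +1 when the
    actions differ and -1 when they match; the victim receives the negation.
    """

    def __init__(self, victim_policy: Policy, rounds: int = 10) -> None:
        self.victim_policy = victim_policy  # frozen: never updated here
        self.rounds = rounds
        self.t = 0
        self.obs: Obs = (0, 0)

    def reset(self) -> Obs:
        self.t = 0
        self.obs = (0, 0)
        return self.obs

    def step(self, adversary_action: int) -> Tuple[Obs, int, bool]:
        # The victim acts on the same shared observation the adversary sees.
        victim_action = self.victim_policy(self.obs)
        reward = 1 if adversary_action != victim_action else -1
        self.obs = (adversary_action, victim_action)
        self.t += 1
        return self.obs, reward, self.t >= self.rounds


def copycat_victim(obs: Obs) -> int:
    """Stand-in victim: repeats the adversary's previous action."""
    return obs[0]


def episode_return(env: FixedVictimEnv, policy: Policy) -> int:
    """Play one full episode and return the adversary's total reward."""
    obs, total, done = env.reset(), 0, False
    while not done:
        obs, reward, done = env.step(policy(obs))
        total += reward
    return total


def table_policy(table: Tuple[int, int]) -> Policy:
    """Adversary policy that looks up its action from its own previous action."""
    return lambda obs: table[obs[0]]


# Brute-force search over the four lookup-table adversaries. This replaces
# RL training purely to keep the example short (an assumption of this sketch).
env = FixedVictimEnv(copycat_victim)
candidates = list(product((0, 1), repeat=2))
best_table = max(candidates, key=lambda t: episode_return(env, table_policy(t)))
best_return = episode_return(env, table_policy(best_table))
print(best_table, best_return)  # (1, 0) 10
```

The winning table simply alternates its action every round, which is a poor strategy in general but wins all ten rounds against this specific victim. That mirrors, in miniature, the article's observation that adversarial policies need not look competent to win.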
For instance, when these stick-like figures were put in a game similar to soccer, where the aim is to score a goal, the goalkeeping bot did not try to block the shot; it simply dropped to the ground and wiggled its legs. Confused, the striker did a weird little sideways dance, stamping its feet and waving one arm, and then fell over. Result: 1-0 to the goalie.
The illustrations above are snapshots of a victim (in blue) playing against normal and adversarial opponents (in red). The victim wins if it crosses the finish line; otherwise, the opponent wins.
Despite never standing up, as shown in the previous illustration, the adversarial opponent wins a striking 86% of the time, compared with a 47% win rate for the normal opponent.
The tactics employed here don’t look sane, but they manage to confuse agents trained using deep RL algorithms. Since deep RL is part of many popular modern-day AI demonstrations, such as AlphaZero and OpenAI Five, the authors warn that RL policies are more vulnerable to attack than previously thought, which could have serious consequences.
The authors conclude that adversarial policies can manipulate a victim through what it observes of the adversary's body position. This makes deep RL systems more vulnerable to adversarial policies in high-dimensional environments.
Adversarial attacks on reinforcement learning systems have significant implications in critical use cases such as self-driving cars and robotics. This work emphasises the importance of verifying deep RL systems for their ability to thwart these unwanted or unseen attacks.
Here are a few highlights from this work:
- The authors have proposed a novel threat model of natural adversarial observations produced by an adversarial policy taking actions in a shared environment.
- They demonstrate that adversarial policies exist in a range of zero-sum simulated robotics games against the state-of-the-art victims trained via self-play to be robust to adversaries.
- They also verify that the adversarial policies win by confusing the victim, not by learning a generally strong policy, and they observe that victim performance increases when it is blind to the adversary’s position.
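The last highlight, making the victim "blind" to the adversary's position, can be pictured as masking part of the victim's observation before its policy sees it. The following is a minimal sketch under assumptions: the Observation layout follows the fields the article lists (own position, velocity, contact forces, opponent position), but the class, the mask_opponent helper, the vector sizes and the choice of a constant fill value are all hypothetical rather than taken from the paper.

```python
from dataclasses import dataclass, replace
from typing import Tuple

Vector = Tuple[float, ...]


@dataclass(frozen=True)
class Observation:
    """Hypothetical observation layout based on the fields named in the article."""
    own_position: Vector
    own_velocity: Vector
    contact_forces: Vector
    opponent_position: Vector


def mask_opponent(obs: Observation, fill: float = 0.0) -> Observation:
    """Return a copy of obs with the opponent's position replaced by a constant.

    The fill value is an assumption of this sketch; any fixed placeholder
    removes the information the adversary could otherwise manipulate.
    """
    blanked = tuple(fill for _ in obs.opponent_position)
    return replace(obs, opponent_position=blanked)


# Two observations that differ only in what the opponent is doing.
normal = Observation((0.0, 1.0), (0.1, 0.0), (0.0,), opponent_position=(2.0, 1.0))
adversarial = Observation((0.0, 1.0), (0.1, 0.0), (0.0,), opponent_position=(2.0, 0.1))

# After masking, the victim's policy receives identical input in both cases,
# so the opponent's pose can no longer influence the victim's actions.
print(mask_opponent(normal) == mask_opponent(adversarial))  # True
```

Because the masked observations are identical, any difference in the victim's behaviour between the two opponents disappears, which is consistent with the claim that the adversary wins through what the victim observes rather than through physical strength.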
Read the full paper here.