Revenge Of The Humans: Why OpenAI Five Could Not Win The Dota 2 Championship

The International, the premier annual tournament for the complex battle arena game Dota 2, had an artificial intelligence system play against professional players at its 2018 edition. Earlier this August, an AI team called Five, created by OpenAI, failed to defeat professional human gamers. Despite having training “experience” equivalent to over 180 years of gameplay, the AI could not pull off the feat. Why was that?

For the uninitiated, Dota 2 is a popular online multiplayer video game with 115 heroes, categorised by strength, agility and intelligence. Two teams of five players compete. Each player picks a hero with its own powers and characteristics, and the team tries to destroy the opposing team’s base while overcoming many hurdles along the way.


Tech Behind Five

Each of Five’s five heroes was controlled by its own neural network. Training ran for about two months before the final match; according to OpenAI, the system played roughly 180 years’ worth of games against itself every day. Every neural network was trained by playing against itself, since learning from self-play is a natural way to explore the game environment. During training, properties such as health, speed and starting level were randomised.

At the beginning of each training game, each hero was randomly assigned a set of lanes to follow and was not allowed to stray from them. At first, Five’s heroes walked aimlessly around the map, but after some hours of training they could do things like farming and fighting.

After some days, Five’s heroes could plan and play much like humans, using strategies such as stealing the opponent’s Bounty runes and walking to their tier one towers to farm. Gradually they became proficient in advanced tactics such as the 5-hero push. It was found that once the randomisations were increased, human teams started to lose games against Five. Five played 80% of its training games against its current self and the other 20% against its past selves. This was done to avoid strategy collapse.
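The 80/20 split between the current self and past selves can be sketched as a simple opponent sampler. This is a minimal illustration, not OpenAI's code: the function name `pick_opponent`, the version labels and the uniform choice among past versions are all assumptions made for the example.

```python
import random

def pick_opponent(current_version, past_versions, rng, past_fraction=0.2):
    """Choose the opponent for one training game (hypothetical helper).

    With probability `past_fraction` the opponent is an older snapshot,
    otherwise it is the current policy. Choosing uniformly among past
    snapshots is an assumption; the article does not say how they were picked.
    """
    if past_versions and rng.random() < past_fraction:
        return rng.choice(past_versions)
    return current_version

rng = random.Random(0)            # seeded so the run is repeatable
past = ["v1", "v2", "v3"]         # hypothetical older snapshots
picks = [pick_opponent("v4", past, rng) for _ in range(10_000)]
share_past = sum(p != "v4" for p in picks) / len(picks)
print(f"games against past selves: {share_past:.1%}")
```

Keeping older snapshots in the pool means the current policy cannot drift into a strategy that only works against its latest self.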

The system was implemented on Rapid, OpenAI’s general-purpose reinforcement learning training system, which can be applied to any Gym environment. Training used proximal policy optimisation (PPO), an advanced policy gradient method, to improve the policy that makes each hero’s decisions.
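For readers unfamiliar with PPO, its core idea is a clipped objective that stops a single update from moving the policy too far. The sketch below computes that objective for a handful of samples in plain Python. The clip range of 0.2 is the commonly used default from the PPO paper and is an assumption here; the article does not state the value OpenAI Five used, and a real trainer would also add value-function and entropy terms.

```python
import math

def ppo_clipped_objective(new_logp, old_logp, advantages, clip_eps=0.2):
    """Mean PPO clipped surrogate objective over a batch of samples.

    new_logp / old_logp: log-probabilities of the taken actions under the
    updated and the data-collecting policy. clip_eps=0.2 is an assumed default.
    """
    total = 0.0
    for nl, ol, adv in zip(new_logp, old_logp, advantages):
        ratio = math.exp(nl - ol)                        # pi_new / pi_old
        clipped = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
        total += min(ratio * adv, clipped * adv)         # pessimistic bound
    return total / len(advantages)

# Unchanged policy: ratio is 1, so the objective equals the mean advantage.
same = ppo_clipped_objective([-1.0, -2.0], [-1.0, -2.0], [0.5, 1.5])

# Action made twice as likely with a positive advantage: gain is capped.
capped = ppo_clipped_objective([math.log(2.0)], [0.0], [1.0])
print(same, capped)
```

The cap means the optimiser gains nothing from pushing the probability ratio beyond the clip range when the advantage is positive, which keeps updates conservative.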

OpenAI used a separate long short-term memory (LSTM) network, a kind of recurrent neural network, for each hero to learn strategies. Each of Five’s networks contains a single-layer, 1024-unit LSTM that observes the current game state through the Bot API and then emits actions through several action heads. Each head corresponds to a distinct part of the action and is computed independently.
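The idea of independent action heads can be illustrated with a toy example in which several small linear layers read the same hidden state and each picks its own part of the action. Everything here is hypothetical: the head names, the sizes, the random weights and the greedy choice are stand-ins, and the hidden vector merely plays the role of the LSTM output.

```python
import math
import random

def softmax(logits):
    """Turn raw scores into probabilities (numerically stable form)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def linear(weights, hidden):
    """One row of weights per output; returns one score per output."""
    return [sum(w * h for w, h in zip(row, hidden)) for row in weights]

def make_head(n_out, n_hidden, rng):
    """Random weights for a toy head (a trained model would learn these)."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(n_hidden)] for _ in range(n_out)]

rng = random.Random(0)
HIDDEN = 8  # toy size; the real LSTM is far larger
heads = {   # hypothetical head names and output sizes
    "action_type": make_head(4, HIDDEN, rng),
    "move_x": make_head(9, HIDDEN, rng),
    "move_y": make_head(9, HIDDEN, rng),
}
hidden_state = [rng.uniform(-1, 1) for _ in range(HIDDEN)]  # stand-in for LSTM output

action = {}
for name, weights in heads.items():
    probs = softmax(linear(weights, hidden_state))  # each head reads the same state
    action[name] = max(range(len(probs)), key=probs.__getitem__)  # greedy pick
print(action)
```

Because no head looks at another head's output, the full action is simply the combination of the separate choices.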

Training an AI to play a game as real-time and complex as Dota 2 required a very large amount of processing power: 256 P100 GPUs and 128,000 preemptible CPU cores on Google Cloud Platform (GCP). Five received 7.5 observations per second of gameplay, and each observation was about 36.8 kilobytes in size. Training ran at about 60 batches per minute, with a batch size of 1,048,576 observations.
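To get a feel for these numbers, the short calculation below derives a few quantities from them. It is back-of-the-envelope arithmetic only, and it rests on assumptions the article does not confirm: that 1 kilobyte is 1,000 bytes, and that the observation rate applies to a single stream of gameplay.

```python
# Figures quoted in the paragraph above.
OBS_PER_BATCH = 1_048_576
BATCHES_PER_MINUTE = 60
OBS_SIZE_KB = 36.8
OBS_PER_GAME_SECOND = 7.5

# Observations consumed by the optimiser per second of wall-clock time.
obs_per_second = OBS_PER_BATCH * BATCHES_PER_MINUTE / 60

# Raw size of one batch, assuming decimal units (1 GB = 1e6 kB).
batch_size_gb = OBS_PER_BATCH * OBS_SIZE_KB / 1e6

# Seconds of gameplay that one batch represents for a single observation stream.
gameplay_seconds_per_batch = OBS_PER_BATCH / OBS_PER_GAME_SECOND

print(f"observations per second: {obs_per_second:,.0f}")
print(f"data per batch: {batch_size_gb:.1f} GB")
print(f"gameplay per batch: {gameplay_seconds_per_batch / 3600:.1f} hours")
```

Under these assumptions, a single batch holds tens of gigabytes of observations, which helps explain why so many CPU cores were needed just to generate the data.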

5 Observations Where Five Went Wrong

Unity: Five’s heroes always seemed to stay together, even when it wasn’t required. This helped when it was a good time to attack, but not when the opponent took advantage of it and tried to defeat them all at once. Five probably could not recognise that the opposing heroes were different from its own, and it could not work out which of the opponent’s powers to treat as strengths and which as weaknesses. It acted only on the basis of its own team’s heroes, which is not effective in a game where an opponent could be any of a hundred and fifteen heroes.

Missing couriers: Five did not pay much attention to the courier and kept fighting on the battlefield even when the courier was present. It could not grasp that, for the team’s survival, the courier can matter more than a fight.

Speed: Five, being a machine, naturally had a faster response time than the professional players, so it could make decisions and react more quickly during play. It didn’t have to keep checking the map to see where its team was, or check whether its most powerful spell was ready. Typical human response time is around 150 to 500 milliseconds, whereas Five had a response time of about 80 milliseconds.

Poor decisions: Although Five made decisions very quickly, some of those decisions were extremely poor. It could not make optimal decisions in every situation. Staying in a group all the time is one example.

Fearless: Five repeatedly sacrificed its top or bottom lane with the intention of gaining control over the opposing team’s safe lane. The instant it saw a chance for a kill, it went for it without gauging the consequences, neither considering the enemy’s powers nor weighing the disadvantages of approaching and killing that enemy.


The failure of OpenAI Five was not really a failure of AI. It showed that an AI could play something as complex as Dota 2. Dota 2 reflects many real-world environments, and games like this are a perfect testbed for AI research. OpenAI is one of the biggest organisations focused on solving humanity’s problems with AI.

Humans are in turn learning new techniques from their matches with bots. For example, professional Go player Lee Sedol was defeated by DeepMind’s AlphaGo, but the match taught him a new technique in the game. In Dota 2, an example is when Five showed players that a certain weapon could be recharged quickly by staying out of range of the enemy. This was new, and the human players learnt from it.

AI therefore gives both parties an opportunity to learn: a win-win situation.

Disha Misal
Found her way to data science and AI through her fascination for technology. Likes to read and watch football, and has an enormous amount of affection for astrophysics.
