Following Google’s AlphaGo, deep reinforcement learning is being developed further in an initiative at IIT Madras, where researchers will construct their own algorithms for complex tasks, according to a report published in The Hindu.
“There are two parts to engineering this – one involves incorporating features into the neural network that will get the program to recognize parts of the screen [when playing a game]. The other part involves making associations between utilities and action – for instance deciding whether to move left or right based on a specific pattern on the screen,” explains Prof. B Ravindran who heads the Robert Bosch Centre for Data Science and Artificial Intelligence, at IIT Madras.
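The second part of that description, associating utilities with actions, can be illustrated with a minimal sketch. It assumes a plain tabular Q-learning update, which is a textbook stand-in and not the network the IIT Madras team built; the pattern labels, the two-action set and the parameter values are all hypothetical.

```python
import random

# Hypothetical action set, taken from the "move left or right" example.
ACTIONS = ["left", "right"]

def choose_action(q_values, pattern, epsilon=0.0, rng=random):
    """Pick the action with the highest estimated utility for a screen pattern.

    With probability epsilon a random action is tried instead (exploration).
    Unseen (pattern, action) pairs default to a utility of 0.0.
    """
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: q_values.get((pattern, a), 0.0))

def update(q_values, pattern, action, reward, next_pattern, alpha=0.5, gamma=0.9):
    """One tabular Q-learning step: nudge the utility toward reward plus
    the discounted best utility available from the next pattern."""
    best_next = max(q_values.get((next_pattern, a), 0.0) for a in ACTIONS)
    old = q_values.get((pattern, action), 0.0)
    q_values[(pattern, action)] = old + alpha * (reward + gamma * best_next - old)

# Usage: after one rewarded "right" move, "right" is preferred for that pattern.
q = {}
update(q, "ball_on_right", "right", reward=1.0, next_pattern="ball_centered")
preferred = choose_action(q, "ball_on_right")
```

In a deep reinforcement learning system the lookup table would be replaced by a neural network that maps raw screen features to these utilities, which is the first part the quote describes.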
The team trained the algorithm using “experts,” programs that had already mastered a method of playing the game. The algorithm was also made to learn “from scratch,” that is, without any intervention from experts, the report said.
An advanced version of AlphaGo would not only play Go better; the algorithms also make provisions for learning from mistakes.
“When we came up with algorithms that incorporated this, we observed improvement by several thousand percents in the learning performance,” says Prof. Ravindran.
Another ability built into the program is a tendency to avoid negative transfer. That is, if the “expert” that the program is learning from turns out to be bad at the game, the algorithm stops following that expert and chooses a different option, which may be following another expert or learning from scratch by itself.
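One simple way to picture this behaviour is as a choice among sources of guidance, scored by how well each has worked so far. The sketch below assumes an epsilon-greedy selection over running average returns; it is an illustration only, not the mechanism reported for the IIT Madras algorithm, and the class name, source labels and return values are hypothetical.

```python
import random

class SourceSelector:
    """Choose between experts and learning from scratch by average return."""

    def __init__(self, sources, epsilon=0.1, seed=0):
        self.sources = list(sources)
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.totals = {s: 0.0 for s in self.sources}
        self.counts = {s: 0 for s in self.sources}

    def mean(self, source):
        # Untried sources score +inf so each one is sampled at least once.
        n = self.counts[source]
        return self.totals[source] / n if n else float("inf")

    def pick(self):
        # Occasionally explore; otherwise follow the best-performing source.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.sources)
        return max(self.sources, key=self.mean)

    def record(self, source, episode_return):
        # Store the return obtained while following this source.
        self.totals[source] += episode_return
        self.counts[source] += 1

# Usage: a poorly performing expert is abandoned in favour of a better option.
selector = SourceSelector(["bad_expert", "good_expert", "scratch"], epsilon=0.0)
selector.record("bad_expert", -1.0)
selector.record("good_expert", 5.0)
selector.record("scratch", 1.0)
chosen = selector.pick()
```

Because the scratch learner is scored like any other source, the same rule covers both outcomes the article mentions: switching to another expert, or falling back on learning without one.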
Having worked on relatively simple arcade games, the team now plans to move on to more complex tasks involving higher-level skills. They may also turn to self-driving cars soon: “We are planning to build in concepts of risk-awareness through deep reinforcement learning. To apply these ideas to robotics and, say, self-driving cars, there needs to be safety and risk-awareness built in. So, we are working on this,” he says.