In the AI realm, reinforcement learning (RL) is lauded for good reasons. It is one of the most important advancements towards enabling general AI. But outside of popular interest, some researchers question whether it is the correct way to train machines in order to move forward. The technique has often been described as “the first computational theory of intelligence” by scientists.
One of the players that have made it to the top of the reinforcement learning leaderboard is DeepMind, a London-based research firm. In fact, the first blog post shared by the firm in 2016 was about this technique, and barely any model it has launched since goes without reinforcement learning.
To gain further insight into the current discourse within the company about abandoning reinforcement learning, Analytics India Magazine spoke to Pushmeet Kohli, head of research (AI for science, robustness and reliability) at DeepMind.
Powerful But Not For All Problems
The reigning supremacy of reinforcement learning is due to its ability to develop behaviour by taking actions and getting feedback, much like the way humans and animals learn by interacting with their environments. Furthermore, RL does not require labelled datasets and makes real-life decisions based on a reward system, closely mimicking human behaviour.
“Reinforcement learning is an important toolbox in the things required to create an intelligence system. It plays a crucial role in DeepMind’s work, be it AlphaGo or AlphaTensor. It is important but not the only technique we need to create intelligent systems,” said Kohli.
DeepMind is keenly invested in “deep reinforcement learning”, but it is not the only one. Big names like OpenAI, Google Brain, Facebook and Microsoft have made RL their priority for investment and research. The approach has its limits, though: RL models require extensive training to learn the simplest things and are rigidly constrained to the narrow domain they are trained on. For the time being, developing deep reinforcement learning models also requires exorbitantly expensive computing resources, which limits research in the area to deep-pocketed companies.
No doubt, RL is a powerful approach, but it is not fit for every problem. It is best suited to problems that require sequential decision-making, that is, a series of decisions that all affect one another. For instance, it should be leveraged while developing an AI model to win a game. There, it isn’t enough for the algorithm to make one good decision; it has to make a whole sequence of good decisions. By providing a single reward for a positive outcome, RL weeds out solutions that result in low rewards and elevates those that enable an entire sequence of good decisions.
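To make the idea of a single reward shaping a whole sequence of decisions concrete, here is a minimal sketch in Python. It is not DeepMind's method, only an illustration using tabular Q-learning, a textbook RL algorithm. The five-cell corridor, the function names and the hyperparameters are all assumptions made up for this example. The agent starts in cell 0 and is rewarded only when it reaches cell 4.

```python
import random

N_STATES = 5        # hypothetical corridor: cells 0..4, cell 4 is the goal
ACTIONS = (-1, +1)  # action 0 steps left, action 1 steps right


def step(state, move):
    """Apply a move; the only reward (1.0) is given on reaching the goal."""
    next_state = min(max(state + move, 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Tabular Q-learning with epsilon-greedy exploration."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            if rng.random() < epsilon:
                action = rng.randrange(2)  # explore
            else:
                # exploit, breaking ties at random
                best = max(q[state])
                action = rng.choice(
                    [a for a, v in enumerate(q[state]) if v == best]
                )
            next_state, reward, done = step(state, ACTIONS[action])
            # The goal reward is discounted and propagated back one step
            # at a time, so earlier decisions get credit for the outcome.
            future = 0.0 if done else gamma * max(q[next_state])
            q[state][action] += alpha * (reward + future - q[state][action])
            state = next_state
    return q


def greedy_policy(q):
    """Best-known action index for every non-goal cell."""
    return [max(range(2), key=lambda a: q[s][a]) for s in range(N_STATES - 1)]


if __name__ == "__main__":
    print(greedy_policy(train()))
```

Under these assumptions, the learned policy should pick "right" in every cell, even though no individual step is ever rewarded on its own. The value of each early move comes entirely from the discounted reward at the end of the sequence.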
DeepMind is Open Minded
The debate around reinforcement learning gained momentum recently when Yann LeCun, in jest, suggested abandoning the technique for the betterment of the community. In an exclusive conversation with AIM, LeCun said that though RL is inevitable in machine learning, the purpose behind incorporating it in algorithms should be to eventually minimise its use.
LeCun has been an advocate of self-supervised learning (SSL) and claimed that innovations using SSL had worked better than he anticipated. He further said that even ChatGPT uses SSL more than RL, and that only two obstacles remain: defining explicit objectives and planning abilities.
Commenting on the debate, Kohli said, “Models require other types of learning methodologies as well. So, we see them as part of a broader collection of techniques that DeepMind is developing. We think that they all have importance in different contexts and we should be leveraging them accordingly. Rather than abandoning one way or the other. Our perspective is that we need to be open minded about all the signals and leverage them.”
However, contrary to Kohli’s stance, DeepMind published a paper in 2021 in which the team argued that ‘Reward is Enough’ for all kinds of intelligence. Specifically, they argued that “maximising reward is enough to drive behaviour that exhibits most if not all attributes of intelligence.” The paper was heavily critiqued and hotly debated by the community. As a result, several members concluded that reward is enough but not efficient.