
Imagine a World Without Reinforcement Learning

It is important but not the only technique we need to create intelligent systems, says Pushmeet Kohli, DeepMind's head of research (AI for Science).



In the AI realm, reinforcement learning (RL) is lauded for good reason. Scientists have described it as "the first computational theory of intelligence", and it is considered one of the most important advances towards general AI. But beyond the popular interest, some researchers question whether it is the right way to train machines going forward.

One of the players that has made it to the top of the reinforcement learning leaderboard is DeepMind, a London-based research firm. In fact, the first blog post the firm shared, in 2016, was about the technique, and barely any model it has launched since goes without reinforcement learning.

To gain further insight into the current discourse within the company about abandoning reinforcement learning, Analytics India Magazine spoke to Pushmeet Kohli, head of research (AI for science, robustness and reliability) at DeepMind.

Powerful But Not For All Problems

The reigning supremacy of reinforcement learning stems from its ability to develop behaviour by taking actions and receiving feedback, much like the way humans and animals learn by interacting with their environments. Furthermore, RL does not require labelled datasets; it makes decisions based on a reward system, closely mimicking human behaviour.

“Reinforcement learning is an important toolbox in the things required to create an intelligence system. It plays a crucial role in DeepMind’s work, be it AlphaGo or AlphaTensor. It is important but not the only technique we need to create intelligent systems,” said Kohli. 

Also read: Kohli On Solving Intelligence at DeepMind for Humanity & Science

DeepMind is keenly invested in "deep reinforcement learning", but it is not the only one. Big names like OpenAI, Google Brain, Facebook and Microsoft have made RL a priority for investment and research. However, RL models require extensive training to learn even simple behaviours and remain rigidly constrained to the narrow domains they are trained on. Moreover, developing deep reinforcement learning models, for the time being, requires exorbitantly expensive computing resources, which limits research in the area to deep-pocketed companies.

No doubt, RL is a powerful approach, but it is not fit for every problem. It suits problems that require sequential decision-making: a series of decisions that all affect one another. For instance, it can be leveraged to develop an AI model that wins a game, where it is not enough for the algorithm to make one good decision; it must make a whole sequence of good decisions. By providing a single reward for a positive outcome, RL weeds out action sequences that yield low rewards and reinforces those that lead to the desired result.
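The reward-driven, sequential flavour of RL described above can be illustrated with a minimal tabular Q-learning sketch. This is a toy example, not DeepMind's method: the chain environment, function names and hyperparameters below are all made up for illustration. The agent only receives a reward at the final state, yet learning propagates that signal backwards so the whole sequence of moves improves.

```python
import random

# Toy chain environment: states 0..4, start at 0; action 1 moves right,
# action 0 moves left. Only reaching state 4 yields a reward of 1.0.
# Illustrative only -- not DeepMind's setup.
N_STATES, GOAL = 5, 4

def step(state, action):
    next_state = min(state + 1, GOAL) if action == 1 else max(state - 1, 0)
    reward = 1.0 if next_state == GOAL else 0.0
    return next_state, reward, next_state == GOAL

def train(episodes=500, alpha=0.5, gamma=0.9, eps=0.1, seed=0):
    random.seed(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]  # Q-table: q[state][action]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # epsilon-greedy: mostly exploit the table, sometimes explore
            if random.random() < eps:
                action = random.randint(0, 1)
            else:
                action = 0 if q[state][0] > q[state][1] else 1
            next_state, reward, done = step(state, action)
            # Q-learning update: bootstrap from the best next action,
            # so the single terminal reward flows back through the chain
            q[state][action] += alpha * (
                reward + gamma * max(q[next_state]) - q[state][action]
            )
            state = next_state
    return q

q = train()
# The greedy policy should now move right from every non-goal state,
# even though only the last step ever produced a reward.
policy = [0 if q[s][0] > q[s][1] else 1 for s in range(GOAL)]
print(policy)
```

Even this toy shows the trait the article describes: one sparse reward is enough to shape an entire sequence of decisions, but it takes hundreds of episodes of trial and error on a five-state problem, hinting at why RL becomes expensive at scale.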

DeepMind is Open Minded

The reinforcement learning chatter recently gained traction when Yann LeCun, in jest, suggested abandoning the technique for the betterment of the community. In an exclusive conversation with AIM, LeCun said that though RL is inevitable in machine learning, algorithms should be designed so that its use is eventually minimised.

LeCun has been an advocate of self-supervised learning (SSL) and claims innovations using SSL have worked better than he anticipated. He added that even ChatGPT relies on SSL more than RL, and that only two obstacles remain: defining explicit objectives and planning abilities.

Commenting on the debate, Kohli said, “Models require other types of learning methodologies as well. So, we see them as part of a broader collection of techniques that DeepMind is developing. We think that they all have importance in different contexts and we should be leveraging them accordingly. Rather than abandoning one way or the other. Our perspective is that we need to be open minded about all the signals and leverage them.”

However, in apparent contrast to Kohli's view, DeepMind published a paper in 2021, 'Reward is Enough', where the team argued that reward maximisation suffices for all kinds of intelligence. Specifically, they argued that "maximising reward is enough to drive behaviour that exhibits most if not all attributes of intelligence." The paper was heavily criticised and hotly debated in the community, with several members concluding that reward may be enough, but it is not efficient.


Tasmia Ansari

Tasmia is a tech journalist at AIM, looking to bring a fresh perspective to emerging technologies and trends in data science, analytics, and artificial intelligence.