
How can language be used for exploration tasks in reinforcement learning?

DeepMind researchers have introduced a novel method in which agents are endowed with prior knowledge in the form of abstractions derived from large vision-language models pretrained on image captioning data.


Exploration is an important part of reinforcement learning, in which agents learn to predict and control stochastic, unknown environments. It is essential because it gathers the information an agent needs to learn; without it, effective learning is hindered. It is also one of the field's harder problems. Years of research have shown that an effective way to increase an agent's tendency to explore is to augment trajectories with intrinsic rewards for reaching novel environment states. The challenge then becomes deciding which states count as novel, which in turn depends on how environment states are represented.
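To make the idea concrete, a novelty bonus of this kind can be as simple as counting visits to discretised state embeddings and paying the agent more for rarer ones. The sketch below is a minimal, hypothetical illustration; the `embed` function it assumes is exactly the design choice the rest of this article is about:

```python
import numpy as np

class CountNoveltyBonus:
    """Count visits to discretised state embeddings; rarer states
    earn larger intrinsic rewards."""

    def __init__(self, embed, bonus_scale=1.0, precision=1):
        self.embed = embed              # observation -> np.ndarray
        self.bonus_scale = bonus_scale
        self.precision = precision      # coarseness of the discretisation
        self.counts = {}

    def __call__(self, observation):
        key = tuple(np.round(self.embed(observation), self.precision))
        self.counts[key] = self.counts.get(key, 0) + 1
        # Bonus decays as 1/sqrt(N(s)), so novel states are rewarded more.
        return self.bonus_scale / np.sqrt(self.counts[key])

# The agent then optimises the sum of environment and intrinsic rewards:
# bonus = CountNoveltyBonus(embed=lambda obs: np.asarray(obs).mean(axis=(0, 1)))
# r_total = r_env + bonus(obs)
```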

To deal with this challenge, DeepMind researchers have introduced a novel method in which agents are endowed with prior knowledge in the form of abstractions derived from large vision-language models pretrained on image captioning data.

The research

Earlier research on novelty-driven exploration offered several approaches to deriving state representations for reinforcement learning agents. Among the most popular is the use of random features, where the state is represented by embedding visual observations with a fixed, randomly initialised target network. Another popular approach is to learn visual features with an inverse dynamics model.
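As a rough illustration of the random-features idea, the sketch below embeds flattened observations with a small fixed random network. Nothing here is trained, and the two-layer architecture is an assumption made purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

class RandomFeatureEmbedder:
    """Embed observations with a fixed, randomly initialised network."""

    def __init__(self, obs_dim, hidden_dim=128, feature_dim=64):
        # Weights are sampled once and never trained: a "target network"
        # in the random-features sense.
        self.w1 = rng.normal(0.0, 1.0 / np.sqrt(obs_dim), (obs_dim, hidden_dim))
        self.w2 = rng.normal(0.0, 1.0 / np.sqrt(hidden_dim), (hidden_dim, feature_dim))

    def __call__(self, obs):
        flat = np.asarray(obs, dtype=np.float64).ravel()
        hidden = np.maximum(flat @ self.w1, 0.0)   # ReLU
        return hidden @ self.w2

# Distances between embeddings then serve as a crude novelty signal:
# states whose features lie far from anything seen before count as new.
```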

Both these methods, and others like them, work well in classic 2D environments, but their effectiveness in high-dimensional, partially observable 3D environments has not been established. In 3D environments, different viewpoints of the same scene can map to distinct features, making it difficult to identify a good mapping between visual states and feature space; this is further exacerbated by the fact that useful state abstractions are highly task-dependent. In their paper, the DeepMind researchers call acquiring environment representations that support effective exploration a 'chicken-and-egg' problem: an agent can only judge whether two states should be considered similar or different once it has effectively explored its environment.

Now, what if an agent is given representations in the form of abstractions derived from vision-language models? That is what the DeepMind researchers did. They hypothesised that representations acquired through vision-language pretraining enable effective exploration of 3D environments because they are shaped by the abstract nature of language.
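A hedged sketch of the idea: embed each observation with a frozen, publicly available vision-language encoder and measure novelty in that embedding space. Note that CLIP here is a stand-in for the captioning-pretrained model used in the paper, and the nearest-neighbour bonus is one simple choice among many:

```python
import torch
from transformers import CLIPModel, CLIPProcessor

# CLIP stands in for DeepMind's own captioning-pretrained model, purely
# to illustrate a frozen vision-language encoder as the feature space.
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
model.eval()

@torch.no_grad()
def vl_embed(frame):
    """Embed one RGB observation (PIL image or numpy array)."""
    inputs = processor(images=frame, return_tensors="pt")
    features = model.get_image_features(**inputs)
    return torch.nn.functional.normalize(features, dim=-1).squeeze(0)

def novelty_bonus(frame, memory):
    """Distance to the nearest previously seen embedding."""
    z = vl_embed(frame)
    bonus = 1.0 if not memory else float(
        torch.stack([torch.norm(z - m) for m in memory]).min())
    memory.append(z)
    return bonus
```

Because the encoder was trained to align images with captions, observations that a human would describe the same way land close together, regardless of viewpoint, which is precisely the property that raw visual features lack in 3D scenes.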

The researchers demonstrated that language abstractions and vision-language pretraining improve the sample efficiency of existing exploration methods. The benefit holds across on- and off-policy algorithms, different 3D domains, and different task specifications. The team also designed control experiments to understand how language contributes to better exploration and found results consistent with perspectives from cognitive science on the utility of human language.

Read the full paper here.

Language for exploration

Using language to guide exploration is not new; several researchers have explored the technique before. In 2021, a group of Stanford researchers introduced Exploration through Learned Language Abstraction (ELLA), a reward-shaping approach for boosting sample efficiency in sparse-reward environments by correlating high-level instructions with simpler low-level constituents. The approach has two key components: a termination classifier that identifies when agents complete low-level instructions, and a relevance classifier that correlates low-level instructions with success on the high-level task.
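In rough pseudocode terms, ELLA's shaping step combines those two components as below. The classifier stubs (`termination_clf`, `relevance_clf`) are hypothetical stand-ins; in ELLA both are learned from experience:

```python
def shaped_reward(env_reward, trajectory, low_level_instructions,
                  termination_clf, relevance_clf, bonus=0.1):
    """Add a bonus whenever a completed low-level instruction is judged
    relevant to the current high-level task."""
    shaped = env_reward
    for instruction in low_level_instructions:
        if termination_clf(trajectory, instruction) and relevance_clf(instruction):
            shaped += bonus
    return shaped
```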

Earlier this year, a group of researchers published a paper titled 'Improving Intrinsic Exploration with Language Abstractions', which showed how natural language can be used as a general medium for highlighting relevant abstractions in an environment. This work differed from earlier efforts in that the researchers evaluated whether language could improve on existing exploration methods by directly extending competitive baselines such as AMIGo and NovelD. The language-based variants improved on their non-linguistic counterparts by 45-85 per cent.
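As an illustration of how a language-based variant of such a baseline might look, the sketch below adapts NovelD's bonus to operate on captions rather than pixels. All names are hypothetical, and simple visit counts replace the RND prediction error used by the real method:

```python
from collections import defaultdict

class LanguageNovelD:
    """NovelD-style bonus over language descriptions of states.
    Visit counts stand in for RND prediction error."""

    def __init__(self, describe, alpha=0.5, scale=1.0):
        self.describe = describe            # state -> caption string
        self.alpha = alpha
        self.scale = scale
        self.counts = defaultdict(int)      # lifelong visit counts
        self.episode_seen = set()           # episodic first-visit gate

    def novelty(self, caption):
        # Rarely described states are more novel.
        return self.scale / (1.0 + self.counts[caption]) ** 0.5

    def bonus(self, state, next_state):
        cap, cap_next = self.describe(state), self.describe(next_state)
        value = max(self.novelty(cap_next) - self.alpha * self.novelty(cap), 0.0)
        # NovelD pays out only on the first visit within an episode.
        first_visit = cap_next not in self.episode_seen
        self.episode_seen.add(cap_next)
        self.counts[cap_next] += 1
        return value if first_visit else 0.0

    def reset_episode(self):
        self.episode_seen.clear()
```

Because many visually distinct frames share one caption, the bonus is paid for reaching semantically new situations rather than merely new pixels.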


Shraddha Goled

I am a technology journalist with AIM. I write stories focused on the AI landscape in India and around the world with a special interest in analysing its long term impact on individuals and societies. Reach out to me at shraddha.goled@analyticsindiamag.com.