
Top Exploration Methods In Reinforcement Learning

Researchers from the Mila AI Institute, Quebec, have departed from the traditional categorisation of RL exploration methods in their survey and offered a fresh taxonomy of exploration techniques.


Over the years, many exploration methods have been formulated by incorporating mathematical approaches. Reinforcement Learning (RL) exploration techniques have generally been categorised into undirected and directed methods, based on the information considered by the exploration algorithm.

(Image credits: Survey by Amin et al.)

However, in their survey, researchers from the Mila AI Institute, Quebec, have departed from this traditional categorisation and instead organised RL exploration methods based on rewards, memory, and more. In the following sections, we briefly discuss a few of the most important exploration methods from this taxonomy.

Reward-Free vs Reward-Based Exploration

According to the survey, reward-free techniques either select the agent's actions at random or use some notion of intrinsic information to guide exploration, without taking extrinsic rewards into account. Reward-free exploration comes in handy in environments where the reward signal is not immediately available to the agent. In comparison, reward-based exploration methods leverage information related to the reward signal; these methods are further categorised based on the type of information used and how it is used to select exploratory actions.

Memory-free vs Memory-based Exploration

Memory-free exploration methods take only the current state of the environment into account, whereas memory-based methods consider additional information about the history of the agent's interaction with the environment. For example, DeepMind's Agent57, which set a new benchmark on Atari games last year, employed episodic memory in its RL policy.

Blind Exploration

Blind exploration methods explore environments via random action selection. The agents are not guided through their exploratory path by any form of information and are thus categorised as uninformed or blind.
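
To make this concrete, here is a minimal sketch of blind exploration: the agent ignores all feedback and simply samples actions uniformly at random. The Gymnasium-style `reset`/`step` interface is an assumption for illustration, not something prescribed by the survey.

```python
# Blind (uninformed) exploration: no information guides the action choice.
# Assumes a Gymnasium-style environment API.
def blind_exploration(env, num_steps=1000):
    state, _ = env.reset()
    for _ in range(num_steps):
        action = env.action_space.sample()  # purely random action selection
        state, reward, terminated, truncated, _ = env.step(action)
        if terminated or truncated:
            state, _ = env.reset()
```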

Intrinsically-Motivated Exploration

As part of reward-free exploration, intrinsically motivated exploration methods use some form of intrinsic motivation to explore the unexplored parts of the environment. Unlike blind exploration, these techniques utilise intrinsic information to encourage exploring the state-action space in the absence of external rewards.
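
A minimal sketch of one such intrinsic signal is shown below: a count-based novelty bonus, where the "reward" the agent learns from is purely intrinsic and shrinks as a state is revisited. The tabular Q-learning setup, the 1/sqrt(N) bonus, and the discrete, hashable states are illustrative assumptions rather than the survey's specific method.

```python
from collections import defaultdict
import math

# Intrinsically motivated exploration with a visit-count novelty bonus.
# The extrinsic reward from env.step() is deliberately ignored.
def count_based_exploration(env, num_steps=5000, alpha=0.1, gamma=0.99):
    Q = defaultdict(float)      # Q[(state, action)] value estimates
    counts = defaultdict(int)   # N[state] visit counts
    state, _ = env.reset()
    for _ in range(num_steps):
        action = max(range(env.action_space.n), key=lambda a: Q[(state, a)])
        next_state, _, terminated, truncated, _ = env.step(action)
        counts[next_state] += 1
        intrinsic_reward = 1.0 / math.sqrt(counts[next_state])  # novelty bonus
        best_next = max(Q[(next_state, a)] for a in range(env.action_space.n))
        Q[(state, action)] += alpha * (intrinsic_reward + gamma * best_next - Q[(state, action)])
        state = next_state
        if terminated or truncated:
            state, _ = env.reset()
    return Q
```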

Value-Based Methods

This exploration approach selects stochastic actions based on a value function or the rewards estimated from the environment. These methods use the value function to decide whether the preferred action should favour knowledge acquisition or reward maximisation.
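
The classic epsilon-greedy rule is a simple instance of this trade-off, sketched below: with probability epsilon the agent explores (knowledge acquisition), otherwise it exploits the action with the highest estimated value (reward maximisation). Here `Q` is assumed to be a mapping from (state, action) pairs to value estimates.

```python
import random

# Value-based exploratory action selection (epsilon-greedy).
def epsilon_greedy(Q, state, n_actions, epsilon=0.1):
    if random.random() < epsilon:
        return random.randrange(n_actions)                      # explore
    return max(range(n_actions), key=lambda a: Q[(state, a)])   # exploit
```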

Policy-Search Based Methods

Unlike value-based methods, policy-search based methods, as the name suggests, explicitly represent a policy instead of, or in addition to, a value function. Most policy-search methods learn a stochastic policy. The initialisation of the exploration policy can be freely chosen, and in some policy architectures, the amount of exploration is fixed to some constant or decreased according to a set schedule.
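
As a rough illustration, the sketch below shows a stochastic Gaussian policy whose standard deviation directly controls how much exploration the policy performs, either held constant or decayed on a schedule as described above. The linear parameterisation and the decay factor are assumptions made purely for the example.

```python
import numpy as np

# A stochastic policy as used in policy-search methods: actions are sampled
# around a parameterised mean, and the std controls the amount of exploration.
class GaussianPolicy:
    def __init__(self, state_dim, action_dim, init_std=1.0):
        self.weights = np.zeros((action_dim, state_dim))  # policy parameters
        self.std = init_std                               # exploration level

    def act(self, state):
        mean = self.weights @ state
        return np.random.normal(mean, self.std)           # sample a stochastic action

    def decay_exploration(self, factor=0.999, min_std=0.05):
        self.std = max(self.std * factor, min_std)        # fixed decay schedule
```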

Randomised Action Selection Exploration Methods

Randomised exploration methods assign action selection probabilities to the possible actions based on the estimated value functions/rewards or policies, akin to Value-Based Exploration and Policy-Search Based Exploration. 
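
A common example is Boltzmann (softmax) action selection, sketched below: estimated action values are turned into selection probabilities, so better-looking actions are chosen more often but never exclusively. The temperature parameter and the assumption that `q_values` holds the current state's value estimates are illustrative.

```python
import numpy as np

# Randomised action selection via a Boltzmann (softmax) distribution over values.
def boltzmann_action(q_values, temperature=1.0):
    prefs = np.asarray(q_values, dtype=float) / temperature
    prefs -= prefs.max()                            # numerical stability
    probs = np.exp(prefs) / np.exp(prefs).sum()     # action selection probabilities
    return np.random.choice(len(q_values), p=probs)
```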

Optimism/Bonus-Based Exploration

In this method, actions with uncertain values are preferred over the rest of the possible actions. As the name suggests, these exploration methods usually involve a bonus that is added to the reward; bonus-based techniques thus utilise the extrinsic reward, augmented with an exploration bonus, to motivate exploration of the environment.
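
A minimal sketch of this idea is shown below: a count-based bonus that shrinks with the number of visits is added to the extrinsic reward, so rarely tried state-action pairs look optimistically valuable. The bonus form beta / sqrt(N) and the tabular setting are assumptions for illustration.

```python
from collections import defaultdict
import math

counts = defaultdict(int)   # N[(state, action)] visit counts

# Bonus-based (optimistic) exploration: the agent learns from reward + bonus.
def bonus_augmented_reward(state, action, extrinsic_reward, beta=0.5):
    counts[(state, action)] += 1
    bonus = beta / math.sqrt(counts[(state, action)])   # optimism for rarely tried pairs
    return extrinsic_reward + bonus
```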

Deliberate Exploration 

Deliberate exploration deals with Bayes-Adaptive exploration methods. As per the survey, deliberate exploration requires computing a posterior distribution over models, assuming a prior over the transition dynamics, and updating it as the agent gathers experience. This category also covers Meta-Learning Based Exploration techniques, through which the agent learns to adapt quickly using previously given tasks.
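
The sketch below illustrates only the posterior-maintenance part that Bayes-adaptive exploration relies on: a Dirichlet prior over discrete transition dynamics is updated after every observed transition, and its posterior mean gives the agent's current model. Planning in the resulting belief-MDP is omitted, and the discrete state/action space is an assumption.

```python
import numpy as np

# Posterior over transition dynamics for Bayes-adaptive (deliberate) exploration.
class DirichletTransitionModel:
    def __init__(self, n_states, n_actions, prior=1.0):
        # prior pseudo-counts over next states for every (state, action) pair
        self.alpha = np.full((n_states, n_actions, n_states), prior)

    def update(self, state, action, next_state):
        self.alpha[state, action, next_state] += 1.0   # Bayesian posterior update

    def posterior_mean(self, state, action):
        counts = self.alpha[state, action]
        return counts / counts.sum()                   # expected transition probabilities
```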

Probability Matching 

This exploration method selects an action by sampling a single instance from the posterior belief over environments or value functions and solving for that sampled environment exactly. The agent then acts in accordance with that solution. Each action is thus taken with the probability that the agent considers it to be the optimal action.
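
Thompson sampling in a Bernoulli bandit is the textbook instance of probability matching, sketched below: one model is sampled from each arm's Beta posterior, the sampled model is "solved" by taking its argmax, and each arm is therefore pulled with the probability that it is optimal under the current belief. The bandit setting, the Beta-Bernoulli priors, and the `pull_arm` callback are illustrative assumptions.

```python
import numpy as np

# Probability matching via Thompson sampling on a Bernoulli bandit.
# pull_arm(arm) is a hypothetical callback returning a 0/1 reward.
def thompson_sampling(pull_arm, n_arms, n_rounds=1000):
    successes = np.ones(n_arms)   # Beta(1, 1) priors
    failures = np.ones(n_arms)
    for _ in range(n_rounds):
        samples = np.random.beta(successes, failures)  # sample one belief instance
        arm = int(np.argmax(samples))                  # act optimally for that sample
        reward = pull_arm(arm)
        successes[arm] += reward
        failures[arm] += 1 - reward
    return successes, failures
```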

Meta-Learning Based Methods

In meta-learning based RL, agents interact with multiple training Markov Decision Processes (MDPs), allowing them to learn an exploration strategy. According to the survey, meta-reinforcement learning strategies have the potential to learn an approximately optimal exploration-exploitation trade-off with respect to the distribution of MDPs.
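
As a toy illustration of learning an exploration strategy from a distribution of tasks, the sketch below evaluates candidate exploration rates across many sampled training bandit tasks and keeps the one with the best average return. Real meta-RL methods learn far richer adaptive strategies; the bandit tasks, the epsilon-greedy agent, and the grid of candidates are all assumptions for the example, not the survey's algorithm.

```python
import numpy as np

# Evaluate one epsilon-greedy run on a Bernoulli bandit task.
def run_epsilon_greedy(arm_means, epsilon, horizon=200):
    n = len(arm_means)
    estimates, counts, total = np.zeros(n), np.zeros(n), 0.0
    for _ in range(horizon):
        if np.random.rand() < epsilon:
            arm = np.random.randint(n)
        else:
            arm = int(np.argmax(estimates))
        reward = np.random.binomial(1, arm_means[arm])
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        total += reward
    return total

# "Meta-train" across sampled tasks: pick the exploration rate that trades off
# exploration and exploitation best on average over the training distribution.
def meta_select_epsilon(n_train_tasks=50, candidates=(0.01, 0.05, 0.1, 0.3)):
    tasks = [np.random.rand(5) for _ in range(n_train_tasks)]   # sampled training MDPs
    avg_return = {eps: np.mean([run_epsilon_greedy(t, eps) for t in tasks])
                  for eps in candidates}
    return max(avg_return, key=avg_return.get)
```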

Learn more about the exploration methods in this survey.
