
What Are Evolving Reinforcement Learning Algorithms


Reinforcement learning (RL) algorithms power the brains of walking robots and AI chess grandmasters. An RL agent learns a policy, a mapping from states to actions, by rewarding itself for actions that nudge it toward its goal.

Reinforcement learning systems rely on the framework of Markov decision processes (MDPs). MDPs in their ideal form are rarely available to a learning algorithm in real-world environments. In practical, scalable real-world scenarios, RL systems usually run into the following challenges:

  • the absence of reset mechanisms,
  • state estimation, and
  • reward specification.
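For context, the MDP framing that these challenges break can be illustrated with a minimal tabular Q-learning loop. This is a generic sketch, not code from the paper; the environment, hyperparameters, and the classic Gym API used here are illustrative assumptions:

```python
import numpy as np
import gym  # classic OpenAI Gym API; newer Gymnasium returns (obs, info) from reset()

# Illustrative hyperparameters, not tuned values
ALPHA, GAMMA, EPS = 0.1, 0.99, 0.1

env = gym.make("FrozenLake-v1")  # a small, fully specified MDP
Q = np.zeros((env.observation_space.n, env.action_space.n))

for episode in range(1000):
    s, done = env.reset(), False
    while not done:
        # Epsilon-greedy: mostly exploit the current value estimates
        a = env.action_space.sample() if np.random.rand() < EPS else int(np.argmax(Q[s]))
        s_next, r, done, _ = env.step(a)
        # Bellman update: reward plus discounted value of the next state
        Q[s, a] += ALPHA * (r + GAMMA * np.max(Q[s_next]) - Q[s, a])
        s = s_next
```

Real-world settings break exactly the assumptions this loop leans on: a reset() that restores a known start state, a fully observed state s, and a reward r handed over by the environment.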

For example, in robotics, collecting high-quality data for a task is very challenging. Unlike computer vision, where humans can label the data, achieving generalisation (what ML is all about) in robotics may require smarter reinforcement learning algorithms that can take advantage of vast amounts of prior data.

Learning to learn was first popularised by Juergen Schmidhuber in his 1987 thesis on meta-learning with genetic programming. As Prof. Schmidhuber defined it, “metalearning means learning the credit assignment method itself through self-modifying code. Meta Learning may be the most ambitious but also the most rewarding goal of machine learning. There are few limits to what a good meta learner will learn. Where appropriate it will learn to learn by analogy, by chunking, by planning, by subgoal generation, by combinations thereof – you name it.”

While RL is used for AutoML, automating RL itself hasn’t had much success. Unlike in supervised learning, the authors explain, the RL design decisions that affect learning and performance are usually chosen through trial and error. AutoRL bridges this gap by applying the AutoML framework from supervised learning to the MDP setting in RL.

Now, to make reinforcement learning agents smarter, researchers at Google have proposed a new method. In a paper titled “Evolving Reinforcement Learning Algorithms”, they introduce a method for meta-learning reinforcement learning algorithms by searching over the space of computational graphs that compute the loss function for a value-based, model-free RL agent to optimise. The learned algorithms are domain-agnostic and can generalise to new environments not seen during training.

Algorithms That Evolve

(Source: Paper by Co-Reyes et al.)

Previous work on learning RL algorithms has applied meta-gradients, evolutionary strategies, and RNNs. The Google researchers instead represent the update rule as a computation graph that includes both neural network modules and symbolic operators. The resulting graph can be interpreted analytically and can optionally be initialised from known existing algorithms.

The researchers describe RL algorithms as general programs written in a domain-specific language. “We target updates to the policy rather than reward bonuses for exploration,” they explained. The agent’s state, policy parameters, and other inputs are mapped to a scalar loss, which is then optimised with gradient descent. The computational graph here is a directed acyclic graph (DAG) of nodes with typed inputs and outputs.
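To make the DAG idea concrete, here is a minimal sketch of a loss function expressed as a graph of typed nodes. The node set and the DQN-style squared-TD-error example are illustrative assumptions, not the paper’s exact DSL:

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional

@dataclass
class Node:
    """One typed operation in a loss-computation DAG."""
    name: str
    op: Optional[Callable] = None        # symbolic operator or neural-network module
    inputs: List["Node"] = field(default_factory=list)

    def evaluate(self, env_inputs: dict):
        if self.name in env_inputs:      # leaf node fed by the environment/agent
            return env_inputs[self.name]
        return self.op(*(n.evaluate(env_inputs) for n in self.inputs))

# Input nodes: agent state, action values, reward, discount
q_sa   = Node("q_sa")         # Q(s_t, a_t)
q_next = Node("q_next_max")   # max_a Q(s_{t+1}, a)
reward = Node("r_t")
gamma  = Node("gamma")

# Interior nodes compose symbolic operators into a scalar loss.
target = Node("target", lambda r, g, q: r + g * q, [reward, gamma, q_next])
delta  = Node("delta",  lambda q, t: q - t,        [q_sa, target])
loss   = Node("loss",   lambda d: d ** 2,          [delta])  # DQN-style squared TD error

print(loss.evaluate({"q_sa": 1.2, "q_next_max": 0.8, "r_t": 1.0, "gamma": 0.99}))
```

Because the graph is built from named, typed nodes rather than opaque network weights, a discovered loss can be read off analytically, which is where the interpretability claim comes from.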

  • The search covers programs with a maximum of 20 nodes, not including input or parameter nodes.
  • Mutations occur with probability 0.95; otherwise, a new random program is sampled.
  • The search runs on 300 CPUs for roughly 72 hours, by which point around 20,000 programs have been evaluated.

As shown in the illustration above, the mutator component produces a new algorithm by mutating one of the top-performing algorithms. The new algorithm’s performance is then evaluated over a set of training environments, and the population is updated. This setup also allows existing knowledge to be incorporated by seeding the population with known RL algorithms instead of starting purely from scratch.
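A stripped-down sketch of that outer loop, assuming simplified tournament selection and caller-supplied mutate/evaluate functions (the paper’s regularised evolution over the full DSL is more involved):

```python
import random

MUTATE_PROB = 0.95  # from the paper's search settings

def evolve(population, train_envs, num_iterations,
           mutate, random_program, evaluate, tournament_size=10):
    """Evolutionary search over loss-graph programs (simplified sketch).

    population: list of (program, score) pairs; it may be seeded with
    known algorithms such as DQN instead of starting purely from scratch,
    and must hold at least tournament_size members.
    """
    for _ in range(num_iterations):
        # Pick a parent among the top performers via tournament selection.
        candidates = random.sample(population, tournament_size)
        parent = max(candidates, key=lambda p: p[1])[0]

        # Usually mutate the parent; otherwise sample a fresh random program.
        child = mutate(parent) if random.random() < MUTATE_PROB else random_program()

        # Score the child by training agents with its loss on the environments.
        score = sum(evaluate(child, env) for env in train_envs) / len(train_envs)

        population.append((child, score))
        population.pop(0)  # regularised evolution: retire the oldest member
    return max(population, key=lambda p: p[1])
```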

To evaluate the learnability of the RL algorithms, the researchers used the popular CartPole and LunarLander challenges. If an algorithm succeeds on CartPole, it proceeds to more challenging training environments. “For learning from scratch we also compare the effect of the number of training environments on the learned algorithm by comparing training on just CartPole versus training on CartPole and LunarLander,” they added.
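The CartPole-first hurdle can be sketched as follows; the threshold value, score weighting, and function names here are hypothetical, not taken from the paper:

```python
def hurdle_evaluate(program, train_and_score, cartpole_threshold=0.6):
    """Evaluate cheaply on CartPole first; only programs that clear the
    hurdle proceed to the more expensive environment (illustrative sketch)."""
    cartpole_score = train_and_score(program, "CartPole-v1")
    if cartpole_score < cartpole_threshold:
        return cartpole_score          # early exit: failed the hurdle
    lunar_score = train_and_score(program, "LunarLander-v2")
    return 0.5 * (cartpole_score + lunar_score)
```

Cheap early rejection like this is what keeps a 20,000-program search tractable on a 300-CPU budget.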

The results show that this method can automatically discover algorithms on par with recently proposed RL research, and that the discovered algorithms empirically attain better performance than deep Q-learning methods.

Key Takeaways

The paper focuses on task-agnostic RL update rules in the value-based RL setting that are both interpretable and generalisable. The work combines the best of reinforcement learning and AutoML techniques to bolster the domain of AutoRL. The contributions can be summarised as follows:

  • Introduction of a new method that improves algorithms’ “learning to learn” ability.
  • Introduction of a general language for representing algorithms that compute the loss function for value-based, model-free RL agents to optimise.
  • Two newly learned RL algorithms that achieve good generalisation performance over a wide range of environments.

Find the original paper here.


Ram Sagar

I have a master's degree in Robotics and I write about machine learning advancements.