Google’s New Algorithm Improves Deployment Efficiency in RL at Low Cost

Researchers from Google Research and the University of Tokyo have introduced deployment efficiency, a new measure for reinforcement learning algorithms, along with a model-based algorithm called Behavior-Regularized Model-ENsemble (BREMEN). The algorithm is reported to optimise an effective policy offline using far less data.

Reinforcement learning is one of the most popular machine learning techniques and has been applied in many domains, including robotics, operations research, medicine and autonomous driving. It has recently achieved impressive success in learning behaviours for a range of sequential decision-making tasks.



Behind the Model

According to the researchers, most RL algorithms assume online access to the environment, which lets them interleave policy updates with experience collected by that same policy. In many real-world settings, however, deploying a new data-collection policy is costly or risky, so it can become prohibitive to update the data-collection policy more than a few times during learning.

The researchers stated that if a task can be learned with only a small number of data-collection policies, the associated costs and risks can be reduced substantially. This motivated a new measure of RL algorithm performance called deployment efficiency, which counts how many times the data-collection policy is changed while the agent improves from a random policy to one that solves the task.
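To make the metric concrete, here is a minimal Python sketch of a deployment-constrained training loop. The names `deployment_constrained_training`, `collect`, `train_offline` and `is_solved` are hypothetical placeholders, not code from the paper; the returned count is the number of times a new data-collection policy was deployed, which is what deployment efficiency measures (fewer is better).

```python
from typing import Any, Callable, List, Tuple

Transition = Tuple[Any, Any, float, Any]  # (state, action, reward, next_state)


def deployment_constrained_training(
    initial_policy: Any,
    collect: Callable[[Any, int], List[Transition]],
    train_offline: Callable[[Any, List[Transition]], Any],
    is_solved: Callable[[Any], bool],
    max_deployments: int,
    batch_size: int,
) -> Tuple[Any, int]:
    """Hypothetical outer loop: deploy a frozen policy, collect data, then learn offline.

    Returns the final policy and the number of deployments used.
    """
    policy = initial_policy
    dataset: List[Transition] = []
    deployments = 0
    while deployments < max_deployments and not is_solved(policy):
        # One deployment: the current policy stays fixed while it gathers batch_size transitions.
        dataset.extend(collect(policy, batch_size))
        deployments += 1
        # Between deployments, all policy improvement happens offline on the accumulated data.
        policy = train_offline(policy, dataset)
    return policy, deployments


# Toy usage: the "policy" is an integer skill level that improves by one per offline phase.
final_policy, used = deployment_constrained_training(
    initial_policy=0,
    collect=lambda policy, n: [(0, policy, 0.0, 0)] * n,
    train_offline=lambda policy, data: policy + 1,
    is_solved=lambda policy: policy >= 3,
    max_deployments=10,
    batch_size=5,
)
print(final_policy, used)  # 3 3
```

An online algorithm corresponds to redeploying after every small batch, so its count grows with the number of updates; a deployment-efficient method keeps `used` small by doing most of its learning inside `train_offline`.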

To build an algorithm that is both sample-efficient and deployment-efficient, each iteration between successive deployments has to work effectively with much smaller datasets. With so little data, however, learned models make poor predictions outside the region covered by the data, and these extrapolation errors degrade performance.

To better handle the problems that arise in limited-deployment settings, the researchers proposed Behavior-Regularized Model-ENsemble (BREMEN). BREMEN learns an ensemble of dynamics models together with a policy trained on imaginary rollouts, while implicitly regularising the learned policy through appropriate parameter initialisation and conservative trust-region updates.
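The ensemble part can be illustrated with a small Python sketch of imaginary rollouts. It assumes, as in ME-TRPO-style ensemble methods, that one ensemble member is sampled at each step; the 1-D state, the callable "models" and the function name `imaginary_rollout` are hypothetical simplifications, not the paper's code.

```python
import random
from typing import Callable, List, Sequence, Tuple

State = float
Action = float
DynamicsModel = Callable[[State, Action], State]  # hypothetical learned model: (s, a) -> predicted s'
Policy = Callable[[State], Action]


def imaginary_rollout(
    models: Sequence[DynamicsModel],
    policy: Policy,
    start_state: State,
    horizon: int,
    rng: random.Random,
) -> List[Tuple[State, Action, State]]:
    """Simulate a trajectory inside the learned ensemble instead of the real environment.

    At every step one ensemble member is drawn uniformly at random, so the policy
    is trained against all models rather than exploiting the errors of a single one.
    """
    trajectory = []
    state = start_state
    for _ in range(horizon):
        action = policy(state)
        model = rng.choice(models)  # sample an ensemble member for this step
        next_state = model(state, action)
        trajectory.append((state, action, next_state))
        state = next_state
    return trajectory


# Toy ensemble of two slightly different "learned" models of s' = s + a.
ensemble = [lambda s, a: s + a, lambda s, a: s + 1.1 * a]
rollout = imaginary_rollout(ensemble, policy=lambda s: 1.0, start_state=0.0, horizon=5, rng=random.Random(0))
print(len(rollout))  # 5
```

Because no real environment is queried, such rollouts can be generated in any quantity between deployments without changing the data-collection policy.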

In other words, BREMEN is a Dyna-style model-based RL method: the policy is improved using imaginary rollouts generated by the learned ensemble, and behaviour regularisation comes from conservative trust-region updates.
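A conservative trust-region update keeps each new policy close, in KL divergence, to the previous one. The sketch below is a simplified stand-in, not the paper's TRPO implementation: it assumes a hypothetical 1-D linear-Gaussian policy with parameters `(weight, log_std)` and shrinks a proposed update by backtracking until the average KL over a batch of states is at most `max_kl`.

```python
import math
from typing import Sequence, Tuple

Params = Tuple[float, float]  # hypothetical policy parameters: (weight, log_std); mean action = weight * s


def gaussian_kl(mu_old: float, std_old: float, mu_new: float, std_new: float) -> float:
    """KL(old || new) between two univariate Gaussian distributions."""
    return (math.log(std_new / std_old)
            + (std_old ** 2 + (mu_old - mu_new) ** 2) / (2.0 * std_new ** 2)
            - 0.5)


def mean_kl(old: Params, new: Params, states: Sequence[float]) -> float:
    """Average KL between the old and new policies' action distributions over the given states."""
    std_old, std_new = math.exp(old[1]), math.exp(new[1])
    return sum(gaussian_kl(old[0] * s, std_old, new[0] * s, std_new) for s in states) / len(states)


def trust_region_step(old: Params, direction: Params, states: Sequence[float],
                      max_kl: float = 0.01, backtrack: float = 0.5, max_tries: int = 10) -> Params:
    """Shrink the proposed update until the average KL to the old policy is <= max_kl.

    Returns the old parameters unchanged if no acceptable step size is found.
    """
    step = 1.0
    for _ in range(max_tries):
        candidate = (old[0] + step * direction[0], old[1] + step * direction[1])
        if mean_kl(old, candidate, states) <= max_kl:
            return candidate
        step *= backtrack
    return old


print(trust_region_step((0.0, 0.0), (1.0, 0.0), states=[1.0]))  # (0.125, 0.0)
```

In the usage line, the full step gives a KL of 0.5, and halving it three times brings the KL down to about 0.0078, the first value under the 0.01 limit. The constraint caps how far each update can move the policy away from the one it started from.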

For the baselines, the researchers used open-source implementations of Soft Actor-Critic (SAC), behaviour cloning (BC), Batch-Constrained deep Q-learning (BCQ) and Behaviour Regularised Actor-Critic (BRAC). They used Adam as the optimiser, an algorithm for first-order gradient-based optimisation of stochastic objective functions.
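For reference, the standard Adam update keeps exponential moving averages of the gradient and of its square, corrects their bias towards zero at initialisation, and scales each parameter's step accordingly. Here is a minimal pure-Python version; the default hyperparameters are the commonly used ones, not values reported for BREMEN, and the function name `adam_step` is our own.

```python
import math
from typing import List, Tuple


def adam_step(params: List[float], grads: List[float], m: List[float], v: List[float], t: int,
              lr: float = 1e-3, beta1: float = 0.9, beta2: float = 0.999,
              eps: float = 1e-8) -> Tuple[List[float], List[float], List[float]]:
    """One Adam update over a list of scalar parameters; t is the 1-based step count."""
    new_params, new_m, new_v = [], [], []
    for p, g, m_i, v_i in zip(params, grads, m, v):
        m_i = beta1 * m_i + (1 - beta1) * g        # first-moment (mean) estimate
        v_i = beta2 * v_i + (1 - beta2) * g * g    # second-moment (uncentred variance) estimate
        m_hat = m_i / (1 - beta1 ** t)             # bias corrections for zero initialisation
        v_hat = v_i / (1 - beta2 ** t)
        new_params.append(p - lr * m_hat / (math.sqrt(v_hat) + eps))
        new_m.append(m_i)
        new_v.append(v_i)
    return new_params, new_m, new_v


# Usage: minimise f(x) = (x - 3)^2 starting from x = 0.
x, m, v = [0.0], [0.0], [0.0]
for t in range(1, 2001):
    grad = [2.0 * (x[0] - 3.0)]
    x, m, v = adam_step(x, grad, m, v, t, lr=0.1)
print(x[0])
```

On the first step the bias-corrected ratio `m_hat / sqrt(v_hat)` is roughly ±1, so the first update moves each parameter by about `lr` regardless of the gradient's scale.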

Evaluating BREMEN

The researchers evaluated BREMEN on high-dimensional continuous control benchmarks and found that it achieves strong deployment efficiency. The algorithm learns successful policies with only 5-10 deployments and significantly outperforms existing off-policy and offline reinforcement learning algorithms in the deployment-constrained setting.

The researchers also evaluated BREMEN on standard offline reinforcement learning benchmarks, where only a single static dataset is used. They found that BREMEN not only achieves performance competitive with the state of the art at standard dataset sizes but can also learn from datasets 10-20 times smaller, which previous methods could not do.

Wrapping Up

In this work, the researchers introduced deployment efficiency, a new measure of reinforcement learning performance that counts the number of changes to the data-collection policy during learning. To improve deployment efficiency, they proposed Behavior-Regularized Model-ENsemble (BREMEN), a model-based offline algorithm with implicit KL regularisation via appropriate policy initialisation and trust-region updates.
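One common way to obtain such an initialisation is behaviour cloning: fitting the policy to the actions recorded in the current dataset before any model-based updates, so that learning starts from the behaviour that generated the data. The sketch below assumes a hypothetical 1-D deterministic linear policy `a = w * s + b`, for which behaviour cloning reduces to ordinary least squares; the function name is our own and the paper's exact initialisation procedure may differ.

```python
from typing import Sequence, Tuple


def behavior_cloning_linear(states: Sequence[float], actions: Sequence[float]) -> Tuple[float, float]:
    """Fit a hypothetical linear policy a = w * s + b to logged (state, action) pairs by least squares."""
    n = len(states)
    mean_s = sum(states) / n
    mean_a = sum(actions) / n
    cov = sum((s - mean_s) * (a - mean_a) for s, a in zip(states, actions))
    var = sum((s - mean_s) ** 2 for s in states)
    w = cov / var if var > 0 else 0.0  # fall back to a constant policy if all states are identical
    b = mean_a - w * mean_s
    return w, b


# Usage: logged actions follow a = 2s + 1 exactly, so cloning recovers that rule.
w, b = behavior_cloning_linear([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
print(w, b)  # 2.0 1.0
```

Starting from the cloned policy and then taking small trust-region steps keeps the learned policy near the data-collection behaviour, which is how the regularisation stays implicit: no explicit KL penalty is added to the objective.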



Ambika Choudhury
A Technical Journalist who loves writing about Machine Learning and Artificial Intelligence. A lover of music, writing and learning something out of the box.
