What Is Constrained Reinforcement Learning And How Can One Build Systems Around It

Published on November 25, 2019

by Ambika Choudhury

Reinforcement Learning (RL) has been one of the most important innovations of recent years for developing advanced AI systems. It has the potential to solve complex decision-making problems.

RL generally follows a “trial and error” method to learn an optimal policy for a given problem. It has been used to achieve superhuman performance in competitive strategy games such as Go, StarCraft and Dota.

Despite the promise shown by reinforcement learning algorithms in many decision-making problems, a few challenges still need to be addressed.

This is where constrained reinforcement learning comes into play.

Constrained Reinforcement Learning

Constrained Reinforcement Learning helps a model learn about costly mistakes without actually having to experience them. Constrained RL is, in a way, similar to standard RL. The difference is that, alongside the reward function, the environment also has cost functions that penalize unsafe behavior, and the agent has to keep those costs within set limits.
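The idea of maximizing reward while keeping costs in check is often written as a constrained optimization problem. The block below is a sketch in commonly used notation, not a formula from the article: the policy π, discount factor γ, reward r, cost functions c_i and cost limits d_i are all symbols assumed here for illustration.

```latex
\max_{\pi} \; J_r(\pi) = \mathbb{E}_{\tau \sim \pi}\left[ \sum_{t=0}^{T} \gamma^{t}\, r(s_t, a_t) \right]
\quad \text{subject to} \quad
J_{c_i}(\pi) = \mathbb{E}_{\tau \sim \pi}\left[ \sum_{t=0}^{T} \gamma^{t}\, c_i(s_t, a_t) \right] \le d_i,
\qquad i = 1, \dots, k
```

Standard RL corresponds to the case with no constraints, where only the reward term remains.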

The fundamental principle of standard RL is that an agent, the AI system, tries to maximize a reward signal by trial and error. In the course of learning, the agent can sometimes try dangerous or harmful behaviors. Learning without causing such harm is known as the safe exploration problem.

Designing a reward function is fundamentally hard, all the more so when a single reward has to balance task performance against safety requirements. In constrained RL, the designer does not have to fix this trade-off by hand: the safety requirements are stated as constraints, and the system works out the trade-off that leads to a suitable and safe outcome.
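One common way to let an algorithm work out this trade-off is a Lagrangian method, where a penalty weight on the cost is adjusted automatically until the constraint holds. The following is a minimal sketch on a toy problem, not the method of any particular library: the `reward` and `cost` functions, the single "speed" parameter `v` and the cost limit of 0.25 are all hypothetical choices made for illustration.

```python
def reward(v):
    """Hypothetical task reward: concave in v and largest at v = 1."""
    return 2.0 * v - v ** 2


def cost(v):
    """Hypothetical safety cost: grows as the agent goes faster."""
    return v ** 2


def solve_constrained(cost_limit=0.25, lr_policy=0.05, lr_lambda=0.05, steps=2000):
    """Gradient ascent on v and dual ascent on the multiplier lam."""
    v, lam = 0.0, 0.0
    for _ in range(steps):
        # Gradient w.r.t. v of the Lagrangian: reward(v) - lam * (cost(v) - cost_limit)
        grad_v = (2.0 - 2.0 * v) - lam * (2.0 * v)
        v = min(1.0, max(0.0, v + lr_policy * grad_v))
        # Raise the penalty while the constraint is violated, lower it otherwise.
        lam = max(0.0, lam + lr_lambda * (cost(v) - cost_limit))
    return v, lam


v, lam = solve_constrained()
print(f"v = {v:.3f}, cost = {cost(v):.3f}, lambda = {lam:.3f}")
```

Without the constraint, this toy agent would pick v = 1. With it, the penalty weight grows until the cost sits at the limit, so the designer only has to choose the acceptable cost level rather than the size of the penalty.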

To establish a more reliable platform for building reinforcement learning models, OpenAI announced Safety Gym, where developers can experiment with cost functions and design safer systems.

Safety Gym is a set of environments and tools for measuring progress towards reinforcement learning agents that respect safety constraints while training, and for accelerating safe exploration research. Researchers at OpenAI developed the platform in order to study constrained RL.

It consists of two main components:

An environment-builder which allows a user to create a new environment by mixing and matching from a wide range of physics elements, goals, and safety requirements.

A suite of pre-configured benchmark environments to help standardize the measurement of progress on the safe exploration problem.

Overview Of OpenAI Safety Gym

In all Safety Gym environments, the agent perceives the environment through a robot’s sensors and interacts with it through the robot’s actuators. The robot has to navigate through a cluttered environment to achieve a task. There are three pre-made robots: Point, Car, and Doggo.

Point: It is a simple robot constrained to the 2D-plane, with one actuator for turning and another for moving forward/backward.
Car: Car is a slightly more complex robot that has two independently-driven parallel wheels and a free-rolling rear wheel.
Doggo: Doggo is a quadrupedal robot with bilateral symmetry. It is designed in such a manner that a uniform random policy should keep the robot from falling over and generate some travel.
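The sense-act loop described above, with safety tracked as a cost that is kept separate from the reward, can be sketched as follows. `ToyPointEnv` is a hypothetical stand-in written for this example and is not the Safety Gym API; the goal position, hazard position and the `"cost"` key in the info dictionary are assumptions made for illustration.

```python
import math
import random


class ToyPointEnv:
    """Hypothetical stand-in for a Safety Gym style task; NOT the real API."""

    GOAL = (1.0, 1.0)
    HAZARD = (0.5, 0.5)
    HAZARD_RADIUS = 0.2
    MAX_STEPS = 200

    def reset(self):
        self.pos = (0.0, 0.0)
        self.steps = 0
        return self._observe()

    def _observe(self):
        # "Sensor" reading: offsets from the robot to the goal and to the hazard.
        x, y = self.pos
        return (self.GOAL[0] - x, self.GOAL[1] - y,
                self.HAZARD[0] - x, self.HAZARD[1] - y)

    def step(self, action):
        # "Actuators": a small displacement in x and y, clipped to +/- 0.1.
        dx, dy = (max(-0.1, min(0.1, a)) for a in action)
        old_dist = math.dist(self.pos, self.GOAL)
        self.pos = (self.pos[0] + dx, self.pos[1] + dy)
        self.steps += 1
        new_dist = math.dist(self.pos, self.GOAL)
        reward = old_dist - new_dist  # progress made towards the goal
        in_hazard = math.dist(self.pos, self.HAZARD) < self.HAZARD_RADIUS
        cost = 1.0 if in_hazard else 0.0
        done = new_dist < 0.1 or self.steps >= self.MAX_STEPS
        return self._observe(), reward, done, {"cost": cost}


def run_episode(env, policy):
    """Roll out one episode, accumulating reward and cost separately."""
    obs = env.reset()
    ep_return, ep_cost, done = 0.0, 0.0, False
    while not done:
        obs, reward, done, info = env.step(policy(obs))
        ep_return += reward
        ep_cost += info["cost"]
    return ep_return, ep_cost


rng = random.Random(0)
random_policy = lambda obs: (rng.uniform(-0.1, 0.1), rng.uniform(-0.1, 0.1))
# Heads straight for the goal and ignores the hazard lying on its path.
greedy_policy = lambda obs: (obs[0], obs[1])

for name, policy in [("random", random_policy), ("greedy", greedy_policy)]:
    ep_return, ep_cost = run_episode(ToyPointEnv(), policy)
    print(f"{name}: return={ep_return:.2f}, cost={ep_cost:.0f}")
```

The greedy policy reaches the goal quickly but drives through the hazard on the way, which is exactly the kind of behavior that a separate cost signal makes visible.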

The Safety Gym environment-builder currently supports three main tasks, Goal, Button, and Push, with two levels of difficulty for each task.

Goal: This task is accomplished by moving the robot to a series of goal positions. When a goal is achieved, the goal location is randomly reset to someplace new, while keeping the rest of the layout the same.
Button: This task is done by pressing a series of goal buttons.
Push: This task involves moving a box to a series of goal positions.
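The robots, tasks and difficulty levels listed above combine into the set of benchmark environments. The sketch below enumerates them, assuming the `Safexp-{Robot}{Task}{Level}-v0` naming pattern recalled from the Safety Gym repository; treat that pattern, and the choice of levels 1 and 2, as assumptions to check against the installed version.

```python
from itertools import product

ROBOTS = ["Point", "Car", "Doggo"]
TASKS = ["Goal", "Button", "Push"]
LEVELS = [1, 2]  # assumed to correspond to the two difficulty levels


def benchmark_env_names():
    """Build environment ids using the assumed Safexp naming pattern."""
    return [
        f"Safexp-{robot}{task}{level}-v0"
        for robot, task, level in product(ROBOTS, TASKS, LEVELS)
    ]


names = benchmark_env_names()
print(len(names))
for name in names[:3]:
    print(name)

# With safety_gym and gym installed, an environment would typically be
# created from such an id (not run here, since it needs MuJoCo):
#   import gym, safety_gym
#   env = gym.make(names[0])
```

Looping over such a list is a simple way to run the same algorithm across every robot and task combination when comparing results.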

Currently, the Safety Gym environment-builder supports five main kinds of elements relevant to safety requirements: hazards, vases, pillars, buttons, and gremlins.

Going Forward

In one of our earlier articles, we discussed why one should consider reinforcement learning while solving a problem and when it is the right approach for a specific problem. In this article, we discussed the importance of constrained reinforcement learning and how OpenAI’s Safety Gym can help researchers build safer and more advanced RL systems.

In certain applications, safety is one of the most critical concerns. Take, for instance, the terrible accident involving an Uber self-driving car in Tempe, Arizona last year. It reportedly happened because the system classified the victim, in turn, as an unknown object, a vehicle, and a bicycle.

There is no doubt that reinforcement learning systems still need a lot of improvement before any large-scale deployment, and innovations like the ones discussed above take us closer to establishing such safer systems.

According to the researchers at OpenAI, the Safety Gym is the first benchmark of high-dimensional continuous control environments for evaluating the performance of constrained RL algorithms.

To demonstrate that Safety Gym is useful for studying safe exploration, the researchers also benchmarked several popular constrained and unconstrained RL algorithms on the Safety Gym environments, which is expected to ease the process of designing safer systems.
