One of the most important recent innovations in the development of advanced AI systems has been reinforcement learning (RL), which has the potential to solve complex decision-making problems.
It generally learns an optimal policy for a given problem through trial and error, and it has been used to achieve superhuman performance in competitive strategy games such as Go, StarCraft, and Dota.
Despite the promise reinforcement learning algorithms have shown on many decision-making problems, a few challenges still need to be addressed.
This is where constrained reinforcement learning comes into play.
Constrained Reinforcement Learning
Constrained reinforcement learning aims to help a model learn about costly mistakes without having to keep making them. It works much like standard RL. The difference is that the environment also includes cost functions, which discourage the agent from taking certain paths.
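To make the idea of a cost function concrete, here is a minimal sketch of an environment that reports a cost alongside the usual reward. Everything in it is hypothetical and chosen for illustration: the `ToyCorridor` class, the hazard cells, the two policies, and the choice to return the cost under a `cost` key in the info dictionary of a Gym-style `step`.

```python
class ToyCorridor:
    """Hypothetical 1-D corridor: start at cell 0, goal at the last cell.

    Entering a hazard cell does not end the episode or change the reward.
    It is reported through a separate cost signal instead.
    """

    def __init__(self, length=6, hazards=(2, 4)):
        self.length = length
        self.hazards = set(hazards)
        self.pos = 0

    def reset(self):
        self.pos = 0
        return self.pos

    def step(self, action):
        # action: 1 = step one cell forward, 2 = jump two cells forward
        self.pos = min(self.pos + action, self.length - 1)
        done = self.pos == self.length - 1
        reward = 1.0 if done else 0.0
        cost = 1.0 if self.pos in self.hazards else 0.0
        return self.pos, reward, done, {"cost": cost}


def rollout(env, policy):
    """Run one episode and return (total reward, total cost)."""
    obs, done = env.reset(), False
    total_reward = total_cost = 0.0
    while not done:
        obs, reward, done, info = env.step(policy(obs))
        total_reward += reward
        total_cost += info["cost"]
    return total_reward, total_cost


def careless(obs):
    # Always step forward, walking through every hazard on the way.
    return 1


def careful(obs):
    # Jump over the next cell when it is a hazard, otherwise step.
    return 2 if obs + 1 in (2, 4) else 1


print(rollout(ToyCorridor(), careless))
print(rollout(ToyCorridor(), careful))
```

Both policies reach the goal and collect the same reward, so a reward-only learner has no reason to prefer one over the other. The separate cost total is what distinguishes them.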
The fundamental principle of standard RL is that an agent, the AI system, tries to maximize a reward signal through trial and error. In the course of learning, the agent can sometimes try dangerous or harmful behaviors. Avoiding this is known as the safe exploration problem.
Designing a reward function is fundamentally hard, partly because the designer has to trade off task performance against safety requirements. In constrained RL, by contrast, the designer specifies an acceptable, safe outcome and the algorithm figures out the trade-offs needed to reach it.
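The article does not name a specific algorithm, but one commonly used way to let an algorithm find the trade-off is a Lagrangian method: maximize reward minus a penalty on cost, and adjust the penalty weight automatically until the cost meets a chosen limit. The sketch below applies that idea to a deliberately tiny problem with a single decision variable, not a full RL agent. The `reward` and `cost` functions, the cost limit, and the learning rate are all assumptions made up for illustration.

```python
def reward(v):
    # Hypothetical task reward: faster is better, with diminishing returns.
    return v - 0.5 * v * v


def cost(v):
    # Hypothetical safety cost: risk grows with the square of the speed.
    return v * v


def solve(cost_limit=0.25, lr=0.05, steps=5000):
    """Primal-dual gradient updates on reward(v) - lam * (cost(v) - cost_limit)."""
    v, lam = 0.0, 0.0
    for _ in range(steps):
        # Gradient of the Lagrangian with respect to v.
        grad_v = (1.0 - v) - lam * 2.0 * v
        v = min(max(v + lr * grad_v, 0.0), 1.0)  # keep the speed in [0, 1]
        # Raise the multiplier while the limit is violated, lower it otherwise.
        lam = max(lam + lr * (cost(v) - cost_limit), 0.0)
    return v, lam


v, lam = solve()
print(f"speed={v:.3f} multiplier={lam:.3f} cost={cost(v):.3f}")
```

Without the constraint, the best speed in this toy problem is 1.0. With it, the designer only states the outcome (cost no higher than the limit), and the multiplier `lam` settles on the penalty weight that achieves it, instead of that weight being hand-tuned inside the reward function.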
To establish a more reliable platform for building reinforcement learning models, OpenAI announced Safety Gym, where developers can experiment with cost functions and design safer systems.
Safety Gym is a set of environments and tools for measuring progress towards reinforcement learning agents that respect safety constraints while training, and for accelerating safe exploration research. Researchers at OpenAI developed the platform in order to study constrained RL.
It consists of two main components:
- An environment-builder which allows a user to create a new environment by mixing and matching from a wide range of physics elements, goals, and safety requirements.
- A suite of pre-configured benchmark environments to help standardize the measurement of progress on the safe exploration problem.
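As a rough illustration of the mix-and-match idea behind the environment-builder, a custom environment can be described as a plain dictionary of options. The sketch below follows the style of the example in the Safety Gym README. The key names are recalled from that README and are an assumption here, so check them against the version you install.

```python
# Illustrative configuration only. Key names follow the Safety Gym README
# example as recalled; verify them against the installed version.
config = {
    "robot_base": "xmls/car.xml",  # which robot to load
    "task": "push",                # which task to solve
    "observe_hazards": True,       # expose hazard readings to the agent
    "constrain_hazards": True,     # entering hazards counts towards the cost
    "hazards_num": 4,              # number of hazard areas in the layout
    "vases_num": 4,                # number of vases in the layout
}

# With Safety Gym and MuJoCo installed, the README builds the environment as:
#   from safety_gym.envs.engine import Engine
#   env = Engine(config)
print(config["robot_base"], config["task"])
```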
Overview Of OpenAI Safety Gym
In all Safety Gym environments, the agent perceives the environment through a robot’s sensors and interacts with the environment through its actuators. The robot has to navigate through a cluttered environment to achieve a task. There are three pre-made robots: Point, Car, and Doggo.
- Point: A simple robot constrained to the 2D plane, with one actuator for turning and another for moving forward or backward.
- Car: A slightly more complex robot that has two independently driven parallel wheels and a free-rolling rear wheel.
- Doggo: A quadrupedal robot with bilateral symmetry. It is designed so that a uniform random policy should keep the robot from falling over and generate some travel.
The Safety Gym environment-builder currently supports three main tasks, Goal, Button, and Push, with two levels of difficulty for each task.
- Goal: This task is accomplished by moving the robot to a series of goal positions. When a goal is achieved, the goal location is randomly reset to someplace new, while keeping the rest of the layout the same.
- Button: This task is done by pressing a series of goal buttons.
- Push: This task involves moving a box to a series of goal positions.
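The robots and tasks above combine into the pre-configured benchmark environments. The sketch below enumerates their names, assuming the `Safexp-{Robot}{Task}{Level}-v0` naming scheme described in the Safety Gym README. It also assumes that level 0 is a variant without constraint elements, with levels 1 and 2 being the two difficulty levels.

```python
from itertools import product

ROBOTS = ["Point", "Car", "Doggo"]
TASKS = ["Goal", "Button", "Push"]
# Assumption: level 0 has no constraint elements; 1 and 2 are the difficulty levels.
LEVELS = [0, 1, 2]


def benchmark_env_names():
    """Build every robot/task/level combination as an environment name."""
    return [
        f"Safexp-{robot}{task}{level}-v0"
        for robot, task, level in product(ROBOTS, TASKS, LEVELS)
    ]


names = benchmark_env_names()
print(len(names))
print(names[:3])
```

With Safety Gym installed, a name such as `Safexp-PointGoal1-v0` would typically be passed to `gym.make` after importing `safety_gym`.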
Currently, the Safety Gym environment-builder supports five main kinds of elements relevant to safety requirements: hazards, vases, pillars, buttons, and gremlins.
In one of our articles, we discussed why one should consider reinforcement learning when solving a problem and when it is the right approach for a specific problem. In this article, we discuss the importance of constrained reinforcement learning and how OpenAI’s Safety Gym can help researchers build safer RL systems.
In certain applications, safety is one of the most important concerns. Consider, for instance, the fatal accident last year involving an Uber self-driving car in Tempe, Arizona. It happened in part because the system classified the victim in turn as an unknown object, a vehicle, and a bicycle.
There is no doubt that reinforcement learning systems still need a lot of improvement before any large-scale deployment, and innovations like the ones discussed above take us closer to establishing such safer systems.
According to the researchers at OpenAI, Safety Gym is the first benchmark of high-dimensional continuous control environments for evaluating the performance of constrained RL algorithms.
To demonstrate that Safety Gym is useful for safe exploration research, the researchers also benchmarked several popular constrained and unconstrained RL algorithms on the Safety Gym environments, which should make it easier to design and compare new approaches.