How To Make Reinforcement Learning Work For Robotics

Reinforcement learning has been one of the most active areas of AI research since the field's early days. Its innovations are often ingenious, but we rarely see them deployed in the real world. Robotics is one area where reinforcement learning is widely used: robots typically learn novel behaviours through trial-and-error interaction.

However, the fundamental design of state-of-the-art reinforcement learning algorithms poses challenges. These come to light in unstructured settings such as the inside of a house, where robotic systems must be versatile enough to adapt to the real world.

Simulation is an alternative way to teach robots to behave in a certain way. However, this technique has many limitations of its own, and adaptability is again one of them. Researchers at Berkeley Artificial Intelligence Research (BAIR) said that improvements in simulation performance may not translate to improvements in the real world. Additionally, if a new simulation needs to be created for every new task and environment, content creation can become prohibitively expensive.


Therefore, they suggest concentrating more on reinforcement learning models and their challenges. To explore these, they ran a few experiments and came up with recommendations that can push the boundaries of real-world reinforcement learning.

Dealing With The Challenges

Reinforcement learning systems rely on the framework of a Markov decision process (MDP). An idealised MDP, say the researchers, is not readily available to the learning algorithm in a real-world environment.

To begin with, they list three main challenges that one encounters with RL systems in practical, scalable settings:

  1. the absence of reset mechanisms, 
  2. state estimation and 
  3. reward specification

  1. The first challenge is the absence of reset mechanisms. RL algorithms almost always assume access to an episodic reset mechanism, but the researchers point out that in real-world environments such a mechanism does not usually exist, which makes the assumption impractical. So, what do we do?

Recommendation: The team at Berkeley suggests perturbing the agent's state so that it never stays in the same state for too long. To do this, they propose a perturbation controller, which is trained to take the agent to less explored states of the world.
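To make the idea concrete, here is a minimal sketch rather than the researchers' implementation. It assumes a hypothetical 5x5 grid world with no reset, a toy task policy that walks to one corner, and a greedy count-based rule standing in for the trained perturbation controller. All names (`step`, `task_action`, `perturbation_action`, `run`) are made up for illustration.

```python
from collections import Counter

SIZE = 5  # hypothetical 5x5 grid standing in for the robot's workspace
GOAL = (SIZE - 1, SIZE - 1)
ACTIONS = [(0, 1), (0, -1), (1, 0), (-1, 0)]


def step(state, action):
    """Move one cell, clipping at the walls. Note that there is no reset()."""
    x = min(max(state[0] + action[0], 0), SIZE - 1)
    y = min(max(state[1] + action[1], 0), SIZE - 1)
    return (x, y)


def task_action(state):
    """Toy task policy: walk to GOAL, then push against the wall and stay put."""
    if state[0] != GOAL[0]:
        return (1, 0) if state[0] < GOAL[0] else (-1, 0)
    if state[1] != GOAL[1]:
        return (0, 1) if state[1] < GOAL[1] else (0, -1)
    return (0, 1)


def perturbation_action(state, visits):
    """Assumed stand-in for a trained perturbation controller:
    move to the neighbouring state visited least often so far."""
    return min(ACTIONS, key=lambda a: visits[step(state, a)])


def run(total_steps=400, phase_len=20, use_perturbation=True):
    """Alternate task phases and perturbation phases in one unbroken run."""
    state = (0, 0)
    visits = Counter({state: 1})
    for t in range(total_steps):
        perturbing = use_perturbation and (t // phase_len) % 2 == 1
        if perturbing:
            action = perturbation_action(state, visits)
        else:
            action = task_action(state)
        state = step(state, action)
        visits[state] += 1
    return visits


if __name__ == "__main__":
    print("states visited, task policy only:", len(run(use_perturbation=False)))
    print("states visited, with perturbation:", len(run(use_perturbation=True)))
```

Without the perturbation phases, the agent reaches the corner and sits there for the rest of the run. With them, it keeps getting pushed into states it has seen less often, which plays the role a manual reset would otherwise play.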

  2. Next comes the challenge of state estimation. Real-world environments are not always equipped with the sensors and other instrumentation, such as motion capture or vision-based tracking systems, that RL systems usually rely on. So what do we do?

Recommendation: Using unsupervised representation learning techniques, the team compressed raw images into latent features. These latent features retain the key information and make learning easier. For this work, they experimented with a variational autoencoder (VAE).
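As a rough illustration of what a variational autoencoder computes, the sketch below encodes a tiny image into a latent mean and log-variance, samples a latent vector, and evaluates a reconstruction-plus-KL loss. It is not the researchers' model: the linear layers, the untrained random weights, the 4x4 image and the 2-dimensional latent space are all assumptions made to keep the example small, and no training loop is shown.

```python
import math
import random


def dense(weights, bias, x):
    """Linear layer; `weights` holds one row per output unit."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def init_params(n_pixels, n_latent, rng):
    """Small random weights; a real model would learn these from images."""
    def matrix(rows, cols):
        return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]
    return {
        "enc_mu": (matrix(n_latent, n_pixels), [0.0] * n_latent),
        "enc_log_var": (matrix(n_latent, n_pixels), [0.0] * n_latent),
        "dec": (matrix(n_pixels, n_latent), [0.0] * n_pixels),
    }


def encode(params, image):
    """Map an image to the mean and log-variance of a diagonal Gaussian."""
    mu = dense(*params["enc_mu"], image)
    log_var = dense(*params["enc_log_var"], image)
    return mu, log_var


def reparameterize(mu, log_var, rng):
    """Sample z = mu + sigma * eps, with eps drawn from N(0, 1)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, log_var)]


def kl_to_standard_normal(mu, log_var):
    """KL divergence from N(mu, sigma^2) to N(0, I) for a diagonal Gaussian."""
    return -0.5 * sum(1.0 + lv - m * m - math.exp(lv)
                      for m, lv in zip(mu, log_var))


def vae_loss(params, image, rng):
    """Squared reconstruction error plus the KL regulariser."""
    mu, log_var = encode(params, image)
    z = reparameterize(mu, log_var, rng)
    reconstruction = dense(*params["dec"], z)
    recon_error = sum((r - p) ** 2 for r, p in zip(reconstruction, image))
    return recon_error + kl_to_standard_normal(mu, log_var)


if __name__ == "__main__":
    rng = random.Random(0)
    image = [rng.random() for _ in range(16)]  # hypothetical 4x4 grayscale image
    params = init_params(n_pixels=16, n_latent=2, rng=rng)
    latent_state, _ = encode(params, image)  # compact state for the RL policy
    print(len(latent_state), vae_loss(params, image, rng))
```

The point for the robot is the `encode` step: once trained, the latent mean can serve as a compact state for the policy in place of raw pixels.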

  3. Reward functions are central to any RL system. Rewards are carefully designed mathematical nudges that push a system or a machine towards a certain goal. However, in the real world, one cannot count on the external inputs that a reward function heavily relies on. So what do we do?

Recommendation: The researchers propose a self-assigned reward mechanism in which a human operator provides images that depict successful outcomes, as a means of specifying the desired task. These images are used to train a classifier, and the likelihood this classifier assigns to success is used to self-assign reward throughout the learning process.
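The classifier-as-reward idea can be sketched in a few lines. This is an illustration under stated assumptions, not the researchers' code: a plain logistic regression stands in for their classifier, hand-written 2-D feature vectors stand in for images, and the negative examples are assumed to be states the robot has visited. The function names are hypothetical.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_classifier(successes, others, lr=0.5, epochs=500):
    """Logistic regression fitted by batch gradient descent on the log-loss."""
    data = [(x, 1.0) for x in successes] + [(x, 0.0) for x in others]
    dim = len(data[0][0])
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, label in data:
            error = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - label
            grad_w = [g + error * xi for g, xi in zip(grad_w, x)]
            grad_b += error
        w = [wi - lr * g / len(data) for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / len(data)
    return w, b


def self_assigned_reward(w, b, features):
    """Reward is the log-probability that the state counts as a success."""
    p = sigmoid(sum(wi * xi for wi, xi in zip(w, features)) + b)
    return math.log(max(p, 1e-8))  # clamp to avoid log(0)


# Hypothetical 2-D features; in practice these would be derived from images.
SUCCESS_EXAMPLES = [(1.0, 0.9), (0.9, 1.0), (1.1, 1.0)]  # from the operator
OTHER_STATES = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.3)]  # assumed robot experience

if __name__ == "__main__":
    w, b = train_classifier(SUCCESS_EXAMPLES, OTHER_STATES)
    print("reward near the goal:", self_assigned_reward(w, b, (1.0, 1.0)))
    print("reward far from it:  ", self_assigned_reward(w, b, (0.0, 0.0)))
```

States that resemble the operator's success examples receive a reward closer to zero, while dissimilar states receive a more negative one, so the robot gets a learning signal without any hand-written reward function.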

Do These Recommendations Work?

The researchers put these recommendations into practice and obtained promising results, as can be seen in a timelapse video of a three-fingered robotic hand recorded overnight.

However, the team admits that their solution is not yet perfect since the experiments were done in a small workspace and on simple tasks. They feel that the following avenues can provide a good starting point going forward:

  • Making robots safe without the need for any human intervention is still an uphill task, and it is crucial in a real-world human environment. For a robotic system to be effective, it should require little human intervention and operate safely over the long term. This, the researchers believe, would be a crucial step going forward.
  • Facilitating knowledge transfer across tasks is another key aspect, and there is a need for efficient off-policy learning systems.  

Overall, this work aims to strengthen the case for data-driven reinforcement learning methods in practical settings.

Ram Sagar
I have a master's degree in Robotics and I write about machine learning advancements.
