Last updated February 8, 2022

New research on enabling a vision-based robotic manipulation system

The Google AI study concluded that robots could use the BC-Z system to complete 24 new tasks with an average success rate of 44%.


Published on February 8, 2022

by Poulomi Chatterjee

Robots that can interact with the real world and handle multiple novel tasks based on arbitrary user commands remain the holy grail of robotics. While research in general-purpose robots has made great strides, machines with the human-like ability to learn something new on their own are still a distant dream.

Recently, the robotics team at Google AI published a paper demonstrating how robots can understand new instructions and figure out how to complete a novel task. The research tackled the problem of getting a vision-based robot to generalise to new tasks described through language.

The paper, titled “BC-Z: Zero-Shot Task Generalization with Robotic Imitation Learning”, aimed to show that a broader, scaled-up dataset strengthens a robot’s generalisation abilities.

The system has two main components:

A large demonstration dataset covering 100 different tasks
A neural network policy conditioned on a language or video description of the task

The study concluded that robots could use the BC-Z system to complete 24 new tasks with an average success rate of 44%.

Data collection

The researchers collected data by remote-controlling the robot through a virtual reality headset and recording demonstrations of each task. Once the robot had learned a policy, a researcher deployed that policy under close supervision. As soon as the robot got stuck or made a mistake, the researcher intervened and corrected its course.
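The supervised deployment loop can be sketched roughly as follows. This is a minimal illustration and not the BC-Z implementation: the one-dimensional toy environment and the functions `policy_action`, `expert_action` and `needs_intervention` are hypothetical stand-ins for the real robot, the learned policy and the human operator.

```python
import random

TARGET = 10.0  # hypothetical goal position for a 1-D toy "robot"


def policy_action(position, rng):
    """Stand-in for a partially trained policy: a noisy step toward the target."""
    step = 1.0 if position < TARGET else -1.0
    return step + rng.uniform(-2.0, 2.0)


def expert_action(position):
    """Stand-in for the human operator: always steps toward the target."""
    return 1.0 if position < TARGET else -1.0


def needs_intervention(position, previous_position):
    """Hypothetical supervision rule: step in when the last move went away from the target."""
    return abs(TARGET - position) > abs(TARGET - previous_position)


def collect_episode(max_steps=50, seed=0):
    """Run the policy under supervision and log (observation, action, source) tuples."""
    rng = random.Random(seed)
    position, previous = 0.0, 0.0
    dataset = []
    for _ in range(max_steps):
        if needs_intervention(position, previous):
            # The operator takes over and the correction is recorded.
            action, source = expert_action(position), "expert"
        else:
            action, source = policy_action(position, rng), "policy"
        dataset.append((position, action, source))
        previous, position = position, position + action
        if abs(TARGET - position) < 0.5:  # close enough: task finished
            break
    return dataset


episode = collect_episode()
corrections = [entry for entry in episode if entry[2] == "expert"]
```

In a setup like this, the entries tagged "expert" are the corrections that would be added to the training data alongside the original demonstrations.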

Berkeley Artificial Intelligence Research (BAIR) developed a visual training method called One-Shot Imitation, which combined model-agnostic meta-learning (MAML) with imitation learning. In MAML, a model is trained so that it can adapt to a new learning problem, such as a regression or reinforcement learning task, from only a small sample of data.
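To make the meta-learning idea concrete, here is a toy sketch of MAML on one-dimensional regression. It is not the BAIR or Google AI code: the task family (lines y = a*x), the single-parameter model y = w*x, the learning rates and all function names are assumptions chosen so that the gradients can be written out by hand.

```python
import random


def mean_sq(xs):
    """Average of x^2 over a batch of inputs."""
    return sum(x * x for x in xs) / len(xs)


def loss(w, a, xs):
    """Mean squared error of the model y = w*x on the task y = a*x."""
    return (w - a) ** 2 * mean_sq(xs)


def grad(w, a, xs):
    """Derivative of the loss with respect to w."""
    return 2.0 * (w - a) * mean_sq(xs)


def adapt(w, a, support_xs, inner_lr):
    """Inner loop: one gradient step on a task's small support set."""
    return w - inner_lr * grad(w, a, support_xs)


def maml_train(slopes, inner_lr=0.1, outer_lr=0.05, steps=200, seed=0):
    """Outer loop: learn an initial w that adapts well after one inner step."""
    rng = random.Random(seed)
    w = 0.0
    for _ in range(steps):
        a = rng.choice(slopes)                           # sample a task
        support = [rng.uniform(-1, 1) for _ in range(5)]  # few-shot data
        query = [rng.uniform(-1, 1) for _ in range(5)]    # held-out data
        w_adapted = adapt(w, a, support, inner_lr)
        # Chain rule through the inner step: d(w_adapted)/dw.
        d_adapted = 1.0 - 2.0 * inner_lr * mean_sq(support)
        w -= outer_lr * grad(w_adapted, a, query) * d_adapted
    return w


w_meta = maml_train(slopes=[1.0, 3.0])
```

The outer update differentiates the post-adaptation loss with respect to the initial parameter, which is what distinguishes MAML from simply training on pooled data.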

Google AI used this method of visual training along with periodic human intervention.

The mixed approach, which combines demonstration and intervention, led to a notable improvement in the robot’s performance. In sequential problems like imitation learning, each observation depends on the robot’s past actions, so small mistakes can compound over an episode. Collecting corrections at the moments the policy failed gave better results than experiments that used only human demonstrations.
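A back-of-the-envelope model shows why compounding matters. The sketch below assumes, purely for illustration, that the policy makes an unrecoverable mistake with the same independent probability at every step; neither that assumption nor the example numbers come from the study.

```python
def clean_episode_probability(per_step_error, horizon):
    """Chance of finishing `horizon` steps with no mistake, assuming independent errors."""
    return (1.0 - per_step_error) ** horizon


# Hypothetical numbers: the same per-step error rate over a short and a long episode.
short_task = clean_episode_probability(per_step_error=0.01, horizon=10)
long_task = clean_episode_probability(per_step_error=0.01, horizon=100)
```

Under this simplification, the chance of a clean episode falls geometrically with its length, which is why corrections that bring the robot back from its own mistakes are valuable training data.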

Can robots 🤖 do tasks they weren't trained to do?

BC-Z trains a robot to do 100 tasks prompted with language instructions.

The robot generalizes to 24 *new* tasks.

Our @GoogleAI Blog Post: https://t.co/HMqp2J8gOL pic.twitter.com/YLMfrMJcKD
— Chelsea Finn (@chelseabfinn) February 3, 2022

Training

The data was used to train a neural network policy on all 100 tasks. The policy maps camera images to the position and orientation of the robot’s arm. It also receives a description of the task, either as a language command or as a video in which a person shows how to do the task.

Because the policy was conditioned on task instructions, the hope was that the neural network could interpret instructions for a task it had not been trained on. To do so, the robot has to identify the relevant objects and ignore the clutter in its environment.
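The idea of conditioning a policy on an instruction can be sketched as below. This is an illustrative toy rather than the architecture from the paper: `embed_instruction` is a hypothetical stand-in for a pre-trained sentence encoder, the untrained linear layer over concatenated features is an assumed simplification, and the six outputs (three for position, three for orientation) and the example instructions are also assumptions.

```python
import random


def embed_instruction(text, dim=8):
    """Hypothetical stand-in for a pre-trained sentence encoder.

    Returns a fixed pseudo-random vector per instruction string.
    """
    rng = random.Random(text)  # string seeds are deterministic
    return [rng.uniform(-1.0, 1.0) for _ in range(dim)]


class ConditionedPolicy:
    """Toy policy: a linear map from [image features, task embedding] to an action."""

    def __init__(self, image_dim, task_dim, action_dim=6, seed=0):
        rng = random.Random(seed)
        in_dim = image_dim + task_dim
        # Random, untrained weights; a real policy would learn these from demonstrations.
        self.weights = [
            [rng.gauss(0.0, 0.1) for _ in range(in_dim)]
            for _ in range(action_dim)
        ]

    def act(self, image_features, task_embedding):
        """Return an action vector for the current image and task description."""
        x = list(image_features) + list(task_embedding)
        return [sum(w * v for w, v in zip(row, x)) for row in self.weights]


policy = ConditionedPolicy(image_dim=4, task_dim=8)
image = [0.1, 0.2, 0.3, 0.4]  # placeholder for features extracted from a camera image
action_a = policy.act(image, embed_instruction("place the bottle in the bowl"))
action_b = policy.act(image, embed_instruction("wipe the table"))
```

The same image yields different actions under different instructions, which is the property that lets a single network be asked to attempt a task it never saw during training.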

Results

Of the 28 held-out tasks, the robot managed to complete 24, which suggests the approach worked to a meaningful degree. The study also suggests that pre-trained language embeddings give robots flexibility. Because language models generalise concepts seen in the training data, their compositional generalisation capabilities can carry over to robots, helping them follow instructions for pairs of objects that never appeared together in training.

The study also shows that human intervention at key moments can help a robot adapt to new tasks faster. Robots that can perform new tasks independently may still be a distant dream, but this work indicates gradual progress towards that goal.


Poulomi Chatterjee

Poulomi is a Technology Journalist with Analytics India Magazine. Her fascination with tech and eagerness to dive into new areas led her to the dynamic world of AI and data analytics.