New research on enabling a vision-based robotic manipulation system

The Google AI study concluded that robots could use the BC-Z system to complete 24 new tasks with an average success rate of 44%.

Robots that can interact with the real world and handle novel tasks based on arbitrary user commands remain the holy grail of robotics. While research on general-purpose robots has made great strides, machines with the human-like ability to learn something new on their own are still a distant dream.

The robotics team at Google AI recently published a paper demonstrating how robots can understand new instructions and figure out how to finish a novel task. The research tackled the problem of getting a vision-based robot to generalise to new tasks described through language.

The paper, titled “BC-Z: Zero-Shot Task Generalization with Robotic Imitation Learning”, set out to show that a broader, scaled-up dataset strengthens a robot’s ability to generalise.


The study had two main components:

  • A large demonstration dataset that included 100 different tasks
  • A neural network policy

The study concluded that robots could use the BC-Z system to complete 24 new tasks with an average success rate of 44%. 


Data collection

The researchers collected data by remotely controlling the robot with a virtual reality headset and recording the robot as it demonstrated each task. Once the robot had learned an initial policy, a researcher deployed that policy under close supervision. Whenever the robot got stuck or made a mistake, the researcher intervened and corrected its course.
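
The supervised deployment described above can be pictured as a simple loop. What follows is a minimal sketch of the idea, not the study's actual pipeline: the one-dimensional "robot", the `drifting_policy`, the `expert_action` teleoperator and the takeover thresholds are all hypothetical stand-ins chosen for illustration.

```python
import random

def expert_action(state, target=0.0):
    """Hypothetical teleoperator: steps toward the target, at most 1 unit."""
    return max(-1.0, min(1.0, target - state))

def drifting_policy(state, rng):
    """Hypothetical imperfect policy: keeps drifting away from the target at 0."""
    return 0.6 + rng.uniform(-0.1, 0.1)

def collect_episode(policy, start, steps=20, threshold=2.0, seed=0):
    """Run the policy and let the operator take over when it strays too far.

    Returns a list of (state, action, source) tuples, where source is
    "policy" or "intervention".
    """
    rng = random.Random(seed)
    state, intervening, log = start, False, []
    for _ in range(steps):
        if abs(state) > threshold:
            intervening = True    # robot has strayed: operator takes over
        elif abs(state) < 0.5:
            intervening = False   # back near the target: hand control back
        if intervening:
            action, source = expert_action(state), "intervention"
        else:
            action, source = policy(state, rng), "policy"
        log.append((state, action, source))
        state += action
    return log

episode = collect_episode(drifting_policy, start=0.0)
# Corrective samples that could be added to the training set.
corrections = [(s, a) for s, a, src in episode if src == "intervention"]
print(f"{len(corrections)} of {len(episode)} steps were human corrections")
```

In this toy setting the corrections are recorded in exactly the states where the policy went wrong, which is what makes them useful as extra training data.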

Berkeley Artificial Intelligence Research (BAIR) developed a visual training method called One-Shot Imitation, which combined model-agnostic meta-learning (MAML) and imitation learning. In model-agnostic meta-learning, a model is trained so that it can adapt to a new task from a small sample of data, and the approach applies to various learning problems such as regression and reinforcement learning.

Google AI used this method of visual training along with periodic human intervention.

The mixed approach, which includes both demonstration and intervention, led to a notable improvement in the robot’s performance. In sequential problems such as imitation learning, each observation depends on the actions the policy took earlier, so small mistakes can compound over the course of a task. The mixed data collection strategy produced better results than experiments that used only human demonstrations.
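
A toy model helps show why errors compound. The sketch below is not taken from the paper. It assumes a policy that makes a small error `eps` at every step and gets proportionally worse (`growth`) the further it strays from the demonstrated path. The function name and all parameter values are illustrative assumptions.

```python
def rollout_deviation(steps, eps=0.01, growth=0.2, reset_above=None):
    """Track how far a toy policy strays from the demonstrated path.

    eps: small error made at every step (assumed value).
    growth: extra error per unit of existing deviation, modelling a policy
        that performs worse in states it never saw in training (assumed).
    reset_above: if set, a human correction puts the robot back on the
        demonstrated path once the deviation exceeds this value.
    """
    deviation, history = 0.0, []
    for _ in range(steps):
        deviation = deviation * (1.0 + growth) + eps
        if reset_above is not None and deviation > reset_above:
            deviation = 0.0  # human steps in and corrects course
        history.append(deviation)
    return history

unassisted = rollout_deviation(30)
assisted = rollout_deviation(30, reset_above=0.1)
print(f"final deviation without corrections: {unassisted[-1]:.2f}")
print(f"largest deviation with corrections:  {max(assisted):.2f}")
```

Without corrections the deviation grows much faster than the per-step error alone would suggest, while timely corrections keep it bounded. In the study the corrections also serve as training data, which this toy model does not capture.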


The data from all 100 tasks was used to train a neural network policy that maps camera images to the robot’s position and orientation. Each task is then specified either as a language command or as a video in which a person shows how to do the task.
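
As a rough picture of such a task-conditioned policy, the sketch below maps image features and a task embedding to a six-value pose. It is an untrained toy with random weights, not the BC-Z architecture: the class name, the layer sizes, the six-value pose layout and the scale-and-shift conditioning are all assumptions made for illustration.

```python
import math
import random

def make_matrix(rows, cols, rng):
    """Small random weight matrix as a list of rows."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

def matvec(matrix, vector):
    """Plain matrix-vector product."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]

class ConditionedPolicy:
    """Toy policy: (image features, task embedding) -> 6-value pose."""

    def __init__(self, image_dim, task_dim, hidden_dim=8, seed=0):
        rng = random.Random(seed)
        self.encode = make_matrix(hidden_dim, image_dim, rng)
        self.scale = make_matrix(hidden_dim, task_dim, rng)
        self.shift = make_matrix(hidden_dim, task_dim, rng)
        # Assumed output layout: x, y, z position plus three rotation values.
        self.head = make_matrix(6, hidden_dim, rng)

    def act(self, image_features, task_embedding):
        hidden = matvec(self.encode, image_features)
        gamma = matvec(self.scale, task_embedding)
        beta = matvec(self.shift, task_embedding)
        # The task embedding scales and shifts the visual features.
        hidden = [math.tanh((1.0 + g) * h + b)
                  for h, g, b in zip(hidden, gamma, beta)]
        return matvec(self.head, hidden)

policy = ConditionedPolicy(image_dim=16, task_dim=4)
image = [0.5] * 16                                  # stand-in for camera features
pose_a = policy.act(image, [1.0, 0.0, 0.0, 0.0])    # hypothetical task A embedding
pose_b = policy.act(image, [0.0, 1.0, 0.0, 0.0])    # hypothetical task B embedding
```

The same camera input yields different poses under different task embeddings, which is the property that lets a single network serve many tasks.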

Once the policy had been trained and conditioned on the instructions, the neural network could in some cases interpret a new instruction and carry out a task it had not seen before. To do so, the robot has to identify the relevant objects and ignore the clutter in its environment.


Of the 28 held-out tasks, the robot completed 24, which suggests the experiment was successful to a certain degree. Pre-trained language embeddings also give robots flexibility, because language models can generalise concepts that appear in the training data. These compositional generalisation capabilities can be transferred to robots, helping them follow instructions for pairs of objects that were previously unseen together.

The study shows that human intervention at critical moments can speed up a robot’s adaptation to new tasks. Robots that can perform new tasks independently may still be a far-off dream, but this work indicates gradual progress in that direction.


Poulomi Chatterjee
Poulomi is a Technology Journalist with Analytics India Magazine. Her fascination with tech and eagerness to dive into new areas led her to the dynamic world of AI and data analytics.
