Robots with the ability to interact with the real-world and navigate multiple novel tasks based on random user commands remain the holy grail of robotics. While research in general-purpose robots has made great strides, machines with the human-like ability to learn something new on their own is still a distant dream.
Of late, the robotics team at Google AI published a paper demonstrating how robots can understand new instructions and figure out how to finish a novel task. The research tackled the problem of helping robots adapt to generalisable language models using a visual system.
The paper titled “BC-Z: Zero-Shot Task Generalization with Robotic Imitation Learning” aimed to prove that having a broader and scaled-up dataset strengthened the robot’s generalisation abilities.
AIM Daily XO
Join our editors every weekday evening as they steer you through the most significant news of the day, introduce you to fresh perspectives, and provide unexpected moments of joy
The study was divided into two parts:
- A large demonstration dataset that included 100 different tasks
- A neural network policy
The study concluded that robots could use the BC-Z system to complete 24 new tasks with an average success rate of 44%.
Download our Mobile App
The study collected data by remote-controlling the robot using a virtual reality headset. The researchers then recorded the robots demonstrating each task. When the robot has finished learning a policy, the researcher deploys the policy under tight supervision. As soon as the robot gets stuck or makes a mistake, the researcher interferes, and course corrects.
Berkeley Artificial Intelligence Research or BAIR developed a visual training method called One-Shot Imitation, which combined model-agnostic meta learning (MAML) and imitation learning. In model-agnostic meta learning, a model could use a small sample dataset and apply it to various learning problems like regression and reinforcement learning.
Google AI used this method of visual training along with periodic human intervention.
The mixed approach, which includes both demonstration and intervention, led to a notable improvement in the robot’s performance. Sequential problems like imitation learning rely on observations from past actions, which can cause compounding errors. The data collection strategy led to better results than experiments that only used human demonstrations.
The data was used to do all 100 tasks by training a neural network policy to map the robot’s positioning and orientation from camera images. The next process was to describe the task either as a language command or video, where a person shows how to do the task.
After the policy was trained and conditioned to the instructions, there was a chance that the neural network could interpret them to do a new task. The robot will face the challenge of identifying the relevant objects and ignoring cluttered objects in its environment.
Out of 28 held-out tasks, the robot succeeded in completing 24 tasks, suggesting the experiment was successful to a certain degree. Also, natural language models can give robots flexibility using pre-trained language embeddings. Furthermore, language models can generalise concepts in the training data. The compositional generalisation capabilities can be transferred to robots to help them follow instructions for pairs of objects that were previously unseen together. The study shows human intervention at essential moments can speed up the learning curve for a robot to adapt to new tasks. The solution to the grand problem of robots being able to perform new tasks independently may still be a far off dream, but this indicates gradual progress in this regard.