DeepMind Releases New Framework For Robots’ Object Stacking Capabilities

  • DeepMind presents a new benchmark for improving robots’ ability to stack objects.

For the majority of people, stacking one object on top of another is a simple task. However, even the most advanced robots struggle to handle more than one such task at a time. Stacking requires a variety of motor, sensory, and analytical abilities, as well as the capacity to interact with a wide range of objects. To push progress on this front, DeepMind has proposed a new benchmark for improving robots’ stacking abilities.

The researchers introduce RGB-Stacking, a new benchmark for vision-based robotic manipulation in which a robot must learn to grasp several objects and balance them on top of one another. What distinguishes it from prior work is the variety of objects employed and the large number of empirical evaluations performed to support the findings. The results show that a combination of simulation and real-world data can be used to learn complex multi-object manipulation, and they provide a strong baseline for the open problem of generalising to novel objects. To help other researchers, the team is open-sourcing a version of their simulated environment, along with the designs for building their real-robot RGB-stacking environment, the RGB-object models, and instructions for 3D printing them. They are also making a range of libraries and tools used in their robotics research available to the public.

Source: DeepMind RGB-Stacking

Objective

The goal of RGB-Stacking is to train a robotic arm to stack objects of various shapes using reinforcement learning. The researchers suspend a parallel gripper attached to a robot arm above a basket containing three objects: one red, one green, and one blue (hence the name RGB). The objective is straightforward: stack the red object on top of the blue object within 20 seconds, while the green object serves as an obstacle and distraction. Training on a variety of object sets ensures that the agent acquires general skills. The researchers deliberately varied the grasp and stack affordances, that is, the characteristics that determine how the agent can grasp and stack each object. This design principle forces the agent to exhibit behaviours beyond simple pick-and-place.
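
To make the task specification concrete, here is a minimal sketch of what the episode’s success check could look like, assuming object poses can be read from the simulator. The function name, tolerances, and pose representation are illustrative assumptions rather than DeepMind’s actual implementation; only the 20-second budget comes from the benchmark description.

import numpy as np

# Minimal sketch of the episode's success check. Tolerances and the pose
# representation are illustrative assumptions; only the 20-second budget
# comes from the benchmark description.
EPISODE_LIMIT_S = 20.0   # stated time budget for the stack
XY_TOLERANCE_M = 0.03    # hypothetical horizontal alignment tolerance

def is_stacked(red_pos, blue_pos, red_height, blue_height, elapsed_s):
    # red_pos, blue_pos: np.array([x, y, z]) centre positions in metres.
    if elapsed_s > EPISODE_LIMIT_S:
        return False
    horizontally_aligned = np.linalg.norm(red_pos[:2] - blue_pos[:2]) < XY_TOLERANCE_M
    # When red rests on blue, the centre-to-centre vertical offset is
    # roughly half of each object's height combined.
    expected_gap = (red_height + blue_height) / 2.0
    resting_on_top = abs((red_pos[2] - blue_pos[2]) - expected_gap) < 0.01
    return horizontally_aligned and resting_on_top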

Challenge

RGB-Stacking features two challenge variants with varying degrees of difficulty:

  • Skill Mastery
  • Skill Generalisation

The objective of “Skill Mastery” is to train a single agent to become proficient at stacking a specified set of five triplets. “Skill Generalisation” uses the same five triplets for evaluation but trains the agent on a vast pool of training objects, yielding more than a million possible triplets. To assess generalisation, these training objects are deliberately chosen to be distinct from the family of objects from which the test triplets were drawn.
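
To make the distinction concrete, here is a hedged sketch of how the two variants could source their training triplets. The shape counts, names, and sampling rule are hypothetical; the only figure taken from the text is the more-than-a-million-triplets scale.

import random

# Illustrative sketch of how the two variants differ in their training
# objects. The shape counts and names below are hypothetical.
TRAINING_SHAPES = [f"train_shape_{i:03d}" for i in range(103)]  # 103**3 ≈ 1.09M triplets
TEST_TRIPLETS = [(f"red_{k}", f"green_{k}", f"blue_{k}") for k in range(5)]  # stand-ins

# Skill Mastery: train directly on the same five triplets used at test time.
mastery_training_triplets = TEST_TRIPLETS

# Skill Generalisation: sample (red, green, blue) triplets from a large pool
# kept disjoint from the shapes behind the five evaluation triplets.
def sample_generalisation_triplet():
    return tuple(random.choice(TRAINING_SHAPES) for _ in range(3))

print(sample_generalisation_triplet())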

In both variants, the researchers separate their learning pipeline into three stages:

  • First, they train in simulation using an off-the-shelf reinforcement learning algorithm, Maximum a Posteriori Policy Optimisation (MPO). At this stage, they exploit the simulator’s state, which enables rapid training by providing the agent with the object positions directly rather than requiring it to learn to detect the objects in images. Because this information is not available in the real world, the resulting policy cannot be transferred directly to the real robot.
  • They then train a new policy in simulation using only realistic observations: images and the robot’s proprioceptive state. Training in a domain-randomised simulation improves transfer to real-world images and dynamics. The state-based policy acts as a teacher, providing corrections to the learning agent’s behaviour, and these corrections are distilled into the new policy (a sketch of this teacher-student loop follows this list).
  • Finally, they collect data with this policy on real robots and train an improved policy offline, using a learnt Q-function to weight good transitions as done in Critic Regularised Regression (CRR); the weighting idea is also sketched below. This allows them to leverage data gathered passively throughout the project rather than running a time-consuming online training algorithm on the real robots.
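
As a rough illustration of the second stage, the sketch below shows the general teacher-student pattern the text describes: the vision-based student drives the rollout while the privileged state-based teacher labels each visited moment. Every name, signature, and observation key here is an illustrative assumption, not DeepMind’s API.

import numpy as np

# Hedged sketch of the stage-two distillation idea. The observation is
# assumed to expose 'image' and 'proprio' (realistic inputs the student
# may see) and 'state' (privileged simulator state only the teacher sees).
def collect_corrections(env, teacher_policy, student_policy, steps=1000):
    dataset = []
    obs = env.reset()
    for _ in range(steps):
        student_action = student_policy(obs["image"], obs["proprio"])
        teacher_action = teacher_policy(obs["state"])  # privileged label
        dataset.append((obs["image"], obs["proprio"], teacher_action))
        obs = env.step(student_action)  # the student drives the rollout
    return dataset

def distillation_loss(student_policy, dataset):
    # Mean squared error between the student's actions and the teacher's
    # corrections; a real implementation would minimise this by gradient
    # descent on the student's parameters.
    errors = [np.sum((student_policy(img, prop) - act) ** 2)
              for img, prop, act in dataset]
    return float(np.mean(errors))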

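Similarly, the third stage’s filtered behavioural cloning can be illustrated in a few lines. The sketch below follows the general form of the CRR objective (binary and exponential advantage weighting); the function signature, shapes, and clipping constant are assumptions, not DeepMind’s implementation.

import numpy as np

# Hedged sketch of CRR-style offline policy improvement: behavioural
# cloning on logged transitions, weighted so that only transitions the
# learnt Q-function judges better than a baseline contribute.
def crr_policy_loss(log_probs, q_values, values, mode="binary", beta=1.0):
    # log_probs: log pi(a|s) of the logged actions, shape (batch,)
    # q_values:  learnt Q(s, a) for those actions, shape (batch,)
    # values:    baseline V(s), e.g. Q averaged over policy samples
    advantages = q_values - values
    if mode == "binary":
        # Keep a transition only if it looks at least as good as the baseline.
        weights = (advantages >= 0.0).astype(np.float32)
    else:
        # Exponential variant: softly up-weight better transitions.
        weights = np.minimum(np.exp(advantages / beta), 20.0)  # clipped for stability
    return -np.mean(weights * log_probs)

# Tiny usage demo with random logged data.
rng = np.random.default_rng(0)
print(float(crr_policy_loss(rng.normal(size=64), rng.normal(size=64), rng.normal(size=64))))
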
As researchers continue to work on the open problem of generalisation in robotics, let us hope that this new benchmark contributes to the development of new ideas and approaches that make manipulation easier and robots more capable.
