DeepMind Releases New Benchmark for Robots’ Object-Stacking Capabilities

DeepMind presents a new benchmark for improving robots’ ability to stack objects

For most people, stacking one object on top of another is a simple task. However, even state-of-the-art robots struggle to stack objects reliably. Stacking requires a range of motor, sensory, and analytical skills, as well as the ability to interact with a wide variety of objects.

The researchers present RGB-Stacking, a new benchmark for vision-based robotic manipulation. In this benchmark, a robot must learn to grasp different objects and balance them on top of one another. What distinguishes it from previous work is the variety of objects used and the large number of empirical evaluations performed to support the findings. The results show that a combination of simulation and real-world data can be used to learn complex multi-object manipulation, and they provide a strong baseline for the open problem of generalising to novel objects. To help other researchers, the team is open-sourcing a version of their simulated environment, along with the designs for building their real-robot RGB-stacking environment and the RGB-object models with information for 3D printing them. Additionally, they are making a range of libraries and tools used in their robotics research available to the public.



Source: DeepMind RGB-Stacking


The goal of RGB-Stacking is to train a robotic arm to stack objects of various shapes using reinforcement learning. The researchers suspend a parallel gripper, attached to a robot arm, above a basket containing three objects: one red, one green, and one blue (hence the name RGB). The objective is straightforward: stack the red object on top of the blue object within 20 seconds, while the green object acts as an obstacle and distraction. Training on a variety of object sets ensures that the agent acquires general skills. The researchers deliberately varied the grasp and stack affordances, the properties that determine how the agent can grasp and stack each object. This design principle forces the agent to learn behaviours beyond simple pick-and-place.
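The success criterion described above (red resting on blue within 20 seconds) can be sketched as a simple pose check. The function below is an illustrative sketch, not DeepMind's actual evaluation code; the alignment tolerances and vertical-gap range are assumed values.

```python
def is_stack_success(red_pos, blue_pos, elapsed_s,
                     xy_tol=0.03, z_gap=(0.01, 0.06), time_limit=20.0):
    """Check whether the red object rests on top of the blue one.

    Positions are (x, y, z) tuples in metres. The tolerance values
    are illustrative assumptions, not the benchmark's real thresholds.
    """
    if elapsed_s > time_limit:  # the episode must finish within 20 seconds
        return False
    dx = red_pos[0] - blue_pos[0]
    dy = red_pos[1] - blue_pos[1]
    dz = red_pos[2] - blue_pos[2]
    horizontally_aligned = (dx * dx + dy * dy) ** 0.5 <= xy_tol
    resting_on_top = z_gap[0] <= dz <= z_gap[1]
    return horizontally_aligned and resting_on_top
```

A sparse reward of this shape is one reason the task is hard: the agent gets little signal until a full stack occurs, which motivates the staged training pipeline described below.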


RGB-Stacking features two challenge variants with varying degrees of difficulty:

  • Skill Mastery
  • Skill Generalisation

The objective of “Skill Mastery” is to train a single agent to be proficient at stacking a specified set of five triplets. “Skill Generalisation” uses the same triplets for evaluation, but trains the agent on a large set of training objects, totalling more than a million possible triplets. To assess generalisation, these training objects are chosen to be distinct from the objects from which the test triplets were drawn.
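The scale of the “Skill Generalisation” training set follows from simple combinatorics: three distinct objects are assigned to the red, green, and blue roles. The sketch below is illustrative only; the pool size of 103 objects is an assumption chosen to show how the count exceeds a million, not the benchmark's actual figure.

```python
from itertools import islice, permutations

def count_triplets(n_objects):
    """Number of ordered (red, green, blue) triplets of distinct objects."""
    return n_objects * (n_objects - 1) * (n_objects - 2)

def training_triplets(train_ids, test_ids):
    """Yield ordered triplets built only from objects disjoint from the
    held-out test objects, so evaluation measures true generalisation."""
    usable = [o for o in train_ids if o not in set(test_ids)]
    yield from permutations(usable, 3)

# With roughly a hundred usable objects, the triplet count already
# exceeds a million: 103 * 102 * 101 = 1,061,106.
print(count_triplets(103))
```

Keeping the training pool disjoint from the test objects is what makes the evaluation a genuine test of generalisation rather than memorisation.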

The researchers separate their learning pipeline into three phases in both versions:

  • First, they train in simulation using an off-the-shelf reinforcement learning algorithm, Maximum a Posteriori Policy Optimisation (MPO). At this stage, they exploit the simulator’s state, which enables rapid training: the agent receives the object positions directly rather than having to learn to detect the objects in images. Because this state information is not available in the real world, the resulting policy cannot be transferred directly to the real robot.
  • They then train a new policy in simulation using only realistic observations: images and the robot’s proprioceptive state. A domain-randomised simulation improves transfer to real-world visuals and dynamics. The state-based policy acts as a teacher, providing corrections to the learning agent’s behaviour, which are distilled into the new policy.
  • Finally, they gather data with this policy on real robots and train an improved policy offline, using a learnt Q-function to weight good transitions, as done in Critic Regularised Regression (CRR). This enables them to leverage data acquired passively throughout the project rather than running a time-consuming online training algorithm on real robots.
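The CRR-style reweighting in the final phase can be sketched as follows: a learnt critic scores each logged transition, and actions that beat the policy's expected value receive more weight in the imitation loss. This is a minimal illustrative sketch of the weighting rule, not DeepMind's implementation; `beta` and the clipping bound are assumed hyperparameters.

```python
import math

def crr_weight(q_value, value_estimate, beta=1.0, mode="binary", max_weight=20.0):
    """Weight for a logged transition, following the CRR idea:
    advantage = Q(s, a) - V(s); better-than-average actions are
    imitated, while poor ones are down-weighted or dropped.
    """
    advantage = q_value - value_estimate
    if mode == "binary":
        # 'binary' variant: keep only better-than-average actions
        return 1.0 if advantage > 0 else 0.0
    # 'exp' variant: exponential weighting, clipped for stability
    return min(math.exp(advantage / beta), max_weight)
```

In offline training, each logged state-action pair's imitation loss would be multiplied by such a weight, letting the policy improve on the data-collection policy without any further robot interaction.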

As researchers continue to work on the open problem of generalisation in robotics, let us hope that this new benchmark contributes to the development of new concepts and approaches that make manipulation easier and robots more capable.

Dr. Nivash Jeevanandam
Nivash holds a doctorate in information technology and has been a research associate at a university and a development engineer in the IT industry. Data science and machine learning excite him.
