This robot used the Dreamer algorithm to learn to walk in 60 minutes

The Dreamer algorithm learns from small amounts of interaction by planning within a learned world model and, in turn, outperforms pure reinforcement learning in video games.

A team of researchers from the University of California, Berkeley, has introduced an approach that teaches a robot to walk in under 60 minutes. Unlike conventional deep reinforcement learning practice, the technique trains robots directly on hardware, without simulators. Named “DayDreamer: World Models for Physical Robot Learning”, the project is led by Philipp Wu, Alejandro Escontrela, Danijar Hafner, Ken Goldberg and Pieter Abbeel. According to the authors, the Dreamer algorithm has been shown to learn from small amounts of interaction by planning within a learned world model and, in turn, to outperform pure reinforcement learning in video games.

DayDreamer: A different approach to reinforcement learning

One of the fundamental challenges in robotics is giving robots the capability to solve complex tasks in real-world scenarios. Deep reinforcement learning (RL) is a popular method that enables robots to learn through trial and error. However, current RL algorithms require too much interaction with the environment to learn successful behaviours, which makes them impractical for many real-world tasks and pushes most training into simulators.

For the DayDreamer project, the researchers applied the Dreamer algorithm to four robots that learn directly in the real world. In doing so, they had to handle different action spaces, sensory modalities, and reward structures.
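The idea of learning a world model from a little real interaction and then choosing actions inside that model can be illustrated with a toy example. The sketch below is not the authors' implementation. It assumes a hypothetical one-dimensional robot (`real_step`), fits a one-parameter linear model by least squares in place of a neural world model, and swaps Dreamer's learned actor-critic for a brute-force search over short imagined action sequences. All names and constants are illustrative assumptions.

```python
import itertools

ACTIONS = (-1.0, 0.0, 1.0)   # hypothetical discrete action set
EXPLORE = (1.0, -1.0, 1.0)   # a handful of real exploratory actions
GOAL = 3.0                   # hypothetical target position


def real_step(x, a):
    """Stand-in for the real robot; the agent never reads the 0.5 gain directly."""
    return x + 0.5 * a


def reward(x):
    """Negative distance to the goal."""
    return -abs(x - GOAL)


def fit_world_model(transitions):
    """Least-squares estimate of the gain k in the model x' = x + k * a."""
    num = sum(a * (x2 - x) for x, a, x2 in transitions)
    den = sum(a * a for _, a, _ in transitions)
    return num / den if den else 0.0


def imagine_return(k, x, plan):
    """Roll a candidate action sequence through the learned model only."""
    total = 0.0
    for a in plan:
        x = x + k * a
        total += reward(x)
    return total


def plan_in_imagination(k, x, horizon=3):
    """Return the first action of the best imagined action sequence."""
    best = max(itertools.product(ACTIONS, repeat=horizon),
               key=lambda plan: imagine_return(k, x, plan))
    return best[0]


def run(control_steps=10):
    x, data = 0.0, []
    # 1) Collect a small amount of real experience.
    for a in EXPLORE:
        x2 = real_step(x, a)
        data.append((x, a, x2))
        x = x2
    # 2) Fit the world model to that experience.
    k = fit_world_model(data)
    # 3) Act on the real system, choosing each action by imagined rollouts.
    for _ in range(control_steps):
        x = real_step(x, plan_in_imagination(k, x))
    return k, x
```

Calling `run()` spends only three real steps on exploration; every candidate plan after that is evaluated in imagination rather than on the robot. Unlike this one-shot sketch, Dreamer keeps updating its world model while the robot continues to collect data.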

The main contributions of the team are:

  • A1 Quadruped – The researchers trained the Unitree A1 robot, which has 12 direct-drive motors, from scratch in an end-to-end reinforcement learning setting, without any simulators. Once it had learned to walk, the robot adapted within 10 minutes to withstand external disturbances such as pushing and pulling.


  • UR5 Multi-Object visual pick and place – A UR5 robotic arm is trained to pick and place balls. The task involves locating the balls from third-person camera images, grasping them, and moving them to the designated bin. Dreamer reached an average pick rate of 2.5 objects per minute within 8 hours.


  • XArm visual pick and place – For XArm, the team used a third-person RealSense camera with RGB and depth modalities, as well as proprioceptive inputs for the robot arm, which requires the world model to learn sensor fusion. Here, a soft object is used instead of the balls, which makes the task hard to simulate. XArm learns the task in 10 hours.


  • Sphero Navigation – In this task, a Sphero Ollie robot navigates to a designated location through continuous actions, with top-down RGB images as its only sensory input. The robot must identify its position from pixels, infer its orientation from a sequence of past images, and control under-actuated motors that build up momentum over time. The Sphero Ollie learns this task in under 2 hours.

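Why the Sphero task needs a sequence of past images rather than a single frame can be shown with a little geometry. In the sketch below, a hypothetical helper estimates the direction of travel from two consecutive positions. It is only an illustration: Dreamer is not described as computing this explicitly, and its world model would have to learn an equivalent quantity from pixels. The function name and the use of direction of travel as a proxy for orientation are assumptions.

```python
import math


def heading_from_positions(prev_xy, curr_xy):
    """Estimate the direction of travel (radians) from two consecutive positions.

    Returns None when the robot has not moved, because the direction is then
    unobservable from these two frames alone.
    """
    dx = curr_xy[0] - prev_xy[0]
    dy = curr_xy[1] - prev_xy[1]
    if dx == 0 and dy == 0:
        return None
    return math.atan2(dy, dx)
```

A single position, like a single top-down image of a symmetric robot, gives no heading at all; at least two frames with motion between them are needed.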

DayDreamer versus MIT’s Cheetah

Before DayDreamer, another group, at MIT’s Improbable AI Lab, worked on a similar project. This team developed mini-Cheetah, at the time the fastest-moving quadruped robot.

Cheetah’s controller is a neural network trained with reinforcement learning in simulation and then transferred to the real world. The approach rests on two key components: (i) an adaptive curriculum on velocity commands and (ii) an online system identification strategy for sim-to-real transfer leveraged from prior work.

This model could accumulate 100 days’ worth of experience on diverse terrains in three hours of wall-clock time. Although three hours is three times the training time the Dreamer model needed to learn to walk, it is a substantial feat in the field of reinforcement learning.
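The figures quoted above imply a large gap between simulated and real time. The short calculation below is a rough sketch using only the numbers in this article; the function name is illustrative, and the result is an aggregate ratio that says nothing about how the simulation was parallelised.

```python
def simulation_speedup(experience_days: float, wall_clock_hours: float) -> float:
    """Ratio of simulated experience to wall-clock time, both measured in hours."""
    return experience_days * 24.0 / wall_clock_hours


# 100 days of simulated experience gathered in 3 hours of wall-clock time.
cheetah_speedup = simulation_speedup(100, 3)

# Dreamer learns on the real robot, so its experience accrues at real-time speed.
dreamer_speedup = simulation_speedup(1 / 24, 1)
```

On these numbers the simulator delivers experience about 800 times faster than real time, whereas Dreamer's hour of walking practice is exactly one hour of robot experience.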

The future of AI-based robotics

According to the researchers, the Dreamer approach can solve robot locomotion, manipulation, and navigation tasks without changing hyperparameters. Dreamer taught a quadruped robot to roll off its back, stand up, and walk in 1 hour from scratch. This previously required either extensive training in simulation followed by transfer to the real world, or parameterized trajectory generators and given reset policies.

While Dreamer shows promising results, learning on hardware over many hours creates wear-and-tear on robots that may require human intervention or repair. Additionally, more work is required to explore the limits of Dreamer and that of the baselines by training for a longer time. 

Indian Scope

The Indian robotics ecosystem is abuzz with regular reports of new models. IISc ARTPARK recently conducted a contest to create robots that took on janitorial tasks. 

Indiascience.in, an initiative of the Department of Science and Technology (DST), Govt of India, states that robotics is the future because it has the potential to transform every aspect of society. Some leading Indian robotics firms are Gridbots, an Ahmedabad-based startup that creates robots for a variety of industrial applications; Asimov Tech, a Kerala-based startup that develops robots for the service industry; and Planys Technology of Chennai, which designs robots for underwater tasks.

As part of the ‘AI for India’ initiative launched by the Ministry of Education, the Council for Indian School Certificate Examinations (CISCE) and the Indian Institute of Technology Delhi (IIT Delhi) are designing a school curriculum that includes robotics, AI, machine learning, and data science. The curriculum is for grades 9 to 12 in schools affiliated with the CISCE board.


Kartik Wali
A writer by passion, Kartik strives to get a deep understanding of AI, Data analytics and its implementation on all walks of life. As a Senior Technology Journalist, Kartik looks forward to writing about the latest technological trends that transform the way of life!
