A team of researchers from the University of California, Berkeley, has introduced an approach that teaches robots how to walk in under 60 minutes. Unlike conventional deep reinforcement learning practice, the technique trains robots without simulators. Named “DayDreamer: World Models for Physical Robot Learning”, the project is led by Philipp Wu, Alejandro Escontrela, Danijar Hafner, Ken Goldberg and Pieter Abbeel. According to the authors, the Dreamer algorithm can learn from small amounts of interaction by planning in a learned world model, and has outperformed pure reinforcement learning in video games.
DayDreamer: A different approach to reinforcement learning
One of the fundamental challenges in robotics is giving robots the capability to solve complex tasks in real-world scenarios. Deep reinforcement learning (RL) is a popular method that enables robots to learn through trial and error. However, current RL algorithms require too much interaction with the environment to learn successful behaviours, which is why they are usually trained in simulation, making them impractical for many real-world tasks.
For the DayDreamer project, the researchers applied the Dreamer algorithm to four robots learning directly in the real world, overcoming challenges such as different action spaces, sensory modalities, and reward structures.
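To make the idea concrete, here is a minimal, hypothetical sketch of the world-model recipe the article describes: the agent collects a little real experience, fits a model of the environment's dynamics to it, and then plans actions in "imagination" instead of on the real robot. All names and the toy environment are illustrative assumptions, not the authors' implementation (Dreamer itself uses learned latent dynamics and an actor-critic trained on imagined rollouts).

```python
import numpy as np

rng = np.random.default_rng(0)

def real_env_step(state, action):
    """Toy 1-D 'robot': the goal is to drive the state toward zero."""
    next_state = state + 0.1 * action + 0.01 * rng.normal()
    reward = -abs(next_state)
    return next_state, reward

class WorldModel:
    """Stand-in world model: linear dynamics s' = a*s + b*u, fit by least squares."""
    def __init__(self):
        self.coef = np.zeros(2)  # [a, b]

    def fit(self, states, actions, next_states):
        X = np.stack([states, actions], axis=1)
        self.coef, *_ = np.linalg.lstsq(X, next_states, rcond=None)

    def imagine(self, state, action):
        # Predict the next state without touching the real environment.
        return self.coef[0] * state + self.coef[1] * action

# 1) Collect a small amount of real experience with random actions.
states, actions, next_states = [], [], []
s = 1.0
for _ in range(200):
    a = rng.uniform(-1, 1)
    s_next, _ = real_env_step(s, a)
    states.append(s); actions.append(a); next_states.append(s_next)
    s = s_next

# 2) Train the world model on the replayed experience.
model = WorldModel()
model.fit(np.array(states), np.array(actions), np.array(next_states))

# 3) Plan in imagination: pick the action whose imagined next state
#    looks best under the reward, then execute it on the "real" system.
def plan(state, candidates=np.linspace(-1, 1, 21)):
    imagined = [model.imagine(state, a) for a in candidates]
    return candidates[int(np.argmin(np.abs(imagined)))]

s = 1.0
for _ in range(50):
    s, _ = real_env_step(s, plan(s))
print(f"final |state| = {abs(s):.3f}")  # driven close to zero
```

The point of the sketch is the data efficiency: only step 1 touches the real system, while the policy improvement in step 3 happens entirely inside the learned model, which is what lets Dreamer learn on hardware in hours rather than days.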
The main contributions of the team are:
- A1 Quadruped – The researchers trained the Unitree A1 robot, which has 12 direct-drive motors, from scratch in an end-to-end reinforcement learning setting, without any simulators. Within 10 minutes, the robot could adapt and learn to withstand external perturbations such as pushing and pulling.
- UR5 Multi-Object visual pick and place – A robotic arm is trained to pick and place balls: locating a ball in third-person camera images, grasping it, and moving it to the designated bin. Dreamer reached an average pick rate of 2.5 objects per minute within 8 hours.
- XArm visual pick and place – For the XArm, the team used a third-person RealSense camera with RGB and depth modalities, as well as proprioceptive inputs from the robot arm, requiring the world model to learn sensor fusion. Here, a soft object is used instead of a ball, which is challenging to simulate. The XArm learns the task in 10 hours.
- Sphero Navigation – In this task, a robot called Sphero Ollie navigates to a designated location through continuous actions, with top-down RGB images as the only sensory input. The agent must identify the robot's position from pixels, infer its orientation from a sequence of past images, and control under-actuated motors that build up momentum over time. The Sphero Ollie learns this task in under 2 hours.
DayDreamer versus MIT’s Cheetah
Before the makers of DayDreamer, another group, at MIT's Improbable AI Lab, worked on a similar project. That team developed the Mini Cheetah, then the fastest-moving quadruped robot.
Cheetah’s controller is based on a neural network architecture that is trained with reinforcement learning in simulation and later transferred to the real world. Its performance rests on two components: (i) an adaptive curriculum on velocity commands and (ii) an online system-identification strategy for sim-to-real transfer leveraged from prior work.
This model could accumulate 100 days’ worth of experience on diverse terrains in three hours of actual time. Although this is three times the training time required for the Dreamer model, it is a substantial feat in the field of reinforcement learning.
The future of AI-based robotics
According to the researchers, the Dreamer approach can solve robot locomotion, manipulation, and navigation tasks without changing hyperparameters. Dreamer taught a quadruped robot to roll off its back, stand up, and walk in 1 hour from scratch, a task that previously required extensive training in simulation followed by transfer to the real world, or parameterized trajectory generators and given reset policies.
While Dreamer shows promising results, learning on hardware over many hours causes wear and tear on robots, which may require human intervention or repair. Additionally, more work is needed to explore the limits of Dreamer and of the baselines by training for longer.
The Indian robotics ecosystem is abuzz with regular reports of new models. IISc ARTPARK recently conducted a contest to create robots that took on janitorial tasks.
Indiascience.in, an initiative of the Department of Science and Technology (DST), Govt of India, states that robotics is the future, as it has the potential to transform every aspect of society. Some leading Indian robotics firms are Gridbots, an Ahmedabad-based startup that creates robots for a variety of industrial applications; Asimov Tech, a Kerala-based startup that develops robots for the service industry; and Planys Technology of Chennai, which designs robots for underwater tasks.
As part of the ‘AI for India’ initiative launched by the Ministry of Education, the Council for the Indian School Certificate Examinations (CISCE) and the Indian Institute of Technology Delhi (IIT Delhi) are designing a school curriculum that includes robotics, AI, machine learning, and data science. The curriculum is intended for grades 9 to 12 in schools affiliated with the CISCE board.