
Google Takes Leap Forward in Robotics with RT-2

It showed emergent robotic skills that were not present in the data due to knowledge transfer from web pre-training


Google DeepMind introduced RT-2, the successor to its Robotics Transformer 1 (RT-1) model. RT-2 is a Transformer-based model trained on text and images from the web, enabling it to directly produce robotic actions.

Unlike chatbots, robots face real-world challenges: they must be grounded in the physical environment and able to carry out complex tasks. RT-2 is a significant step towards more capable and helpful robots, addressing the time-consuming and expensive training methods used previously. Just as language models learn general concepts from web data, RT-2 uses web data to inform and guide robot behaviour.

RT-2 extends the capabilities of vision-language models (VLMs), which take images as input and generate text. It builds on models like PaLI-X and PaLM-E, adapting them to serve as its foundation. To enable robot control, RT-2 represents actions as tokens in its output, similar to language tokens, so that actions can be processed with standard natural language tokenizers. This approach lets the model output robotic actions and control robot behaviour directly.
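To make the action-as-tokens idea concrete, here is a minimal sketch of how a continuous robot action could be discretized into integer bins and rendered as a string a text tokenizer can consume. The bin count, value range, and 7-dimensional action layout are illustrative assumptions, not RT-2's exact scheme.

```python
import numpy as np

NUM_BINS = 256  # assumed number of discrete bins per action dimension


def action_to_tokens(action, low=-1.0, high=1.0, num_bins=NUM_BINS):
    """Map each continuous action dimension to an integer bin index."""
    action = np.clip(np.asarray(action, dtype=float), low, high)
    # Scale [low, high] onto [0, num_bins - 1] and round to the nearest bin.
    bins = np.round((action - low) / (high - low) * (num_bins - 1)).astype(int)
    return bins.tolist()


def tokens_to_action(bins, low=-1.0, high=1.0, num_bins=NUM_BINS):
    """Invert the discretization back to approximate continuous values."""
    bins = np.asarray(bins, dtype=float)
    return (bins / (num_bins - 1) * (high - low) + low).tolist()


# Example: a hypothetical 7-DoF action (position delta, rotation delta, gripper).
action = [0.1, -0.5, 0.0, 0.25, -1.0, 1.0, 0.0]
tokens = action_to_tokens(action)
# Space-separated token string, as a language-model tokenizer might receive it.
action_string = " ".join(str(t) for t in tokens)
```

The round trip is lossy only up to the bin width, which is the trade-off that lets robotic actions live in the same output vocabulary as words.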

Tests and Abilities

DeepMind conducted qualitative and quantitative experiments on RT-2 models across more than 6,000 robotic trials. Three categories of skills were defined: symbol understanding, reasoning, and human recognition, each requiring knowledge from web-scale data to be combined with the robot's own experience.

RT-2 demonstrated emergent robotic skills that were not present in the robot data, thanks to knowledge transfer from web pre-training. For instance, by leveraging knowledge from a vast web dataset, RT-2 understands concepts like identifying trash and throwing it away, without the need for specific training. It can even grasp abstract concepts, recognizing that certain objects become trash after use.

RT-2 simplifies the process of instructing robots by combining complex reasoning with robotic actions in a single model. It can perform tasks even without explicit training for them. RT-2’s ability to transfer knowledge from language and vision training data to robot actions showcases its versatility and effectiveness in handling various tasks.

It showed more than a 3x improvement in generalization performance compared to previous baselines like RT-1 and VC-1. RT-2 retained performance on original tasks seen in robot data and significantly improved performance on previously unseen scenarios, showcasing the benefits of large-scale pre-training. Moreover, RT-2 outperformed baselines pre-trained on visual-only tasks, indicating its superior performance in handling novel situations.

Google had earlier ventured into smarter robots by incorporating its large language model PaLM into robotics, resulting in the PaLM-SayCan system. However, that robot showed imperfections during a live demo: The New York Times witnessed it incorrectly identifying soda flavours and misidentifying the colour of a fruit as white.

Others in the Game

While Google DeepMind has been pushing ahead in robotics, Boston Dynamics has also bolstered its efforts and remains one of the leading competitors. It has made significant advancements with robots like Spot and the improved capabilities of its humanoid robot Atlas.

Atlas is now capable of navigating uneven terrain, recovering from falls, carrying objects, opening doors, climbing ladders, and performing various other tasks. These improvements stem from enhanced grasping and manipulation capabilities and new control algorithms, allowing Atlas to improvise and adapt to different conditions, on par with, if not ahead of, other leading developments.

The robot’s 28 hydraulically operated joints and various sensors, such as LIDAR and cameras, contribute to its flexibility and understanding of its surroundings. Boston Dynamics has a history of developing advanced robots, including Spot and Handle, with the goal of creating versatile robots that can perform a wide range of activities.

Other companies such as Musk's Tesla have unveiled their own efforts, notably Optimus, but that project is still in progress and looks lacklustre at the moment.

OpenAI, on the other hand, had a robotics division that created a robotic arm capable of solving the Rubik’s cube. However, the company shut down this division in 2021. Yet, OpenAI has now decided to re-enter the robotics domain and has invested in a Norwegian startup called 1x.

In 2021, Google DeepMind made strides in building more generalized robots through vision-based robotic manipulation based on RGB-Stacking. This technology enables robots to understand the environment and objects around them.

Meanwhile, Microsoft seems to be focusing on the development of ChatGPT, extending its capabilities to robotic arms, drones, and home assistant robots. The company's AI Lab Projects division is experimenting with AI and robots together to automate various tasks using the collaborative robot Paul-E, which possesses embedded vision and high-res force control. However, Microsoft's research efforts in robotics are not as extensive as those of Google DeepMind.

Google DeepMind is deeply involved in researching the integration of language models into machines, which could potentially impact the ongoing debate about embodiment’s significance for AGI.

Overall, the robotics landscape is highly competitive, with various companies investing in different approaches and technologies to push the boundaries of what robots can achieve.

Shyam Nandan Upadhyay

Shyam is a tech journalist with expertise in policy and politics, and exhibits a fervent interest in scrutinising the convergence of AI and analytics in society. In his leisure time, he indulges in anime binges and mountain hikes.