
DeepMind’s Latest RT-2 Model Makes Robots Perform Novel Tasks

The model grows smarter over time and can interpret both language and images.


Google’s DeepMind unit has introduced RT-2, a first-of-its-kind vision-language-action (VLA) model that controls robots more effectively than any model before it. Aptly named “Robotics Transformer”, or RT, the model is set to change the way robots interact with their environment and execute tasks with precision.

RT-2 is a learning wizard: the model gets smarter over time, understands both words and pictures, and can tackle tricky challenges it has never faced or been trained on.

It learns and adapts in real-world scenarios, drawing on information from diverse sources such as web data and robotics data. By understanding both language and visual input, RT-2 can take on tasks it has never encountered during training.

The researchers built RT-2 on two pre-existing models, Pathways Language and Image model (PaLI-X) and Pathways Language Model Embodied (PaLM-E). The resulting VLA model lets robots understand both language and visuals and translate that understanding into appropriate actions. The system’s training involved extensive text data and images from the internet, much like popular chatbots such as ChatGPT.
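
In the accompanying paper, the team describes expressing robot actions as strings of discretised text tokens, so the same transformer that produces language can also produce motor commands. The snippet below is a minimal, illustrative sketch of that idea only; the bin count, value ranges and helper names are assumptions for the example, not DeepMind’s implementation.

```python
# Illustrative sketch (not DeepMind's code): a VLA model emits an action as a
# short string of integer tokens, which is then mapped back to a continuous
# robot command. Bin count and ranges below are assumed for the example.
from dataclasses import dataclass

NUM_BINS = 256  # assumed discretisation resolution per action dimension

@dataclass
class RobotAction:
    terminate: bool           # whether the episode should end
    delta_position: tuple     # (dx, dy, dz) end-effector translation
    delta_rotation: tuple     # (droll, dpitch, dyaw) end-effector rotation
    gripper_extension: float  # 0.0 = closed, 1.0 = fully open

def decode_action(token_string: str,
                  pos_range: float = 0.1,    # assumed +/- translation limit
                  rot_range: float = 0.5):   # assumed +/- rotation limit
    """Turn a model-emitted token string such as
    '1 128 91 241 5 101 127 217' into a continuous robot action."""
    bins = [int(tok) for tok in token_string.split()]
    terminate = bins[0] == 1

    def debin(b, scale):
        # Map a bin index in [0, NUM_BINS) back onto [-scale, +scale].
        return (b / (NUM_BINS - 1)) * 2 * scale - scale

    delta_position = tuple(debin(b, pos_range) for b in bins[1:4])
    delta_rotation = tuple(debin(b, rot_range) for b in bins[4:7])
    gripper = bins[7] / (NUM_BINS - 1)
    return RobotAction(terminate, delta_position, delta_rotation, gripper)

# Example: decode one predicted action string.
print(decode_action("1 128 91 241 5 101 127 217"))
```

Because actions are just another token sequence, the model can be fine-tuned on robot trajectories without changing the architecture it already uses for web-scale text and images.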

According to the researchers, an RT-2-enabled robot can undertake a diverse range of complex tasks by using both visual and language data. These include organising files in alphabetical order: reading the labels on the documents, then sorting and placing them in their appropriate locations.

The paper, titled “RT-2: Vision-Language-Action Models Transfer Web Knowledge to Robotic Control”, is authored by Anthony Brohan and colleagues and announced in DeepMind’s latest blog post.



Tasmia Ansari

Tasmia is a tech journalist at AIM, looking to bring a fresh perspective to emerging technologies and trends in data science, analytics, and artificial intelligence.