
Google Introduces Offline Reinforcement Learning to Train AI Agents

Scaled Q-Learning can efficiently train RL agents to play Atari or pick up objects.


Researchers from Google have developed Scaled Q-Learning, a pre-training method built on the conservative Q-learning (CQL) algorithm for scaled offline reinforcement learning (RL), to efficiently train RL agents for decision-making tasks such as playing games or picking up objects.
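At its core, CQL augments the usual temporal-difference objective with a conservative regularizer that pushes Q-values down on all actions while pushing them up on actions actually present in the offline data, which keeps the agent from overestimating the value of actions it never saw. The following is a minimal numpy sketch of that objective for a discrete action space; the function name, shapes, and the `alpha` weight are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def cql_loss(q_values, actions, td_targets, alpha=1.0):
    """Sketch of the Conservative Q-Learning objective.

    q_values:   (batch, n_actions) array of Q(s, ·) from the network
    actions:    (batch,) actions actually taken in the offline dataset
    td_targets: (batch,) bootstrapped targets r + gamma * max_a' Q_target(s', a')
    """
    idx = np.arange(len(actions))
    q_data = q_values[idx, actions]              # Q(s, a) for dataset actions

    # Standard TD error on the logged transitions.
    td_loss = np.mean((q_data - td_targets) ** 2)

    # Conservative term: log-sum-exp over all actions minus the value of
    # the data action -- always non-negative, so it penalizes inflated
    # Q-values on out-of-distribution actions.
    logsumexp = np.log(np.sum(np.exp(q_values), axis=1))
    conservative = np.mean(logsumexp - q_data)

    return td_loss + alpha * conservative
```

With `alpha=0` this reduces to plain Q-learning on the offline batch; increasing `alpha` trades off fidelity to the TD targets against conservatism.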

Contrary to the traditional approach of training on task-specific data, Scaled Q-Learning uses diverse multi-task data to learn representations that transfer quickly to new tasks. It also outperforms Transformer-based methods, including those that use larger models. The researchers evaluated the approach on a suite of Atari games, training a single RL agent on data from low-quality players and then measuring transfer to new variations of the pre-training games or to entirely new games.
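The overall workflow — pre-train one agent on offline data pooled across many games, then fine-tune on a held-out game — can be sketched as below. The function names and the toy "parameters" are purely illustrative stand-ins for the real network and CQL update, not Google's code.

```python
import random

def sample_batch(dataset, size=4):
    # Toy stand-in for sampling a minibatch of transitions.
    return random.sample(dataset, k=min(size, len(dataset)))

def cql_update(params, batch):
    # Placeholder for one gradient step on the CQL objective;
    # here "params" is just a step counter so the sketch runs.
    return params + 1

def multi_game_pretrain(params, offline_datasets, steps):
    """Pre-train a single agent on offline data from many games."""
    for _ in range(steps):
        game = random.choice(list(offline_datasets))
        params = cql_update(params, sample_batch(offline_datasets[game]))
    return params

def finetune(params, new_game_dataset, steps):
    """Adapt the pre-trained agent to a new game or variation."""
    for _ in range(steps):
        params = cql_update(params, sample_batch(new_game_dataset))
    return params
```

The key design choice the article highlights is that a single set of parameters is shared across all pre-training games, so the learned representation — not per-game policies — is what transfers.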

The method marks initial progress towards practical real-world training of RL agents, as an alternative to costly and complex simulation-based pipelines or large-scale experiments. The team tested it on two types of data, near-optimal and low-quality, and compared it against other methods: only Scaled Q-Learning improved over the performance of the offline data, reaching about 80% of human-level performance.

The study shows that pre-training RL agents with multi-task offline learning can significantly improve their performance on different tasks, even for challenging ones like Atari games with different appearances and dynamics. 

The results demonstrate that the method can significantly boost RL performance in both offline and online modes. In the online setting, Scaled Q-Learning delivers gains where methods such as MAE yield little improvement, incorporating prior knowledge from the pre-training games to improve the final score after 20k online interactions.

In conclusion, Scaled Q-Learning appears to learn the game dynamics, rather than merely improving visual features as other techniques do. According to the blog, the work could lead to generally capable pre-trained RL agents that acquire broadly applicable interaction skills from large-scale offline pre-training. Future research will validate these results on a broader range of more realistic tasks, in domains such as robotics and NLP.


Tasmia Ansari

Tasmia is a tech journalist at AIM, looking to bring a fresh perspective to emerging technologies and trends in data science, analytics, and artificial intelligence.