Reinforcement Learning With Augmented Data Is So Superior It Beats Google & DeepMind Hands-down

From teaching a robot to drive itself off-road to adapting to never-before-seen tasks and grasping occluded objects, the University of California, Berkeley, has been investing in and carrying out a range of research on self-learning techniques.

Recently, researchers at UC Berkeley open-sourced Reinforcement Learning with Augmented Data (RAD), a simple plug-and-play module that can be used to enhance any reinforcement learning algorithm. The researchers claim that the technique is faster and more compute-efficient, by noticeable margins, than state-of-the-art model-based algorithms such as Google AI’s PlaNet, DeepMind’s Dreamer, and SLAC, in terms of both data and wall-clock efficiency.

They stated, “We hope that the performance gains, ease of implementation along with wall clock efficiency of this new technique make it a useful module for future research in data-efficient and generalisable RL methods as well as a useful tool for facilitating real-world applications of reinforcement learning.”

Behind RAD

Reinforcement Learning with Augmented Data, or RAD, is a technique for applying data augmentations to the image-based observations in reinforcement learning pipelines. It can be combined with any on-policy or off-policy reinforcement learning algorithm and can be used for both discrete and continuous control tasks without adding any auxiliary loss terms.

The technique does not make any changes to the underlying RL method and ensures that the trained policy, as well as the value function neural networks, are consistent across augmented views of the image-based observations.
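
To make the plug-and-play idea concrete, here is a minimal sketch in Python with NumPy. It is not the researchers' implementation: the function `augmented_update`, the batch keys `obs` and `next_obs`, and the stand-in callables `add_noise` and `dummy_update` are all hypothetical names chosen for illustration. The point it shows is that only the image observations are transformed, while the underlying update step is called unchanged.

```python
import numpy as np

def augmented_update(batch, update_fn, augment_fn, rng):
    """Augment the image observations in a batch, then run the unchanged RL update.

    batch      : dict with "obs" and "next_obs" arrays of shape (B, C, H, W),
                 plus whatever else the algorithm needs (actions, rewards, ...)
    update_fn  : the underlying algorithm's update step (hypothetical callable)
    augment_fn : callable (images, rng) -> augmented images
    """
    augmented = dict(batch)  # shallow copy, so the stored batch is not modified
    augmented["obs"] = augment_fn(batch["obs"], rng)
    augmented["next_obs"] = augment_fn(batch["next_obs"], rng)
    return update_fn(augmented)

# Usage with stand-ins (assumptions for illustration only).
rng = np.random.default_rng(0)
batch = {
    "obs": np.zeros((8, 9, 84, 84)),       # 8 samples, 3 stacked RGB frames
    "next_obs": np.zeros((8, 9, 84, 84)),
    "reward": np.zeros(8),
}

def add_noise(images, rng):
    # Placeholder augmentation; a real pipeline would use crop, cutout, etc.
    return images + rng.normal(0.0, 0.01, size=images.shape)

def dummy_update(b):
    # Placeholder for the RL algorithm's update; returns a fake scalar "loss".
    return float(np.abs(b["obs"]).mean())

loss = augmented_update(batch, dummy_update, add_noise, rng)
```

Because the augmentation is passed in as a function, the same wrapper would work with any update step that consumes a batch of observations, which is the sense in which the module is described as plug-and-play.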

With RAD, the researchers ensure that the agent learns from multiple views; in other words, the model is trained on augmented versions of the same input. According to the researchers, this improves two main capabilities of the agent:

  • Data Efficiency: Agents learn to master the task at hand quickly, with drastically fewer experience rollouts
  • Generalisation: Agents transfer better to unseen tasks or levels simply by training on more diversely augmented samples

Why RAD?

According to the researchers, supervised learning in computer vision has addressed the problems of data efficiency and generalisation by injecting useful priors. One such prior, which they say has been largely ignored in reinforcement learning, is data augmentation.

Although advances in algorithms combined with convolutional neural networks (CNNs) have proved groundbreaking in many respects, current methods still lack sample efficiency in learning and generalise poorly to new environments. To mitigate these issues, RAD was developed to apply data augmentations to the input observations in reinforcement learning pipelines.

Contributions Of This Research

The researchers highlighted the key contributions of this work:

  • The researchers showed that, across 15 DeepMind Control environments, a simple RL algorithm coupled with augmented data either matches or beats every state-of-the-art baseline in terms of performance and data efficiency
  • The technique improves test-time generalisation in several environments in the OpenAI ProcGen benchmark suite, which is widely used to evaluate generalisation in RL
  • The method is faster and more compute-efficient, by noticeable margins, than state-of-the-art model-based algorithms such as SLAC, PlaNet and Dreamer, in terms of both data and wall-clock efficiency
  • Custom implementations of random data augmentations enabled the researchers to apply augmentation in the RL setting, where observations consist of stacked frames, without breaking the temporal information present in the stack
  • The vectorised and GPU-accelerated augmentations in RAD are competitive with, and on average faster than, the augmentation APIs of state-of-the-art frameworks such as PyTorch
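
The stacked-frame and vectorisation points can be illustrated with a short NumPy sketch. It is an assumption-laden illustration rather than the RAD code: the function name `random_crop`, the (B, C, H, W) layout with stacked frames along the channel axis, and the 100-to-84 pixel sizes are all chosen for the example. One crop offset is drawn per stack, so every frame in a stack is cropped at the same location and the motion between frames is preserved, and the whole batch is cropped with a single indexing operation instead of a Python loop.

```python
import numpy as np

def random_crop(stacks, out_size, rng):
    """Crop every frame stack in a batch at its own random location.

    stacks   : array of shape (B, C, H, W), where C holds the stacked frames
    out_size : side length of the square crop (must not exceed H or W)
    """
    b, c, h, w = stacks.shape
    # One (top, left) offset per batch element, shared by all frames in the stack.
    tops = rng.integers(0, h - out_size + 1, size=b)
    lefts = rng.integers(0, w - out_size + 1, size=b)
    # Row and column indices of each crop window, shape (B, out_size).
    rows = tops[:, None] + np.arange(out_size)[None, :]
    cols = lefts[:, None] + np.arange(out_size)[None, :]
    # Broadcast the four index arrays to (B, C, out_size, out_size).
    batch_idx = np.arange(b)[:, None, None, None]
    chan_idx = np.arange(c)[None, :, None, None]
    return stacks[batch_idx, chan_idx, rows[:, None, :, None], cols[:, None, None, :]]

# Usage: a batch of 4 stacks, each with 3 frames of 100x100 pixels.
rng = np.random.default_rng(0)
stacks = rng.random((4, 3, 100, 100))
cropped = random_crop(stacks, 84, rng)   # shape (4, 3, 84, 84)
```

Drawing a separate offset for each frame would instead shift the frames relative to one another, which is the kind of broken temporal information the researchers say their implementations avoid.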

Wrapping Up

The researchers have open-sourced the RAD module, which is available on GitHub. They showed that data augmentations such as random crop, colour jitter, patch cutout, and random convolutions can enable simple RL algorithms to match or outperform complex state-of-the-art methods across common benchmarks in terms of data efficiency, generalisation, and wall-clock speed.
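
As an illustration of one of the augmentations named above, the following NumPy sketch implements a basic patch cutout. The function name `random_cutout`, its parameters, and the choice to fill the patch with zeros are assumptions made for this example, and a plain loop is used for readability rather than the vectorised form the researchers describe.

```python
import numpy as np

def random_cutout(images, min_size, max_size, rng):
    """Zero out one random rectangle per image, shared across its channels.

    images             : array of shape (B, C, H, W)
    min_size, max_size : inclusive bounds on the rectangle's height and width
                         (max_size must not exceed H or W)
    """
    b, c, h, w = images.shape
    out = images.copy()  # leave the input batch untouched
    heights = rng.integers(min_size, max_size + 1, size=b)
    widths = rng.integers(min_size, max_size + 1, size=b)
    for i in range(b):
        top = rng.integers(0, h - heights[i] + 1)
        left = rng.integers(0, w - widths[i] + 1)
        # The same patch is removed from every channel of image i.
        out[i, :, top:top + heights[i], left:left + widths[i]] = 0
    return out

# Usage: cut a patch of between 10 and 30 pixels per side from each image.
rng = np.random.default_rng(0)
images = np.ones((5, 3, 84, 84))
masked = random_cutout(images, 10, 30, rng)
```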

Read the paper here.

Ambika Choudhury
A Technical Journalist who loves writing about Machine Learning and Artificial Intelligence. A lover of music, writing and learning something out of the box.