From teaching a robot to drive itself off-road to adapting to never-before-seen tasks and grasping occluded objects, the University of California, Berkeley, has been investing in a range of research on self-learning techniques.
Recently, researchers at UC Berkeley open-sourced Reinforcement Learning with Augmented Data (RAD), a simple plug-and-play module that can be used to enhance any reinforcement learning algorithm. The researchers claimed that the technique is faster and more compute-efficient, by noticeable margins, than state-of-the-art model-based algorithms such as Google AI’s PlaNet, DeepMind’s Dreamer, and SLAC in terms of both data and wall-clock efficiency.
They stated, “We hope that the performance gains, ease of implementation along with wall clock efficiency of this new technique make it a useful module for future research in data-efficient and generalisable RL methods as well as a useful tool for facilitating real-world applications of reinforcement learning.”
Reinforcement Learning with Augmented Data, or RAD, is a technique for applying data augmentations to image-based observations in reinforcement learning pipelines. It can be combined with any on-policy or off-policy reinforcement learning algorithm and can be utilised for both discrete and continuous control tasks without adding any extra loss terms.
The technique does not make any changes to the underlying RL method and ensures that the trained policy, as well as the value function neural networks, are consistent across augmented views of the image-based observations.
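To make the idea concrete, here is a minimal NumPy sketch of one such augmentation, random crop, applied to a batch of image observations before they reach an otherwise unchanged RL update. It is an illustration only, not code from the RAD repository: the function name `random_crop`, the `(batch, channels, height, width)` layout and the 100x100 to 84x84 sizes are all assumptions made for the example.

```python
import numpy as np

def random_crop(obs, out_size, rng):
    """Crop every observation in a batch at its own random offset.

    obs: array of shape (B, C, H, W), where C holds the channels of all
    stacked frames. One offset is drawn per sample and shared by all C
    channels of that sample.
    """
    b, c, h, w = obs.shape
    tops = rng.integers(0, h - out_size + 1, size=b)
    lefts = rng.integers(0, w - out_size + 1, size=b)
    out = np.empty((b, c, out_size, out_size), dtype=obs.dtype)
    for i, (t, l) in enumerate(zip(tops, lefts)):
        out[i] = obs[i, :, t:t + out_size, l:l + out_size]
    return out

# Hypothetical batch: 4 observations, each 3 stacked RGB frames (9 channels).
rng = np.random.default_rng(0)
batch = rng.integers(0, 256, size=(4, 9, 100, 100), dtype=np.uint8)
augmented = random_crop(batch, 84, rng)  # shape (4, 9, 84, 84)
```

In a pipeline along these lines, `augmented` would simply replace `batch` as the input to the policy and value networks, which is what lets the underlying algorithm stay untouched.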
With RAD, the researchers ensure that an agent learns from multiple views; in other words, the model is trained on augmented versions of the same input. According to the researchers, this improves two main capabilities of the agent:
- Data Efficiency: Agents learn to quickly master the task at hand with drastically fewer experience rollouts
- Generalisation: Agents improve transfer to unseen tasks or levels simply by training on more diversely augmented samples
According to the researchers, supervised learning in computer vision has addressed the problems of data-efficiency and generalisation by injecting useful priors, and one such prior, often ignored, is data augmentation.
Although advances in algorithms combined with convolutional neural networks (CNNs) have proved groundbreaking in various respects, current methods still lack sample efficiency in learning and generalisation to new environments. To mitigate these issues, RAD was developed to apply data augmentations to the input observations of reinforcement learning pipelines.
Contributions Of This Research
The researchers highlighted several key contributions of this work:
- Across 15 DeepMind Control environments, a simple RL algorithm coupled with augmented data either matches or beats every state-of-the-art baseline in terms of performance and data-efficiency
- The technique improves test-time generalisation in several environments in the OpenAI ProcGen benchmark suite, which is widely used for studying generalisation in RL
- The method is faster and more compute-efficient, by noticeable margins, than state-of-the-art model-based algorithms such as SLAC, PlaNet and Dreamer in terms of data and wall-clock efficiency
- Custom implementations of random data augmentations enabled the researchers to apply augmentation in the RL setting, where observations consist of stacked-frame inputs, without breaking the temporal information present in the stack
- The vectorised and GPU-accelerated augmentations in RAD are competitive with, and on average faster than, state-of-the-art framework APIs such as PyTorch
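The last two points, about stacked frames and vectorisation, can be illustrated with a short sketch. The NumPy code below is a hypothetical CPU-only example of patch cutout, not RAD's own implementation: it draws one rectangle per sample, blanks the same region in every stacked frame so the frames stay aligned in time, and builds the mask through broadcasting rather than a per-sample loop. The name `random_cutout` and the patch-size range are assumptions.

```python
import numpy as np

def random_cutout(obs, min_size, max_size, rng):
    """Zero out one random rectangle per sample, shared by all stacked frames.

    obs: array of shape (B, C, H, W). Returns a new array; obs is not modified.
    """
    b, c, h, w = obs.shape
    ph = rng.integers(min_size, max_size + 1, size=b)  # patch heights
    pw = rng.integers(min_size, max_size + 1, size=b)  # patch widths
    top = rng.integers(0, h - ph + 1)                  # per-sample upper bounds
    left = rng.integers(0, w - pw + 1)
    ys = np.arange(h)[None, :, None]
    xs = np.arange(w)[None, None, :]
    inside = (
        (ys >= top[:, None, None]) & (ys < (top + ph)[:, None, None])
        & (xs >= left[:, None, None]) & (xs < (left + pw)[:, None, None])
    )  # boolean mask of shape (B, H, W)
    # Broadcasting the mask over the channel axis hits every frame identically.
    return np.where(inside[:, None, :, :], obs.dtype.type(0), obs)

rng = np.random.default_rng(0)
frames = rng.integers(1, 256, size=(4, 9, 84, 84), dtype=np.uint8)
cut_frames = random_cutout(frames, 10, 30, rng)
```

Because the same mask covers all channels of a sample, an object hidden in one frame is hidden in the others too, so motion cues across the stack are not distorted by the augmentation.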
The researchers have open-sourced the RAD module, which is available on GitHub. They showed that data augmentations such as random crop, colour jitter, patch cutout and random convolutions can enable simple RL algorithms to match or outperform complex state-of-the-art methods across common benchmarks in terms of data-efficiency, generalisation and wall-clock speed.
Read the paper here.