Researchers from New York University (NYU) and Facebook Artificial Intelligence Research (FAIR), including Denis Yarats, Rob Fergus, Alessandro Lazaric, and Lerrel Pinto, have introduced DrQ-v2, a model-free reinforcement learning (RL) algorithm for visual continuous control. DrQ-v2 is an upgraded version of DrQ, an off-policy actor-critic approach that uses data augmentation to learn directly from pixels.
The DrQ (Data-regularized Q) algorithm was introduced in March 2021 by NYU & FAIR.
Many methods currently exist to improve the sample efficiency of RL algorithms that learn directly from pixels. These approaches fall into two groups:
- Model-based methods: Learn the system dynamics to acquire a compact latent representation of high-dimensional observations, over which policy search is later performed.
- Model-free methods: Learn the latent representation either indirectly, by optimising the RL objective, or explicitly, by employing auxiliary losses that provide additional supervision.
DrQ's data augmentation can be combined with either class of methods to improve performance. DrQ-v2's implementation has been released publicly to provide RL practitioners with a strong, computationally efficient baseline.
What’s new in DrQ-v2
DrQ-v2 improves upon DrQ by making several algorithmic changes:
- Switching the base RL algorithm from Soft Actor Critic (SAC) to Deep Deterministic Policy Gradient (DDPG).
- Addition of bilinear interpolation to the random shift image augmentation.
- Introducing an exploration schedule.
- Selection of better hyper-parameters, including a larger capacity of the replay buffer.
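The random-shift augmentation with bilinear interpolation mentioned above can be sketched in NumPy: the observation is padded by replicating its border pixels, then resampled at a continuously shifted grid, so shifts need not be whole pixels. This is a minimal illustrative sketch, not the authors' implementation; the function name, padding size, and array layout here are assumptions.

```python
import numpy as np

def random_shift(img, pad=4, rng=None):
    """Random-shift augmentation with bilinear interpolation (sketch).

    img: (H, W) float array. The image is padded by `pad` pixels on each
    side (replicating border pixels), then resampled at a grid shifted by
    a continuous offset in [-pad, pad), so sub-pixel shifts are possible.
    """
    rng = np.random.default_rng() if rng is None else rng
    h, w = img.shape
    padded = np.pad(img, pad, mode="edge")  # replicate border pixels
    dy, dx = rng.uniform(-pad, pad, size=2)  # continuous shift per axis
    ys = np.arange(h) + pad + dy
    xs = np.arange(w) + pad + dx
    y0 = np.floor(ys).astype(int)
    x0 = np.floor(xs).astype(int)
    wy, wx = ys - y0, xs - x0  # fractional parts drive the interpolation
    # Bilinear interpolation between the four neighbouring pixels.
    top = padded[y0][:, x0] * (1 - wx) + padded[y0][:, x0 + 1] * wx
    bot = padded[y0 + 1][:, x0] * (1 - wx) + padded[y0 + 1][:, x0 + 1] * wx
    return top * (1 - wy[:, None]) + bot * wy[:, None]
```

In practice this would be applied to each frame stack sampled from the replay buffer before both the actor and critic updates.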
The researchers claim these improvements yield state-of-the-art results on the DeepMind Control Suite. In particular, DrQ-v2 solves complex humanoid locomotion tasks directly from pixel observations, something previously unattained by model-free RL. “In addition, DrQ-v2 is conceptually simple, easy to implement, and provides a significantly better computational footprint compared to prior work, with the majority of tasks taking just 8 hours to train on a single GPU,” as per the paper.
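The exploration schedule listed among the changes can be sketched as a linear decay of the standard deviation of the noise added to the deterministic DDPG action: exploration starts broad and narrows as training progresses. The helper below is a hedged sketch; the function name and the example values (1.0 to 0.1 over 500k steps) are illustrative, not necessarily the paper's exact settings.

```python
def linear_schedule(init, final, duration, step):
    """Linearly anneal a value from `init` to `final` over `duration`
    environment steps, then hold it at `final`.

    Used here as a sketch of an exploration schedule: the returned value
    is the stddev of Gaussian noise added to the actor's action.
    """
    frac = min(step / duration, 1.0)  # progress through the schedule
    return init + frac * (final - init)

# Illustrative usage inside a training loop (names are assumptions):
#   std = linear_schedule(1.0, 0.1, 500_000, step)
#   action = actor(obs) + std * np.random.randn(action_dim)
#   action = np.clip(action, -1.0, 1.0)  # respect action bounds
```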
Source: NYU & FAIR
Present-day state-of-the-art model-free methods have three major limitations:
- They are unable to solve more challenging visual control problems such as quadruped and humanoid locomotion.
- They often require significant computational resources, i.e. lengthy training times using distributed multi-GPU infrastructure.
- It is often unclear how different design choices affect overall system performance.
The humanoid control problem is one of the hardest control problems due to its large state and action spaces. Apart from NYU & FAIR, other groups have pursued this problem. In collaboration with the University of Toronto and DeepMind, Google AI introduced DreamerV2, the first RL agent to achieve human-level performance on the Atari benchmark.
“Recently, a model-based method, DreamerV2, was also shown to solve visual continuous control problems, and it was first to solve the humanoid locomotion problem from pixels. However, while our model-free DrQ-v2 matches DreamerV2 in terms of sample efficiency and performance, it does so four times faster in terms of wall-clock time to train,” as per the paper.
DreamerV2 from Google AI relies exclusively on general information from the images and accurately predicts future task rewards even when those rewards did not influence its representations. “Using a single GPU, DreamerV2 outperforms top model-free algorithms with the same compute and sample budget,” as per the blog. It builds upon the Recurrent State-Space Model (RSSM).
“An unofficial implementation of DreamerV2 is available on Github and provides a productive starting point for future research projects. We see world models that leverage large offline datasets, long-term memory, hierarchical planning, and directed exploration as exciting avenues for future research,” as per the blog.
Recent research in this field opens avenues for further applications of the technique. RL algorithms that learn well from pixels could be useful in applications such as Neuralink’s LINK and Mindpong, and could make RL training simulations more realistic and robust.