Reinforcement learning operates in a delayed-return environment and can be slow because it learns through repeated, continuous interaction with a simulation of the environment. As a result, using RL in complex environments with many agents remains a bottleneck. Tackling such problems calls for deep reinforcement learning – a combination of artificial neural networks and the RL framework.
Training a large number of agents requires repeatedly training agent models and running multi-agent simulations. Conventional multi-agent deep reinforcement learning (MADRL) combines CPU-based simulators with GPU-based deep learning models, which is again time-consuming. To address this, the customer relationship management company Salesforce has open-sourced WarpDrive, a deep reinforcement learning framework that implements end-to-end multi-agent RL on a single GPU.
Inside WarpDrive
The Salesforce research team, including Tian Lan, Sunil Srinivasa, and Stephan Zheng, introduced the framework in the paper “WarpDrive: Extremely Fast End-to-End Deep Multi-Agent Reinforcement Learning on a GPU”. “Using the extreme parallelisation capability of GPUs, WarpDrive enables orders of magnitude faster RL compared to common implementations that blend CPU simulations and GPU models. Our design runs simulations and the agents in each simulation in parallel,” as per the paper.
The lightweight RL framework eliminates the need to copy data between CPU and GPU. It also uses a single simulation datastore on the GPU that is safely updated in place. The WarpDrive architecture is shown below.
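The idea of a single datastore updated in place can be illustrated with a minimal sketch. The class and array names below are made up for illustration (they are not WarpDrive's actual API), and NumPy arrays on the host stand in for the arrays WarpDrive keeps in GPU memory:

```python
import numpy as np

class InPlaceDataStore:
    """Single datastore holding all simulation state; arrays are
    allocated once and updated in place, with no per-step copies."""

    def __init__(self, num_envs, num_agents):
        # One row per environment, one column per agent.
        self.positions = np.zeros((num_envs, num_agents))
        self.rewards = np.zeros((num_envs, num_agents))
        self.done = np.zeros(num_envs, dtype=bool)

    def reset(self, env_mask):
        # Reset only the masked environments, writing in place.
        self.positions[env_mask] = 0.0
        self.rewards[env_mask] = 0.0
        self.done[env_mask] = False

    def step(self, actions):
        # Vectorised update across every env and agent at once;
        # the toy task rewards agents for reaching position 1.0.
        self.positions += actions
        self.rewards[:] = -np.abs(self.positions - 1.0)
        self.done[:] = np.all(np.abs(self.positions - 1.0) < 0.05, axis=1)

store = InPlaceDataStore(num_envs=4, num_agents=3)
store.step(np.full((4, 3), 0.5))
```

On a GPU, the same pattern avoids host-device transfers entirely: the arrays are allocated on the device once, and every `step` and `reset` writes into them where they live.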
Image Credits: Salesforce paper
WarpDrive from Salesforce provides a framework and tools to build fast multi-agent RL systems. WarpDrive relies on CUDA, which allows users to run programmes – known as compute kernels – directly on GPU hardware. The CUDA API provides direct access to the GPU’s virtual instruction set and parallel computational elements. The framework offers several benefits:
- After the first reset, there is just a one-time data transfer between the host and the device, and no further host-to-device communication is necessary. The data arrays are stored exclusively on the GPU and changed in place during all subsequent step and reset calls. The provided Trainer class can directly access and modify all data on the device without incurring any copying cost.
- With each agent using a single thread on the GPU, the framework is capable of simulating millions of agents and environments within a short period.
- The current framework requires only a single GPU and no communication between multiple GPU devices. Exploring efficient multi-device RL systems remains an open direction that could yield further speedups.
- The framework is fully compatible with PyTorch.
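Because the framework is PyTorch-compatible, simulation tensors living on the GPU can feed a policy network directly, with no host round trip. The sketch below illustrates that pattern with an illustrative toy policy (the names and shapes are assumptions, not WarpDrive's API); it falls back to CPU so it runs anywhere, whereas WarpDrive itself requires a GPU:

```python
import torch

# Use the GPU when available; fall back to CPU for this sketch.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

num_envs, num_agents, obs_dim, num_actions = 8, 16, 4, 5

# Simulation state lives on the device and is never copied to the host.
observations = torch.randn(num_envs, num_agents, obs_dim, device=device)

# A tiny shared policy network; its parameters sit on the same device,
# so the forward pass consumes the simulation tensors directly.
policy = torch.nn.Sequential(
    torch.nn.Linear(obs_dim, 32),
    torch.nn.ReLU(),
    torch.nn.Linear(32, num_actions),
).to(device)

logits = policy(observations)  # shape: (envs, agents, actions)
actions = torch.distributions.Categorical(logits=logits).sample()
```

The sampled `actions` tensor (one action per agent per environment) can then be written straight back into the on-device simulation state for the next step.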
Deep RL Use Cases
Deep RL has found applications across the industry: Google uses it for automated game-testing agents, chip design, and robot locomotion; Microsoft powers its autonomous control systems with it; Pathmind applies the technology to simulate industrial operations and supply chains; and Covariant uses it in industrial robotics. Let’s discuss a few of these.
Google research and the Google hardware architecture team collaborated to solve the chip placement problem, one of the most time-consuming stages of chip design, with deep RL methods. The goal is to optimise power, performance, and area (PPA) while adhering to placement density and routing congestion limitations by placing a netlist graph of macros (for example, SRAMs) and standard cells (logic gates such as NAND, NOR, and XOR) onto a chip canvas. As a result, RL agents become faster and better at chip placement, bringing the time needed for the task down from weeks to just six hours.
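One way to see how PPA objectives and constraints enter the RL formulation is through the reward. The sketch below is an illustrative toy, not Google's actual method: it scores a placement by negative half-perimeter wirelength (a standard proxy for routing cost) with penalties when congestion or density exceed their limits. All weights, limits, and function names are made up for the example:

```python
import numpy as np

def placement_reward(positions, nets, congestion, density,
                     congestion_limit=1.0, density_limit=0.6,
                     lam=0.5, mu=0.5):
    """Toy reward for a chip-placement agent: negative half-perimeter
    wirelength (HPWL), penalised when routing congestion or placement
    density exceed their limits. Weights and limits are illustrative."""
    hpwl = 0.0
    for net in nets:  # each net is a list of indices into `positions`
        pts = positions[net]
        hpwl += (pts[:, 0].max() - pts[:, 0].min()
                 + pts[:, 1].max() - pts[:, 1].min())
    penalty = (lam * max(0.0, congestion - congestion_limit)
               + mu * max(0.0, density - density_limit))
    return -hpwl - penalty

# Three cells on a canvas, connected by two two-pin nets.
cells = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 2.0]])
reward = placement_reward(cells, nets=[[0, 1], [1, 2]],
                          congestion=0.8, density=0.5)
```

An agent placing one macro per step would receive such a reward at the end of an episode and learn placements that shorten wires while respecting the constraints.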
Automated playtesting reduces the need for human intervention. Game-playing agents that use deep reinforcement learning (DRL) can anticipate both game complexity and player engagement. Aalto University and Rovio Entertainment developed a novel method that combines DRL and Monte Carlo Tree Search to predict game pass and churn rates. Deep RL has also shown promise in neuroscience, although that work remains in progress.