Salesforce Open-Sources WarpDrive Deep RL Framework

WarpDrive runs the entire MADRL workflow end-to-end on a single GPU, using a single store of data for simulation roll-outs, inference, and training.

Reinforcement learning operates in a delayed-return environment and can be slow, as the agent has to learn through repeated and continuous interaction with a simulation of the environment. As a result, using RL in a complex environment with multiple agents remains a bottleneck. The problem calls for deep reinforcement learning, a combination of artificial neural networks and the RL framework.

Training a large number of agents requires repeatedly training agent models and running multi-agent simulations. Conventional multi-agent deep reinforcement learning (MADRL) combines CPU-based simulators with GPU-based deep learning models, which makes the process time-consuming. Coming to the rescue, customer relationship management company Salesforce has open-sourced WarpDrive, a deep reinforcement learning framework that implements end-to-end multi-agent RL on a single GPU.

Inside WarpDrive

The Salesforce research team, including Tian Lan, Sunil Srinivasa, and Stephan Zheng, introduced WarpDrive in the paper “WarpDrive: Extremely Fast End-to-End Deep Multi-Agent Reinforcement Learning on a GPU”. “Using the extreme parallelisation capability of GPUs, WarpDrive enables orders of magnitude faster RL compared to common implementations that blend CPU simulations and GPU models. Our design runs simulations and the agents in each simulation in parallel,” the paper states.

The lightweight RL framework eliminates the need to copy data back and forth between the CPU and the GPU. It also uses a single simulation data store on the GPU that is safely updated in place. The WarpDrive architecture is shown below; a minimal sketch of the in-place update pattern follows the figure.

Image Credits: Salesforce paper
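To make the single-data-store idea concrete, below is a minimal sketch in plain PyTorch rather than WarpDrive's actual code: all simulation state lives in GPU tensors that are allocated once and then updated in place, so no host-to-device copies happen after the initial reset. The array shapes and toy dynamics are illustrative assumptions.

```python
import torch

num_envs, num_agents = 2000, 1000
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# One-time host-to-device transfer at the first reset: the single data store.
positions = torch.zeros(num_envs, num_agents, 2, device=device)
rewards = torch.zeros(num_envs, num_agents, device=device)

def step(actions: torch.Tensor) -> torch.Tensor:
    """Advance every agent in every environment in parallel, in place."""
    positions.add_(actions)                 # in-place state update on the GPU
    rewards.copy_(-positions.norm(dim=-1))  # in-place reward write
    return rewards

# Both the simulator and any PyTorch model read the same GPU tensors, so
# roll-outs, inference, and training share one store of data.
actions = torch.randn(num_envs, num_agents, 2, device=device)
step(actions)
```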

WarpDrive from Salesforce provides a framework and tools for building fast multi-agent RL systems. WarpDrive relies on CUDA, which lets users run programmes, known as compute kernels, directly on GPU hardware. The CUDA API provides direct access to the GPU’s virtual instruction set and parallel computational elements. The framework offers several benefits:

  • After the first reset, there is a one-time data transfer between the host and the device, and no further host-to-device communication is necessary. The data arrays are stored exclusively on the GPU and are changed in place during all subsequent step and reset calls. The provided Trainer class can directly access and modify all data on the device without incurring any copying cost.
  • With each agent using a single thread on the GPU, the framework can simulate millions of agents and environments within a short period.
  • The framework requires only a single GPU and no communication between multiple GPU devices; extending it to efficient multi-device RL systems remains an open direction that could yield further gains.
  • The framework is fully compatible with PyTorch. A minimal sketch of this end-to-end pattern follows the list.
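The bullet points above describe a loop in which roll-outs, inference, and the policy update all touch the same GPU-resident tensors. The sketch below mimics that pattern in plain PyTorch; it is not the WarpDrive API, and the toy dynamics, reward, and hyperparameters are assumptions made for illustration.

```python
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
num_envs, num_agents, obs_dim, num_actions = 128, 64, 4, 5

# A small shared policy for all agents, resident on the same device as the data.
policy = torch.nn.Sequential(
    torch.nn.Linear(obs_dim, 32),
    torch.nn.Tanh(),
    torch.nn.Linear(32, num_actions),
).to(device)
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-3)

obs = torch.randn(num_envs, num_agents, obs_dim, device=device)  # one-time "reset"

for _ in range(100):
    logits = policy(obs)                                   # batched inference on the GPU
    dist = torch.distributions.Categorical(logits=logits)
    actions = dist.sample()                                # one action per agent per env

    rewards = -(obs ** 2).sum(dim=-1)                      # toy reward from the current state

    # REINFORCE-style update; gradients and data never leave the device.
    loss = -(dist.log_prob(actions) * rewards).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    # Toy in-place dynamics, standing in for WarpDrive's CUDA step kernels,
    # which run one GPU thread per agent.
    obs.add_(0.01 * torch.randn_like(obs))
```

In WarpDrive itself, the environment step runs as CUDA compute kernels with one thread per agent; the PyTorch stand-in above only mimics the data flow, not that thread-level parallelism.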

Read the entire paper here and find the code here.

Deep RL Use Cases

Deep RL has found its way into industry applications. Google uses it for automated game-testing agents, chip design, and robot locomotion; Microsoft powers its autonomous control systems technology with it; Pathmind applies it to simulate industrial operations and supply chains; and Covariant uses it in industrial robotics. Let’s discuss.

Google Research and the Google hardware architecture team collaborated to tackle chip placement, one of the most time-consuming stages of chip design, with deep RL methods. The goal is to optimise power, performance, and area (PPA) while adhering to placement density and routing congestion constraints, by placing a netlist graph of macros (for example, SRAMs) and standard cells (logic gates such as NAND, NOR, and XOR) onto a chip canvas. The trained RL agents become faster and better at chip placement, bringing the time needed for the task down from weeks to just six hours.
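In this setup, the agent places one macro per step and receives a reward only once the whole netlist has been placed, with the reward acting as a proxy for PPA. A hedged sketch of such a proxy reward, with cost terms and weights that are illustrative assumptions rather than the paper's implementation, might look like this:

```python
from dataclasses import dataclass

@dataclass
class PlacementMetrics:
    wirelength: float  # approximate total wirelength of the placed netlist
    congestion: float  # routing congestion estimate
    density: float     # fraction of the canvas occupied

def placement_reward(m: PlacementMetrics,
                     congestion_weight: float = 0.1,
                     max_density: float = 0.6) -> float:
    """Negative weighted cost: lower wirelength and congestion mean higher reward.
    Placement density is treated as a hard constraint."""
    if m.density > max_density:
        return float("-inf")  # infeasible placement
    return -(m.wirelength + congestion_weight * m.congestion)

# The agent would receive this reward at the end of an episode,
# once every macro has been placed on the canvas.
print(placement_reward(PlacementMetrics(wirelength=3.2, congestion=1.5, density=0.55)))
```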

Automated playtesting reduces the need for human intervention, and game-playing agents that use deep reinforcement learning (DRL) can anticipate both game complexity and player engagement. Aalto University and Rovio Entertainment developed a novel method that combines DRL with Monte Carlo Tree Search to predict game pass and churn rates. DRL has also shown early promise in neuroscience, although that work remains in progress.

Kumar Gandharv, PGD in English Journalism (IIMC, Delhi), is setting out on a journey as a tech journalist at AIM. A keen observer of national and IR-related news.
