
Is Cost-Effective Deep Reinforcement Learning Possible?

With a Tesla P100 GPU costing around $6,000, gathering such evidence takes an unreasonably long time, as running multiple GPUs in parallel is prohibitively expensive for a typical academic lab.


“Is there scientific value in conducting empirical research in reinforcement learning when restricting oneself to small- to mid-scale environments?”

Can research done on a smaller computational budget provide valuable scientific insights? Given the enormous training times and budgets, it is natural to wonder whether anything worthwhile in AI comes at a small price. So far, attention has focused on the training costs of language models, which have grown very large. But what about deep reinforcement learning (RL) algorithms, the brains behind autonomous cars, warehouse robots and even the AI that beat chess grandmasters?

Deep RL combines reinforcement learning with deep learning. It made a splash back in 2015 when Alphabet's DeepMind published its work on Deep Q-Networks (DQN). When tested on Atari 2600 games, the DQN agent surpassed the performance of all previous algorithms and achieved a level comparable to that of a professional human games tester.
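For readers unfamiliar with the setup, the sketch below shows the core idea in a few lines of PyTorch: a network maps a state to one Q-value per action, and the agent acts mostly greedily with respect to those values. This is an illustrative sketch, not DeepMind's original implementation; the network sizes and the epsilon-greedy helper are assumptions made here for brevity.

```python
import random
import torch
import torch.nn as nn

# Minimal sketch of the DQN idea (not DeepMind's original code):
# a neural network maps a state to one Q-value per action.
class QNetwork(nn.Module):
    def __init__(self, state_dim: int, num_actions: int, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, num_actions),
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        return self.net(state)  # shape: (batch, num_actions)

def select_action(q_net: QNetwork, state: torch.Tensor,
                  epsilon: float, num_actions: int) -> int:
    # Epsilon-greedy exploration: random action with probability epsilon,
    # otherwise the action with the highest predicted Q-value.
    if random.random() < epsilon:
        return random.randrange(num_actions)
    with torch.no_grad():
        return int(q_net(state.unsqueeze(0)).argmax(dim=1).item())
```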

However, according to Google researchers, the advancement of deep RL comes at a steep computational cost. The original DQN algorithm has been tweaked over the years to beat the Arcade Learning Environment (ALE) benchmark, which is widely used for benchmarking deep RL models on Atari games. The Rainbow algorithm is one such improvement, and it helped the DQN paradigm attain state-of-the-art status. However, Rainbow is heavy on the computational front.

Rainbow was first introduced in 2018. The experiments reportedly required a large research lab setup, as a single run took roughly five days to fully train on specialised hardware like the NVIDIA Tesla P100 GPU. According to Google researchers, demonstrating Rainbow's superiority required approximately 34,200 GPU hours (or 1,425 days). Moreover, this figure does not include the hyper-parameter tuning that was necessary to optimise the various components. "Considering that the cost of a Tesla P100 GPU is around $6,000, providing this evidence will take an unreasonably long time as it is prohibitively expensive to have multiple GPUs in a typical academic lab so they can be used in parallel," the researchers wrote.
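The quoted figures are easy to sanity-check with a little arithmetic; the short script below reproduces them. The 90-day target and the resulting GPU count are hypothetical assumptions added here for illustration, not numbers from the paper.

```python
# Back-of-the-envelope arithmetic for the figures quoted above
# (GPU hours and price come from the article; the 90-day target
#  is a hypothetical assumption for illustration).
gpu_hours = 34_200                        # reported GPU hours
print(f"{gpu_hours / 24:.0f} days on a single GPU")   # ~1425 days

target_days = 90                          # hypothetical deadline
gpus_needed = gpu_hours / (target_days * 24)
gpu_price_usd = 6_000                     # approximate Tesla P100 price
print(f"~{gpus_needed:.0f} GPUs in parallel, "
      f"~${gpus_needed * gpu_price_usd:,.0f} in hardware")
```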

In their work titled "Revisiting Rainbow", the researchers at Google tried to answer the following questions:

  • Could state-of-the-art results on the ALE have been achieved with smaller-scale experiments, unlike Rainbow's large-scale ones back in 2018?
  • How well do these algorithms perform in non-ALE environments?
  • Is there scientific value in conducting empirical research in reinforcement learning when restricting oneself to small- to mid-scale environments?
(Image credits: Google AI blog)

To demonstrate the effectiveness of small- to mid-scale experiments, the researchers evaluated a set of four classic control environments, listed below. Agents in these environments, according to the researchers, can be fully trained in 10-20 minutes (compared to five days for ALE games); a minimal sketch of setting them up follows the list: 

  1. CartPole: The agent must balance a pole on a cart by moving the cart left and right.
  2. Acrobot: The agent must apply torque to the joint between two arms in order to raise the lower arm above a threshold. 
  3. LunarLander: The agent must land a lander between two flags. 
  4. MountainCar: The agent must build up momentum between two hills to drive to the top of the rightmost hill.
(Image credits: Castro et al.)
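As a rough idea of how lightweight these benchmarks are, the sketch below instantiates all four with OpenAI Gym and runs a random policy in each. It assumes the classic Gym API (reset returning an observation, step returning a 4-tuple) and that the Box2D extras needed for LunarLander are installed; newer Gymnasium releases differ slightly.

```python
import gym  # assumes classic OpenAI Gym (<0.26) with the Box2D extras installed

# The four small environments used in "Revisiting Rainbow"; a random-action
# rollout is enough to see each one's observation and action spaces.
ENV_IDS = ["CartPole-v1", "Acrobot-v1", "LunarLander-v2", "MountainCar-v0"]

for env_id in ENV_IDS:
    env = gym.make(env_id)
    obs = env.reset()
    done, total_reward = False, 0.0
    while not done:
        action = env.action_space.sample()        # random policy, just for illustration
        obs, reward, done, info = env.step(action)
        total_reward += reward
    print(f"{env_id}: obs shape {env.observation_space.shape}, "
          f"{env.action_space.n} actions, random-policy return {total_reward:.1f}")
    env.close()
```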

In their experiments, the researchers gradually added double Q-learning, prioritized experience replay, dueling networks, multi-step learning, distributional RL and other components to the DQN model, while at the same time removing components from the Rainbow algorithm. They found that certain combinations of these components performed on par with the full Rainbow agent. 
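As one concrete example of such a component, the sketch below contrasts the vanilla DQN target with the double Q-learning target: double DQN selects the next action with the online network but evaluates it with the target network, which curbs overestimation. This is a minimal PyTorch sketch of the general technique, not the paper's code; tensor shapes and the discount value are assumptions made here.

```python
import torch

# reward and done are float tensors of shape (batch,); next_state is (batch, state_dim).
def dqn_target(reward, next_state, done, target_net, gamma=0.99):
    # Vanilla DQN: the target network both selects and evaluates the next action.
    next_q = target_net(next_state).max(dim=1).values
    return reward + gamma * (1.0 - done) * next_q

def double_dqn_target(reward, next_state, done, online_net, target_net, gamma=0.99):
    # Double DQN: select with the online network, evaluate with the target network.
    best_action = online_net(next_state).argmax(dim=1, keepdim=True)
    next_q = target_net(next_state).gather(1, best_action).squeeze(1)
    return reward + gamma * (1.0 - done) * next_q
```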

The original loss functions and optimisers were also examined in these experiments. Huber loss and the RMSProp optimiser are commonly used when developing DQN models. The researchers also ran the experiments with the Adam optimiser and mean squared error (MSE) loss. The results show that Adam+MSE is a superior combination to RMSProp+Huber. 
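A sketch of what such a comparison looks like in code is given below: the same Q-learning training step is run with either the RMSProp+Huber or the Adam+MSE combination. The function names and learning rates here are illustrative assumptions, not the paper's settings.

```python
import torch
import torch.nn as nn

def make_trainer(q_net: nn.Module, variant: str):
    # Build the optimiser/loss pair for one of the two compared setups.
    if variant == "rmsprop_huber":
        optimiser = torch.optim.RMSprop(q_net.parameters(), lr=2.5e-4)
        loss_fn = nn.SmoothL1Loss()          # Huber loss
    elif variant == "adam_mse":
        optimiser = torch.optim.Adam(q_net.parameters(), lr=1e-4)
        loss_fn = nn.MSELoss()
    else:
        raise ValueError(variant)
    return optimiser, loss_fn

def train_step(q_net, optimiser, loss_fn, states, actions, targets):
    # One gradient step: regress Q(s, a) toward the bootstrapped targets.
    q_values = q_net(states).gather(1, actions.unsqueeze(1)).squeeze(1)
    loss = loss_fn(q_values, targets)
    optimiser.zero_grad()
    loss.backward()
    optimiser.step()
    return loss.item()
```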

The researchers were able to reproduce the results of the original Rainbow paper on a limited computational budget and even uncovered new and interesting phenomena, making a strong case for the relevance and significance of empirical research on small- and medium-scale environments. They believe that these less computationally intensive environments lend themselves well to a more critical and thorough analysis of the performance, behaviours, and intricacies of new algorithms.

Ram Sagar

I have a master's degree in Robotics and I write about machine learning advancements.