
What Are DQN Reinforcement Learning Models


DQN, or Deep Q-Networks, were first proposed by DeepMind in 2013 in an attempt to bring the advantages of deep learning to reinforcement learning (RL). Reinforcement learning focuses on training agents to choose actions in an environment so as to maximise rewards: the agent improves itself and its choices by observing the rewards it receives through interactions with the environment. A simple demonstration of such learning is seen in the figure below.

Source: Stephen Gou, Yuyang Liu (2019)

For example, imagine training a bot to play a game like Ludo. The bot plays against other players, and each player, including the bot, has four tokens and a die (the environment). At each turn, the machine must choose which token to move (i.e. choose an action) based on what everyone else has played and how close the bot is to winning (the state). The bot wants to play so that it wins the game (i.e. maximises its reward).

What Does Q-Learning Have to Do With RL?

In Q-learning, a memory table Q[s,a] is built to store Q-values for every possible combination of s and a (which denote the state and action, respectively). The agent learns a Q-value function, which gives the expected total return for a given state and action pair. The agent thus has to act in a way that maximises this Q-value function.
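As a minimal sketch of such a table, assuming a small, discrete environment with hypothetical state and action counts, the memory table can be a plain 2-D array:

```python
import numpy as np

# Hypothetical sizes for illustration; a real environment defines these.
n_states, n_actions = 16, 4

# The Q-table: one expected-return estimate per (state, action) pair,
# initialised to zero before any experience has been gathered.
Q = np.zeros((n_states, n_actions))

def greedy_action(state: int) -> int:
    # Acting to maximise the Q-value function means picking the action
    # with the highest estimate in the current state.
    return int(np.argmax(Q[state]))
```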

The agent can take a single action, a, and observe the reward, R, that it receives. The target the agent would want Q(s,a) to match then becomes

R + γ · max_a′ Q(s′,a′)

where γ denotes a discount factor. Discounting causes rewards to lose value over time, which makes more immediate rewards more valuable. For example, if all Q-values equal 1 and γ = 1, taking an action that scores 2 points would move Q(s,a) closer to 3 (2 + 1). As the agent keeps playing, the Q-values converge, especially when γ is smaller than one, since distant rewards keep diminishing in value. This can be displayed as the following algorithm:

(Source: DeepMind)
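As a rough sketch of that update rule (not DeepMind's exact pseudocode), a single step of tabular Q-learning could look like this, with the learning rate alpha and discount gamma as assumed hyperparameters:

```python
import numpy as np

Q = np.zeros((16, 4))  # the Q-table from the sketch above
alpha = 0.1            # learning rate (assumed value for illustration)
gamma = 0.9            # discount factor; gamma < 1 shrinks distant rewards

def q_learning_step(s: int, a: int, r: float, s_next: int) -> None:
    # Move Q[s, a] towards the target R + gamma * max_a' Q(s', a').
    target = r + gamma * np.max(Q[s_next])
    Q[s, a] += alpha * (target - Q[s, a])
```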

DQN

For any realistically large state space, the memory and computation required to store and update Q-values for every state–action pair would be too high. Thus, a deep network is used as a Q-learning function approximator instead. This learning algorithm is called a Deep Q-Network (DQN). The key idea in this development was to use a deep neural network to represent the Q-function and train this network to predict total reward.
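As an illustrative sketch rather than DeepMind's exact architecture, a Q-network maps a state to one Q-value per action, standing in for a whole row of the table; a small fully connected version in PyTorch, with hypothetical dimensions, might look like this:

```python
import torch
import torch.nn as nn

class QNetwork(nn.Module):
    # Approximates Q(s, ·): the input is a state vector and the output
    # is one Q-value per action, replacing the row Q[s] of a table.
    def __init__(self, state_dim: int = 8, n_actions: int = 4):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 64),
            nn.ReLU(),
            nn.Linear(64, n_actions),
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        return self.net(state)

q_net = QNetwork()
state = torch.randn(1, 8)                 # a dummy state for illustration
best_action = q_net(state).argmax(dim=1)  # greedy action under the network
```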

Previous attempts at bringing deep neural networks into reinforcement learning were largely unsuccessful due to instabilities: the networks tend to overfit to recent, correlated experience, which prevents them from generalising. According to DeepMind, DQN addresses these instabilities by providing diverse and de-correlated training data: all of the agent's experiences are stored, and the experiences are randomly sampled and replayed during training.
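A minimal sketch of such an experience-replay buffer (a common implementation pattern, not DeepMind's released code) stores transitions and serves uniformly random mini-batches:

```python
import random
from collections import deque

class ReplayBuffer:
    # Stores (state, action, reward, next_state, done) transitions and
    # samples them at random, de-correlating the training data.
    def __init__(self, capacity: int = 100_000):
        self.buffer = deque(maxlen=capacity)  # oldest experiences evicted

    def push(self, transition) -> None:
        self.buffer.append(transition)

    def sample(self, batch_size: int):
        return random.sample(self.buffer, batch_size)
```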

In a 2013 paper, DeepMind tested DQN by teaching it to play seven games on the Atari 2600 console. At each time-step, the agent observed the raw pixels on the screen and a reward signal corresponding to the game score, and selected a joystick action in response. DeepMind's 2015 paper expanded on this by training separate DQN agents for 49 Atari 2600 games (without prior knowledge of how these games are played). DQN performed just as well as humans in almost half of these games, a better result than every prior attempt to combine reinforcement learning with neural networks.

Source: DeepMind

DeepMind has made its DQN source code and Atari 2600 emulator freely available to anyone looking to experiment with them. The research group has also improved its DQN algorithm, further stabilising its learning dynamics, prioritising replayed experiences, and normalising, aggregating and rescaling the outputs. With these improvements, DeepMind claims that DQN can achieve human-level performance in almost every Atari game and that a single neural network can learn multiple such games.

According to DeepMind, the primary goal is to build upon the capabilities of DQN and put it to use in real-life applications. Regardless of how soon we reach that stage, it is safe to say that DQN reinforcement learning models widen the scope of machine learning and the ability of machines to master a diverse set of challenges.
