DQNs, or Deep Q-Networks, were first proposed by DeepMind back in 2015 in an attempt to bring the advantages of deep learning to reinforcement learning (RL). Reinforcement learning focuses on training agents to take actions in an environment so as to maximise reward: the agent improves its choices over time by observing the rewards it receives through interactions with the environment. A simple demonstration of such learning is shown in the figure below.
For example, imagine training a bot to play a game like Ludo. The bot plays with other players, and each of them, including the bot, has four tokens and a die (together, these form the environment). The bot must then choose which token to move (i.e. choose an action) based on what everyone else has played and how close it is to winning (the state). The bot will want to play so that it wins the game (i.e. maximises its reward).
What does Q-learning have to do with RL?
In Q-learning, a memory table Q[s,a] is built to store a Q-value for every possible combination of s and a (which denote the state and action, respectively). The agent learns a Q-value function, which gives the expected total return for a given state-action pair. The agent then has to act in a way that maximises this Q-value function.
The agent can take a single action, a, and observe the reward it receives, R, along with the next state, s'. The target the agent wants Q(s,a) to move towards thus becomes R + γ·max over a' of Q(s',a').
Here γ denotes a discount factor. It causes rewards to lose value over time, so that more immediate rewards are worth more. For example, if all Q-values currently equal 1, taking an action that scores 2 points would move Q(s,a) towards 3 (2 + 1, when γ is close to one). As the agent keeps playing, the Q-values converge, since later rewards keep diminishing in value (provided γ is smaller than one). This can be displayed as the following algorithm:
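The update rule above can be sketched in a few lines of Python. The 5-state chain environment, the hyperparameters, and the epsilon-greedy exploration policy here are illustrative assumptions, not part of the original description:

```python
import random
from collections import defaultdict

# A minimal sketch of tabular Q-learning on a toy 5-state chain
# (hypothetical environment: states 0..4, actions 0 = left, 1 = right,
# reward 1.0 only for reaching state 4, which ends the episode).
alpha, gamma, epsilon = 0.5, 0.9, 0.1   # learning rate, discount, exploration

Q = defaultdict(float)  # the memory table Q[s, a], default value 0

def step(s, a):
    """Toy environment dynamics (an assumption for illustration)."""
    s_next = max(0, s - 1) if a == 0 else min(4, s + 1)
    reward = 1.0 if s_next == 4 else 0.0
    return s_next, reward, s_next == 4

random.seed(0)
for episode in range(500):
    s, done = 0, False
    while not done:
        # Epsilon-greedy: explore occasionally, break ties randomly.
        if random.random() < epsilon or Q[s, 0] == Q[s, 1]:
            a = random.choice([0, 1])
        else:
            a = 0 if Q[s, 0] > Q[s, 1] else 1
        s_next, r, done = step(s, a)
        # Q-learning update: move Q(s,a) towards R + gamma * max_a' Q(s',a')
        target = r + gamma * max(Q[s_next, 0], Q[s_next, 1])
        Q[s, a] += alpha * (target - Q[s, a])
        s = s_next

# After training, moving right should look best from every state.
```

Note how the discount γ = 0.9 shows up in the learned values: states further from the reward end up with smaller Q-values, exactly the "rewards lose value over time" effect described above.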
DQN
For any realistically large state space, the memory and computation required to store a Q-value for every state-action pair would be prohibitive. Thus, a deep network is used as a Q-function approximator instead. This learning algorithm is called the Deep Q-Network (DQN). The key idea in this development was to use a deep neural network to represent the Q-function and train this network to predict total reward.
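The idea of replacing the table with a network can be sketched as follows. This is not DeepMind's architecture (which used convolutions over raw pixels); the layer sizes and random weights here are placeholders, using NumPy for brevity:

```python
import numpy as np

# A minimal sketch of a neural network as a Q-function approximator:
# instead of a table Q[s, a], the network maps a state vector to one
# Q-value per action. Sizes below are illustrative assumptions.
rng = np.random.default_rng(0)
state_dim, n_actions, hidden = 4, 2, 16

# Randomly initialised two-layer network (untrained placeholder weights).
W1 = rng.normal(scale=0.1, size=(state_dim, hidden))
b1 = np.zeros(hidden)
W2 = rng.normal(scale=0.1, size=(hidden, n_actions))
b2 = np.zeros(n_actions)

def q_values(state):
    """Forward pass: state vector -> vector of Q-values, one per action."""
    h = np.maximum(0.0, state @ W1 + b1)   # ReLU hidden layer
    return h @ W2 + b2

state = rng.normal(size=state_dim)
q = q_values(state)
greedy_action = int(np.argmax(q))   # act greedily w.r.t. the network

# Training would minimise the squared error between q[a] and the TD
# target R + gamma * max_a' q_values(next_state), via gradient descent.
```

The point of the approximator is that the network generalises across states: nearby state vectors share weights, so the agent no longer needs to have visited every state-action pair to estimate its value.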
Previous attempts to bring deep neural networks into reinforcement learning were largely unsuccessful due to instabilities: the networks tend to overfit to recent, correlated experience and fail to generalise. According to DeepMind, DQN addresses these instabilities with experience replay, which stores all of the agent's experiences and then randomly samples and replays them, providing diverse, de-correlated training data.
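The store-and-sample mechanism described above can be sketched as a simple buffer. The capacity and the dummy transitions in the usage example are illustrative assumptions:

```python
import random
from collections import deque

# A minimal sketch of experience replay: store transitions as the agent
# acts, then train on random minibatches so samples are de-correlated.
class ReplayBuffer:
    def __init__(self, capacity=10000):
        self.buffer = deque(maxlen=capacity)  # oldest experiences fall off

    def add(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform random sampling breaks the temporal correlation
        # between consecutive transitions.
        return random.sample(self.buffer, batch_size)

    def __len__(self):
        return len(self.buffer)

# Usage: record some dummy transitions, then draw a training batch.
buf = ReplayBuffer(capacity=100)
for t in range(50):
    buf.add(state=t, action=t % 2, reward=0.0, next_state=t + 1, done=False)
batch = buf.sample(8)   # 8 randomly chosen, likely non-consecutive transitions
```

Sampling uniformly from the whole buffer is what provides the "diverse and de-correlated" data DeepMind describes; a later refinement (prioritised replay, mentioned below) samples important transitions more often instead.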
In a 2013 paper, DeepMind tested DQN by teaching it to play seven games on the Atari 2600 console. At each time-step, the agent observed the raw pixels on the screen and a reward signal corresponding to the game score, and selected a joystick direction in response. DeepMind's 2015 paper expanded on this by training separate DQN agents for 49 Atari 2600 games (without any prior knowledge of how these games are played). DQN performed as well as humans in almost half of these games, a better result than every prior attempt to combine reinforcement learning with neural networks.
DeepMind has made its DQN source code and Atari 2600 emulator freely available to anyone who wants to experiment with them. The research group has also improved the DQN algorithm since, further stabilising its learning dynamics, prioritising replayed experiences, and normalising, aggregating and rescaling the outputs. With these improvements, DeepMind claims that DQN can achieve human-level performance in almost every Atari game, and that a single neural network can learn to play multiple such games.
According to DeepMind, the primary goal now is to build on DQN's capabilities and put it to use in real-world applications. However soon we reach that stage, it is safe to say that DQN widens the scope of machine learning and the ability of machines to master a diverse set of challenges.