Hands-On Guide to Understand and Implement Q-Learning

Anurag Upadhyaya

Q-Learning is a classic model-free approach to training Reinforcement Learning agents. It can also be viewed as a form of asynchronous dynamic programming. It was introduced by Watkins in 1989, and Watkins and Dayan published a convergence proof in 1992.

Q-Learning Overview

In Q-Learning we build a Q-table that stores a Q-value for every combination of state and action. The "Q" stands for quality: each value represents the quality of a certain action the agent can take in a given state, measured as the total discounted reward the agent expects to collect after taking it.
The agent uses the Q-table to choose the action with the highest expected reward. So, basically, the Q-table acts as a cheat sheet for the agent, since it covers every state-action combination in the environment. Q-Learning is called model-free because the agent does not need a model of the environment’s transitions or rewards; it learns from experience alone. In this tabular form the Q-value is not approximated by a function either; it is simply stored in a table, with states as rows and actions as columns.

However, tabular Q-learning suffers from the curse of dimensionality: when the number of state-action pairs is very large, it is not possible to store all the mappings.

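Q-Learning Algorithm

Let’s implement the Q-Learning algorithm using NumPy and see how it works.

The Q-function can be iteratively optimized toward the optimal Q-values using the Bellman equation: after each step, the estimate for the state-action pair just taken is nudged toward the observed reward plus the discounted value of the best action available in the next state.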
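
Written out, the update that Q-learning applies after each step is usually stated as below. The symbols are standard notation rather than names taken from this article: α is the learning rate, γ the discount factor, r the reward received, and s' the state reached after taking action a in state s.

```latex
Q(s, a) \leftarrow Q(s, a) + \alpha \left[ r + \gamma \max_{a'} Q(s', a') - Q(s, a) \right]
```

For instance, with α = 0.8, γ = 0.95, a current estimate Q(s, a) = 0, a reward r = 0 and a best next-state value of 1, the new estimate is 0 + 0.8 × (0 + 0.95 × 1 − 0) = 0.76.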

A Q-table has one row per state and one column per action; each cell stores the current estimate of Q(state, action).

Q-Learning Implementation

Let’s implement a Q-Learning algorithm from scratch to play Frozen Lake provided by OpenAI Gym. We will use NumPy to implement the entire algorithm.

Environment Details

The Frozen Lake environment is a 4x4 grid of tiles with the following layout, and the agent is rewarded for finding a walkable path from the start tile to the goal tile.

SFFF       (S: starting point, safe)
FHFH       (F: frozen surface, safe)
FFFH       (H: hole, fall to your doom)
HFFG       (G: goal, where the frisbee is located)

The episode ends when you reach the goal or fall in a hole. You receive a reward of 1 if you reach the goal, and zero otherwise.
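
To make the layout concrete, here is a small sketch that turns the map above into state numbers. It assumes the usual Gym convention of numbering tiles row by row from the top-left corner; the names `lake`, `states`, `holes` and `goal` are illustrative and not part of Gym.

```python
import numpy as np

# The 4x4 map from the article, one string per row.
lake = np.array([list(row) for row in ["SFFF", "FHFH", "FFFH", "HFFG"]])

# Assumed numbering: tiles are counted row by row, so state = row * 4 + column.
states = np.arange(16).reshape(4, 4)

holes = states[lake == "H"].tolist()   # states that end the episode with reward 0
goal = int(states[lake == "G"][0])     # the only state that pays a reward of 1

print(states)
print("holes:", holes, "goal:", goal)  # holes: [5, 7, 11, 12] goal: 15
```

Under this numbering the agent starts in state 0 and has to reach state 15 without stepping on states 5, 7, 11 or 12.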

Code Walkthrough

Let’s understand the NumPy code step by step.

  • Let’s declare a two-dimensional array with rows equal to state size and columns equal to action size.

  • Let’s see what the Q-table looks like. It has 16 possible states and 4 possible actions.
    • The 16 states in the Frozen Lake environment are the 16 tiles of the 4x4 grid below, numbered 0 to 15 row by row starting from the top-left tile.
      1. SFFF       (S: starting point, safe)
      2. FHFH       (F: frozen surface, safe)
      3. FFFH       (H: hole, fall to your doom)
      4. HFFG       (G: goal, where the frisbee is located)

  • The 4 possible actions in the Frozen Lake environment are as follows.
    • Left, Down, Right, Up (encoded as 0, 1, 2, 3)

  • Finally, the Q-table has 16 rows, one per state, and 4 columns, one per action.
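
A minimal sketch of the table described above. The sizes are hard-coded here so the snippet runs without Gym installed; with a Gym environment they would normally be read from `env.observation_space.n` and `env.action_space.n`.

```python
import numpy as np

# FrozenLake 4x4: 16 tiles (states) and 4 moves (actions).
state_size = 16
action_size = 4

# One row per state, one column per action; every estimate starts at zero.
qtable = np.zeros((state_size, action_size))

print(qtable.shape)  # (16, 4)
```

Starting from zeros means the agent initially has no preference between actions, so early behaviour is driven by exploration.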

  • Let’s define some hyperparameters needed to learn the Q-values.
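
The sketch below shows one plausible set of hyperparameters. Apart from `max_steps = 99`, which the article mentions, every value is an illustrative assumption, and so is the exponential decay of epsilon computed by the hypothetical helper `epsilon_at`.

```python
import numpy as np

# Illustrative values only; apart from max_steps = 99 none come from the article.
total_episodes = 15000   # number of training episodes
max_steps = 99           # step limit per episode (from the article)
learning_rate = 0.8      # alpha: how far each update moves the estimate
gamma = 0.95             # discount factor for future rewards

# Assumed exploration schedule: epsilon starts high and decays toward min_epsilon.
max_epsilon = 1.0
min_epsilon = 0.01
decay_rate = 0.005

def epsilon_at(episode):
    """Exponentially decayed exploration rate for a given episode index."""
    return min_epsilon + (max_epsilon - min_epsilon) * np.exp(-decay_rate * episode)

for episode in (0, 500, 5000):
    print(episode, round(float(epsilon_at(episode)), 4))
```

A high epsilon early on favours random exploration; as it decays, the agent relies more and more on the values already stored in the Q-table.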

  • Let’s go ahead and implement the Q-Learning algorithm now.
    • Based on the hyperparameters defined above, we iterate through the total number of episodes. In every episode the agent may take at most max_steps steps, here 99.
    • We balance exploration and exploitation with a random number drawn uniformly from [0, 1], here exp_tradeoff.
    • If exp_tradeoff is greater than epsilon, we exploit by taking the action with the highest Q-value for the current state; otherwise we explore by taking a random action.
    • We record the reward at every step and update the Q-table using the update rule derived from the Bellman equation.

Let’s have a look at the Q-Learning algorithm code snippet.
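
What follows is a minimal sketch of the training loop described in the steps above, not the article’s exact code. To keep it runnable without Gym it uses a hypothetical stand-in for Frozen Lake, the `step` function, in which moves are deterministic (no slipping). The hyperparameter values other than `max_steps`, the epsilon decay and the fixed random seed are likewise assumptions.

```python
import numpy as np

# Hypothetical stand-in for Gym's FrozenLake: deterministic moves, no slipping.
GRID = "SFFF" "FHFH" "FFFH" "HFFG"                       # 4x4 map, row by row
MOVES = {0: (0, -1), 1: (1, 0), 2: (0, 1), 3: (-1, 0)}   # left, down, right, up

def step(state, action):
    """Apply one move and return (new_state, reward, done)."""
    row, col = divmod(state, 4)
    d_row, d_col = MOVES[action]
    row = min(max(row + d_row, 0), 3)   # moving into a wall keeps the agent in place
    col = min(max(col + d_col, 0), 3)
    new_state = row * 4 + col
    tile = GRID[new_state]
    return new_state, float(tile == "G"), tile in "HG"

# Illustrative hyperparameters (only max_steps = 99 comes from the article).
total_episodes, max_steps = 2000, 99
learning_rate, gamma = 0.8, 0.95
epsilon, max_epsilon, min_epsilon, decay_rate = 1.0, 1.0, 0.01, 0.005

rng = np.random.default_rng(0)
qtable = np.zeros((16, 4))
rewards = []

for episode in range(total_episodes):
    state, total_reward = 0, 0.0        # every episode starts on the S tile
    for _ in range(max_steps):
        # Exploit when the random draw exceeds epsilon, otherwise explore.
        exp_tradeoff = rng.uniform(0, 1)
        if exp_tradeoff > epsilon:
            action = int(np.argmax(qtable[state]))
        else:
            action = int(rng.integers(4))

        new_state, reward, done = step(state, action)

        # Move Q(s, a) toward reward + gamma * max over a' of Q(s', a').
        target = reward + gamma * np.max(qtable[new_state])
        qtable[state, action] += learning_rate * (target - qtable[state, action])

        total_reward += reward
        state = new_state
        if done:                        # reached the goal or fell into a hole
            break

    # Assumed schedule: explore less as training progresses.
    epsilon = min_epsilon + (max_epsilon - min_epsilon) * np.exp(-decay_rate * episode)
    rewards.append(total_reward)

print("average reward per episode:", sum(rewards) / total_episodes)
```

With a real Gym environment, the environment’s own reset and step calls would take the place of `state = 0` and `step`. Note that Gym’s Frozen Lake is slippery by default, so learning there is noisier than in this deterministic stand-in.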

The output shows the number of steps the Q-learning agent took to reach the goal. We tested the agent on 5 episodes, and in every episode it reached the goal (G).
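
A test run like the one described above could be sketched as follows. The Q-table here is a hand-filled placeholder standing in for a trained one, and `step` is again a hypothetical deterministic stand-in for Frozen Lake, so the step counts it prints are illustrative rather than the article’s results.

```python
import numpy as np

# Hypothetical deterministic stand-in for Frozen Lake (no slipping).
GRID = "SFFF" "FHFH" "FFFH" "HFFG"
MOVES = {0: (0, -1), 1: (1, 0), 2: (0, 1), 3: (-1, 0)}   # left, down, right, up

def step(state, action):
    """Apply one move and return (new_state, reward, done)."""
    row, col = divmod(state, 4)
    d_row, d_col = MOVES[action]
    row = min(max(row + d_row, 0), 3)
    col = min(max(col + d_col, 0), 3)
    new_state = row * 4 + col
    tile = GRID[new_state]
    return new_state, float(tile == "G"), tile in "HG"

# Placeholder for a trained Q-table: it only marks one safe route
# (0 -> 4 -> 8 -> 9 -> 13 -> 14 -> 15) so the snippet is self-contained.
qtable = np.zeros((16, 4))
for state, action in [(0, 1), (4, 1), (8, 2), (9, 1), (13, 2), (14, 2)]:
    qtable[state, action] = 1.0

steps_per_episode = []
for episode in range(5):
    state = 0
    for t in range(99):
        action = int(np.argmax(qtable[state]))   # always exploit at test time
        state, reward, done = step(state, action)
        if done:
            break
    steps_per_episode.append(t + 1)
    print(f"episode {episode}: reward {reward}, steps {t + 1}")
```

At test time there is no exploration: the agent simply follows the highest Q-value in each state until the episode ends.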

This is how we can train an end-to-end Q-learning agent using NumPy.

Copyright Analytics India Magazine Pvt Ltd
