# Hands-On Guide to Understand and Implement Q-Learning

Q-Learning is a traditional model-free approach to training Reinforcement Learning agents. It can also be viewed as a method of asynchronous dynamic programming, and it was introduced by Watkins & Dayan in 1992.

## Q-Learning Overview

In Q-Learning we build a Q-table that stores Q-values for all possible state-action pairs. It is called Q-Learning because the Q-value represents the quality of a given action taken in a given state.
The agent uses the Q-table to choose the action that yields the maximum expected reward, so the Q-table essentially acts as a cheat sheet for the environment. The method is model-free because it learns directly from experience, without a model of the environment's transition dynamics; in this tabular form, the Q-value is not approximated by any function but simply stored in a table, with states as rows and actions as columns.

However, Q-Learning suffers from the curse of dimensionality: when the number of state-action pairs grows large, it becomes infeasible to store all of the mappings in a table.

### Q-Learning Algorithm

Let’s implement the Q-Learning algorithm using NumPy and see how it works.

The Q-function can be iteratively optimized toward the optimal Q-value using the Bellman equation.
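Concretely, the Q-Learning update rule derived from the Bellman equation, with learning rate $\alpha$ and discount factor $\gamma$, is:

```latex
Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \left[ r_{t+1} + \gamma \max_{a'} Q(s_{t+1}, a') - Q(s_t, a_t) \right]
```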

This is how a Q-table schema looks:
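For illustration, here is a schematic Q-table with states as rows and actions as columns (a sketch; the zeros are just initial values, not learned ones):

| State | Action 0 | Action 1 | Action 2 | Action 3 |
|-------|----------|----------|----------|----------|
| s0    | 0.0      | 0.0      | 0.0      | 0.0      |
| s1    | 0.0      | 0.0      | 0.0      | 0.0      |
| s2    | 0.0      | 0.0      | 0.0      | 0.0      |
| ...   | ...      | ...      | ...      | ...      |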

## Q-Learning Implementation

Let’s build a Q-Learning agent from scratch, using NumPy for the entire algorithm, to play the Frozen Lake environment provided by OpenAI Gym.

### Environment Details

The Frozen Lake environment is a 4×4 grid with the layout below; the agent is rewarded for finding a walkable path from the start tile to the goal tile.

```
SFFF       (S: starting point, safe)
FHFH       (F: frozen surface, safe)
FFFH       (H: hole, fall to your doom)
HFFG       (G: goal, where the frisbee is located)
```

The episode ends when you reach the goal or fall in a hole. You receive a reward of 1 if you reach the goal, and zero otherwise.

### Code Walkthrough

Let’s understand the NumPy code step by step.

• First, let’s declare a two-dimensional array with rows equal to the state-space size and columns equal to the action-space size, as in the sketch below.
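A minimal sketch of this step, assuming OpenAI Gym is installed (the environment id `FrozenLake-v1` is used in recent Gym releases; older versions use `FrozenLake-v0`):

```python
import gym
import numpy as np

# Create the Frozen Lake environment (4x4 map, slippery by default)
env = gym.make("FrozenLake-v1")

state_size = env.observation_space.n   # 16 states, one per tile
action_size = env.action_space.n       # 4 actions

# Q-table: one row per state, one column per action, initialized to zero
qtable = np.zeros((state_size, action_size))
print(qtable.shape)  # (16, 4)
```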

• Let’s see how the Q-table looks: it has 16 possible states and 4 different actions.
• The 16 states of the Frozen Lake environment correspond to the 16 tiles of the 4×4 grid shown above, one state per tile.
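For example, printing the freshly initialized table shows 16 rows of 4 zeros each (a sketch of the expected output, truncated):

```python
print(qtable)
# [[0. 0. 0. 0.]
#  [0. 0. 0. 0.]
#  ...
#  [0. 0. 0. 0.]]   # 16 rows (states) x 4 columns (actions)
```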

• The 4 possible actions in the Frozen Lake environment are as follows:
• Left, Down, Right, Up (encoded as 0, 1, 2, 3)

• Putting it together, the Q-table has 16 rows (states) and 4 columns (actions).

• Next, let’s define some hyperparameters needed to learn the Q-values; a plausible set is sketched below.
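One plausible configuration (these particular values are assumptions commonly used for Frozen Lake; only `max_steps = 99` is fixed by the text):

```python
total_episodes = 15000      # total training episodes (assumed value)
learning_rate = 0.8         # alpha in the Bellman update (assumed value)
max_steps = 99              # max steps per episode (from the text)
gamma = 0.95                # discount factor (assumed value)

# Epsilon-greedy exploration parameters (assumed values)
epsilon = 1.0               # starting exploration rate
max_epsilon = 1.0           # exploration probability at the start
min_epsilon = 0.01          # minimum exploration probability
decay_rate = 0.005          # exponential decay rate for epsilon
```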

• Let’s go ahead and implement the Q-Learning algorithm now.
• Using the hyperparameters defined above, we iterate through the total number of episodes; in every episode the agent is allowed to take at most 99 steps, set by max_steps.
• We manage the trade-off between exploration and exploitation by drawing a random number, here called exp_tradeoff.
• We take a random (exploratory) action if exp_tradeoff is less than epsilon; otherwise we exploit the Q-table by choosing the action with the highest Q-value.
• We record the reward for every step and update the Q-table using the Bellman update rule shown earlier.

Let’s have a look at a sketch of the Q-Learning training loop:
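This sketch continues from the setup and hyperparameters above; the classic Gym API (`reset()` returning a state, `step()` returning four values) is assumed, and names such as `exp_tradeoff` follow the text while the rest is one common way to implement the loop:

```python
import random

rewards = []

for episode in range(total_episodes):
    state = env.reset()          # classic Gym API: reset() returns the state
    total_rewards = 0

    for step in range(max_steps):
        # Exploration vs. exploitation: draw a random number in [0, 1)
        exp_tradeoff = random.uniform(0, 1)

        if exp_tradeoff < epsilon:
            action = env.action_space.sample()        # explore: random action
        else:
            action = np.argmax(qtable[state, :])      # exploit: best known action

        new_state, reward, done, info = env.step(action)

        # Bellman update:
        # Q(s,a) <- Q(s,a) + lr * [r + gamma * max_a' Q(s',a') - Q(s,a)]
        qtable[state, action] += learning_rate * (
            reward + gamma * np.max(qtable[new_state, :]) - qtable[state, action]
        )

        total_rewards += reward
        state = new_state

        if done:                 # episode ends at the goal or in a hole
            break

    # Decay epsilon so the agent explores less as training progresses
    epsilon = min_epsilon + (max_epsilon - min_epsilon) * np.exp(-decay_rate * episode)
    rewards.append(total_rewards)

print("Average reward:", sum(rewards) / total_episodes)
```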

### Results

We tested our agent on 5 episodes, tracking the number of steps it took to reach the goal; in every episode, the agent was able to reach the goal (G). A minimal evaluation sketch follows.
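Again assuming the classic Gym API, the 5 test episodes could be run like this:

```python
for episode in range(5):
    state = env.reset()
    print("EPISODE", episode)

    for step in range(max_steps):
        # Act greedily with respect to the learned Q-table
        action = np.argmax(qtable[state, :])
        state, reward, done, info = env.step(action)

        if done:
            env.render()                      # show the final board
            print("Number of steps:", step)
            break

env.close()
```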

This is how we can train an end-to-end Q-Learning agent using NumPy.