# Hands-On Guide to Understanding and Implementing Q-Learning

Q-Learning is a classic model-free approach to training reinforcement learning agents. It can also be viewed as a method of asynchronous dynamic programming, and was introduced by Watkins & Dayan in 1992.

## Q-Learning Overview

In Q-Learning we build a Q-table that stores Q-values for every possible state–action pair. The "Q" stands for quality: each value represents the quality of taking a certain action in a given state.
The agent uses the Q-table to choose the action that yields the maximum expected reward. The Q-table essentially acts as a cheat sheet for the agent, since it covers every state–action combination in the environment. Q-Learning is called model-free because it does not require a model of the environment's dynamics; in its tabular form, the Q-values are not approximated by any function but simply stored in a table, with states as rows and actions as columns.

However, tabular Q-Learning suffers from the curse of dimensionality: when the number of state–action pairs grows large, it becomes infeasible to store all of them in a table.

### Q-Learning Algorithm

Let’s implement the Q-Learning algorithm using NumPy and see how it works.

The Q-function can be iteratively improved toward the optimal Q-values using the Bellman equation.
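The standard Q-Learning update rule derived from the Bellman equation is:

```latex
Q(s, a) \leftarrow Q(s, a) + \alpha \left[ r + \gamma \max_{a'} Q(s', a') - Q(s, a) \right]
```

where s and a are the current state and action, r is the reward received, s′ is the next state, α is the learning rate, and γ is the discount factor.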

This is what a Q-table schema looks like (rows are states, columns are actions, all values initialized to zero):

| State | Action 0 | Action 1 | Action 2 | Action 3 |
|-------|----------|----------|----------|----------|
| s0    | 0        | 0        | 0        | 0        |
| s1    | 0        | 0        | 0        | 0        |
| …     | …        | …        | …        | …        |

## Q-Learning Implementation

Let’s implement a Q-Learning algorithm from scratch to play Frozen Lake provided by OpenAI Gym. We will use NumPy to implement the entire algorithm.

### Environment Details

The Frozen Lake environment has the following layout; the agent is rewarded for finding a walkable path from the start tile to the goal tile:

SFFF       (S: starting point, safe)

FHFH       (F: frozen surface, safe)

FFFH       (H: hole, fall to your doom)

HFFG       (G: goal, where the frisbee is located)

The episode ends when you reach the goal or fall in a hole. You receive a reward of 1 if you reach the goal, and zero otherwise.

### Code Walkthrough

Let’s understand the NumPy code step by step.

• Let’s declare a two-dimensional array with rows equal to the state-space size and columns equal to the action-space size.
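A minimal sketch of that declaration, assuming the 4×4 Frozen Lake sizes; with Gym you would instead read these sizes from `env.observation_space.n` and `env.action_space.n`:

```python
import numpy as np

# Sizes for the 4x4 FrozenLake grid (assumed here; with Gym, read them
# from env.observation_space.n and env.action_space.n instead).
state_size = 16   # one state per tile
action_size = 4   # Left, Down, Right, Up

# Q-table: one row per state, one column per action, initialized to zero
qtable = np.zeros((state_size, action_size))
print(qtable.shape)  # (16, 4)
```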

• Let’s see what the Q-table looks like: it has 16 possible states with 4 different actions.
• The 16 states correspond to the 16 tiles of the 4×4 grid shown above (S, F, H, and G tiles), numbered 0–15 row by row.
• The 4 possible actions in the Frozen Lake environment are Left, Down, Right, and Up.

• The Q-table therefore has 16 rows (states) and 4 columns (actions).

• Let’s define some hyperparameters needed to learn the Q-values.
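The hyperparameter values below are illustrative (only max_steps = 99 is stated in the walkthrough itself); a typical choice looks like this:

```python
# Illustrative hyperparameter values (assumptions, except max_steps = 99,
# which the walkthrough states); tune them for your own runs.
total_episodes = 15000   # number of training episodes
learning_rate = 0.8      # alpha in the Bellman update
max_steps = 99           # max steps the agent may take per episode
gamma = 0.95             # discount factor for future rewards

# Exploration schedule: epsilon decays from max_epsilon to min_epsilon
epsilon = 1.0
max_epsilon = 1.0
min_epsilon = 0.01
decay_rate = 0.005
```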

• Let’s go ahead and implement the Q-Learning algorithm now.
• Based on the hyperparameters defined above, we iterate through the total number of episodes; within each episode the agent may take at most max_steps (99) steps.
• At every step we manage the exploration-vs-exploitation trade-off by drawing a random number, exp_tradeoff.
• If exp_tradeoff is less than epsilon, the agent explores by taking a random action; otherwise it exploits by choosing the action with the highest Q-value for the current state.
• We record the reward at every step and update the Q-table using the Bellman equation.

Let’s have a look at the Q-Learning Algorithm Code snippet,
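The original code screenshot is not reproduced here, so below is a self-contained reconstruction of the loop just described. To keep it runnable with NumPy alone, it includes a minimal deterministic stand-in for the Frozen Lake dynamics (the `MAP` string, `step` function, and variable names are this sketch's own, not from the original); with Gym installed you would call `env.reset()` and `env.step(action)` instead:

```python
import numpy as np

# Minimal deterministic stand-in for the 4x4 Frozen Lake dynamics
# (comparable to Gym's FrozenLake with is_slippery=False).
# States are tile indices 0..15; actions: 0=Left, 1=Down, 2=Right, 3=Up.
MAP = "SFFFFHFHFFFHHFFG"

def step(state, action):
    row, col = divmod(state, 4)
    if action == 0:   col = max(col - 1, 0)   # Left
    elif action == 1: row = min(row + 1, 3)   # Down
    elif action == 2: col = min(col + 1, 3)   # Right
    else:             row = max(row - 1, 0)   # Up
    new_state = row * 4 + col
    tile = MAP[new_state]
    reward = 1.0 if tile == "G" else 0.0      # reward only at the goal
    done = tile in "GH"                       # episode ends at goal or hole
    return new_state, reward, done

rng = np.random.default_rng(0)
qtable = np.zeros((16, 4))

# Hyperparameters (illustrative values)
total_episodes, max_steps = 2000, 99
learning_rate, gamma = 0.8, 0.95
epsilon, min_epsilon, decay_rate = 1.0, 0.01, 0.005

for episode in range(total_episodes):
    state, done = 0, False
    for _ in range(max_steps):
        # Exploration vs. exploitation trade-off
        exp_tradeoff = rng.random()
        if exp_tradeoff < epsilon:
            action = int(rng.integers(4))            # explore: random action
        else:
            action = int(np.argmax(qtable[state]))   # exploit: best known action
        new_state, reward, done = step(state, action)
        # Bellman update of the Q-value for the (state, action) pair
        qtable[state, action] += learning_rate * (
            reward + gamma * np.max(qtable[new_state]) - qtable[state, action]
        )
        state = new_state
        if done:
            break
    # Decay epsilon so the agent explores less as training progresses
    epsilon = min_epsilon + (1.0 - min_epsilon) * np.exp(-decay_rate * episode)
```

After training, following the greedy policy (the argmax of each Q-table row) walks the agent from the start tile to the goal.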

### Results

The figure above shows the number of steps the Q-Learning agent took to reach the goal. We tested the agent over 5 episodes, and in every episode it reached the goal (G).

This is how we can train an end-to-end Q-Learning agent using NumPy.

