
The Gridworld: Dynamic Programming With PyTorch & Reinforcement Learning For Frozen Lake Environment


Reinforcement learning is built on the mathematical foundations of the Markov decision process (MDP). Computing an optimal policy is central to reinforcement learning, and dynamic programming refers to a collection of algorithms for computing optimal policies. Classical dynamic programming algorithms assume a perfect model of the environment as an MDP and can be computationally expensive, but they provide the essential foundation for the other methods. In a finite-state reinforcement learning environment, we can represent the state, action, and reward sets as S, A(s), and R, where the states are finite. The environment's dynamics are given by the set of probabilities p(s′, r | s, a) for all s ∈ S, a ∈ A(s), r ∈ R, and s′ ∈ S⁺, where S⁺ denotes S augmented with the terminal states of episodes. Dynamic programming in a reinforcement learning landscape applies directly to discrete state spaces, and to continuous state spaces through approximation. Dynamic programming finds good policies by computing value functions and deriving an optimal policy that satisfies the Bellman optimality equations.
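In standard MDP notation, the Bellman optimality equation for the optimal state-value function reads:

```latex
v_{*}(s) = \max_{a} \sum_{s', r} p(s', r \mid s, a) \left[ r + \gamma \, v_{*}(s') \right]
```

where γ is the discount factor; an optimal policy is any policy that is greedy with respect to v*.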

We need to compute the state-value function vπ for an arbitrary policy π in order to perform policy evaluation, the prediction problem.

π(a|s) → the probability of taking action a in state s under policy π
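In this notation, iterative policy evaluation repeatedly applies the Bellman expectation backup:

```latex
v_{\pi}(s) = \sum_{a} \pi(a \mid s) \sum_{s', r} p(s', r \mid s, a) \left[ r + \gamma \, v_{\pi}(s') \right]
```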

Computing the state-value function vπ lets us search for better policies: policy improvement constructs an improved policy π′ by acting greedily with respect to vπ.
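In standard notation, the greedy policy-improvement step is:

```latex
\pi'(s) = \arg\max_{a} \sum_{s', r} p(s', r \mid s, a) \left[ r + \gamma \, v_{\pi}(s') \right]
```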

We can apply policy improvement iteratively, alternating evaluation and greedy improvement, until no further improvement is possible; this alternation is policy iteration.

Dynamic programming works well on gridworld-like environments. The objective of the agent in the gridworld is to control the movement of the character. Some of the tiles in the gridworld are walkable, while others cause the agent to fall into the water of the frozen lake. The ultimate objective of the agent is to reach the goal tile by finding the optimal walkable path. Every time the agent reaches the goal along a walkable path, it receives a reward.

The following are the key components to watch out for in the gridworld. 

S → Starting position (safe)

F → Frozen surface (safe for some time)

H → Hole (death)

G → Goal (safe, and the ultimate objective)

The agent can perform the following actions in the frozen lake environment:

  1. Left – 0
  2. Down – 1
  3. Right – 2
  4. Up – 3 

Frozen Lake environment map: https://mk0analyticsindf35n9.kinstacdn.com/wp-content/uploads/2018/03/Frozen-Lake.png

We will implement dynamic programming with PyTorch for the Frozen Lake reinforcement learning environment, since dynamic programming is well suited to gridworld-like environments. We will work through the core value-function algorithms: policy evaluation, policy improvement, policy iteration, and value iteration.
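As a minimal sketch of one of these algorithms, the following runs value iteration with PyTorch tensors over Frozen Lake's transition model, which Gym's toy-text environments expose as `env.unwrapped.P[s][a]` (a list of `(prob, next_state, reward, done)` tuples). This assumes the `FrozenLake-v1` environment id; with the Gymnasium fork, replace `import gym` with `import gymnasium as gym`.

```python
import torch
import gym  # with Gymnasium: import gymnasium as gym

env = gym.make('FrozenLake-v1')
n_states = env.observation_space.n   # 16 grid tiles
n_actions = env.action_space.n       # 4 moves
gamma = 0.99                         # discount factor
threshold = 1e-4                     # convergence tolerance

def backup(V, s, a):
    # One-step expected return for (s, a) under the model
    q = 0.0
    for prob, s_next, reward, done in env.unwrapped.P[s][a]:
        q += prob * (reward + gamma * float(V[s_next]))
    return q

# Value iteration: apply the Bellman optimality backup until convergence
V = torch.zeros(n_states)
while True:
    V_new = torch.tensor([max(backup(V, s, a) for a in range(n_actions))
                          for s in range(n_states)])
    if torch.max(torch.abs(V_new - V)) < threshold:
        V = V_new
        break
    V = V_new

# Extract the greedy policy from the converged values
policy = torch.tensor([max(range(n_actions), key=lambda a: backup(V, s, a))
                       for s in range(n_states)])
print(V)
print(policy)
```

Terminal tiles (holes and the goal) keep value zero, since their transitions loop back with zero reward; the start state's value becomes positive once a path to the goal is reachable under the slippery dynamics.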

Import the gym library, created by OpenAI, an open-source toolkit for running reinforcement learning experiments. In the following step, we register the parameters for Frozen Lake, make the Frozen Lake game environment, and print the environment's observation space.

Assign the observation space to a variable and print it to see the number of states available in the environment.
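A sketch of these setup steps, assuming the `FrozenLake-v1` environment id (with Gymnasium, swap the import as noted in the comment):

```python
import gym  # with Gymnasium: import gymnasium as gym

# Make the 4 x 4 Frozen Lake game environment
env = gym.make('FrozenLake-v1')

# The observation space is Discrete(16): one state per grid tile
print(env.observation_space)
n_states = env.observation_space.n
print(n_states)

# Sample grid states (0..15) from the observation space
for g in range(5):
    print(env.observation_space.sample())
```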

We sample grid states from the observation space over a range g; the possible grid states in the environment run from 0 to 15.

Then, we print the action space from which the agent chooses actions to find the walkable path in the shortest time under the optimal policy.

We can see the actions available to the agent in the Frozen Lake environment by sampling the action space over a range of 15.
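A sketch of inspecting and sampling the action space (again assuming `FrozenLake-v1`):

```python
import gym  # with Gymnasium: import gymnasium as gym

env = gym.make('FrozenLake-v1')

# Discrete(4): 0 = Left, 1 = Down, 2 = Right, 3 = Up
print(env.action_space)
n_actions = env.action_space.n

# Sample 15 random actions from the action space
actions = [env.action_space.sample() for _ in range(15)]
print(actions)
```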

We then render the environment to inspect its current state.

We can navigate the frozen lake gridworld by going left with action 0; this incurs no penalty, as there is nothing to the left of the start tile. We can navigate down with action 1, and going right with action 2 is not a problem either, as the agent still stands on the surface of the frozen lake; going right twice with action 2 is safe as well. However, after going down three times, the agent can fall through a hole into the frozen lake. The agent cannot survive, recover, or swim back out of the hole; for the purposes of the Frozen Lake game, the agent dies.

To navigate successfully through the frozen lake gridworld, the agent has to go right twice, down three times, and right once more to reach the goal.


Ganapathi Pulipaka

Dr Ganapathi Pulipaka is Chief AI HPC Scientist and bestselling author of books covering AI infrastructure, supercomputing, high-performance computing for HPC, parallel computing, neural network architecture, data science, machine learning, and deep learning in C, C++, Java, Python, R, TensorFlow, and PyTorch on Linux, macOS, and Windows.