Hands-On Guide to OpenAI Gym Custom Environments

OpenAI Gym is a well-known toolkit for developing and comparing reinforcement learning (RL) agents. It makes no assumptions about the structure of the agent and works with any numerical computation library, such as TensorFlow or PyTorch.

Gym also provides a wide variety of ready-made environments.

Some of these environments are covered in previous articles.

In this hands-on guide, we will develop a tic-tac-toe environment from scratch using OpenAI Gym.

Folder Setup

To start with, let’s create the desired folder structure with all the required files.
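The layout is not listed in the text, but it can be inferred from the rest of this guide: the pip command targets a gym-tictactoe folder, setup.py names the package gym_tictactoe, and the entry point refers to gym_tictactoe.env. A minimal sketch that creates such a layout with empty placeholder files, assuming exactly those three files are needed (the helper name create_layout is hypothetical):

```python
from pathlib import Path


def create_layout(root="."):
    """Create the assumed project layout with empty placeholder files.

    gym-tictactoe/
        setup.py
        gym_tictactoe/
            __init__.py
            env.py
    """
    base = Path(root) / "gym-tictactoe"
    package = base / "gym_tictactoe"
    package.mkdir(parents=True, exist_ok=True)

    files = [base / "setup.py", package / "__init__.py", package / "env.py"]
    for path in files:
        path.touch(exist_ok=True)  # leaves existing files untouched
    return files


# Usage: create_layout() builds the folders in the current directory.
```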

Once all the files and folders are in place, open the setup.py file and insert the following lines.

import sys
from setuptools import setup, find_packages

if sys.version_info < (3, 5):
    sys.exit('Sorry, Python < 3.5 is not supported!')

setup(name='gym_tictactoe', version='0.0.1',
      install_requires=['gym', 'click', 'tqdm', 'pandas'],
      packages=find_packages())

Inside the __init__.py file, insert the following lines.

from gym.envs.registration import register

register(id='tictactoe-v0', entry_point='gym_tictactoe.env:TicTacToeEnv')
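The entry point above refers to a TicTacToeEnv class in gym_tictactoe/env.py, which is not listed in this guide. The following is only a rough sketch of the game logic such a class might contain, not the author's implementation. To keep it runnable without Gym installed, it is a plain Python class; a real environment would subclass gym.Env and declare its action_space and observation_space. The board encoding, the rewards, and the handling of illegal moves are all assumptions made for illustration.

```python
# Hypothetical logic for gym_tictactoe/env.py. A real environment would
# subclass gym.Env and define action_space / observation_space.

WIN_LINES = [
    (0, 1, 2), (3, 4, 5), (6, 7, 8),  # rows
    (0, 3, 6), (1, 4, 7), (2, 5, 8),  # columns
    (0, 4, 8), (2, 4, 6),             # diagonals
]


class TicTacToeEnv:
    """Players 1 and 2 alternate marking the 9 cells of a 3x3 board."""

    def reset(self):
        self.board = [0] * 9  # 0 marks an empty cell
        self.player = 1       # player 1 moves first
        return tuple(self.board)

    def step(self, action):
        if self.board[action] != 0:
            # Assumed rule: an illegal move is penalised and the turn is kept.
            return tuple(self.board), -1, False, {"illegal": True}

        self.board[action] = self.player
        won = any(all(self.board[i] == self.player for i in line)
                  for line in WIN_LINES)
        full = all(cell != 0 for cell in self.board)

        reward = 1 if won else 0  # assumed reward scheme
        done = won or full
        info = {"winner": self.player if won else None}
        self.player = 3 - self.player  # switch between 1 and 2
        return tuple(self.board), reward, done, info


env = TicTacToeEnv()
obs = env.reset()
for move in (0, 3, 1, 4, 2):  # player 1 completes the top row
    obs, reward, done, info = env.step(move)
print(obs, reward, done, info)
```

The step method returns the same four values described later in this guide, so the game logic can be tried out on its own before the class is wired into Gym.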

Installation

After adding the snippets above, the environment can be installed as a pip package.

Open a terminal in the directory that contains the gym-tictactoe folder and run pip install -e gym-tictactoe

After a successful installation, you should see messages confirming that the package was installed.

Test Run

Let’s create our tic-tac-toe environment with Gym and run it for 10 steps.
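The original snippet for this step is not available, so here is a minimal sketch of a 10-step random run. It assumes the classic Gym API, in which step returns four values and action_space.sample() draws a random action. The helper name run_random_steps is hypothetical.

```python
def run_random_steps(env, n_steps=10):
    """Take n_steps random actions, resetting whenever an episode ends."""
    history = []
    env.reset()
    for _ in range(n_steps):
        action = env.action_space.sample()          # pick a random action
        obs, reward, done, info = env.step(action)  # classic 4-value API
        history.append((action, reward, done))
        if done:
            env.reset()  # start a fresh episode and keep going
    return history


# Usage, assuming the package above is installed:
#   import gym
#   import gym_tictactoe  # importing the package runs register()
#   env = gym.make('tictactoe-v0')
#   for action, reward, done in run_random_steps(env):
#       print(action, reward, done)
```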

Multiple Episodes

Now let’s play multiple episodes, taking random steps, and see how many timesteps it takes to finish each one.
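A minimal sketch of such a loop is shown here, again assuming the classic four-value step API. The helper name run_episodes and the max_steps safety cap are hypothetical additions.

```python
def run_episodes(env, n_episodes=5, max_steps=100):
    """Play random episodes and return how many timesteps each one took."""
    lengths = []
    for episode in range(n_episodes):
        env.reset()
        for t in range(1, max_steps + 1):
            action = env.action_space.sample()  # random agent
            _, _, done, _ = env.step(action)
            if done:
                break  # episode over, t holds its length
        lengths.append(t)
        print(f"Episode {episode + 1}: {t} timesteps")
    return lengths


# Usage, assuming the package above is installed:
#   import gym
#   import gym_tictactoe
#   run_episodes(gym.make('tictactoe-v0'), n_episodes=10)
```

The cap on max_steps only guards against an episode that never reports done. For tic-tac-toe a much smaller value would do.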

Understanding the Observation Space

To understand what an action does inside the environment, let’s look at what each call to step returns. The environment’s step function returns four values:

  • Observation: an environment-specific object representing the state of the board in the game.
  • Reward: the amount of reward achieved by the previous action.
  • Done: a boolean indicating whether the episode has ended and the environment should be reset (for example, perhaps the pole tipped too far, or you lost your last life).
  • Info: diagnostic information useful for debugging. It can sometimes be useful for learning (for example, it might contain the raw probabilities behind the environment’s last state change).

Our episode finished in 6 timesteps. Now we can train various RL algorithms to learn the environment and perform better than a random agent.

Because it makes few assumptions about the environment, OpenAI Gym is easy to build on: it has all the basics required to design and debug an environment and to train various RL models on top of it.


Anurag Upadhyaya

Experienced Data Scientist with a demonstrated history of working in Industrial IOT (IIOT), Industry 4.0, Power Systems and Manufacturing domain. I have experience in designing robust solutions for various clients using Machine Learning, Artificial Intelligence, and Deep Learning. I have been instrumental in developing end to end solutions from scratch and deploying them independently at scale.