Exploring Panda Gym: A Multi-Goal Reinforcement Learning Environment

With the latest breakthroughs in artificial intelligence and new research emerging every day, it is easy to believe that intelligent, self-sufficient machines are just over the horizon. Machines today can understand verbal commands, distinguish between pictures, drive cars and play games, sometimes better than an average human does. One can only wonder how much longer it will be before they walk among us.

In developing an artificially intelligent machine, however, both the reinforcement learning method and the environment the agent is trained in play a major role. The environment used for training is as important as the learning method used to solve the problem. The environment is one of the fundamental elements of a reinforcement learning problem, so it is important to understand the environment with which the RL agent will interact. That understanding helps in choosing the right design and learning technique for the agent.

The environment is the agent's world. The agent interacts with the environment by performing actions, but those actions cannot change the rules or dynamics of the environment. Humans, for example, are agents in the Earth's environment and are bound by its laws: we can act on the environment, but we cannot change the laws. The environment also gives the agent a reward, a scalar value that acts as feedback telling the agent whether its action was good or bad. Reinforcement learning offers multiple paradigms for arriving at a winning strategy, that is, for getting the agent to perform the desired behaviour. In complex situations, computing the exact winning strategy or reward-value function becomes hard, especially when the agent learns from interaction rather than from prior experience.
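The interaction described above can be sketched in a few lines of plain Python. This is only an illustration: `LineWorld` and `run_random_agent` are hypothetical names invented for this example and are not part of Gym or panda-gym. The environment owns the dynamics and hands back a scalar reward; the agent only chooses actions.

```python
import random


class LineWorld:
    """Hypothetical toy environment: the agent walks on positions 0..4
    and is rewarded when it reaches position 4."""

    def __init__(self):
        self.position = 0

    def reset(self):
        self.position = 0
        return self.position

    def step(self, action):
        # The environment, not the agent, decides what an action does:
        # moves that would leave the track are clipped to its ends.
        self.position = max(0, min(4, self.position + action))
        reward = 1.0 if self.position == 4 else 0.0  # scalar feedback
        done = self.position == 4
        return self.position, reward, done


def run_random_agent(env, max_steps=100, seed=0):
    """Let an agent that picks random moves interact with the environment."""
    rng = random.Random(seed)
    state = env.reset()
    total_reward = 0.0
    for _ in range(max_steps):
        action = rng.choice([-1, 1])  # step left or right
        state, reward, done = env.step(action)
        total_reward += reward
        if done:
            break
    return state, total_reward
```

Calling `run_random_agent(LineWorld())` returns the final position together with the reward collected, which is the feedback signal a learning algorithm would use to improve the agent.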

There are several types of reinforcement learning environments:

  • Deterministic environment: An environment where the next state can always be determined from the current state and the agent's action.
  • Stochastic environment: An environment where the next state cannot always be determined from the current state and the action performed.
  • Single-agent environment: An environment where only one agent exists and interacts with the environment.
  • Multi-agent environment: An environment where more than one agent is present and interacting with the environment.
  • Discrete environment: An environment whose action space is discrete.
  • Continuous environment: An environment whose action space is continuous.
  • Episodic environment: An environment where the agent's actions affect only the current episode and do not depend on actions taken in previous episodes.
  • Sequential environment: An environment where the agent's actions are linked to the actions it took previously.
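To make two of these distinctions concrete, here is a minimal sketch in plain Python. All names (`deterministic_step`, `stochastic_step`, `slip_probability`, `DISCRETE_ACTIONS`, `sample_continuous_action`) are hypothetical and chosen only for illustration; they do not come from any library.

```python
import random


def deterministic_step(state, action):
    """Deterministic dynamics: the next state follows from (state, action)."""
    return state + action


def stochastic_step(state, action, rng, slip_probability=0.2):
    """Stochastic dynamics: with probability slip_probability the action
    has no effect, so the next state cannot be predicted with certainty."""
    if rng.random() < slip_probability:
        return state
    return state + action


# Discrete action space: a finite set of choices.
DISCRETE_ACTIONS = (-1, 0, 1)


def sample_continuous_action(rng, low=-1.0, high=1.0):
    """Continuous action space: any real value between low and high."""
    return rng.uniform(low, high)
```

Calling `deterministic_step(2, 1)` always gives 3, whereas repeated calls to `stochastic_step(2, 1, rng)` give a mix of 2 and 3.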

What is OpenAI Gym?

Gym is an open-source toolkit for developing and comparing reinforcement learning algorithms. It lets you set up an environment in only a few lines of code and is compatible with any numerical computation library, such as TensorFlow or Theano. The library is a collection of test problems, called environments, that can be used to train and develop stronger reinforcement learning models. The environments share a common interface, which makes it possible to write general algorithms. Gym provides a wide variety of simulated environments, such as Atari games, board games, and 2D and 3D physical simulations, so you can train multiple agents, compare them, or develop new algorithms for reinforcement learning problems. OpenAI, the organisation behind Gym, is an artificial intelligence research company whose early funders included Elon Musk. Its stated goal is to promote and develop friendly AI systems that benefit humanity rather than harm it.
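The value of a shared interface is easiest to see in code. The sketch below assumes the classic Gym convention, in which `reset()` returns an observation and `step(action)` returns `(observation, reward, done, info)`. The two environments, `CountdownEnv` and `GuessEnv`, are hypothetical stand-ins written for this example so that it runs without Gym installed.

```python
def run_episode(env, policy, max_steps=1000):
    """Run one episode on any object exposing reset() and step(action)."""
    observation = env.reset()
    episode_return = 0.0
    for _ in range(max_steps):
        action = policy(observation)
        observation, reward, done, info = env.step(action)
        episode_return += reward
        if done:
            break
    return episode_return


class CountdownEnv:
    """Hypothetical stand-in: the episode lasts `length` steps, reward 1 each."""

    def __init__(self, length):
        self.length = length
        self.remaining = length

    def reset(self):
        self.remaining = self.length
        return self.remaining

    def step(self, action):
        self.remaining -= 1
        return self.remaining, 1.0, self.remaining == 0, {}


class GuessEnv:
    """Hypothetical stand-in: a one-step episode, reward 1 for the right guess."""

    def __init__(self, target):
        self.target = target

    def reset(self):
        return 0

    def step(self, action):
        return 0, float(action == self.target), True, {}
```

The same `run_episode` function works unchanged on both environments, for example `run_episode(CountdownEnv(5), lambda obs: 0)` and `run_episode(GuessEnv(3), lambda obs: 3)`. This is the property that lets one algorithm be compared across many Gym environments.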

About Panda-Gym

Panda-Gym is an open-source library that provides a set of reinforcement learning (RL) environments for the Franka Emika Panda robot, integrated with OpenAI Gym. The simulated environment offers five tasks: reach, push, slide, pick and place, and stack. It follows a multi-goal RL framework, which allows the use of goal-oriented RL algorithms. To foster open research, it is built on the open-source physics engine PyBullet. The package is implemented so that new tasks, and even new robots, can be defined easily.
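As a rough sketch of what "multi-goal" means in practice: goal-based Gym environments conventionally return a dictionary observation with the keys `observation`, `achieved_goal` and `desired_goal`, and compute the reward from the two goals alone. The code below is an illustration in plain Python, not panda-gym's own implementation. The sparse 0 / -1 reward, the threshold value and the `relabel_with_achieved_goal` helper are assumptions made for this example.

```python
import math

DISTANCE_THRESHOLD = 0.05  # metres; assumed to mirror a 5 cm success criterion


def compute_reward(achieved_goal, desired_goal, threshold=DISTANCE_THRESHOLD):
    """Sparse reward (assumed form): 0 when close enough to the goal, else -1."""
    distance = math.dist(achieved_goal, desired_goal)
    return 0.0 if distance < threshold else -1.0


def relabel_with_achieved_goal(observation):
    """Hypothetical helper showing why the dictionary layout is useful:
    a goal-oriented algorithm can pretend the position actually reached
    was the goal all along, and recompute the reward for that new goal."""
    relabelled = dict(observation)
    relabelled["desired_goal"] = observation["achieved_goal"]
    return relabelled


# Example observation with made-up values (positions in metres).
observation = {
    "observation": (0.10, 0.00, 0.02, 0.0, 0.0, 0.0),  # e.g. position and velocity
    "achieved_goal": (0.10, 0.00, 0.02),                # where the object is now
    "desired_goal": (0.30, 0.00, 0.02),                 # where it should end up
}
```

With these values, `compute_reward(observation["achieved_goal"], observation["desired_goal"])` is -1.0 because the object is 20 cm from the target, while the relabelled observation yields 0.0. Because the reward depends only on the two goals, past experience can be reused for goals other than the one originally requested.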

About The Simulation and Challenges

The environments feature the Panda robotic arm from Franka Emika, which is already widely used in both simulation and real-world academic work. The arm has 7 degrees of freedom and a parallel finger gripper. The robot is simulated with the open-source PyBullet physics engine, which offers good simulation performance. The environments are integrated with OpenAI Gym, so any learning algorithm written against the Gym API can be used with them.

Each task consists of moving either the gripper or an object to a target position. A task is considered complete when the distance between the entity being moved and the target position is less than 5 cm. The five tasks are presented with an increasing level of difficulty:

  • PandaReach-v1: A target position must be reached with the gripper. The target position is randomly generated in a volume of 30 cm × 30 cm × 30 cm.
  • PandaPush-v1: A cube placed on a table must be pushed to a target position on the table surface while the gripper is blocked. The target position and the initial position of the cube are randomly generated in a 30 cm × 30 cm square around the neutral position of the robot.
  • PandaSlide-v1: A flat cylinder must be moved to a target position on the table surface while the gripper is blocked. The target position is randomly generated in a 50 cm × 50 cm square located 40 cm in front of the neutral position of the robot. Since these target positions are out of the robot's reach, the object must be given an impulse rather than simply pushed.
  • PandaPickAndPlace-v1: A cube must be brought to a target position generated in a volume of 30 cm × 30 cm × 20 cm above the table. To lift the cube, the robot must pick it up with the fingers of the gripper.
  • PandaStack-v1: Two cubes must be stacked at a target position on the table surface. The target position is generated in a 30 cm × 30 cm square. The stacking must be done in the correct order: the red cube must be under the green cube.

These tasks remain the subject of ongoing research and have yet to be solved perfectly.
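The target regions and the 5 cm completion rule described above can be written down as a small sketch. The extents are taken from the task descriptions; everything else is an assumption made for illustration, in particular the idea that each region is centred on a given point and starts at table height `z = 0`. The names `GOAL_EXTENTS`, `sample_goal` and `is_success` are hypothetical and not taken from panda-gym.

```python
import math
import random

SUCCESS_DISTANCE = 0.05  # metres: the 5 cm completion threshold

# (x, y, z) extents in metres of the region each target is drawn from.
GOAL_EXTENTS = {
    "PandaReach-v1": (0.30, 0.30, 0.30),
    "PandaPush-v1": (0.30, 0.30, 0.0),
    "PandaSlide-v1": (0.50, 0.50, 0.0),
    "PandaPickAndPlace-v1": (0.30, 0.30, 0.20),
    "PandaStack-v1": (0.30, 0.30, 0.0),
}


def sample_goal(task, rng, center_xy=(0.0, 0.0)):
    """Draw a target uniformly from the task's region.

    Assumption: the region is centred on center_xy in the table plane
    and extends upwards from the table surface at z = 0.
    """
    dx, dy, dz = GOAL_EXTENTS[task]
    x = center_xy[0] + rng.uniform(-dx / 2, dx / 2)
    y = center_xy[1] + rng.uniform(-dy / 2, dy / 2)
    z = rng.uniform(0.0, dz)
    return (x, y, z)


def is_success(entity_position, target_position):
    """A task counts as complete when the entity is within 5 cm of the target."""
    return math.dist(entity_position, target_position) < SUCCESS_DISTANCE
```

For the slide task, one might call `sample_goal("PandaSlide-v1", random.Random(0), center_xy=(0.40, 0.0))` to place the square 40 cm in front of the robot; the choice of the x axis as "in front" is again only an assumption.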

Getting Started With the Code 

In this article, we will run two simulations from Panda-Gym and see what it takes to set up the environment. The implementation follows the example provided by the creators of panda-gym on the project's official website.

Installing the Library 

To get started, we first install the panda-gym library by running the following command:

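!pip install panda-gym

Note that the -v1 environment IDs used below belong to an early release of panda-gym; if a newer release is installed, the IDs and the Gym API calls may differ.

Fetch, Pick and Place

Importing Dependencies

Now we import the dependencies required to set up the environment:

#importing dependencies
import gym
import panda_gym  # importing panda_gym makes the Panda environments available to gym.make

Environment Setup and Simulation

To set up the environment and run the simulation, you can run the following lines of code:

#assigning the simulation task to the environment
env = gym.make('PandaPickAndPlace-v1')
state = env.reset()
done = False
#storing the first rendered frame
images = [env.render('rgb_array')]
#stepping with random actions until the episode ends
while not done:
    action = env.action_space.sample()
    state, reward, done, info = env.step(action)
    #storing one rendered frame per step for the animation
    images.append(env.render('rgb_array'))
env.close()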

Here we only perform a basic demo simulation in which the actions are sampled at random, so no learning takes place. When a learning algorithm is plugged in, its hyperparameters can be tuned according to the required performance.

Next, we install numpngw, a Python package that provides the function write_png, which writes a NumPy array to a PNG file, and write_apng, which writes a sequence of arrays to an animated PNG (APNG) file.

#installing numpngw
!pip3 install numpngw
from numpngw import write_apng
#delay is the time between frames in ms; 40 ms is real time, so 100 ms plays back slower
write_apng('anim.png', images, delay = 100)
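The relationship between the delay value and playback speed is simple arithmetic, sketched below. The 40 ms real-time step comes from the comment in the snippet above; `playback_summary` is a hypothetical helper, and the frame count of 51 in the usage note is only an example value.

```python
SIM_STEP_MS = 40  # real time corresponds to 40 ms between frames


def playback_summary(num_frames, delay_ms, sim_step_ms=SIM_STEP_MS):
    """Return (duration in seconds, frames per second, speed relative to real time)."""
    duration_s = num_frames * delay_ms / 1000
    frames_per_second = 1000 / delay_ms
    relative_speed = sim_step_ms / delay_ms
    return duration_s, frames_per_second, relative_speed
```

For instance, `playback_summary(51, 100)` describes an animation of 51 frames shown 100 ms apart: it lasts about 5.1 seconds, plays at 10 frames per second, and runs at 0.4 times real time. Setting the delay to 40 gives a relative speed of 1.0.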

Displaying the results:

#rendering the simulation
from IPython.display import Image
Image(filename='anim.png')

As we can observe, the gripper moves the block, and you can see the two positions of the block. The rendering might not be very clear; the setup can be tuned further or run on a more capable machine for better render performance.

We can do the same for another simulation task, the slide task.

Fetch and Slide

import gym
import panda_gym
env = gym.make('PandaSlide-v1')
state = env.reset()
done = False
images = [env.render('rgb_array')]
while not done:
    action = env.action_space.sample()
    state, reward, done, info = env.step(action)
    #storing one rendered frame per step for the animation
    images.append(env.render('rgb_array'))
env.close()

!pip3 install numpngw
from numpngw import write_apng
#delay is the time between frames in ms; 40 ms is real time, so 70 ms plays back slower
write_apng('anim.png', images, delay = 70)

from IPython.display import Image
Image(filename='anim.png')

The rendering delay, the learning parameters and the environment can all be adjusted as needed. The tool is well suited to testing deep reinforcement learning algorithms, but it has some limitations. Gripper control is limited, since the gripper can only be driven by high-level actions such as grasp and move. Additional work is required before a policy learned in simulation can be deployed on a real robot. The simulation is also not completely realistic; the main concern is the shape of the gripper when picking up objects in the environment.


In this article, we looked at the role of the learning environment in reinforcement learning. We also explored the Panda-Gym tasks and ran a basic demo simulation of two of them, rendering the Panda robotic arm from Franka Emika. The implementation is also available as a Colab notebook. Happy Learning!


Victor Dey
Victor is an aspiring Data Scientist & is a Master of Science in Data Science & Big Data Analytics. He is a Researcher, a Data Science Influencer and also an Ex-University Football Player. A keen learner of new developments in Data Science and Artificial Intelligence, he is committed to growing the Data Science community.
