Guide To TensorForce: A TensorFlow-based Reinforcement Learning Framework

TensorForce is an open-source library for reinforcement learning (RL), built on top of TensorFlow. It requires Python 3. It is currently maintained by Alexander Kuhnle; versions 0.4.2 and earlier were developed jointly by Alexander Kuhnle, Michael Schaarschmidt and Kai Fricke.

A brief introduction to Tensorforce and several such RL frameworks can be found in this article.


Key Features of TensorForce

• It supports TensorBoard.
• It supports a wide range of neural network layers such as 1D and 2D convolutions, fully connected (FC) layers, pooling, embeddings and so on.
• It also supports L2 and entropy regularization.
• It allows parallel execution of multiple RL environments.
• It supports random replay memory and batch buffer memory.

What distinguishes TensorForce from similar RL libraries?

• The entire RL logic of TensorForce is implemented in TensorFlow. This makes TensorFlow-based models easier to deploy and yields portable computation graphs that do not depend on the application programming language.
• The library has a modular design, with components intended to be as generally applicable and configurable as possible.
• RL algorithms in the library are independent of how the agent interacts with the environment, and of the type and structure of the input states and output actions.

Practical implementation

Here’s a demonstration of creating an RL environment and agent for a temperature controller using TensorForce. The thermostat environment consists of a room with a heater. While the heater is switched on, the room temperature approaches 1.0, and while it is switched off, the temperature decays towards 0.0. The exponential heat decay constant ‘tau’ determines how fast the room temperature approaches 0.0 or 1.0. The temperature at the next timestep is computed as:

temp[i + 1] = h[i] + (temp[i] - h[i]) * exp(-1/tau)    ...(i)

where,

temp[i] denotes the temperature (between 0 and 1) at the i-th timestep

h[i] represents the heater state applied at the i-th timestep (0: off, 1: on)
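Applying equation (i) repeatedly with a constant heater state h gives the closed form temp[n] = h + (temp[0] - h) * exp(-n/tau), which shows how tau sets the speed of the approach. The following is a minimal sketch that checks this numerically; the function names `step` and `closed_form` are illustrative and not part of the original code.

```python
import math

def step(temp, h, tau):
    """One application of equation (i)."""
    return h + (temp - h) * math.exp(-1.0 / tau)

def closed_form(temp0, h, tau, n):
    """Temperature after n steps with a constant heater state h."""
    return h + (temp0 - h) * math.exp(-n / tau)

tau = 2.0
temp = 0.0
for n in range(1, 6):
    temp = step(temp, 1.0, tau)  # heater held on for every step
    # The step-by-step result should agree with the closed form
    assert math.isclose(temp, closed_form(0.0, 1.0, tau, n))
    print(n, round(temp, 4))
```

A larger tau makes exp(-1/tau) closer to 1, so the temperature changes more slowly per step.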

The code has been implemented in Google Colab with Python 3.7.10 and Tensorforce 0.6.3. A step-wise explanation of the code follows:

1. Install tensorforce

`!pip install tensorforce`

2. Import required libraries
```
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
import math
from tensorforce.environments import Environment
from tensorforce.agents import Agent
```
3. Calculate the response for the current temperature and a given action, following equation (i)
```
def respond(ac, curr_temp, tau):
    return ac + (curr_temp - ac) * math.exp(-1.0 / tau)
```
4. Define a series of actions (1: on, 0: off)
`act = pd.Series(np.array([1,1,1,1,1,1,1,1,0,1,0,0,0,0,0,1,0,0,0,0]))`
5. Initialize an array for the responses with zeros

`resp = np.zeros(act.size)`

Update this array with the response to each action

```
for i in range(act.size):
    if i == 0:
        # For the first action, the previous response is 0 ('off')
        lastResp = 0
    else:
        # For later steps, use the previously recorded response
        lastResp = resp[i - 1]
    # Update the latest response by calling respond() with the current
    # action, the last response and the tau value as parameters
    resp[i] = respond(act[i], lastResp, 2.0)
```
6. Create a dataframe of actions and corresponding responses
`df = pd.DataFrame(list(zip(act, resp)), columns=['Action', 'Response'])`


Plot the actions and responses.

`df.plot()`

7. Create a reward function that encourages the agent to keep the temperature in the [0.4, 0.6] range.
```
def reward(temperature):
    delta = abs(temperature - 0.5)
    # If the temperature is in the [0.4, 0.6] range, the reward is 0
    if delta < 0.1:
        return 0.0
    else:
        # Otherwise, the reward is the negative distance of the temperature
        # from the nearest end of the range. E.g. 0.7 is nearer to 0.6 than
        # to 0.4, and 0.7 - 0.6 = 0.1, so the reward is -0.1. Likewise, 0.35
        # is nearer to 0.4, and 0.4 - 0.35 = 0.05, so the reward is -0.05.
        return -delta + 0.1
```
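The same reward can be written as a single expression: the negative of the distance by which the temperature overshoots the band, clipped at zero. The sketch below uses a hypothetical name, `reward_compact`, and checks it against the two worked examples given in the reward function's comments.

```python
import math

def reward_compact(temperature):
    """Negative distance to the [0.4, 0.6] band, or 0.0 inside it."""
    return -max(abs(temperature - 0.5) - 0.1, 0.0)

# Inside the band the reward is zero
assert reward_compact(0.5) == 0.0
# 0.7 is 0.1 above the upper end of the band
assert math.isclose(reward_compact(0.7), -0.1)
# 0.35 is 0.05 below the lower end of the band
assert math.isclose(reward_compact(0.35), -0.05)
```

The comparisons use `math.isclose` because floating-point subtraction does not return exactly -0.1 or -0.05.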

Create a list of temperatures from 0.0 to 0.99 in steps of 0.01.

`tmp = [t * 0.01 for t in range(100)]`


Compute reward for each temperature value

`rew = [reward(t) for t in tmp]`

Plot temperature vs. reward graph

```
fig = plt.figure(figsize=(12, 4))
plt.scatter(tmp, rew)
plt.xlabel('Temp')
plt.ylabel('Reward')
plt.title('Reward vs. Temp')
```


8. Create a class defining the thermostat environment
```
class TSEnv(Environment):

    def __init__(self):
        # Initialize tau and the current temperature
        self.tau = 3.0
        self.curr_temp = np.random.random(size=(1,))
        super().__init__()

    # State: the room temperature, with minimum and maximum values
    # specified as 0.0 and 1.0 respectively
    def states(self):
        return dict(type='float', shape=(1,), min_value=0.0, max_value=1.0)

    # Action: heater state (0: off, 1: on)
    def actions(self):
        return dict(type='int', num_values=2)

    # Reset the environment to a random initial temperature
    def reset(self):
        self.timestep = 0
        self.curr_temp = np.random.random(size=(1,))
        return self.curr_temp

    # Environment's response to the action, following equation (i)
    def response(self, action):
        return action + (self.curr_temp - action) * math.exp(-1.0 / self.tau)

    # Compute the reward using the same logic as the reward() function above
    def reward_compute(self):
        delta = abs(self.curr_temp - 0.5)
        if delta < 0.1:
            return 0.0
        else:
            return -delta[0] + 0.1

    # Execute the action
    def execute(self, actions):
        # Check the action (heater is either off or on)
        assert actions == 0 or actions == 1
        # Advance the environment by one step
        self.timestep += 1
        # Update the current temperature with the environment's response
        self.curr_temp = self.response(actions)
        # Calculate the reward
        reward = self.reward_compute()
        terminal = False  # the episode is not over
        # Return the current temperature, terminal flag and computed reward
        return self.curr_temp, terminal, reward
```
9. Create the environment by passing the thermostat environment class defined above and the maximum number of timesteps in each episode
```
environment = Environment.create(
    environment=TSEnv, max_episode_timesteps=150
)
```
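10. Configure an agent to learn to respond in the thermostat environment
```
agent = Agent.create(
    agent='tensorforce', environment=environment, update=64,
    # The remaining arguments are assumed here, following the
    # Tensorforce getting-started example
    optimizer=dict(optimizer='adam', learning_rate=1e-3),
    objective='policy_gradient', reward_estimation=dict(horizon=1)
)
```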
11. Train the agent for 150 episodes
```
for _ in range(150):
    # Reset the environment first
    states = environment.reset()
    terminal = False
    # While the episode is not over
    while not terminal:
        # Get the agent's action for the current state
        act = agent.act(states=states)
        # Execute the agent's action
        states, terminal, rew = environment.execute(actions=act)
        # Each act() call should be followed by observe(), which passes
        # the computed reward and the terminal flag back to the agent
        agent.observe(terminal=terminal, reward=rew)
```
12. Check the trained agent’s performance
```
# Reset the environment
environment.reset()
# Initialize the current temperature, state, agent internals and terminal
environment.curr_temp = np.array([1.0])
states = environment.curr_temp
intr = agent.initial_internals()
terminal = False
# Run one episode, recording the temperature at every timestep
temperature = [environment.curr_temp[0]]
# Until the episode is over
while not terminal:
    # Let the agent act on the current state without learning from it,
    # carrying the updated internals over to the next step
    ac, intr = agent.act(states=states, internals=intr, independent=True)
    # Execute the agent's action and record the reward
    states, terminal, rew = environment.execute(actions=ac)
    temperature += [states[0]]

# Plot the agent's response
plt.figure(figsize=(12, 4))
ax = plt.subplot()
# Limits of temperature
ax.set_ylim([0.0, 1.0])
# Plot the temperature
plt.plot(range(len(temperature)), temperature)
# Draw red lines at temperatures 0.4 and 0.6 to see if the temperature
# remains in the [0.4, 0.6] range
plt.hlines(y=0.4, xmin=0, xmax=149, color='r')
plt.hlines(y=0.6, xmin=0, xmax=149, color='r')
plt.xlabel('Timestep')     # X-axis label
plt.ylabel('Temperature')  # Y-axis label
plt.title('Temperature vs. Timestep')  # Title of the plot
plt.show()   # Display the plot
```


The output plot shows that the agent keeps the temperature (blue curve) in the [0.4, 0.6] range bounded by the red lines.
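As a point of comparison for the trained agent, a hand-written rule can be run on the same dynamics. The sketch below is not part of the original walkthrough: it assumes the same update as equation (i) with tau = 3.0 and a start temperature of 1.0, and uses a hypothetical threshold rule that switches the heater on whenever the temperature is below 0.5.

```python
import math

def simulate_threshold_controller(start_temp=1.0, tau=3.0, steps=150):
    """Run the thermostat dynamics with a fixed on/off rule."""
    temps = [start_temp]
    for _ in range(steps):
        # Heater on (1) below the centre of the band, off (0) otherwise
        heater = 1 if temps[-1] < 0.5 else 0
        # Same update as equation (i)
        temps.append(heater + (temps[-1] - heater) * math.exp(-1.0 / tau))
    return temps

temps = simulate_threshold_controller()
settled = temps[10:]  # skip the initial cool-down from 1.0
print(min(settled), max(settled))
```

Under these assumptions the rule settles into an oscillation between roughly 0.42 and 0.58, inside the target band, so it gives a simple reference against which the learned policy's temperature curve can be judged.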
