How To Automate The Stock Market Using FinRL (Deep Reinforcement Learning Library)?

FinRL library


First, let’s discuss all the buzzwords, and then we will move to the implementation part where we code a starter project in stock market trading.

Reinforcement learning

Reinforcement learning is one of the three basic paradigms of machine learning, alongside supervised and unsupervised learning. It is concerned with how intelligent agents take actions in an environment in order to maximize a notion of cumulative reward. It is essentially a trial-and-error approach.


It doesn’t need labeled data, and sub-optimal actions don’t need to be corrected explicitly. Instead, it focuses on finding a balance between exploration (trying new actions) and exploitation (using what is already known to work).
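The exploration-exploitation balance can be illustrated with a small epsilon-greedy agent on a multi-armed bandit. This is a minimal sketch that is not part of FinRL; the function name, the reward means, and the Gaussian noise are all assumptions made for illustration.

```python
import random

def epsilon_greedy_bandit(true_means, epsilon=0.1, steps=2000, seed=0):
    """Run an epsilon-greedy agent on a Gaussian multi-armed bandit."""
    rng = random.Random(seed)
    n_arms = len(true_means)
    estimates = [0.0] * n_arms   # running estimate of each arm's mean reward
    counts = [0] * n_arms        # number of times each arm was pulled
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)                            # explore: random arm
        else:
            arm = max(range(n_arms), key=lambda a: estimates[a])   # exploit: best estimate so far
        reward = rng.gauss(true_means[arm], 1.0)                   # noisy reward (assumed noise model)
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]  # incremental mean update
    return estimates, counts

# Hypothetical reward means for three arms.
estimates, counts = epsilon_greedy_bandit([0.1, 0.5, 0.9])
print(counts)
```

With a small epsilon the agent mostly repeats the action it currently believes is best, while the occasional random pull keeps its estimates of the other actions from going stale.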

Deep reinforcement learning (Deep RL/DRL)

Deep reinforcement learning (Deep RL) is a subfield of AI and machine learning that combines reinforcement learning (RL) and deep learning. It integrates deep learning into the solution, allowing agents to make decisions from unstructured data. For example, Deep RL algorithms are able to take in every pixel of the screen in a video game and decide what actions to perform to maximize the game score.

Quantitative Finance

Quantitative finance is the use of mathematical models and extremely large datasets to analyze financial markets and securities.

Evaluation of quantitative finance

DRL has been recognized as an effective approach in quantitative finance. The practical challenge is training a DRL trading agent that decides where to trade, at what price, and in what quantity.



FinRL is a deep reinforcement learning (DRL) library by AI4Finance-LLC (an open community promoting AI in finance) that helps beginners get exposure to quantitative finance and develop their own stock trading strategies. It is a beginner-friendly library with fine-tuned DRL algorithms, and it follows three primary principles discussed in its official research paper, FinRL: A Deep Reinforcement Learning Library for Automated Stock Trading in Quantitative Finance:

  1. Completeness: This library covers all the major DRL framework completely.
  2. Hands-on tutorial: A detailed tutorial will be provided within the library.
  3. Reproducibility: FinRL ensures transparency to provide users with confidence.


FinRL follows a three-layer architecture: the stock market environment (environment layer), the DRL trading agent (agent layer), and the stock trading applications (application layer). The agent layer interacts with the environment layer in an exploration-exploitation manner, deciding whether to repeat a decision that has worked before or to try a new action in the hope of greater rewards.

The lower layer provides the APIs for the upper layer, which makes the lower layer transparent to the upper layer.

architecture of FinRL

Deep Reinforcement Learning Agents

The FinRL library contains fine-tuned DRL algorithms, namely DQN, DDPG, Multi-Agent DDPG, PPO, SAC, A2C, and TD3. It also allows users to design their own DRL algorithms by adapting these, e.g., Adaptive DDPG, or by employing ensemble methods. A comparison of the DRL algorithms is shown in the figure below.

different algorithm in deep reinforcement learning agents


You can install the FinRL library using git or pip as follows:

git clone

Or, you can try installing the unstable version using pip:

pip install git+

Installing Dependencies

pip install -r requirements.txt

For other dependency issues, or for the stable OpenAI Baselines (high-quality implementations of DRL algorithms), go here; to learn more about the Baselines packages, visit here.


Deep Reinforcement Learning for Stock Trading from Scratch: Single Stock Trading

Let’s work through an example that uses the FinRL library. We will use Apple Inc. stock (ticker: AAPL), and the problem is to design an automated trading solution for single stock trading. First, we model the stock trading process as a Markov Decision Process (MDP), and then we formulate the trading goal as a maximization problem.
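In an MDP, the quantity usually maximized is the expected discounted sum of rewards. The helper below is a small illustrative sketch of that sum for one episode; the function name and the discount factor are assumptions, not values taken from FinRL.

```python
def discounted_return(rewards, gamma=0.99):
    """Sum of rewards where the reward at step t is weighted by gamma**t."""
    total = 0.0
    # Work backwards so each step is: reward now + gamma * (return from the next step on).
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total

# Three steps with reward 1 each and gamma = 0.5: 1 + 0.5 + 0.25
print(discounted_return([1.0, 1.0, 1.0], gamma=0.5))  # 1.75
```

In a trading setting the per-step reward is typically the change in portfolio value, so maximizing this sum amounts to growing the account over the episode.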

There are four main components of a reinforcement learning environment:

  1. Action
  2. Reward
  3. State
  4. Environment
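To make the four components concrete, here is a toy single-stock environment. It is a simplified sketch written for this article, not FinRL's StockTradingEnv: the class name, the state layout, and the reward definition (change in portfolio value) are assumptions.

```python
class ToySingleStockEnv:
    """Minimal single-stock trading MDP (illustrative only)."""

    def __init__(self, prices, initial_cash=100_000.0, cost_pct=0.001):
        self.prices = list(prices)
        self.initial_cash = initial_cash
        self.cost_pct = cost_pct          # proportional transaction cost
        self.reset()

    def _state(self):
        # State: (cash balance, current price, shares held)
        return (self.cash, self.prices[self.t], self.shares)

    def _value(self):
        # Portfolio value at the current price
        return self.cash + self.shares * self.prices[self.t]

    def reset(self):
        self.t = 0
        self.cash = self.initial_cash
        self.shares = 0
        return self._state()

    def step(self, action):
        """Action: integer share count; > 0 buys, < 0 sells, 0 holds.

        Call reset() before stepping again once done is True.
        """
        price = self.prices[self.t]
        value_before = self._value()
        if action > 0:
            # Cap the purchase at what the cash balance can afford, costs included.
            affordable = int(self.cash // (price * (1 + self.cost_pct)))
            qty = min(action, affordable)
            self.cash -= qty * price * (1 + self.cost_pct)
            self.shares += qty
        elif action < 0:
            # Cannot sell more shares than are held.
            qty = min(-action, self.shares)
            self.cash += qty * price * (1 - self.cost_pct)
            self.shares -= qty
        self.t += 1                                  # the market moves to the next day
        done = self.t == len(self.prices) - 1
        reward = self._value() - value_before        # reward: change in portfolio value
        return self._state(), reward, done

env = ToySingleStockEnv([100.0, 110.0, 105.0])       # made-up prices
state = env.reset()
state, reward, done = env.step(10)                   # buy 10 shares on day 0
print(state, round(reward, 2), done)
```

The agent only ever sees the state and the reward; everything else (prices, costs, bookkeeping) belongs to the environment.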

The Apple stock data is obtained from the Yahoo Finance API and contains the open, high, low, and close prices along with the volume. We will use Google Colab for this demonstration, as it provides a free GPU for training and evaluation. The code below is taken from the official GitHub repository of FinRL, here.

Install and import packages

!pip install git+

# import and if any module is not installed, install it using pip
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
import matplotlib
import datetime
import os
from finrl.config import config
from finrl.marketdata.yahoodownloader import YahooDownloader
from finrl.preprocessing.preprocessors import FeatureEngineer
from finrl.preprocessing.data import data_split  # module path assumed; it was lost in the source
from finrl.env.env_stocktrading import StockTradingEnv
from finrl.model.models import DRLAgent
from finrl.trade.backtest import BackTestStats, BaselineStats, BackTestPlot  # module path assumed; it was lost in the source
import sys

Create folders for data, result metrics, and tensorboard logs

if not os.path.exists("./" + config.DATA_SAVE_DIR):
    os.makedirs("./" + config.DATA_SAVE_DIR)
if not os.path.exists("./" + config.TRAINED_MODEL_DIR):
    os.makedirs("./" + config.TRAINED_MODEL_DIR)
if not os.path.exists("./" + config.TENSORBOARD_LOG_DIR):
    os.makedirs("./" + config.TENSORBOARD_LOG_DIR)
if not os.path.exists("./" + config.RESULTS_DIR):
    os.makedirs("./" + config.RESULTS_DIR)
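The same folder setup can be written more compactly with the exist_ok flag of os.makedirs, which removes the need for the explicit existence checks. The sketch below is self-contained, so it uses a temporary base directory and hypothetical folder names in place of the config constants.

```python
import os
import tempfile

# Hypothetical directory names standing in for the config.* constants.
DIR_NAMES = ["datasets", "trained_models", "tensorboard_log", "results"]

def ensure_dirs(base, names):
    """Create each directory under base if it does not already exist."""
    paths = [os.path.join(base, name) for name in names]
    for path in paths:
        os.makedirs(path, exist_ok=True)  # no error if the folder is already there
    return paths

base = tempfile.mkdtemp()
created = ensure_dirs(base, DIR_NAMES)
ensure_dirs(base, DIR_NAMES)              # calling it again is harmless
print(created)
```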

Download Apple stock data using the Yahoo Finance API

FinRL comes with a class named YahooDownloader, which provides an easy way to download stock data from the Yahoo Finance API.

data_df = YahooDownloader(start_date = '2009-01-01',
                          end_date = '2021-01-01',
                          ticker_list = ['AAPL']).fetch_data()


Let’s do some feature engineering and data cleaning. FinRL provides the FeatureEngineer class, which offers methods for preprocessing stock price data and adding technical indicators.

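## the technical indicator column names are stored in the FinRL config
tech_indicator_list = config.TECHNICAL_INDICATORS_LIST
## you can add more technical indicators to this list
## see the stockstats documentation for the available indicator names

Pass the parameters to FeatureEngineer to add the technical indicators.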
fe = FeatureEngineer(
                    tech_indicator_list = tech_indicator_list,
                    user_defined_feature = False)

data_df = fe.preprocess_data(data_df)
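To give a feel for what a technical indicator column contains, here is a plain-Python sketch of two common ones. This is not how FeatureEngineer computes them: the RSI below uses simple averages of gains and losses rather than a smoothed average, and the closing prices are made up.

```python
def sma(values, window):
    """Simple moving average; None until a full window is available."""
    out = []
    for i in range(len(values)):
        if i + 1 < window:
            out.append(None)
        else:
            out.append(sum(values[i + 1 - window : i + 1]) / window)
    return out

def rsi(values, window):
    """Relative Strength Index from simple averages of gains and losses."""
    out = [None] * len(values)
    for i in range(window, len(values)):
        # Day-over-day changes across the last `window` days
        changes = [values[j] - values[j - 1] for j in range(i - window + 1, i + 1)]
        gains = sum(c for c in changes if c > 0) / window
        losses = sum(-c for c in changes if c < 0) / window
        out[i] = 100.0 if losses == 0 else 100.0 - 100.0 / (1.0 + gains / losses)
    return out

closes = [10.0, 11.0, 12.0, 11.0, 13.0, 14.0]  # made-up closing prices
print(sma(closes, 3))
print(rsi(closes, 3))
```

Each indicator becomes one extra column per trading day, which is why the number of indicators later shows up in the size of the state space.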

Trading Environment building

This environment is based on the OpenAI Gym framework and simulates a live stock market using real market data. Let’s split the dataset into train (2009-01-01 to 2018-12-31) and trade (2019-01-01 to 2020-12-31) datasets.

train = data_split(data_df, start = '2009-01-01', end = '2019-01-01')
trade = data_split(data_df, start = '2019-01-01', end = '2021-01-01')
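Both calls pass '2019-01-01', once as the end of the training range and once as the start of the trading range. The sketch below shows a date split over a half-open range [start, end), under the assumption that data_split treats the end date as exclusive; the helper name and the rows are made up for illustration.

```python
def split_by_date(rows, start, end):
    """Keep rows whose ISO date string falls in the half-open range [start, end)."""
    # ISO dates (YYYY-MM-DD) sort correctly when compared as strings.
    return [row for row in rows if start <= row["date"] < end]

# Made-up rows; the close values are placeholders, not real prices.
rows = [
    {"date": "2018-12-28", "close": 1.0},
    {"date": "2018-12-31", "close": 2.0},
    {"date": "2019-01-02", "close": 3.0},
    {"date": "2020-12-31", "close": 4.0},
]

train_rows = split_by_date(rows, "2009-01-01", "2019-01-01")
trade_rows = split_by_date(rows, "2019-01-01", "2021-01-01")
print(len(train_rows), len(trade_rows))  # 2 2
```

With a half-open range, no row can land in both sets even though the boundary date appears in both calls.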

Initiate environment

stock_dimension = len(train.tic.unique())
state_space = 1 + 2*stock_dimension + len(config.TECHNICAL_INDICATORS_LIST)*stock_dimension
print(f"Stock data Dimensions: {stock_dimension}, State Spaces: {state_space}")
env_kwargs = {
    "hmax": 100, 
    "initial_amount": 100000, 
    "transaction_cost_pct": 0.001, 
    "state_space": state_space, 
    "stock_dim": stock_dimension, 
    "tech_indicator_list": config.TECHNICAL_INDICATORS_LIST, 
    "action_space": stock_dimension, 
    "reward_scaling": 1e-4}
e_train_gym = StockTradingEnv(df = train, **env_kwargs)
env_train, _ = e_train_gym.get_sb_env()
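The state_space formula above can be read term by term: one entry for the cash balance, two entries per stock (its price and the number of shares held), and one entry per technical indicator per stock. The sketch below just evaluates that formula; the indicator count of 4 is an assumed example value, not read from the config.

```python
def state_space_size(stock_dim, n_indicators):
    """1 cash balance + (price, shares held) per stock + each indicator per stock."""
    return 1 + 2 * stock_dim + n_indicators * stock_dim

# Single stock (AAPL) with an assumed 4 indicators: 1 + 2 + 4
print(state_space_size(stock_dim=1, n_indicators=4))   # 7
# A hypothetical 30-stock portfolio with the same indicators: 1 + 60 + 120
print(state_space_size(stock_dim=30, n_indicators=4))  # 181
```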

Implement DRL Algorithms

FinRL uses fine-tuned algorithms such as DQN, DDPG, Multi-Agent DDPG, PPO, SAC, A2C, and TD3. The implementations of the DRL algorithms are based on OpenAI Baselines and Stable Baselines.

agent = DRLAgent(env = env_train)

Training on 5 different models

We are going to see implementation in 5 different models provided by FinRL: A2C, DDPG, PPO, TD3, and SAC

1. Model: A2C

agent = DRLAgent(env = env_train)
A2C_PARAMS = {"n_steps": 5, "ent_coef": 0.005, "learning_rate": 0.0002}
model_a2c = agent.get_model(model_name="a2c",model_kwargs = A2C_PARAMS)
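trained_a2c = agent.train_model(model=model_a2c,
                                tb_log_name="a2c",       # assumed: the rest of this call was cut off in the source
                                total_timesteps=50000)   # assumed value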

2. Model: DDPG

agent = DRLAgent(env = env_train)
DDPG_PARAMS = {"batch_size": 64, "buffer_size": 500000, "learning_rate": 0.0001}
model_ddpg = agent.get_model("ddpg",model_kwargs = DDPG_PARAMS)

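trained_ddpg = agent.train_model(model=model_ddpg,
                                 tb_log_name="ddpg",      # assumed: the rest of this call was cut off in the source
                                 total_timesteps=30000)   # assumed value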

3. Model: PPO

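agent = DRLAgent(env = env_train)
PPO_PARAMS = {
    "n_steps": 2048,
    "ent_coef": 0.005,
    "learning_rate": 0.0001,
    "batch_size": 128,}
model_ppo = agent.get_model("ppo",model_kwargs = PPO_PARAMS)
trained_ppo = agent.train_model(model=model_ppo,
                                tb_log_name="ppo",       # assumed: the rest of this call was cut off in the source
                                total_timesteps=80000)   # assumed value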

4. Model: TD3

agent = DRLAgent(env = env_train)
TD3_PARAMS = {"batch_size": 128, 
              "buffer_size": 1000000, 
              "learning_rate": 0.0003}

model_td3 = agent.get_model("td3",model_kwargs = TD3_PARAMS)
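trained_td3 = agent.train_model(model=model_td3,
                                tb_log_name="td3",       # assumed: the rest of this call was cut off in the source
                                total_timesteps=30000)   # assumed value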

5. Model: SAC

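agent = DRLAgent(env = env_train)
SAC_PARAMS = {
    "batch_size": 128,
    "buffer_size": 100000,
    "learning_rate": 0.0001,
    "learning_starts": 100,
    "ent_coef": "auto_0.1",
}

model_sac = agent.get_model("sac",model_kwargs = SAC_PARAMS)
trained_sac = agent.train_model(model=model_sac,
                                tb_log_name="sac",       # assumed: the rest of this call was cut off in the source
                                total_timesteps=30000)   # assumed value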


We have trained five different models on our dataset. Now let’s trade, using the same environment class we initialized above to create the trading environment. Assume that you have $100K of initial capital on 2019-01-01. We will use the trained TD3 model to trade AAPL.

trading stocks using deep reinforcement library finrl

Make a prediction and get the account value change

trade = data_split(data_df, start = '2019-01-01', end = '2021-01-01')
e_trade_gym = StockTradingEnv(df = trade, **env_kwargs)
env_trade, obs_trade = e_trade_gym.get_sb_env()

df_account_value, df_actions = DRLAgent.DRL_prediction(model=trained_td3, test_data = trade, test_env = env_trade, test_obs = obs_trade)

Backtesting Performance

To evaluate the performance of a trading strategy, an automated backtesting tool is preferred because it reduces human error. We will use the Quantopian pyfolio package to backtest our trading strategy.

now = datetime.datetime.now().strftime('%Y%m%d-%Hh%M')

perf_stats_all = BackTestStats(account_value=df_account_value)
perf_stats_all = pd.DataFrame(perf_stats_all)
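For intuition about what a backtest report contains, the sketch below computes three typical statistics directly from a series of account values. It is a simplified illustration, not the pyfolio implementation: the function name is hypothetical, the Sharpe ratio assumes a zero risk-free rate and 252 trading days per year, and the account values are made up.

```python
import math
import statistics

def backtest_summary(account_values, periods_per_year=252):
    """Cumulative return, annualized Sharpe ratio, and maximum drawdown."""
    # Period-over-period simple returns
    returns = [b / a - 1.0 for a, b in zip(account_values, account_values[1:])]
    cumulative = account_values[-1] / account_values[0] - 1.0
    mean, stdev = statistics.mean(returns), statistics.stdev(returns)
    sharpe = math.sqrt(periods_per_year) * mean / stdev if stdev > 0 else float("nan")
    # Maximum drawdown: the largest drop from a running peak, as a negative fraction
    peak, max_drawdown = account_values[0], 0.0
    for value in account_values:
        peak = max(peak, value)
        max_drawdown = min(max_drawdown, value / peak - 1.0)
    return {
        "cumulative_return": cumulative,
        "sharpe_ratio": sharpe,
        "max_drawdown": max_drawdown,
    }

values = [100_000, 102_000, 99_000, 101_000, 105_000]  # made-up account values
summary = backtest_summary(values)
print(summary)
```

Here the account ends 5% up, while the maximum drawdown captures the dip from 102,000 to 99,000 along the way.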


%matplotlib inline
BackTestPlot(account_value=df_account_value, baseline_ticker = 'AAPL',
             baseline_start = '2019-01-01', baseline_end = '2021-01-01')


We have discussed FinRL, a deep reinforcement learning (DRL) library designed specifically for automated stock trading and open-sourced for educational and demonstrative purposes. We have seen different training models, output metrics, and plots. The classes are very handy to use: if you want to use custom data, you just need to convert the dataset into the FinRL format, and you are good to go for training and evaluation.

The maximum drawdown in FinRL’s performance was largely due to the Covid-19 market crash; you can see the after-effect in Table 1 on page 8 of the official research paper here.

To learn more about FinRL, here are some of the resources you can follow:


Mohit Maithani
Mohit is a Data & Technology Enthusiast with good exposure to solving real-world problems in various avenues of IT and Deep learning domain. He believes in solving human's daily problems with the help of technology.
