Preface
First, let’s discuss the key terms, and then we will move on to the implementation part, where we code a starter project in stock market trading.
Reinforcement learning
Reinforcement learning is one of the three basic paradigms of machine learning, alongside supervised and unsupervised learning. It is concerned with how intelligent agents take actions on their own in order to maximize a notion of cumulative reward. It is essentially a trial-and-error approach.
It does not require labeled data, nor does it need sub-optimal actions to be corrected explicitly. Instead, it focuses on finding a balance between exploration (trying new actions) and exploitation (repeating actions that are known to work well).
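To make that trade-off concrete, here is a minimal, library-agnostic sketch of an epsilon-greedy action rule. This is only an illustration of the idea; the q_values array and the epsilon value are hypothetical and are not part of FinRL.

import numpy as np

def epsilon_greedy_action(q_values, epsilon, rng=np.random.default_rng()):
    """With probability epsilon pick a random action (exploration),
    otherwise pick the action with the highest estimated value (exploitation)."""
    if rng.random() < epsilon:
        return int(rng.integers(len(q_values)))   # explore: try something new
    return int(np.argmax(q_values))               # exploit: repeat what worked best

# Example: with epsilon = 0.1 the agent mostly exploits but still explores 10% of the time
action = epsilon_greedy_action(q_values=np.array([0.2, 0.5, 0.1]), epsilon=0.1)
print(action)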
Deep reinforcement learning (Deep RL/DRL)
Deep reinforcement learning (Deep RL) is a subfield of AI and machine learning that combines reinforcement learning (RL) with deep learning. It integrates deep neural networks into the solution, allowing agents to make decisions from unstructured data. For example, Deep RL algorithms can take every pixel of the screen in a video game as input and decide what actions to perform to maximize the game score.
Quantitative Finance
Quantitative finance refers to the use of mathematical models and extremely large datasets to analyze financial markets and securities.
Deep RL (DRL) has been recognized as one of the most effective approaches in quantitative finance for training a practical trading agent that decides where to trade, at what price, and in what quantity.
FinRL
FinRL is a deep reinforcement learning (DRL) library by AI4Finance-LLC (an open community promoting AI in finance) that enables beginners to perform quantitative financial analysis and develop their own custom stock trading strategies. FinRL is a beginner-friendly library with fine-tuned DRL algorithms, and its official research paper, “FinRL: A Deep Reinforcement Learning Library for Automated Stock Trading in Quantitative Finance”, discusses three primary principles:
- Completeness: The library fully covers all the major DRL frameworks.
- Hands-on tutorial: A detailed tutorial is provided with the library.
- Reproducibility: FinRL ensures transparency to provide users with confidence.
Architecture
The FinRL library follows a three-layer architecture: the stock market environment (environment layer), the DRL trading agent (agent layer), and the stock trading applications (application layer). The agent layer interacts with the environment layer in an exploration-exploitation manner, deciding whether to repeat previously rewarding actions or to try new actions in pursuit of greater rewards.
Each lower layer provides APIs to the layer above it, which makes the lower layer transparent to the upper layer.
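The agent-environment interaction at the heart of this architecture can be sketched in a few lines. This is a conceptual sketch only, not FinRL internals; env stands for any Gym-style environment (such as FinRL’s StockTradingEnv) and agent_policy is a hypothetical stand-in for a trained DRL policy.

def run_episode(env, agent_policy):
    # the environment layer exposes reset()/step() to the agent layer
    state = env.reset()
    done, total_reward = False, 0.0
    while not done:
        action = agent_policy(state)                  # agent layer decides on an action
        state, reward, done, info = env.step(action)  # environment layer returns next state and reward
        total_reward += reward
    return total_reward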
Deep Reinforcement Learning Agents
The FinRL library contains fine-tuned DRL algorithms, namely DQN, DDPG, Multi-Agent DDPG, PPO, SAC, A2C, and TD3. The library also allows users to design their own custom DRL algorithms by adapting these algorithms (e.g., Adaptive DDPG) or by employing ensemble methods. A comparison of these DRL algorithms can be found in the FinRL paper.
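As one illustration of an ensemble-style approach, you could train several agents on the same data and keep the one with the best risk-adjusted performance on a validation window. This is a sketch of the idea, not FinRL’s built-in ensemble strategy; the function names and inputs are hypothetical.

import numpy as np

def sharpe_ratio(daily_returns, periods_per_year=252):
    """Annualized Sharpe ratio of a series of daily returns (risk-free rate assumed 0)."""
    daily_returns = np.asarray(daily_returns)
    return np.sqrt(periods_per_year) * daily_returns.mean() / daily_returns.std()

def pick_best_agent(validation_returns_by_agent):
    """Given {agent_name: daily returns on a validation window}, return the name of the
    agent with the highest Sharpe ratio -- a simple ensemble-style model selection."""
    return max(validation_returns_by_agent,
               key=lambda name: sharpe_ratio(validation_returns_by_agent[name]))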
Installation
You can install the FinRL library using git or pip as follows:
git clone https://github.com/AI4Finance-LLC/FinRL-Library.git
Or you can install the latest (unstable) version using pip:
pip install git+https://github.com/AI4Finance-LLC/FinRL-Library.git
Installing Dependencies
pip install -r requirements.txt
For other dependency issues, or to use OpenAI Stable Baselines (high-quality implementations of DRL algorithms), refer to the FinRL repository and the Stable Baselines documentation.
Implementation
Deep Reinforcement Learning for Stock Trading from Scratch: Single Stock Trading
Let’s take an example that leverages the FinRL library with a coding implementation. We are going to use Apple Inc. stock (ticker: AAPL); the problem is to design an automated trading solution for single stock trading. First, we will model the stock trading process as a Markov Decision Process (MDP), and then we will formulate it as a reward-maximization problem.
There are four main components in a reinforcement learning setup, made concrete for stock trading in the short sketch after this list:
- Action
- Reward
- State
- Environment
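For single-stock trading these components can be made concrete roughly as follows. This is an illustrative sketch of the MDP; the exact state and reward definitions used by FinRL’s StockTradingEnv may differ in detail.

# Illustrative MDP for single-stock trading (a sketch, not FinRL's exact definitions):
#   state  s_t  = [cash balance, shares held, current price, technical indicators]
#   action a_t  = number of shares to buy (+) or sell (-), bounded by hmax
#   reward r_t  = change in total portfolio value between step t and step t+1

def portfolio_value(cash, shares, price):
    return cash + shares * price

def step_reward(cash_t, shares_t, price_t, cash_t1, shares_t1, price_t1):
    # reward is the change in total account value after taking the action
    return portfolio_value(cash_t1, shares_t1, price_t1) - portfolio_value(cash_t, shares_t, price_t)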
The Apple stock data is obtained from the Yahoo Finance API and contains Open-High-Low-Close prices and volume. We are going to use Google Colab for this demonstration, as it provides a free GPU for training and evaluation. The code below is adapted from the official FinRL GitHub repository.
Install and import packages
!pip install git+https://github.com/AI4Finance-LLC/FinRL-Library.git

# import packages; if any module is not installed, install it using pip
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
import matplotlib
matplotlib.use('Agg')
import datetime
import os

from finrl.config import config
from finrl.marketdata.yahoodownloader import YahooDownloader
from finrl.preprocessing.preprocessors import FeatureEngineer
from finrl.preprocessing.data import data_split
from finrl.env.env_stocktrading import StockTradingEnv
from finrl.model.models import DRLAgent
from finrl.trade.backtest import BackTestStats, BaselineStats, BackTestPlot

import sys
sys.path.append("../FinRL-Library")
Create folders for data, trained models, result metrics, and TensorBoard logs
if not os.path.exists("./" + config.DATA_SAVE_DIR):
    os.makedirs("./" + config.DATA_SAVE_DIR)
if not os.path.exists("./" + config.TRAINED_MODEL_DIR):
    os.makedirs("./" + config.TRAINED_MODEL_DIR)
if not os.path.exists("./" + config.TENSORBOARD_LOG_DIR):
    os.makedirs("./" + config.TENSORBOARD_LOG_DIR)
if not os.path.exists("./" + config.RESULTS_DIR):
    os.makedirs("./" + config.RESULTS_DIR)
Download Apple stock data using the Yahoo Finance API
Worry not, FinRL comes with a class named YahooDownloader that makes it easy to download stock data from the Yahoo Finance API.
data_df = YahooDownloader(start_date = '2009-01-01',
                          end_date = '2021-01-01',
                          ticker_list = ['AAPL']).fetch_data()
data_df.head()
Preprocessing
Let’s do some feature engineering and data cleaning. FinRL provides the FeatureEngineer class, which offers methods for preprocessing the stock price data and adding technical indicators.
## the technical indicator column names are stored in config.py
tech_indicator_list = config.TECHNICAL_INDICATORS_LIST

## you can add more technical indicators
## visit https://github.com/jealous/stockstats for different names
tech_indicator_list = tech_indicator_list + ['kdjk', 'open_2_sma', 'boll', 'close_10.0_le_5_c', 'wr_10', 'dma', 'trix']
print(tech_indicator_list)

# pass the parameters to FeatureEngineer to add the technical indicators
fe = FeatureEngineer(
    use_technical_indicator=True,
    tech_indicator_list = tech_indicator_list,
    use_turbulence=False,
    user_defined_feature = False)

data_df = fe.preprocess_data(data_df)
data_df.head()
Building the trading environment
This environment is based on the OpenAI Gym framework and simulates live stock markets using real market data. Let’s split the dataset into a training set (2009-01-01 to 2018-12-31) and a trading set (2019-01-01 to 2020-12-31).
train = data_split(data_df, start = '2009-01-01', end = '2019-01-01')
trade = data_split(data_df, start = '2019-01-01', end = '2021-01-01')
Initialize the environment
stock_dimension = len(train.tic.unique())
# state = [cash balance] + [price, shares held] per stock + technical indicators per stock
state_space = 1 + 2 * stock_dimension + len(config.TECHNICAL_INDICATORS_LIST) * stock_dimension
print(f"Stock data Dimensions: {stock_dimension}, State Spaces: {state_space}")

env_kwargs = {
    "hmax": 100,
    "initial_amount": 100000,
    "transaction_cost_pct": 0.001,
    "state_space": state_space,
    "stock_dim": stock_dimension,
    "tech_indicator_list": config.TECHNICAL_INDICATORS_LIST,
    "action_space": stock_dimension,
    "reward_scaling": 1e-4}

e_train_gym = StockTradingEnv(df = train, **env_kwargs)
env_train, _ = e_train_gym.get_sb_env()
print(type(env_train))
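Because the environment follows the standard Gym interface, you can also step through it manually to sanity-check the setup. This snippet is an optional illustration and assumes the classic Gym API (reset() returning an observation, step() returning observation, reward, done, info).

# Optional sanity check on the raw (non-vectorized) environment created above
obs = e_train_gym.reset()
random_action = e_train_gym.action_space.sample()            # a random buy/sell decision
obs, reward, done, info = e_train_gym.step(random_action)    # one simulated trading day
print(obs[:5], reward, done)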
Implement DRL Algorithms
The FinRL library uses fine-tuned algorithms such as DQN, DDPG, Multi-Agent DDPG, PPO, SAC, A2C, and TD3. The implementations of these DRL algorithms are based on OpenAI Gym and Stable Baselines.
agent = DRLAgent(env = env_train)
Training 5 different models
We are going to train 5 different models provided by FinRL: A2C, DDPG, PPO, TD3, and SAC.
1. Model: A2C
agent = DRLAgent(env = env_train)
A2C_PARAMS = {"n_steps": 5, "ent_coef": 0.005, "learning_rate": 0.0002}
model_a2c = agent.get_model(model_name="a2c", model_kwargs = A2C_PARAMS)
trained_a2c = agent.train_model(model=model_a2c,
                                tb_log_name='a2c',
                                total_timesteps=50000)
2. Model: DDPG
agent = DRLAgent(env = env_train)
DDPG_PARAMS = {"batch_size": 64, "buffer_size": 500000, "learning_rate": 0.0001}
model_ddpg = agent.get_model("ddpg", model_kwargs = DDPG_PARAMS)
trained_ddpg = agent.train_model(model=model_ddpg,
                                 tb_log_name='ddpg',
                                 total_timesteps=30000)
3. Model: PPO
agent = DRLAgent(env = env_train)
PPO_PARAMS = {
    "n_steps": 2048,
    "ent_coef": 0.005,
    "learning_rate": 0.0001,
    "batch_size": 128,
}
model_ppo = agent.get_model("ppo", model_kwargs = PPO_PARAMS)
trained_ppo = agent.train_model(model=model_ppo,
                                tb_log_name='ppo',
                                total_timesteps=80000)
4. Model: TD3
agent = DRLAgent(env = env_train)
TD3_PARAMS = {"batch_size": 128, "buffer_size": 1000000, "learning_rate": 0.0003}
model_td3 = agent.get_model("td3", model_kwargs = TD3_PARAMS)
trained_td3 = agent.train_model(model=model_td3,
                                tb_log_name='td3',
                                total_timesteps=30000)
5. Model: SAC
agent = DRLAgent(env = env_train)
SAC_PARAMS = {
    "batch_size": 128,
    "buffer_size": 100000,
    "learning_rate": 0.0001,
    "learning_starts": 100,
    "ent_coef": "auto_0.1",
}
model_sac = agent.get_model("sac", model_kwargs = SAC_PARAMS)
trained_sac = agent.train_model(model=model_sac,
                                tb_log_name='sac',
                                total_timesteps=30000)
Trading
We have trained five different models on our dataset; now let’s trade using the environment class we initialized above. Assume you have $100,000 of initial capital on 2019-01-01. We will use the trained TD3 model to trade AAPL.
trade.head()
Make a prediction and get the account value change
trade = data_split(data_df, start = '2019-01-01', end = '2021-01-01')
e_trade_gym = StockTradingEnv(df = trade, **env_kwargs)
env_trade, obs_trade = e_trade_gym.get_sb_env()

df_account_value, df_actions = DRLAgent.DRL_prediction(model=trained_td3,
                                                       test_data = trade,
                                                       test_env = env_trade,
                                                       test_obs = obs_trade)
Backtesting Performance
To evaluate the performance of our trading strategy, an automated backtesting tool is preferred because it reduces human error. We will use Quantopian’s pyfolio package to backtest our trading strategies.
print("==============Results===========") now = datetime.datetime.now().strftime('%Y%m%d-%Hh%M') perf_stats_all = BackTestStats(account_value=df_account_value) perf_stats_all = pd.DataFrame(perf_stats_all)
BackTestPlot
%matplotlib inline
BackTestPlot(account_value=df_account_value,
             baseline_ticker = 'AAPL',
             baseline_start = '2019-01-01',
             baseline_end = '2021-01-01')
Conclusion
We have discussed FinRL, a deep reinforcement learning (DRL) library designed specifically for automated stock trading and open-sourced for educational and demonstrative purposes. We have seen how to train different models and examined the output metrics and plots. The classes are very handy to use: if you want to use custom data, you just need to convert your dataset into the FinRL format, and you are good to go for training and evaluation.
The maximum drawdown in FinRL’s performance was largely due to the Covid-19 market crash; you can see its effect in Table 1 (page 8) of the official research paper.
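If you want to inspect that drawdown yourself, here is a small, self-contained sketch that computes the maximum drawdown from the account-value series produced above. It assumes df_account_value has an 'account_value' column, as used elsewhere in this tutorial; this is an illustration, not a FinRL utility.

import pandas as pd

def max_drawdown(account_values):
    values = pd.Series(account_values, dtype=float)
    running_peak = values.cummax()            # highest account value seen so far
    drawdown = values / running_peak - 1.0    # fractional drop from that peak
    return drawdown.min()                     # most negative drop = maximum drawdown

print(f"Maximum drawdown: {max_drawdown(df_account_value['account_value']):.2%}")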
To learn more about FinRL, refer to the official GitHub repository and the research paper.