Reinforcement Learning In Finance – A Newbie In Portfolio Selection And Allocation


Financial use cases of reinforcement learning exist, but there are still very few of them. One such use case is portfolio management. Markowitz models came first, followed by Black-Litterman models, and now, with newer algorithms and more computing power, reinforcement learning is finding its place in the financial arena.

Portfolio selection and allocation have largely been manual tasks. With reinforcement learning they can be automated: the system proposes a portfolio that aims to maximize returns.

Reinforcement learning (RL) is a branch of machine learning in which an agent takes actions in an environment to maximize a cumulative reward. It is one of the main branches of machine learning, alongside supervised and unsupervised learning. It has several components: agent, state, policy, value function, environment and rewards/returns.

The agent is in a particular state and follows a policy to maximize its rewards in the environment. After each action, the agent is rewarded or penalized depending on whether the action aligns with the objective. Reinforcement learning is most familiar from games, where the player is the agent and the game simulation is the environment. The goal of a game could be to score the maximum number of points or to reach a destination as quickly as possible.

This objective of maximizing rewards matches the goals of games, and it matches finance and investing just as well, since they share the same goal. Reinforcement learning agents for chess, Atari, Go and many similar games are built on the same principles.
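To make the agent, policy, environment and reward vocabulary concrete, here is a minimal sketch of the interaction loop on a toy two-action environment. It is not taken from the experiment in this article: the epsilon-greedy policy, the payoff rates and the name `run_bandit` are illustrative assumptions.

```python
import random

def run_bandit(steps=2000, epsilon=0.1, seed=0):
    """Epsilon-greedy agent on a toy two-action environment (illustrative only)."""
    rng = random.Random(seed)
    true_means = [0.2, 0.8]   # assumption: payoff rates hidden inside the environment
    value = [0.0, 0.0]        # agent's running estimate of each action's reward
    counts = [0, 0]           # how often each action was taken
    total_reward = 0.0
    for _ in range(steps):
        # Policy: usually exploit the best-looking action, sometimes explore.
        if rng.random() < epsilon:
            action = rng.randrange(2)
        else:
            action = 0 if value[0] >= value[1] else 1
        # Environment: pays a reward of 1 or 0 for the chosen action.
        reward = 1.0 if rng.random() < true_means[action] else 0.0
        # Learning: move the estimate for this action toward the observed reward.
        counts[action] += 1
        value[action] += (reward - value[action]) / counts[action]
        total_reward += reward
    return value, counts, total_reward
```

Over many steps the agent ends up choosing the better-paying action most of the time, which is the same reward-maximizing behaviour a portfolio agent is trained for.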

Deep Reinforcement Learning in Obtaining Maximum Return from Stocks

Deep reinforcement learning policies can be applied to portfolio selection. I ran an experiment to find a portfolio of stocks that gives the maximum return.

The dataset consists of a set of stocks and their basic OHLC (open, high, low, close) data. The stocks can be picked from any index that covers a good mix, such as the Sensex or the Nifty. The portfolio should be built so that it mixes low-beta stocks with high-beta stocks.
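Beta is conventionally the covariance of a stock's returns with the index's returns, divided by the variance of the index's returns. The sketch below computes it from closing prices. The function names and the use of simple period-over-period returns are assumptions made for illustration, not details from the experiment.

```python
def returns(prices):
    """Simple period-over-period returns from a list of closing prices."""
    return [prices[i] / prices[i - 1] - 1.0 for i in range(1, len(prices))]

def beta(stock_close, index_close):
    """Beta = cov(stock returns, index returns) / var(index returns)."""
    rs, rm = returns(stock_close), returns(index_close)
    mean_s = sum(rs) / len(rs)
    mean_m = sum(rm) / len(rm)
    cov = sum((a - mean_s) * (b - mean_m) for a, b in zip(rs, rm))
    var = sum((b - mean_m) ** 2 for b in rm)
    return cov / var
```

A stock that moves exactly with the index has a beta of 1, while one whose returns are twice the index's has a beta of 2.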

Clustering is a good way to separate the low-beta stocks from the high-beta ones. Once the clusters are obtained, build different combinations of stocks drawn from them. After stock selection, apply the deep policy network reinforcement learning algorithm to each of those combinations.
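One way to sketch this step is to cluster the stocks on their beta values and then enumerate the candidate portfolios. The article does not say which clustering method was used, so the one-dimensional two-means split below is an assumption, as are the names `split_by_beta` and `candidate_portfolios` and the idea of fixing how many stocks come from each cluster.

```python
from itertools import combinations

def split_by_beta(betas, iters=50):
    """1-D two-means over beta values; returns (low_tickers, high_tickers)."""
    values = list(betas.values())
    lo, hi = min(values), max(values)   # initial cluster centres
    if lo == hi:
        raise ValueError("need at least two distinct beta values")
    for _ in range(iters):
        # Assign each ticker to the nearer centre.
        low = [t for t, b in betas.items() if abs(b - lo) <= abs(b - hi)]
        high = [t for t in betas if t not in low]
        # Recompute the centres as cluster means.
        new_lo = sum(betas[t] for t in low) / len(low)
        new_hi = sum(betas[t] for t in high) / len(high)
        if (new_lo, new_hi) == (lo, hi):
            break
        lo, hi = new_lo, new_hi
    return low, high

def candidate_portfolios(betas, n_low, n_high):
    """Every portfolio with n_low low-beta and n_high high-beta tickers."""
    low, high = split_by_beta(betas)
    return [lo_pick + hi_pick
            for lo_pick in combinations(low, n_low)
            for hi_pick in combinations(high, n_high)]
```

The number of combinations grows very quickly with the size of the stock universe, so in practice the candidate list would need to be sampled or capped.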

Setting up parameters to implement Reinforcement Learning

Let us define the reinforcement learning environment first. The agent is set up with the usual conventions:

  1. The state consists of the input price data and the previous portfolio weights
  2. The action consists of the investment weights
  3. The reward function is based on the agent's return relative to a baseline return, along with any other proportional return terms
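A minimal sketch of one reward of this kind: the log-return of the agent's portfolio over a period minus the log-return of a baseline. The equal-weight baseline, the use of log-returns and the name `step_reward` are assumptions. The article does not spell out the exact formula, and any additional proportional terms are left out.

```python
import math

def step_reward(weights, price_relatives, baseline_weights=None):
    """Agent's log-return minus a baseline's log-return for one period.

    weights:         action, i.e. the fraction of the portfolio in each asset
    price_relatives: end-of-period price / start-of-period price per asset
    """
    n = len(weights)
    if baseline_weights is None:
        baseline_weights = [1.0 / n] * n   # assumption: equal-weight baseline
    agent_growth = sum(w * y for w, y in zip(weights, price_relatives))
    base_growth = sum(w * y for w, y in zip(baseline_weights, price_relatives))
    return math.log(agent_growth) - math.log(base_growth)
```

With this definition the agent earns a positive reward only when its weights beat the baseline over the period, and zero when it simply copies the baseline.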

Once the parameters are set, the deep reinforcement learning architecture is implemented. It consists of neural network layers that map the state to portfolio weights, which are trained to maximize returns. Four convolution layers can be used. The input to the architecture is the OHLC data for each stock over 50 historical time periods.
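The input described above can be pictured as a 3-D block of numbers: 4 features by number of stocks by 50 time periods. The sketch below assembles it with plain Python lists. Dividing every price by the stock's latest close is an assumption (a common normalization for price inputs), and `build_state` is a hypothetical helper name.

```python
def build_state(ohlc_by_stock, window=50):
    """Build a [feature][stock][time] block from per-stock OHLC history.

    ohlc_by_stock: one list per stock of (open, high, low, close) rows,
                   ordered oldest first.
    """
    state = [[] for _ in range(4)]           # one plane per OHLC feature
    for rows in ohlc_by_stock:
        recent = rows[-window:]               # keep the last `window` periods
        if len(recent) < window:
            raise ValueError("not enough history for this stock")
        last_close = recent[-1][3]
        for f in range(4):
            # Assumption: scale prices by the latest close so stocks are comparable.
            state[f].append([row[f] / last_close for row in recent])
    return state
```

For a 10-stock portfolio this gives a 4 x 10 x 50 block, which is the kind of input a stack of convolution layers can slide over along the time axis.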

Various mathematical operations are performed within the neural network, and a cash bias is added to the last layer so that cash is represented alongside the stocks and the allocation stays balanced. A softmax activation function is used in the last layer. At the end of each iteration, the output is the current state and the immediate reward. When the RL agent trains this model, the portfolio weights are displayed at the end of each iteration. When this policy is run on the different stock combinations, the portfolio with the highest reward is selected. That portfolio can then be used for investing.
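The last step can be sketched as follows: the network's per-stock scores get a cash entry prepended, and a softmax turns them into weights that are positive and sum to one. The convolution layers that produce the scores are omitted, and `weights_with_cash` and its default bias of 0.0 are illustrative assumptions.

```python
import math

def weights_with_cash(scores, cash_bias=0.0):
    """Softmax over [cash_bias] + per-stock scores; index 0 is the cash weight."""
    logits = [cash_bias] + list(scores)
    m = max(logits)                               # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]
```

For 10 stocks the result has 11 entries. When every score equals the cash bias, each entry is 1/11, and a higher score shifts weight toward that stock.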

In my experiment, I used a dataset of 150 stocks obtained from Yahoo Finance, with four features per stock: the OHLC values. I tried various clustering options to get different combinations of stocks; stock selection plays a very important role here. Each portfolio consisted of 10 stocks, and the investment budget was 10000.

After applying the deep reinforcement learning algorithm to the stocks, the best portfolio reached a value of around 11993, a return of roughly 20% on the budget, which is quite good for the stock market. Because the approach worked here with comparatively little data (only OHLC prices), it is an option worth considering when building a portfolio across diverse stocks and domains.
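The return figure follows directly from the budget and the final portfolio value quoted above. Only the variable names in this check are new.

```python
budget = 10_000        # initial investment from the experiment
final_value = 11_993   # approximate best portfolio value reported above

simple_return = final_value / budget - 1.0
print(f"{simple_return:.2%}")   # prints 19.93%
```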

This can become an automated utility: feed it a dataset with a good number of stocks, and once the stock selection is made it returns the portfolio weights. Reinforcement learning is a newcomer to finance and has not yet been used to its full potential. Further experiments should highlight more of its benefits in the finance domain and could make portfolio management much easier.


This article is presented by AIM Expert Network (AEN), an invite-only thought leadership platform for tech experts.

Copyright Analytics India Magazine Pvt Ltd