Reinforcement Learning In Finance – A Newbie In Portfolio Selection And Allocation

Ever heard about financial use cases of reinforcement learning, yes but very few. One such use case of reinforcement learning is in portfolio management. Earlier Markowitz models were used, then came the Black Litterman models but now with the advent of technology and new algorithms, reinforcement learning finds its place in the financial arena.

Portfolio selection and allocation have been a manual task majorly. Using reinforcement learning, the task of portfolio selection and allocation can be automated wherein the system will provide you with an optimum portfolio which will most likely give you maximum returns.

Reinforcement learning (RL) is a branch of Machine Learning where actions are taken in an environment to maximize the notion of a cumulative reward. It is one of the very important branches along with supervised learning and unsupervised learning. Reinforcement learning consists of several components – agent, state, policy, value function, environment and rewards/returns.

Subscribe to our Newsletter

Join our editors every weekday evening as they steer you through the most significant news of the day, introduce you to fresh perspectives, and provide unexpected moments of joy
Your newsletter subscriptions are subject to AIM Privacy Policy and Terms and Conditions.

So, the agent is in a particular state and follows some policies to maximize the rewards in any environment. Depending on the actions the agent performs, the agent is either penalized or rewarded depending on his actions align with the objectives. We have always seen reinforcement learning applications in game theory where the player is the agent and simulation of the game works around the environment. The goal of a game could be to win maximum points or reach its destination at the earliest.

The objective of reinforcement learning of maximizing rewards is in line with game goals. Similarly, it can be applied in finance as well as investments which are based on the same goal of maximizing rewards. Chess, Atari, Go and many other similar games use reinforcement learning and are based on the same principles. 

Deep Reinforcement Learning in Obtaining Maximum Return from Stocks

Deep reinforcement learning policies can be applied for portfolio selection methods. I have performed an experiment for obtaining a portfolio of stocks that will give maximum returns. 

Some stocks and their basic OHLC data will form your dataset. This data for the various stocks can be picked up from any particular index which covers a good mix of stocks like Sensex, Nifty etc. The stock portfolio should be created in such a way that it has a mix of low beta value stocks as well as high beta value stocks.

Clustering would be a good option to obtain the different low beta and high beta valued stocks. Once the different clusters are obtained, try to create different combinations of the stocks using permutations and combinations. After the stock selection is done, apply the deep policy network reinforcement learning algorithm on each of those combinations. 

Setting up parameters to implement Reinforcement Learning

Let us define the reinforcement learning environment first. The agent will have parameters set up for the usual conventions – 

  1. The state will be the inputs and previous portfolio weights
  2. The action will consist of the investment weights
  3. The reward function will be based on the agent’s return – the baseline return and any other proportional returns 

Once the parameters are set, the deep reinforcement learning architecture is implemented. The architecture consists of neural network layers that will perform some calculations and provide us with the maximum returns. Four convolution layers can be used for implementing the algorithm, the input to the architecture would be the OHLC data for each of the stocks and 50 time periods of historical data.

Various mathematical operations are performed within the neural network and a cash bias is added to the last layer to make it balanced. A softmax activation function is used in the last layer. The output would be the current state, instant reward at the end of each iteration. When this model is trained by the RL agent, the portfolio weights are cumulatively displayed at the end of each iteration. So when the different combinations are used along with this policy, one portfolio is obtained which gives you the maximum rewards. This portfolio can be used for investments and for higher returns.

In the experiment that I conducted, I had taken a dataset of 150 stocks where the data was obtained from yahoo finance and 4 features namely, the OHLC data was considered. I have used various clustering options for getting different combinations of stocks. Stock selection plays a very important role here. I used a portfolio that would consist of 10 stocks and I had provided the budget of investment to be 10000.

After applying the deep reinforcement learning algorithm on the stocks, I obtained the best value of around 11993. A return of 20% is quite a good return considering the stock market. As RL requires very lesser data in order to predict future values, it is a good option to be considered when developing a portfolio across diverse stocks and domains.

This can become an automated utility that just needs to be fed with a dataset of a good number of stocks and will give you the portfolio weights after the stock selection is made. Reinforcement learning is a newbie and has not been utilized to its full potential. Better experiments will highlight more benefits of it in the finance domain and can make life much easier.


This article is presented by AIM Expert Network (AEN), an invite-only thought leadership platform for tech experts. Check your eligibility.

Ekta Shah
An engineer at the core, data science is my passion. I have a Masters in Data Science from NMIMS. I have worked on machine learning problems, image classification and reinforcement learning problems. Solving complex problems and thinking of easy solutions is what I practice. Avid reader and writer describe me the best.

Download our Mobile App

MachineHack | AI Hackathons, Coding & Learning

Host Hackathons & Recruit Great Data Talent!

AIMResearch Pioneering advanced AI market research

With a decade of experience under our belt, we are transforming how businesses use AI & data-driven insights to succeed.

The Gold Standard for Recognizing Excellence in Data Science and Tech Workplaces

With Best Firm Certification, you can effortlessly delve into the minds of your employees, unveil invaluable perspectives, and gain distinguished acclaim for fostering an exceptional company culture.

AIM Leaders Council

World’s Biggest Community Exclusively For Senior Executives In Data Science And Analytics.

3 Ways to Join our Community

Telegram group

Discover special offers, top stories, upcoming events, and more.

Discord Server

Stay Connected with a larger ecosystem of data science and ML Professionals

Subscribe to our Daily newsletter

Get our daily awesome stories & videos in your inbox
MOST POPULAR