Reinforcement Learning For Better Recommender Systems

Collaborative Interactive Recommenders (CIRs) are a class of recommender systems that emerged from the need to make recommendations user-specific. The growth of online services pushed service providers to up their game by developing strategies to maximise customer engagement.

However, these systems are machine learning models, and their results are only as good as the data they are fed. Because human behaviour is the product of complex systems, decisions are usually made in a large, hazy grey area. This makes human-like intelligence in machines a distant dream. That said, algorithms are being designed to come closer to it.

In an attempt to make better decisions and recommendations, ML developers from Google merged reinforcement learning and recommender systems.

The next generation of recommenders is expected to be modelled around sequential user interaction, optimising for users’ long-term engagement and overall satisfaction. Modelling the dynamics of user interaction is therefore central to devising good algorithms and modelling techniques for CIRs. Setting aside questions of user interface design and natural language interaction, this makes CIRs a natural setting for the use of reinforcement learning (RL).

To that end, the researchers have built a general-purpose simulation platform, dubbed RecSim, to facilitate the study of reinforcement learning algorithms in recommender systems. RecSim has also been open-sourced.

RecSim As A Platform

RecSim allows both researchers and practitioners to test the limits of existing RL methods in synthetic recommender settings.

RecSim simulates a recommender agent’s interaction with an environment, in which the agent acts by making recommendations to users. Both the users and the items being recommended are simulated.

The simulations are based on traits such as popularity, interests, demographics and frequency.
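To make the agent-environment loop concrete, here is a minimal sketch in plain Python. It does not use RecSim's API; the names `Document`, `User`, `EpsilonGreedyAgent` and `run_episode`, the 0.7/0.3 weighting of interest and popularity, and the epsilon-greedy strategy are all assumptions made for illustration.

```python
import random
from dataclasses import dataclass


@dataclass
class Document:
    doc_id: int
    topic: int
    popularity: float  # hypothetical trait in [0, 1]


@dataclass
class User:
    interests: list  # hypothetical per-topic affinity in [0, 1]

    def respond(self, doc, rng):
        """Simulated response: 1 for a click, 0 for a skip."""
        # Assumed click probability: a blend of interest and popularity.
        p_click = 0.7 * self.interests[doc.topic] + 0.3 * doc.popularity
        return 1 if rng.random() < p_click else 0


class EpsilonGreedyAgent:
    """Toy agent that tracks the average reward observed per topic."""

    def __init__(self, num_topics, epsilon=0.1):
        self.epsilon = epsilon
        self.counts = [0] * num_topics
        self.values = [0.0] * num_topics

    def recommend(self, docs, rng):
        # Explore with probability epsilon, otherwise pick the best topic so far.
        if rng.random() < self.epsilon:
            return rng.choice(docs)
        return max(docs, key=lambda d: self.values[d.topic])

    def update(self, doc, reward):
        # Incremental mean of the rewards seen for this document's topic.
        t = doc.topic
        self.counts[t] += 1
        self.values[t] += (reward - self.values[t]) / self.counts[t]


def run_episode(agent, user, docs, steps, rng):
    """Run the recommend -> respond -> update loop and return total clicks."""
    total = 0
    for _ in range(steps):
        doc = agent.recommend(docs, rng)
        reward = user.respond(doc, rng)
        agent.update(doc, reward)
        total += reward
    return total
```

Because the user and the documents are both simulated, an experiment like this can be rerun with a fixed seed (for example `random.Random(0)`) and with different assumed user traits, without touching a live system.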

So how is this different from conventional approaches?

When an RL agent recommends something to a user, certain traits are scored higher depending on whether the user accepts the recommendation. This still sounds like a typical recommender system. With RecSim, however, a developer can author these traits. The features in a user choice model can be customised further, with the agent being rewarded for making the right recommendation.
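One common way to express a user choice model with authorable features is a multinomial logit over the recommended slate. The sketch below assumes that form; it is an illustration, not RecSim's implementation, and the function names, the dot-product scoring and the `no_click_score` parameter are hypothetical.

```python
import math
import random


def choice_probabilities(user_interests, slate, no_click_score=0.0):
    """Return selection probabilities for each item in the slate.

    Each item is a feature vector; its score is the dot product with the
    user's interest vector. The last probability is for choosing nothing.
    """
    scores = [sum(u * f for u, f in zip(user_interests, doc)) for doc in slate]
    scores.append(no_click_score)
    # Softmax, shifted by the maximum score for numerical stability.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    norm = sum(exps)
    return [e / norm for e in exps]


def sample_choice(probs, rng):
    """Sample an index; len(probs) - 1 means the user chose nothing."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]
```

Changing the feature vectors or the scoring function changes how the simulated user behaves, which is the sense in which a developer can author the traits.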

The team behind RecSim at Google believes that this simulation platform can be used to test algorithm performance and robustness to different assumptions about user behaviour.

RecSim was created to facilitate the following:

  • Investigate the intersection of RL and recommender systems;
  • Encourage reproducibility and model-sharing;
  • Rapidly test and refine models and algorithms in simulation, before incurring the potential cost of live experiments; and
  • Ease academic-industry collaboration through the release of “realistic” stylised models of user behaviour without revealing user data or sensitive industry strategies.

RecSim aims to support simulations that mimic the user behaviour found in real recommender systems, and to serve as a controlled environment for developing and assessing recommender models and algorithms, especially reinforcement learning systems designed for sequential user-system interaction.

Future Direction

The Google researchers see a promising future in this investigation of RL-based recommenders, and plan to pursue the following:

  • Develop methodologies to fit stylised user models to usage logs, to partially address the gap between reality and simulation;
  • Develop APIs using TensorFlow to facilitate model specification and learning;
  • Scale up simulation and inference algorithms using accelerators and distributed execution; and
  • Establish mixed-mode interaction models that will be the de facto standard for modern CIRs.

They posit in their work that modern collaborative interactive recommenders will cover a variety of system actions (e.g., preference elicitation, providing endorsements, navigation chips) and user responses (e.g., example critiquing, indirect/direct feedback, query refinements), not to mention unstructured natural language interaction.

Researchers hope that RecSim will serve as a valuable resource that bridges the gap between recommender systems and RL research.

Ram Sagar
I have a master's degree in Robotics and I write about machine learning advancements.
