Last updated September 9, 2020

Reinforcement Learning For Better Recommender Systems

Published on November 21, 2019

by Ram Sagar

Collaborative Interactive Recommenders (CIRs) are a class of recommender systems that emerged from the need to make recommendations user-specific. The growth of online services has pushed service providers to up their game by developing strategies that maximise customer engagement.

However, these systems rely on machine learning, and their results are only as good as the data they are fed. Human decisions are the product of complex systems and are usually made in a large, hazy grey area, which makes human-like intelligence in machines a distant dream. That said, algorithms are being designed to come closer to it.

In an attempt to make better decisions and recommendations, ML developers from Google merged reinforcement learning and recommender systems.

The next generation of recommenders is expected to be modelled around sequential user interaction, optimising for users’ long-term engagement and overall satisfaction. Good algorithmic and modelling techniques for CIRs therefore need to capture the dynamics of user interaction. Setting aside questions of user interface design and natural language interaction, this makes CIRs a natural setting for the use of reinforcement learning (RL).

So the researchers have built a general-purpose simulation platform, dubbed RecSim, to facilitate the study of reinforcement learning algorithms in recommender systems. RecSim has also been open-sourced.

RecSim As A Platform

RecSim allows both researchers and practitioners to test the limits of existing RL methods in synthetic recommender settings.

RecSim simulates a recommender agent’s interaction with an environment in which the agent acts by making recommendations to users. Both the users and the items being recommended are simulated.

The simulations are done based on popularity, interests, demographics, frequency and other traits.
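
To make the agent-environment loop concrete, here is a minimal sketch in plain Python. It does not use RecSim's own API; the class names, the click rule and the epsilon-greedy agent (a simple bandit standing in for a full RL agent) are all assumptions made for illustration.

```python
import random


class SimulatedUser:
    """Hypothetical user with a hidden interest level for each topic."""

    def __init__(self, num_topics, rng):
        self.interests = [rng.random() for _ in range(num_topics)]

    def respond(self, topic, rng):
        # Assumed click rule: click with probability equal to the interest.
        return 1 if rng.random() < self.interests[topic] else 0


class EpsilonGreedyAgent:
    """Simple stand-in for an RL agent: tracks the average reward per topic."""

    def __init__(self, num_topics, epsilon, rng):
        self.epsilon = epsilon
        self.rng = rng
        self.counts = [0] * num_topics
        self.values = [0.0] * num_topics

    def recommend(self):
        # Explore a random topic with probability epsilon, else exploit.
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.values))
        return max(range(len(self.values)), key=lambda t: self.values[t])

    def update(self, topic, reward):
        # Incremental running mean of the rewards observed for this topic.
        self.counts[topic] += 1
        self.values[topic] += (reward - self.values[topic]) / self.counts[topic]


def run_episode(num_topics=5, steps=200, epsilon=0.1, seed=0):
    """Run one simulated session and return (total clicks, trained agent)."""
    rng = random.Random(seed)
    user = SimulatedUser(num_topics, rng)
    agent = EpsilonGreedyAgent(num_topics, epsilon, rng)
    total_reward = 0
    for _ in range(steps):
        topic = agent.recommend()          # agent acts
        reward = user.respond(topic, rng)  # simulated user reacts
        agent.update(topic, reward)        # agent learns from the feedback
        total_reward += reward
    return total_reward, agent


if __name__ == "__main__":
    clicks, _ = run_episode()
    print("clicks in simulated session:", clicks)
```

Because both sides of the interaction are simulated, the same session can be replayed with a fixed seed, which is what makes this kind of setup convenient for comparing agents.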

So, how is this different from conventional approaches?

When an RL agent recommends something to a user, some traits are scored higher depending on whether the user accepts the recommendation. This still sounds like a typical recommendation system. However, with RecSim, a developer can author these traits. The features in a user choice model can be customised further, while the agent is rewarded for making the right recommendation.
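
As an illustration of what an authored user choice model might look like, the sketch below uses a multinomial logit (softmax) rule over developer-defined features. The article does not say which choice models RecSim ships with, so the function names, the feature vectors and the "no click" option here are assumptions, not RecSim's API.

```python
import math
import random


def choice_probabilities(user_interests, slate, no_click_score=0.0):
    """Return click probabilities for each slate item plus a final 'no click'.

    Each item is scored by the dot product of the user's interest vector and
    the item's feature vector; a softmax turns scores into probabilities.
    """
    scores = [sum(u * d for u, d in zip(user_interests, doc)) for doc in slate]
    scores.append(no_click_score)  # last entry represents choosing nothing
    top = max(scores)              # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def choose(user_interests, slate, rng, no_click_score=0.0):
    """Sample the user's choice; returns an item index, or None for no click."""
    probs = choice_probabilities(user_interests, slate, no_click_score)
    index = rng.choices(range(len(probs)), weights=probs, k=1)[0]
    return None if index == len(slate) else index


if __name__ == "__main__":
    # Hypothetical two-feature setup: the user only cares about feature 0.
    user = [1.0, 0.0]
    slate = [[1.0, 0.0], [0.0, 1.0]]
    print(choice_probabilities(user, slate))
    print(choose(user, slate, random.Random(0)))
```

Changing the feature vectors or the scoring rule changes the assumed user behaviour, which is the kind of knob a simulation platform lets a developer turn before testing an agent against it.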

The team behind RecSim at Google believes that this simulation platform can be used to test algorithm performance and robustness to different assumptions about user behaviour.

RecSim was created to facilitate the following:

Investigate the intersection of RL and recommender systems;
Encourage reproducibility and model-sharing;
Rapidly test and refine models and algorithms in simulation, before incurring the potential cost of live experiments; and
Ease academic-industry collaboration through the release of “realistic” stylised models of user behaviour without revealing user data or sensitive industry strategies.

RecSim aims to support simulations that mimic the user behaviour found in real recommender systems, and to serve as a controlled environment for developing and assessing recommender models and algorithms, especially reinforcement learning systems designed for sequential user-system interaction.

Future Direction

As Google researchers see a promising future in pursuing this reinforcement-recommender model investigation, they plan to develop the following add-ons:

Develop methodologies to fit stylised user models to usage logs, to partially address the gap between reality and simulation;
Develop APIs using TensorFlow to facilitate model specification and learning;
Scale up simulation and inference algorithms using accelerators and distributed execution; and
Establish mixed-mode interaction models that will be the de facto standard for modern CIRs.

They posit in their work that modern collaborative interactive recommenders will cover a variety of system actions (such as preference elicitation, providing endorsements and navigation chips) and user responses (e.g., example critiquing, indirect/direct feedback, query refinements), not to mention unstructured natural language interaction.

Researchers hope that RecSim will serve as a valuable resource that bridges the gap between recommender systems and RL research.

