
A Guide to Building Hybrid Recommendation Systems for Beginners

A hybrid recommendation system is a type of recommendation system that combines content-based and collaborative filtering methods.


Recommendation systems are widely used in a variety of applications to recommend products or items to users. The two most popular filtering methods are content-based filtering and collaborative filtering. Both struggle when there is not enough data to learn the relationship between users and items. In such cases a third approach, the hybrid recommendation system, is used. It aims to overcome the limitations of both content-based and collaborative filtering. In this article, we will discuss hybrid recommendation systems in detail and learn how to build one with LightFM, a Python library. The major points to be covered are listed below.

Table of Contents

  1. Hybrid Recommendation System
  2. Types of Data for Generating a Recommendation System
  3. Losses used by Recommendation Systems
    1. Bayesian Personalised Ranking (BPR)
    2. Weighted Approximate-Rank Pairwise (WARP)
  4. Implementing a Hybrid Recommendation System
    1. Fitting the Model with BPR Loss
    2. Fitting the Model with WARP Loss

Hybrid Recommendation System

A hybrid recommendation system is a type of recommendation system that combines content-based and collaborative filtering. Combining the two can help overcome the shortcomings of using either on its own and can be more effective in some cases. Hybrid approaches can be implemented in several ways: for example, by generating predictions with content-based and collaborative methods separately and then combining them, or by adding collaborative capabilities to a content-based approach (and vice versa).
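
The first strategy, generating predictions separately and then combining them, can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the function names, the mixing weight alpha, and all scores are made up for the example and do not come from LightFM or any other library.

```python
def blend_scores(content_scores, collab_scores, alpha=0.5):
    """Weighted hybrid: combine two score dictionaries keyed by item id.

    alpha is a hypothetical mixing weight: 1.0 trusts only the content-based
    scores, 0.0 trusts only the collaborative scores. An item missing from
    one model contributes a score of 0.0 from that model.
    """
    items = set(content_scores) | set(collab_scores)
    return {
        item: alpha * content_scores.get(item, 0.0)
        + (1.0 - alpha) * collab_scores.get(item, 0.0)
        for item in items
    }


def top_n(scores, n=2):
    """Return the n item ids with the highest blended score."""
    return sorted(scores, key=scores.get, reverse=True)[:n]


# Made-up scores for one user. "movie_c" is new, so the collaborative model
# has no score for it and only the content-based model can rank it.
content_scores = {"movie_a": 0.2, "movie_b": 0.6, "movie_c": 0.9}
collab_scores = {"movie_a": 0.8, "movie_b": 0.5}

blended = blend_scores(content_scores, collab_scores, alpha=0.5)
print(top_n(blended, n=2))
```

Raising alpha gives the content-based scores more influence, which is one way such a blend can still surface items that have no interaction history yet.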

Several studies comparing conventional approaches with hybrid methods report that hybrid methods can produce more accurate recommendations.

Types of Data for Generating a Recommendation System

Depending on the approach, the data used to build a recommendation system can be divided into two types:

  • Explicit Feedback: data in which the user states a preference directly, such as a rating that tells us whether they liked the item or not.
  • Implicit Feedback: data that is not a rating or score given by the user but a record of behaviour, such as clicks, watched movies, or played songs.

In this article we build a recommendation system on implicit feedback, so it is worth explaining why implicit feedback matters. Consider a recommender based on explicit feedback, for example one that recommends books based on user ratings. Such a recommender focuses on the rating values but ignores which books a user chose to read in the first place, and when ratings are not available it has no information to work with.

Information such as which books nobody has chosen, or which books most people have chosen, can be a valuable signal for a recommendation system.

Missing ratings are also more likely to be negative, precisely because users choose which items to rate and leave the rest blank. In other words, we tend not to rate the things we do not expect to like.

These observations motivate a model that works with implicit feedback. LightFM is a library that can help us build a recommendation system on implicit feedback.
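
To make the idea concrete, here is a minimal sketch of turning explicit ratings into implicit feedback. The helper names and the tiny book data set are hypothetical and exist only for this example. The rating value is thrown away and only the fact that a user chose an item is kept.

```python
def to_implicit(ratings):
    """Turn (user, item, rating) triples into per-user sets of chosen items.

    The rating value is discarded: the fact that a user chose to rate an
    item at all is treated as the positive signal.
    """
    interactions = {}
    for user, item, _rating in ratings:
        interactions.setdefault(user, set()).add(item)
    return interactions


def item_popularity(interactions):
    """Count how many users interacted with each item."""
    counts = {}
    for items in interactions.values():
        for item in items:
            counts[item] = counts.get(item, 0) + 1
    return counts


def never_chosen(catalogue, interactions):
    """List catalogue items that no user has interacted with."""
    seen = set().union(*interactions.values()) if interactions else set()
    return [item for item in catalogue if item not in seen]


# Made-up explicit ratings: (user, book, rating from 1 to 5).
ratings = [("u1", "book_a", 5), ("u1", "book_b", 1), ("u2", "book_a", 4)]
catalogue = ["book_a", "book_b", "book_c"]

interactions = to_implicit(ratings)
print(item_popularity(interactions))
print(never_chosen(catalogue, interactions))
```

Note that the one-star rating of book_b still counts as an interaction here. Whether low ratings should be kept as positives is a design choice that depends on the application.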

Losses used by Recommendation Systems

In this article we consider two loss functions for building the recommendation system:

  • Bayesian Personalised Ranking (BPR) pairwise loss: useful when only positive interactions are present and we want to optimize ROC AUC. It tries to maximize the difference between the predicted score of a positive example and that of a randomly selected negative example.
  • Weighted Approximate-Rank Pairwise (WARP) loss: useful when only positive interactions are present and we want to optimize the top of the recommendation list (precision at k). It repeatedly samples negative examples until it finds one that violates the ranking, that is, one scored too close to or above the positive example, and then updates the model, which pushes positive items towards the top of the list.
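
The two losses can be illustrated with a small sketch in plain Python. It is a simplified illustration, not LightFM's internal implementation: the function names, the margin, the max_trials cap, and the logarithmic rank weight are assumptions chosen for the example.

```python
import math
import random


def bpr_loss(pos_score, neg_score):
    """BPR loss for one (positive, negative) pair: -log(sigmoid(pos - neg)).

    The loss shrinks as the positive item is scored further above the
    negative item.
    """
    diff = pos_score - neg_score
    return math.log1p(math.exp(-diff))


def warp_sample(scores, positive, negatives, margin=1.0, max_trials=10, rng=random):
    """Draw random negatives until one violates the margin.

    Returns (negative, trials), or (None, max_trials) if no violating
    negative was found within max_trials draws.
    """
    for trials in range(1, max_trials + 1):
        negative = rng.choice(negatives)
        if scores[negative] > scores[positive] - margin:
            return negative, trials
    return None, max_trials


def warp_weight(n_negatives, trials):
    """Weight for the update, based on an estimate of the positive's rank.

    Finding a violator after few draws suggests many negatives outrank the
    positive, so the estimated rank and the weight are large.
    """
    estimated_rank = max(n_negatives // trials, 1)
    return math.log(estimated_rank)


# A positive scored well above a negative gives a small BPR loss,
# and a positive scored below a negative gives a large one.
print(bpr_loss(3.0, 0.0), bpr_loss(0.0, 3.0))

# Made-up scores: the positive "p" is outscored by the only negative "n1".
scores = {"p": 0.0, "n1": 3.0}
negative, trials = warp_sample(scores, "p", ["n1"])
print(negative, trials, warp_weight(n_negatives=100, trials=trials))
```

BPR treats every sampled pair the same way, while the WARP sketch spends more effort on positives that are currently ranked badly, which matches its focus on the top of the list.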

Implementing a Hybrid Recommendation System

Let us build a hybrid recommendation system using LightFM, a Python library. In this implementation, we will see how to fit a model with each of the losses described above (BPR and WARP). Before starting, we need to install the library.

Installing LightFM

The following command installs the library using pip (the leading ! runs it as a shell command from a notebook cell).

!pip install lightfm

For this implementation, I am using the MovieLens data, which consists of:

  • 100,000 ratings (1-5) from 943 users on 1682 movies. 
  • Each user has rated at least 20 movies. 
  • Simple demographic info for the users (age, gender, occupation, zip)

Fortunately, LightFM provides a helper that fetches this data for practice purposes, so we can load it directly.

Importing libraries and dataset:

import numpy as np
from lightfm.datasets import fetch_movielens
data = fetch_movielens()

Checking the keys of the data dictionary and the shape of each entry:

for key, value in data.items():
    print(key, value.shape)

Defining the train and test data for training and testing purposes:

train = data['train']
test = data['test']

Here train and test are sparse matrices of raw ratings, where each row is a user, each column is an item (movie), and the entries are ratings between 1 and 5.

Fitting the Model with BPR Loss

from lightfm import LightFM
model = LightFM(learning_rate=0.05, loss='bpr')
model.fit(train, epochs=10)

Next, we use two ranking metrics: precision at k and ROC AUC. These help in checking the accuracy of the recommendations. To compute them, the library builds a ranked list of recommendations for every user and checks where the movies already known to be positives fall in that list. Precision at k is the fraction of the first k results on the list that are known positives. The AUC score is the probability that a randomly chosen known positive is ranked higher than a randomly chosen negative example.

from lightfm.evaluation import precision_at_k, auc_score
train_precision = precision_at_k(model, train, k=10).mean()
test_precision = precision_at_k(model, test, k=10, train_interactions=train).mean()
train_auc = auc_score(model, train).mean()
test_auc = auc_score(model, test, train_interactions=train).mean()
print('Precision: train %.2f, test %.2f.' % (train_precision, test_precision))
print('AUC: train %.2f, test %.2f.' % (train_auc, test_auc))
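
To show what the two metrics measure, the sketch below computes them by hand for a single user in plain Python. The function names and scores are made up for illustration. The LightFM functions used above do the equivalent work for every user at once, which is why .mean() is applied to their result.

```python
def user_precision_at_k(ranked_items, positives, k=10):
    """Fraction of the first k ranked items that are known positives."""
    top_k = ranked_items[:k]
    hits = sum(1 for item in top_k if item in positives)
    return hits / k


def user_auc(scores, positives):
    """Probability that a random positive outscores a random negative.

    Every (positive, negative) pair is compared; a tie counts as half a win.
    """
    pos = [score for item, score in scores.items() if item in positives]
    neg = [score for item, score in scores.items() if item not in positives]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative item")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Made-up predicted scores for one user; "a" and "c" are the known positives.
scores = {"a": 0.9, "b": 0.8, "c": 0.3, "d": 0.1}
positives = {"a", "c"}

# Rank items from the highest to the lowest predicted score.
ranked = sorted(scores, key=scores.get, reverse=True)

print(user_precision_at_k(ranked, positives, k=2))
print(user_auc(scores, positives))
```

Precision at k only looks at the top of the list, while AUC looks at the whole ranking, so a model can do well on one and poorly on the other.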

Fitting the Model with WARP Loss

As discussed, the WARP loss pushes positive items towards the top of the ranking, so the precision of a model trained with WARP is expected to be higher than that of the BPR model. We can use WARP by replacing loss='bpr' with loss='warp'.

model = LightFM(learning_rate=0.05, loss='warp')
model.fit(train, epochs=10)

Checking the precision and AUC:

train_precision = precision_at_k(model, train, k=10).mean()
test_precision = precision_at_k(model, test, k=10, train_interactions=train).mean()
train_auc = auc_score(model, train).mean()
test_auc = auc_score(model, test, train_interactions=train).mean()
print('Precision: train %.2f, test %.2f.' % (train_precision, test_precision))
print('AUC: train %.2f, test %.2f.' % (train_auc, test_auc))

In this run, the precision of the WARP model is higher than that of the BPR model, and there is also some improvement in the AUC score.

Final Words

In this article, we discussed what hybrid recommendation systems are and how they are useful. These systems aim to overcome the limitations of the two other filtering approaches, content-based and collaborative filtering. We also discussed the types of data that can be used to build recommendation systems and the loss functions these systems use. Most importantly, we saw how to build a recommendation system with LightFM and how to evaluate its performance. Hopefully this article will be helpful to any beginner who wants to learn how to implement hybrid recommendation systems.


Yugesh Verma

Yugesh is a graduate in automobile engineering and worked as a data analyst intern. He completed several Data Science projects. He has a strong interest in Deep Learning and writing blogs on data science and machine learning.