# A Guide to Building Hybrid Recommendation Systems for Beginners

A hybrid recommendation system combines content-based and collaborative filtering methods.

Recommendation systems are widely used to suggest products or items to users. The two most popular approaches are content-based filtering and collaborative filtering. Both struggle when there is not enough data to learn the relationship between users and items. In such cases a third approach, the hybrid recommendation system, can be used; it aims to overcome the limitations of each method on its own. In this article, we discuss hybrid recommendation systems in detail and learn how to build one with LightFM, a Python library. The major points covered in the article are listed below.


1. Hybrid Recommendation System
2. Types of Data for Generating Recommendation Systems
3. Losses used by Recommendation Systems
   1. Bayesian Personalised Ranking (BPR)
   2. Weighted Approximate-Rank Pairwise (WARP)
4. Implementing a Hybrid Recommendation System
   1. Fitting the Model with BPR Loss
   2. Fitting the Model with WARP Loss

Hybrid Recommendation System

A hybrid recommendation system combines content-based and collaborative filtering. Combining the two can help overcome the shortcomings each has when used on its own, and can be more effective in some cases. A hybrid recommender can be implemented in several ways: we can generate predictions separately with content-based and collaborative methods and then combine them, or we can add the capabilities of a collaborative method to a content-based approach (and vice versa).

Several studies comparing the conventional approaches with hybrid methods report that hybrid methods can produce more accurate recommendations.
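The first strategy described above, generating predictions separately and then combining them, can be sketched as a simple weighted blend. This is only an illustration under stated assumptions: the function names, the item ids, the scores, and the `alpha` weight are all hypothetical, and both score sources are assumed to be on a comparable scale.

```python
def blend_scores(content_scores, collab_scores, alpha=0.5):
    """Weighted hybrid: alpha * content + (1 - alpha) * collaborative.

    Both inputs map item id -> score. An item missing from one source
    contributes 0.0 for that source (an assumption of this sketch).
    """
    items = set(content_scores) | set(collab_scores)
    return {
        item: alpha * content_scores.get(item, 0.0)
        + (1 - alpha) * collab_scores.get(item, 0.0)
        for item in items
    }


def top_n(scores, n=2):
    """Return the n highest-scoring item ids, best first."""
    return sorted(scores, key=scores.get, reverse=True)[:n]


# Hypothetical scores for one user from two separately trained models.
content = {"book_a": 0.9, "book_b": 0.2, "book_c": 0.4}
collab = {"book_a": 0.3, "book_b": 0.8, "book_d": 0.6}

blended = blend_scores(content, collab, alpha=0.5)
print(top_n(blended, n=2))  # ['book_a', 'book_b']
```

Setting `alpha` closer to 1 leans on the content-based scores, which may help for items with few interactions; setting it closer to 0 leans on the collaborative scores.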


Types of Data for Generating a Recommendation System

Depending on the approach, the data used to build a recommendation system falls into two types:

• Explicit feedback: data in which users state their preferences directly, for example a rating that tells us whether the user liked an item or not.
• Implicit feedback: data that does not come from ratings or scores given by the user but from their behaviour, such as clicks, watched movies, or played songs.

In this article we build a recommendation system on implicit feedback, so let us look at why implicit feedback matters. Consider a recommender based on explicit feedback, for example one that recommends books based on user ratings. Such a recommender focuses on the ratings but ignores which books a user chose to read in the first place, and when ratings are not available it has no information to work with.

Information such as which books have never been chosen, or which books have been chosen by most people, can be a valuable signal for a recommendation system.

Missing ratings are also more likely to be negative, precisely because users choose which items to rate and leave the rest blank. In other words, we tend not to rate the things we do not expect to like.

These observations motivate a model that works with implicit feedback. The LightFM library can help us build a recommendation system on implicit feedback.
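To make the distinction concrete, here is a minimal sketch of turning explicit ratings into implicit interactions, keeping only the fact that a user chose an item. The data and the helper names (`to_implicit`, `item_popularity`) are hypothetical and exist only for illustration.

```python
# Hypothetical explicit feedback: (user, item, rating on a 1-5 scale).
ratings = [
    ("u1", "book_a", 5),
    ("u1", "book_b", 2),
    ("u2", "book_a", 4),
    ("u2", "book_c", 1),
]


def to_implicit(ratings):
    """Keep only which items each user chose; drop the rating value."""
    interactions = {}
    for user, item, _rating in ratings:
        interactions.setdefault(user, set()).add(item)
    return interactions


def item_popularity(interactions):
    """Count how many users chose each item."""
    counts = {}
    for items in interactions.values():
        for item in items:
            counts[item] = counts.get(item, 0) + 1
    return counts


implicit = to_implicit(ratings)
popularity = item_popularity(implicit)
print(popularity["book_a"])  # 2
```

Even without any rating values, the interaction sets still tell us which books were chosen most often and which were never chosen at all.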

Losses used by Recommendation Systems

Here we consider two loss functions for training a recommendation system on implicit feedback:

• Bayesian Personalised Ranking (BPR) pairwise loss: this can be used when only positive interactions are present and we want to optimize ROC AUC. It tries to maximize the difference between the prediction for a positive example and the prediction for a randomly selected negative example.
• Weighted Approximate-Rank Pairwise (WARP) loss: this is useful when only positive interactions are present and we want to optimize the top of the recommendation list. It repeatedly samples negative examples until it finds one that violates the ranking, that is, one scored too close to or above the positive example, and then updates the model so that positive examples move up the ranking.
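The two losses above can be sketched in plain Python for a single positive example. This is an illustrative sketch, not LightFM's actual implementation: the function names are hypothetical, and the margin of 1.0, the cap on the number of draws, and the log-of-estimated-rank weighting are assumptions based on the usual description of WARP.

```python
import math
import random


def bpr_loss(pos_score, neg_score):
    """BPR loss for one pair: -log(sigmoid(pos_score - neg_score))."""
    diff = pos_score - neg_score
    # log(1 + exp(-diff)) is the same quantity as -log(sigmoid(diff)).
    return math.log1p(math.exp(-diff))


def warp_sample(pos_score, neg_scores, rng, max_trials=10, margin=1.0):
    """Draw random negatives until one violates the margin.

    Returns (index of the violating negative, draws used), or
    (None, max_trials) if no violation turns up within max_trials draws.
    """
    for trial in range(1, max_trials + 1):
        j = rng.randrange(len(neg_scores))
        if neg_scores[j] > pos_score - margin:
            return j, trial
    return None, max_trials


def warp_weight(n_negatives, trials):
    """Log of the estimated rank: a violation found after few draws
    suggests the positive sits low in the ranking, so the update is larger."""
    estimated_rank = max(1, n_negatives // trials)
    return math.log(estimated_rank)


rng = random.Random(0)  # fixed seed so the sketch is repeatable

# BPR: the loss shrinks as the positive outscores the negative.
print(bpr_loss(2.0, 0.0), bpr_loss(0.0, 2.0))

# WARP: hypothetical scores for one positive and five negatives.
neg_scores = [0.1, 0.2, 3.5, 0.0, 0.3]
violator, trials = warp_sample(pos_score=2.0, neg_scores=neg_scores, rng=rng)
if violator is not None:
    print(violator, trials, warp_weight(len(neg_scores), trials))
```

BPR looks at one random negative per positive, while WARP keeps drawing until it finds a negative that actually competes with the positive. That is why WARP tends to concentrate its updates on mistakes near the top of the list.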

Implementing a Hybrid Recommendation System

Let us build a hybrid recommendation system using LightFM, a Python library. In this implementation, we will see how to fit models with the two losses described above (BPR and WARP). Before starting, we need to install the library.

Installing LightFM

The following command installs the library with pip from inside a notebook:

!pip install lightfm


For this implementation, I am using the MovieLens 100k dataset, which consists of:

• 100,000 ratings (1-5) from 943 users on 1682 movies.
• Each user has rated at least 20 movies.
• Simple demographic info for the users (age, gender, occupation, zip)

Other information about the data is given in the image below:

Fortunately, LightFM includes a helper that fetches this dataset, so we can load it directly.

Importing libraries and dataset:

import numpy as np
from lightfm.datasets import fetch_movielens
data = fetch_movielens()

Checking the keys in the data and the shape of each entry:

for key, value in data.items():
    print(key, value.shape)


Defining the train and test data for training and testing purposes:

train = data['train']
test = data['test']

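Here train and test hold the raw ratings: each row is a user, each column is an item (movie), and the entries are ratings from 1 to 5. Since BPR and WARP are implicit-feedback losses, what matters to the models below is which movies a user rated rather than the rating values themselves.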

Fitting the Model with BPR Loss

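from lightfm import LightFM
# BPR loss. No user_features/item_features are passed here, so LightFM falls
# back to one indicator feature per user and item (pure collaborative filtering).
model = LightFM(learning_rate=0.05, loss='bpr')
model.fit(train, epochs=10)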


Next, we use two ranking metrics: precision at k and ROC AUC. To compute them, the library builds a ranked list of recommendations for every user and checks where the known positive movies fall in it. Precision at k is the fraction of the top k results that are known positives. The AUC score is the probability that a randomly chosen known positive is ranked higher than a randomly chosen negative example.
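To see what these two metrics measure, here is a minimal sketch that computes them by hand for a single user. The scores, movie ids, and function names (`user_precision_at_k`, `user_auc`) are hypothetical; LightFM's own functions compute the same kind of quantities per user over the whole interaction matrix.

```python
def user_precision_at_k(ranked_items, positives, k):
    """Fraction of the top-k ranked items that are known positives."""
    top_k = ranked_items[:k]
    return sum(1 for item in top_k if item in positives) / k


def user_auc(scores, positives):
    """Probability that a random positive outscores a random negative.

    scores maps item id -> predicted score; ties count as half a win.
    """
    pos = [s for item, s in scores.items() if item in positives]
    neg = [s for item, s in scores.items() if item not in positives]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical predicted scores for one user; m1 and m3 are known positives.
scores = {"m1": 0.9, "m2": 0.8, "m3": 0.4, "m4": 0.3, "m5": 0.1}
positives = {"m1", "m3"}
ranked = sorted(scores, key=scores.get, reverse=True)

print(user_precision_at_k(ranked, positives, k=2))  # 0.5
print(user_auc(scores, positives))  # 5 of the 6 positive/negative pairs are ordered correctly
```

Precision at k only looks at the head of the list, whereas AUC scores the ordering of the whole list, so a model can do well on one and poorly on the other.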

from lightfm.evaluation import precision_at_k, auc_score
train_precision = precision_at_k(model, train, k=10).mean()
test_precision = precision_at_k(model, test, k=10, train_interactions=train).mean()
train_auc = auc_score(model, train).mean()
test_auc = auc_score(model, test, train_interactions=train).mean()
print('Precision: train %.2f, test %.2f.' % (train_precision, test_precision))
print('AUC: train %.2f, test %.2f.' % (train_auc, test_auc))


Fitting the Model with WARP Loss

As discussed, WARP focuses on pushing positive items toward the top of the ranking, so the precision of the WARP model is expected to be higher than that of the BPR model. We can switch to WARP simply by replacing loss='bpr' with loss='warp'.

model = LightFM(learning_rate=0.05, loss='warp')
model.fit(train, epochs=10)

Checking the precision and AUC:

train_precision = precision_at_k(model, train, k=10).mean()
test_precision = precision_at_k(model, test, k=10, train_interactions=train).mean()
train_auc = auc_score(model, train).mean()
test_auc = auc_score(model, test, train_interactions=train).mean()
print('Precision: train %.2f, test %.2f.' % (train_precision, test_precision))
print('AUC: train %.2f, test %.2f.' % (train_auc, test_auc))


Here we can see that the precision of the WARP model is higher than that of the BPR model, and the AUC score also increases somewhat.

Final Words

In this article, we discussed what hybrid recommendation systems are and how they are useful. These systems aim to overcome the limitations of the other two filtering approaches, content-based and collaborative filtering. We also discussed the types of data that can be used to build recommendation systems and the loss functions used to train them. Most importantly, we saw how to build a recommendation system with LightFM and how to evaluate its performance. I hope this article is helpful to any beginner who wants to learn how to implement hybrid recommendation systems.


Yugesh is a graduate in automobile engineering and worked as a data analyst intern. He completed several Data Science projects. He has a strong interest in Deep Learning and writing blogs on data science and machine learning.
