
Hands-On Guide To Recommendation System Using Collaborative Filtering


Recommendation systems aim to predict users' preferences and suggest the products they are most likely to buy or find interesting. Organizations that use recommender systems focus on increasing sales through highly personalized offers and an improved customer experience. Netflix, Amazon, and others use recommender systems to help their customers find the right products or movies.

In this article, we will discuss recommendation systems and their types, and then cover the collaborative filtering method in detail with an implementation.

Types Of Recommendation System

1. Collaborative Filtering

Collaborative filtering finds similar users or similar items and predicts a user's rating for an item from the ratings given by similar users or to similar items.

   User-Based: The system finds users who have rated various items in the same way. Suppose user A likes movies 1, 2 and 3 and user B likes movies 1 and 2; the system will then recommend movie 3 to B.

   Item-Based: Here, the system finds items that are rated similarly by the same users. For example, if A and B both like movies 1 and 3, the two movies are treated as similar, so when C likes movie 3 the system will recommend movie 1 to C.

2. Content Based Filtering

It works on the principle of similar content. If a user watches a movie of one genre and rates it highly, the system will look for well-rated movies of the same genre and recommend them to the user.

In this article, we will cover the item-based collaborative filtering approach to recommend items to the user.
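
Before moving to the real dataset, the item-based idea can be sketched on a tiny example. The ratings below are invented for illustration, and the names `toy_ratings`, `item_similarity` and `most_similar_item` are hypothetical; this is a minimal sketch, not the implementation used later in the article.

```python
import pandas as pd

# Hypothetical toy ratings (rows: users, columns: movies); NaN means "not rated".
toy_ratings = pd.DataFrame(
    {
        "Movie 1": [5.0, 4.0, 1.0, float("nan")],
        "Movie 2": [4.0, 5.0, 2.0, 1.0],
        "Movie 3": [1.0, 2.0, 5.0, 5.0],
    },
    index=["A", "B", "C", "D"],
)

# Item-item similarity: correlate the movie columns across the users who rated both.
item_similarity = toy_ratings.corr(method="pearson")


def most_similar_item(title):
    """Return the other movie whose ratings correlate most strongly with `title`."""
    return item_similarity[title].drop(title).idxmax()


print(item_similarity.round(2))
print(most_similar_item("Movie 1"))
```

In this toy data, users who rate Movie 1 highly also rate Movie 2 highly and Movie 3 poorly, so Movie 2 comes out as the closest item to Movie 1.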

Code Implementation

The movie dataset can be downloaded from the following link.

Import the required library and load the data files.

import pandas as pd
# Load the movies, ratings and tags files (tags.csv is loaded here but not used below)
movies = pd.read_csv("movies.csv",encoding="Latin1")
Ratings = pd.read_csv("ratings.csv")
Tags = pd.read_csv("tags.csv",encoding="Latin1")
movies.head()

Now we need to merge the two datasets, movies and ratings, and reshape the result into a table with one row per user and one column per movie.

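# Join movie titles to the ratings on their shared column(s), then drop the columns we do not need
ratings = pd.merge(movies,Ratings).drop(['genres','timestamp'],axis=1)
print(ratings.shape)
ratings.head()
# Build a user x movie table: one row per user, one column per title
UserRatings = ratings.pivot_table(index=['userId'],columns=['title'],values='rating')
UserRatings.head()
print("Before: ",UserRatings.shape)
# Keep only movies with at least 10 ratings and treat missing ratings as 0
UserRatings = UserRatings.dropna(thresh=10, axis=1).fillna(0,axis=1)
print("After: ",UserRatings.shape)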
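
To see what the merge, pivot and threshold steps do without downloading the dataset, here is a small sketch on made-up data. The frames `toy_movies` and `toy_scores` are hypothetical stand-ins for movies.csv and ratings.csv, and the threshold is lowered to 2 so that the effect is visible on five ratings.

```python
import pandas as pd

# Hypothetical miniature versions of movies.csv and ratings.csv.
toy_movies = pd.DataFrame(
    {
        "movieId": [1, 2, 3],
        "title": ["Film A", "Film B", "Film C"],
        "genres": ["Drama", "Action", "Comedy"],
    }
)
toy_scores = pd.DataFrame(
    {
        "userId": [10, 10, 20, 20, 30],
        "movieId": [1, 2, 1, 2, 3],
        "rating": [4.0, 3.0, 5.0, 2.0, 1.0],
        "timestamp": [0, 0, 0, 0, 0],
    }
)

# With no `on` argument, merge joins on the columns both frames share (here: movieId).
merged = pd.merge(toy_movies, toy_scores).drop(["genres", "timestamp"], axis=1)

# One row per user, one column per title; unrated cells become NaN.
wide = merged.pivot_table(index="userId", columns="title", values="rating")

# Keep titles with at least 2 ratings (the article uses 10), then fill gaps with 0.
filtered = wide.dropna(thresh=2, axis=1).fillna(0)

print(wide.shape, "->", filtered.shape)
print(filtered)
```

Film C has a single rating, so it is dropped. Note that filling with 0 makes an unrated movie look like a very low rating when correlations are computed later; that is a simplification of this approach.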

To compare two rating vectors we need a similarity measure. Suppose we measure the distance between two points using Euclidean distance. The calculated distance can be large even when the two vectors point in the same direction and differ only in magnitude. To overcome this problem, we measure the angle between the vectors rather than the Euclidean distance. This approach is called cosine similarity (cosine distance is one minus the similarity). Another approach is Pearson correlation, which is cosine similarity computed after subtracting the mean from each vector.
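
The difference between the three measures can be checked on a few small vectors. This is a minimal sketch using NumPy; the helper names `cosine_similarity` and `pearson_correlation` and the vectors are made up for illustration.

```python
import numpy as np


def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def pearson_correlation(a, b):
    """Cosine similarity of the mean-centred vectors."""
    return cosine_similarity(a - a.mean(), b - b.mean())


a = np.array([1.0, 2.0, 3.0])
b = 2 * a  # same direction as a, twice the magnitude
c = a + 2  # same pattern as a, shifted up by a constant

print("Euclidean(a, b):", np.linalg.norm(a - b))   # clearly non-zero
print("Cosine(a, b):   ", cosine_similarity(a, b))  # 1 up to rounding: same direction
print("Cosine(a, c):   ", cosine_similarity(a, c))  # slightly below 1
print("Pearson(a, c):  ", pearson_correlation(a, c))  # 1 up to rounding: shift removed
```

Cosine similarity ignores the difference in magnitude between `a` and `b`, and Pearson correlation additionally ignores the constant shift between `a` and `c`, which is useful when some users rate everything higher than others.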

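Let’s implement this using the Pearson correlation approach. We compute the correlation between every pair of movie columns and define a helper that scores all movies against one rated movie.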

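# Pairwise Pearson correlation between movie columns (item-item similarity)
corrMatrix = UserRatings.corr(method='pearson')
corrMatrix.head(10)
def get_similar(movie_name,rating):
    # Weight each movie's correlation with movie_name by how far the rating is from the 2.5 midpoint:
    # ratings above 2.5 promote similar movies, ratings below 2.5 demote them
    similar_ratings = corrMatrix[movie_name]*(rating-2.5)
    similar_ratings = similar_ratings.sort_values(ascending=False)
    return similar_ratings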
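
The weighting inside the function is easier to follow on a small example. The sketch below uses a hypothetical 3x3 correlation matrix with invented values; `toy_corr` and `toy_get_similar` are illustrative names that mirror, but are not, the objects built from the real dataset.

```python
import pandas as pd

# Hypothetical correlation matrix for three movies (values invented for illustration).
titles = ["Film A", "Film B", "Film C"]
toy_corr = pd.DataFrame(
    [
        [1.0, 0.8, -0.4],
        [0.8, 1.0, -0.2],
        [-0.4, -0.2, 1.0],
    ],
    index=titles,
    columns=titles,
)


def toy_get_similar(movie_name, rating):
    """Same weighting as in the article: correlation times (rating - 2.5)."""
    return (toy_corr[movie_name] * (rating - 2.5)).sort_values(ascending=False)


liked = toy_get_similar("Film A", 5)     # weight +2.5: movies similar to Film A go up
disliked = toy_get_similar("Film C", 1)  # weight -1.5: movies similar to Film C go down

# One row per rated movie; summing the rows gives a combined score per title.
total = pd.DataFrame([liked, disliked]).sum().sort_values(ascending=False)
print(total)
```

Film B ends up with a positive score because it resembles the liked Film A and is negatively correlated with the disliked Film C. Film A itself scores highest only because every movie correlates perfectly with itself, so in practice one would probably drop the titles the user has already rated before recommending.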

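Here, we score every movie for a user who likes romantic movies: The Reader is rated 5 and Alice in Wonderland 3, while Aliens and 2001: A Space Odyssey get low ratings. The weighted correlations for each rated movie are summed to give the final ranking.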

romantic_movies = [("Reader, The (2008)",5),("Alice in Wonderland (2010)",3),("Aliens (1986)",1),("2001: A Space Odyssey (1968)",2)]
# DataFrame.append was removed in pandas 2.0, so build the frame from a list of Series (one row per rated movie)
similar_movies = pd.DataFrame([get_similar(movie,rating) for movie,rating in romantic_movies])
similar_movies.head(10)
# Sum the weighted correlations per title and show the 20 highest-scoring movies
similar_movies.sum().sort_values(ascending=False).head(20)

Let’s do the same for a user who likes action movies: Skyfall is rated 5, Mission: Impossible III and 2 Fast 2 Furious are rated 4, and Toy Story 3 is rated 2.

action_movies = [("Skyfall (2012)",5),("Mission: Impossible III (2006)",4),("Toy Story 3 (2010)",2),("2 Fast 2 Furious (Fast and the Furious 2, The) (2003)",4)]
# DataFrame.append was removed in pandas 2.0, so build the frame from a list of Series (one row per rated movie)
similar_movies = pd.DataFrame([get_similar(movie,rating) for movie,rating in action_movies])
similar_movies.head(10)
# Sum the weighted correlations per title and show the 20 highest-scoring movies
similar_movies.sum().sort_values(ascending=False).head(20)

Conclusion

I hope this article has given you a basic idea of how item-based collaborative filtering works in recommendation systems. From here, you can explore user-based collaborative filtering, content-based filtering and hybrid models, and build your own recommendation system.
The complete code of the above implementation is available at the AIM’s GitHub repository. Please visit this link to find the notebook of this code.


Ankit Das

A data analyst with expertise in statistical analysis, data visualization ready to serve the industry using various analytical platforms. I look forward to having in-depth knowledge of machine learning and data science. Outside work, you can find me as a fun-loving person with hobbies such as sports and music.
