
Collaborative Filtering Vs Content-Based Filtering for Recommender Systems

The Internet is the new digital market, and it presents us with so many choices that they can be overwhelming. Today almost everything we need or want to buy is easily accessible through this digital market: from entertainment content to groceries and clothes, every basic necessity or luxury is at our fingertips. With such a plethora of options, there is a serious need to filter, prioritize and efficiently deliver relevant information to overcome information overload, which confuses many Internet users. Recommender systems help solve this problem by processing a large volume of dynamically generated information to provide users with personalized content and services, which in turn allows better communication and understanding between users and the organization serving them.

What Is a Recommender System?

Recommender systems are information filtering systems that deal with information overload by filtering large amounts of dynamically generated information and selecting the fragments that match a user’s preferences, interests, or observed behavior towards particular items. A recommender system can predict whether a particular user would prefer an item based on the user’s profile and historical information. Recommender systems have also been shown to improve the quality of decision-making. In large e-commerce settings, they increase revenue because they are an effective means of selling more products. In scientific libraries, they help users move beyond generic catalogue searches. The need for efficient and accurate recommendation techniques that provide relevant and dependable recommendations therefore cannot be neglected.

Companies like Netflix use a recommendation engine to present viewers with movie and show suggestions, while Amazon uses its engine to present customers with product recommendations. Although each uses its engine for a slightly different purpose, both share the same general goal: to drive sales, boost engagement and customer retention, and deliver more personalized customer experiences. Recommendations typically speed up searches, make it easier for users to reach content they are interested in, and surprise them with offers they would never have searched for. Companies can also win new customers by sending customized emails with links to new offers that match the recipients’ interests, or with suggestions of films and TV shows that suit their particular profiles.


Approaches to build Recommender Systems

There are two main types of recommendation engines: collaborative filtering and content-based filtering.

Collaborative filtering 

The collaborative filtering method produces new recommendations based solely on the past interactions recorded between users and items. It looks at what similar users like in order to decide which recommendations to provide, for example by classifying users into clusters of similar types and recommending to each user according to the preferences of its cluster. The main idea governing collaborative methods is that past user-item interactions, once processed by the system, are sufficient to detect similar users or similar items and to make predictions from these estimated similarities.

Collaborative methods fall into two families. Memory-based approaches work directly with the values of recorded interactions and are essentially based on nearest-neighbour search, i.e., finding the users closest to a user of interest and suggesting the most popular items among these neighbours. Model-based approaches assume there is an underlying “generative” model that explains the user-item interactions and try to discover it in order to make new predictions. In both cases, an item is recommended to user A based on the interests of a similar user B. The collaborative filtering method does not need item features to be given: every user and item is described by a feature vector, or embedding, and these embeddings can be learned automatically without hand-engineering features.
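The article does not show how such embeddings are learned. As a minimal sketch of a model-based approach, the snippet below uses matrix factorization trained with stochastic gradient descent, one common technique swapped in here for illustration: every user and item gets a k-dimensional embedding, and only the observed (user, item, rating) triples drive the updates. The function name `factorize`, its hyperparameters and the toy ratings are hypothetical.

```python
import numpy as np

def factorize(ratings, n_users, n_items, k=2, lr=0.01, reg=0.02, epochs=1000, seed=0):
    """Learn user and item embeddings from observed (user, item, rating) triples.

    Hypothetical helper: plain SGD on squared error with L2 regularisation.
    """
    rng = np.random.default_rng(seed)
    # One small random k-dimensional embedding per user (P) and per item (Q)
    P = rng.normal(scale=0.1, size=(n_users, k))
    Q = rng.normal(scale=0.1, size=(n_items, k))
    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - P[u] @ Q[i]      # error of the dot-product prediction
            p_old = P[u].copy()        # keep the old user vector for the item update
            P[u] += lr * (err * Q[i] - reg * P[u])
            Q[i] += lr * (err * p_old - reg * Q[i])
    return P, Q

# Toy data: users 0-1 like items 0-1, users 2-3 like item 2; (1, 1) and (3, 0) are unobserved
ratings = [(0, 0, 5), (0, 1, 5), (0, 2, 1), (1, 0, 5), (1, 2, 1),
           (2, 0, 1), (2, 1, 1), (2, 2, 5), (3, 1, 1), (3, 2, 5)]
P, Q = factorize(ratings, n_users=4, n_items=3)
# Predicted ratings for the two unobserved pairs
print(round(P[1] @ Q[1], 1), round(P[3] @ Q[0], 1))
```

Because user 1 rated items 0 and 2 exactly like user 0, and item 1 was rated like item 0, the learned embeddings should predict a high rating for the pair (1, 1) and a low one for (3, 0), without any item features being supplied.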

The standard method used by collaborative filtering is the nearest-neighbourhood algorithm, which comes in user-based and item-based variants. Consider user-based collaborative filtering: we have an n × m matrix of ratings with users u_i (i = 1, …, n) and items p_j (j = 1, …, m), and we want to predict the rating r_ij that target user i would give an item j they have not watched or rated. The process is to calculate the similarity between target user i and every other user, select the top X most similar users, and take the weighted average of their ratings for item j, using the similarities as weights.
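A minimal sketch of the user-based procedure above, assuming cosine similarity computed with missing ratings treated as 0 (one common choice; Pearson correlation is another). The helper `predict_user_based`, the neighbourhood size `X` and the toy matrix are illustrative, not from the article. Only users who actually rated item j are considered as neighbours, and the prediction is sum(sim × rating) / sum(|sim|) over the top X of them.

```python
import numpy as np

def predict_user_based(R, i, j, X=2):
    """Predict user i's rating of item j from the X most similar users who rated j.

    R is an n x m ratings matrix with np.nan marking missing ratings.
    """
    filled = np.nan_to_num(R)                 # missing ratings count as 0 for similarity only
    norms = np.linalg.norm(filled, axis=1)
    sims = filled @ filled[i] / (norms * norms[i] + 1e-12)  # cosine similarity to user i
    # Neighbour candidates: every other user who has rated item j
    candidates = [k for k in range(R.shape[0]) if k != i and not np.isnan(R[k, j])]
    top = sorted(candidates, key=lambda k: sims[k], reverse=True)[:X]
    weights = sims[top]
    if np.abs(weights).sum() == 0:
        return float("nan")                   # no usable neighbours
    # Weighted average of the neighbours' ratings, with similarities as weights
    return float(weights @ R[top, j] / np.abs(weights).sum())

R = np.array([[5, 4, np.nan, 1],
              [5, 5, 4, 1],
              [4, 4, 5, 2],
              [1, 1, 1, 5]], dtype=float)
print(round(predict_user_based(R, i=0, j=2), 2))  # 4.46, from users 1 and 2
```

Raising `X` pulls in less similar users: with `X=3` the dissimilar user 3 also contributes and the prediction drops towards that user's low rating.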

Challenges with Collaborative Filtering

A key limitation of this method follows from how predictions are made: the model’s prediction for a given (user, item) pair is the dot product of the corresponding embeddings. If an item is not seen during training, the system cannot create an embedding for it and therefore cannot query the model with this item; the same applies to new users. This issue is known as the cold-start problem.
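To make the cold-start limitation concrete, here is a minimal sketch with hypothetical, already-learned embeddings stored in dictionaries keyed by the ids seen in training (`user_emb`, `item_emb` and `predict` are illustrative names). A prediction is just a dot product, so an id that was never seen has nothing to look up.

```python
import numpy as np

# Hypothetical embeddings produced by training; only ids seen in training have one
user_emb = {"alice": np.array([1.0, 0.5]), "bob": np.array([0.2, 1.0])}
item_emb = {"movie_1": np.array([0.9, 0.1]), "movie_2": np.array([0.1, 0.8])}

def predict(user, item):
    """Collaborative prediction: dot product of the user and item embeddings."""
    if user not in user_emb or item not in item_emb:
        return None  # cold start: no embedding exists, so the model cannot be queried
    return float(user_emb[user] @ item_emb[item])

print(round(predict("alice", "movie_1"), 2))  # 0.95
print(predict("alice", "new_movie"))          # None
```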

Content-Based Filtering 

The content-based approach uses additional information about users and/or items. It uses item features to recommend other items similar to what the user likes, based on the user’s previous actions or explicit feedback. In a movie recommender system, for example, this additional information could be the age, sex, job or other personal information of users, and the category, main actors, duration or other characteristics of the movies (i.e., the items).

The main idea of content-based methods is to build a model, based on the available “features”, that explains the observed user-item interactions. Still considering users and movies, the model can also be built so that it gives us insight into why the observed interactions happen. Such a model makes new predictions for a user easy: we only need to look at the user’s profile and, based on that information, determine relevant movies to suggest.
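A minimal content-based sketch, assuming items are described by hypothetical genre vectors: the user profile is the average feature vector of the movies the user liked, and the remaining movies are ranked by cosine similarity to that profile. The titles, genre labels and function names are illustrative only.

```python
import numpy as np

# Hypothetical item features: [action, comedy, drama]
item_features = {
    "Die Hard":      np.array([1.0, 0.0, 0.0]),
    "Mad Max":       np.array([1.0, 0.0, 0.0]),
    "Rush Hour":     np.array([1.0, 1.0, 0.0]),
    "Superbad":      np.array([0.0, 1.0, 0.0]),
    "The Godfather": np.array([0.0, 0.0, 1.0]),
}

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def recommend(liked, top_n=2):
    """Rank movies the user has not liked yet by similarity to the user's profile."""
    profile = np.mean([item_features[t] for t in liked], axis=0)  # average of liked items
    scores = {t: cosine(profile, f) for t, f in item_features.items() if t not in liked}
    return sorted(scores, key=scores.get, reverse=True)[:top_n]

print(recommend(["Die Hard"]))  # ['Mad Max', 'Rush Hour']
```

Nothing here depends on other users’ ratings, so a movie nobody has rated yet can still be scored as soon as it has a feature vector.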

Content-based methods can make use of a utility matrix, which records each user’s preference for certain items. With the data gathered from the user, the utility matrix helps us relate the items the user likes to those the user dislikes. Each user-item pair is assigned a value, known as the degree of preference, and the resulting matrix of users and items shows their preference relationships.
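A utility matrix can be sketched as a users × items array. In this minimal sketch the degrees of preference are hypothetical values on a 1 to 5 scale, NaN marks pairs with no data, and the like/dislike thresholds (`like_at`, `dislike_at`) are assumptions chosen for illustration.

```python
import numpy as np

items = ["A", "B", "C", "D"]
# Rows are users, columns are items; each cell is a degree of preference (NaN = unknown)
utility = np.array([[5, 1, np.nan, 4],
                    [np.nan, 2, 5, 1]], dtype=float)

def split_preferences(utility, items, user, like_at=4.0, dislike_at=2.0):
    """Return the items a user likes and dislikes according to the utility matrix."""
    row = utility[user]
    known = ~np.isnan(row)  # ignore pairs with no recorded preference
    liked = [items[j] for j in np.flatnonzero(known & (row >= like_at))]
    disliked = [items[j] for j in np.flatnonzero(known & (row <= dislike_at))]
    return liked, disliked

print(split_preferences(utility, items, 0))  # (['A', 'D'], ['B'])
print(split_preferences(utility, items, 1))  # (['C'], ['B', 'D'])
```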

Challenges faced with Content-based filtering

Content-based methods suffer far less from the cold-start problem than collaborative approaches, because new users or items can be described by their characteristics (i.e., their content), so relevant suggestions can be made for these new entities. Only new users or items with previously unseen features will logically suffer from this drawback, and once the system has been trained on enough data, this becomes unlikely. Content-based filtering assumes that if a user was interested in an item in the past, they will be interested in similar items in the future. Similar items are usually grouped based on their features, and user profiles are constructed from historical interactions or by explicitly asking users about their interests. Other systems, not considered purely content-based, also use users’ personal and social data.

An Example of Item-Based Filtering

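Below is an example of item-based filtering in which a movie recommendation system recommends movies based on user ratings and ranks the recommendations by how similarly the movies were rated. Because it compares movies through the ratings users gave them rather than through content features, this is, strictly speaking, item-based collaborative filtering.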

This can easily be done with pandas and the MovieLens dataset. We use the ratings data file and the item file, which you can download using the link here.
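A minimal loading sketch, assuming the files are `u.data` and `u.item` from the MovieLens 100K release (tab-separated ratings; pipe-separated, Latin-1-encoded movie metadata whose first two columns are the movie id and title). The paths and the function name `load_movielens` are hypothetical; adjust them to wherever the files were downloaded.

```python
import pandas as pd

def load_movielens(data_path="ml-100k/u.data", item_path="ml-100k/u.item"):
    """Load MovieLens 100K ratings and movie titles and merge them on the movie id."""
    # u.data: tab-separated user id, item id, rating, timestamp, with no header row
    ratings = pd.read_csv(data_path, sep="\t",
                          names=["user_id", "item_id", "rating", "timestamp"])
    # u.item: pipe-separated and Latin-1 encoded; keep only the id and title columns
    movies = pd.read_csv(item_path, sep="|", encoding="latin-1", header=None,
                         usecols=[0, 1], names=["item_id", "title"])
    return ratings.merge(movies, on="item_id")

# Usage, once the files are downloaded (paths are assumptions):
# df = load_movielens()
# print(df.head())
```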

Calling pandas’ pivot_table on such a DataFrame constructs a user/movie rating matrix.
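A minimal sketch of this step, with a tiny hypothetical DataFrame standing in for the merged MovieLens ratings (columns `user_id`, `title`, `rating`):

```python
import pandas as pd

# Hypothetical stand-in for the merged ratings DataFrame
ratings = pd.DataFrame({
    "user_id": [1, 1, 2, 2, 3],
    "title":   ["Movie A", "Movie B", "Movie A", "Movie C", "Movie B"],
    "rating":  [5, 3, 4, 2, 1],
})

# One row per user, one column per movie; cells a user has not rated become NaN
matrix = ratings.pivot_table(index="user_id", columns="title", values="rating")
print(matrix)
```

`pivot_table` averages duplicate (user, movie) pairs by default (`aggfunc="mean"`), which is harmless here because each user rates a given movie at most once.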

Pandas’ corrwith function makes it easy to compute the pairwise correlation between one movie’s vector of user ratings and that of every other movie. Results with no data can be dropped, leaving a new DataFrame of movies and their correlation score (similarity) to that particular movie.
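A minimal sketch of the correlation step on a small hypothetical rating matrix (users as rows, movies as columns, as `pivot_table` produces). `similar_movies` is an illustrative helper name. `DataFrame.corrwith` computes the Pearson correlation of each column with the given Series, using only users who rated both movies; a movie with no overlapping raters gets NaN and is dropped.

```python
import numpy as np
import pandas as pd

def similar_movies(matrix, title):
    """Correlate one movie's column of ratings with every other movie's column."""
    corr = matrix.corrwith(matrix[title]).dropna()  # NaN where there is no overlap
    corr = corr.drop(title)                         # a movie is trivially similar to itself
    return corr.sort_values(ascending=False).to_frame("correlation")

# Hypothetical user/movie matrix; "Movie D" shares no raters with "Movie A"
matrix = pd.DataFrame({
    "Movie A": [5, 4, 1, 2, np.nan],
    "Movie B": [4, 5, 2, 1, np.nan],
    "Movie C": [1, 2, 5, 4, np.nan],
    "Movie D": [np.nan, np.nan, np.nan, np.nan, 3],
}, index=pd.Index([1, 2, 3, 4, 5], name="user_id"))

print(similar_movies(matrix, "Movie A"))
```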

To make the recommendations reliable, movies rated by fewer than 100 people should be filtered out, since correlations computed from only a few ratings are noisy.
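A minimal sketch of the filtering step, assuming a DataFrame of correlation scores indexed by movie title (with a `correlation` column) and the merged ratings DataFrame. The helper name and the `min_ratings` parameter are hypothetical; the default of 100 follows the article, and the usage below lowers it only because the toy data is tiny.

```python
import pandas as pd

def reliable_recommendations(ratings, correlations, min_ratings=100):
    """Keep movies rated by at least `min_ratings` users, most correlated first."""
    counts = ratings.groupby("title")["rating"].count().rename("num_ratings")
    result = correlations.join(counts)                     # attach rating counts by title
    result = result[result["num_ratings"] >= min_ratings]  # drop rarely rated movies
    return result.sort_values("correlation", ascending=False)

# Hypothetical toy inputs
ratings = pd.DataFrame({
    "title":  ["Movie B"] * 3 + ["Movie C"] + ["Movie E"] * 2,
    "rating": [4, 5, 3, 2, 4, 5],
})
correlations = pd.DataFrame({"correlation": [0.8, -1.0, 0.9]},
                            index=["Movie B", "Movie C", "Movie E"])
print(reliable_recommendations(ratings, correlations, min_ratings=2))
```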

The result is a DataFrame of movies ranked by their correlation-based similarity to the chosen movie, which defines our recommendations!

Collaborative Filtering Vs Content-Based Filtering

Here is a list of points that differentiate collaborative filtering and content-based filtering:

  • The content-based approach requires a good amount of information about item features rather than the user’s interactions and feedback. These can be movie attributes such as genre, year, director and actors, or the textual content of articles, which can be extracted by applying Natural Language Processing. Collaborative filtering, on the other hand, needs nothing except users’ historical preferences on a set of items; because it is based on historical data, its core assumption is that users who agreed in the past will tend to agree in the future.
  • Collaborative filtering does not require domain knowledge because the embeddings are learned automatically. In the content-based approach, the feature representation of the items is hand-engineered to some extent, so this technique requires a lot of domain knowledge.
  • Collaborative filtering can help users discover new interests: even if the ML system does not know the user is interested in a given item, the model might still recommend it because similar users are interested in that item. A content-based model, on the other hand, can only make recommendations based on the user’s existing interests, so it has limited ability to expand on them.
  • A content-based filtering model does not need any data about other users, since its recommendations are specific to each user. This makes it easier to scale to a large number of users. The same cannot be said for collaborative filtering methods.
  • The collaborative algorithm uses only user behavior to recommend items, whereas content-based filtering needs to know the content of both users and items.

Conclusion

In this article, we looked at how recommender systems work and at the differences between collaborative filtering and content-based filtering. Both methodologies have their own advantages, disadvantages and particular use cases, which we tried to explore and discuss. We also walked through a small example of item-based filtering; you can find the whole implementation in a Colab notebook using the link here.

Happy Learning!



Victor Dey

Victor is an aspiring Data Scientist and holds a Master of Science in Data Science & Big Data Analytics. He is a researcher, a Data Science influencer and an ex-university football player. A keen learner of new developments in Data Science and Artificial Intelligence, he is committed to growing the Data Science community.
