Recommendation systems (RS) suggest items to users, and those items can be anything from articles and blogs to products. Many people are curious about how a recommendation engine works. We view a product on an e-commerce website and soon start seeing related products. You search for the OnePlus 7 and get recommended similar phones. This is possible because recommendation engines provide personalized content to the end user, which in turn helps increase sales for the business. Many companies use recommendations for different purposes: Netflix uses RS to recommend movies, e-commerce websites use them for product recommendations, and so on.
In this article, we will explore the core concepts of recommendation systems by building a recommendation engine that recommends 10 movies similar to the one you are watching. We will use The Movies Dataset, which is publicly available on Kaggle.
What will we learn from this article?
- What is a recommendation system?
- What are popularity-based and content-based recommendation systems?
- How is similarity computed in a recommendation system?
- How to build your own movie recommendation system?
What Are Recommendation Systems?
Recommendation systems aim to hold users' attention by understanding their tastes. They have become popular because they can provide personalized content that matches a user's interests. Millions of products are now listed on e-commerce websites, which makes it very hard to find the product we want by browsing alone. This is where these systems help, by quickly recommending products we are likely to want. Similarly, Netflix suggests movies in the genres we like, and YouTube recommends videos in the same way. Many different recommendation engines work in the back end to make this possible.
What Are Popularity-Based and Content-Based Recommendation Systems?
Popularity-based recommendation systems make recommendations on the basis of popularity or trends. The best example of such a system is Google News. It has a Top Stories option, as shown in the image below. The news displayed there is whatever is trending or currently popular. This is exactly how popularity-based systems work: they recommend the things that are currently trending, and every user sees the same list. YouTube's trending videos are another example of such a system.
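The idea can be sketched in a few lines. This is a minimal illustration rather than how Google News or YouTube actually rank items: it assumes a small hypothetical table in which a `vote_count` column serves as the popularity signal, and the titles and numbers are invented.

```python
import pandas as pd

# Hypothetical data: titles and vote counts are made up for illustration.
movies = pd.DataFrame({
    "title": ["Movie A", "Movie B", "Movie C", "Movie D"],
    "vote_count": [120, 950, 430, 15],
})

def most_popular(frame, n=3):
    """Return the n titles with the highest vote_count, most popular first."""
    ranked = frame.sort_values("vote_count", ascending=False)
    return ranked["title"].head(n).tolist()

top_titles = most_popular(movies, n=3)
print(top_titles)
```

Because the ranking ignores who is asking, it needs no user history, which is why it is often a reasonable starting point for new users.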
Content-based recommendation systems look for similarity before recommending something. We have all seen that when we look for a movie or web series on Netflix, we get movies of the same genre recommended. But how does this work? How does Netflix compute what I like? This is done through content-based systems. The similarity between the movie you are currently watching and every other movie is computed, and the most similar movies are recommended. On an e-commerce website, similarity is calculated between products. If I am looking for a MacBook, the website finds the products most similar to the MacBook and recommends them right away.
How Is the Similarity Computed Between Different Products?
Similarity is the key idea behind content-based recommendation systems. The things most similar to what we are currently watching get recommended to us. The question is how. There are different techniques, or similarity measures, that are used to compute similarity. Let us go through them:
Euclidean distance: This distance metric is used when we have numeric data. For example, suppose I want to compute the similarity between the OnePlus 6 and other OnePlus variants based on RAM and camera. The RAM and camera values for each variant are numbers, so we can calculate the Euclidean distance between them. A distance of 0 means the two products have identical values. The smaller the distance, the more similar the products; the larger the distance, the less similar they are.
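As a rough sketch of this idea, the snippet below ranks a few phones by their Euclidean distance to a target phone. The phone names and the (RAM, camera) values are hypothetical, chosen only to illustrate the calculation.

```python
import math

# Hypothetical specs as (RAM in GB, camera in megapixels); values are illustrative only.
phones = {
    "Phone A": (8, 16),
    "Phone B": (8, 16),
    "Phone C": (6, 20),
    "Phone D": (12, 48),
}

def rank_by_distance(target, catalog):
    """Sort the other items by Euclidean distance to the target, closest first."""
    distances = [
        (name, math.dist(catalog[target], specs))
        for name, specs in catalog.items()
        if name != target
    ]
    return sorted(distances, key=lambda pair: pair[1])

ranking = rank_by_distance("Phone A", phones)
for name, distance in ranking:
    print(f"{name}: {distance:.2f}")
```

Note that features measured on different scales (GB versus megapixels) contribute unevenly to the distance, so in practice the columns are usually scaled before comparing.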
Cosine similarity: This metric is commonly used to compute the similarity of textual data. Consider an example where we have to find similar news articles or similar movies. How is it done? We convert the text into vectors and measure the cosine of the angle between two vectors. If the angle is 0, the cosine is 1 and the two vectors point in the same direction, so the texts are treated as most similar. As the angle grows, the cosine falls and the texts are less similar. It is one of the most used similarity measures for textual content. There are other metrics as well, such as Jaccard similarity, which is used for categorical data.
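To make the calculation concrete, here is a minimal sketch that uses raw word counts as the vectors (a simplification; TF-IDF weights are another common choice). The helper name `cosine_similarity_text` and the sample sentences are invented for illustration.

```python
import math
from collections import Counter

def cosine_similarity_text(text_a, text_b):
    """Cosine similarity between two texts, using raw word counts as vectors."""
    counts_a = Counter(text_a.lower().split())
    counts_b = Counter(text_b.lower().split())
    # Dot product over the words the two texts share.
    shared = counts_a.keys() & counts_b.keys()
    dot = sum(counts_a[word] * counts_b[word] for word in shared)
    norm_a = math.sqrt(sum(c * c for c in counts_a.values()))
    norm_b = math.sqrt(sum(c * c for c in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text has no direction to compare
    return dot / (norm_a * norm_b)

same = cosine_similarity_text("a space crew explores mars", "a space crew explores mars")
related = cosine_similarity_text("a space crew explores mars", "a crew explores the ocean")
unrelated = cosine_similarity_text("a space crew explores mars", "two friends open bakery")
print(round(same, 2), round(related, 2), round(unrelated, 2))
```

Identical texts score close to 1, texts with some shared words score in between, and texts with no words in common score 0.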
How To Make Your Own Movie Recommendation System?
We have now seen the different metrics used to compute similarity between products or movies. We will now build our own recommendation system that recommends movies matching our interests. First, we need to import the required library and load the data. Let’s import it and explore the movies data set. Use the code below to do the same.
import pandas as pd
df = pd.read_csv('movies.csv')
The data set has around 24 columns and 45,466 rows. For now we will work with only 2 columns: ‘original_title’, which is the movie title, and ‘overview’, which describes the movie. Other columns, including numeric ones, could also be used. Use the code below to create a new data frame with these two columns.
df = df[['original_title','overview']]
We will now check whether there are any missing values and remove the rows that contain them. The data contains many missing values, and all of those rows are removed before moving on.
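The code for this step is not shown, so the following is a sketch of one way to do it with pandas, not necessarily the exact code originally used. A small stand-in frame with invented rows is built here so that the snippet runs on its own.

```python
import pandas as pd

# Stand-in for the two-column frame built above, so this snippet runs by itself;
# with the real data, skip this assignment and reuse the existing df.
df = pd.DataFrame({
    "original_title": ["Movie A", "Movie B", "Movie C"],
    "overview": ["A heist goes wrong.", None, "Two rivals fall in love."],
})

# Count the missing values in each column.
print(df.isnull().sum())

# Drop rows with a missing title or overview, then renumber the rows from 0
# so that row labels match row positions in the matrices computed later.
df = df.dropna(subset=["original_title", "overview"]).reset_index(drop=True)
print(len(df))
```

Resetting the index matters here because the similarity matrix is addressed by row position, not by the original row labels.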
Now we will transform the overview column into vector form so that we can compute similarity. Use the code below to convert it. We use TfidfVectorizer for this.
from sklearn.feature_extraction.text import TfidfVectorizer
# min_df=1 keeps every term; recent scikit-learn versions reject the integer 0 here.
tf = TfidfVectorizer(analyzer='word', ngram_range=(1, 3), min_df=1, stop_words='english')
matrix = tf.fit_transform(df['overview'])
Now we are ready to compute cosine similarity to find which movies have similar content, based on the overview column of the data set. Use the code below to do so.
from sklearn.metrics.pairwise import linear_kernel
cosine_similarities = linear_kernel(matrix,matrix)
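`linear_kernel` computes plain dot products, which may look different from cosine similarity. With its default settings, `TfidfVectorizer` scales every row to unit length, so the dot product of two rows is already their cosine similarity. A small check on invented overviews illustrates this:

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity, linear_kernel

# Hypothetical overviews, invented for illustration.
overviews = [
    "A space crew explores a distant planet.",
    "A detective explores a haunted house.",
    "Two friends open a bakery.",
]

toy_matrix = TfidfVectorizer(stop_words="english").fit_transform(overviews)

dot_products = linear_kernel(toy_matrix, toy_matrix)
cosines = cosine_similarity(toy_matrix, toy_matrix)

# Both calls give the same matrix because each TF-IDF row already has length 1.
print(np.allclose(dot_products, cosines))
```

Skipping the extra normalization step is the usual reason for preferring `linear_kernel` on TF-IDF vectors.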
After this, we will build an index from the movie name (the original title) to its row position and define a recommendation function that looks up the cosine similarities for a movie and returns the most similar titles. Use the code below to do so.
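movie_title = df['original_title']
# Map each title to its row position in the similarity matrix.
indices = pd.Series(range(len(df)), index=df['original_title'])

def movie_recommend(original_title):
    """Return the 30 titles most similar to the given movie, best match first."""
    idx = indices[original_title]
    # Pair every movie's position with its similarity to the chosen movie.
    sim_scores = list(enumerate(cosine_similarities[idx]))
    # Sort by the similarity score (second element), highest first.
    sim_scores = sorted(sim_scores, key=lambda x: x[1], reverse=True)
    # Skip the first entry, which is normally the movie itself, and keep the next 30.
    sim_scores = sim_scores[1:31]
    movie_indices = [i[0] for i in sim_scores]
    return movie_title.iloc[movie_indices]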
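Now we will compute the top 10 recommendations for 3 different movies and check the results. Use the code below to do the same; it shows the call for one title, and the same call works for any other title in the data set.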
movie_recommend('Waiting to Exhale').head(10)
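One practical caveat: `linear_kernel(matrix, matrix)` builds one entry per pair of movies, which for the 45,466 rows mentioned earlier is roughly 2 billion values and may not fit in memory on a typical machine. A possible workaround, sketched below under the same TF-IDF approach, is to compute similarities only for the requested movie. The function name `recommend_on_demand` and the toy data are hypothetical.

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import linear_kernel

# Hypothetical stand-in data; with the real data set, reuse df and matrix from above.
toy_df = pd.DataFrame({
    "original_title": ["Space One", "Space Two", "Bakery Tale", "Haunted"],
    "overview": [
        "A space crew explores a distant planet.",
        "A stranded crew repairs a ship in deep space.",
        "Two friends open a bakery in Paris.",
        "A detective investigates a haunted house.",
    ],
})
toy_matrix = TfidfVectorizer(stop_words="english").fit_transform(toy_df["overview"])
positions = pd.Series(range(len(toy_df)), index=toy_df["original_title"])

def recommend_on_demand(title, n=10):
    """Return up to n titles most similar to `title`, scoring one row at a time."""
    idx = positions[title]
    # Similarity of the chosen movie to every movie: a single row, not the full matrix.
    scores = linear_kernel(toy_matrix[idx], toy_matrix).ravel()
    # Positions sorted by score, highest first, excluding the movie itself.
    order = [i for i in scores.argsort()[::-1] if i != idx][:n]
    return toy_df["original_title"].iloc[order]

recommendations = recommend_on_demand("Space One", n=2)
print(recommendations.tolist())
```

The trade-off is that each request does a little work instead of a lookup, in exchange for never holding the full movie-by-movie matrix in memory.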
I will conclude by hoping that you now have a basic idea of how a recommendation system works. We mainly discussed two kinds of recommendation systems, popularity-based and content-based, but several other approaches are used for recommendation, such as collaborative filtering, hybrid models, and neural-network-based approaches. Recommendation systems are very effective, and people are implementing them in many different applications. Now you can start building your own recommendation system too, perhaps a job recommendation system that recommends jobs to you on the basis of your profile, using an approach similar to the one discussed above. Check this article for more on building a recommendation system.