Building a recommender system from scratch is tedious: it involves many preprocessing steps and demands considerable coding skill. Plenty of open-source toolkits offer state-of-the-art performance for a variety of recommendation tasks while keeping the code base small. In this post, we will see how to build a recommender system with little code using Surprise, a Python SciKit package. Before jumping into the implementation with Surprise, we will first look at the background of recommender systems and at the packages that can be used to build them. The major points covered in this article are outlined below.
Table of Contents
- Recommender Systems
- Approaches to Recommender Systems
- Open Source Packages for Recommender Systems
- The Surprise Package
- Implementing Recommender Systems with Surprise
First of all, we will briefly look at what a recommender system is, along with its benefits and approaches.
Recommender systems are computer programs that make recommendations to users based on a variety of parameters. These systems predict the products a user is most likely to buy or be interested in. Netflix, Amazon, and other companies employ recommender systems to help their users find the right product or movie.
The recommender system filters a large volume of data, focusing on the most significant information based on input provided by the user and on other factors such as the user’s preferences and interests. To offer recommendations, it determines the compatibility between the user and the item, as well as the similarities among users and among products.
Recommender systems in use include playlist generators for video and music services, product recommenders for online businesses, content recommenders for social media platforms, and open web content recommenders. Within and across platforms, these systems can work with a single input, such as music, or multiple inputs, such as news, books, and search queries.
Recommender systems offer the following benefits:
- Users can discover items that are of interest to them.
- Item providers can get their products to the right people.
- Users can identify the products that are most relevant to them.
- Content is tailored to the individual.
- Websites can increase user engagement.
Approaches to Recommender Systems
Collaborative filtering is a popular method for building recommender systems. It rests on the premise that people who agreed in the past will agree in the future, and that they will prefer items similar to those they preferred in the past. The system generates suggestions based solely on the rating profiles of different users or items: it looks for peer users or items whose rating histories resemble those of the current user or item, and produces recommendations from this neighbourhood.
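The neighbourhood idea can be illustrated with a minimal sketch of user-based collaborative filtering. The rating data, the function names, and the choice of cosine similarity are assumptions made for this illustration; they are not taken from any particular library.

```python
from math import sqrt

# Toy rating profiles (hypothetical data): user -> {item: rating}
ratings = {
    "alice": {"A": 5, "B": 3, "C": 4},
    "bob":   {"A": 5, "B": 3, "C": 4, "D": 5},
    "carol": {"A": 1, "B": 5, "D": 2},
}

def cosine_similarity(u, v):
    """Cosine similarity computed over the items both users have rated."""
    common = set(u) & set(v)
    if not common:
        return 0.0
    dot = sum(u[i] * v[i] for i in common)
    norm_u = sqrt(sum(u[i] ** 2 for i in common))
    norm_v = sqrt(sum(v[i] ** 2 for i in common))
    return dot / (norm_u * norm_v)

def predict_rating(user, item):
    """Similarity-weighted average of the neighbours' ratings for `item`."""
    num = den = 0.0
    for other, profile in ratings.items():
        if other == user or item not in profile:
            continue
        sim = cosine_similarity(ratings[user], profile)
        num += sim * profile[item]
        den += sim
    return num / den if den else None

# Alice has not rated item D; estimate her rating from Bob and Carol.
prediction = predict_rating("alice", "D")
print(prediction)
```

Bob's ratings match Alice's exactly on their shared items, so his rating for D carries more weight than Carol's. Real systems usually also mean-centre the ratings and keep only the k most similar neighbours.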
Another method widely used when building recommender systems is content-based filtering. Content-based filtering systems rely on the description of an item and a profile of the user’s preferences. These strategies perform well when there is known data about an item (name, location, description, etc.) but not about the user. Content-based recommenders treat recommendation as a user-specific classification problem, learning a classifier for a user’s likes and dislikes based on an item’s properties.
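As a rough illustration of content-based filtering, the sketch below builds a user profile from item features and ranks unseen items against it. Note that it uses a simple profile-similarity score rather than a trained classifier, and that the items, features, and function names are all hypothetical.

```python
from math import sqrt

# Hypothetical item descriptions encoded as genre features
item_features = {
    "Movie 1": {"action": 1, "comedy": 0, "drama": 0},
    "Movie 2": {"action": 1, "comedy": 1, "drama": 0},
    "Movie 3": {"action": 0, "comedy": 0, "drama": 1},
    "Movie 4": {"action": 0, "comedy": 1, "drama": 1},
}
liked, disliked = ["Movie 1"], ["Movie 3"]

def build_profile(liked, disliked):
    """Add the features of liked items and subtract those of disliked items."""
    profile = {}
    for sign, items in ((1, liked), (-1, disliked)):
        for item in items:
            for feature, value in item_features[item].items():
                profile[feature] = profile.get(feature, 0) + sign * value
    return profile

def score(profile, features):
    """Cosine similarity between the user profile and an item's features."""
    dot = sum(profile.get(f, 0) * v for f, v in features.items())
    norm_p = sqrt(sum(v * v for v in profile.values()))
    norm_f = sqrt(sum(v * v for v in features.values()))
    return dot / (norm_p * norm_f) if norm_p and norm_f else 0.0

profile = build_profile(liked, disliked)
seen = set(liked) | set(disliked)
# Rank the items the user has not interacted with, best match first.
ranked = sorted(
    (item for item in item_features if item not in seen),
    key=lambda item: score(profile, item_features[item]),
    reverse=True,
)
print(ranked)
```

No other user's data is involved: the ranking depends only on the item properties and on this one user's likes and dislikes.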
Session-Based Recommender System
These recommender systems produce recommendations from a user’s interactions during a session. YouTube and Amazon both use session-based recommender systems. They are particularly valuable when a user’s history (such as previous clicks or transactions) is not available or not relevant to the current session. Video, e-commerce, travel, and music are all domains where session-based suggestions are useful. Most session-based recommender systems rely on the sequence of recent interactions inside a session, without requiring any further information about the user (history, demographics).
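One very simple way to exploit the sequence of interactions in a session is to count which item tends to follow which. The sketch below does exactly that; the session data and the `recommend_next` helper are hypothetical, and production systems typically use far richer sequence models.

```python
from collections import Counter, defaultdict

# Hypothetical click sessions, each an ordered list of item ids
sessions = [
    ["phone", "case", "charger"],
    ["phone", "charger"],
    ["laptop", "mouse"],
    ["phone", "case"],
]

# Count how often one item directly follows another within a session.
next_counts = defaultdict(Counter)
for session in sessions:
    for current, following in zip(session, session[1:]):
        next_counts[current][following] += 1

def recommend_next(current_session, k=2):
    """Suggest up to k items that most often followed the last clicked item."""
    last_item = current_session[-1]
    seen = set(current_session)
    candidates = [
        item for item, _ in next_counts[last_item].most_common()
        if item not in seen
    ]
    return candidates[:k]

print(recommend_next(["phone"]))
```

The recommendation uses only the current session; no user identifier, history, or demographic information is needed.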
Multi-Criteria Recommender System
Multi-criteria recommender systems (MCRS) take several criteria into account when making recommendations. Rather than building recommendation techniques on a single criterion value, such as user u’s overall preference for an item i, these systems attempt to predict a rating for items that u has not yet explored by leveraging preference information on the multiple criteria that influence this overall preference value. Several researchers treat MCRS as a multi-criteria decision-making (MCDM) problem and build MCRS systems using MCDM approaches and techniques.
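To make the idea concrete, here is a small sketch in which an overall rating is derived from per-criterion ratings through a weighted sum. The criteria, the predicted values, and the weights are all hypothetical; in practice the per-criterion ratings would come from a prediction model and the aggregation function would be learned from data.

```python
# Hypothetical per-criterion ratings predicted for user u (1-5 scale)
predicted_criteria = {
    "Hotel A": {"cleanliness": 4.5, "location": 3.0, "price": 4.0},
    "Hotel B": {"cleanliness": 3.5, "location": 5.0, "price": 2.5},
}

# Hypothetical importance weights for user u (they sum to 1)
weights = {"cleanliness": 0.5, "location": 0.2, "price": 0.3}

def overall_rating(criteria, weights):
    """Aggregate per-criterion ratings into one overall rating."""
    return sum(weights[name] * rating for name, rating in criteria.items())

overall = {
    item: overall_rating(criteria, weights)
    for item, criteria in predicted_criteria.items()
}
best = max(overall, key=overall.get)
print(overall, best)
```

A user who weighted location more heavily would get a different ranking from the same per-criterion ratings, which is the information a single overall rating cannot express.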
Open Source Packages for Recommender Systems
Let us look at the top Python packages that the community and researchers use for building recommender systems.
LensKit is a free and open-source framework for developing, investigating, and learning about recommender systems. It supports developing, running, and assessing recommender algorithms in a flexible manner appropriate for research and education. LensKit for Python (LKPY) is the Python-based successor of the Java-based LensKit toolkit and a component of the LensKit project. LKPY allows creating robust, adaptable, and reproducible experiments that leverage the broad and developing PyData and Scientific Python ecosystems, such as scikit-learn and TensorFlow.
Crab is a Python recommender engine that integrates classic information-filtering recommendation methods with scientific Python libraries such as NumPy, SciPy, and Matplotlib. It is also known as Scikits recommender, and it aims to provide a comprehensive set of components and algorithms from which one can build a personalised recommender system for a variety of situations. User-based filtering, item-based filtering, and other capabilities are available in Crab.
TensorRec is a Python recommendation system that lets you quickly create and customize recommendation systems using TensorFlow. User features, item features, and interactions are the three types of data that a TensorRec system consumes. It learns to produce and rank recommendations using this data. TensorRec learns by comparing the scores it generates to real-world interactions between users and items, such as likes and dislikes.
To have more details about similar packages you can follow this post.
The Surprise Package
Surprise is a Python module for building and testing rating prediction systems. Its API was designed to closely resemble that of scikit-learn, so users familiar with the Python machine learning ecosystem should feel comfortable with it. Surprise includes a set of estimators (prediction algorithms) for rating prediction. Classic techniques are implemented, such as the main similarity-based algorithms and matrix factorization algorithms like SVD and NMF.
It also includes tools for model evaluation, such as cross-validation iterators and built-in metrics in the style of scikit-learn, as well as grid search and randomized search for model selection and automatic hyper-parameter tuning. Thanks to basic primitives and a light API, users can implement their own recommendation technique with little code.
Traditional datasets, such as the MovieLens datasets, are immediately available in the package, but user-defined datasets can be loaded as CSV files or used with pandas data frames. Surprise is primarily written in Python, with Cython being used to optimize the computationally heavy bits. Internally, Surprise uses NumPy arrays and built-in Python data structures (mostly dictionaries).
Surprise was created to help researchers quickly test novel recommendation ideas by letting them write bespoke prediction algorithms, but thanks to its extensive documentation it can also serve as a learning resource for students and less experienced users.
Implementing Recommender Systems with Surprise
Here we will look at a quick example of how to download a dataset, split it into five folds for cross-validation, and compute the Mean Absolute Error (MAE) and Root Mean Squared Error (RMSE) of the SVD algorithm.
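    # notebook shell command; in a terminal run: pip install scikit-surprise
    !pip install scikit-surprise

    from surprise import SVD
    from surprise import Dataset
    from surprise.model_selection import cross_validate

    # load the built-in MovieLens 100k dataset
    data = Dataset.load_builtin('ml-100k')

    # choose the prediction algorithm
    algo = SVD()

    # run 5-fold cross-validation and print RMSE and MAE for each fold
    cross_validate(algo, data, measures=['RMSE', 'MAE'], cv=5, verbose=True)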
If the movielens-100k dataset has not already been downloaded, the load_builtin() method will offer to download it and save it in the .surprise_data folder in your home directory (you can also choose to save it somewhere else).
We’re using the well-known SVD algorithm here, although plenty of alternatives are available. For further information, see Using prediction algorithms. The cross_validate() function runs a cross-validation procedure according to the cv argument and computes several accuracy metrics. We’re using a traditional 5-fold cross-validation here, although other cross-validation iterators can be used.
In this article, we have seen what a recommender system is and the different approaches that can be taken depending on the type of system needed. We then looked at widely used Python toolkits that help developers build state-of-the-art systems with a small code base. Lastly, we worked with one such toolkit, Surprise, whose API is modeled on scikit-learn and which offers familiar tools such as cross-validation and grid search.