Product recommendation in Machine Learning refers to the task of recommending product(s) to a customer based on their purchase history. A product recommender system is an ML model that suggests items, content or services that a specific user is likely to buy or engage with. Here, we use Amazon’s product co-purchasing network dataset to create a C# .NET Core console application that works as a product recommender system.
In our previous articles, we covered the basics of ML.NET and the implementation of an image classifier using the framework. Let us now move on to another ML.NET use case: product recommendation.
Two types of recommendation systems
Current recommendation systems can be broadly divided into two categories:
- Content-based filters
Such filters use information/features related to the products themselves rather than using users’ preferences. For instance, using movies’ genre, star cast, year of release, duration and so on as features to recommend movies to the viewers.
- Collaborative filters
Unlike content-based ones, these filters take users’ choices and feedback into consideration. Recommending movies to a viewer based on the historical data of ratings given by different viewers to different movies is an example of collaborative filtering.
Prerequisites
- Visual Studio 2019 or later
- Or Visual Studio 2017 version 15.6 or later with the .NET Core cross-platform development workload installed
Algorithm used
For recommendation tasks, ML.NET provides a collaborative filtering algorithm called Matrix Factorization (MF), which you can use through the MatrixFactorizationTrainer class.
Visit this page to understand what Matrix Factorization is and how it works.
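In short, matrix factorization approximates the large, sparse (product × co-purchased product) matrix as the product of two low-rank factor matrices: every product gets a short latent vector of length k, and the score of a pair is the dot product of the two vectors. The sketch below illustrates that scoring rule, plus one plain stochastic-gradient (SGD) update, which is one common way of fitting such vectors. It is not ML.NET's internal code (ML.NET wraps the native LIBMF library, whose solvers differ), and MfScoring, Score and SgdStep are hypothetical names.

```csharp
using System;

// Hypothetical illustration of matrix factorization scoring and training.
public static class MfScoring
{
    // Score of a (row, column) pair = dot product of their latent vectors.
    public static float Score(float[] rowFactors, float[] columnFactors)
    {
        if (rowFactors.Length != columnFactors.Length)
            throw new ArgumentException("Both latent vectors must have the same length k.");

        float score = 0f;
        for (int i = 0; i < rowFactors.Length; i++)
            score += rowFactors[i] * columnFactors[i];
        return score;
    }

    // One SGD step on a single observed pair with target value r:
    //   error = r - p.q
    //   p <- p + lr * (error * q - lambda * p)
    //   q <- q + lr * (error * p_old - lambda * q)
    public static void SgdStep(float[] p, float[] q, float r, float learningRate, float lambda)
    {
        float error = r - Score(p, q);
        for (int i = 0; i < p.Length; i++)
        {
            float pOld = p[i]; // q's update must use p's value from before this step
            p[i] += learningRate * (error * q[i] - lambda * pOld);
            q[i] += learningRate * (error * pOld - lambda * q[i]);
        }
    }
}
```

Repeating SgdStep over all observed pairs for several epochs pulls the dot products of co-purchased pairs toward their targets, while lambda keeps the vectors small.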
Dataset used
The Amazon dataset used here consists of pairs of product IDs: each line holds the ID of a product and the ID of a product that is frequently co-purchased with it. It originally comes from the Stanford Network Analysis Platform (SNAP) and is based on the Amazon website’s well-known ‘Customers Who Bought This Item Also Bought’ feature.
Visit this page of SNAP where you will find the dataset or click here to download the file directly.
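Each data line of the SNAP file can be read as a directed edge: a product followed by a product frequently bought together with it. A minimal sketch, assuming the usual SNAP edge-list layout (tab-separated IDs preceded by '#' comment lines), that groups the pairs into per-product "also bought" lists; CoPurchaseGraph and Build are hypothetical names used only for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

// Hypothetical helper: builds "customers who bought X also bought Y" lists
// from SNAP-style edge lines ("FromNodeId<TAB>ToNodeId").
public static class CoPurchaseGraph
{
    public static Dictionary<uint, List<uint>> Build(IEnumerable<string> lines)
    {
        var alsoBought = new Dictionary<uint, List<uint>>();
        foreach (string line in lines)
        {
            // Skip blank lines and the '#' comment lines describing the graph.
            if (string.IsNullOrWhiteSpace(line) ||
                line.TrimStart().StartsWith("#", StringComparison.Ordinal))
                continue;

            string[] parts = line.Split('\t');
            uint product = uint.Parse(parts[0], CultureInfo.InvariantCulture);
            uint coPurchased = uint.Parse(parts[1], CultureInfo.InvariantCulture);

            if (!alsoBought.TryGetValue(product, out List<uint> list))
            {
                list = new List<uint>();
                alsoBought[product] = list;
            }
            list.Add(coPurchased);
        }
        return alsoBought;
    }
}
```

For example, CoPurchaseGraph.Build(File.ReadLines(path)) would give, for each product ID, the IDs listed as co-purchased with it.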
Implementation steps
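Create a C# .NET Core console application. Then install the Microsoft.ML NuGet package, along with the Microsoft.ML.Recommender package, which contains the MatrixFactorizationTrainer. Click here for installation instructions.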
Open the Program.cs file and replace the ‘using’ statements with the following ones:
using System;
using System.IO;
using Microsoft.ML;
using Microsoft.ML.Data;
using Microsoft.ML.Trainers;
Path definitions
Define the path locations of your dataset and model.
private static string DatasetPath = @"../../../Data";
//Relative path of the dataset
private static string TrainDataRelPath = $"{DatasetPath}/amazon0302.txt";
//Absolute location of the dataset
private static string TrainDataAbsPath = GetAbsolutePath(TrainDataRelPath);
private static string Model = @"../../../Model";
//Relative path of the model
private static string ModelRelPath = $"{Model}/model.zip";
//Absolute location of the model
private static string ModelAbsPath = GetAbsolutePath(ModelRelPath);
where the GetAbsolutePath() function is defined as follows:
public static string GetAbsolutePath(string relativePath)
{
    //Folder containing the compiled assembly (e.g. bin/Debug/<framework>)
    FileInfo root = new FileInfo(typeof(Program).Assembly.Location);
    string folderPath = root.Directory.FullName;
    string fullPath = Path.Combine(folderPath, relativePath);
    return fullPath;
}
Click here to understand the FileInfo class.
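Why three levels up? In an SDK-style .NET Core project, the compiled assembly is placed by default in bin/<Configuration>/<TargetFramework> (for example bin/Debug/netcoreapp3.1), so "../../../Data" points to a Data folder beside Program.cs. Path.Combine only concatenates and leaves the ".." segments in place; the hypothetical PathHelper.Resolve below also calls Path.GetFullPath to normalize them, which makes logged paths easier to check. The folder names in the usage are assumptions.

```csharp
using System.IO;

// Hypothetical variant of GetAbsolutePath that also normalizes ".." segments.
public static class PathHelper
{
    public static string Resolve(string baseFolder, string relativePath)
    {
        // Path.Combine joins the parts; Path.GetFullPath collapses "../" segments.
        return Path.GetFullPath(Path.Combine(baseFolder, relativePath));
    }
}
```

With baseFolder set to <project>/bin/Debug/netcoreapp3.1, Resolve(baseFolder, "../../../Data") yields <project>/Data.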
Context creation
Inside the Main() method, instantiate the MLContext class:
MLContext myContext = new MLContext();
The ‘myContext’ object will be shared across all the objects involved in the model creation workflow.
Data loading
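Download the dataset from https://snap.stanford.edu/data/amazon0302.html and place the extracted amazon0302.txt file in the Data folder.
Replace the ‘#’ comment lines at the top of the file with a header row naming the two columns, so that the dataset looks as follows: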
ProductID	CoPurchaseProductID
0	1
0	20
1	32
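Editing a file with over a million lines by hand is awkward, so this preparation step can also be scripted. A minimal sketch, assuming the raw file's only non-data lines are the '#' comments at the top; DatasetPreparer and AddHeader are hypothetical names. The header it writes is why the loader below is called with hasHeader: true.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Hypothetical one-off preprocessing step for the SNAP file.
public static class DatasetPreparer
{
    // Drops the '#' comment lines and writes a header naming both columns.
    // rawPath and outputPath should be different files.
    public static void AddHeader(string rawPath, string outputPath)
    {
        IEnumerable<string> dataLines = File.ReadLines(rawPath)
            .Where(line => line.Trim().Length > 0 &&
                           !line.StartsWith("#", StringComparison.Ordinal));

        var output = new List<string> { "ProductID\tCoPurchaseProductID" };
        output.AddRange(dataLines); // reads the whole raw file before writing
        File.WriteAllLines(outputPath, output);
    }
}
```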
Read the training data with a TextLoader, defining the schema of the product co-purchase dataset:
var trainData = myContext.Data.LoadFromTextFile(path: TrainDataAbsPath,
    //define the schema
    columns: new[]
    {
        //column for target label
        new TextLoader.Column("Label", DataKind.Single, 0),
        //column for ProductID
        new TextLoader.Column(name: nameof(ProductEntry.ProductID), dataKind: DataKind.UInt32, source: new[] { new TextLoader.Range(0) }, keyCount: new KeyCount(262111)),
        //column for CoPurchaseProductID
        new TextLoader.Column(name: nameof(ProductEntry.CoPurchaseProductID), dataKind: DataKind.UInt32, source: new[] { new TextLoader.Range(1) }, keyCount: new KeyCount(262111))
    },
    hasHeader: true,
    separatorChar: '\t');
Among the parameters of TextLoader.Column(), ‘dataKind’ is the data type of the items in the column, ‘source’ defines the source index range(s) of the column, and ‘keyCount’ is the number of possible values in the key column (262,111 here, one per product in the dataset).
Click here to know more about the TextLoader.Column class.
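The key count has to cover every ID that appears in either column. Assuming, as the 0-based IDs of this file suggest, that a KeyCount(n) column accepts raw IDs 0 through n - 1 (worth confirming in the ML.NET documentation for your version), the required count is the largest ID plus one. A hypothetical helper for computing it from the parsed pairs:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: smallest key count covering all 0-based IDs in both columns.
public static class KeyCountHelper
{
    public static ulong RequiredKeyCount(IEnumerable<(uint ProductId, uint CoPurchaseId)> pairs)
    {
        ulong maxId = 0;
        bool any = false;
        foreach (var (productId, coPurchaseId) in pairs)
        {
            any = true;
            maxId = Math.Max(maxId, Math.Max(productId, coPurchaseId));
        }
        if (!any) throw new ArgumentException("No pairs supplied.");
        return maxId + 1; // IDs 0..maxId need maxId + 1 distinct keys
    }
}
```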
Define the model training pipeline
Because the product IDs are already encoded as keys, no extra data-transformation step is needed; we only have to configure the MatrixFactorizationTrainer options: the column names, the loss function and a few extra hyperparameters.
MatrixFactorizationTrainer.Options opt = new MatrixFactorizationTrainer.Options();
opt.MatrixColumnIndexColumnName = nameof(ProductEntry.ProductID);
opt.MatrixRowIndexColumnName = nameof(ProductEntry.CoPurchaseProductID);
opt.LabelColumnName = "Label";
opt.LossFunction = MatrixFactorizationTrainer.LossFunctionType.SquareLossOneClass;
//hyperparameters
opt.Alpha = 0.01;
opt.Lambda = 0.025;
opt.ApproximationRank = 100; //latent dimension k
opt.C = 0.00001;
Pass the options to the MatrixFactorization trainer
var estimator = myContext.Recommendation().Trainers.MatrixFactorization(opt);
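What do these hyperparameters control? With SquareLossOneClass only positive pairs (co-purchases) are observed, so the trainer also penalizes predictions on unobserved pairs, pulling them toward the small value C, with Alpha weighting that penalty; Lambda is the regularization coefficient and the rank k is the length of each latent vector. The sketch below evaluates an objective of roughly this form by brute force on a tiny dense matrix. It assumes observed pairs have target 1, ignores LIBMF's efficient solver, and OneClassObjective and Evaluate are hypothetical names.

```csharp
using System;

// Hypothetical illustration of a one-class matrix factorization objective:
//     sum over observed (u,v):          (1 - p_u . q_v)^2
//   + alpha * sum over unobserved (u,v): (c - p_u . q_v)^2
//   + lambda * (sum of squared entries of P and Q)
public static class OneClassObjective
{
    public static double Evaluate(bool[,] observed, double[][] p, double[][] q,
                                  double alpha, double c, double lambda)
    {
        double loss = 0.0;
        for (int u = 0; u < p.Length; u++)
        {
            for (int v = 0; v < q.Length; v++)
            {
                double prediction = Dot(p[u], q[v]);
                if (observed[u, v])
                    loss += Math.Pow(1.0 - prediction, 2);       // co-purchased pair
                else
                    loss += alpha * Math.Pow(c - prediction, 2); // unobserved pair
            }
        }

        double regularization = 0.0;
        foreach (double[] row in p) regularization += Dot(row, row);
        foreach (double[] row in q) regularization += Dot(row, row);
        return loss + lambda * regularization;
    }

    private static double Dot(double[] a, double[] b)
    {
        double sum = 0.0;
        for (int i = 0; i < a.Length; i++) sum += a[i] * b[i];
        return sum;
    }
}
```

A small Alpha keeps the huge number of unobserved pairs from dominating the loss, and a C close to 0 says that unobserved pairs are probably, but not certainly, not co-purchased.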
Train the estimator on the training data
ITransformer model = estimator.Fit(trainData);
Use the model for predictions
Define two classes: ProductEntry, the input of the prediction engine, and Copurchase_prediction, which holds its output.
public class Copurchase_prediction
{
    //predicted score for co-purchased product
    public float Score { get; set; }
}

public class ProductEntry
{
    [KeyType(count: 262111)]
    public uint ProductID { get; set; }

    [KeyType(count: 262111)]
    public uint CoPurchaseProductID { get; set; }
}
Create Prediction Engine
var predeng = myContext.Model.CreatePredictionEngine<ProductEntry, Copurchase_prediction>(model);
Using the prediction engine, predict the score for product #50 being a co-purchased product of product #2:
var pred = predeng.Predict(new ProductEntry() { ProductID = 2, CoPurchaseProductID = 50 });
Console.WriteLine($"Score for product 50 being co-purchased with product 2: {pred.Score}");
Run the console application.
Output interpretation
The score output by the matrix factorization trainer is a numerical measure of how likely one product is to be bought together with the other. It is not a probability; the only guidance it gives is that a higher score indicates a more likely co-purchase. To make a recommendation for a given product, compute the scores of multiple candidate products and recommend the one(s) with the highest score.
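That ranking step can be sketched as follows, assuming a scoring delegate that wraps the prediction engine, for example (p, c) => predeng.Predict(new ProductEntry { ProductID = p, CoPurchaseProductID = c }).Score. CoPurchaseRanker and TopK are hypothetical names; since scores are only compared with each other, no probability interpretation is needed.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: ranks candidate products by their co-purchase score.
public static class CoPurchaseRanker
{
    // Scores every candidate against productId and returns the k best,
    // highest score first.
    public static List<(uint ProductId, float Score)> TopK(
        uint productId, IEnumerable<uint> candidates, Func<uint, uint, float> score, int k)
    {
        return candidates
            .Where(candidate => candidate != productId) // don't recommend the product itself
            .Select(candidate => (ProductId: candidate, Score: score(productId, candidate)))
            .OrderByDescending(pair => pair.Score)
            .Take(k)
            .ToList();
    }
}
```

Scoring all 262,111 products for every query is slow, so in practice you may want to restrict the candidate list. PredictionEngine is not thread-safe, which is one reason this sketch scores candidates sequentially.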
- Refer to the GitHub repository and dive deeper into such an interesting use case of ML.NET – product recommendation!