Hands-On Guide To Market Basket Analysis With Python Codes

Dr. Vaibhav Kumar

Machine learning helps the retail industry in many ways, from forecasting sales performance to identifying prospective buyers. Market basket analysis is one of its key applications in retail. By analysing the past buying behaviour of customers, one can find out which products are bought together. For example, bread and butter are often sold together, as are baby diapers and baby massage oil. In other words, one can analyse the association among products. If the retail management knows these associations, the associated products can be placed together in the shop. Likewise, on seeing that a customer is buying a product, the salesperson can offer the associated product.

This process of analysing associations is called association rule learning, and analysing the products bought together by customers is called market basket analysis. In this article, we will discuss the association rule learning method with a practical implementation of market basket analysis in Python. We will use the Apriori algorithm as the association rule method.

What is Association Rule Learning?

Association rule learning is a rule-based machine learning approach that discovers relationships between variables in a dataset. It has major applications in the retail industry, including e-commerce. Using it, products that tend to be sold together can be identified and offered to customers as a combination. For example, it may be discovered that customers who bought onion and potato together most likely also bought tomato. This can be written as a rule of the form {onion, potato} -> tomato. Such rules are called association rules. Association rule learning methods discover rules of this type from a dataset comprising a list of transactions.



Three popular algorithms for association rule learning are Apriori, Eclat, and FP-Growth. In this article, we will discuss the Apriori method.

Apriori Algorithm in Market Basket Analysis

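Apriori is a popular algorithm used in market basket analysis. It performs frequent itemset mining and association rule learning over relational databases. It uses a bottom-up approach in which frequent itemsets are extended one item at a time and groups of candidates are tested against the available dataset. This process continues until no further extensions are found. It relies on three measures: support, confidence and lift. For a rule A -> B, support is the fraction of all transactions that contain both A and B. Confidence is the fraction of the transactions containing A that also contain B, that is, support(A and B) / support(A). Lift is the confidence divided by the support of B; a lift above 1 indicates that A and B occur together more often than would be expected if they were independent.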
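As a rough illustration of support, confidence and lift, the sketch below computes them for the rule {onion, potato} -> tomato on a handful of made-up transactions. It uses the standard definitions: confidence is support(A and B) / support(A), and lift is that confidence divided by support(B). The transactions and the helper names `support`, `confidence` and `lift` are assumptions for this example only; they come from neither the apyori package nor the Kaggle dataset.

```python
# Toy transactions (invented for illustration)
transactions = [
    {"onion", "potato", "tomato"},
    {"onion", "potato"},
    {"onion", "potato", "tomato"},
    {"milk", "bread"},
    {"tomato", "bread"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Support of antecedent and consequent together, divided by support of the antecedent."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)


def lift(antecedent, consequent, transactions):
    """Confidence of the rule divided by the support of the consequent."""
    return confidence(antecedent, consequent, transactions) / support(consequent, transactions)


antecedent, consequent = {"onion", "potato"}, {"tomato"}
print(round(support(antecedent | consequent, transactions), 3))     # 0.4
print(round(confidence(antecedent, consequent, transactions), 3))   # 0.667
print(round(lift(antecedent, consequent, transactions), 3))         # 1.111
```

Here the lift is only slightly above 1, so in this toy data tomato is bought with onion and potato barely more often than chance would suggest.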



The working of the Apriori algorithm can be summarised in the following steps:

  1. Define the minimum support and confidence for the association rules.
  2. Take all the itemsets in the transactions with support higher than the minimum support.
  3. Take all the rules from these itemsets with confidence higher than the minimum confidence.
  4. Sort the association rules in decreasing order of lift.
  5. Visualize the rules along with their confidence and support.
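The first four steps can be sketched in plain Python as follows. This is a minimal, unoptimised illustration and not the implementation used by the apyori package. The function names `frequent_itemsets` and `association_rules` and the toy baskets are assumptions made for this example, and thresholds are treated as inclusive (greater than or equal).

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Return {itemset: support} for every itemset meeting min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        return sum(itemset <= t for t in transactions) / n

    # Level 1: frequent single items
    items = {item for t in transactions for item in t}
    current = {}
    for item in items:
        s = support(frozenset([item]))
        if s >= min_support:
            current[frozenset([item])] = s

    frequent = dict(current)
    k = 2
    while current:
        # Join: unions of two frequent (k-1)-itemsets that contain exactly k items
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune: every (k-1)-subset of a candidate must itself be frequent
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in current for sub in combinations(c, k - 1))
        }
        current = {}
        for c in candidates:
            s = support(c)
            if s >= min_support:
                current[c] = s
        frequent.update(current)
        k += 1
    return frequent


def association_rules(frequent, min_confidence):
    """Return (antecedent, consequent, support, confidence, lift), highest lift first."""
    rules = []
    for itemset, sup in frequent.items():
        for r in range(1, len(itemset)):
            for antecedent in map(frozenset, combinations(itemset, r)):
                consequent = itemset - antecedent
                # Subsets of a frequent itemset are always frequent, so lookups succeed
                conf = sup / frequent[antecedent]
                if conf >= min_confidence:
                    rules.append((antecedent, consequent, sup, conf,
                                  conf / frequent[consequent]))
    return sorted(rules, key=lambda rule: rule[4], reverse=True)


# Toy baskets (invented for illustration)
baskets = [
    ["bread", "butter", "milk"],
    ["bread", "butter"],
    ["bread", "milk"],
    ["butter", "milk"],
    ["bread", "butter", "jam"],
]
freq = frequent_itemsets(baskets, min_support=0.4)
for antecedent, consequent, sup, conf, lift in association_rules(freq, min_confidence=0.7):
    print(sorted(antecedent), "->", sorted(consequent),
          round(sup, 3), round(conf, 3), round(lift, 3))
```

The pruning step is what keeps the search manageable: an itemset can only be frequent if all of its smaller subsets are frequent, so most candidates are discarded before their support is counted.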

The Dataset

In this implementation, we have used the Market Basket Optimization dataset that is publicly available on Kaggle. This dataset comprises the list of transactions of a retail company over the period of one week. It contains a total of 7501 transaction records where each record consists of the list of items sold in one transaction. Using this record of transactions and items in each transaction, we will find the association rules between items.

Market Basket Analysis using the Apriori method

We first import the required libraries. The Apriori algorithm is provided by the third-party apyori package (installable with pip install apyori), from which we import the apriori function.

# Importing the libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from apyori import apriori

Next, we read the dataset downloaded from Kaggle. The file has no header row, and its first row contains the first transaction, so we pass header = None.

# Data Preprocessing
dataset = pd.read_csv('Market_Basket_Optimisation.csv', header = None)

[Figure: market basket analysis dataset]

Once we have read the dataset, we need to get the list of items in each transaction. So we will run two loops here: one over the total number of transactions, and the other over the columns of each transaction. The resulting list will serve as the training set from which we generate the association rules.

#Getting the list of transactions from the dataset
transactions = []
for i in range(0, 7501):
   transactions.append([str(dataset.values[i,j]) for j in range(0, 20)])
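One thing to be aware of: transactions with fewer items than there are columns have empty cells, which pandas reads as NaN, and str() turns those into the string 'nan', so 'nan' ends up being treated as an item. A possible variant, sketched below, skips missing cells instead. The helper name `row_to_transaction` and the sample rows are hypothetical and only mimic the shape of the rows in dataset.values.

```python
import math


def row_to_transaction(row):
    """Keep only the real items of one row, skipping missing (None or NaN) cells."""
    items = []
    for cell in row:
        if cell is None:
            continue
        if isinstance(cell, float) and math.isnan(cell):
            continue
        items.append(str(cell))
    return items


# Made-up rows: shorter transactions are padded with NaN, as pandas does
nan = float("nan")
rows = [
    ["bread", "butter", "milk"],
    ["eggs", "milk", nan],
    ["jam", nan, nan],
]
transactions = [row_to_transaction(row) for row in rows]
print(transactions)  # [['bread', 'butter', 'milk'], ['eggs', 'milk'], ['jam']]
```

With the real data this would be applied as `[row_to_transaction(row) for row in dataset.values]`. Because 'nan' would then no longer count as an item, the generated rules may differ from those reported later in this article.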

[Figure: market basket analysis list of items]

Now that the list of items in our training set is ready, we run the Apriori algorithm, which learns the association rules from it. Suppose we want to find associations for products that are sold at least 3 times a day. The minimum support is then 3 sales per day multiplied by 7 days of the week and divided by the total number of transactions: (3*7)/7501, which is approximately 0.0028 and is rounded to 0.003. We are looking for at least 30% confidence in each association rule, so we set the minimum confidence to 0.3. The minimum lift is taken as 3, and the minimum length as 2 because we want associations between at least two items. These hyperparameters can be tuned depending on the business requirements.
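The support threshold can be checked with a few lines of arithmetic. The variable names below are only illustrative; the numbers are the ones given in the text.

```python
sales_per_day = 3        # minimum number of daily sales we care about
days = 7                 # the dataset covers one week
n_transactions = 7501    # total number of transactions in the dataset

min_support = sales_per_day * days / n_transactions
print(round(min_support, 5))  # 0.0028
print(round(min_support, 3))  # 0.003
```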

# Training Apriori algorithm on the dataset
rule_list = apriori(transactions, min_support = 0.003, min_confidence = 0.3, min_lift = 3, min_length = 2)

After executing the above line of code, we have the association rules between the items sold by the retailer. To see these rules, the code below needs to be executed.

# Visualizing the list of rules
results = list(rule_list)
for i in results:
   print('\n')
   print(i)
   print('**********') 
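The printed records are nested and hard to scan. As a sketch, assuming each record has the fields items, support and ordered_statistics, and each ordered statistic has items_base, items_add, confidence and lift (the field names apyori uses for its results), the rules can be flattened into rows and sorted by lift. The namedtuple stand-ins and the sample numbers below are invented so that the example runs without the package installed; the helper name `flatten` is hypothetical.

```python
from collections import namedtuple

# Stand-ins that mirror the assumed field names of apyori's result records
RelationRecord = namedtuple("RelationRecord", ["items", "support", "ordered_statistics"])
OrderedStatistic = namedtuple("OrderedStatistic",
                              ["items_base", "items_add", "confidence", "lift"])


def flatten(records):
    """Turn nested rule records into flat dicts, highest lift first."""
    rows = []
    for record in records:
        for stat in record.ordered_statistics:
            rows.append({
                "antecedent": sorted(stat.items_base),
                "consequent": sorted(stat.items_add),
                "support": record.support,
                "confidence": stat.confidence,
                "lift": stat.lift,
            })
    return sorted(rows, key=lambda row: row["lift"], reverse=True)


# Invented sample records, not actual output from the dataset
sample = [
    RelationRecord(
        items=frozenset({"bread", "butter"}),
        support=0.01,
        ordered_statistics=[
            OrderedStatistic(frozenset({"butter"}), frozenset({"bread"}), 0.40, 3.2),
        ],
    ),
    RelationRecord(
        items=frozenset({"jam", "toast"}),
        support=0.005,
        ordered_statistics=[
            OrderedStatistic(frozenset({"jam"}), frozenset({"toast"}), 0.35, 4.1),
        ],
    ),
]

for row in flatten(sample):
    print(row["antecedent"], "->", row["consequent"],
          row["support"], row["confidence"], row["lift"])
```

With the real results, the same helper would be called as `flatten(results)`.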

The list of rules can be seen in the screenshot below.

[Figure: market basket analysis association rules]

As the output screenshot above shows, each rule is listed along with its confidence. The first rule indicates an association between mushroom cream sauce and escalope with a confidence of 30%. The next rule shows an association between escalope and pasta with a confidence of 37.28%. In total, 102 rules are generated in this experiment. The number of generated rules depends on the values of the hyperparameters; for instance, increasing the minimum confidence leaves fewer rules.

So, this is one way of performing market basket analysis with association rule learning. In this experiment, we used the Apriori algorithm. Other algorithms such as Eclat and FP-Growth can also be used for the same purpose.

Copyright Analytics India Magazine Pvt Ltd
