Hands-On Guide To Market Basket Analysis With Python Codes

In this article, we discuss association rule learning and walk through a practical implementation of market basket analysis in Python, using the Apriori algorithm as the association rule method.

Machine learning is helping the retail industry in many ways, from forecasting sales performance to identifying prospective buyers. Market basket analysis is one of its key applications in retail. By analysing the past buying behaviour of customers, one can find out which products tend to be bought together.

For example, bread and butter are often sold together, as are baby diapers and baby massage oil. This means one can analyze the associations among products. If the retail management can find these associations, the associated products can be placed together in the shop. Alternatively, on seeing that a customer is buying a product, the salesperson can offer the associated product to that customer.

This process of analyzing associations is called association rule learning, and analyzing the products that customers buy together is called market basket analysis.

What is Association Rule Learning?

Association rule learning is a rule-based machine learning approach that discovers relationships between variables in a dataset. It has major applications in the retail industry, including e-commerce. Using this approach, products that tend to be sold together can be identified and offered to customers to buy together. For example, it may turn out that customers who buy onion and potato together are also likely to buy tomato. This can be written as a rule of the form {onion, potato} -> tomato. Such rules are called association rules. Association rule learning methods discover rules of this kind from a dataset made up of a list of transactions.

Three popular algorithms for association rule learning are Apriori, Eclat, and FP-Growth. In this article, we discuss the Apriori method.

Apriori Algorithm in Market Basket Analysis

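Apriori is a popular algorithm used in market basket analysis. It is used with relational databases for frequent itemset mining and association rule learning. It takes a bottom-up approach in which frequent itemsets are extended one item at a time and groups of candidates are tested against the available dataset. This process continues until no further extensions are found. The algorithm relies on three measures: support, confidence and lift. Support is the fraction of transactions that contain an itemset, the confidence of a rule X -> Y is the fraction of transactions containing X that also contain Y, and lift is that confidence divided by the support of Y.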
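As a rough illustration, the sketch below computes support, confidence and lift for the rule {onion, potato} -> tomato using their usual definitions, which are spelled out in the docstrings. The transactions are made up for this example and the function names are illustrative assumptions; none of this comes from the dataset used later in the article.

```python
# Toy transactions, made up for illustration only
transactions = [
    {"onion", "potato", "tomato"},
    {"onion", "potato", "tomato"},
    {"onion", "potato"},
    {"tomato", "bread"},
    {"bread", "butter"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(base, add, transactions):
    """For the rule base -> add: support(base and add together) / support(base)."""
    return support(set(base) | set(add), transactions) / support(base, transactions)


def lift(base, add, transactions):
    """Confidence of base -> add divided by the support of add."""
    return confidence(base, add, transactions) / support(add, transactions)


base, add = {"onion", "potato"}, {"tomato"}
print("support   :", support(base | add, transactions))
print("confidence:", confidence(base, add, transactions))
print("lift      :", lift(base, add, transactions))
```

A lift above 1 suggests that the items appear together more often than would be expected if they were bought independently.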

The Apriori algorithm works in the following steps:

  1. Set the minimum support and minimum confidence for the association rules.
  2. Keep all the itemsets in the transactions whose support is higher than the minimum support.
  3. Keep all the rules from these itemsets whose confidence is higher than the minimum confidence.
  4. Sort the association rules in decreasing order of lift.
  5. Visualize the rules along with their confidence and support.
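The steps above can be sketched in plain Python. This is a minimal, unoptimized illustration of the level-wise idea on made-up transactions, not the implementation used by the library later in this article; the function names and the toy data are assumptions introduced here.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Level-wise search for itemsets whose support is at least min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        return sum(1 for t in transactions if itemset <= t) / n

    # Level 1: frequent single items
    items = {item for t in transactions for item in t}
    current = {}
    for item in items:
        s = support(frozenset([item]))
        if s >= min_support:
            current[frozenset([item])] = s
    frequent = dict(current)

    k = 2
    while current:
        # Join: combine frequent (k-1)-itemsets into k-item candidates
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune: every (k-1)-subset of a candidate must itself be frequent
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in current for sub in combinations(c, k - 1))
        }
        current = {}
        for c in candidates:
            s = support(c)
            if s >= min_support:
                current[c] = s
        frequent.update(current)
        k += 1
    return frequent


def association_rules(frequent, min_confidence):
    """Derive rules base -> add from frequent itemsets, sorted by decreasing lift."""
    rules = []
    for itemset, s in frequent.items():
        for r in range(1, len(itemset)):
            for base in combinations(itemset, r):
                base = frozenset(base)
                add = itemset - base
                conf = s / frequent[base]
                if conf >= min_confidence:
                    rules.append((set(base), set(add), s, conf, conf / frequent[add]))
    return sorted(rules, key=lambda rule: rule[4], reverse=True)


# Toy transactions, made up for illustration only
transactions = [
    ["onion", "potato", "tomato"],
    ["onion", "potato", "tomato"],
    ["onion", "potato"],
    ["tomato", "bread"],
    ["bread", "butter"],
]
frequent = frequent_itemsets(transactions, min_support=0.4)
rules = association_rules(frequent, min_confidence=0.6)
for base, add, sup, conf, lift in rules:
    print(f"{sorted(base)} -> {sorted(add)}  "
          f"support={sup:.2f} confidence={conf:.2f} lift={lift:.2f}")
```

The prune step is what gives the algorithm its efficiency: an itemset can only be frequent if all of its subsets are frequent, so most candidates are discarded without counting them against the data.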

The Dataset

In this implementation, we have used the Market Basket Optimization dataset that is publicly available on Kaggle. This dataset comprises the list of transactions of a retail company over the period of one week. It contains a total of 7501 transaction records where each record consists of the list of items sold in one transaction. Using this record of transactions and items in each transaction, we will find the association rules between items.

Market Basket Analysis using the Apriori method

We first import the required libraries. The Apriori implementation used here comes from apyori, a third-party package (installable with pip install apyori) that must be imported to run the algorithm.

# Importing the libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from apyori import apriori

Now we read the dataset downloaded from Kaggle. The file has no header, and its first row already contains the first transaction, so we pass header = None.

# Data Preprocessing
dataset = pd.read_csv('Market_Basket_Optimisation.csv', header = None)
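To see why the header setting matters, the sketch below reads the same small CSV text with and without header=None. The CSV content is made up for illustration and only stands in for the real file.

```python
import io

import pandas as pd

# Made-up stand-in for the CSV file: one transaction per line, no header row
csv_text = "bread,butter,milk\neggs,milk\ntea\n"

with_header = pd.read_csv(io.StringIO(csv_text))
without_header = pd.read_csv(io.StringIO(csv_text), header=None)

print(with_header.shape)     # the first transaction was consumed as column names
print(without_header.shape)  # all three transactions are kept as rows
print(without_header)        # shorter transactions are padded with NaN
```

Without header=None, the first transaction would silently disappear from the data and become the column labels.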

[Figure: market basket analysis dataset]

Once we have read the dataset, we need to get the list of items in each transaction. So we run two loops here: one over the transactions, and the other over the columns of each transaction. This list works as the training set from which we generate the association rules.

# Getting the list of transactions from the dataset
# Note: str() turns empty (NaN) cells into the string 'nan'
transactions = []
for i in range(dataset.shape[0]):
    transactions.append([str(dataset.values[i, j]) for j in range(dataset.shape[1])])
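One thing to be aware of: transactions with fewer items than the widest row are padded with NaN when the file is read, and str() turns those cells into the string 'nan', which the algorithm then treats like any other item. A possible variant, sketched below on a made-up DataFrame, keeps only the cells that actually hold an item. The names toy and clean_transactions are illustrative, and using this variant would likely change the rules and counts reported later in this article.

```python
import numpy as np
import pandas as pd

# Made-up stand-in for the dataset: short transactions are padded with NaN
toy = pd.DataFrame([
    ["bread", "butter", "milk"],
    ["eggs", "milk", np.nan],
    ["tea", np.nan, np.nan],
])

# Keep only the cells that actually hold an item
clean_transactions = [
    [str(item) for item in row if pd.notna(item)]
    for row in toy.values
]
print(clean_transactions)
```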

[Figure: market basket analysis list of items]

Now that the list of items in our training set is ready, we run the Apriori algorithm, which learns the association rules from the training set. Suppose we want to find associations only for products that are sold at least 3 times a day. The minimum support is then 3 sales per day multiplied by 7 days of the week and divided by the total number of transactions, that is, (3*7)/7501, which is approximately 0.0028. We round this to 0.003 and use it as the minimum support. We look for at least 30% confidence in each association rule, so the minimum confidence is 0.3. The minimum lift is set to 3, and the minimum length is set to 2 because we want associations between at least two items. These hyperparameters can be tuned depending on the business requirements.
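The minimum support calculation can be checked with a few lines of Python. The numbers come from the text above; the variable names are illustrative.

```python
# Values taken from the text: 3 sales per day, 7 days, 7501 transactions
sales_per_day = 3
days = 7
total_transactions = 7501

min_support = (sales_per_day * days) / total_transactions
print(round(min_support, 5))  # 0.0028
print(round(min_support, 3))  # 0.003
```

For a different threshold, such as 5 sales per day, only sales_per_day needs to change.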

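# Training Apriori algorithm on the dataset
# Note: the keyword arguments apyori appears to read are min_support, min_confidence,
# min_lift and max_length, so min_length may be silently ignored by the library.
rule_list = apriori(transactions, min_support = 0.003, min_confidence = 0.3, min_lift = 3, min_length = 2)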

Executing the above line of code generates the association rules between the items sold by the retailer. To see these rules, run the code below.

# Displaying the list of rules
results = list(rule_list)
for i in results:
    print(i)

The list of rules can be seen in the screenshot below.

[Screenshot: market basket analysis association rules]

As we can see in the output screenshot above, the rules are generated along with their confidence. The first rule indicates an association between mushroom cream sauce and escalope with a confidence of 30%. The next rule shows an association between escalope and pasta with a confidence of 37.28%. A total of 102 rules are generated in this experiment. The number of generated rules depends on the values of the hyperparameters. For example, we can increase the minimum confidence to keep only the stronger rules.
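The printed records can be hard to scan, so one option is to flatten them into a table sorted by decreasing lift. The sketch below assumes that each record exposes items, support and ordered_statistics, and that each statistic exposes items_base, items_add, confidence and lift, which is my understanding of the apyori output; check these field names against your own printed records. The helper rules_to_frame is hypothetical, and the stand-in records use made-up items and numbers rather than results from the dataset.

```python
from collections import namedtuple

import pandas as pd


def rules_to_frame(results):
    """Flatten apriori records into one row per rule, sorted by decreasing lift."""
    rows = []
    for record in results:
        for stat in record.ordered_statistics:
            rows.append({
                "base": ", ".join(sorted(stat.items_base)),
                "add": ", ".join(sorted(stat.items_add)),
                "support": record.support,
                "confidence": stat.confidence,
                "lift": stat.lift,
            })
    columns = ["base", "add", "support", "confidence", "lift"]
    frame = pd.DataFrame(rows, columns=columns)
    return frame.sort_values("lift", ascending=False).reset_index(drop=True)


# Stand-in records with made-up values, shaped like the assumed apyori output
Stat = namedtuple("Stat", "items_base items_add confidence lift")
Record = namedtuple("Record", "items support ordered_statistics")
demo = [
    Record(frozenset({"a", "b"}), 0.01,
           [Stat(frozenset({"a"}), frozenset({"b"}), 0.35, 3.2)]),
    Record(frozenset({"c", "d"}), 0.02,
           [Stat(frozenset({"c"}), frozenset({"d"}), 0.50, 4.1)]),
]

table = rules_to_frame(demo)
print(table)

# Keep only the rules with a higher confidence, without re-running the algorithm
print(table[table["confidence"] >= 0.4])
```

With the real output, the call would be rules_to_frame(results), where results is the list built in the code above.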

So, this is one way of performing market basket analysis with association rule learning. In this experiment, we used the Apriori algorithm. Other algorithms, such as Eclat and FP-Growth, can be used for the same purpose.


Dr. Vaibhav Kumar
Dr. Vaibhav Kumar is a seasoned data science professional with great exposure to machine learning and deep learning. He has good exposure to research, where he has published several research papers in reputed international journals and presented papers at reputed international conferences. He has worked across industry and academia and has led many research and development projects in AI and machine learning. Along with his current role, he has also been associated with many reputed research labs and universities where he contributes as visiting researcher and professor.
