A guide to interpretable association rule mining using PyCaret

Making association rule mining interpretable and explainable plays an important role in decision making. In this article, we will discuss association rule mining and we will do a hands-on implementation of this technique using the PyCaret library.

Association rule mining is a core data science technique that works on transactional data and is used mainly to support marketing decisions. Using PyCaret for this task makes the procedure easier to interpret and explain. The major points to be discussed in the article are listed below.

Table of contents 

  1. What is PyCaret?
  2. What is association rule mining?
  3. Module for association rule mining
  4. Dataset for association rule mining
  5. Data conversion
  6. Modelling association rules
  7. Visualizing association rule mining  

What is PyCaret?

PyCaret is an open-source, low-code machine learning library for Python that covers modelling and hypothesis testing. It can be used in a variety of end-to-end machine learning experiments. Because it needs very little code, the modelling procedure becomes efficient and less time-consuming. The modules it provides can also be quicker to set up and run than models assembled by hand.

Along with these features, the library provides several interactive visualizations of models and data that help make the machine learning procedure interpretable and explainable. In this article, we will discuss how we can perform association rule mining using the PyCaret library. We can install the library in the Google Colab environment using the following line of code:

!pip install pycaret

What is association rule mining?

Association rule mining is a rule-generating machine learning method in which the rules describe the strength of the relationships between variables in a large dataset. Association rules are mainly used in market basket analysis, where a strong positive relation between two products encourages the seller to sell them together and earn more profit. The name of the method explains what we are trying to do: we are finding association rules between variables in a large dataset.

This method is mostly intended to find strong rules in a large dataset or database by defining and using some measure of interestingness. For example, if the rule {corn, cheese} → {pizza base} is found among the rules we are mining, it indicates that customers who buy corn and cheese together are more likely to also buy a pizza base. Association rule mining helps in making decisions about marketing activities such as pricing or product placement.
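
To make the idea of a measure of interestingness concrete, here is a minimal sketch that computes support, confidence and lift for the {corn, cheese} → {pizza base} rule. The transactions are hypothetical toy data invented for this example, and the support helper is our own illustration rather than part of any library.

```python
# Toy transactions (hypothetical data, for illustration only).
transactions = [
    {"corn", "cheese", "pizza base"},
    {"corn", "cheese", "pizza base", "olives"},
    {"corn", "cheese"},
    {"cheese", "pizza base"},
    {"bread", "butter"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


antecedent = {"corn", "cheese"}
consequent = {"pizza base"}

# Support of the rule: how often antecedent and consequent appear together.
rule_support = support(antecedent | consequent, transactions)
# Confidence: among baskets with the antecedent, the share that also hold the consequent.
confidence = rule_support / support(antecedent, transactions)
# Lift: confidence relative to how common the consequent is overall.
lift = confidence / support(consequent, transactions)

print(f"support={rule_support:.2f} confidence={confidence:.2f} lift={lift:.2f}")
```

A lift above 1 suggests that the items appear together more often than they would if they were bought independently.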

In this article, we are going to use the PyCaret library, which has a dedicated module for association rule mining. Let’s take a look at the module.

Module for association rule mining

PyCaret has a pycaret.arules module for association rule mining, which is an unsupervised machine learning method. This module can be used to find relationship measures between the variables of a dataset. One of the interesting things about the module is that it automatically converts datasets with transaction values into the shape required for market basket analysis. Since PyCaret is designed for low-code machine learning, the module needs only a few lines of code to build a model.

Dataset for association rule mining

Association rule mining is mostly used in market basket analysis, so in this article we will use samples from the Online Retail Dataset. This dataset contains details of transactions that occurred between 01/12/2010 and 09/12/2011 in an online retail store. It contains the following variables:

  • InvoiceNo
  • StockCode
  • Description
  • Quantity
  • InvoiceDate
  • UnitPrice
  • CustomerID
  • Country

The original dataset is available from the UCI Machine Learning Repository. We will be using the sample that PyCaret provides for practice, which we can import using the following lines of code.

from pycaret.datasets import get_data
data = get_data('france')

Output:

In this implementation, we are using only the transactions from France. The output shows the first few rows of the dataset. Now we are ready to implement our association rule mining project.

Data conversion

After loading the data, we need to import the association rule module and convert the data from a transactional shape to a market basket shape. We can do this using the following lines of code.

from pycaret.arules import *
exp_arul101 = setup(data = data, 
                    transaction_id = 'InvoiceNo',
                    item_id = 'Description') 

Output:

The output shows the number of unique transactions in our dataset, which is the unique count of invoice numbers, and the number of unique items, which is taken from the Description column. Since we have not ignored any items, the entry for ignored items is empty.
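
To give a feel for what the conversion does, the following sketch reshapes long transactional rows into a market basket layout in plain Python. It is only an illustration of the idea, not PyCaret's internal code, and the rows and the to_basket helper are hypothetical.

```python
from collections import defaultdict

# Hypothetical long-format rows: one (InvoiceNo, Description) pair per line item.
rows = [
    ("A001", "tea set"),
    ("A001", "lunch bag"),
    ("A002", "lunch bag"),
    ("A003", "tea set"),
    ("A003", "alarm clock"),
    ("A003", "lunch bag"),
]


def to_basket(rows):
    """Group items by transaction, then build one True/False row per transaction."""
    baskets = defaultdict(set)
    for invoice, item in rows:
        baskets[invoice].add(item)
    items = sorted({item for _, item in rows})  # one column per unique item
    matrix = {
        invoice: [item in bought for item in items]
        for invoice, bought in baskets.items()
    }
    return items, matrix


items, matrix = to_basket(rows)
print("transactions:", len(matrix), "items:", len(items))
print(items)
for invoice, flags in matrix.items():
    print(invoice, flags)
```

Each transaction becomes one row and each unique item becomes one column, which is the shape that frequent itemset algorithms usually expect.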

Modelling association rules

We can create a model using the following line of code.

model1 = create_model()

If we want to tune the model, we can set the following parameters in create_model:

  • metric
  • threshold
  • min_support 
  • round
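
As a rough picture of what these parameters control, the sketch below filters a list of candidate rules by a minimum support and by a threshold on a chosen metric, then rounds the numbers. This is a simplification written for this article: the filter_rules helper, its default values and the candidate rules are all hypothetical, and in a real pipeline the minimum support is typically applied earlier, while frequent itemsets are being generated.

```python
def filter_rules(rules, metric="confidence", threshold=0.5,
                 min_support=0.05, round_to=4):
    """Keep rules that pass both cut-offs, sorted by `metric` (highest first)."""
    kept = [
        r for r in rules
        if r["support"] >= min_support and r[metric] >= threshold
    ]
    kept.sort(key=lambda r: r[metric], reverse=True)
    # Round only the numeric measures; leave item tuples untouched.
    return [
        {k: round(v, round_to) if isinstance(v, float) else v for k, v in r.items()}
        for r in kept
    ]


# Hypothetical candidate rules with made-up measures.
candidate_rules = [
    {"antecedents": ("corn", "cheese"), "consequents": ("pizza base",),
     "support": 0.4, "confidence": 0.66667, "lift": 1.11111},
    {"antecedents": ("bread",), "consequents": ("butter",),
     "support": 0.2, "confidence": 1.0, "lift": 5.0},
    {"antecedents": ("cheese",), "consequents": ("olives",),
     "support": 0.2, "confidence": 0.25, "lift": 1.25},
]

strict = filter_rules(candidate_rules, metric="confidence",
                      threshold=0.5, min_support=0.3, round_to=2)
for rule in strict:
    print(rule)
```

Raising min_support or threshold leaves fewer but stronger rules, while lowering them returns more rules to inspect.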

Let’s print the shape of the created rules and the first few rows.

print(model1.shape)
model1.head()

Output:

Here in the output we can see the antecedents and consequents with their support, confidence, lift, leverage, and conviction values.
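
Leverage and conviction are less familiar than support and confidence, so here is a small sketch of how all of these measures can be derived from three support values, using their commonly cited definitions. The rule_metrics helper and the input numbers are hypothetical and are not taken from the dataset above.

```python
import math


def rule_metrics(support_a, support_c, support_ac):
    """Derive rule measures from the supports of A, C and A-and-C together."""
    confidence = support_ac / support_a
    lift = confidence / support_c
    # Leverage: observed co-occurrence minus what independence would predict.
    leverage = support_ac - support_a * support_c
    # Conviction: infinite when the rule is never violated (confidence of 1).
    if confidence == 1:
        conviction = math.inf
    else:
        conviction = (1 - support_c) / (1 - confidence)
    return {
        "confidence": confidence,
        "lift": lift,
        "leverage": leverage,
        "conviction": conviction,
    }


# Hypothetical supports: A in 60% of baskets, C in 60%, both together in 40%.
metrics = rule_metrics(support_a=0.6, support_c=0.6, support_ac=0.4)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

A leverage of 0 and a conviction of 1 both correspond to independent items, so values above those baselines point to a positive association.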

In the above step, we simply created a model. While converting the dataset, we saw an option for ignored items in the output. In the setup function, we can use the ignore_items parameter to exclude any items from the analysis. We can do this using the following lines of code.

exp_arul101 = setup(data = data, 
                    transaction_id = 'InvoiceNo',
                    item_id = 'Description',
                    ignore_items = ['POSTAGE']) 

Output:

Here we can see that we have ignored the item POSTAGE. Let’s model this converted data to find the association rules, and print the details of the model built while ignoring an item.

model2 = create_model()
print(model2.shape) 
model2.head()

Output:

Comparing this output with the previous one, we can see the difference that ignoring an item makes to the rules.
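
One way to picture the effect of ignoring an item is as dropping its rows before the baskets are built. The sketch below does this on hypothetical rows; it is an assumption about the general idea and not a description of how PyCaret implements ignore_items internally.

```python
# Hypothetical long-format rows: (InvoiceNo, Description).
rows = [
    ("A001", "tea set"),
    ("A001", "POSTAGE"),
    ("A002", "POSTAGE"),
    ("A002", "lunch bag"),
    ("A003", "tea set"),
]

ignore_items = {"POSTAGE"}

# Drop every line item whose description is in the ignore list.
kept_rows = [(invoice, item) for invoice, item in rows if item not in ignore_items]

items_before = {item for _, item in rows}
items_after = {item for _, item in kept_rows}

print("unique items before:", len(items_before))
print("unique items after:", len(items_after))
```

Because an item such as postage can appear on many invoices, leaving it in tends to produce rules that involve it, which is why removing it changes the mined rules.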

Visualizing association rule mining

PyCaret is also known for the interpretability and explainability of its models. That means we can visualize our models and their results to understand them better. Let’s visualize our model. Before visualizing models in Google Colab, we need to enable Colab support in PyCaret. This can be done using the following lines of code.

from pycaret.utils import enable_colab
enable_colab()

Output:

Let’s plot the model.

plot_model(model2)

Output:

The visualization is built with Plotly, which means it is interactive. We are not able to post interactive visualizations here, but in practice you can interact with them.

We can also plot this visualization in three dimensions.

plot_model(model2, plot = '3d')

Output:

The above output is also interactive, and it is three-dimensional. You can find these visualizations in this notebook.

Final words

In this article, we have gone through a process for implementing association rule mining using the PyCaret library. We found that this Python library lets us perform this otherwise laborious task efficiently and with very little code.

Yugesh Verma
Yugesh is a graduate in automobile engineering and worked as a data analyst intern. He completed several Data Science projects. He has a strong interest in Deep Learning and writing blogs on data science and machine learning.