Beginner’s Guide To Understanding Apriori Algorithm With Implementation In Python




To exploit data, one must know exactly what to look for in it. Today, data is an invaluable tool for business, especially in the marketing and advertising sectors.



In this article, we will talk about the Apriori algorithm, one of the most popular algorithms in Association Rule Learning. Before we dig into Apriori, let's try to understand what Association Rule Learning means.

What is Association Rule Learning?

Association Rule Learning is one of the most popular applications of Machine Learning in business. Organizations such as supermarket chains and online marketplaces use it widely to understand and test business and marketing strategies that increase sales and productivity.

Association Rule Learning is a rule-based method for identifying associations between different variables in a database. One of the best-known examples is Market Basket Analysis, which analyses which items have the highest probability of being bought together by a customer.

For example, the association rule {onions, chicken masala} => {chicken} says that a person who has both onions and chicken masala in his or her basket is highly likely to buy chicken as well.

Apriori Algorithm

The algorithm was first proposed in 1994 by Rakesh Agrawal and Ramakrishnan Srikant. The Apriori algorithm finds the most frequent itemsets (sets of items that often appear together) in a transaction database and identifies association rules between those items, just like the example above.

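The algorithm uses a "bottom-up" approach: frequent itemsets are extended one item at a time (candidate generation), and the resulting groups of candidates are tested against the data. This works because every subset of a frequent itemset must itself be frequent, so any candidate that contains an infrequent subset can be discarded without being counted. The algorithm terminates when no further frequent itemsets can be found.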
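The level-wise search can be sketched in plain Python. This is a minimal, unoptimized illustration written for this article, not the apyori implementation used later: the function name apriori_frequent_itemsets and the toy grocery baskets are hypothetical, and the sketch only finds frequent itemsets without deriving rules.

```python
from itertools import combinations


def apriori_frequent_itemsets(transactions, min_support):
    """Return {frozenset(itemset): support} for every itemset meeting min_support."""
    baskets = [frozenset(t) for t in transactions]
    n = len(baskets)

    def support(itemset):
        # Fraction of baskets that contain every item in itemset
        return sum(itemset <= basket for basket in baskets) / n

    # Level 1: every single item is a candidate
    current = {frozenset([item]) for basket in baskets for item in basket}
    frequent = {}
    k = 1
    while current:
        # Test this level's candidates against the data and keep the frequent ones
        level = {c: support(c) for c in current}
        level = {c: s for c, s in level.items() if s >= min_support}
        frequent.update(level)
        k += 1
        # Candidate generation: join frequent (k-1)-itemsets into k-itemsets
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Pruning: every (k-1)-subset of a candidate must itself be frequent
        current = {
            c for c in candidates
            if all(frozenset(sub) in level for sub in combinations(c, k - 1))
        }
    return frequent


# Hypothetical toy baskets
transactions = [
    ["onions", "chicken masala", "chicken"],
    ["onions", "chicken masala", "chicken"],
    ["onions", "tomatoes"],
    ["chicken masala", "chicken", "rice"],
]
result = apriori_frequent_itemsets(transactions, min_support=0.5)
for itemset, s in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), s)
```

Here "tomatoes" and "rice" appear in only one of four baskets, so they are dropped at level 1 and never take part in larger candidates.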

How Apriori works

To construct association rules between items, the algorithm considers three important measures: support, confidence and lift. Each of them is explained below.

Support:

The support of an item I is the number of transactions containing I divided by the total number of transactions:

$$\text{Support}(I) = \frac{\text{Number of transactions containing } I}{\text{Total number of transactions}}$$

Confidence:

Confidence measures the proportion of transactions containing item I1 in which item I2 also appears. For two items I1 and I2, it is defined as the number of transactions containing both I1 and I2 divided by the number of transactions containing I1:

$$\text{Confidence}(I_1 \rightarrow I_2) = \frac{\text{Number of transactions containing both } I_1 \text{ and } I_2}{\text{Number of transactions containing } I_1}$$

Lift:

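Lift is the ratio between the confidence of the rule and the support of I2, expressed as:

$$\text{Lift}(I_1 \rightarrow I_2) = \frac{\text{Confidence}(I_1 \rightarrow I_2)}{\text{Support}(I_2)}$$

A lift greater than 1 means that I1 and I2 appear together more often than would be expected if they were independent.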
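The three measures can be computed directly from a list of transactions. Below is a minimal sketch in plain Python; the helper names support, confidence and lift and the toy baskets (built around the onions and chicken masala example above) are hypothetical and are not part of apyori.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= set(t) for t in transactions) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Transactions with both sides, as a fraction of transactions with the antecedent."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)


def lift(transactions, antecedent, consequent):
    """Confidence of the rule divided by the support of the consequent."""
    return confidence(transactions, antecedent, consequent) / support(transactions, consequent)


# Hypothetical toy baskets
transactions = [
    ["onions", "chicken masala", "chicken"],
    ["onions", "chicken masala", "chicken"],
    ["onions", "tomatoes"],
    ["rice", "tomatoes"],
]
antecedent, consequent = ["onions", "chicken masala"], ["chicken"]
print(support(transactions, antecedent + consequent))  # 0.5
print(confidence(transactions, antecedent, consequent))  # 1.0
print(lift(transactions, antecedent, consequent))  # 2.0
```

Every basket with onions and chicken masala also has chicken (confidence 1.0), and chicken appears in half of all baskets, so the rule's lift is 1.0 / 0.5 = 2.0.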

Implementing Apriori With Python

Let us consider a simple dataset of the movie interests of a thousand different people, one observation per person. We will use the data to find associations between items, in this case movies. The objective is to estimate the chances of a person watching a movie given that they have already watched other movies.

Before we begin coding, we need to install the apyori package. Open a terminal or command prompt and run the following command:

pip install apyori

Note:

If you are working with Anaconda, do not forget to activate your conda environment with conda activate before running the command.

Let's code!

Importing the dataset

import pandas as pd
data = pd.read_excel("Movie_reccommendation.xlsx")

Let's have a look at the dataset:


Converting the data frame into lists

The apriori function in the apyori package expects its input as a list of lists (one list of items per transaction) rather than a data frame, so we need to convert the data into a list of lists.

observations = []
for i in range(len(data)):
    # One transaction per person: the 13 cells of row i as strings (empty cells become 'nan')
    observations.append([str(data.values[i, j]) for j in range(13)])
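Because the loop above converts every cell to a string, any empty cells in the spreadsheet (which pandas reads as NaN) turn into the literal item 'nan'. If the file has such cells and you would rather not treat 'nan' as a movie, one option is to skip missing values. The sketch below assumes that layout and uses a small, hypothetical DataFrame as a stand-in for the Excel file.

```python
import pandas as pd

# Hypothetical stand-in for the spreadsheet: one row per person, one movie per cell,
# with shorter rows padded by empty cells
data = pd.DataFrame([
    ["ex machina", "ghost in the shell", None],
    ["ghost in the shell", "ex machina", "the matrix"],
    ["the matrix", None, None],
])

# Build one transaction per row, keeping only the non-missing cells
observations = [
    [str(movie) for movie in row if pd.notna(movie)]
    for row in data.itertuples(index=False)
]
print(observations)
```

Each inner list then has a different length, which apyori accepts since it treats every transaction as a set of items.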

Fitting the data to the algorithm

from apyori import apriori
# apriori() returns a lazy generator of RelationRecord objects
associations = apriori(observations, min_length=2, min_support=0.2, min_confidence=0.2, min_lift=3)

Where,

  • min_support: The minimum support of relations (float)
  • min_confidence: The minimum confidence of relations (float)
  • min_lift: The minimum lift of relations (float)
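  • min_length: Intended as the minimum number of items in a rule; note that apyori's apriori() accepts arbitrary keyword arguments and does not read min_length, so it has no effect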
  • max_length: The maximum number of items in a rule

There are no universally optimal values for min_support, min_confidence and min_lift. They are usually tuned by trying out different values and checking whether the resulting rules describe meaningful associations between items.

With the parameters min_length = 2, min_support = 0.2, min_confidence = 0.2 and min_lift = 3, the algorithm returns 37 rules for this dataset.

Converting the associations to lists

associations = list(associations)
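Each element of the list is a RelationRecord whose ordered_statistics hold the individual rules. As a sketch, the records can be flattened into (antecedent, consequent, support, confidence, lift) tuples and ranked by lift. The helper rules_by_lift is hypothetical, and the namedtuples below are stand-ins that only mirror the field names visible in apyori's printed output, so that the example runs without the movie data; with the real results you would call rules_by_lift(associations).

```python
from collections import namedtuple

# Hypothetical stand-ins that mirror the fields printed by apyori
OrderedStatistic = namedtuple("OrderedStatistic", ["items_base", "items_add", "confidence", "lift"])
RelationRecord = namedtuple("RelationRecord", ["items", "support", "ordered_statistics"])


def rules_by_lift(records):
    """Flatten records into (antecedent, consequent, support, confidence, lift), highest lift first."""
    rules = []
    for record in records:
        for stat in record.ordered_statistics:
            rules.append((
                sorted(stat.items_base),
                sorted(stat.items_add),
                record.support,
                stat.confidence,
                stat.lift,
            ))
    # Rank by lift, breaking ties by confidence (both descending)
    return sorted(rules, key=lambda rule: (rule[4], rule[3]), reverse=True)


# Made-up records for illustration; with real results use rules_by_lift(associations)
records = [
    RelationRecord(frozenset({"a", "b"}), 0.2, [
        OrderedStatistic(frozenset({"a"}), frozenset({"b"}), 0.5, 1.0),
        OrderedStatistic(frozenset({"b"}), frozenset({"a"}), 0.4, 1.0),
    ]),
    RelationRecord(frozenset({"c", "d"}), 0.2, [
        OrderedStatistic(frozenset({"c"}), frozenset({"d"}), 0.8, 4.0),
        OrderedStatistic(frozenset({"d"}), frozenset({"c"}), 1.0, 4.0),
    ]),
]

for antecedent, consequent, sup, conf, lift in rules_by_lift(records):
    print(antecedent, "->", consequent, f"support={sup} confidence={conf} lift={lift}")
```

Sorting on a (lift, confidence) key keeps the ranking deterministic when several rules share the same lift, as both directions of a pair always do.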


Understanding the rules

Note that apyori does not sort the records by relevance: it yields them in the order in which it generates the candidate itemsets, with shorter itemsets first. To find the strongest rules, rank the records yourself, for example by lift.

Let's have a look at the first association rule returned for the given dataset.

print(associations[0])

Output:

RelationRecord(items=frozenset({'ghost in the shell', 'ex machina'}), support=0.327, ordered_statistics=[OrderedStatistic(items_base=frozenset({'ex machina'}), items_add=frozenset({'ghost in the shell'}), confidence=1.0, lift=3.058103975535168), OrderedStatistic(items_base=frozenset({'ghost in the shell'}), items_add=frozenset({'ex machina'}), confidence=1.0, lift=3.058103975535168)])

This first record describes the association between two movies, 'ghost in the shell' and 'ex machina'.

The two movies have a support of 0.327, i.e.

$$\text{Support}(\text{ghost in the shell}, \text{ex machina}) = 0.327$$

Also, from the output:

$$\text{Confidence}(\text{ex machina} \rightarrow \text{ghost in the shell}) = 1$$

$$\text{Confidence}(\text{ghost in the shell} \rightarrow \text{ex machina}) = 1$$

This means that, in this dataset, every person who watched 'ex machina' also watched 'ghost in the shell', and every person who watched 'ghost in the shell' also watched 'ex machina'.

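The lift of about 3.058 means that people who watched one of these movies were roughly three times as likely to have watched the other as people in general. The rule passed the filter because we only kept rules with min_lift = 3.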
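The numbers in the output can be checked against the lift formula. Since confidence is the support of the pair divided by the support of the antecedent, a confidence of 1 in both directions means each movie on its own also has support 0.327:

$$\text{Confidence}(\text{ex machina} \rightarrow \text{ghost in the shell}) = \frac{\text{Support}(\text{ex machina}, \text{ghost in the shell})}{\text{Support}(\text{ex machina})} = \frac{0.327}{\text{Support}(\text{ex machina})} = 1 \;\Rightarrow\; \text{Support}(\text{ex machina}) = 0.327$$

By the same argument, Support(ghost in the shell) = 0.327, so:

$$\text{Lift}(\text{ex machina} \rightarrow \text{ghost in the shell}) = \frac{\text{Confidence}(\text{ex machina} \rightarrow \text{ghost in the shell})}{\text{Support}(\text{ghost in the shell})} = \frac{1}{0.327} \approx 3.058$$

This matches the value 3.058103975535168 reported by apyori for both directions of the rule.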
