Association Rules Mining: How often are those items found together?

Mariann Beagrie
Mar 22, 2022

Association rules mining is often used to find items commonly bought together. However, this is not the only use for it. Finding which symptoms are associated with diseases, fraud detection, risk management, and manufacturing analysis are some of the other ways that it has been used. The main goal of association rules mining is to find rules that state, “If you have this, then it is likely that you will also have that.” These often appear in the format:

this -> that

or

this and this -> that

This and that occurring together does not mean that one causes the other, just that they often occur in the same situation. Even so, association rules mining can lead to some very useful insights.

Unfortunately, association analysis can take a long time and use a lot of memory. To find every possible association, you would have to check every possible combination of items against every transaction, and the number of candidate itemsets grows exponentially with the number of distinct items. It is therefore necessary to use strategies that reduce the work: checking fewer transactions, considering fewer candidate associations, or reducing the number of comparisons that must be made. The algorithms designed for association rules mining all use one or more of these strategies.

Which Algorithm Should You Use?

Apriori, FP-Growth and Eclat are the three main algorithms for association rules mining, and each has its own advantages and disadvantages. The Apriori algorithm does not require as much memory as the others, but it re-reads the data several times while locating frequent itemsets, so it can take a long time to run. The FP-Growth algorithm tends to be more time-efficient. It finds rules by building a tree structure, which reduces the amount of data to be searched while still retaining information about all item associations. However, memory can be an issue because the entire tree must be kept in memory, and the tree grows large when transactions contain many uncommon items. The Eclat algorithm organizes the data by item rather than by transaction (a "vertical" layout, where each item maps to the list of transactions containing it). This reduces the number of times the data needs to be scanned and the amount of information stored in memory, but it can be time-consuming when there are many different transactions. Choose an algorithm based on the characteristics of your dataset.

Something all of the algorithms have in common is that they only consider items or itemsets that meet a minimum level of support. Support is simply the fraction of all transactions in which an item appears: if flour shows up in 5 out of 1,000 transactions, its support is 0.005. You can find resources that explain how each algorithm works in detail at the bottom of this article.

Association Rules Mining In Python and R

If you would like to try association rules mining yourself, it is easy to do in both Python and R.

In Python, you can use the mlxtend library. First, use one-hot encoding to transform your data frame into the format the library expects. You can also specify the minimum support and the maximum length of the itemsets considered. Remember, support indicates how often an item is found in the dataset. If you only want rules with common items, set the support higher. If you are interested in rules for rare items, set the support lower; however, this causes more candidates to be considered, which takes more time and memory.

Code for Association Rules Mining in Python using Apriori or FP Growth Algorithms.
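The original code embed is missing here, so below is a minimal sketch of what it might look like with mlxtend. The sample transactions, the minimum support of 0.01 and the maximum length of 3 are illustrative assumptions, not values from the original.

import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import apriori, fpgrowth

# Hypothetical transactions; each inner list is one basket
transactions = [
    ['flour', 'butter', 'milk'],
    ['butter', 'bread'],
    ['flour', 'butter', 'bread'],
    ['milk', 'bread'],
]

# One-hot encode: one boolean column per item, one row per transaction
te = TransactionEncoder()
onehot = te.fit(transactions).transform(transactions)
df = pd.DataFrame(onehot, columns=te.columns_)

# Find frequent itemsets with Apriori; min_support and max_len are
# illustrative choices
frequent = apriori(df, min_support=0.01, use_colnames=True, max_len=3)

# fpgrowth is a drop-in alternative that is often faster on large data
frequent_fp = fpgrowth(df, min_support=0.01, use_colnames=True, max_len=3)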

You can filter the rules further by specifying the metric and minimum threshold with the association_rules function. Here is an example of how I filtered the results to see only rules that contained ‘fraud_amt_high’.
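The filtering snippet itself is not shown, but with mlxtend it could look something like the sketch below. The antecedents and consequents columns hold frozensets, ‘fraud_amt_high’ is the item named in the article, and the 0.6 confidence threshold is an illustrative assumption.

from mlxtend.frequent_patterns import association_rules

# Turn the frequent itemsets found above into if->then rules,
# keeping only those with confidence of at least 0.6
rules = association_rules(frequent, metric='confidence', min_threshold=0.6)

# Keep only rules that mention 'fraud_amt_high' on either side
mask = (rules['antecedents'].apply(lambda s: 'fraud_amt_high' in s)
        | rules['consequents'].apply(lambda s: 'fraud_amt_high' in s))
fraud_rules = rules[mask]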

Restricting the search to rules that contain a specific item while they are being mined, rather than filtering afterwards, is one advantage that R’s arules library has over the Python libraries. In R, you will need to install and load the arules, arulesViz, tidyverse and rCBA libraries. The arules library allows you to specify the minimum support and confidence, the maximum length, and the antecedent or consequent. The antecedent is the item(s) in the “if” part of the if->then statement; the item(s) in the “then” part are the consequent. Unlike Python, the data frame does not need to be one-hot encoded. Instead, it needs to be turned into a list and then into transactions.

Apriori and fpGrowth in R.
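This embed is also missing; here is a minimal sketch of what the R calls might look like, assuming a hypothetical data frame df with transaction_id and item columns. The support, confidence and length settings are illustrative assumptions.

library(arules)

# Turn the data frame into a list of baskets, then into transactions
baskets <- split(df$item, df$transaction_id)
trans <- as(baskets, "transactions")

# Apriori; the 'appearance' argument restricts the search to rules
# with butter as the consequent instead of filtering afterwards
rules <- apriori(trans,
                 parameter = list(supp = 0.01, conf = 0.6, maxlen = 3),
                 appearance = list(rhs = "butter", default = "lhs"))
inspect(head(sort(rules, by = "lift"), 10))

# rCBA provides an FP-Growth implementation along the same lines, e.g.
# rCBA::fpgrowth(trans, support = 0.01, confidence = 0.6, maxLength = 3)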

Evaluating The Results

To understand how to evaluate the results, you need to know a few terms. Let’s assume that we have done association rules mining on transactions at a grocery store and one of the rules found was:

flour -> butter

Scores for one rule from Python

In this situation flour is the antecedent and butter is the consequent. Support indicates how often a particular item occurs in the entire dataset. In the example above, flour has a very low support of 0.005 and butter has a higher support of 0.194. This means that butter appears in far more transactions than flour. The column that just says “support” tells how often both items appear together.

Confidence indicates how likely it is for someone to have butter in their basket if you know they have flour. A confidence of 0.6 means that if someone has flour in their basket, there is a 60% chance that they also have butter.
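Confidence is the support of the two items together divided by the support of the antecedent alone, so the numbers above also pin down the joint support:

confidence(flour -> butter) = support(flour and butter) / support(flour)
0.6 = support(flour and butter) / 0.005
support(flour and butter) = 0.003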

Lift measures how much the presence of one item increases the likelihood of finding the other together with it. It compares the actual confidence of the rule with the confidence you would expect if there were no relationship between the items at all. A lift of 1 indicates no correlation between the items. A lift below one indicates a negative correlation; in other words, if someone buys one of the items they are less likely to buy the other. Coke and Pepsi would likely have a lift below 1. A lift greater than one indicates a positive correlation: finding one of the items in a shopping cart increases the likelihood of finding the other. Lift is unaffected by whether an item is in the antecedent or the consequent; bread -> butter and butter -> bread will have the same lift score. Lift is a good measure of interestingness when you are looking at rare items in a dataset. If an item has low support but high lift, there is a good chance it is a rule you will want to look at more closely.
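Using the numbers from the example rule, lift is the rule’s confidence divided by the consequent’s support:

lift(flour -> butter) = confidence / support(butter) = 0.6 / 0.194 ≈ 3.1

In other words, a shopper with flour in their basket is about three times as likely as the average shopper to also have butter.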

For items that are common in a dataset, conviction is a better measure of interestingness. To find the conviction score, you calculate how often bread would be expected to appear without butter if the two items were independent, and compare that with how often bread actually appears without butter in the dataset. Unlike lift, the conviction score changes if the order of the items changes: bread -> butter will not have the same conviction as butter -> bread.
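Concretely, conviction is (1 - support of the consequent) divided by (1 - confidence of the rule). Applying it to the flour -> butter example scored above:

conviction(flour -> butter) = (1 - 0.194) / (1 - 0.6) ≈ 2.0

A conviction of about 2 means flour baskets lack butter only about half as often as they would if the two items were unrelated.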

The Algorithms In More Detail

Here are some resources that explain each of the algorithms simply and in more depth. Understanding how the algorithms work can help you choose which one to use for your particular dataset. If your dataset is not particularly large or complicated, you might want to try all three and compare the results.

Apriori Algorithm

A. Bhatia, Apriori Algorithm: https://www.youtube.com/watch?v=gNvkj3J1JVg

J. Korstanje, The Apriori algorithm, Sep 22, 2021: https://towardsdatascience.com/the-apriori-algorithm-5da3db9aea95

FP-Growth Algorithm

M. Huddar, Frequent Pattern (FP) Growth Algorithm Association Rule Mining Solved Example: https://youtu.be/7oGz4PCp9jI

Andrewngai, Understand and Build FP-Growth Algorithm in Python, May 17, 2021: https://towardsdatascience.com/understand-and-build-fp-growth-algorithm-in-python-d8b989bab342

Eclat Algorithm

P. Varsha, ECLAT, Vertical Apriori Algorithm, problem, exercise, solved, Association Rule Mining, Data Mining: https://www.youtube.com/watch?v=IwbnylEzp0w

J. Korstanje, The Eclat algorithm, Sep 29, 2021: https://towardsdatascience.com/the-eclat-algorithm-8ae3276d2d17


Libraries

Python mlxtend library: https://pypi.org/project/mlxtend/

R arules library: https://www.rdocumentation.org/packages/arules/versions/1.7-3
