Data Mining for Performance Analysis in Cricket

Data mining is one of the most widely used techniques for finding hidden patterns in voluminous data. Sports management committees use data mining as a tool to select team players in order to achieve the best results.

In this article, data mining is applied to the Indian cricket team, and an analysis is carried out to decide the order of players dynamically. Association rule mining is applied to performance data (such as batting average and bowling average) pertaining to Indian cricketers, collected from secondary sources. Index parameters such as whether the player performed in the first or second innings, and whether the match was played at home or abroad, are used.

This analysis covers the performance of Indian cricketers from 2001 to 2012. The detailed study reveals that the performance of Indian cricketers in the first innings on home grounds is better than in the second innings played abroad. The same methodology can be applied to the players of other teams.

Cricket is considered today one of the major world sports in terms of participants, spectators and media interest. Although it originated in England, cricket did not attract as much interest and attention in Europe as football did. However, it became hugely popular in countries such as India, Pakistan, Sri Lanka, Bangladesh, South Africa, Australia, New Zealand and the West Indies, most of them former British colonies or still under the influence of the Crown.

With increased influence and interest in the game of cricket around the world, the International Cricket Council (ICC) is trying to implement new development programs with the goal of producing more national teams capable of competing at Test level, as well as club teams that can compete in professional leagues at national or international level.

Thus, in recent years, we have seen the development of competitions in the shorter versions of the game, such as the Twenty20 World Cup (2007), the official Indian Premier League (2008) and the Champions League Twenty20 (2009). Because of its increased popularity and tremendous development, especially the birth of new professional competitions, cricket has become a major attraction, and performance in all of its aspects is an important phenomenon to watch and measure. As a result, more applications and programs that monitor performance in cricket have started to emerge.

Literature Review:

In data mining, association rule learning is a popular and well-researched method for discovering interesting relations between variables in large databases. It is intended to identify strong rules in databases using different measures of interestingness. Association rules are created by analyzing data for frequent if/then patterns and using the criteria of support and confidence to identify the most important relationships. Support indicates how frequently the items appear together in the database, expressed as a fraction of all transactions. Confidence indicates how often the if/then statement has been found to be true, that is, the fraction of transactions containing the "if" part that also contain the "then" part.
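To make the two measures concrete, here is a minimal sketch that computes them by direct counting. The basket data and the function names `count`, `support` and `confidence` are invented for illustration; they are not taken from the study.

```python
# Toy transaction database (hypothetical data, for illustration only).
transactions = [
    {"eggs", "milk", "bread"},
    {"eggs", "milk"},
    {"eggs", "bread"},
    {"milk", "bread"},
    {"eggs", "milk", "butter"},
]

def count(itemset, transactions):
    """Number of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t)

def support(itemset, transactions):
    """Fraction of all transactions that contain `itemset`."""
    return count(itemset, transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Fraction of transactions with the antecedent that also hold the consequent."""
    both = count(set(antecedent) | set(consequent), transactions)
    return both / count(antecedent, transactions)

# Rule "eggs -> milk": 3 of 5 baskets hold both items, and 3 of the 4 egg baskets hold milk.
print(support({"eggs", "milk"}, transactions))       # 0.6
print(confidence({"eggs"}, {"milk"}, transactions))  # 0.75
```

A rule is normally kept only if both values clear user-chosen thresholds.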

Association rules are if/then statements that help uncover relationships between seemingly unrelated data in a relational database or another information repository. An example of an association rule would be “If a customer buys a dozen eggs, he is 80% likely to also purchase milk”. Association rules are usually required to satisfy a user-specified minimum support and a user-specified minimum confidence at the same time. Association rule generation is usually split up into two separate steps:

  1. First, minimum support is applied to find all frequent itemsets in a database.
  2. Second, these frequent itemsets and the minimum confidence constraint are used to form rules.

Many algorithms for generating association rules have been presented over time. Some well-known algorithms are Apriori, Eclat and FP-Growth, but they only do half the job, since they are algorithms for mining frequent itemsets. A further step is needed afterwards to generate rules from the frequent itemsets found in a database. In data mining, association rules are useful for analyzing and predicting customer behaviour. They play an important part in shopping basket data analysis, product clustering, catalogue design and store layout.
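The two-step procedure described above can be sketched as follows. This is a simplified, Apriori-style illustration under stated assumptions, not the implementation used in any particular tool; the basket data and the function names `frequent_itemsets` and `generate_rules` are hypothetical.

```python
from itertools import combinations

# Toy transaction database (hypothetical data, for illustration only).
baskets = [
    {"eggs", "milk", "bread"},
    {"eggs", "milk"},
    {"eggs", "bread"},
    {"milk", "bread"},
    {"eggs", "milk", "butter"},
]

def frequent_itemsets(transactions, min_support):
    """Step 1: level-wise search; returns {itemset: count} for frequent itemsets."""
    n = len(transactions)
    items = sorted({item for t in transactions for item in t})
    candidates = [frozenset([i]) for i in items]
    frequent = {}
    while candidates:
        # Count every candidate and keep those that meet the minimum support.
        level = {}
        for cand in candidates:
            c = sum(1 for t in transactions if cand <= t)
            if c / n >= min_support:
                level[cand] = c
        frequent.update(level)
        # Join frequent k-itemsets into (k+1)-candidates and prune any candidate
        # that has an infrequent k-subset (the Apriori property).
        next_candidates = set()
        for a, b in combinations(list(level), 2):
            union = a | b
            if len(union) == len(a) + 1 and all(
                frozenset(s) in level for s in combinations(union, len(a))
            ):
                next_candidates.add(union)
        candidates = list(next_candidates)
    return frequent

def generate_rules(frequent, n_transactions, min_confidence):
    """Step 2: split each frequent itemset into antecedent -> consequent rules."""
    rules = []
    for itemset, c in frequent.items():
        for k in range(1, len(itemset)):
            for antecedent in map(frozenset, combinations(itemset, k)):
                # Every subset of a frequent itemset is itself frequent,
                # so its count is already in `frequent`.
                conf = c / frequent[antecedent]
                if conf >= min_confidence:
                    rules.append((antecedent, itemset - antecedent,
                                  c / n_transactions, conf))
    return rules

freq = frequent_itemsets(baskets, min_support=0.4)
rules = generate_rules(freq, len(baskets), min_confidence=0.7)
for antecedent, consequent, sup, conf in rules:
    print(sorted(antecedent), "->", sorted(consequent),
          f"support={sup:.2f} confidence={conf:.2f}")
```

Counts rather than fractions are stored so that confidence is computed as an exact ratio of two integers.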


Data used for this research was collected from cricket information websites. The analysis was carried out using Weka version 3.7.9, software developed by the University of Waikato, New Zealand. In this research, I have considered an Indian team of 11 players, in which 7 players are considered batsmen and 4 players bowlers. The performance of the players is analyzed against the Australian team to decide the order of the batsmen as well as the bowlers. The support for the analysis is varied from 0.9 to 0.1, and the confidence is set at 0.7.
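The effect of lowering the support threshold while holding confidence at 0.7 can be illustrated with a small sketch. Everything here is an assumption made for illustration: the match records are invented (they are not the study's data), the attribute names `venue`, `innings`, `toss` and `performance` are hypothetical, and the brute-force search stands in for the Apriori run in Weka.

```python
from itertools import combinations

# Hypothetical per-match records for one batsman (invented values, not the study's data).
matches = [
    {"venue": "home", "innings": "first",  "toss": "won",  "performance": "good"},
    {"venue": "home", "innings": "first",  "toss": "lost", "performance": "good"},
    {"venue": "home", "innings": "second", "toss": "won",  "performance": "good"},
    {"venue": "away", "innings": "first",  "toss": "lost", "performance": "poor"},
    {"venue": "away", "innings": "second", "toss": "won",  "performance": "good"},
    {"venue": "away", "innings": "first",  "toss": "won",  "performance": "poor"},
    {"venue": "home", "innings": "second", "toss": "lost", "performance": "good"},
    {"venue": "away", "innings": "second", "toss": "lost", "performance": "poor"},
]

TARGET = ("performance", "good")

def rules_for(matches, min_support, min_confidence, target=TARGET):
    """Rules `conditions -> target` that meet both thresholds (brute force)."""
    n = len(matches)
    conditions = sorted(
        {(k, v) for m in matches for k, v in m.items() if k != target[0]}
    )
    rules = []
    for size in (1, 2):
        for antecedent in combinations(conditions, size):
            # Skip contradictory antecedents such as venue=home with venue=away.
            if len({k for k, _ in antecedent}) < size:
                continue
            covered = [m for m in matches if all(m[k] == v for k, v in antecedent)]
            hits = sum(1 for m in covered if m[target[0]] == target[1])
            if covered and hits / n >= min_support and hits / len(covered) >= min_confidence:
                rules.append((antecedent, hits / n, hits / len(covered)))
    return rules

# Lower the minimum support from 0.9 to 0.1 in steps of 0.1, with confidence fixed at 0.7.
for step in range(9, 0, -1):
    min_support = step / 10
    found = rules_for(matches, min_support, 0.7)
    print(f"min_support={min_support:.1f}: {len(found)} rule(s)")
```

With a high support threshold no rule survives; as the threshold drops, rules covering fewer matches begin to appear, which is why the support is swept rather than fixed.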

Indian batsmen generally perform worse in away conditions than at home, but this analysis shows that most Indian batsmen performed extremely well against Australia in away conditions in the second innings. The outcome of the toss, the order of innings and the venue do not have any impact on the performance of Indian batsmen against Australia in home conditions. The results also clearly show the superiority of Australian bowlers over Indian batsmen in away conditions in the first innings. The outcome of the toss, the order of innings and the venue have no impact on the performance of Indian batsmen in neutral conditions.


Career statistics of 11 current cricketers who had played at least 30 ODI matches were selected as the input dataset. Two datasets are considered for this study. Dataset 1 contains the seven players considered as batsmen. Dataset 2 contains the four players considered as bowlers. The results of the analysis against the Australian team for Dataset 1 are presented in Table 1, and the results for Dataset 2 are presented in Table 2.

Batsmen (Dataset 1)

Bowlers (Dataset 2)

Some of the notable results observed in Table 2 are discussed in this section. Indian bowlers generally perform worse in away conditions than at home; this analysis shows that Indian bowlers performed extremely well against Australia in home conditions in the second innings.

The outcome of the toss, the order of innings and the venue do not have any impact on the performance of Indian bowlers against Australia in home conditions.

This result clearly shows the superior performance of Australian batsmen over Indian bowlers in away conditions in the first innings. The outcome of the toss, the order of innings and the venue have no impact on the performance of Indian bowlers in neutral conditions in the first innings.


In this article, I have discussed the application of association rule mining (ARM) in sports management, and in cricket in particular. The current research, being the first of its kind to apply ARM to cricket to decide the dynamic order of the players, has its limitations.
