# Understanding Cohen’s Kappa Score With Hands-On Implementation

In this article, we will learn in detail about what Cohen’s kappa is and how it can be useful in machine learning problems.

After a machine learning model is trained and tested, two properties need attention: reliability and validity. Reliability is the level of trust we have in the model to produce consistent results in similar situations; it corresponds to the precision of the model. Validity, on the other hand, is the accuracy of the model on test data, that is, how good the results it produces are.

Cohen’s Kappa is a statistical measure of the reliability of two raters who rate the same items. It quantifies how often the raters agree, corrected for the agreement that would be expected by chance.

### Intra-rater and inter-rater reliability

Before we look at Cohen’s Kappa, let us understand what intra-rater and inter-rater reliability are. Consider an experiment in which two people vote yes or no.

Intra-rater reliability describes how consistently the same rater rates the same type of experiment on two or more different occasions.

Inter-rater reliability describes how consistently two different raters rate the same experiment, that is, how often they cast the same vote.

### Understanding and evaluating Cohen’s Kappa

If N items need to be classified into C mutually exclusive categories, Cohen’s kappa measures the agreement between two raters who each assign the N items to the C categories.

The value ranges from -1 to 1. A value of 1 indicates total agreement between the raters, and 0 means their agreement is no better than what would be expected by chance. A negative value indicates that the raters agree less often than chance would predict.
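For reference, the statistic is usually written in terms of the observed agreement and the agreement expected by chance. The symbols below (p_o, p_e, n_{k1}, n_{k2}) are notation introduced here for illustration and do not come from the article itself: n_{k1} and n_{k2} are the number of items that rater 1 and rater 2 assign to category k.

```latex
\kappa = \frac{p_o - p_e}{1 - p_e},
\qquad
p_o = \frac{\text{number of items on which both raters agree}}{N},
\qquad
p_e = \sum_{k=1}^{C} \frac{n_{k1}}{N} \cdot \frac{n_{k2}}{N}
```

With this form, kappa equals 1 when p_o = 1, equals 0 when p_o = p_e, and becomes negative when p_o falls below p_e.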

To make things simple, let us derive the formula and make calculations to evaluate this metric.

Assume there are two raters r1 and r2 and they are rating ‘yes’ and ‘no’. Their choices are as follows:

`r1=['yes','no','yes','no','yes','no','yes','no','yes']`

`r2=['yes','yes','yes','no','no','no','yes','yes','yes']`

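Now let us build a grid that counts how often each combination of votes occurs across the nine items.

| | r2: yes | r2: no |
|---|---|---|
| r1: yes | 4 | 1 |
| r1: no | 2 | 2 |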
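A grid like this can be tallied directly from the two lists. The following is a minimal sketch using only the standard library; the names `grid` and `labels` are chosen here for illustration and are not part of the original notebook.

```python
from collections import Counter

r1 = ['yes', 'no', 'yes', 'no', 'yes', 'no', 'yes', 'no', 'yes']
r2 = ['yes', 'yes', 'yes', 'no', 'no', 'no', 'yes', 'yes', 'yes']

# Count each (r1 vote, r2 vote) pair across the nine items.
grid = Counter(zip(r1, r2))

# Print the counts as a 2x2 layout: rows are r1's votes, columns are r2's.
labels = ['yes', 'no']
print('r1 \\ r2', *labels, sep='\t')
for a in labels:
    print(a, *(grid[(a, b)] for b in labels), sep='\t')
```

The diagonal cells, `grid[('yes', 'yes')]` and `grid[('no', 'no')]`, hold the items on which the raters agree.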

Now, let us make the calculations

First, let us calculate the proportion of items on which both raters agree. These are the diagonal entries of the grid above.

Agreement = sum of agreements / total number of instances

Agreement = (4+2)/9 = 0.67

Next, we need the agreement that would be expected purely by chance. For each label, we multiply the proportion of votes r1 cast for that label by the proportion r2 cast for it.

p(yes) = ((4+1)/9)*((4+2)/9) = 0.37

p(no) = ((2+2)/9)*((2+1)/9) = 0.15

Chance agreement = 0.37 + 0.15 = 0.52

To calculate the Kappa coefficient, we take the observed agreement minus the chance agreement and divide it by 1 minus the chance agreement.

K = (0.67 - 0.52)/(1 - 0.52) = 0.15/0.48 = 0.31
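The hand calculation can be checked with exact fractions so that rounding does not blur the intermediate values. This is only a sketch of the arithmetic: the variable names are illustrative, and the four cell counts are tallied from the `r1` and `r2` lists given earlier.

```python
from fractions import Fraction

# Cell counts tallied from r1 and r2: (r1 vote, r2 vote) -> count
both_yes, both_no = 4, 2
r1_yes_r2_no, r1_no_r2_yes = 1, 2
n = both_yes + both_no + r1_yes_r2_no + r1_no_r2_yes  # 9 items

# Observed agreement: share of items where the two votes match.
p_o = Fraction(both_yes + both_no, n)

# Chance agreement: product of the two raters' shares per label, summed.
p_yes = Fraction(both_yes + r1_yes_r2_no, n) * Fraction(both_yes + r1_no_r2_yes, n)
p_no = Fraction(both_no + r1_no_r2_yes, n) * Fraction(both_no + r1_yes_r2_no, n)
p_e = p_yes + p_no

kappa = (p_o - p_e) / (1 - p_e)
print(p_o, p_e, kappa, round(float(kappa), 2))  # prints: 2/3 14/27 4/13 0.31
```

The exact value is 4/13, which rounds to the 0.31 obtained by hand.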

This is a positive value, which means the raters agree somewhat more often than chance alone would produce.

Let us now implement this with sklearn and check the value.

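```python
from sklearn.metrics import cohen_kappa_score

r1 = ['yes', 'no', 'yes', 'no', 'yes', 'no', 'yes', 'no', 'yes']
r2 = ['yes', 'yes', 'yes', 'no', 'no', 'no', 'yes', 'yes', 'yes']

# Agreement between the two raters, corrected for chance (about 0.31)
print(cohen_kappa_score(r1, r2))
```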

### Special cases:

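1. When one rater always votes ‘yes’ and the other always votes ‘no’, the value of k is 0: the raters never agree, and no agreement is expected by chance either.

```python
from sklearn.metrics import cohen_kappa_score

r1 = ['yes'] * 9
r2 = ['no'] * 9

# Observed and chance agreement are both 0, so kappa is 0
print(cohen_kappa_score(r1, r2))
```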

2. When the raters agree on fewer items than chance would predict, the k value is negative.

```python
from sklearn.metrics import cohen_kappa_score

r1 = ['yes', 'yes', 'no', 'no', 'yes', 'yes', 'no', 'no', 'yes']
r2 = ['yes', 'yes', 'yes', 'no', 'no', 'no', 'yes', 'yes', 'yes']

# Only 4 of 9 votes match, below the chance level, so kappa is negative
print(cohen_kappa_score(r1, r2))
```
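To see where these values come from without relying on sklearn, the same calculation can be written as a small function. This is a minimal sketch: `kappa` is a hypothetical helper written for this illustration, and the perfect-agreement case at the end is an extra example that is not part of the article.

```python
from collections import Counter

def kappa(a, b):
    """Cohen's kappa for two equal-length label sequences (illustrative helper)."""
    n = len(a)
    # Observed agreement: fraction of positions where the labels match.
    p_o = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement: sum over labels of the product of the two raters' shares.
    count_a, count_b = Counter(a), Counter(b)
    p_e = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    # Undefined (division by zero) if both raters use one identical label throughout.
    return (p_o - p_e) / (1 - p_e)

opposing = kappa(['yes'] * 9, ['no'] * 9)
few = kappa(['yes', 'yes', 'no', 'no', 'yes', 'yes', 'no', 'no', 'yes'],
            ['yes', 'yes', 'yes', 'no', 'no', 'no', 'yes', 'yes', 'yes'])
perfect = kappa(['yes', 'no', 'yes'], ['yes', 'no', 'yes'])  # added example

print(opposing, round(few, 2), perfect)
```

In the second case the raters match on 4 of 9 items (about 0.44), while chance alone would give about 0.52, which is why the result drops below zero.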

Cohen’s kappa is mainly used to check whether the labels collected for training are a reliable representation of the underlying variables. Values closer to 1 indicate strong agreement between the raters, while values closer to 0 mean the agreement is little better than chance, so the labels are uncertain.
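If a verbal label is more convenient than a raw number, one commonly cited rule of thumb is the Landis and Koch scale. The sketch below assumes those thresholds; they are a convention rather than something defined in this article, and `describe_kappa` is a hypothetical helper name.

```python
def describe_kappa(k):
    """Map a kappa value to a Landis-Koch style label (rule of thumb, an assumption)."""
    if k < 0:
        return 'poor (less than chance)'
    # Upper bound of each band, checked in increasing order.
    bands = [(0.20, 'slight'), (0.40, 'fair'), (0.60, 'moderate'),
             (0.80, 'substantial'), (1.00, 'almost perfect')]
    for upper, label in bands:
        if k <= upper:
            return label
    raise ValueError('kappa cannot exceed 1')

print(describe_kappa(0.31))  # the worked example above falls in the 'fair' band
```

Such bands are only a reading aid; what counts as acceptable agreement depends on the labelling task.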

### Conclusion

In this article, we went through how Cohen’s kappa works and carried out the calculations both by hand and with sklearn. This information is useful for understanding the data and for judging which labels are reliable and which are not.

The complete code of the above implementation is available as a notebook in AIM’s GitHub repository.

I am an aspiring data scientist with a passion for teaching. I am a computer science graduate from Dayananda Sagar Institute. I have experience in building models in deep learning and reinforcement learning. My goal is to use AI in the field of education to make learning meaningful for everyone.
