Effective Way To Replace Correlation With Predictive Power Score(PPS) In Python


Correlation measures the strength of a linear relationship between two quantitative variables. It is a statistical measure that is easy to calculate and interpret, and it is generally represented by ‘r’, the coefficient of correlation.

Because it is so easy to use, correlation is also widely misused by professionals: correlation does not imply causation. Two correlated variables need not depend on each other, and two variables with no correlation may still be related, for example through a non-linear relationship. This is where the Predictive Power Score (PPS) comes in.

Predictive Power Score works similarly to the coefficient of correlation but offers some additional functionality:

  • It works on both linear and non-linear relationships.
  • It can be applied to both numeric and categorical columns.
  • It can therefore find patterns in the data that correlation misses.
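
To see why a linear measure can miss a real relationship, here is a minimal sketch in plain Python (no ppscore involved). It computes Pearson's r by hand for a made-up pair where y is fully determined by x through y = x², and for a linear pair for comparison. The helper name pearson_r and the data are illustrative assumptions, not part of any library.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Return Pearson's correlation coefficient for two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Made-up data: x is symmetric around zero.
xs = list(range(-10, 11))
ys_square = [x ** 2 for x in xs]     # non-linear, but fully determined by x
ys_linear = [2 * x + 1 for x in xs]  # linear in x

r_square = pearson_r(xs, ys_square)
r_linear = pearson_r(xs, ys_linear)

# r_square is essentially zero although y depends entirely on x;
# r_linear is essentially one.
print(round(r_square, 6), round(r_linear, 6))
```

A score based on how well one column predicts another does not have this blind spot, because a model can learn the curve even when no straight line fits it.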

In this article, we will explore how we can use the Predictive Power Score to replace correlation.

Implementation:

ppscore is an open-source Python library, so we can install it like any other Python library using pip install ppscore.

  1. Importing required libraries

We will import ppscore along with pandas to load a dataset that we will work on.

import ppscore as pps

import pandas as pd

  2. Loading the Dataset

We will use different datasets to explore different functionalities of PPS. We will first load an advertising dataset of an MNC, which contains the target variable ‘Sales’ and features like ‘TV’, ‘Radio’, etc.

df = pd.read_csv('advertising.csv')

df.head()

  3. Finding Relation using PPScore

We will use some basic functions defined in ppscore.

  1. Finding the Relationship score

The PP Score lies between 0 (no predictive power) and 1 (perfect predictive power). In this step we will find the PPScore of a single feature with respect to the target variable of the dataset.

pps.score(df, "TV", "Sales")  # pps.score(df, x, y): x is the feature, y is the target

Here we can see that, along with the ppscore, the result provides a lot more information: the model used for finding the score, the score of that model, the baseline score, the evaluation metric used, etc.
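
How the model score and the baseline score combine into a value between 0 and 1 can be sketched in plain Python. This assumes the normalization that ppscore documents for a numeric target: the mean absolute error (MAE) of the model is compared with the MAE of a naive baseline that always predicts the median. The function names, the zero-baseline guard, the sales values and the predictions below are all made up for illustration and are not ppscore's own code.

```python
from statistics import median

def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def normalized_mae_score(model_mae, baseline_mae):
    """Map a model MAE onto 0..1 relative to a naive baseline MAE."""
    # Assumption: a model that is no better than the baseline scores 0.
    # The baseline_mae == 0 guard is added here to avoid dividing by zero.
    if baseline_mae == 0 or model_mae > baseline_mae:
        return 0.0
    return 1.0 - model_mae / baseline_mae

sales = [10.0, 12.0, 15.0, 18.0, 25.0]                # made-up target values
naive = [median(sales)] * len(sales)                  # baseline: always the median
hypothetical_predictions = [11.0, 12.0, 14.0, 19.0, 23.0]  # stand-in for a model

baseline_mae = mean_absolute_error(sales, naive)
model_mae = mean_absolute_error(sales, hypothetical_predictions)
score = normalized_mae_score(model_mae, baseline_mae)
print(baseline_mae, model_mae, round(score, 4))
```

Read this way, a score of 0 means the feature does no better than guessing the median, and a score of 1 means the model's error is zero.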

Similarly, we can find the PP Score of all the features against the target variable, which is ‘Sales’ in our case, using the predictors function.

pps.predictors(df, "Sales")

Dataset Advertisement

Here we can see the predictive power score of every feature/predictor with respect to ‘Sales’.

  2. Visualizing the correlation

Normally we create a correlation matrix and visualize it using a heatmap. PPS also has a matrix function, which plays the same role as the correlation matrix. Let us create a PPS matrix and visualize it.

For visualization, we will be using seaborn and we need to import it.

import seaborn as sns

matrix_df = pps.matrix(df).pivot(columns='x', index='y',  values='ppscore')

sns.heatmap(matrix_df, annot=True)

Heatmap

This is how we can visualize the ppscore relationship between different attributes of the dataset.
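
The pivot step above reshapes a long table with one row per (x, y) pair into a grid with y as rows and x as columns. A minimal sketch of that reshaping in plain Python, with made-up column names and scores and a hypothetical helper pivot_scores:

```python
# Long-format rows: one row per (feature x, target y) pair. Scores are made up.
rows = [
    {"x": "TV", "y": "TV", "ppscore": 1.0},
    {"x": "TV", "y": "Sales", "ppscore": 0.6},
    {"x": "Sales", "y": "TV", "ppscore": 0.4},
    {"x": "Sales", "y": "Sales", "ppscore": 1.0},
]

def pivot_scores(rows):
    """Build a nested dict so that matrix[y][x] is the score of x predicting y."""
    matrix = {}
    for row in rows:
        matrix.setdefault(row["y"], {})[row["x"]] = row["ppscore"]
    return matrix

matrix = pivot_scores(rows)
print(matrix["Sales"]["TV"])  # score of TV (column) predicting Sales (row)
print(matrix["TV"]["Sales"])  # score of Sales (column) predicting TV (row)
```

When reading the heatmap, note that the cell for row y and column x is the score of x predicting y, so unlike a correlation matrix the grid does not have to be symmetric.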

Now let us explore one more dataset, which contains both categorical and numerical data. The dataset can be downloaded from Kaggle; it contains attributes of different used cars. We will remove all the columns named as features to reduce the number of columns.

df1 = pd.read_csv('cars.csv')

df1.head()

Dataset

We have already seen how we can find the ppscore, so we will now compare the visualization using the normal correlation matrix and the ppscore matrix.

  1. Correlation

from matplotlib.pyplot import figure

figure(figsize=(12,8))

sns.heatmap(df1.corr(numeric_only=True), annot=True)  # numeric_only=True (pandas 1.5+) skips the categorical columns

Heatmap with Correlation

Here we can see that only 9 attributes appear. These are the numerical columns, because correlation is only computed between numerical columns.

  2. PPScore Matrix Visualization

figure(figsize=(12,8))

a = pps.matrix(df1).pivot(columns='x', index='y', values='ppscore')

sns.heatmap(a, annot=True)

Heatmap using PPScore

Here we can see that the PPS matrix takes into account all the columns in the dataset, numeric and categorical alike, which makes it more useful than correlation for mixed data.
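
Categorical targets can be scored because the comparison against a naive baseline does not require numbers. As a hedged sketch, assuming the approach ppscore documents for categorical targets (a weighted F1 score of the model rescaled against the F1 of a naive baseline such as always predicting the most common class), the rescaling could look like this. The function name, the guard for a perfect baseline and the F1 values are illustrative assumptions, not ppscore's own code.

```python
def normalized_f1_score(model_f1, baseline_f1):
    """Rescale a model F1 so the baseline maps to 0 and a perfect F1 maps to 1."""
    # Assumption: a model that is worse than the baseline scores 0.
    # The baseline_f1 >= 1.0 guard is added here to avoid dividing by zero.
    if baseline_f1 >= 1.0 or model_f1 < baseline_f1:
        return 0.0
    return (model_f1 - baseline_f1) / (1.0 - baseline_f1)

# Made-up F1 values for a hypothetical categorical target.
score = normalized_f1_score(model_f1=0.9, baseline_f1=0.5)
print(round(score, 4))
```

The result is on the same 0 to 1 scale as the numeric case, which is why numeric and categorical columns can share one heatmap.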

Conclusion:

In this article we saw how correlation can be replaced with ppscore, an open-source Python library for finding relationships in both numerical and categorical columns. We also visualized the relationships found by correlation and by ppscore to see the difference between them.

Copyright Analytics India Magazine Pvt Ltd
