Effective Way To Replace Correlation With Predictive Power Score (PPS) In Python

Predictive Power Score works similarly to the coefficient of correlation but offers some additional functionality.

Correlation measures the strength of a linear relationship between two quantitative variables. It is a statistical method that is very easy to calculate and to interpret. It is generally represented by ‘r’, known as the coefficient of correlation.

Because it is so easy to use, correlation is often misused by professionals: correlation does not imply causation. If two variables are correlated, it does not necessarily mean that one depends on the other. Similarly, if there is no correlation between two variables, they may still be related, for example through a non-linear relationship. This is where PPS (Predictive Power Score) comes into play.

 Predictive Power Score works similarly to the coefficient of correlation but offers some additional functionality:

  • It works on both linear and non-linear relationships.
  • It can be applied to both numeric and categorical columns.
  • It can find more patterns in the data.
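To make the point about non-linear relationships concrete, here is a minimal sketch on synthetic data (the arrays x and y are made up for illustration and are not part of the datasets used in this article). It shows a variable that is completely determined by another and still has a correlation coefficient of roughly zero.

```python
import numpy as np

# Synthetic data (an assumption for illustration): y is fully determined by x,
# but the relationship is quadratic rather than linear.
x = np.linspace(-1.0, 1.0, 201)
y = x ** 2

# Pearson's r only captures linear association, so it comes out at
# (numerically) zero even though y can be computed exactly from x.
r = np.corrcoef(x, y)[0, 1]
print(f"Pearson r between x and x**2: {r:.6f}")
```

A score based on how well one column predicts another would not be fooled by this kind of pattern, which is the motivation for PPS.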

In this article, we will explore how we can use the Predictive Power Score to replace correlation.

Implementation:

PPS is implemented in ppscore, an open-source Python library, so we can install it like any other Python library using pip install ppscore.

  1. Importing required libraries

We will import ppscore along with pandas to load a dataset that we will work on.

import ppscore as pps

import pandas as pd

  2. Loading the Dataset

We will be using different datasets to explore different functionalities of PPS. We will first import an advertising dataset of an MNC, which contains ‘Sales’ as the target variable and features like ‘TV’, ‘Radio’, etc.

df = pd.read_csv('advertising.csv')

df.head()

  3. Finding Relation using PPScore

We will use some basic functions defined in ppscore.

  a. Finding the Relationship score

The PP Score lies between 0 (no predictive power) and 1 (perfect predictive power). In this step, we will find the PP Score of a feature for the target variable in the given dataset.

pps.score(df, "TV", "Sales")  # x (the feature) comes first, then y (the target)

Here we can see that, along with the ppscore, the output provides a lot more information, such as the model used for finding the score, the score of that model, and the evaluation metric used.
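As far as we understand from the ppscore documentation, for a numeric target the score compares the mean absolute error (MAE) of a decision tree against a naive baseline that always predicts the median, and maps the result to the 0 to 1 range. The sketch below illustrates only that normalization idea. It is not the library's code: the helper normalized_score, the synthetic data, the single train/test split and the simple binned-mean predictor (standing in for the decision tree) are all assumptions made for illustration.

```python
import numpy as np


def normalized_score(mae_model: float, mae_naive: float) -> float:
    """Map a model's MAE onto a 0-to-1 scale relative to a naive baseline.

    Hypothetical helper: 0 means no better than the baseline, 1 means no error.
    """
    if mae_naive == 0:
        return 0.0  # constant target: nothing to predict (a choice made here)
    return max(0.0, 1.0 - mae_model / mae_naive)


# Synthetic, non-linear data (assumption for illustration).
rng = np.random.default_rng(0)
x = rng.uniform(-2, 2, 2000)
y = x ** 2 + rng.uniform(-0.5, 0.5, 2000)

# One simple hold-out split instead of cross-validation.
train, test = slice(0, 1500), slice(1500, None)

# Naive baseline: always predict the median of the training target.
mae_naive = np.mean(np.abs(y[test] - np.median(y[train])))

# Stand-in model (NOT a decision tree): mean of y inside 20 equal-width bins of x.
inner_edges = np.linspace(-2, 2, 21)[1:-1]
train_bins = np.digitize(x[train], inner_edges)
test_bins = np.digitize(x[test], inner_edges)
bin_means = np.array([y[train][train_bins == b].mean() for b in range(20)])
mae_model = np.mean(np.abs(y[test] - bin_means[test_bins]))

score = normalized_score(mae_model, mae_naive)
print(f"naive MAE={mae_naive:.3f}, model MAE={mae_model:.3f}, score={score:.3f}")
```

The closer the model's error gets to the baseline's error, the closer the score gets to 0. A model that is worse than the baseline is clipped to 0 rather than given a negative score.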

Similarly, we can find the PP Score of all the features for the target variable, which is ‘Sales’ in our case, using the predictors function.

pps.predictors(df, "Sales")

Dataset Advertisement

Here we can see that we found the predictive power score for all the features/predictors.

  b. Visualizing the correlation

Normally, we create a correlation matrix and visualize it using a heatmap. PPS also has a matrix function, which is similar to the correlation matrix. Let us create a PPS matrix and visualize it.

For visualization, we will be using seaborn and we need to import it.

import seaborn as sns

matrix_df = pps.matrix(df).pivot(columns='x', index='y',  values='ppscore')

sns.heatmap(matrix_df, annot=True)

Heatmap

This is how we can visualize the ppscore relationship between different attributes of the dataset.
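The pivot step used above can be easier to follow on a tiny hand-made table. The sketch below assumes that the matrix function returns a long-format table with one row per (x, y) pair and the columns x, y and ppscore, which is what the pivot call in this article relies on. The numbers are made up and are not real scores.

```python
import pandas as pd

# Hand-made stand-in for the long-format table (values are invented).
long_df = pd.DataFrame(
    {
        "x": ["TV", "TV", "Sales", "Sales"],        # predicting column
        "y": ["TV", "Sales", "TV", "Sales"],        # predicted column
        "ppscore": [1.0, 0.6, 0.5, 1.0],
    }
)

# Same reshaping as in the article: one row per y, one column per x.
wide_df = long_df.pivot(columns="x", index="y", values="ppscore")
print(wide_df)

# The cell in row "Sales", column "TV" holds the score of TV predicting Sales.
print(wide_df.loc["Sales", "TV"])
```

Because the score of x predicting y need not equal the score of y predicting x, the resulting grid is not necessarily symmetric, unlike a correlation matrix.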

Now let us explore one more dataset. It can be downloaded from Kaggle and contains attributes of different used cars, with mixed data, i.e. both numeric and categorical columns. We will remove all the columns named as features to reduce the number of columns.

df1 = pd.read_csv('cars.csv')

df1.head()
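The column removal mentioned above is not shown in the article. One way it could be done is sketched below on a tiny hand-made frame. The column names (feature_0, feature_1, price_usd, body_type) and the "feature" prefix are assumptions for illustration, so check the real column names of the dataset before applying the same filter to df1.

```python
import pandas as pd

# Hand-made stand-in for the cars data (hypothetical column names and values).
cars = pd.DataFrame(
    {
        "price_usd": [5000, 7500],
        "feature_0": [True, False],
        "feature_1": [False, False],
        "body_type": ["sedan", "suv"],
    }
)

# Collect every column whose name starts with "feature" and drop them all.
feature_cols = [col for col in cars.columns if col.startswith("feature")]
trimmed = cars.drop(columns=feature_cols)
print(list(trimmed.columns))
```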

Dataset

We have already seen how we can find the ppscore, so we will now compare the visualization using the normal correlation matrix and the ppscore matrix.

  1. Correlation

from matplotlib.pyplot import figure

figure(figsize=(12,8))

sns.heatmap(df1.corr(numeric_only=True), annot=True)  # numeric_only needs pandas 1.5+; it skips the text columns

Heatmap with Correlation

Here we can see that the total number of attributes is 9, which are the numerical columns, because correlation only finds relations between numerical columns.
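The effect described above can be reproduced on a small hand-made frame (the column names and values below are invented for illustration). The sketch selects the numeric columns explicitly before calling corr, which behaves the same way across pandas versions. Recent pandas versions raise an error if text columns are passed to corr without asking for numeric columns only.

```python
import pandas as pd

# Hand-made mixed data (hypothetical columns and values).
cars = pd.DataFrame(
    {
        "price": [5000, 7500, 3000, 12000],
        "year": [2008, 2012, 2003, 2017],
        "color": ["red", "blue", "red", "black"],
        "body_type": ["sedan", "suv", "sedan", "suv"],
    }
)

# Keep only the numeric columns, then compute the correlation matrix.
numeric_corr = cars.select_dtypes(include="number").corr()
print(list(numeric_corr.columns))
```

The categorical columns color and body_type do not appear in the result at all, so any relationship they have with the other columns stays invisible in a correlation heatmap.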

  2. PPScore Matrix Visualization

figure(figsize=(12,8))

a = pps.matrix(df1).pivot(columns='x', index='y', values='ppscore')

sns.heatmap(a, annot=True)

Heatmap using PPScore

Here we can see that it takes all the columns in the dataset into account, which makes it more useful and powerful than correlation.

Conclusion:

In this article, we saw how correlation can be replaced using ppscore, an open-source Python library for finding relationships in both numerical and categorical columns. We also visualized the relationships found by correlation and by ppscore to see the difference between them.

Himanshu Sharma

An aspiring Data Scientist currently Pursuing MBA in Applied Data Science, with an Interest in the financial markets. I have experience in Data Analytics, Data Visualization, Machine Learning, Creating Dashboards and Writing articles related to Data Science.