Banking Analytics Basics – Developing A Customer Level Behaviour Scorecard

Banks build several types of scorecards: acquisition, behaviour, income, collection and so on. The purpose of a behaviour scorecard is to monitor the performance of booked accounts, i.e. accounts that are already on the bank's books. We use the scorecard to predict how accounts perform and transition across delinquency buckets over a given time horizon, known as the performance window, which is derived from vintage analysis during model development. We also monitor the accounts in the post-performance period and in the Out-Of-Time (OOT) period to check the performance of the model.

Behaviour scorecards are used by almost all banks to predict a customer's probability of default, and key decisions are made on the basis of these scores. Most risk analytics projects revolve around the development and validation of behaviour scorecards. A more advanced version of the same concept can be used to develop a rating scale similar to the ones used by S&P or Moody's, where an AAA rating indicates a lower risk of default (hence a better rating) than a BBB rating.

The prime objective of this model is to generate behaviour scores that are used for portfolio decisions, in conjunction with the existing application score used for collection/portfolio review. Only internal bank data is used. The development (DEV) data uses 4 vintages (Jan 15, Apr 15, Jul 15 and Oct 15) and the out-of-time (OOT) data uses 3 vintages (Jan 16, Apr 16, Jul 16):

  • Data aggregation for DEV: 25% random sample from each of the 4 vintages
  • Data aggregation for OOT validation: 33.3% random sample from each of the 3 vintages
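
The per-vintage sampling can be sketched as follows. This is only an illustration: the account IDs, the helper name `sample_by_vintage` and the fixed seed are assumptions, not details from the actual data preparation.

```python
import random
from collections import defaultdict


def sample_by_vintage(accounts, fraction, seed=0):
    """Draw the same random fraction of accounts from every vintage.

    accounts: iterable of (account_id, vintage) pairs (hypothetical layout).
    """
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    by_vintage = defaultdict(list)
    for account_id, vintage in accounts:
        by_vintage[vintage].append(account_id)

    sample = {}
    for vintage, ids in by_vintage.items():
        k = round(len(ids) * fraction)  # number of accounts to draw
        sample[vintage] = rng.sample(ids, k)
    return sample


# Toy data: 40 made-up accounts in each development vintage.
dev_vintages = ["Jan 15", "Apr 15", "Jul 15", "Oct 15"]
accounts = [(f"{v}-{i}", v) for v in dev_vintages for i in range(40)]
dev_sample = sample_by_vintage(accounts, fraction=0.25)
```

Sampling within each vintage, rather than from the pooled data, keeps every vintage represented in the same proportion.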

Modelling Methodology

 

Variable Reduction Process

Several variable reduction steps are applied during model development:

  • Removing variables with std dev = 0
  • Removing variables with a high missing %
  • Information value check (variables with IV < 0.1 can be removed as weak, and variables with IV > 0.5 as over-predicting)
  • Bi-variate Trend Analysis (checking the trend of the variable with the target)
  • Multicollinearity removal (variables with VIF > 2 should be removed due to multicollinearity)
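
As an illustration of the information value check, here is a minimal sketch using the standard weight-of-evidence definition, IV = Σ (%good − %bad) × ln(%good / %bad) over the bins of a variable. The bin counts are made up, and the sketch assumes every bin contains at least one good and one bad account.

```python
import math


def information_value(bins):
    """bins: list of (good_count, bad_count), one pair per bin of a variable."""
    total_good = sum(good for good, _ in bins)
    total_bad = sum(bad for _, bad in bins)
    iv = 0.0
    for good, bad in bins:
        pct_good = good / total_good
        pct_bad = bad / total_bad
        woe = math.log(pct_good / pct_bad)  # weight of evidence of the bin
        iv += (pct_good - pct_bad) * woe
    return iv


def keep_by_iv(iv, low=0.1, high=0.5):
    """Apply the thresholds from the list above: drop IV < 0.1 and IV > 0.5."""
    return low <= iv <= high


# Hypothetical variables, each binned into three groups of (good, bad) counts.
candidates = {
    "weak_var": [(350, 25), (350, 35), (300, 40)],
    "useful_var": [(400, 20), (300, 30), (300, 50)],
    "suspicious_var": [(400, 10), (300, 30), (300, 60)],
}
ivs = {name: information_value(bins) for name, bins in candidates.items()}
selected = [name for name, iv in ivs.items() if keep_by_iv(iv)]
```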

Waterfall chart for variable reduction process

Final Variables in the Model

The final model has 8 variables:

  • Count of delinquency > 0 in last 6 cycles (A)
  • Number of cash advances in last 6 cycles (B)
  • Current balance as a % of max balance in last 6 cycles (C)
  • Payment ratio (D)
  • Amount of over limit in last 6 cycles (E)
  • Number of cycles since highest cash use amount (F)
  • Number of purchases in last 6 cycles (G)
  • Outstanding balance in last 6 cycles (H)

Parameter   DF   Estimate   Standard Error   Wald Chi-Square   Pr > Chi-Square   Marginal KS   VIF
Intercept   1    -3.89      0.0106           1,34,542          <0.0001
A           1     0.71      0.0106           4,573             <0.0001           3.1           1.12
B           1     0.16      0.0142           118               <0.0001           0.1           1.54
C           1     0.63      0.0125           2,515             <0.0001           3.6           1.43
D           1     0.32      0.0128           634               <0.0001           1.3           1.45
E           1     0.29      0.0121           585               <0.0001           0.2           1.14
F           1     0.45      0.0169           703               <0.0001           0.6           1.27
G           1     0.49      0.0283           299               <0.0001           0.3           1.54
H           1     0.19      0.0153           158               <0.0001           0.2           1.31
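
The parameter table reads like logistic regression output. Assuming that is the case, the estimates turn into a predicted probability through p = 1 / (1 + exp(−z)), where z is the intercept plus the sum of each estimate times its variable value. The sketch below uses the estimates from the table; the input values are hypothetical, and the article does not say how the variables are transformed (for example into weight-of-evidence values) before entering the model.

```python
import math

# Estimates copied from the parameter table above.
INTERCEPT = -3.89
COEFFICIENTS = {
    "A": 0.71, "B": 0.16, "C": 0.63, "D": 0.32,
    "E": 0.29, "F": 0.45, "G": 0.49, "H": 0.19,
}


def predicted_probability(values):
    """values: dict of model inputs keyed by variable letter (A to H)."""
    z = INTERCEPT + sum(COEFFICIENTS[name] * values[name] for name in COEFFICIENTS)
    return 1.0 / (1.0 + math.exp(-z))  # logistic (sigmoid) link


# Hypothetical customers: all inputs at zero, then variable A raised to 1.
baseline = {name: 0.0 for name in COEFFICIENTS}
higher_a = dict(baseline, A=1.0)

p_baseline = predicted_probability(baseline)
p_higher_a = predicted_probability(higher_a)
```

Because every estimate is positive, raising any input raises the predicted probability in this sketch.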

 

Model performance

Concordance:

Model performance can be checked with the concordance value. Every event (1) is paired with every non-event (0), and each pair is classified as follows:

  • A pair is concordant if the 1 (the observation with the outcome, i.e. the event) has a higher predicted probability than the 0 (the observation without the outcome, i.e. the non-event).
  • A pair is discordant if the 0 (the observation without the outcome, i.e. the non-event) has a higher predicted probability than the 1 (the observation with the outcome, i.e. the event).
  • A pair is tied if the 1 (the observation with the outcome, i.e. the event) has the same predicted probability as the 0 (the observation without the outcome, i.e. the non-event).

For this model:

  • %concordant = 85.7%
  • %discordant = 14.2%
  • %tied = 0.1%
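
A minimal sketch of how these percentages could be computed, comparing every event with every non-event. The scores and labels are toy values, not the model's data.

```python
def concordance(scores, labels):
    """Return (concordant, discordant, tied) as fractions of all event/non-event pairs."""
    events = [s for s, y in zip(scores, labels) if y == 1]
    non_events = [s for s, y in zip(scores, labels) if y == 0]

    concordant = discordant = tied = 0
    for event_score in events:
        for non_event_score in non_events:
            if event_score > non_event_score:
                concordant += 1
            elif event_score < non_event_score:
                discordant += 1
            else:
                tied += 1

    pairs = len(events) * len(non_events)
    return concordant / pairs, discordant / pairs, tied / pairs


# Toy predicted probabilities with their actual outcomes (1 = event).
scores = [0.9, 0.8, 0.4, 0.7, 0.4, 0.2, 0.1]
labels = [1, 1, 1, 0, 0, 0, 0]
pct_concordant, pct_discordant, pct_tied = concordance(scores, labels)
```

The double loop is quadratic in the number of accounts, so on a real portfolio a rank-based calculation would be preferable.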

AUROC:

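AUROC is the area under the ROC curve. The ROC curve plots sensitivity against 1-specificity, both of which come from the confusion matrix, at every possible probability cut-off. An ideal model will have an AUROC very close to 1, while a model no better than random will have an AUROC of about 0.5.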

  • Sensitivity is the ability of a model to correctly predict y=1 and specificity is the ability of a model to correctly predict y=0.
  • Sensitivity is calculated as True Positive / (True Positive + False Negative) while specificity is calculated as True Negative / (False Positive + True Negative).

              Actual Good       Actual Bad
Model Good    True Positive     False Positive
Model Bad     False Negative    True Negative

 

For this model:

  • AUROC = 0.858
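
The sketch below shows one way to trace the ROC curve and integrate it with the trapezoidal rule. It treats y=1 as the positive class and uses toy scores; both are assumptions for illustration.

```python
def roc_points(scores, labels):
    """Return (1 - specificity, sensitivity) points, one per distinct cut-off."""
    positives = sum(labels)
    negatives = len(labels) - positives
    points = [(0.0, 0.0)]  # cut-off above every score: nothing predicted as 1
    for cutoff in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= cutoff and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= cutoff and y == 0)
        sensitivity = tp / positives                # TP / (TP + FN)
        specificity = (negatives - fp) / negatives  # TN / (FP + TN)
        points.append((1 - specificity, sensitivity))
    return points


def auroc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Toy predicted probabilities with their actual outcomes (1 = positive class).
scores = [0.9, 0.8, 0.4, 0.7, 0.4, 0.2, 0.1]
labels = [1, 1, 1, 0, 0, 0, 0]
area = auroc(roc_points(scores, labels))
```

AUROC is also equal to %concordant + 0.5 × %tied. The figures reported for this model agree with that identity: 85.7% + 0.5 × 0.1% ≈ 0.858.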

KS:

The KS statistic measures the separation power of the model. It is calculated as the maximum absolute difference between the cumulative percentage of non-events and the cumulative percentage of events across the score range. A good model will have a KS > 30. A very high KS, however, can be a sign of over-prediction in the model.

For this model:

  • KS = 57.07
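
A minimal sketch of the KS calculation on toy data. It evaluates the gap at every distinct score; a KS table built on deciles, as is common in scorecard work, would evaluate it only at decile boundaries and can give a slightly lower value.

```python
def ks_statistic(scores, labels):
    """Maximum gap (in percentage points) between cumulative event % and non-event %."""
    total_events = sum(labels)
    total_non_events = len(labels) - total_events

    ks = 0.0
    # Walk down the score range, accumulating accounts from highest score to lowest.
    for cutoff in sorted(set(scores), reverse=True):
        cum_events = sum(1 for s, y in zip(scores, labels) if s >= cutoff and y == 1)
        cum_non_events = sum(1 for s, y in zip(scores, labels) if s >= cutoff and y == 0)
        gap = abs(cum_events / total_events - cum_non_events / total_non_events)
        ks = max(ks, gap)
    return 100 * ks


# Toy predicted probabilities with their actual outcomes (1 = event).
scores = [0.9, 0.8, 0.4, 0.7, 0.4, 0.2, 0.1]
labels = [1, 1, 1, 0, 0, 0, 0]
ks = ks_statistic(scores, labels)
```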

Two important concepts, gain and lift, can be read from the KS table.

  • Gain: Gain at a given decile is the ratio of the cumulative number of targets (events) up to that decile to the total number of targets (events) in the entire data set. It can be interpreted as the % of targets (events) covered up to that decile.
  • Lift: Lift measures how much better one can expect to do with the predictive model than without a model. It is the ratio of the gain % to the random expectation % at a given decile. The random expectation at the xth decile is (10x)%, e.g. 30% at the third decile.
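
A small sketch of a gain and lift table on toy data. It takes the random expectation at decile d to be 10 × d %, i.e. the share of the population covered up to that decile; the scores and outcomes are made up.

```python
def gain_lift_table(scores, labels, n_bins=10):
    """Return rows of (decile, gain %, lift), with accounts ranked by descending score."""
    ranked = [y for _, y in sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)]
    total_events = sum(ranked)
    n = len(ranked)

    rows = []
    for decile in range(1, n_bins + 1):
        cutoff = round(n * decile / n_bins)        # accounts covered up to this decile
        cum_events = sum(ranked[:cutoff])
        gain = 100 * cum_events / total_events     # % of all events captured so far
        random_expectation = 100 * decile / n_bins # % captured with no model
        rows.append((decile, gain, gain / random_expectation))
    return rows


# Toy data: 20 accounts (2 per decile), 5 events concentrated at the high scores.
scores = list(range(20, 0, -1))
labels = [1, 1, 1, 0, 1, 0, 0, 0, 0, 1] + [0] * 10
table = gain_lift_table(scores, labels)
```

Lift always ends at 1 in the last decile, because by then both the model and random selection have covered every event.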


Rohit Garg
Rohit Garg has close to 7 years of work experience in the field of data analytics and machine learning. He has worked extensively in the areas of predictive modeling, time series analysis and segmentation techniques. Rohit holds a BE from BITS Pilani and a PGDM from IIM Raipur.
