Banking Analytics Basics – Developing A Customer Level Behaviour Scorecard

We build various types of scorecards: acquisition, behaviour, income, collection and so on. The purpose of a behaviour scorecard is to monitor the performance of booked accounts, i.e. accounts that are already on the bank's books. We use the scorecard to predict how accounts perform and transition across delinquency buckets over a given time horizon, known as the performance window (derived from vintage analysis) during model development. We also monitor the model during the post-development period and on an Out-Of-Time (OOT) sample to check how it holds up.

Behaviour scorecards are used by almost all banks to predict a customer's probability of default, and key portfolio decisions are made based on them. Most risk analytics projects revolve around the development and validation of behaviour scorecards. An advanced version of the same concept can be applied to develop a rating scale, similar to the one used by S&P or Moody's, where an AAA rating indicates a lower risk of default (and hence a better rating) than a BBB rating.


The prime objective of this model is to generate behaviour scores to be used for portfolio decisions, in conjunction with the existing application score, for collection and portfolio review. Only internal bank data is used: the development data uses 4 vintages (Jan 15, Apr 15, Jul 15 and Oct 15) and the out-of-time data uses 3 vintages (Jan 16, Apr 16, Jul 16):

  • Data aggregation for DEV: a 25% random sample from each of the 4 development vintages
  • Data aggregation for OOT validation: a 33.3% random sample from each of the 3 OOT vintages
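As a sketch of the sampling step above, the per-vintage random samples could be drawn with pandas as follows. The data frame, its column names and the account counts are hypothetical; only the vintage labels and sampling fractions come from the article.

```python
import numpy as np
import pandas as pd

# Hypothetical account-level data: one row per account with its vintage label.
rng = np.random.default_rng(42)
df = pd.DataFrame({
    "account_id": range(12_000),
    "vintage": rng.choice(
        ["Jan15", "Apr15", "Jul15", "Oct15", "Jan16", "Apr16", "Jul16"],
        size=12_000,
    ),
})

dev_vintages = ["Jan15", "Apr15", "Jul15", "Oct15"]
oot_vintages = ["Jan16", "Apr16", "Jul16"]

# 25% random sample from each development vintage
dev = (df[df["vintage"].isin(dev_vintages)]
       .groupby("vintage")
       .sample(frac=0.25, random_state=42))

# 33.3% random sample from each out-of-time vintage
oot = (df[df["vintage"].isin(oot_vintages)]
       .groupby("vintage")
       .sample(frac=1 / 3, random_state=42))
```

Sampling within each vintage (rather than from the pooled data) keeps every vintage proportionally represented in both samples.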

Modelling Methodology

 

Variable Reduction Process

Several variable reduction steps are typically followed during model development:

  • Removing variables with standard deviation = 0 (constants carry no information)
  • Removing variables with a high missing %
  • Information value (IV) check: variables with IV < 0.1 (too weak) or IV > 0.5 (suspiciously strong, likely over-predicting) can be removed
  • Bivariate trend analysis: checking that each variable's trend against the target is sensible
  • Multicollinearity removal: variables with VIF > 2 should be removed
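The IV screen above can be sketched as below. The cut-offs mirror the rule of thumb in the list; the choice of 10 quantile bins and the small epsilon guarding against empty bins are assumptions of this sketch, not something the article specifies.

```python
import numpy as np
import pandas as pd

def information_value(x, y, bins=10):
    """IV of a numeric variable x against a binary target y (1 = event).

    IV = sum over bins of (pct_non_event - pct_event) * WOE,
    where WOE = ln(pct_non_event / pct_event).
    """
    df = pd.DataFrame({"x": x, "y": y})
    df["bin"] = pd.qcut(df["x"], q=bins, duplicates="drop")
    grp = df.groupby("bin", observed=True)["y"].agg(["sum", "count"])
    events = grp["sum"]
    non_events = grp["count"] - grp["sum"]
    pct_event = events / events.sum()
    pct_non = non_events / non_events.sum()
    woe = np.log((pct_non + 1e-6) / (pct_event + 1e-6))  # epsilon avoids log(0)
    return float(((pct_non - pct_event) * woe).sum())

def iv_filter(frame, target, low=0.1, high=0.5):
    """Keep only variables whose IV falls in [low, high], per the screening rule."""
    return [col for col in frame.columns
            if col != target
            and low <= information_value(frame[col], frame[target]) <= high]
```

A pure-noise variable lands well below 0.1, while a variable genuinely correlated with the target clears the lower cut-off.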

[Figure: Waterfall chart for the variable reduction process]

Final Variables in the Model

The final model has 8 variables:

  • Count of delinquency > 0 in last 6 cycles (A)
  • Number of cash advances in last 6 cycles (B)
  • Current balance as a % of max balance in last 6 cycles (C)
  • Payment ratio (D)
  • Amount of over limit in last 6 cycles (E)
  • Number of cycles since highest cash use amount (F)
  • Number of purchases in last 6 cycles (G)
  • Outstanding balance in last 6 cycles (H)
| Parameter | DF | Estimate | Standard Error | Wald Chi-Square | Pr > Chi-Square | Marginal KS | VIF |
|-----------|----|----------|----------------|-----------------|-----------------|-------------|------|
| Intercept | 1  | -3.89    | 0.0106         | 134,542         | <0.0001         | –           | –    |
| A         | 1  | 0.71     | 0.0106         | 4,573           | <0.0001         | 3.1         | 1.12 |
| B         | 1  | 0.16     | 0.0142         | 118             | <0.0001         | 0.1         | 1.54 |
| C         | 1  | 0.63     | 0.0125         | 2,515           | <0.0001         | 3.6         | 1.43 |
| D         | 1  | 0.32     | 0.0128         | 634             | <0.0001         | 1.3         | 1.45 |
| E         | 1  | 0.29     | 0.0121         | 585             | <0.0001         | 0.2         | 1.14 |
| F         | 1  | 0.45     | 0.0169         | 703             | <0.0001         | 0.6         | 1.27 |
| G         | 1  | 0.49     | 0.0283         | 299             | <0.0001         | 0.3         | 1.54 |
| H         | 1  | 0.19     | 0.0153         | 158             | <0.0001         | 0.2         | 1.31 |
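Given the estimates in the table, scoring an account reduces to the logistic formula PD = 1 / (1 + e^-(β₀ + Σ βᵢxᵢ)). A minimal sketch follows; the input values are hypothetical, and in practice each xᵢ would be the binned or WOE-transformed value of the corresponding variable rather than a raw number.

```python
import math

# Coefficients taken from the parameter estimates table (variables A–H).
INTERCEPT = -3.89
COEFS = {"A": 0.71, "B": 0.16, "C": 0.63, "D": 0.32,
         "E": 0.29, "F": 0.45, "G": 0.49, "H": 0.19}

def probability_of_default(features):
    """Logistic PD for one account; missing variables contribute zero."""
    z = INTERCEPT + sum(COEFS[name] * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))

# A hypothetical account with all inputs at zero scores the baseline PD
# implied by the intercept alone (about 2%).
baseline = probability_of_default({name: 0.0 for name in COEFS})
```

The positive signs on all eight coefficients mean each variable, as transformed, increases the predicted probability of default.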

 

Model performance

Concordance:

Model performance can be checked with the concordance value:

  • A pair is concordant if the event (observation with y = 1) has a higher predicted probability than the non-event (observation with y = 0).
  • A pair is discordant if the non-event has a higher predicted probability than the event.
  • A pair is tied if the event and the non-event have the same predicted probability.

For this model:

  • %concordant = 85.7%
  • %discordant = 14.2%
  • %tied = 0.1%
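A direct way to compute the three percentages is to compare every event against every non-event, as in this sketch (fine for moderate data sizes; an O(n²) pairwise comparison would need a rank-based shortcut for very large portfolios):

```python
import numpy as np

def concordance(y_true, y_prob):
    """Return (%concordant, %discordant, %tied) over all event/non-event pairs."""
    y_true = np.asarray(y_true)
    y_prob = np.asarray(y_prob)
    events = y_prob[y_true == 1]
    non_events = y_prob[y_true == 0]
    # Pairwise score differences: rows are events, columns are non-events.
    diff = events[:, None] - non_events[None, :]
    n_pairs = diff.size
    return ((diff > 0).sum() / n_pairs,   # event scored higher: concordant
            (diff < 0).sum() / n_pairs,   # non-event scored higher: discordant
            (diff == 0).sum() / n_pairs)  # equal scores: tied
```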

AUROC:

AUROC is the area under the ROC curve, which plots sensitivity against 1 − specificity; both quantities come from the confusion matrix. An ideal model has an AUROC very close to 1.

  • Sensitivity is the ability of a model to correctly predict y=1 and specificity is the ability of a model to correctly predict y=0.
  • Sensitivity is calculated as True Positive / (True Positive + False Negative) while specificity is calculated as True Negative / (False Positive + True Negative).
|            | Actual Good    | Actual Bad     |
|------------|----------------|----------------|
| Model Good | True Positive  | False Positive |
| Model Bad  | False Negative | True Negative  |

 

For this model:

  • AUROC = 0.858
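AUROC can be computed from the same pairwise comparison used for concordance: it equals %concordant + ½·%tied, which for the figures above gives 85.7% + ½·0.1% ≈ 0.858, matching the reported value. A sketch:

```python
import numpy as np

def auroc(y_true, y_prob):
    """AUROC as %concordant + half of %tied (the Mann-Whitney formulation)."""
    y_true = np.asarray(y_true)
    y_prob = np.asarray(y_prob)
    events = y_prob[y_true == 1]
    non_events = y_prob[y_true == 0]
    diff = events[:, None] - non_events[None, :]
    return ((diff > 0).sum() + 0.5 * (diff == 0).sum()) / diff.size
```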

KS:

The KS statistic measures the separation power of the model. It is calculated as the maximum absolute difference between the cumulative non-event and cumulative event distributions across score levels. A good model typically has KS > 30, while an unusually high KS can indicate over-prediction (a model fitting the development sample too closely).

For this model:

  • KS = 57.07
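A sketch of the KS calculation as defined above, reported on a 0–100 scale to match the figure quoted:

```python
import numpy as np

def ks_statistic(y_true, y_prob):
    """Max absolute gap between cumulative event and non-event %, scanning
    accounts from highest to lowest predicted risk."""
    y_true = np.asarray(y_true)
    y_prob = np.asarray(y_prob)
    order = np.argsort(-y_prob)          # highest predicted risk first
    y = y_true[order]
    cum_event = np.cumsum(y) / y.sum()
    cum_non_event = np.cumsum(1 - y) / (1 - y).sum()
    return 100.0 * np.max(np.abs(cum_event - cum_non_event))
```

Perfect separation (all events scored above all non-events) gives KS = 100; a model no better than random gives KS near 0.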

Two important concepts, gain and lift, can be derived from the KS table.

  • Gain: gain at a given decile is the ratio of the cumulative number of events up to that decile to the total number of events in the data set. It can be interpreted as the % of events captured up to that decile.
  • Lift: lift measures how much better the model does than targeting accounts at random. It is the ratio of the gain % to the random expectation % at a given decile; at the xth decile, random targeting is expected to capture 10x% of events.
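The decile-level gain and lift can be sketched as below. Assigning deciles via quantiles of the score rank is one common convention; the article does not prescribe a specific binning.

```python
import numpy as np
import pandas as pd

def gains_table(y_true, y_prob, n_deciles=10):
    """Cumulative gain % and lift per decile, highest-risk decile first."""
    df = pd.DataFrame({"y": y_true, "p": y_prob})
    # Rank 1 = highest predicted risk; qcut splits ranks into equal deciles.
    ranks = df["p"].rank(method="first", ascending=False)
    df["decile"] = pd.qcut(ranks, q=n_deciles, labels=False) + 1
    per_decile = df.groupby("decile")["y"].sum()
    gain = 100.0 * per_decile.cumsum() / per_decile.sum()
    random_pct = per_decile.index.to_numpy() * (100.0 / n_deciles)
    lift = gain / random_pct
    return pd.DataFrame({"gain_pct": gain, "lift": lift})
```

For a perfectly separating model, the top decile captures 100% of events, giving a gain of 100% and a lift of 10 at decile 1.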
Copyright Analytics India Magazine Pvt Ltd