# PPNR Modeling – OLS, Co-Integration and ARIMAX

• Business problem: To forecast the different components of PPNR. These components include Non-interest Income and Non-interest Expense.
• Proposed solution
• OLS
• Advantages – easy to develop / test and easy to explain
• Disadvantages – difficult to find a strong correlation between dependent and independent variables
• Co-Integration
• Advantages – easy to find strong correlations between dependent and independent variables
• Disadvantages – difficult to pass all the tests / assumptions of co-integration
• ARIMAX
• Advantages – very powerful modeling technique to overcome the shortcomings of OLS and co-integration models
• Disadvantages – complex to develop because there are two stages: in stage 1 an OLS model is developed, and in stage 2 an ARIMAX model is developed after the AR and MA terms are identified

1. Introduction
1. PPNR
• Pre-provision net revenue (PPNR), under the Federal Reserve’s Comprehensive Capital Analysis and Review (CCAR), measures net revenue forecast from asset-liability spreads and non-trading fees of banks.
• Pre-provision Net Revenue (PPNR) = Net Interest Income + Non-interest Income – Non-interest Expense
• Interest Income: Loans and Securities
• Interest Expense: Deposits and Bonds
• Non-Interest Income: Credit Related Fees and Non-Credit Related
• Non-Interest Expense: Employee Compensation, Processing / Software, Occupancy, Credit / Collections and Residential Mortgage Repurchase
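The PPNR identity above can be written out as a small calculation. This is only an illustrative sketch: the function names and the quarterly figures are hypothetical, and net interest income is taken as interest income minus interest expense.

```python
def net_interest_income(interest_income, interest_expense):
    """Net interest income: interest earned minus interest paid."""
    return interest_income - interest_expense


def ppnr(net_interest_income_value, non_interest_income, non_interest_expense):
    """PPNR = Net Interest Income + Non-interest Income - Non-interest Expense."""
    return net_interest_income_value + non_interest_income - non_interest_expense


# Hypothetical quarterly figures (in millions), for illustration only
nii = net_interest_income(interest_income=500.0, interest_expense=120.0)
result = ppnr(nii, non_interest_income=200.0, non_interest_expense=350.0)
print(result)  # 230.0
```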

2.2 Modeling Approaches

• If the dependent and independent variables are stationary
• An ADF test is run on the independent variables. Only the stationary variables are kept.
• The correlation between each independent variable and the dependent variable is computed. Only the variables that are highly correlated with the dependent variable are kept.
• An OLS model is developed.
• If the dependent and independent variables are non-stationary
• An ADF test is run on the independent variables. Only the non-stationary variables are kept.
• Co-integration between each independent variable and the dependent variable is tested. Only the variables that are co-integrated with the dependent variable are kept.
• The correlation between each independent variable and the dependent variable is computed. Only the variables that are highly correlated with the dependent variable are kept.
• An OLS model is developed.
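The two screening workflows above can be sketched as a pair of filters over p-values. This is a minimal illustration, not the author's code: it assumes the ADF and co-integration p-values have already been computed elsewhere (for example with the `adfuller` and `coint` functions in `statsmodels.tsa.stattools`), and all variable names and p-values below are hypothetical.

```python
ALPHA = 0.10  # p-value cutoff used for the ADF and co-integration tests


def choose_approach(dependent_adf_p, alpha=ALPHA):
    """Pick the workflow from the ADF p-value of the dependent variable."""
    return "stationary" if dependent_adf_p <= alpha else "non-stationary"


def screen_by_adf(adf_p_values, keep_stationary, alpha=ALPHA):
    """Keep candidates whose stationarity matches the chosen workflow."""
    return [
        name
        for name, p in adf_p_values.items()
        if (p <= alpha) == keep_stationary
    ]


def screen_by_cointegration(coint_p_values, alpha=ALPHA):
    """Keep candidates that are co-integrated with the dependent variable."""
    return [name for name, p in coint_p_values.items() if p <= alpha]


# Hypothetical p-values, for illustration only
adf_p = {"var_a": 0.02, "var_b": 0.45, "var_c": 0.08, "var_d": 0.60}
coint_p = {"var_b": 0.04, "var_d": 0.30}

approach = choose_approach(dependent_adf_p=0.35)                 # "non-stationary"
non_stationary = screen_by_adf(adf_p, keep_stationary=False)     # ["var_b", "var_d"]
cointegrated = screen_by_cointegration(coint_p)                  # ["var_b"]
print(approach, non_stationary, cointegrated)
```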

2.3 Independent Variables

2.4 Model Outputs

• Time Period
• Historical – 44 data points (from 2005Q1 to 2015Q4)
• Forecasted – 13 data points (from 2016Q1 to 2019Q1)
• Non-interest Income and Non-interest Expense are modeled
• Non-interest Expense is modeled using the stationary-series approach
• Non-interest Income is modeled using the non-stationary-series approach

2.5 Model Tests

• Stationarity of dependent and independent variables:
• If the p-value <= 0.10 then the series is stationary
• If the p-value > 0.10 then the series is non-stationary
• Multicollinearity:
• A correlation matrix is used to test for multicollinearity
• If the correlation between variables is between -0.30 and 0.30, there is low multicollinearity
• If the correlation between variables is more than 0.70 or less than -0.70, there is high multicollinearity
• Significance:
• If the p-value <= 0.05, the coefficient is statistically significant
• If the p-value > 0.05, the coefficient is statistically insignificant
• Auto-correlation:
• The Durbin-Watson (DW) test is run
• If the DW statistic is less than 1, there is positive auto-correlation
• If the DW statistic is close to 2, there is no auto-correlation
• If the DW statistic is more than 3, there is negative auto-correlation
• Stationarity of residual:
• If the p-value <= 0.10 then the series is stationary
• If the p-value > 0.10 then the series is non-stationary
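The Durbin-Watson and correlation rules above can be expressed as a short sketch. The DW statistic is the sum of squared successive residual differences divided by the sum of squared residuals. The cutoffs of 1, 3, 0.30 and 0.70 come from the rules above; the tolerance that defines "close to 2" and the labels for the in-between ranges are assumptions made for this example.

```python
def durbin_watson(residuals):
    """DW = sum((e_t - e_{t-1})^2) / sum(e_t^2)."""
    num = sum(
        (residuals[i] - residuals[i - 1]) ** 2 for i in range(1, len(residuals))
    )
    den = sum(e * e for e in residuals)
    return num / den


def classify_dw(dw, tolerance=0.5):
    """Apply the article's DW cutoffs; `tolerance` for 'close to 2' is assumed."""
    if dw < 1.0:
        return "positive auto-correlation"
    if dw > 3.0:
        return "negative auto-correlation"
    if abs(dw - 2.0) <= tolerance:
        return "no auto-correlation"
    return "inconclusive"  # range not covered by the stated rules


def classify_collinearity(r):
    """Apply the 0.30 / 0.70 correlation bands to one pair of regressors."""
    if abs(r) < 0.30:
        return "low"
    if abs(r) > 0.70:
        return "high"
    return "moderate"  # the middle band is not named in the article


# Hypothetical residuals: a steady trend gives a DW statistic near 0
trending = [1.0, 2.0, 3.0, 4.0]
print(durbin_watson(trending), classify_dw(durbin_watson(trending)))
print(classify_collinearity(0.12), classify_collinearity(-0.85))
```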

## 3. Stationary Series

3.1 Process

• An ADF test is run on the independent variables. Only the stationary variables are kept (23 out of 72 variables are selected).
• The correlation between each independent variable and the dependent variable is computed. Only the variables that are highly correlated with the dependent variable are kept (2 out of 23 variables are selected).
• An OLS model is developed, and checks on multicollinearity, significance of the variables and stationarity of the residuals are run (2 out of 2 variables are selected).

3.2 Dependent Variables

• It is observed that the dependent variables (Non-Interest Income 1st Difference and Non-Interest Expense 1st Difference) are stationary
• Non-Interest Income 1st Diff = Non-Interest Income (t) – Non-Interest Income (t-1)
• Non-Interest Expense 1st Diff = Non-Interest Expense (t) – Non-Interest Expense (t-1)
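The first-difference transformation defined above can be sketched as follows. The series values are hypothetical. Note that differencing a series of n points yields n - 1 values.

```python
def first_difference(series):
    """Return series[t] - series[t-1] for every t after the first point."""
    return [series[t] - series[t - 1] for t in range(1, len(series))]


# Hypothetical quarterly levels, for illustration only
levels = [100, 104, 103, 110]
diffs = first_difference(levels)
print(diffs)  # [4, -1, 7]
```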

3.3 Independent Variables

• It is observed that out of 72 independent variables, 23 independent variables are stationary.
• If the p-value <= 0.10 then the series is stationary
• If the p-value > 0.10 then the series is non-stationary
• It is observed that no macro-economic variable has a high correlation with Non-Interest Income 1st Diff. However, a few macro-economic variables have a high correlation with Non-Interest Expense 1st Diff.
• If correlation is more than 0.30 or less than -0.30 then it is marked as high
• It is observed that out of 23 independent variables, 2 independent variables have high correlation with Non-Interest Expense 1st Diff.
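The correlation screen above can be sketched with a plain Pearson correlation and the 0.30 threshold. This is an illustration only: the helper names and the toy data are hypothetical, and the article does not state which correlation measure was used, so Pearson is an assumption.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def highly_correlated(candidates, target, threshold=0.30):
    """Keep candidates whose correlation with the target exceeds the threshold
    in absolute value (more than 0.30 or less than -0.30)."""
    kept = {}
    for name, values in candidates.items():
        r = pearson(values, target)
        if abs(r) > threshold:
            kept[name] = r
    return kept


# Toy data, for illustration only
target = [1, 2, 3, 4, 5]
candidates = {
    "x_up": [2, 4, 6, 8, 10],     # moves exactly with the target
    "x_flat": [1, -1, 0, -1, 1],  # uncorrelated with the target
}
kept = highly_correlated(candidates, target)
print(list(kept))  # ['x_up']
```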

3.4 Model Development

• It is observed that the model has low R-Sq and Adj R-Sq.
• There are 2 variables in the model.
• CPI growth and GDP growth (lag 2)
• The p-value for both the variables is less than 0.05
• It is observed that there is very low multicollinearity in the model
• The correlation between the variables is between -0.30 and 0.30
• It is observed that there is no auto-correlation in the model and the residual is stationary
• The DW test statistic is close to 2
• The p-value of the ADF test is less than 0.10

3.5 Projection

• The projection is done for 13 Quarters
• If t = 1: Predicted Non-Interest Expense (t) = Actual Non-Interest Expense (t)
• If t > 1: Predicted Non-Interest Expense (t) = Predicted Non-Interest Expense (t-1) + Predicted Non-Interest Expense 1st Diff (t)
• The projection for the forecast period is made under the severely adverse scenario
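The recursion above, which rebuilds levels from predicted first differences, can be sketched as a running sum. The starting level and the predicted differences below are hypothetical.

```python
def rebuild_levels(first_actual, predicted_diffs):
    """Rebuild predicted levels from predicted first differences.

    t = 1: the prediction equals the actual value.
    t > 1: prediction(t) = prediction(t-1) + predicted difference(t).
    """
    levels = [first_actual]
    for diff in predicted_diffs:
        levels.append(levels[-1] + diff)
    return levels


# Hypothetical starting level and predicted differences for t = 2, 3, 4
projected = rebuild_levels(100.0, [2.0, -1.0, 3.0])
print(projected)  # [100.0, 102.0, 101.0, 104.0]
```

Because each prediction builds on the previous prediction rather than on an actual value, errors in the predicted differences accumulate over the projection horizon.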

## 4. Non-stationary Series

4.1 Process

• An ADF test is run on the independent variables. Only the non-stationary variables are kept (49 out of 72 variables are selected).
• Co-integration between each independent variable and the dependent variable is tested. Only the variables that are co-integrated with the dependent variable are kept (6 out of 49 variables are selected).
• An OLS model is developed, and checks on multicollinearity, significance of the variables and stationarity of the residuals are run (1 out of 6 variables is selected).

4.2 Dependent Variables

• It is observed that the dependent variables are non-stationary

4.3 Independent Variables

• It is observed that out of 72 independent variables, 49 independent variables are non-stationary.
• If the p-value <= 0.10 then the series is stationary
• If the p-value > 0.10 then the series is non-stationary
• It is observed that no macro-economic variable is co-integrated with Non-Interest Expense. However, a few macro-economic variables are co-integrated with Non-Interest Income.
• If the p-value <= 0.10 then the series is co-integrated
• If the p-value > 0.10 then the series is not co-integrated

4.4 Model Development

• It is observed that the model has high R-Sq and Adj R-Sq.
• There is 1 variable in the model.
• 3mT rate (difference YoY)
• The p-value for the variable is less than 0.05
• It is observed that there is positive auto-correlation in the model and the residual is stationary
• The DW test statistic is less than 1
• The p-value of the ADF test is less than 0.10
• Since there is positive auto-correlation in the model, an ARIMAX model is developed
• The ACF and PACF plots are generated for the OLS residual
• Based on the ACF and PACF plot, AR(1) model is developed
• Reference: Time Series Modeling and Forecasting—An Application to Bank’s Stress Testing, SAS Global Forum 2015, Paper 3338-2015
• ARIMAX model specifications
• P, D, Q = 1, 0, 0
• X = 3mT rate dyoy
• When an AR(2) term was introduced in the model, it was found to be insignificant, so higher-order AR lags are not included in the model
• There are 2 variables in the model.
• AR(1) term and 3mT rate (difference YoY)
• The p-value for both the variables is less than 0.05
• The sigma2 in the coefficients table is the estimate of the variance of the error term.
• It is observed that there is no auto-correlation in the model and the residual is stationary
• The DW test statistic is close to 2
• The p-value of the ADF test is less than 0.10
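To illustrate how an ARIMAX(1, 0, 0) specification produces forecasts, the sketch below assumes the regression-with-AR(1)-errors form: y(t) = const + beta * x(t) + u(t), with u(t) = phi * u(t-1) + e(t). This parameterization is an assumption, since the article does not spell out the equation, and every coefficient and data value here is hypothetical rather than taken from the fitted model.

```python
def forecast_ar1_with_exog(last_y, last_x, future_x, const, beta, phi):
    """Forecast y under y(t) = const + beta * x(t) + u(t), u(t) = phi * u(t-1) + e(t).

    The last observed regression error is carried forward and shrinks by a
    factor of phi at every step, so the forecast converges to the pure
    regression line const + beta * x(t).
    """
    error = last_y - const - beta * last_x  # last observed regression error
    forecasts = []
    for x in future_x:
        error = phi * error  # expected error one step further ahead
        forecasts.append(const + beta * x + error)
    return forecasts


# Hypothetical coefficients and scenario path for the exogenous variable
path = forecast_ar1_with_exog(
    last_y=16.0, last_x=1.0, future_x=[1.0, 1.0, 1.0],
    const=10.0, beta=2.0, phi=0.5,
)
print(path)  # [14.0, 13.0, 12.5]
```

In this toy run the regression line sits at 12.0, and the initial error of 4.0 is halved at each step. This is how the AR(1) term absorbs the positive auto-correlation left in the OLS residual.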

4.5 Projection

• The projection is done for 13 Quarters
• The dip in 2008-2009 is captured well by the model
• The projection for the forecast period is made under the severely adverse scenario
• Graph
• Predicted (Blue line) – OLS model
• Forecasted (Red line) – ARIMAX model

Rohit Garg has close to 7 years of work experience in field of data analytics and machine learning. He has worked extensively in the areas of predictive modeling, time series analysis and segmentation techniques. Rohit holds BE from BITS Pilani and PGDM from IIM Raipur.
