Demand for credit risk analytics professionals who combine technical expertise, such as statistical modelling, with domain expertise has grown sharply across geographies in recent years. There is a need for professionals who can overlay data analytics with statistical modelling and bring business insight to both.
Overall, the Indian risk analytics industry was pegged at USD 2.5 billion in 2020. With India a preferred destination for this business and most companies serving global clients, analytics accounts for 25% of revenue, and the IT services segment is another major contributor.
In this article, we list some leading questions that data scientists and analytics professionals are likely to be asked during a risk analytics interview. The write-up is divided into 3 sections:
- 10 Questions on Banking
- 10 Questions on Model Development and Validation
- 10 Questions on Time Series
10 Questions on Banking
- What are the 3 Pillars of the Basel Framework?
- Minimum Capital Requirement: Calculated based on the risk under various heads.
- For credit risk, the approaches are: Standardized, F-IRB and A-IRB.
- For market risk, the approach is: VaR.
- For operational risk, the approaches are: basic indicator approach, standardized approach and advanced measurement approach.
- Supervisory Review: It is based on the Internal Capital Adequacy Assessment Process (ICAAP). Banks review their own risk management systems and capital adequacy, and supervisors evaluate that assessment.
- Market Discipline: Sets out disclosure requirements. Institutions must disclose the scope of application, capital, risk exposures, risk assessment processes and capital adequacy.
- What are the approaches for the treatment of impaired provisions?
- Standardized Approach: Regulator-prescribed risk weights. If a loss has occurred, it impacts Tier 1 capital.
- Foundation IRB Approach: The bank's own estimate of 1-year PD, with regulator-prescribed LGD and EAD. If EL > Provision, the shortfall is deducted from capital. If EL < Provision, the excess is added to capital.
- Advanced IRB Approach: The bank's own estimates of PD, LGD and EAD.
- What is ICAAP?
- Internal Capital Adequacy Assessment Process (ICAAP)
- Its purpose is to inform the board of directors of the ongoing assessment of the bank's risks and of how the bank intends to mitigate them. It also assesses current and future capital requirements.
- What is Capital?
- Capital serves as a buffer to absorb unexpected losses and to fund ongoing activities of the bank.
- Tier 1 Capital: Also called core capital. It equals shareholders' equity + retained earnings. Minimum Tier 1 capital is 6% of risk-weighted assets (RWAs).
- Tier 2 Capital: Also called supplementary capital. It consists of revaluation reserves + hybrid capital instruments + subordinated term debt + general loan loss reserves + undisclosed reserves, etc.
- Minimum Tier 1 + Tier 2 capital is 8% of RWAs.
- Minimum capital adequacy ratio (including the capital conservation buffer) is 10.5% of RWAs.
- What are the key Capital Ratios?
- CET1: Common Equity under revised capital framework / Standardized approach to RWAs
- Tier 1 Ratio: Tier 1 capital under revised capital framework / Standardized approach to RWAs
- Total Capital Ratio: Total capital under revised capital framework / Standardized approach to RWAs
- What is expected loss and unexpected loss?
- Expected losses (Provision): Expected loss is the sum of the values of all possible losses, each multiplied by the probability of that loss occurring. EL = PD x EAD x LGD
- Unexpected losses (Capital): Losses above the EL. Measured as the standard deviation of losses around the mean, at a certain confidence level. Banks are required to hold capital for these unforeseen financial losses. $UL = EAD \times \sqrt{PD \cdot \sigma_{LGD}^2 + LGD^2 \cdot \sigma_{PD}^2}$, where $\sigma_{PD}^2 = PD\,(1 - PD)$.
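As a rough illustration of the two loss measures, the sketch below computes EL and UL for a single exposure. It assumes that default is a yes/no event with variance PD x (1 - PD), that LGD is independent of default, and that UL takes the form EAD x sqrt(PD x σ²LGD + LGD² x σ²PD). The function names and the input figures are hypothetical.

```python
import math


def expected_loss(pd_, lgd, ead):
    """EL = PD x EAD x LGD."""
    return pd_ * ead * lgd


def unexpected_loss(pd_, lgd, ead, sigma_lgd):
    """UL as the standard deviation of the loss on one exposure.

    Assumes default is a Bernoulli event, so var(PD) = PD * (1 - PD),
    and that LGD is independent of the default event.
    """
    var_pd = pd_ * (1.0 - pd_)
    return ead * math.sqrt(pd_ * sigma_lgd ** 2 + lgd ** 2 * var_pd)


# Hypothetical exposure: 2% PD, 45% LGD, 1,000,000 EAD, 25% LGD volatility.
el = expected_loss(0.02, 0.45, 1_000_000)
ul = unexpected_loss(0.02, 0.45, 1_000_000, 0.25)
print(f"EL = {el:,.0f}, UL = {ul:,.0f}")
```

For a low-PD exposure like this one, UL is several times larger than EL, which is why provisions alone do not cover the risk and capital is held on top.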
- What is IFRS9?
- The International Accounting Standards Board (IASB) issued International Financial Reporting Standard 9 (IFRS9) for the recognition of impairments.
- There are 3 Stages of Impairment:
- Stage 1: The loan has just been originated, or it is an existing loan with no significant increase in credit risk. ECL resulting from default events in the next 12 months is recognized.
- Stage 2: Credit risk has increased significantly and is not considered low. Lifetime ECL is recognized.
- Stage 3: Credit risk has increased to the point where the loan is considered credit impaired. Lifetime ECL is recognized.
- IFRS9 Implications:
- Earlier recognitions of losses
- Differentiates exposures that have shown deterioration
- Requires a forecast of losses
- What is CECL?
- The Financial Accounting Standards Board (FASB) issued Current Expected Credit Loss (CECL) for the recognition of impairments.
- Differences between CECL and IFRS9:
- CECL: Lifetime losses are estimated upon initial recognition of assets.
- IFRS9: 12 months ECL for performing loans and lifetime ECL for under-performing or non-performing loans.
- What is CCAR?
- Comprehensive Capital Analysis and Review (CCAR)
- The Federal Reserve's objective is to ensure that large, systemically important banking institutions have a forward-looking, institution-specific, risk-tailored capital planning process.
- It aims to assure that banks will have sufficient funds to remain solvent during times of economic and financial distress.
- What is Stress Testing and Sensitivity Analysis?
- There are 3 Fed scenarios: base, adverse and severely adverse. The forecasting is done for 9 quarters. These are called Federal Reserve (supervisory) scenarios.
- There are 2 BHC scenarios: base and adverse. The forecasting is done for 9 quarters. These are called Bank Holding Company (internal) scenarios.
- Sensitivity test:
- Sensitivity (adverse) = Default exposure (adverse) / Default exposure (base)
- Sensitivity (severely adverse) = Default exposure (severely adverse) / Default exposure (base)
10 Questions on Model Development and Validation
- What is Probability of Default (PD)?
- Average proportion of obligors that default in a particular rating grade in a year.
- Estimated through a logistic regression model, where the outcome is dichotomous (1, 0, indicating default, no default).
- Some of the independent (explanatory) variables: current non-payment, historical non-payment, percentage of payment, credit limit use, maturity, etc.
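To make the logistic regression idea concrete, here is a minimal sketch that fits a one-variable PD model with plain gradient descent. The data set, the single predictor (credit limit use) and the function names are all hypothetical, and gradient descent stands in for the fitting routine of whatever statistical package would be used in practice.

```python
import math


def sigmoid(z):
    """Logistic function: maps a linear score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs, ys, learning_rate=0.5, epochs=5000):
    """Fit PD = sigmoid(b0 + b1 * x) by gradient descent on the log-loss."""
    b0, b1 = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad0, grad1 = 0.0, 0.0
        for x, y in zip(xs, ys):
            error = sigmoid(b0 + b1 * x) - y  # predicted PD minus outcome
            grad0 += error
            grad1 += error * x
        b0 -= learning_rate * grad0 / n
        b1 -= learning_rate * grad1 / n
    return b0, b1


# Hypothetical sample: credit limit use (0 to 1) and default flag (1 = default).
limit_use = [0.10, 0.20, 0.30, 0.40, 0.50, 0.60, 0.70, 0.80, 0.90, 0.95]
defaulted = [0, 0, 0, 0, 1, 0, 1, 0, 1, 1]

b0, b1 = fit_logistic(limit_use, defaulted)
pd_low = sigmoid(b0 + b1 * 0.20)   # estimated PD at 20% limit use
pd_high = sigmoid(b0 + b1 * 0.90)  # estimated PD at 90% limit use
print(f"PD at 20% use: {pd_low:.3f}, PD at 90% use: {pd_high:.3f}")
```

A positive slope here means higher limit use is associated with a higher estimated PD in this toy sample.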
- What is Exposure at Default (EAD)?
- Estimate of the outstanding amount in case the obligor defaults. Highly relevant for revolving balances.
- Focuses on metrics that capture the increase in balance between the reference date and the date of default.
- Estimated through EADF: EADF = Balance at default / Balance at reference date
- What is Loss Given Default (LGD)?
- Percentage of exposure that the bank might lose if the obligor defaults. It depends on the characteristics of the loan.
- For mortgages, collateral determines LGD. For credit cards, there is no collateral, so LGD is based on the cash flows in the 3 months after default. In most cases, LGD is derived empirically.
- $LGD = 1 - \dfrac{\sum \text{Payments for 3 months}}{\max(Balance_t,\ Balance_{t+1},\ Balance_{t+2},\ Balance_{t+3})}$
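The EADF and LGD formulas above reduce to a few lines of arithmetic. The sketch below is only an illustration: the function names and the balances and payments passed in are hypothetical.

```python
def ead_factor(balance_at_default, balance_at_reference):
    """EADF = balance at default / balance at reference date."""
    if balance_at_reference <= 0:
        raise ValueError("reference balance must be positive")
    return balance_at_default / balance_at_reference


def lgd_three_month(payments, balances):
    """LGD = 1 - sum(payments over 3 months) / max(balance at t .. t+3).

    payments: the 3 monthly payments received after default.
    balances: the 4 balances at t, t+1, t+2 and t+3.
    """
    if len(payments) != 3 or len(balances) != 4:
        raise ValueError("expected 3 payments and 4 balances")
    peak_balance = max(balances)
    if peak_balance <= 0:
        raise ValueError("peak balance must be positive")
    return 1.0 - sum(payments) / peak_balance


# Hypothetical credit card account.
eadf = ead_factor(balance_at_default=1200.0, balance_at_reference=1000.0)
lgd = lgd_three_month(payments=[50.0, 50.0, 20.0],
                      balances=[1000.0, 950.0, 900.0, 880.0])
print(f"EADF = {eadf:.2f}, LGD = {lgd:.2f}")
```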
- What is the difference between TtC and PiT PD?
- Through the Cycle (TtC) PD: Takes a longer period into consideration, hence more stable.
- Point in Time (PiT) PD: In line with recent macro-economic scenarios.
- What is Information Value (IV)?
- IV is a very useful concept for variable selection during model development
- IV is widely used in credit card industry
- WoE = Log (Distribution of Good / Distribution of Bad)
- IV = Σ [(Distribution of Good – Distribution of Bad) x WoE]
- The higher the IV, the greater the explanatory power of the variable.
- A typical WoE / IV worksheet has the columns: category (C), count of bad (Y = 0), count of good (Y = 1), % bad, % good, WoE and IV.
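A minimal sketch of the WoE and IV calculation for one binned variable follows. It uses the natural logarithm and assumes every bin contains at least one good and one bad account, since a zero count would make WoE undefined. The bins and counts are hypothetical.

```python
import math


def woe_iv(bins):
    """Compute WoE per bin and the total IV.

    bins: list of (label, n_bad, n_good) tuples.
    Returns (rows, iv), where each row is
    (label, pct_bad, pct_good, woe, iv_contribution).
    """
    total_bad = sum(n_bad for _, n_bad, _ in bins)
    total_good = sum(n_good for _, _, n_good in bins)
    rows = []
    iv = 0.0
    for label, n_bad, n_good in bins:
        pct_bad = n_bad / total_bad      # distribution of bad
        pct_good = n_good / total_good   # distribution of good
        woe = math.log(pct_good / pct_bad)
        contribution = (pct_good - pct_bad) * woe
        rows.append((label, pct_bad, pct_good, woe, contribution))
        iv += contribution
    return rows, iv


# Hypothetical bins of a utilisation variable: (label, bad count, good count).
rows, iv = woe_iv([("low", 10, 90), ("mid", 30, 60), ("high", 60, 50)])
for label, pct_bad, pct_good, woe, contribution in rows:
    print(f"{label:>4}: %bad={pct_bad:.2f} %good={pct_good:.2f} "
          f"WoE={woe:+.3f} IV part={contribution:.3f}")
print(f"IV = {iv:.4f}")
```

Every bin contributes a non-negative amount to IV, because the difference in distributions and the WoE always share the same sign.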
- What is Population Stability Index (PSI)?
- Typical model monitoring and maintenance requires estimating the population stability index.
- PSI = Σ [(% Actual – % Estimated) x Log (% Actual / % Estimated)]
- PSI is checked to ensure that the model is not influenced by changes in economic conditions or changes in product offering due to internal policy changes.
- PSI range:
- PSI < 0.1: No action
- Between 0.1 and 0.25: Monitor closely
- PSI > 0.25: Need to redevelop the model
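The PSI formula and the action bands above can be sketched as follows. The natural logarithm is assumed, all bucket shares are assumed to be non-zero, and the score-band shares in the example are hypothetical.

```python
import math


def psi(actual_pct, expected_pct):
    """PSI = sum((actual% - expected%) * ln(actual% / expected%)) over buckets."""
    if len(actual_pct) != len(expected_pct):
        raise ValueError("both distributions need the same number of buckets")
    return sum((a - e) * math.log(a / e)
               for a, e in zip(actual_pct, expected_pct))


def psi_action(value):
    """Map a PSI value to the action bands listed in the text."""
    if value < 0.1:
        return "No action"
    if value <= 0.25:
        return "Monitor closely"
    return "Need to redevelop the model"


# Hypothetical share of accounts per score band: development vs. recent sample.
expected = [0.30, 0.30, 0.40]
actual = [0.20, 0.30, 0.50]
value = psi(actual, expected)
print(f"PSI = {value:.4f} -> {psi_action(value)}")
```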
- What is Confusion Matrix?
- It is an N x N matrix, where N is the number of classes being predicted.
- For dichotomous output N = 2.
- Accuracy: Proportion of the total number of predictions that were correct.
- Precision: Proportion of predicted positive cases that were correctly identified.
- Recall / Sensitivity: Proportion of actual positive cases which are correctly identified.
- Specificity: Proportion of actual negative cases which are correctly identified.
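For the dichotomous case, the four measures above follow directly from the counts in the 2 x 2 matrix. The sketch below treats 1 as the positive class; the labels and predictions are hypothetical.

```python
def confusion_counts(actual, predicted):
    """Return (tp, fp, tn, fn) for binary labels, with 1 as the positive class."""
    tp = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)
    fp = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 1)
    tn = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 0)
    fn = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 0)
    return tp, fp, tn, fn


def classification_metrics(tp, fp, tn, fn):
    """Accuracy, precision, recall (sensitivity) and specificity."""
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "precision": tp / (tp + fp),
        "recall": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


# Hypothetical actual outcomes and model predictions.
actual = [1, 1, 1, 0, 0, 0, 0, 1]
predicted = [1, 0, 1, 0, 1, 0, 0, 1]
counts = confusion_counts(actual, predicted)
metrics = classification_metrics(*counts)
print(counts, metrics)
```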
- What is Concordance?
- Concordant: A pair is concordant if 1 (observation with the desired outcome i.e. event) has a higher predicted probability than 0 (observation without the outcome i.e. non-event).
- Discordant: A pair is discordant if 0 (observation without the desired outcome i.e. non-event) has a higher predicted probability than 1 (observation with the outcome i.e. event).
- Tied: A pair is tied if 1 (observation with the desired outcome i.e. event) has the same predicted probability as 0 (observation without the outcome i.e. non-event).
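The three pair types can be counted by comparing every event with every non-event. The sketch below does this by brute force, which is fine for small samples; the outcomes and predicted probabilities are hypothetical.

```python
def concordance(actual, scores):
    """Return the share of concordant, discordant and tied event/non-event pairs."""
    events = [s for a, s in zip(actual, scores) if a == 1]
    non_events = [s for a, s in zip(actual, scores) if a == 0]
    concordant = discordant = tied = 0
    for event_score in events:
        for non_event_score in non_events:
            if event_score > non_event_score:
                concordant += 1
            elif event_score < non_event_score:
                discordant += 1
            else:
                tied += 1
    total_pairs = len(events) * len(non_events)
    return (concordant / total_pairs,
            discordant / total_pairs,
            tied / total_pairs)


# Hypothetical outcomes (1 = event) and predicted probabilities.
actual = [1, 1, 0, 0, 0]
scores = [0.9, 0.4, 0.4, 0.2, 0.7]
pct_concordant, pct_discordant, pct_tied = concordance(actual, scores)
print(pct_concordant, pct_discordant, pct_tied)
```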
- What is Gain and Lift Chart?
- Gain and lift charts are mainly used to check the rank ordering of the predicted probabilities.
- Gain: The percentage of targets (events) covered at a given decile level.
- Lift: It is the ratio of gain percentage to the random expectation percentage at a given decile level.
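A gain and lift table can be built by sorting observations by predicted probability and accumulating events bin by bin. The sketch below is a simplified version: the number of bins is a parameter (10 gives deciles), the sample is assumed to have at least as many observations as bins, and the data are hypothetical.

```python
def gain_lift_table(actual, scores, n_bins=10):
    """Return a list of (bin, cumulative gain %, cumulative lift) tuples.

    Observations are ranked from highest to lowest score and split into
    n_bins groups of (nearly) equal size.
    """
    ranked = [a for _, a in sorted(zip(scores, actual),
                                   key=lambda pair: pair[0], reverse=True)]
    n = len(ranked)
    total_events = sum(ranked)
    table = []
    cumulative_events = 0
    for b in range(1, n_bins + 1):
        start = (b - 1) * n // n_bins
        end = b * n // n_bins
        cumulative_events += sum(ranked[start:end])
        gain = 100.0 * cumulative_events / total_events  # % of events captured
        random_pct = 100.0 * end / n                     # % of population covered
        table.append((b, gain, gain / random_pct))
    return table


# Hypothetical sample of 10 accounts, split into 5 bins for brevity.
actual = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
scores = [0.95, 0.90, 0.85, 0.80, 0.70, 0.60, 0.50, 0.40, 0.30, 0.20]
table = gain_lift_table(actual, scores, n_bins=5)
for b, gain, lift in table:
    print(f"bin {b}: gain={gain:.1f}% lift={lift:.2f}")
```

A model that rank-orders well shows a lift well above 1 in the first bins that decays towards 1 in the last bin.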
- What is KS, AUROC and Gini?
- KS or Kolmogorov-Smirnov chart: It measures the performance of classification models. The KS statistic gives the separation power of the model. It is calculated as the maximum absolute difference between the cumulative distribution of non-events and the cumulative distribution of events. A good model will have a KS > 30 (on a 0 to 100 scale). A very high KS value may indicate over-prediction by the model.
- AUROC (area under the ROC curve): It is a fundamental tool for diagnostic test evaluation. The ROC curve plots sensitivity against 1 – specificity, both of which we can get from the confusion matrix.
- An ideal model will have an AUROC very close to 1.
- Lift depends on the total response rate of the population; the ROC curve, on the other hand, is almost independent of the response rate.
- Gini coefficient: It is the ratio of the area between the ROC curve and the diagonal line to the area of the triangle above the diagonal. Gini = 2 x AUC – 1
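The three measures can be computed from the same pair of inputs, actual outcomes and predicted probabilities. In the sketch below, AUC is obtained from pairwise comparisons (concordant pairs plus half of the tied pairs), which is one common way to compute it, and KS is reported on a 0 to 100 scale. The data are hypothetical.

```python
def ks_statistic(actual, scores):
    """Maximum gap between cumulative event % and cumulative non-event %."""
    n_events = sum(actual)
    n_non_events = len(actual) - n_events
    pairs = sorted(zip(scores, actual), reverse=True)
    cum_events = cum_non_events = 0
    ks = 0.0
    i = 0
    while i < len(pairs):
        j = i
        # Add all observations sharing the same score before comparing.
        while j < len(pairs) and pairs[j][0] == pairs[i][0]:
            cum_events += pairs[j][1]
            cum_non_events += 1 - pairs[j][1]
            j += 1
        gap = abs(cum_events / n_events - cum_non_events / n_non_events)
        ks = max(ks, gap)
        i = j
    return 100.0 * ks


def auc_score(actual, scores):
    """AUC as the share of event/non-event pairs ranked correctly (ties count half)."""
    events = [s for a, s in zip(actual, scores) if a == 1]
    non_events = [s for a, s in zip(actual, scores) if a == 0]
    wins = sum(1.0 if e > n else 0.5 if e == n else 0.0
               for e in events for n in non_events)
    return wins / (len(events) * len(non_events))


# Hypothetical outcomes (1 = event) and predicted probabilities.
actual = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
scores = [0.95, 0.90, 0.85, 0.80, 0.70, 0.60, 0.50, 0.40, 0.30, 0.20]
ks = ks_statistic(actual, scores)
auc = auc_score(actual, scores)
gini = 2 * auc - 1
print(f"KS = {ks:.1f}, AUC = {auc:.3f}, Gini = {gini:.3f}")
```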
10 Questions on Time Series
- What is Time Series?
- Time Series is a series of observations measured over time.
- These observations are applicable to different fields such as: Cardiology (Heart Rate Monitor), Finance (Stock Market Data), etc.
- What is stochastic process?
- It is a collection of random variables ordered in time
- We call a stochastic process purely random, or a white noise process, if it has zero mean and constant variance and is serially uncorrelated.
- What is stationary process?
- A process is stationary if its mean and variance are constant over time and the covariance between two time periods depends only on the gap between them, not on the actual time period.
- Mean: $E(Y_t) = \mu$
- Variance: $Var(Y_t) = \sigma^2$
- Covariance: $\gamma_k = E[(Y_t - \mu)(Y_{t+k} - \mu)]$
- If a time series is stationary, it will tend to return to its mean.
- What is Unit root test?
- It is a test for stationarity. We test whether $\rho = 1$; if so, the series has a unit root and is non-stationary.
- Where $Y_t = \rho Y_{t-1} + U_t$
- What is AR process?
- A process in which $Y_t$ is related to its previous observations $Y_{t-1}$, $Y_{t-2}$, etc.
- $Y_t = \alpha Y_{t-1} + U_t$
- $U_t$ is the uncorrelated random error with zero mean and constant variance
- Here $Y_t$ follows a first-order autoregressive process, AR(1)
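To see the AR(1) equation at work, the sketch below simulates a series with a known coefficient and then recovers it with a least-squares regression of each value on its previous value (no intercept). The same coefficient is what a unit root test examines: a value close to 1 points towards non-stationarity. Gaussian errors, the seed and the function names are assumptions of this example.

```python
import random


def simulate_ar1(alpha, n, seed=0):
    """Simulate Y_t = alpha * Y_{t-1} + U_t with standard normal errors U_t."""
    rng = random.Random(seed)
    series = [0.0]
    for _ in range(n - 1):
        series.append(alpha * series[-1] + rng.gauss(0.0, 1.0))
    return series


def estimate_ar1(series):
    """Least-squares estimate of alpha from regressing Y_t on Y_{t-1}."""
    numerator = sum(series[t] * series[t - 1] for t in range(1, len(series)))
    denominator = sum(series[t - 1] ** 2 for t in range(1, len(series)))
    return numerator / denominator


series = simulate_ar1(alpha=0.5, n=5000, seed=42)
alpha_hat = estimate_ar1(series)
print(f"true alpha = 0.5, estimated alpha = {alpha_hat:.3f}")
```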
- What is MA process?
- A process in which $Y_t$ is related to the current and previous error terms $U_t$, $U_{t-1}$, etc.
- $Y_t = \mu + \beta_0 U_t + \beta_1 U_{t-1}$
- Here $Y_t$ follows a first-order moving average process, MA(1)
- What is Integrated Stochastic Process?
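- I(1): integrated of order 1, that is, the series becomes stationary after differencing once
- I(2): integrated of order 2, that is, the series becomes stationary after differencing twice
- I(d): integrated of order d, that is, the series becomes stationary after differencing d times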
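Differencing itself is a simple operation: each value is replaced by its change from the previous value, and the step is repeated d times. The sketch below uses a hypothetical series with a quadratic trend, which needs two rounds of differencing before the trend disappears.

```python
def difference(series, d=1):
    """Apply first differencing d times: Y_t - Y_{t-1}."""
    result = list(series)
    for _ in range(d):
        result = [result[i] - result[i - 1] for i in range(1, len(result))]
    return result


# Hypothetical series with a quadratic trend: 0, 1, 4, 9, 16, 25.
quadratic = [t ** 2 for t in range(6)]
once = difference(quadratic, d=1)   # still trending upwards
twice = difference(quadratic, d=2)  # constant, trend removed
print(once, twice)
```

Each round of differencing shortens the series by one observation.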
- How do you evaluate the model performance?
- Here P is the predicted value and A is the actual value.
- Average Error: 1/n Σ (P – A)
- Mean Absolute Percentage Error (MAPE): 1/n Σ |(A – P) / A|
- The closer the Average Error is to zero and the lower the MAPE, the better the model.
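Both error measures are straightforward to compute. The sketch below uses hypothetical actual and predicted values chosen so that positive and negative errors cancel in the average error, which is one reason MAPE is usually read alongside it. Actual values are assumed to be non-zero.

```python
def average_error(actual, predicted):
    """Average Error = 1/n * sum(P - A); signed, so errors can cancel out."""
    return sum(p - a for a, p in zip(actual, predicted)) / len(actual)


def mape(actual, predicted):
    """MAPE = 1/n * sum(|(A - P) / A|); requires non-zero actual values."""
    return sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical actual and predicted values.
actual = [100.0, 200.0, 400.0]
predicted = [110.0, 190.0, 400.0]
avg_err = average_error(actual, predicted)
mape_value = mape(actual, predicted)
print(f"Average Error = {avg_err:.2f}, MAPE = {mape_value:.2%}")
```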
- What are ACF and PACF plots?
- ACF (auto-correlation function): gives the auto-correlation of a series with its lagged values
- PACF (partial auto-correlation function): gives the correlation between the series and a given lag after removing the effects already explained by the earlier lag(s)
- For an AR process, there is a sharp decline in the PACF plot and a gradual decline in the ACF plot
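The values behind the two plots can be computed by hand. The sketch below estimates the sample ACF directly and derives the PACF from it with the Durbin-Levinson recursion, which is one standard way to do so. It is applied to a simulated AR(1) series; the coefficient, the seed and the function names are assumptions of this example.

```python
import random


def acf(series, max_lag):
    """Sample auto-correlations for lags 0..max_lag."""
    n = len(series)
    mean = sum(series) / n
    dev = [x - mean for x in series]
    denom = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t - k] for t in range(k, n)) / denom
            for k in range(max_lag + 1)]


def pacf(series, max_lag):
    """Partial auto-correlations for lags 0..max_lag (Durbin-Levinson)."""
    r = acf(series, max_lag)
    result = [1.0]
    prev = []  # coefficients phi_{k-1,1} .. phi_{k-1,k-1}
    for k in range(1, max_lag + 1):
        if k == 1:
            phi_kk = r[1]
            current = [phi_kk]
        else:
            num = r[k] - sum(prev[j - 1] * r[k - j] for j in range(1, k))
            den = 1.0 - sum(prev[j - 1] * r[j] for j in range(1, k))
            phi_kk = num / den
            current = [prev[j - 1] - phi_kk * prev[k - j - 1]
                       for j in range(1, k)] + [phi_kk]
        result.append(phi_kk)
        prev = current
    return result


# Simulated AR(1) series with coefficient 0.7 (hypothetical example data).
rng = random.Random(7)
y = [0.0]
for _ in range(2999):
    y.append(0.7 * y[-1] + rng.gauss(0.0, 1.0))

acf_values = acf(y, 5)
pacf_values = pacf(y, 5)
print([round(v, 2) for v in acf_values])
print([round(v, 2) for v in pacf_values])
```

On a series like this, the ACF values shrink gradually with the lag, while the PACF is large at lag 1 and close to zero afterwards.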
- How to check stationarity of a series?
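- The Augmented Dickey-Fuller (ADF) test is used to check the stationarity of a series
- If p-value < 0.05, the null hypothesis of a unit root is rejected and the series is treated as stationary; otherwise the series is treated as non-stationary.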
- The ADF test output is typically read from the columns Type, Lags, Tau and Pr < Tau.
Rohit Garg has close to 7 years of work experience in the field of data analytics and machine learning. He has worked extensively in the areas of predictive modeling, time series analysis and segmentation techniques. Rohit holds a BE from BITS Pilani and a PGDM from IIM Raipur.