Every Advanced IRB bank is expected to have a well-functioning Loss Given Default (LGD) model, as required by Regulation (EU) No 575/2013 (hereafter referred to as the CRR) and subsequently by a set of regulatory papers released by the EBA and the ECB.
The EBA Guidelines on PD estimation, LGD estimation and the treatment of defaulted exposures (EBA/GL/2017/16), hereafter referred to as the “GL on PD&LGD estimation”, distinguish two steps in the development of the LGD framework:
 Model development in LGD estimation as per section 6.2 (which is commonly known as the Risk Differentiation Function)
 LGD calibration as per section 6.3
In this article we focus solely on the first step, that is, on how to differentiate obligors/facilities in terms of their relative risk, measured in LGD. The second step (LGD calibration) will be addressed in a separate blog post.
Below we first describe what typical LGD data looks like (in the form we have usually encountered it) and then describe several statistical approaches to LGD modelling. Our objective is a high-level overview of the methodologies that we have observed in the market; the reader should not presume that this is an exhaustive list of modelling approaches. Furthermore, in each case the selected approach should be tailored to the specificities of the Institution’s internal processes, the type of portfolio and the geography in which the Institution operates.
Target LGD variable and risk drivers
As in any statistical model, constructing an LGD model requires defining the target variable (LGD) and the explanatory variables (referred to as risk drivers in the regulatory modelling community).
Note that this post presents only a high-level idea of the target LGD; the key reference on this topic is section 6.3 of the “GL on PD&LGD estimation”. Keeping this in mind, for the purpose of explaining the ranking mechanism, the target LGD can be defined as follows:

$$LGD_i = \frac{EAD_i - R_i + D_i + C_i^{dir} + C_i^{ind}}{EAD_i}$$

Where:
$i$ is the index of the observation in the data set (either obligor or facility, depending on the level of modelling);
$EAD_i$ is the exposure at the moment of default;
$R_i$ is the discounted sum of recovery amounts received by the institution after the moment of default;
$D_i$ is the discounted sum of additional drawings;
$C_i^{dir}$ and $C_i^{ind}$ are the discounted sums of direct and indirect costs respectively.
According to paragraph 143 of the “GL on PD&LGD estimation”, the discount rate should equal the three-month Euribor plus 5 percentage points. It is important to note that LGD measurement requires multi-year loss observations, because the recoveries on defaulted loans are usually collected over a long period of time (for example, around 5 years for mortgage loans).
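As an illustration of the target definition, the sketch below computes a realised LGD for a single defaulted facility from its post-default cash flows. It is a minimal example under our own assumptions: the names `CashFlow`, `discounted_sum` and `realised_lgd` are hypothetical, annual compounding is assumed, and the interbank rate is passed in as a plain number rather than looked up at the default date.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class CashFlow:
    """A single post-default cash flow (hypothetical helper type)."""
    years_after_default: float  # time elapsed since the moment of default
    amount: float               # nominal amount, same currency as EAD


def discounted_sum(flows: Sequence[CashFlow], annual_rate: float) -> float:
    """Discount cash flows back to the moment of default.

    Annual compounding is an assumption made for this sketch.
    """
    return sum(
        cf.amount / (1.0 + annual_rate) ** cf.years_after_default
        for cf in flows
    )


def realised_lgd(
    ead: float,
    recoveries: Sequence[CashFlow],
    drawings: Sequence[CashFlow],
    costs: Sequence[CashFlow],
    interbank_rate: float,
    add_on: float = 0.05,
) -> float:
    """Economic loss divided by exposure at default."""
    if ead <= 0:
        raise ValueError("ead must be positive")
    rate = interbank_rate + add_on  # interbank rate plus 5 percentage points
    economic_loss = (
        ead
        - discounted_sum(recoveries, rate)
        + discounted_sum(drawings, rate)
        + discounted_sum(costs, rate)
    )
    return economic_loss / ead


# Purely illustrative numbers, not taken from any real portfolio.
example = realised_lgd(
    ead=100.0,
    recoveries=[CashFlow(years_after_default=1.0, amount=63.0)],
    drawings=[],
    costs=[CashFlow(years_after_default=0.0, amount=5.0)],
    interbank_rate=0.0,
)
print(round(example, 4))
```

Direct and indirect costs are passed as one list here for brevity; in practice they would typically be tracked separately.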
Risk drivers include different types of variables, such as transaction- or obligor-related characteristics, geographical location or any other internal or external factor, limited only by local legislation on data protection and discrimination. Common examples of risk drivers are the Loan-to-Value ratio, exposure size and the age of the relationship with the bank. Because similar risk drivers contribute to the LGD measurement in every country, LGD models for a given product (e.g. mortgages) often reflect a similar set of risk drivers from one country to another. There are nevertheless always differences across countries, so the same LGD model cannot simply be used for several countries.
Below is an example of what a SAS data set for LGD might look like:
The above table includes the following columns:
cust_id: customer (obligor) ID
def_date: date of the default
EAD: exposure at the moment of default
os_before_def: outstanding exposure amount within the 12 months before the moment of default. This is the “observed” exposure amount, as opposed to EAD (at the moment of model application EAD is not available, because the obligor has not defaulted yet)
LTV: ratio of the loan amount to the value of the collateral
N_of_loans: number of the obligor’s loans
region: geographical region encoded as discrete numeric values
age_of_c: age of the customer (in years)
age_of_r: age of the relationship with the Bank (in months)
arr_in_12m: number of months spent in arrears
income: income of the customer
LTI: ratio of the loan amount to the income of the customer
Cure_event: flag encoding the cure event outcome
LGD: loss given default (target variable for modelling)
Having discussed the data, let’s get directly to the modelling!
Modelling approaches
Suppose that the modelling team has constructed the data set described earlier. Let’s discuss the common approaches to LGD modelling. The approaches vary in terms of modelling complexity, data availability requirements and expected model performance.
The picture below provides a list of LGD modelling approaches sorted by complexity of implementation.
Approach 1: Direct prediction of LGD
The simplest approach is to model LGD as a linear regression:

$$LGD = \beta_0 + \beta_1 X_1 + \dots + \beta_k X_k + \varepsilon$$

Where X represents the risk drivers described in the previous section, β are the estimated coefficients and ε is the “noise”. This approach has the fewest requirements in terms of data availability and granularity; however, the expected model performance is relatively low. One extension of Approach 1 is to use multinomial logistic regression. This can help to tackle the U-shaped distribution of the target variable, as it can group the concentrations of LGD values around 0 and 1 into separate model output classes.
Finally, this extension of Approach 1 can be generalized into nested logistic and linear regressions, where a logistic regression predicts whether LGD is equal to 0 (or a small value around 0) and, if not, a linear regression is applied to predict LGD when it is higher than 0.
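To make the basic linear variant of Approach 1 concrete, here is a minimal sketch with a single risk driver (LTV), fitted by closed-form ordinary least squares. The function names and the observations are hypothetical and synthetic, and flooring and capping the prediction at 0 and 1 is our own assumption rather than part of the approach described above. A real model would use several risk drivers and a statistical package.

```python
from typing import Sequence, Tuple


def fit_ols(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Closed-form OLS for y = b0 + b1 * x with one explanatory variable."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x must not be constant")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    return b0, b1


def predict_lgd(b0: float, b1: float, x: float) -> float:
    """Linear prediction, floored at 0 and capped at 1 (an assumption)."""
    return min(1.0, max(0.0, b0 + b1 * x))


# Synthetic, purely illustrative observations: LTV and realised LGD.
ltv = [0.4, 0.6, 0.8, 1.0]
lgd = [0.1, 0.2, 0.3, 0.4]

b0, b1 = fit_ols(ltv, lgd)
print(predict_lgd(b0, b1, 0.9))
```

The sketch only produces a ranking of facilities by predicted LGD; it says nothing about calibration.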
Approach 2: LGD estimate through Probability of Cure, Loss Given Cure and Loss Given Liquidation
In order to formulate the approach let’s discuss the simplified recovery process of the Bank.
When a loan enters the default state, there are usually two possible outcomes. Either the loan is cured and the obligor continues to pay on a periodic basis, or, if the obligor is not able to pay, the loan is liquidated by the Bank. In the latter case, the Bank starts its internal recovery process and any collateral associated with the loan is processed so that the underlying collateral value can be extracted. If the loan is cured, there is almost no loss (any loss is typically related to the costs of loan restructuring and to the time value of money). If the collateral is liquidated, the loss will likely depend on the value of the sold collateral.
Based on the above, LGD can be modelled as the average of Loss Given Cure (LGC) and Loss Given Liquidation (LGL), weighted by the Probability of Cure (PC):

$$LGD = PC \cdot LGC + (1 - PC) \cdot LGL$$

Therefore, in order to calculate LGD, one needs to obtain the three components: Probability of Cure, Loss Given Cure and Loss Given Liquidation.
Each of the above components can be modelled via a separate regression. In practice, Loss Given Cure is relatively immaterial (usually below 5%), so often no regression is developed for it and a simple average of the observed LGC is used instead. Below we briefly present how the Probability of Cure and Loss Given Liquidation models can be developed.
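The weighting of the two outcomes can be sketched in a few lines. The function names and the input values below are hypothetical; Loss Given Cure is taken as a simple average of observed values, as discussed above.

```python
from typing import Sequence


def average_lgc(observed_lgc: Sequence[float]) -> float:
    """Simple average of the Loss Given Cure observed on cured cases."""
    if not observed_lgc:
        raise ValueError("at least one observation is required")
    return sum(observed_lgc) / len(observed_lgc)


def lgd_cure_liquidation(p_cure: float, lgc: float, lgl: float) -> float:
    """LGD as the average of LGC and LGL weighted by Probability of Cure."""
    if not 0.0 <= p_cure <= 1.0:
        raise ValueError("p_cure must lie in [0, 1]")
    return p_cure * lgc + (1.0 - p_cure) * lgl


# Purely illustrative inputs.
lgc = average_lgc([0.02, 0.03, 0.04])
lgd = lgd_cure_liquidation(p_cure=0.6, lgc=lgc, lgl=0.45)
print(round(lgd, 4))
```

In an actual model, `p_cure` and `lgl` would come from the facility-level regressions described in the following sections.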
Loss Given Liquidation model
The simplest way to model Loss Given Liquidation is to calculate it as LGL = 1 – RR, where RR is the Recovery Rate, which is modelled using a single linear regression:

$$RR = \beta_0 + \beta_1 X_1 + \dots + \beta_k X_k + \varepsilon$$

Note that, in contrast to the similar regression formula in Approach 1, no cure events should be included in the estimation of the above Recovery Rate. Furthermore, the risk drivers selected for the Recovery Rate model need not be the same as those selected under Approach 1.
In practice, it is often the case that Loss Given Liquidation is defined by several sources of cash flows (as mentioned in picture 3) which have different economic origins, such as: recoveries from collateral (sometimes several collateral items of different types, e.g. real estate, securities or savings deposits), recoveries not related to collateral execution, additional drawings, and direct and indirect costs.
If several of the cash flow sources are material and non-homogeneous, it might be beneficial (as well as expected by the EBA according to paragraph 128 of the “GL on PD&LGD estimation”) to model the different sources of cash flows via separate regression models, which are later aggregated in order to obtain the predicted RR. From a practical perspective this approach can be simplified as follows:
 For components which are either immaterial or are obtained based on allocation methodologies (e.g. indirect costs) – a simple average can be used
 Different types of collateral could be taken into account in one regression by including dummy or categorical variables representing the type of collateral
 Sources of cash flows with different economic origin could be aggregated together if they are immaterial compared to the overall recovery rate
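One possible way to aggregate separately estimated cash flow components into a predicted Recovery Rate is sketched below. This is our own simplified reading, not a prescribed method: every component is assumed to be expressed as a discounted fraction of EAD, recoveries enter with a positive sign, and additional drawings and costs enter with a negative sign. The class and field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RecoveryComponents:
    """Predicted cash flow components, each as a fraction of EAD."""
    collateral_recoveries: float  # e.g. from a collateral regression
    other_recoveries: float       # recoveries not related to collateral
    additional_drawings: float    # e.g. a simple portfolio average
    costs: float                  # direct plus indirect costs


def recovery_rate(c: RecoveryComponents) -> float:
    """Net recovery rate: recoveries minus drawings and costs."""
    return (
        c.collateral_recoveries
        + c.other_recoveries
        - c.additional_drawings
        - c.costs
    )


def loss_given_liquidation(c: RecoveryComponents) -> float:
    """LGL = 1 - RR."""
    return 1.0 - recovery_rate(c)


# Purely illustrative component estimates.
components = RecoveryComponents(
    collateral_recoveries=0.50,
    other_recoveries=0.10,
    additional_drawings=0.02,
    costs=0.03,
)
print(round(loss_given_liquidation(components), 4))
```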
Probability of Cure model
Probability of Cure can be modelled via logistic regression:

$$PC = \frac{1}{1 + e^{-(\beta_0 + \beta_1 X_1 + \dots + \beta_k X_k)}}$$

Where X represents the risk drivers and β are the estimated coefficients.
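For illustration, applying an already estimated logistic regression to one facility could look as follows. The coefficient values and the function name are hypothetical; the sketch shows scoring only, not the estimation of the coefficients.

```python
import math
from typing import Mapping


def probability_of_cure(
    intercept: float,
    coefficients: Mapping[str, float],
    risk_drivers: Mapping[str, float],
) -> float:
    """Logistic transformation of a linear score of the risk drivers."""
    missing = set(coefficients) - set(risk_drivers)
    if missing:
        raise KeyError(f"missing risk drivers: {sorted(missing)}")
    score = intercept + sum(
        beta * risk_drivers[name] for name, beta in coefficients.items()
    )
    return 1.0 / (1.0 + math.exp(-score))


# Hypothetical coefficients: a higher LTV lowers the probability of cure.
coefficients = {"LTV": -2.0}
pc_low_ltv = probability_of_cure(1.0, coefficients, {"LTV": 0.5})
pc_high_ltv = probability_of_cure(1.0, coefficients, {"LTV": 1.0})
print(pc_low_ltv, pc_high_ltv)
```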
It should be mentioned that modelling the Probability of Cure (‘PC’) allows for different levels of complexity. For some portfolios, a cure event might happen after 3 (or more) years spent in default status. Therefore, the Probability of Cure model has to take into account several years of the default process (up to the maximum workout period as defined in paragraph 156 of the “GL on PD&LGD estimation”).
Suppose that the modelling team is developing a Cure Probability model for a mortgage portfolio with a maximum workout period of 5 years. The simplest way to model the Probability of Cure is to use a basic logistic regression which predicts a cure event within 5 years from the moment of default. In this case, all the observations which defaulted within the last 5 years and for which recoveries are still ongoing have to be excluded from the model development data set, as for these observations the outcome (cure or liquidation) is still unknown. If a significant part of the data from the most recent time periods is excluded, one can challenge the representativeness of the model estimates with respect to the sample on which the model is expected to be applied. To overcome this issue, one can consider developing several models with an outlook horizon of only one year, which forecast the conditional probabilities (12 months ahead) of cure and liquidation. The picture below summarizes a possible path of a facility in default:
In this example, the facility stays in default status for two full years and cures during the third year. The cumulative probability of cure can be calculated as follows:

$$PC = \sum_{t=1}^{N} PC_t \prod_{s=1}^{t-1} \left(1 - PC_s - PL_s\right)$$

Where $PL_t$ and $PC_t$ are the respective conditional 1-year probabilities of liquidation and cure in year $t$ after default, and $N$ is the maximum workout period.
In practice, the conditional probability of cure is estimated using regression only up to 2 or 3 years after default, and for the remaining years the respective average conditional cure rates are used. The advantage of the above cure probability model structure is that one can include the most recent data in the model estimation, which makes the model more reliable for application purposes.
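The accumulation of conditional 1-year probabilities into a cumulative probability of cure can be sketched as follows. The function name and the yearly probabilities are hypothetical, and we assume that facilities still in default at the end of the maximum workout period are not counted as cured.

```python
from typing import Sequence


def cumulative_cure_probability(
    p_cure: Sequence[float],
    p_liq: Sequence[float],
) -> float:
    """Cumulative probability of cure over the workout period.

    p_cure[t] and p_liq[t] are the 1-year probabilities of cure and
    liquidation in year t + 1, conditional on still being in default
    at the start of that year.
    """
    if len(p_cure) != len(p_liq):
        raise ValueError("p_cure and p_liq must have the same length")
    still_in_default = 1.0  # probability of reaching the current year
    total = 0.0
    for pc_t, pl_t in zip(p_cure, p_liq):
        if pc_t < 0 or pl_t < 0 or pc_t + pl_t > 1:
            raise ValueError("invalid conditional probabilities")
        total += still_in_default * pc_t
        still_in_default *= 1.0 - pc_t - pl_t
    return total


# Purely illustrative conditional probabilities for a 3-year workout period.
pc = cumulative_cure_probability(
    p_cure=[0.2, 0.1, 0.1],
    p_liq=[0.3, 0.2, 0.1],
)
print(round(pc, 4))
```

In line with the practice described above, the first two or three entries of `p_cure` would come from regressions and the remaining ones from average conditional cure rates.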
Key advantages of the second approach
It captures the peculiarities of the recovery and restructuring process within the Bank, which limits overfitting and makes the model more reliable from the perspective of conceptual soundness. In most cases it results in a higher ranking power compared to Approach 1.
As both the Cure Probability and Recovery Rate models have to be developed, a more detailed data analysis has to be conducted, which gives better assurance on the underlying data quality.
With the conditional Cure Probability model structure, more recent data can be included in the model development.
Approach 3
Approach 3 can be summarized as a group of modelling techniques which further generalize Approach 2. For instance, suppose that one develops a mortgage LGD model. According to Approach 2, either a Cure or a Liquidation outcome is expected to happen. In practice, in some geographies, it can also be the case that there is no Cure event and at the same time the mortgage collateral cannot be liquidated, due to the absence of prospective buyers or to governmental protection schemes. In this case the collateral is written off, so a new default outcome should be introduced and LGD can be modelled as follows:

$$LGD = PC \cdot LGC + PL \cdot LGL + PW \cdot LGW, \qquad PC + PL + PW = 1$$

Where PL and PW are the probabilities of liquidation and write-off, and LGW is the Loss Given Write-off, which represents the loss in the situation when the collateral cannot be sold and therefore has to be written off.
In general, multiple variations of this approach could be introduced. For instance, the cure event could be further split into self-cure and restructuring. Several examples of the possible options are presented in picture 2.
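Since Approach 3 is essentially a probability-weighted average over an arbitrary set of default outcomes, it can be sketched generically. The function name, the outcome names and the numbers below are hypothetical; the only structural assumption is that the outcomes are mutually exclusive and that their probabilities sum to one.

```python
import math
from typing import Mapping, Tuple


def lgd_from_outcomes(outcomes: Mapping[str, Tuple[float, float]]) -> float:
    """Probability-weighted loss over mutually exclusive default outcomes.

    `outcomes` maps an outcome name to (probability, loss given outcome).
    """
    total_probability = sum(p for p, _ in outcomes.values())
    if not math.isclose(total_probability, 1.0, abs_tol=1e-9):
        raise ValueError("outcome probabilities must sum to 1")
    return sum(p * loss for p, loss in outcomes.values())


# Purely illustrative three-outcome example: cure, liquidation, write-off.
lgd = lgd_from_outcomes(
    {
        "cure": (0.5, 0.02),
        "liquidation": (0.4, 0.40),
        "write_off": (0.1, 1.00),
    }
)
print(round(lgd, 4))
```

Splitting the cure event into self-cure and restructuring would only require replacing the `"cure"` entry with two entries.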
Conclusion
Development of the LGD Risk Differentiation Function described above is just one of the steps in the IRB LGD framework. Other steps include the calibration and the estimation of the LGD Downturn adjustment and the Margin of Conservatism. Additionally, a separate LGD model (including all the aforementioned components) has to be developed for in-default obligors.
Successful development of an IRB LGD model requires a combination of regulatory knowledge, experience in banking and credit risk modelling, and strong programming skills.
If you have more questions on LGD modelling, please contact the Regulatory Quant Team:

Kaan Aksel Phone: +49 69 9585 5874

Petr Geraskin Phone: +49 69 9585 6006