Abstract:
This article shows how the LDA can be modified so that any bank can measure operational risk. The key challenge is the scarcity of data. After describing the ways banks deal with this issue, I consider an alternative way of increasing the number of observations by looking at the bank’s states of nature. I show how to increase the availability of data for assessing the frequency and severity distributions. Finally, I consider the conformity of the proposed approach with legal requirements and its influence on the stability of risk parameters.
The data scarcity issue
One of the key challenges in modeling operational risk is the scarcity of data. The majority of banks have, at best, no more than 10 years of loss data, and the number of collected observations is relatively small even for the biggest financial institutions. A database of 20,000 losses looks modest compared to the databases used for modeling market or credit risk, and 20,000 observations is a dream for small banks.
On the other hand, a prerequisite for LDA modeling is that the analyst split the data into homogeneous clusters. In our case the 20,000 losses relate to different business lines and event types, which are far from being homogeneous. For instance, the Basel II event type “Execution, delivery and process management” includes events related to “transaction capture” as well as events related to “vendors and suppliers” (the so-called event categories). Now consider a single event category such as “theft and fraud”. One can hardly argue that this category is homogeneous: events related to checks are not the same as events related to cash, credit cards or internet transactions. The frequency of some frauds is stable, that of others is increasing, and that of yet others is decreasing, because the causes of the events and the operational environment are not the same.
Now let us do some simple arithmetic. Suppose we try to divide our database of 20,000 losses into homogeneous clusters. With 7 event types and 8 business lines, we get on average only about 360 observations per cluster; if we use the 20 Basel level-2 categories instead, we get roughly 125 observations per cluster. If we insist on genuinely homogeneous clusters, the numbers will be even lower. We also have to consider that events are not uniformly distributed across event categories and business lines: some 10-12 combinations of categories and business lines dominate. So it is quite possible to end up with some clusters containing only 1 or 2 observations, a majority with 5-10, some well supported with 200 or more, and only a few with more than 1,000. In this situation statistical analysis is not possible for the majority of clusters; one cannot reasonably estimate a parameter or a distribution with so few observations.
This situation affects not only small financial institutions (insufficient data for the majority of clusters) but also the biggest ones (insufficient data for some clusters).
The way that banks deal with data scarcity
Several solutions have been proposed. They fall into four groups:
1) The first approach is to pool internal data, disregarding the fact that they are not homogeneous and are probably to some extent correlated. For instance, the data are not split into Basel business lines and/or event categories; in this way the analyst has only 7 clusters. The amount of data per cluster is now satisfactory, but new issues appear or gain in importance. The LDA approach implicitly assumes that losses over time can be described by a stationary process (after taking environmental factors into account), and it is much easier to transform homogeneous loss data into a stationary process than heterogeneous data. Moreover, the analyst is now assuming without proof that the correlation between pooled events is zero. However, the Basel Committee and the EU have stated that a “bank may be permitted to use internally determined correlations in operational risk losses across individual operational risk estimates, provided it can demonstrate to the satisfaction of the national supervisor that its systems for determining correlations are sound, implemented with integrity, and take into account the uncertainty surrounding any such correlation estimates. The bank must validate its correlation assumptions using appropriate quantitative and qualitative techniques” (BIS, 2006, p. 152) [1]. Pooling data reduces the number of estimates and is a way of escaping the formal requirements of the BIS, the EU Directive [2] and the EBA Consultation Paper (EBA/CP/2014/08) [3].
2) The second approach is to pool internal and external data. However, internal and external data may not be comparable: for instance, earthquakes in Central Europe do not have the same frequency and magnitude as in Japan or California. External data may be used only if the analyst demonstrates their relevance. In practice, this approach relies perhaps too much on the honesty, knowledge, judgment and skills of the analyst.
3) The third approach uses scenario analysis. Here the role of the analyst’s judgment is even greater.
4) The fourth is to use near misses, i.e. losses that were successfully prevented (by luck or by managerial action). This approach, developed by Muermann and Oktem [4], requires near misses to be collected. However, the quality of such data is poorer, and the will to collect them is certainly not as strong as in the case of actual losses.
As we have seen, none of the proposed solutions is fully satisfactory for this purpose, even though each gives valuable insight.
How can we increase the number of observations in order to assess the frequency?
To make the considerations more concrete, let us look at an example. Suppose that we have to assess the risk related to a bank’s settlements. In general, a bank deals with 3 types of settlements: 1) settlements between the bank’s internal accounts or the bank’s customers, 2) settlement of a single transaction with an external bank (the settlement instructions may be channeled, for instance, via SWIFT), and 3) settlement of a block of transactions (e.g. with a clearing institution, in the case of bilateral or multilateral clearing). For each type of settlement some errors occur; the amounts involved can be tiny (settlement of an instruction given by a retail customer) as well as huge (settlement by the back office of a transaction concluded by the bank’s dealing room in the money market or the forex market). The recovery rate is in general high but can vary; typically, the larger the number of parties involved, the lower the rate.
So let us consider the case of internal settlements.
The daily number of postings is substantial. The number of postings over a 3-year period (3 years × number of days per year × average number of postings per day) is therefore quite high and comparable to the amount of data used in market or credit risk modeling. For simplicity, we denote this number by N for the 3-year period. The reader may of course refine the model by taking seasonality into account (differences in the number of postings across days of the week and months of the year, and the influence of bank holidays).
Let us also treat each posting as a single observation. Nearly all postings are processed properly, a tiny fraction cause trouble but no material loss for the bank, and only a few end up with a loss (say n postings with errors).
So each posting, or trial, has only 2 possible states: a state without any loss or a state with a loss.
If we assume that errors in postings are independent, we may treat the data as N drawings with replacement. The probability that a posting ends up with a loss is p = n/N, and the probability of the opposite state is 1 − p = 1 − n/N. This situation can be modeled with a simple binomial distribution.
The possible outcomes for a 1-year period are:
X(Ω) = {0, 1, 2, …, k, …, N/3}
The probabilities are given by the binomial distribution:
P(X = k) = C(N/3, k) · p^k · (1 − p)^(N/3 − k).
Since N is large and p = n/N is small, the binomial distribution may be approximated by the Poisson distribution:
P(X = k) = e^(−λ) · λ^k / k!,
where λ = (n/N) × 365 × (average number of postings per day) is the yearly average number of postings ending with a loss.
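As an illustration, here is a minimal Python sketch of this frequency estimate; the posting volume, the loss count n and the 10% volume forecast are all hypothetical assumptions of mine, not figures from the article:

```python
from math import exp, factorial

# Hypothetical exposure data: 3 years of internal postings (the "states of nature")
postings_per_day = 50_000          # assumed average daily number of postings
days = 3 * 365
N = postings_per_day * days        # total observations over the 3-year window
n = 9                              # postings that ended up with a loss (hypothetical)

p = n / N                          # probability that a single posting produces a loss
lam = p * postings_per_day * 365   # yearly expected number of loss events (lambda)

def poisson_pmf(k: int, lam: float) -> float:
    """P(X = k) under the Poisson approximation of the binomial frequency model."""
    return exp(-lam) * lam ** k / factorial(k)

# Adjusting for a forecasted 10% growth in posting volume (probability p kept fixed)
lam_forecast = p * (1.10 * postings_per_day) * 365

print(f"p = {p:.2e}, yearly lambda = {lam:.2f}, forecast lambda = {lam_forecast:.2f}")
print("P(0 losses in a year) =", round(poisson_pmf(0, lam), 4))
print("P(5 or more losses)   =", round(1 - sum(poisson_pmf(k, lam) for k in range(5)), 4))
```

The same λ could of course be obtained directly as n divided by the number of years; the point of expressing it through p and the posting volume is that it can be rescaled when the volume changes.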
We have thus assessed the frequency of loss events not from the number of observed losses alone but by relating it to the number of observations (postings). If the number of postings changes over time and we assume that the probability of a loss on a single posting is fixed, we can easily adjust the figure to reflect the forecasted change in the number of postings.
A similar approach can be adapted to other risk categories. Suppose that we want to assess the frequency distribution of employee discrimination cases. To model the frequency we only need to know:
1) the (average) number of employees in each year,
2) the number of disclosed cases,
3) the number of years of observations,
and to assume that the number of cases disclosed by a single employee during a one-year period cannot exceed 1 and that there is no correlation between disclosed cases of discrimination. The number of observations is then the product of the number of employees and the number of years. The probability that an employee will suffer discrimination in a one-year period (if we rely only on historical data) is the number of disclosed cases divided by the number of observations. The distribution may again be modeled as a Poisson distribution, and it is fairly easy to adjust the model to the case of discrimination against a group of employees.
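A minimal sketch of the same exposure-based estimate for the discrimination example; the headcounts, the number of disclosed cases and next year’s planned headcount are purely illustrative assumptions:

```python
# Hypothetical observation window: average headcount per year and disclosed cases
employees_per_year = [4_100, 4_250, 4_300]   # average number of employees in each year
disclosed_cases = 2                           # total disclosed cases over the window

observations = sum(employees_per_year)        # employee-years: employees x years
p = disclosed_cases / observations            # probability of a case per employee per year

# Expected yearly number of cases for next year's planned headcount (assumed 4,400)
lam_next_year = p * 4_400
print(f"p per employee-year = {p:.5f}, expected cases next year = {lam_next_year:.2f}")
```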
So, in order to assess the frequency of events, the analyst should carefully define the observations. Each observation corresponds to a possible state of nature of the bank (a state without any loss or a state with a loss). By doing this, the number of observations is increased.
Another consequence, which will become clearer in the last part of the article, is that we may avoid at least some of the problems related to the probability of observing a loss (due to the existence of recording thresholds).
How can we increase the number of observations in order to assess the severity of events?
The situation is only apparently more complicated:
Let us return to the case of settlements. Two factors affect the severity of the loss for a particular settlement: the first is the amount of the transaction (the posted amount); the second is the relation between the posted amount and the loss amount. For the first factor we have plenty of data, and there is no particular difficulty in estimating a distribution. For the second, data are of course limited (because of the limited frequency of operational events), but if we have at least one observation we can use it to estimate this proportion. We can also take the recovery rate into account by using the net loss rather than the gross loss when estimating the second parameter.
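As a rough illustration of this two-factor severity idea, the sketch below fits a distribution to posted amounts (simulated here as a stand-in for the bank’s real transaction data) and scales it by the observed net-loss-to-amount proportion. The lognormal choice and every figure are assumptions for illustration only:

```python
import numpy as np

rng = np.random.default_rng(42)

# Factor 1: distribution of posted amounts, normally fitted to the plentiful
# transaction data; a lognormal sample stands in for that data here.
posted_amounts = rng.lognormal(mean=9.0, sigma=1.2, size=100_000)
mu_hat = np.log(posted_amounts).mean()
sigma_hat = np.log(posted_amounts).std()

# Factor 2: proportion of the posted amount actually lost, net of recoveries,
# estimated from the few observed loss events (hypothetical figures).
observed_net_losses = np.array([1_200.0, 4_500.0])
observed_amounts   = np.array([30_000.0, 150_000.0])
loss_ratio = observed_net_losses.sum() / observed_amounts.sum()

def simulate_severity(size: int) -> np.ndarray:
    """Draw loss severities: simulated posted amount times the estimated loss ratio."""
    amounts = rng.lognormal(mean=mu_hat, sigma=sigma_hat, size=size)
    return loss_ratio * amounts

sev = simulate_severity(50_000)
print(f"loss ratio = {loss_ratio:.3f}, mean severity = {sev.mean():,.0f}, "
      f"99th pct = {np.percentile(sev, 99):,.0f}")
```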
If we have to estimate the severity of damage to physical assets, we can proceed in a similar way: use the book value of the assets to estimate the severity function, and use the proportion by which provisions are made on these assets to estimate the relation between book values and losses.
For legal cases the approach is similar: we simply use the legal claims filed against the bank (rather than only the cases lost by the bank) as the base, and take the proportion between the size of the loss and the size of the legal claim.
If a moderate threshold on loss recording has been imposed, this approach also permits an assessment of potential losses on small transactions.
Of course, the proposed approach does not capture the existence of small losses on big transactions, but their overall impact should be limited as long as the threshold does not exceed a certain amount.
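Putting the pieces together, the usual LDA step (not detailed in the article) compounds the Poisson frequency with the severity distribution by Monte Carlo simulation. The sketch below reuses the hypothetical parameters from the previous snippets and reports a 99.9% quantile as a purely illustrative capital figure:

```python
import numpy as np

rng = np.random.default_rng(7)

lam = 3.0             # yearly frequency from the posting-based estimate (hypothetical)
loss_ratio = 0.032    # loss-to-amount proportion from the observed events (hypothetical)
mu, sigma = 9.0, 1.2  # lognormal parameters of posted amounts (hypothetical)

def yearly_aggregate_losses(n_years: int) -> np.ndarray:
    """Simulate the compound (frequency x severity) yearly loss for one cluster."""
    counts = rng.poisson(lam, size=n_years)          # number of loss events per year
    totals = np.zeros(n_years)
    for i, k in enumerate(counts):
        if k:
            amounts = rng.lognormal(mean=mu, sigma=sigma, size=k)
            totals[i] = (loss_ratio * amounts).sum() # severities of that year's events
    return totals

agg = yearly_aggregate_losses(200_000)
print(f"mean yearly loss = {agg.mean():,.0f}")
print(f"99.9% quantile (illustrative capital figure) = {np.percentile(agg, 99.9):,.0f}")
```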
Conformity with the legal requirements
The EU Directive [2] stipulates that “the risk measurement system shall capture the major drivers of risk affecting the shape of the tail of the loss estimates”. The proposed approach directly addresses this requirement: the risk drivers of frequency and severity are explicitly included in the model, and additional risk drivers can obviously be added. By contrast, a model using only loss data cannot really meet this requirement: we know that some losses have occurred, but we cannot explain their frequency or severity. In the case of severity, for instance, the CPI is not an explanation. Prices do not all behave in the same manner, and the average size of losses differs across business lines and event types. Except in countries suffering severe inflation, this factor is practically useless (it cannot explain why the severity of some losses changes at a different rate than others).
Moreover, the proposed approach directly combines loss data with internal business environment data. The sensitivity of the risk estimates to changes in these factors can easily be derived, and the proposed operational risk measurement model is more closely integrated with the business and the risk management system than a model using only losses.
Consequences for the stability of parameters and the influence of extreme observations
The approach increases the amount of data. For that reason it is easier to justify the choice of distributions and to estimate distribution parameters more precisely. As we have more data than when using only loss data, the influence of an additional extreme observation will probably be reduced and the results should be more stable over time.
Conclusion:
A practical way to deal with the problem of data scarcity is to consider not only losses but also the evolution of the business environment, i.e. the bank’s states of nature. As we then have a large number of observations, it is possible to obtain more precise and more stable results, which seem to better meet the formal requirements of the EU Directive. The proposed solution also allows small banks to model operational risk, provided they have a minimum amount of loss data (at least one observation for each risk category/business line combination). For large institutions, the approach makes it possible to model operational risk for more homogeneous clusters and avoids the flaws related to hazardous data pooling.
[1] BIS (2006) International Convergence of Capital Measurement and Capital Standards, http://www.bis.org/publ/bcbs128.pdf
[2] EU (2006) “Directive 2006/48/EC of the European Parliament and of the Council of 14 June 2006 relating to the taking up and pursuit of the business of credit institutions”, Annex X Operational risk point 1.21.10, http://eur-lex.europa.eu/LexUriServ/LexUriServ.do?uri=OJ:L:2006:177:0001:0200:EN:PD
[3] EBA (2014) “Draft Regulatory Technical Standards on assessment methodologies for the Advanced Measurement Approaches for operational risk under Article 312 of Regulation (EU) No 575/2013”, https://www.eba.europa.eu/regulation-and-policy/operational-risk/regulatory-technical-standards-on-assessment-methodologies-for-the-use-of-amas-for-operational-risk
[4] Muermann, A., and Oktem, U. (2002) “The Near-Miss Management of Operational Risk”, The Journal of Risk Finance 4 (1), pp. 25-36. http://opim.wharton.upenn.edu/risk/downloads/Areas%20of%20Research/Near%20Miss/02-02-MO%20published.pdf
Robert M. Korona
