The purpose of this report is to compare different applications of bootstrapping. To this end, I assume that our operational risk can be described by a compound Poisson-exponential distribution. I have then simulated data according to these distributions and calculated the VaR99.9% for the full dataset and for a truncated dataset. After that, I have evaluated the benefit of bootstrapping the frequency and severity parameters and distributions based on the simulated data, and compared the outcome of LDA simulations based on the data with the outcome of the bootstrapping approach. My conclusion is that bootstrapping can be a valuable and efficient technique for estimating VaR if the probability that an event is observed is close to one. If this condition is not met, the analyst should correct the results by taking unobserved events into account. If the truncation point is set too high, however, the results may differ substantially from the true value.
The main hypothesis:
Let us suppose that the operational risk can be described by a compound Poisson-exponential distribution (Poisson frequency, exponential severity). In order to generate data, let us assume that:
1) The frequency of events is a Poisson distribution with parameter λ=5 (on average 5 events per month);
2) The event’s severity is an exponential distribution with parameter θ=5,000 monetary units (the average loss per event).
The choice of distributions is dictated only by the simplicity of estimating the parameters and the VaR. The results can be transposed to other distributions, provided the parameter estimation techniques are adjusted accordingly. Simulated data are used because the “real distribution” must be known in order to assess the advantage of bootstrapping.
I will also consider two cases. In the first, all events are recorded; in the second, only events above a certain threshold are recorded. I will also assume that each event has only one outcome (loss), in order to avoid any discussion of the dispersion of an event’s cash flows over a given period, or of the respective merits of event-centric, loss-centric and cash-flow-centric approaches.
VaR for a compound Poisson-exponential distribution
Under the above hypothesis I have generated 100,000 yearly scenarios (Monte Carlo simulations), involving the simulation of 6,000,525 events. The value obtained for the yearly VaR99.9% is 477,511.7 (the empirical parameters of the simulated scenarios, λ=5.001 and θ=5,002, deviate slightly from the true values because of the limited number of observations and the use of the Monte Carlo method). Nevertheless, as the difference from the original values is minimal, I shall treat the obtained result as the “true” VaR for this type of operational risk events.
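For illustration, here is a minimal Monte Carlo sketch of this calculation in Python (NumPy). The seed is arbitrary and the per-scenario loop is only one possible implementation, so the figures will differ slightly from those quoted above:

```python
import numpy as np

rng = np.random.default_rng(1)            # arbitrary seed
n_scen, lam_month, theta = 100_000, 5.0, 5_000.0

# Yearly frequency: 12 months of Poisson(5) events is equivalent to Poisson(60).
counts = rng.poisson(lam_month * 12, size=n_scen)

# Aggregate yearly loss per scenario: the sum of `count` exponential severities.
yearly_loss = np.array([rng.exponential(theta, n).sum() for n in counts])

print(f"Simulated events: {counts.sum():,}")                          # about 6,000,000
print(f"Yearly VaR 99.9%: {np.quantile(yearly_loss, 0.999):,.1f}")    # about 477,500
```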
Data simulations
In real life, the analyst has only a limited set of events at his disposal. I will now suppose that only 36 months of observations are available. The “observed” losses (drawn from the above distributions) are summarized in the following table.
Table 1: Dataset

Calculating VaR with an LDA approach
The differences between the original parameter values and the observed average number of losses and average severity are due only to the limited number of simulations. An analyst using these data should obtain a slightly higher VaR than the one calculated previously. In fact, I obtained a VaR99.9% higher by about 5%, namely 501,570.
In addition, operational risk events are generally recorded only above a certain threshold. With a threshold of 5,000, the total number of recorded events falls to 65, i.e. about 1.81 per month on average. The average monthly aggregate loss for the data exceeding the threshold amounts to 19,142.06, i.e. an average of 10,602.06 per recorded event.
However, I cannot use these data directly for Monte Carlo simulations because part of the observations is missing. Instead, I have decided to fit a conditional severity distribution and an unconditional frequency distribution to the data above the threshold, and then to adjust the frequency parameter.
For this purpose, I have used the left-censored and shifted random variable for the exponential distribution:
$$\left(X-D\right)_+ = \begin{cases} 0, & X \le D \\ X-D, & X > D \end{cases} \qquad (1)$$
where D represents the truncation point.
The first moment of this variable can be calculated from:
$$E\left[(X-D)_+\right] = \int_D^{\infty} (x-D)\, f(x)\, dx \qquad (2)$$
Because for the exponential distribution we have:

$$f(x) = \frac{1}{\theta}\, e^{-x/\theta}, \quad x \ge 0 \qquad (3)$$
The mean for the left-censored and shifted random variable can be expressed as:

$$E\left[(X-D)_+\right] = \theta\, e^{-D/\theta} \qquad (4)$$
However, I was dealing with left-truncated, not left-censored, data. In order to use the truncated data, I have calculated the mean excess loss function:

$$e(D) = E\left[X - D \mid X > D\right] = \frac{E\left[(X-D)_+\right]}{1 - F(D)} \qquad (5)$$
For the exponential distribution, the cumulative distribution function is equal to:

$$F(x) = 1 - e^{-x/\theta} \qquad (6)$$
Plugging equations (2), (4) and (6) into equation (5), I have obtained for the mean excess:

$$e(D) = \frac{\theta\, e^{-D/\theta}}{e^{-D/\theta}} = \theta \qquad (7)$$
On the other hand, an empirical estimator of the mean excess function can be calculated as:

$$\hat{e}(D) = \bar{x}_{>D} - D \qquad (8)$$
i.e. the difference between the average value of the losses exceeding the threshold and the threshold itself. Putting (7) and (8) together, I have obtained a simple estimator of the parameter of the exponential distribution:

$$\hat{\theta} = \bar{x}_{>D} - D \qquad (9)$$
Numerically:

$$\hat{\theta} = 10{,}602.06 - 5{,}000 = 5{,}602.06$$
The calculated value exceeds the true value by some 12%.
After obtaining the value of the severity parameter, I have calculated the value of the adjusted frequency parameter:

$$\hat{\lambda} = \frac{\hat{\lambda}_{>D}}{1 - F(D)} = \frac{\hat{\lambda}_{>D}}{e^{-D/\hat{\theta}}} \qquad (10)$$

where $\hat{\lambda}_{>D}$ is the observed frequency of events above the threshold.

The observed number of events above the threshold is 21.667 (on a yearly basis).
The adjustment coefficient 1-F(D) is equal to exp(-5,000/5,602.06)=0.409619
Therefore, the yearly number of events is equal to 21.667/0.409619=52.89.
This result is also below the true value by some 12%.
Finally, using only the truncated data, I have obtained a VaR99.9% of 497,692. This result is nearly equal to the VaR for the full dataset (the two errors compensate each other).
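A minimal sketch of this two-step adjustment follows, using the figures quoted above; the average recorded loss of 10,602.06 is an assumption implied by the estimated parameter (θ̂ + D) rather than a value read from Table 1, and the seed is arbitrary:

```python
import numpy as np

D = 5_000.0            # recording threshold
lam_obs = 21.667       # observed yearly number of events above the threshold (from the text)
mean_obs = 10_602.06   # average recorded loss per event (assumed here; implied by theta_hat + D)

# Severity: for the exponential distribution the mean excess over D equals theta,
# so the estimator is simply the average exceedance, eq. (9).
theta_hat = mean_obs - D                      # 5,602.06

# Frequency: rescale the observed frequency by the estimated exceedance probability, eq. (10).
p_above = np.exp(-D / theta_hat)              # 1 - F(D) = 0.4096
lam_hat = lam_obs / p_above                   # 52.89

# Re-simulate the annual aggregate loss with the adjusted parameters.
rng = np.random.default_rng(2)                # arbitrary seed
counts = rng.poisson(lam_hat, size=100_000)
losses = np.array([rng.exponential(theta_hat, n).sum() for n in counts])
print(f"theta={theta_hat:,.2f}  lambda={lam_hat:.2f}  "
      f"VaR 99.9% = {np.quantile(losses, 0.999):,.0f}")
```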
Bootstrapping frequency and severity distribution parameters
In order to carry out the bootstrapping analysis, the number of events for each month was taken from Table 1 (column “number of events”), giving 36 observations, each corresponding to a single month.
For the first simulation, I randomly picked one of the 36 monthly counts (with replacement) and repeated this 12 times; summing these numbers gave the number of events for a one-year period. Subsequent simulations were done identically. Altogether 30,000 simulations were executed. The average yearly number of events obtained is 60.23, nearly identical to the number used in the LDA approach (60.33). 90% of the results ranged from 47 to 74.
A similar calculation was done for the truncated data. The average yearly number of events is 21.65 (21.67 for the LDA approach). 90% of the results ranged from 14 to 29.
For the severity parameter, I have obtained the following results:
Average loss: 5,024.16 (95% of the bootstrap results range from 4,358.11 to 5,738.93);
Average loss for truncated data: 10,663.69 (95% of the bootstrap results range from 9,896.61 to 11,949.44).
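A minimal sketch of this parameter bootstrap is shown below. Since Table 1 is not reproduced here, the arrays monthly_counts and losses are stand-ins generated from the assumed distributions rather than the actual observed data:

```python
import numpy as np

rng = np.random.default_rng(3)      # arbitrary seed

# Stand-ins for the Table 1 data (not reproduced here): the 36 monthly event
# counts and the individual observed losses.
monthly_counts = rng.poisson(5, size=36)
losses = rng.exponential(5_000, size=int(monthly_counts.sum()))

n_boot = 30_000
# Frequency: resample 12 monthly counts with replacement and sum them into a yearly count.
yearly_counts = rng.choice(monthly_counts, size=(n_boot, 12), replace=True).sum(axis=1)
# Severity: resample the individual losses with replacement and average each resample.
mean_losses = rng.choice(losses, size=(n_boot, losses.size), replace=True).mean(axis=1)

print("yearly count:", yearly_counts.mean(), np.percentile(yearly_counts, [5, 95]))
print("average loss:", mean_losses.mean(), np.percentile(mean_losses, [2.5, 97.5]))
```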
My conclusion is that bootstrapping the parameters has not brought any benefit as far as VaR is concerned; the parameters’ confidence intervals can be obtained in a different and more efficient way.
Assessing VaR with Bootstrapping
Instead of using bootstrapping to estimate the severity or frequency parameters, I have checked whether bootstrapping can be used to assess VaR directly, without estimating the frequency and severity parameters at all. For these simulations, I have calculated the 36 monthly aggregate losses for both the non-truncated and the truncated data.
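The resampling itself can be sketched as follows; the monthly_losses array is only a stand-in for the 36 observed monthly aggregate losses, which in the report are computed from the simulated dataset in Table 1:

```python
import numpy as np

rng = np.random.default_rng(4)      # arbitrary seed

# Stand-in for the 36 observed monthly aggregate losses.
monthly_losses = np.array([rng.exponential(5_000, rng.poisson(5)).sum() for _ in range(36)])

n_boot = 100_000
# One bootstrapped year = the sum of 12 monthly aggregate losses drawn with replacement.
annual = rng.choice(monthly_losses, size=(n_boot, 12), replace=True).sum(axis=1)
print(f"Bootstrapped VaR 99.9%: {np.quantile(annual, 0.999):,.0f}")
```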
I have executed 100,000 simulations for both data sets. The VaR99.9% for the complete (non-truncated) data is 497,575 and for the truncated data 407,423. While the former result is similar to the one calculated under the LDA framework, the latter clearly underestimates the real amount of risk. To improve the result, I have had to evaluate the number of events below the threshold. As the adjustment coefficient calculated under the LDA is 0.409619, the yearly number of losses below the threshold should be 21.667 × (1 − 0.409619)/0.409619 ≈ 31.22434.
The expected value of a loss below the threshold can be expressed as:

$$E\left[X \cdot 1_{\{X \le D\}}\right] = \int_0^{D} x\, f(x)\, dx \qquad (11)$$
However, since the losses below the threshold are not observed, this value has to be calculated with the help of the limited loss variable:

$$X \wedge D = \min(X, D) = \begin{cases} X, & X < D \\ D, & X \ge D \end{cases} \qquad (12)$$
Its expected value is related to the expected loss below the threshold by the relation:

$$E\left[X \wedge D\right] = E\left[X \cdot 1_{\{X \le D\}}\right] + D\left(1 - F(D)\right) \qquad (13)$$
It is also related to the unconditional average loss by the relation:

$$E\left[X \wedge D\right] = E[X] - E\left[(X-D)_+\right] \qquad (14)$$
Combining expressions (11), (13), (14) and (4), I have obtained:

$$E\left[X \cdot 1_{\{X \le D\}}\right] = \theta - \theta\, e^{-D/\theta} - D\, e^{-D/\theta} = \theta\left(1 - e^{-D/\theta}\right) - D\, e^{-D/\theta} \qquad (15)$$
In this way (using θ = 5,602.06 and D = 5,000), the calculated average loss below the threshold is 1,259.21.
The adjustment for the data below the threshold could then be:
31.22434 × 1,259.21 = 39,317.9, giving a total VaR99.9% of 446,741.
This figure is still much lower than the true value of VaR99.9% (477,512), the value obtained with the LDA on truncated data (497,692), and the value assessed by bootstrapping on complete data (497,575).
The underestimation is due to the fact that I was using expected values for the low-severity events; in the 99.9% quantile scenarios, these events should be rarer than the larger ones. An ad hoc correction is to replace the expected size of the loss with the expected value of the limited loss variable. With this correction, the VaR is 510,696 PLN, a somewhat too conservative result.
A third, much simpler and purely empirical solution is to multiply the number of missing losses by half of the threshold, which gives the best estimate of the three (484,484).
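A minimal sketch of the three corrections, using the figures quoted in the text; the resulting VaR values may differ slightly from those above because of rounding in the intermediate quantities:

```python
import numpy as np

D = 5_000.0
theta_hat = 5_602.06        # severity parameter estimated from the truncated data
var_trunc = 407_423.0       # bootstrapped VaR 99.9% on the truncated data (from the text)
n_below = 31.22434          # estimated yearly number of events below the threshold

p_above = np.exp(-D / theta_hat)                    # 1 - F(D)
e_below = theta_hat * (1 - p_above) - D * p_above   # expected below-threshold loss, eq. (15)
e_limited = theta_hat * (1 - p_above)               # E[min(X, D)] for the exponential

corrections = {
    "expected below-threshold loss": n_below * e_below,    # ~39,300
    "limited loss variable":         n_below * e_limited,  # ~103,300
    "half of the threshold":         n_below * D / 2,      # ~78,100
}
for name, corr in corrections.items():
    print(f"{name:30s} adds {corr:10,.0f} -> VaR {var_trunc + corr:10,.0f}")
```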
I have also checked the VaR estimate under the bootstrapping framework with data truncated below 500 (without any correction). The calculated VaR (30,000 simulations) was 489,198.9. A small correction for the truncation (with any of the above methods) should give a correct estimate.
Usefulness of bootstrapping
If we have complete data, bootstrapping is as useful as the LDA for calculating VaR. If we are not sure of the quality of the parameter estimation under the LDA framework, bootstrapping should be used instead, as it gives correct results.
If we have truncated data, there is a clear bias in the bootstrapping technique. The size of the bias depends on the location of the truncation point. If it is sufficiently low, a correction for the data below the truncation point should give a reasonable estimate. If the truncation point is set high, bootstrapping, even after correction, may give misleading results.

Comments:
“Wow, an interesting blog! It treats operational risk both from an academic point of view, giving details of the underlying probability theory, and from a practitioner’s point of view, presenting the problems they face every day. Is there any book where I can read more about operational risk measurement?”
“By the way, Recommendation M on the blog does not seem to be up to date.”
“I don’t know of any, but as for the truncated-data problem, there is an interesting paper on Google Scholar: www.ams.sunysb.edu/~xizhou/papers/TruncatedData_R2.pdf. The authors use Bayesian estimation to deal with it.”