It’s a model – but is it looking good? When banks’ internal models may be more style than substance.

Tobias Neumann.

Most large banks assess the capital they need for regulatory purposes using ‘internal models’.  The idea is that banks are in a better position to judge the risks on their own balance sheets.  But there are two fundamental problems that can arise when it comes to modelling.  The first is complexity.  We live in a complex world, but does that mean a complex model is always the best way of dealing with it? Probably not. The second problem is a lack of ‘events’ (eg defaults).  If we cannot observe an event, it is difficult to model it credibly, so internal models may not work well.

Matching capital with risk is indeed important.  But it comes at the cost of giving banks greater discretion, which can be problematic.  Different modelling approaches can lead to highly variable capital outcomes for similar risks.  Discretion might also tempt some to game the system, as was highlighted in the US Senate in the JPMorgan ‘London Whale’ case.  Even short of outright manipulation, there is evidence from academia that internal models can fail to match capital adequately with risk.


Models are a simplification of a complex reality and are always wrong.  But ideally, we want the model’s predictions to be right on average (‘unbiased’).  And when predictions are wrong, they should not scatter far from that average (low ‘variance’).

This can be illustrated using bull’s-eyes in the figure below.  The aim is to hit the centre often.  The left picture shows a model with a lot of bias but little variance: its estimates are close together but far from the centre.  The right picture shows a model with no bias but a lot of variance: on average it is right, but the hits are far apart.

Neither picture is particularly desirable.  If there is a lot of bias in capital estimates, banks are systematically over- or undercapitalised.  And if there is a lot of variance, banks may come to very different conclusions about risk, not because of modelling differences but because of random differences in their samples.  That means some lucky banks may have much lower capital requirements than their peers for the same risk (recall the evidence on excessive variability mentioned at the start of this post).


The point to note is that it’s not just bias that makes a model perform badly but also variance.  This is often overlooked.  We tend to think ‘how wrong is the model’ rather than ‘how variable is its output’; but this bias towards ‘bias’ omits a key aspect of model performance.
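The claim that variance matters as much as bias follows from a standard identity.  In the notation below (ours, not the post’s), \hat{f}(x) is a model’s prediction at x, f(x) is the true value, and expectations are taken over the random samples the model could have been calibrated on:

```latex
\mathbb{E}\big[(\hat{f}(x) - f(x))^2\big]
  = \underbrace{\big(\mathbb{E}[\hat{f}(x)] - f(x)\big)^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}\big[(\hat{f}(x) - \mathbb{E}[\hat{f}(x)])^2\big]}_{\text{variance}}
```

A mean squared prediction error is measured against noisy realised outcomes rather than f(x).  That adds an irreducible noise term \sigma^2 on top of the identity.  Because the noise term is the same for every model, it does not change which model comes out best.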

In fact, there is generally a direct trade-off between bias and variance.  A very simple model may not capture reality very well and may have a high bias.  But because it is simple, it is less affected by the vagaries of random sampling, so its variance is lower.  Compare that to a highly complex model.  It may be correct on average but is much more susceptible to mistaking random error for genuine signal (it ‘overfits’ the data).

To illustrate this, consider a single share of a company.  Its price goes up and down.  Usually volatility today will be related to volatility yesterday: higher risk yesterday tends to mean higher risk today.  Let’s say that in this particular case, risk today is determined by the volatility over the past three days.  A bank can model this risk, and thereby determine its capital requirements, using models of varying complexity.  The simplest model assumes that risk is correlated only with yesterday’s risk.  The more complex models assume that risk is correlated with up to the last five days.

We can measure how well the bank’s model is doing by looking at how wrong its predictions are on average (using the so-called mean squared prediction error).  Chart 1 shows the results for models calibrated on 10 years of simulated data.  That, by the way, is an extremely long time series to use for this kind of model.  The thing to note is that the simplest model, using only one past day, performs best: it has the lowest overall error.  It even outperforms the ‘true’ model (!).

I guess that result would surprise most people.  How can a wrong model outperform the ‘right’ model?  The answer lies in variance.  Chart 1 decomposes the overall error into the components due to bias (red) and variance (blue).  As expected, the models that assume fewer than three days of correlation have a greater bias because they are an incorrect description of the real world.  But as the models incorporate more days and become more complex, their performance becomes more variable: they are so flexible that they pick up noise and mistake it for signal.

Chart 1: Sources of modelling error for simulated stock prices (10 years of data)

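The post does not spell out its simulation design, so the sketch below stands in for Chart 1 rather than reproducing it.  It makes several assumptions.  It replaces the volatility process with a simple AR(3) series whose coefficients (the hypothetical TRUE_COEFS) make today’s value depend on the last three days.  On each of many fresh samples it fits AR(1) to AR(5) models by ordinary least squares.  It then splits each model’s error against the true conditional mean into squared bias and variance, exactly as in the identity above.  For simplicity, the test histories are drawn as independent standard normals.  All function names are illustrative.

```python
import random

TRUE_COEFS = (0.3, 0.2, 0.1)  # hypothetical: today's value depends on the last three days


def simulate(n_obs, rng, burn_in=100):
    """Simulate an AR(3) series; the burn-in lets it forget its zero starting values."""
    y = [0.0] * len(TRUE_COEFS)
    for _ in range(n_obs + burn_in):
        mean = sum(a * y[-k - 1] for k, a in enumerate(TRUE_COEFS))
        y.append(mean + rng.gauss(0.0, 1.0))
    return y[-n_obs:]


def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def fit_ar(y, p):
    """OLS estimate of an AR(p) model (no intercept) via the normal equations."""
    xtx = [[0.0] * p for _ in range(p)]
    xty = [0.0] * p
    for t in range(p, len(y)):
        lags = [y[t - k - 1] for k in range(p)]
        for i in range(p):
            xty[i] += lags[i] * y[t]
            for j in range(p):
                xtx[i][j] += lags[i] * lags[j]
    return solve(xtx, xty)


def bias_variance(n_obs=250, max_lags=5, n_reps=100, n_test=50, seed=1):
    """Average squared bias, variance and MSE of one-step predictions for AR(1)..AR(max_lags)."""
    rng = random.Random(seed)
    # Fixed test histories (most recent value first), scored against the true conditional mean.
    tests = [[rng.gauss(0.0, 1.0) for _ in range(max_lags)] for _ in range(n_test)]
    truth = [sum(a * h[k] for k, a in enumerate(TRUE_COEFS)) for h in tests]
    preds = {p: [] for p in range(1, max_lags + 1)}
    for _ in range(n_reps):
        y = simulate(n_obs, rng)  # a fresh random sample in each replication
        for p, runs in preds.items():
            beta = fit_ar(y, p)
            runs.append([sum(b * h[k] for k, b in enumerate(beta)) for h in tests])
    results = {}
    for p, runs in preds.items():
        bias2 = var = mse = 0.0
        for i, target in enumerate(truth):
            values = [run[i] for run in runs]
            mean = sum(values) / n_reps
            bias2 += (mean - target) ** 2
            var += sum((v - mean) ** 2 for v in values) / n_reps
            mse += sum((v - target) ** 2 for v in values) / n_reps
        results[p] = (bias2 / n_test, var / n_test, mse / n_test)
    return results


if __name__ == "__main__":
    for p, (bias2, var, mse) in bias_variance().items():
        print(f"{p} lag(s): bias^2={bias2:.4f} variance={var:.4f} MSE={mse:.4f}")
```

Whether the one-lag model wins here depends on the assumed coefficients, the noise level and n_obs.  Setting n_obs to about 2,500 gives roughly ten years of trading days.  The sketch is meant to show the mechanics of the decomposition, not to recreate Chart 1’s numbers.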

This is one reason to be wary of too much complexity in banking regulation.  Whether a complex or a simple model is better depends on the specific situation, but we should not assume that adding complexity makes the model better – even if it makes it ‘truer’ in the sense of resembling reality more closely.

The lesson is that when developing internal models, practitioners and regulators should be alive to the bias-variance trade-off.  We should critically examine, and resist where necessary, the pressure that comes from the well-intended but at times misguided quest for ever greater model refinement.  Sometimes true sophistication lies not in intelligent addition but in clever omission.

Lack of observations

Excessive complexity is not the only potential problem for internal models.  Sometimes the event that is being modelled occurs very infrequently.  Highly rated corporates are a good example.  There has never been a default of a company rated AAA by S&P (though AAA-rated companies have, of course, been downgraded and defaulted later).  Only about 0.02% of companies rated AA by S&P default each year on average.  That is very little information to go on when building a sufficiently precise model.

The reason is that we need very high levels of precision at these kinds of default rates.  For the 0.02% default rate observed for AA-rated corporates, even a margin of error of only 0.01pp (small in many other practical applications) would mean an estimated range of 0.01% to 0.03%.  One bank might then think the corporate is three times as risky as another bank does.  That’s a lot of variability.
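To put the precision problem in numbers, the sketch below asks how many observations a simple default-rate estimate would need to achieve the 0.01pp margin of error above.  It assumes independent defaults, a 95% confidence level and the normal approximation to the binomial.  None of these assumptions come from the post, and the approximation is itself rough at such low default rates.  The function required_sample_size is a hypothetical helper.

```python
import math
from statistics import NormalDist


def required_sample_size(pd, margin, confidence=0.95):
    """Observations needed so a sample default rate has the given margin of error.

    Assumes independent defaults and the normal approximation to the binomial.
    """
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    return math.ceil(z ** 2 * pd * (1 - pd) / margin ** 2)


if __name__ == "__main__":
    # 0.02% default rate, +/- 0.01 percentage points
    print(required_sample_size(0.0002, 0.0001))
```

For a 0.02% default rate and a 0.01pp margin, this comes out at roughly 77,000 firm-year observations.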

To be fair, this example is a bit extreme.  In practice, regulators have already cottoned on to this: internal models’ PD estimates for corporates are floored at 0.03%.  And, as has been suggested elsewhere, floors can be a potent tool for improving the performance of internal models.

That said, the general point holds: the more data we have, the more we can trust an unbiased model to be close to the true probability of default.  To illustrate this, assume we have 100 banks.  They all apply the same model (a simple average), each to its own random sample of defaulted and non-defaulted companies.  The true probability of default is 0.1%.  Chart 2 shows the ranges in which the estimates will fall depending on the available sample size (the idea here is identical to that of confidence intervals in statistics).
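The thought experiment can be sketched as a small simulation.  Each of 100 hypothetical banks draws its own random sample, and each bank estimates the PD as the sample default rate.  The spread in estimates therefore comes purely from sampling.  Apart from the 10,000 mentioned in the text, the sample sizes used in Chart 2 are not given, so the ones below are assumptions.  So is the helper name pd_estimate_ranges.

```python
import random


def pd_estimate_ranges(true_pd=0.001, sample_sizes=(100, 1_000, 10_000), n_banks=100, seed=42):
    """Lowest and highest PD estimates across banks that each average their own random sample.

    Every bank uses the same unbiased model (the sample default rate), so any
    spread in estimates comes purely from sampling variability.
    """
    rng = random.Random(seed)
    ranges = {}
    for n in sample_sizes:
        estimates = []
        for _ in range(n_banks):
            defaults = sum(rng.random() < true_pd for _ in range(n))
            estimates.append(defaults / n)
        ranges[n] = (min(estimates), max(estimates))
    return ranges


if __name__ == "__main__":
    for n, (low, high) in pd_estimate_ranges().items():
        print(f"n={n:>6}: estimates range from {low:.3%} to {high:.3%}")
```

With only 100 observations, most banks see no default at all, and the few that see one estimate a PD ten times the true value.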

The smaller the sample, the bigger the variability of capital outcomes.  This is driven entirely by random sampling variability, not by modelling differences.  Variability decreases at an ever slower rate as the sample size increases, so even very large samples will be subject to some variability.  The variability in PD estimates does not translate one-for-one into variability of capital requirements, but the effect remains substantial.  At a sample size of 10,000, it translates into some banks holding more than twice as much capital as others for the same risk.  That is not a consistent or fair standard.  Imagine a grocer charging you twice as much as the one down the road for the same goods.
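The post does not say which capital function sits behind its capital comparison.  The sketch below assumes the Basel IRB risk-weight formula for corporate exposures, with no SME adjustment, an assumed 45% LGD and a 2.5-year maturity, and applies the 0.03% corporate PD floor.  The two PD estimates, 0.04% and 0.16%, are hypothetical values chosen to sit either side of a 0.1% true PD.  The function name irb_corporate_capital is illustrative.

```python
import math
from statistics import NormalDist

N = NormalDist()  # standard normal: cdf and inv_cdf
PD_FLOOR = 0.0003  # 0.03% corporate PD floor


def irb_corporate_capital(pd, lgd=0.45, maturity=2.5):
    """Capital requirement K per unit of exposure under the Basel IRB corporate formula."""
    pd = max(pd, PD_FLOOR)
    # Asset correlation falls from 0.24 towards 0.12 as PD rises.
    weight = (1 - math.exp(-50 * pd)) / (1 - math.exp(-50))
    r = 0.12 * weight + 0.24 * (1 - weight)
    # PD conditional on a 1-in-1000 systematic downturn.
    conditional_pd = N.cdf((N.inv_cdf(pd) + math.sqrt(r) * N.inv_cdf(0.999)) / math.sqrt(1 - r))
    # Maturity adjustment.
    b = (0.11852 - 0.05478 * math.log(pd)) ** 2
    maturity_adj = (1 + (maturity - 2.5) * b) / (1 - 1.5 * b)
    return lgd * (conditional_pd - pd) * maturity_adj


if __name__ == "__main__":
    low, high = irb_corporate_capital(0.0004), irb_corporate_capital(0.0016)
    print(f"K at 0.04% PD: {low:.4f}; K at 0.16% PD: {high:.4f}; ratio {high / low:.2f}")
```

With these inputs, a fourfold difference in PD becomes a capital ratio of roughly 2.3.  Capital rises less than proportionally with PD, partly because the maturity adjustment shrinks as PD rises.  That is one reason the translation from PD to capital is not one-for-one.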

Chart 2: Ranges of probability of default (PD) estimates by sample size


Chart 3: Variability of banks’ probability of default (PD) estimates in 2011 (most conservative / least conservative bank)


Some asset classes yield more data than others and may not be problematic.  For example, the UK Cards Association estimates that there are 30 million credit card holders in the UK.  That should inspire more confidence than the mere 193 members of the United Nations (which is an upper bound on the number of countries able to issue sovereign debt).  In fact, not only are there vastly more credit card holders in the UK alone than sovereigns in the world, but credit card holders also default more often.  All else equal, this should produce considerably more reliable results for credit cards than for sovereigns.
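Why both the number of obligors and the default rate matter can be made explicit with the relative standard error of a sample default rate.  Under an assumption of independent defaults, this equals sqrt((1 - pd) / (n * pd)).  It therefore shrinks with the expected number of defaults, n * pd.  The default rates below (2% for cards, 0.5% for sovereigns) are made up for illustration and are not from the post.

```python
import math


def relative_standard_error(n, pd):
    """Standard error of a sample default rate, as a fraction of the default rate itself.

    Assumes n independent obligors, each defaulting with probability pd.
    """
    return math.sqrt(pd * (1 - pd) / n) / pd


if __name__ == "__main__":
    # Hypothetical default rates; only the population sizes come from the text.
    print(f"credit cards: {relative_standard_error(30_000_000, 0.02):.2%}")
    print(f"sovereigns:   {relative_standard_error(193, 0.005):.2%}")
```

With one year of data each and these made-up default rates, the card estimate’s standard error is around 0.1% of its level, against roughly 100% for sovereigns.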

As mentioned at the beginning of this post, the point about variability isn’t purely academic.  We have observed considerable variability in capital requirements (see Chart 3 for the results from the November 2012 Financial Stability Report, or a related Basel study).  At least part of the variability might be due to fundamental data problems.  The lesson might simply be that if there isn’t enough data, internal models are not appropriate.

Tobias Neumann works in the Bank’s Policy Strategy and Implementation Division.

Bank Underground is a blog for Bank of England staff to share views that challenge – or support – prevailing policy orthodoxies. The views expressed here are those of the authors, and are not necessarily those of the Bank of England, or its policy committees.

If you want to get in touch, please email us at