Opening the machine learning black box

Andreas Joseph

Machine learning models are at the forefront of current advances in artificial intelligence (AI) and automation. However, they are routinely, and rightly, criticised for being black boxes. In this post, I present a novel approach to evaluate machine learning models similarly to a linear regression – one of the most transparent and widely used modelling techniques. The framework rests on an analogy between game theory and statistical models. A machine learning model is rewritten as a regression model using its Shapley values, a payoff concept for cooperative games. The model output can then be conveniently communicated, eg using a standard regression table. This strengthens the case for the use of machine learning to inform decisions where accuracy and transparency are crucial.

Why do we need interpretable models?

Statistical models are often used to inform decisions, eg a bank deciding if it grants a mortgage to a customer. Let Alice be that customer. The bank could check her income and previous loan history to estimate how likely Alice is to repay a mortgage and set the terms for the loan. A standard approach for this type of analysis is to use a logistic regression. This returns a ‘probability of default’, ie the chance that Alice won’t be able to repay the loan, spelling trouble for the bank and herself.

A logistic regression is a very transparent model. It attributes well-defined risk weights to each of its inputs. The transparency comes at a cost though: logistic regressions assume a particular relationship between the explanatory factors. This may not hold, in which case the model’s predictions may be misleading. Machine learning models are more flexible and therefore capable of detecting finer nuances, provided there is enough data to ‘train’ a model. This is also the reason for their current success and growing popularity in applications ranging from personalised movie recommendations and credit scoring to medicine.
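To make the transparency point concrete, here is a minimal sketch of how a fitted logistic regression turns inputs into a probability of default. The risk weights below are made up for illustration; a real model would estimate them from data.

```python
import math

# Hypothetical risk weights for illustration only -- not from any real model.
# A fitted logistic regression attributes one well-defined weight to each input.
INTERCEPT = -2.0          # baseline log-odds of default
W_INCOME = -0.8           # higher (standardised) income lowers default risk
W_MISSED_PAYMENTS = 1.1   # past missed payments raise default risk

def default_probability(income_std: float, missed_payments: float) -> float:
    """Probability of default from a logistic regression: sigmoid of a weighted sum."""
    log_odds = INTERCEPT + W_INCOME * income_std + W_MISSED_PAYMENTS * missed_payments
    return 1.0 / (1.0 + math.exp(-log_odds))

# Alice: income one standard deviation above average, no missed payments.
p_alice = default_probability(income_std=1.0, missed_payments=0.0)
print(f"Alice's estimated probability of default: {p_alice:.3f}")
```

Each weight can be read directly as the effect of its input on the log-odds of default, which is exactly the kind of statement that is hard to make for an opaque model.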

However, the cost of this flexibility is opaqueness, which gives rise to the black box critique of machine learning models. It is often not clear which inputs are driving the predictions of the model. Furthermore, a well-grounded statistical analysis of the model is generally not possible. This can lead to ethical and legal challenges if models are used to inform decisions affecting the lives of individuals. This is particularly true for models from deep learning which are driving many AI developments.

There are also concerns regarding the interpretability of machine learning models that are more specific to policy makers, for example at the Bank of England’s decision-making committees. First, when using machine learning alongside more traditional approaches, it is important to understand where they differ. Second, a challenge in decision making is trying to understand how relationships between variables might change in the light of policy actions. In both cases, interpretable and transparent models are likely to be very helpful.

Opening the black box

It would therefore be of great use to bring machine learning onto the same playing field as currently used models. This would promote transparency and likely speed up model development, while helping to avoid harmful bias. Such an approach is laid out here. The idea is to separate statistical inference into two steps. First, the contribution each variable makes to a model is measured. Second, these contributions are taken as the input to a standard regression analysis.

As for the first step, previous work established an analogy between cooperative game theory and statistical models. In the former, players work together to generate a payoff. In the latter, variables jointly determine the predictions of a model. Given that the skills of a player can complement or substitute those of another player, it is not clear how to assign payoffs between players. The same situation applies to variables which can be correlated or interact with each other in unknown ways.

Shapley values are an elegant solution from game theory to this problem. They were introduced by the Nobel Prize-winning economist and mathematician Lloyd Shapley in 1953. The basic intuition is that the Shapley value of a player in a cooperative task, or game, is the surplus this player generates given all the other players in the game.

As an illustrative example, imagine three siblings, one strong, one tall and one intelligent, who want to nick apples from the overhanging branches of a neighbour’s tree. The apples hang too high for any one of them to reach alone, but two working together can get some, while all three joining forces would get even more. This is a simple cooperative ‘game’ where the payoff is the amount of apples the kids make off with.

The Shapley value of each sibling is the difference between the amount of apples all three get working together and what any combination of the other two can get, either together or individually, with the latter being none, since no sibling can reach any apples on their own. How do we get back to a statistical model? We make the following correspondence. We replace the neighbour’s apples with a value to calculate, eg Alice’s probability of loan default, and the kids with variables in a model, eg her employment status in the logistic regression above or in an artificial neural network (NN) from machine learning. We then perform a similar calculation, ie compare model predictions when a variable is included in and excluded from the model. The Shapley value of a variable then measures the amount of model output attributable to that variable.
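The apple game can be computed directly. The sketch below enumerates all orderings in which the siblings could join forces and averages each one’s marginal contribution; the payoff numbers (any pair gets 4 apples, all three get 9) are made up for illustration.

```python
from itertools import permutations

# Characteristic function of the apple 'game': payoff (apples) for each coalition.
# Illustrative numbers: no sibling succeeds alone, any pair gets 4 apples,
# and all three together get 9.
def v(coalition: frozenset) -> int:
    if len(coalition) <= 1:
        return 0
    if len(coalition) == 2:
        return 4
    return 9

players = ["strong", "tall", "intelligent"]

# Shapley value: a player's marginal contribution to the coalition formed so far,
# averaged over all orderings in which the coalition could have been built.
def shapley_values(players, v):
    phi = {p: 0.0 for p in players}
    orderings = list(permutations(players))
    for order in orderings:
        coalition = frozenset()
        for p in order:
            phi[p] += v(coalition | {p}) - v(coalition)
            coalition = coalition | {p}
    return {p: phi[p] / len(orderings) for p in phi}

phi = shapley_values(players, v)
print(phi)                # symmetric game: each sibling gets 3.0
print(sum(phi.values()))  # Shapley values sum to the full payoff: 9.0
```

Because the game is symmetric, each sibling is credited with exactly a third of the nine apples; with asymmetric payoffs the same code would split the total unevenly.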

The sum of all Shapley values gives us the model prediction by construction. This is a linear relationship. Hence, we can now reformulate a model as a linear regression over its variables’ Shapley values. This, in turn, enables us to test hypotheses on the model. For example, we can test whether a variable’s contribution differs significantly from zero on average – a standard test in regression analysis.
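The two-step idea can be sketched on a toy model where Shapley values are known in closed form: for a linear model with independent inputs, the Shapley value of feature k at point x is coef_k * (x_k - mean(x_k)). The data, the model and all numbers below are illustrative stand-ins, not the post’s NN.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
X = rng.normal(size=(n, 2))
y = 1.5 * X[:, 0] - 0.5 * X[:, 1] + rng.normal(scale=0.1, size=n)

# Step 0: fit the underlying model (plain OLS standing in for any learner).
coefs, *_ = np.linalg.lstsq(np.column_stack([np.ones(n), X]), y, rcond=None)
b = coefs[1:]

# Step 1: each variable's Shapley contribution to each prediction
# (closed form for a linear model with independent inputs).
phi = b * (X - X.mean(axis=0))           # shape (n, 2)

# Step 2: regress the outcome on the Shapley values and test the coefficients.
Z = np.column_stack([np.ones(n), phi])
beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
resid = y - Z @ beta
sigma2 = resid @ resid / (n - Z.shape[1])
se = np.sqrt(sigma2 * np.diag(np.linalg.inv(Z.T @ Z)))
t_stats = beta / se
print(beta[1:])    # Shapley regression coefficients: close to 1 for informative inputs
print(t_stats[1:]) # large |t| => contribution significantly different from zero
```

A coefficient near one says the Shapley values faithfully attribute that variable’s contribution, and the t-statistic gives the familiar significance test described above.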

A simple application in a central banking context

Which dependencies exist between macroeconomic variables, and which variables best predict future outcomes, are questions relevant to central banks like the Bank of England. As an example, we built a NN to predict changes in unemployment one year ahead using other macroeconomic and financial variables. One of these variables is past growth in private sector debt. We can now use the model’s Shapley values to extract the contribution of changes in debt to changes in unemployment within the NN.

Figure 1: Input-output relation for private sector debt within an artificial neural network (NN) for modelling changes in UK unemployment. The red and green lines represent best-fit regression lines for negative and positive input values, respectively.

Figure 1 shows the relation between the two learned by the NN (blue squares). Input values (changes in debt) are shown on the horizontal axis (standardised to a mean of zero and standard deviation of one). The Shapley values of this variable in the model are shown on the vertical axis. These are the values the model adds to its prediction of change in unemployment coming from changes in debt.

This relationship is clearly non-linear, as shown by the kink at zero, the mean. When growth of private debt is below average, it has a positive relationship with future unemployment (red line), but the opposite holds for above-average debt growth (green dashed line). This non-linearity is not complex, but it is important and would be hard to specify without prior knowledge. The NN is able to learn it directly from the data, while the Shapley regression framework can be used to examine this relation statistically. Table 1 shows the output from such an analysis in the form of a standardised regression table, comparing results for the UK and the US for a subset of the NN’s input variables, using data from either country.

Table 1: Shapley share coefficients for a selection of input features for a NN modelling changes in UK and US unemployment. The standard error of coefficients and the p-value of the corresponding Shapley regression are shown in parentheses. Significance level: *: 10%, **: 5%, ***: 1%.

The machine learning models deliver a better fit to the data, as shown by the ratios of root mean squared errors (RMSE) between the NN and a linear regression model with the same inputs. This difference is sizeable for the UK (about 11%) and even larger for the US (about 36%), suggesting that non-linearities are more important in the US case for this task.

The table shows so-called Shapley share coefficients (SSC), which measure the size and significance of each variable’s contribution to explaining changes in unemployment as a fraction of total model output. That is, the absolute values of the SSCs of all variables (including others not shown in Table 1 for brevity) sum to one by construction. The SSC for changes in private sector debt in the UK is +0.084***. This means that changes in private sector debt are, on average, positively associated with changes in unemployment, that they explain about 8.4% of model output, and that this relation is strongly significant. Note that this is the average effect across the positive and negative input values in Figure 1, which differ on both sides. When interpreting non-linear models, it is important to consider the possibility of such complex variable relationships. Machine learning models and the Shapley regression framework provide a way to do just that.
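The SSC construction can be sketched from a matrix of Shapley values, one per variable per observation: take each variable’s share of the total absolute attributed output, then attach a sign for the direction of its association. The exact signing rule below, the sign of the correlation between a variable and its Shapley values, is an assumption for illustration, as are the toy attributions.

```python
import numpy as np

def shapley_share_coefficients(X: np.ndarray, phi: np.ndarray) -> np.ndarray:
    """X: (n, k) inputs; phi: (n, k) Shapley values, one per input per observation."""
    share = np.abs(phi).mean(axis=0)
    share = share / share.sum()   # absolute shares sum to one by construction
    # Sign by the direction of association (an illustrative choice of rule).
    signs = np.sign([np.corrcoef(X[:, j], phi[:, j])[0, 1] for j in range(X.shape[1])])
    return signs * share

# Toy check with made-up attributions for three variables.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))
phi = X * np.array([0.5, -0.2, 0.1])   # pretend attributions, linear for simplicity
ssc = shapley_share_coefficients(X, phi)
print(ssc)                 # signs follow the made-up weights
print(np.abs(ssc).sum())   # absolute shares sum to one
```

Reading the output like Table 1: a value of +0.08 would mean the variable explains about 8% of model output with a positive average association.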


We have seen that machine learning models can be evaluated and communicated very similarly to linear regression models using the Shapley regression framework. This is likely to help decision makers to make the most of the advantages of these models. On a broader level this may also help to accelerate advances in AI research. Particularly in the presence of ever larger and richer datasets (Big Data), this approach can help to make state-of-the-art models more transparent and reduce or even avoid biases.

Andreas Joseph works in the Bank’s Advanced Analytics Division.

If you want to get in touch, please email us or leave a comment below.

Comments will only appear once approved by a moderator, and are only published where a full name is supplied. Bank Underground is a blog for Bank of England staff to share views that challenge – or support – prevailing policy orthodoxies. The views expressed here are those of the authors, and are not necessarily those of the Bank of England, or its policy committees.

4 thoughts on “Opening the machine learning black box”

  1. Indeed, Shapley values are a valuable tool for the analysis and understanding of models. But it is important to understand and clearly state their limitations. Due to those limitations, I am not convinced they are “.. an elegant solution from game theory to this problem”.

    Actually you state the main issue with Shapley values completely correctly: one needs to “compare model predictions for including and excluding a variable from a model.” The problem is that, for models with non-additive interactions, it is by no means clear what that actually means.

    The easiest example is a simple interaction, i.e. a model Y = X_1 * X_2. What is the model without the independent variable X_1? Intuitively you might say it is Y = X_2, i.e. you take your original model and replace X_1 by one. But what do you do if X_1 < 1? You have to pick any other arbitrary number. In non-trivial cases your contribution will depend on this arbitrary choice … a lot.

    One way people have tried to avoid arbitrariness, is by setting X_1 to values related to statistical properties. For example the expectation E[X_1] of X_1. In this case the two explanatory components are: E[X_1]*X_2 and X_1 * E[X_2]. But this is no general solution of the problem. Notice that if X_1 and X_2 are centred both components are equal to zero and would contribute nothing.

    The root of the problem is the breakdown of the marginal principle for non-additive models. In additive models you have a clear and unequivocal definition of the components making up the coalition. This is the domain where the Shapley value is indeed a powerful tool. In all other cases, the contributions will depend on a decomposition, which is in general arbitrary.

    A more detailed description of these issues in the context of capital allocation on risk factors is in this presentation for the Swiss Society of Actuaries:


  2. Dear Guido,

    Thanks a lot for the detailed comments and please excuse the delayed reply.

    These are important issues you raise. When calculating Shapley values for models, one generally “integrates out” the contribution from a certain variable, i.e. averages contributions over a representative piece of the data. The crucial assumption here is that variables are (roughly) independent of each other, which can be a problem depending on the application. Regarding non-additive models, e.g. the interactions you mention, the Shapley values also recover the exact functional form, as demonstrated in the numerical exercise in the paper. Of course, you have to know this form, but the Shapley value/regression framework allows you to do just that. One may also consider other decompositions of a model to address these issues, but I am not aware of another way which would unite so many desirable properties.

    Thanks again for your comments.

    PS: The link you posted seems not to be accessible from the three different networks I tried.

  3. Dear Per,

    Thanks a lot for the detailed comments and please excuse the delayed reply.

    The two concepts you mention, fear and danger, are fairly abstract, and measuring them in a model of any sort will always involve a degree of subjectivity. If you have proxies for them, like the risk weights and ratings you mention, you could extract and test their contributions in a machine learning model, which you could then communicate to regulators using the proposed framework. Of course, what they make of these results is a very different question and probably beyond the technical discussion of a statistical modelling tool.

    Many thanks again for your comment.
