The language of rules: textual complexity in banking reforms

Zahid Amadxarif, James Brookes, Nicola Garbarino, Rajan Patel and Eryk Walczak

The banking reforms that followed the financial crisis of 2007-08 led to an increase in UK banking regulation from almost 400,000 to over 720,000 words. Did the increase in the length of regulation lead to an increase in complexity?

Tightening rules to reduce risks, such as the risk of a financial crisis, requires taking into account more factors and eventualities, which increases complexity. But more complex regulation requires more effort and information to comprehend, and it can create uncertainty if full comprehension is not achieved: agents might find full comprehension inefficient and settle for partial comprehension instead, preferring simple heuristics.

In a recent working paper, we use natural language processing and network analysis to calculate complexity measures on a novel dataset that covers the near universe of prudential regulation for banks in the United Kingdom before (2007) and after (2017) the reforms. The dataset includes both UK-specific rules and guidance, and EU Regulations and Technical Standards.

We define regulatory complexity as complexity that readers encounter when they process particular texts. This definition focuses on human comprehension of regulatory texts, as opposed to complexity related to the balance sheet or parameters that banks need to estimate.

Network of cross-references

In this blog post, we focus on complexity resulting from the network of cross-references that link individual rules. We use the term “global complexity” to refer to processing difficulties encountered while reading that are likely to be resolved only by accessing information outside the immediate context of the provision, which requires following a (possibly long) chain of cross-references. “Local complexity” instead refers to processing difficulties that are likely to be resolved while the reader is processing the text contained in the provision.

Figure 1 provides a visualisation of the networks for 2007 and 2017. Each point (node) represents a provision, and each edge (line between nodes) is a cross-reference. A provision (node) with more cross-references (edges) is more complex because a reader must leave the provision and visit other provisions/nodes to fully comprehend the provision itself.
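The structure described above can be sketched as a directed graph. The following is a minimal illustration using Python's networkx library; the provision names and cross-references are invented for illustration and are not taken from the actual dataset.

```python
import networkx as nx

# Toy cross-reference network: each provision is a node, each
# cross-reference a directed edge (citing provision -> cited provision).
# All provision names here are hypothetical.
edges = [
    ("Rule 1.1", "Rule 2.3"),
    ("Rule 1.1", "Rule 4.2"),
    ("Rule 2.3", "Rule 4.2"),
    ("Rule 4.2", "Rule 7.1"),
]
G = nx.DiGraph(edges)

# To fully comprehend "Rule 1.1", a reader may need to visit every
# provision reachable from it along the chain of cross-references.
reachable = nx.descendants(G, "Rule 1.1")
print(sorted(reachable))  # ['Rule 2.3', 'Rule 4.2', 'Rule 7.1']
```

A provision with many reachable descendants forces the reader to leave the text repeatedly, which is what the notion of global complexity above is meant to capture.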

In 2007, all rules and guidance were contained in the Handbook of the Financial Services Authority (we exclude conduct rules). For 2017, we also show the legal source for each provision: the EU Capital Requirements Regulation (CRR; yellow nodes), and related Technical Standards (red), the Rulebook (blue) and Supervisory Statements (green) published by the UK regulator, the Prudential Regulation Authority. For visual clarity, only nodes with at least one edge are displayed.

A visual comparison between the two figures highlights how the 2017 network has a denser core, but also a larger periphery of nodes with only one or two edges. CRR provisions are at the centre of the network.

Figure 1: Network visualisation of the analysed banking regulations in 2007 and 2017

Note: More complex provisions are concentrated in the “core”, where they are connected by many cross-references.

We test this further using well-established metrics from network science, namely degree and PageRank. Degree is one of the simplest descriptive statistics of a network: it is the number of edges (or links) attached to a node, i.e. the number of connections from/to that node. PageRank summarises the centrality of a node within a network. Simplifying somewhat, PageRank counts the number and quality of cross-references to a provision to estimate how important the provision is. Degree only captures direct links, while PageRank takes into account the whole web of indirect links that point towards a node.
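Both metrics are straightforward to compute with networkx. A minimal sketch on an invented toy network (the node labels are hypothetical, not real provisions):

```python
import networkx as nx

# Toy network: directed edges run from citing to cited provision.
# Provision "C" is cited by three others and cites nothing itself.
G = nx.DiGraph([
    ("A", "C"), ("B", "C"), ("D", "C"), ("A", "B"),
])

# Degree counts only direct links (incoming plus outgoing edges).
degree = dict(G.degree())
print(degree["C"])  # 3: three incoming cross-references

# PageRank weighs the whole web of direct and indirect references;
# the heavily cross-referenced provision "C" scores highest.
pr = nx.pagerank(G, alpha=0.85)
central = max(pr, key=pr.get)
print(central)  # 'C'
```

In the paper's terms, a node like "C" would sit in the densely connected core of Figure 1, while "D" belongs to the periphery.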

The results for our measures of network structure are summarised in Figure 2. The left-hand side chart shows the distributions for degree. In the lower half of the distribution, the difference in degree between 2007 and 2017 appears small. A wedge appears in the upper half of the distribution. The nodes in the top bin (top ten per cent) have a mean degree of about 30 in 2017, almost three times as high as in 2007.

Similarly, the chart on the right-hand side of Figure 2 plots the 2007 and 2017 distributions for PageRank. The difference is concentrated in the top bin, where mean PageRank is also about three times higher in 2017 compared to 2007 (but the absolute value of PageRank does not have an intuitive interpretation).

Figure 2: Network centrality measures on provision-level by year (2007 and 2017)

Note: To construct the decile plots, we calculate the relevant measure of complexity for each provision. Provisions in each year are then ordered by the relevant measure and then split into ten bins (deciles). We display the mean for each bin. For example, we calculate the length of each provision in our dataset for 2017. We then rank each of the provisions in 2017 in terms of length, create ten decile bins, and calculate the average length within each bin. We repeat the process for 2007. Finally, we plot the average for each decile in 2007 and 2017, and compare the plots.
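The binning procedure in the note can be sketched in a few lines of numpy. The provision "lengths" below are randomly generated stand-ins, not the actual data, and the distributional parameters are invented for illustration.

```python
import numpy as np

def decile_means(values):
    """Order provisions by a complexity measure, split them into ten
    equal-sized bins (deciles), and return the mean within each bin."""
    ordered = np.sort(np.asarray(values))
    bins = np.array_split(ordered, 10)
    return [b.mean() for b in bins]

# Hypothetical provision-level lengths for each year (illustrative only).
rng = np.random.default_rng(0)
lengths_2007 = rng.exponential(scale=50, size=1000)
lengths_2017 = rng.exponential(scale=90, size=1000)

means_2007 = decile_means(lengths_2007)
means_2017 = decile_means(lengths_2017)

# Comparing decile means shows where the two distributions diverge;
# in the paper the gap is concentrated in the top bin.
print(means_2017[-1] > means_2007[-1])
```

Plotting `means_2007` and `means_2017` side by side reproduces the shape of the decile charts in Figure 2.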


These measures indicate that the network of cross-references became more complex between 2007 and 2017. In particular, the comprehension of provisions within a tightly connected “core” requires following long chains of cross-references.

This complexity might be necessary for financial stability and regulations evolved this way due to lessons learned from the financial crisis. However, there are potential costs of these more complex rules. Excessive complexity may be counterproductive for competition if it is harder for small firms to deal with complex rules, and also for financial stability if it increases opacity.

In the future, machine-readable rules could overcome the computational limits of human language processing. Machines can follow long chains of cross-references but are less well suited to assimilating vaguer, context-specific provisions. To the extent that cross-references help define the relevant context, the increase in network complexity could facilitate the introduction of “augmented” regulation.


Text and network data can be obtained using the PRA Rulebook R package.

Zahid Amadxarif, Nicola Garbarino and Rajan Patel work in the Bank’s Prudential Policy Directorate and James Brookes and Eryk Walczak work in the Bank’s Advanced Analytics Division.

If you want to get in touch, please email us or leave a comment below.

Comments will only appear once approved by a moderator, and are only published where a full name is supplied. Bank Underground is a blog for Bank of England staff to share views that challenge – or support – prevailing policy orthodoxies. The views expressed here are those of the authors, and are not necessarily those of the Bank of England, or its policy committees.