Izzy Whizzy let’s get Vizzy – The magic of using visualisation to analyse and understand data.

Lyndsey Pereira-Brereton.

Like Sooty, the BBC’s yellow bear loved by generations of British children, central banks should wave the ‘magic wand’ of data visualisation over their large, granular or complex data sets to gain further insight into the patterns and relationships within them. This blog draws on some examples to show how different visualisation techniques not only help communicate data but, more importantly, aid data exploration, analysis and understanding.

Visualisation is a powerful technique because our brains are naturally wired to process information visually. One of the best examples of this power is Anscombe’s Quartet, which comprises four data sets with (near-)identical summary statistics: the same means and variances of x and y, the same correlation between x and y, and the same linear regression line. Yet as soon as they are graphed, we immediately see very different patterns, and therefore different interpretations, of the data:

Anscombe quartet table

Anscombe quartet graphs
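To see for yourself how little the summary statistics reveal, the short Python sketch below recomputes them for Anscombe’s 1973 data (the standard published values). The helper name `summarise` is our own; it uses sample (n − 1) variances and an ordinary least-squares fit, and all four sets print almost the same numbers even though their charts look nothing alike.

```python
import math

# Anscombe's quartet: sets I-III share the same x values; set IV has its own.
X_123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
QUARTET = {
    "I": (X_123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X_123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X_123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def summarise(x, y):
    """Means, sample variances, Pearson correlation and OLS line y = intercept + slope * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return {
        "mean_x": mx, "var_x": sxx / (n - 1),
        "mean_y": my, "var_y": syy / (n - 1),
        "corr": sxy / math.sqrt(sxx * syy),
        "slope": slope, "intercept": my - slope * mx,
    }


if __name__ == "__main__":
    for name, (x, y) in QUARTET.items():
        stats = summarise(x, y)
        print(f"{name:>3}: " + ", ".join(f"{k}={v:.2f}" for k, v in stats.items()))
```

Each set comes out at a mean x of 9, variance of x of 11, mean y of about 7.50, correlation of about 0.82 and a fitted line of roughly y = 3 + 0.5x, which is exactly why the graphs are needed.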

Here are some real-life examples of techniques that central banks could use to help explain and explore data:

1. Cath Sleeman’s winning entry from the Bank’s first data visualisation competition, ‘Recessions and Recoveries’, is a great example of how interactivity empowers the user to explore the data themselves. It also shows just how much information can be contained within one dynamic visualisation, without confusing the user.

You can explore it here.

Recessions & Recoveries entry

This visualisation contains a multi-dimensional dataset: three economic variables (inflation, productivity and GDP) and two variables constructed from time (the depth and duration of each recession), across seven countries and over 155 years, around 800 data points in total.

There are 8 sorting options, so you would need at least 8 standard static bar or line charts to show the same arrangements of the data. Even then, comparing them all would be very hard and time-consuming. The pop-up detail on each time point is also a great help in locating and understanding individual data points, especially when looking at such a long run of historical data.

Ryland Thomas, a senior economist at the Bank, told me that some people had noted similarities between the recent recession and the one following the crisis of 1907-08, at least in US data (see paper here). But what he found impressive was that Cath’s visualisation made it possible to “immediately” see that this was also true for the UK. When you sort the graphs in the bottom panel, you can see that this period was one of relatively slow productivity growth and low inflation, similar to what we have observed recently.

The charts in this entry look very similar to the sorts of charts we normally use, but being able to sort several variables simultaneously makes it much easier to ‘see’ similarities or differences between them. This interactive sorting technique could be widely applied to datasets held by central banks and supervisory authorities, for example to compare individual banks and insurers.
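The sorting idea can be sketched in a few lines of Python. This is only an illustration under assumptions of our own, not the entry’s code: each episode (a recession, or equally a bank or insurer) is a record of named variables, and sorting by one variable reorders every panel in the same way, so a given episode sits in the same position in each chart. The function name `sort_panels` and the toy figures are hypothetical.

```python
def sort_panels(records, key, descending=True):
    """Order every panel by one chosen variable.

    records: dict of episode -> dict of variable -> value (hypothetical layout;
    every episode must carry the same variables).
    Returns the episode order implied by `key` and, for each variable, its
    values rearranged into that same order (one list per chart panel).
    """
    order = sorted(records, key=lambda ep: records[ep][key], reverse=descending)
    variables = next(iter(records.values())).keys()
    panels = {var: [records[ep][var] for ep in order] for var in variables}
    return order, panels


if __name__ == "__main__":
    # Toy, made-up values purely to show the mechanics.
    toy = {
        "A": {"depth": 2.0, "duration": 5, "inflation": 1.0},
        "B": {"depth": 6.0, "duration": 3, "inflation": 4.0},
        "C": {"depth": 4.0, "duration": 8, "inflation": 2.5},
    }
    for sort_key in ("depth", "duration"):
        order, panels = sort_panels(toy, sort_key)
        print(sort_key, order, panels["inflation"])
```

Switching `key` plays the role of clicking one of the entry’s sort options: one call realigns all panels at once, instead of redrawing a separate static chart for each ordering.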

2. A research paper by Jo Wood et al. uses a visual analytics approach to ‘Detecting name bias in alphabetically ordered ballot papers’. It looks at whether voters are biased towards candidates whose names come earlier in the alphabet (and who therefore appear nearer the top of a ballot paper). The paper documents well the process of producing a complex visualisation that analyses 15 variables simultaneously. The full paper is here, and the main visualisation is explained and copied below.

Each large square represents a London borough, positioned to approximate its geographic location (northern boroughs towards the top, inner boroughs in the centre, etc.). Each square is divided into smaller rectangles showing, at ward level, the candidates standing for the Conservatives, Labour or the Liberal Democrats, with party symbolised by colour. The lighter the colour, the fewer votes that candidate received relative to the others in the same party (this allowed name-ordering effects to be explored even in areas with strong party preferences). Candidates are ordered by their position on the ballot paper, again within their party, so that the candidate who is alphabetically first within their party appears in the top row, the second in the middle row and the third in the bottom row.
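The layout for a single ward can be sketched as below. This is our own reconstruction of the encoding described above, not the authors’ code: the input format, the function name `ward_cells` and the shade rule (votes divided by the party’s best-performing candidate, so 1.0 is darkest) are all assumptions, and the paper’s actual colour scale may differ.

```python
from collections import defaultdict


def ward_cells(candidates):
    """Lay out one ward's cells.

    candidates: list of (surname, party, votes) tuples (hypothetical format).
    Returns party -> list of (row, surname, shade), where row 0 is the
    alphabetically first candidate within the party (top row) and shade is
    votes relative to the party's top vote-getter (1.0 = darkest cell).
    """
    by_party = defaultdict(list)
    for surname, party, votes in candidates:
        by_party[party].append((surname, votes))

    cells = {}
    for party, members in by_party.items():
        # Ballot papers list names alphabetically, so sort by surname.
        members.sort(key=lambda member: member[0].lower())
        top = max(votes for _, votes in members)
        cells[party] = [
            (row, name, votes / top) for row, (name, votes) in enumerate(members)
        ]
    return cells


if __name__ == "__main__":
    # Made-up ward, purely to illustrate the layout.
    ward = [("Smith", "Lab", 900), ("Adams", "Lab", 1000), ("Jones", "Lab", 950),
            ("Brown", "Con", 400), ("Clark", "Con", 800)]
    for party, rows in ward_cells(ward).items():
        print(party, rows)
```

Using a within-party measure rather than raw votes is what lets a strongly Labour ward and a strongly Conservative ward be read on the same scale, which matches the motivation given in the paragraph above.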

If no name-order bias existed, dark and light cells would be randomly distributed across the top, middle and bottom thirds of each borough. The first graph shows this visual null hypothesis, in which names have been randomly shuffled rather than placed alphabetically within their party. There is no clear pattern:

Name bias visualisation (random ordering)

But when we look at the actual data in this second graph, we can literally see evidence of name bias. Darker cells (the candidate with the most votes within their party) are more common in the upper third (listed first within their party on the ballot paper), and lighter cells (the fewest votes within their party) are more common in the lower third (listed third within their party on the ballot paper).

Name bias visualisation (actual ordering)
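The paper makes its argument visually. For readers who would like a number alongside the picture, the same ‘shuffle the names’ null hypothesis can be turned into a permutation test. This is our own illustration, not the authors’ method: it assumes each ward-party group is a list of vote counts in ballot (alphabetical) order, uses as its statistic the share of groups in which the first-listed candidate topped their party, and the function names are hypothetical.

```python
import random


def top_of_ballot_share(groups):
    """Share of ward-party groups in which the first-listed candidate
    (index 0, i.e. alphabetically first) won the most votes in the party."""
    wins = sum(1 for votes in groups if votes[0] == max(votes))
    return wins / len(groups)


def permutation_test(groups, n_shuffles=10_000, seed=0):
    """Compare the observed share with shares obtained after randomly
    shuffling ballot positions within every group (the null of no bias).
    Returns (observed_share, one-sided p-value)."""
    rng = random.Random(seed)
    observed = top_of_ballot_share(groups)
    at_least_as_extreme = 0
    for _ in range(n_shuffles):
        # rng.sample returns a shuffled copy, leaving the input untouched.
        shuffled = [rng.sample(votes, len(votes)) for votes in groups]
        if top_of_ballot_share(shuffled) >= observed:
            at_least_as_extreme += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    p_value = (at_least_as_extreme + 1) / (n_shuffles + 1)
    return observed, p_value
```

With three candidates per party and no bias, the first-listed candidate should top their party in about a third of groups; the shuffled runs play the same role as the randomly ordered first graph.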

The name bias visualisation is involved, and you need to spend time to understand it properly. However, not all visualisations have to be instantly clear, especially when they are analytically focussed and their purpose is to help test a theory. It would be even harder to understand someone’s regression model simply by glancing at it. While this paper runs to only 9 pages, a similar US study that used more traditional statistical techniques to answer the name bias question took 100 pages to explain its findings!

This is different to using a visualisation purely for communication, where you want the user to understand a known relationship quickly and easily, and which is where I think many people assume the benefits of visualisation end. This visualisation acts primarily as a means of understanding the data, and because it does this so well, it has the additional benefit of making the findings easier to communicate to a wider audience.

3. My last example is another finalist from our competition, ‘At midnight, all the agents’ by Arjun Viswanathan. It uses a network graph to map the relationships between time series variables, rather than, as is more common, the connections between entities such as firms or people. The full image can be seen here.

‘At midnight, all the agents’ network graph

This visualisation uses the Bank’s agents’ company visit scores, broken down by sector. These scores are a quantitative assessment of economic variables (such as demand, exports, investment, employment, costs and profits, both current and future), based on information gathered in our agents’ confidential meetings with individual UK firms (and available in aggregated form here).

The data first had to be prepared by cross-correlating each series against every other series at lags of 0 to 8 quarters. This creates a matrix of around 30,000 data points, which would be very hard to decipher just by looking at the numbers! But when you build a network chart plotting the maximum correlation for each pair, and use colour to highlight the lagging and leading series, you can begin to see how particular variables lag or lead others.
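The preparation step described above can be sketched in plain Python. This is a minimal illustration under our own assumptions, not the competition entry’s code: series are equal-length lists of quarterly scores, ‘maximum correlation’ is read as the largest signed Pearson correlation over lags of 0 to 8 quarters in either direction (taking the absolute value would be an equally defensible reading), and all function names are hypothetical.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (undefined if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def lagged_corr(leader, follower, lag):
    """Correlate `leader` at quarter t with `follower` at quarter t + lag."""
    if lag == 0:
        return pearson(leader, follower)
    return pearson(leader[:-lag], follower[lag:])


def best_lead_lag(a, b, max_lag=8):
    """Search lags 0..max_lag in both directions and return (lag, corr) for the
    largest correlation. lag > 0 means a leads b; lag < 0 means b leads a."""
    best_lag, best_r = 0, lagged_corr(a, b, 0)
    for lag in range(1, max_lag + 1):
        for signed_lag, r in ((lag, lagged_corr(a, b, lag)), (-lag, lagged_corr(b, a, lag))):
            if r > best_r:
                best_lag, best_r = signed_lag, r
    return best_lag, best_r


def lead_lag_edges(series, max_lag=8):
    """series: dict of name -> list of scores. Returns one
    (leader, follower, lag, corr) edge per pair; pairs whose best lag is 0
    keep their input order."""
    names = list(series)
    edges = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            lag, r = best_lead_lag(series[a], series[b], max_lag)
            leader, follower = (a, b) if lag >= 0 else (b, a)
            edges.append((leader, follower, abs(lag), r))
    return edges


if __name__ == "__main__":
    # Synthetic check: b is a copied two quarters later, so a leads b by 2.
    rng = random.Random(1)
    base = [rng.random() for _ in range(42)]
    a, b = base[2:], base[:40]
    print(best_lead_lag(a, b))
    print(lead_lag_edges({"a": a, "b": b}))
```

The resulting edge list is what a network chart would draw: one arrow per pair from leader to follower, weighted by the correlation at the best lag.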

Looking at the data like this, you can see that pay scores lag other variables (more blue). This is to be expected, because macroeconomic factors hit businesses first and pay then responds. Interestingly, pre-tax profit, demand, export and employment scores from the distribution sector tend to lead other series (more orange). Simon Caunt, one of our agents, commented that distribution leading other variables made sense because it includes the consumer-facing SIC sectors, suggesting that these provide leading insight into other sectors and economic variables. He also noted that “changes feed through initially into demand and output variables and then cost and prices, which is consistent with my prior.”
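One simple way to turn lead/lag edges into a ‘more blue’ or ‘more orange’ reading for each series is to count how often it leads minus how often it lags. This scoring and colour rule are our own hypothetical summary, not necessarily how the entry assigned its colours; the sketch assumes a pairwise cross-correlation step has already produced (leader, follower) edges, and the series names in the demo are placeholders.

```python
from collections import Counter


def net_lead_scores(edges):
    """edges: iterable of (leader, follower) pairs. Each series scores +1 for
    every edge it leads and -1 for every edge it lags."""
    scores = Counter()
    for leader, follower in edges:
        scores[leader] += 1
        scores[follower] -= 1
    return dict(scores)


def node_colour(score):
    """Hypothetical colour rule: net leaders orange, net laggers blue, balanced grey."""
    if score > 0:
        return "orange"
    if score < 0:
        return "blue"
    return "grey"


if __name__ == "__main__":
    # Made-up edges between placeholder series, purely for illustration.
    demo_edges = [("A", "C"), ("B", "C"), ("A", "B")]
    for name, score in sorted(net_lead_scores(demo_edges).items()):
        print(name, score, node_colour(score))
```

A net count is deliberately crude; weighting each edge by its correlation or lag length would be a natural refinement.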

Simon also found it useful to know that future employment in ‘production’ leads variables in other sectors, such as labour costs in ‘transport’ and recruitment and exports in ‘distribution’.

This shows how exploring the data in a different way has given our agents some new insights. It reiterates the point that looking at old data in new ways can help central bankers, and the public, glean new knowledge.

So data visualisation doesn’t just mean pretty pictures that help you communicate a point you already knew. It actually helps you to understand the data and draw new conclusions that you literally wouldn’t have seen otherwise.

Lyndsey Pereira-Brereton works in the Bank’s Advanced Analytics Division.

If you want to get in touch, please email us at bankunderground@bankofengland.co.uk

Bank Underground is a blog for Bank of England staff to share views that challenge – or support – prevailing policy orthodoxies. The views expressed here are those of the authors, and are not necessarily those of the Bank of England, or its policy committees.