Ivan Yotzov, Nick Bloom, Philip Bunn, Paul Mizen, Pawel Smietanka and Greg Thwaites
Text data is often raw and unstructured, and yet it is the key means of human communication. Textual analysis techniques are increasingly being used in economic and financial research in a variety of different ways. In this post we apply these techniques to a new setting: the text comments left by respondents to the Decision Maker Panel (DMP) Survey, a UK-wide monthly business survey. Using over 20,000 comments, we show that: (i) these comments are a rich and unexplored data source, (ii) Brexit has been the dominant topic of comments since 2016, (iii) text-based indices match existing uncertainty measures from the DMP at both the aggregate and firm level, and (iv) sentiment among UK firms has been declining since 2016.
One of the most well-known applications of text analysis in economics is the Economic Policy Uncertainty Index developed by Baker, Bloom and Davis (2016), which tracks certain keywords in newspaper texts to measure the degree of uncertainty. A closely related term-frequency index has been constructed to measure the intensity of economic reform discussions around the world. Previous work at the Bank of England has used these techniques to analyse the write-ups of meetings with individual businesses held by the Bank’s network of Regional Agents. Another common approach to analysing textual data is sentiment analysis, particularly with a focus on predicting financial and economic variables. In our analysis, we apply similar methods to a new and rich source of data that has not previously been considered in the literature: voluntary text comments left by respondents to a business survey.
The Decision Maker Panel (DMP)
The DMP is a large and representative monthly online survey of UK businesses, with around 3,000 respondents each month. In addition to the regular questions about business conditions, which are of a quantitative nature, firms are also given the option to ‘provide any additional information that [they] feel may help us understand better [their] responses (…)’. This additional information is in the form of an open text box. On average, around 20% of firms provide some comments in any given month (see Figure 1). This gives us a data set of over 20,000 comments. While these comments are typically only a few sentences in length, the analysis presented in this blog shows that they are rich in informational content.
Figure 1: Number of comments made by DMP respondents
A first look at the data
We begin our analysis of the data by applying some standard pre-processing steps, such as removing common words and lemmatisation. Figure 2 visualises the most commonly used words in the text comments, where the size of each word reflects its relative frequency in the sample.
Figure 2: Word cloud of firm comments. Full sample (Sep. 2016 to Nov. 2020)
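As a rough illustration of the pre-processing and word counts that sit behind a word cloud like this, the sketch below lowercases each comment, drops common words, maps a few inflected forms to a base form, and counts what is left. The stop-word list, the lemma map and the example comments are all hypothetical stand-ins chosen for brevity; they are not the tools or data used for the DMP analysis, which would rely on a full stop-word list and a proper lemmatiser.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list and lemma map (illustration only).
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "is", "are", "in", "on",
              "for", "our", "we", "have", "has", "be", "will"}
LEMMAS = {"prices": "price", "sales": "sale", "costs": "cost", "deals": "deal"}


def preprocess(comment: str) -> list[str]:
    """Lowercase, tokenise, drop stop words and map tokens to a base form."""
    tokens = re.findall(r"[a-z]+", comment.lower())
    return [LEMMAS.get(tok, tok) for tok in tokens if tok not in STOP_WORDS]


def word_frequencies(comments) -> Counter:
    """Count how often each pre-processed word appears across all comments."""
    counts = Counter()
    for comment in comments:
        counts.update(preprocess(comment))
    return counts


# Made-up example comments, not taken from the survey.
comments = [
    "Brexit uncertainty is delaying our investment.",
    "Prices and costs are rising because of Brexit.",
]
freq = word_frequencies(comments)
print(freq.most_common(3))
```

The relative size of each word in a word cloud would then be proportional to its count in `freq`.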
It is clear from Figure 2 that the dominant topic in the comments that firms have left in the DMP Survey has been the Brexit process. This is reflected not only in ‘Brexit’ being the most commonly used word, but also in the frequent use of related terms such as ‘deal’, ‘leave’, ‘EU’, and ‘Europe’. In addition, the term ‘uncertainty’ is the eighth most common in the sample, and around 60% of the comments that include ‘uncertainty’ also mention ‘Brexit’.
The word ‘coronavirus’ (see bottom of Figure 2) was also one of the most prominent terms mentioned by CFOs in the past four years, despite the pandemic covering less than a year of our sample period, and this was the most common subject of comments in 2020. There have also been frequent comments on the terms that relate to the regular questions in the survey such as ‘price’, ‘sale’, ‘cost’, ‘work’, ‘staff’, ‘capital’, and ‘expenditure’. Overall, this word cloud demonstrates that the data contain valuable information and warrant further analysis.
To capture how often words are used over time, we create several measures based on term-frequency. This approach has been used, among others, in the construction of the Economic Policy Uncertainty (EPU) index. We focused on the most common terms as captured by the word cloud shown in Figure 2, starting with Brexit. We create a monthly ‘Brexit Text Index’ (BTI), which counts the number of occurrences of the term ‘Brexit’ in a given month scaled by the total number of words in the comments left in that month. In Figure 3 we compare this text-based measure to the Brexit Uncertainty Index (BUI) derived from the DMP, a survey-based measure of Brexit uncertainty that represents the percentage of businesses who reported that Brexit was in the top three sources of uncertainty for their business.
Figure 3: Brexit Text Index and Brexit Uncertainty Index
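The construction of a term-frequency index of this kind can be sketched in a few lines. The example below assumes comments arrive as (month, text) pairs and computes, for each month, the number of occurrences of a term divided by the total number of words in that month. The function name, the input layout and the sample comments are assumptions made for illustration, and the sketch skips the pre-processing steps described earlier.

```python
import re
from collections import defaultdict


def term_frequency_index(comments, term):
    """Return {month: occurrences of `term` / total words in that month}.

    `comments` is an iterable of (month, text) pairs (a hypothetical layout).
    """
    term = term.lower()
    term_counts = defaultdict(int)
    word_counts = defaultdict(int)
    for month, text in comments:
        tokens = re.findall(r"[a-z]+", text.lower())
        term_counts[month] += tokens.count(term)
        word_counts[month] += len(tokens)
    # Months with no words at all are skipped to avoid dividing by zero.
    return {m: term_counts[m] / word_counts[m]
            for m in sorted(word_counts) if word_counts[m] > 0}


# Made-up comments, not taken from the survey.
sample = [
    ("2019-03", "Brexit is delaying investment"),
    ("2019-03", "No change this month"),
    ("2020-04", "Coronavirus has closed our sites"),
]
bti = term_frequency_index(sample, "Brexit")
print(bti)
```

Passing a different term, such as ‘uncertainty’ or ‘coronavirus’, gives the corresponding index on the same scale.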
The two aggregate Brexit indices have moved closely together: the correlation between them is 0.61. In 2020 there was some divergence between the two, with the text-based BTI weakening relative to the survey-based BUI. That might reflect the emergence of the Covid pandemic as a new topic that firms began to comment on, leading the share of comments about Brexit to fall.
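For readers who want to reproduce this kind of comparison on their own data, the correlation between two monthly series can be computed as a standard Pearson coefficient. The sketch below is a plain implementation; the two short series are invented numbers used only to show the call, not values from the BTI or BUI.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length series."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("series must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)


# Invented illustrative values: a text index and a survey-based share (%).
text_index = [0.02, 0.03, 0.05, 0.04]
survey_index = [40.0, 45.0, 55.0, 50.0]
r = pearson(text_index, survey_index)
print(round(r, 2))
```

Because the coefficient is scale-free, it does not matter that one series is a share of words and the other a percentage of firms.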
There has also been a close relationship at the firm level between mentions of Brexit and whether these firms consider Brexit a key source of uncertainty. We found that there has been a strong positive association between these different metrics, even after controlling for monthly and firm fixed effects. Finally, we compared the aggregate BTI with a measure of uncertainty using newspaper data – the UK EPU Index – and found a strong correlation between the two measures, particularly since the start of 2018.
Overall, we conclude from this analysis that: (i) the variation of the BTI is meaningful both at the aggregate and firm level, (ii) the BUI can be validated using the text comments left by the same firms over the sample period, and (iii) discussions of Brexit have been a strong indicator of uncertainty associated with this process.
Focusing on DMP data from 2020, we can also analyse the mentions of ‘coronavirus’ by UK businesses, using a similarly constructed text index (see Figure 4). We see that mentions of coronavirus leapfrogged over both the Brexit and uncertainty indices in March 2020, and remained at elevated levels in subsequent months. Finally, in firm-level regressions we also show that the coronavirus text index is highly correlated with firms finding the pandemic to be a key source of uncertainty for their business.
Figure 4: Brexit, uncertainty, and coronavirus text indices
A new measure of business sentiment
Going deeper into the content of the texts, we try to capture the average sentiment of each comment by creating a ‘net polarity index’. The polarity of each comment is measured by looking at the balance of positive and negative words. We use the dictionary constructed by Tim Loughran and Bill McDonald. This dictionary is constructed from 10-K financial reports filed by companies, and has been widely used for sentiment analysis in the financial literature.
For each comment, we calculate the ‘net polarity index’ by counting the number of positive and negative words and scaling by the total number of words. In addition, we apply a technique called dependency parsing to prevent simple cases of negation (eg ‘uncertainty is not good’) from being misclassified. To be clear, for each comment:
net polarity index = (number of positive words − number of negative words) / total number of words
We then aggregate the sentiment at the monthly level (Figure 5, Panel B).
Figure 5: Net polarity index
Panel A: Most common positive (in green) and negative (in red) words
Panel B: Monthly net polarity index
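A simplified version of the net polarity calculation is sketched below. It takes the balance of positive and negative words in each comment, scales by the number of words in the comment, and averages the result by month. Several parts are assumptions made for illustration: the word lists are tiny hypothetical stand-ins for the Loughran and McDonald dictionary, the monthly figure is a simple average across comments, and negation is handled with a one-word look-back rule that flips the sign of a word directly preceded by a negator. That rule is a much cruder substitute for dependency parsing, not the method itself.

```python
import re
from collections import defaultdict

# Tiny hypothetical word lists (stand-ins for a full sentiment dictionary).
POSITIVE = {"good", "strong", "improve", "improved", "gain"}
NEGATIVE = {"uncertainty", "weak", "decline", "loss", "difficult"}
NEGATORS = {"not", "no", "never"}


def net_polarity(comment: str) -> float:
    """(positive words - negative words) / total words for one comment."""
    tokens = re.findall(r"[a-z]+", comment.lower())
    if not tokens:
        return 0.0
    pos = neg = 0
    for i, tok in enumerate(tokens):
        # Assumption: a word directly preceded by a negator has its sign flipped.
        negated = i > 0 and tokens[i - 1] in NEGATORS
        if tok in POSITIVE:
            if negated:
                neg += 1
            else:
                pos += 1
        elif tok in NEGATIVE:
            if negated:
                pos += 1
            else:
                neg += 1
    return (pos - neg) / len(tokens)


def monthly_net_polarity(comments):
    """Average comment-level polarity by month; input is (month, text) pairs."""
    by_month = defaultdict(list)
    for month, text in comments:
        by_month[month].append(net_polarity(text))
    return {m: sum(v) / len(v) for m, v in sorted(by_month.items())}


# Made-up comments, not taken from the survey.
sample = [
    ("2020-05", "uncertainty is not good"),
    ("2020-05", "sales are strong"),
]
monthly = monthly_net_polarity(sample)
print(monthly)
```

In the first example comment, ‘uncertainty’ counts as negative and the negated ‘good’ is also treated as negative, so the comment is not scored as neutral by mistake.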
Panel B of Figure 5 shows that the sentiment in the text comments has been net negative and has generally declined since 2016. Sentiment appears to have been weaker in periods when Brexit uncertainty was higher. Sentiment did improve in late 2019/early 2020 following the UK’s General Election and as a withdrawal agreement was reached between the UK and the EU, although it soon fell back again as the Covid pandemic hit, reaching a trough in May 2020.
We study over 20,000 text comments left by firms responding to the DMP Survey using tools from computational linguistics. Voluntary comments left by business survey participants have not been widely analysed before. Our results show that these comments have high analytical value at both the firm and aggregate level. The term-frequency indices are highly correlated with quantitative uncertainty measures, and business sentiment has been closely related to economic events and has overall fallen since 2016. Future research can build on this work to study attitudes toward specific terms or events and use machine learning to analyse the topics of these comments in a more systematic way.
Ivan Yotzov works at the University of Warwick, Nick Bloom works at Stanford University, Philip Bunn works in the Bank’s Structural Economics Division, Paul Mizen works at the University of Nottingham, Pawel Smietanka works at Deutsche Bundesbank and Greg Thwaites works at the University of Nottingham.
If you want to get in touch, please email us at email@example.com or leave a comment below.
Comments will only appear once approved by a moderator, and are only published where a full name is supplied. Bank Underground is a blog for Bank of England staff to share views that challenge – or support – prevailing policy orthodoxies. The views expressed here are those of the authors, and are not necessarily those of the Deutsche Bundesbank, Bank of England, or its policy committees.