Angelina Carvalho, Chiranjit Chakraborty and Georgia Latsi.
Policy makers have access to more and more detailed datasets. These can be joined together to give an unprecedentedly rich description of the economy. But the data are often noisy and individual entries are not uniquely identifiable. This leads to a trade-off: very strict matching criteria may result in a limited and biased sample; making them too loose risks inaccurate data. The problem gets worse when joining large datasets as the potential number of matches increases exponentially. Even with today’s astonishing computer power, we need efficient techniques. In this post we describe a bipartite matching algorithm on such big data to deal with these issues. Similar algorithms are often used in online dating, closely modelled as the stable marriage problem.