Will Big Data Keep Its Promise?

From the translator


This is a translation of a speech by the Bank of England's Chief Economist on the potential uses of big data in the Bank's work. The original title is "Will Big Data Keep Its Promise?", 30 April 2018.

There is a great deal of talk about Big Data, but personally I often get the impression that, like natives in loincloths, we are being sold worthless trinkets. This report, in my view, confirms that working with Big Data is in fact a rather complicated process, but a productive one.

Of course, this report is very different from what we are usually told about Big Data, since, as you will appreciate, it is written for a different audience. The quality, in my opinion, is exemplary. Given the standing of the Bank of England and of the specialist presenting these results, it becomes clear how Big Data analysis can be used in many situations. A few of its conclusions can be summarised roughly as follows:

  • the direction and level of data decomposition need to be chosen very carefully, as the analysis of the Swiss franc episode shows;
  • in a significant number of cases, value can emerge in unexpected results, for example in shaping the wording of the Bank of England's policy documents.

The author only hints at some results where the gaming industry might replace Monte Carlo methods.

Below the cut is a machine translation, with minor corrections so that it does not grate on the ear.

I am glad to be here to launch the Data Analytics for Finance and Macro (DAFM) research centre at King's College Business School. I would like to congratulate Professors Georgios Kapetanios and Georgios Cortareas, as co-directors (and former colleagues), for building the centre's launch pad and readying it for take-off.

I believe that applying data analytics to many of the pressing questions in finance and macroeconomics holds great promise. That is why the Bank of England created its own data analytics unit around four years ago. And that is why I warmly welcome the creation of this new centre as a means of realising that promise.

But will big data keep its promise? I want to illustrate some of that promise, as well as some potential pitfalls, using examples from the Bank of England's recent research on the economy and financial system. I will conclude with some more speculative thoughts on future big data research. 1 (1 - Cœuré (2017) offers an excellent summary of the potential for Big Data to improve policymaking, in particular in central banks.)

The path less followed


The first thing to say is that big data and data analytics are not new. In recent years, however, they have become one of the fastest-growing areas in academic and commercial circles. Over this period, data has become the new oil; data analytics has become the oil refinery of its time; and information companies have become the new oil giants. 2 (2 - For example, the Economist (2017), Henke et al (2016).)

Nevertheless, economics and finance have so far been rather restrained in embracing this "oil rush." The use of data analytics has been less common in economics and finance, at least compared with other disciplines. One simple diagnostic comes from the very different interpretations of the expression "data mining" by those inside and outside economics and finance.

For economists, few sins are more odious than data mining. The data-mining villain engages in a "regression hunt," reporting only those regression results that best fit the hypothesis the researcher set out to test. This is what puts the "con" in econometrics. 3 (3 - Leamer 1983) For most economists, this kind of data analysis bears an unhappy resemblance to oil drilling: a dirty, extractive business accompanied by large detrimental effects on health.

For data scientists, the situation could hardly be more different. For them, data mining is a means of extracting new, valuable resources and putting them to use. It allows new insights to be gained, new products to be created, new relationships to be established, new technologies to be promoted. It is the raw material of a new wave of productivity and innovation, the nascent Fourth Industrial Revolution. 4 (4 - See, for example, Schwab 2017)

What explains the caution of some economists towards big data? Part of the answer lies in methodology. 5 (5 - Haldane 2016) A decent chunk of economics has followed the methodological footsteps of Karl Popper from the 1930s. Popper advocated a deductive approach to scientific progress. 6 (6 - Popper (1934) and Popper (1959)) This begins with axioms, moves from axioms to theory, and only then takes hypotheses to the data. In other words, theory precedes measurement.

There is an alternative, inductive approach. This has even deeper roots, in the work of Francis Bacon from the early 1600s. 7 (7 - Bacon 1620) It begins with the data, unconstrained by axioms or hypotheses, and then uses them to inform the choice of behavioural models. In other words, data precede theory. Indeed, some data scientists have suggested that this approach could signal the “end of theory.” 8 (8 - Anderson 2008)

So where some economists tend to see pitfalls in big data, data scientists see promise. Where some economists see an environmental threat, data scientists see economic potential. I am caricaturing a little, but only a little. So who is right? Does the big data era signal an oil gusher or an oil spill?

The truth, as so often, probably lies somewhere in between. Both deductive and inductive approaches can offer insights into the world. They are better seen as methodological complements than as substitutes. In other words, using either approach in isolation raises the risk of erroneous inference and of potentially serious errors in understanding and policy. Let me give some examples to illustrate.

It is now pretty well accepted that, during the global financial crisis, the workhorse macroeconomic model, the dynamic stochastic general equilibrium (DSGE) model, fell at the first fence. 9 (9 - For example, Stiglitz 2018) It could not explain the dynamics of the business cycle during or after the crisis. Although theoretically pure, it proved empirically fragile. I believe this empirical fragility arose from an excessive methodological reliance on deductive methods. Or, in other words, from too little attention being paid to real data from the past, including past crises.

As a counter-example, in 2008 Google launched a model for predicting influenza outbreaks based on search phrases such as “signs of the flu.” 10 (10 - Ginsberg et al 2009) It did a tremendous job of tracking flu outbreaks in the US in 2009-10. But in the years that followed, the model's predictions failed badly. 11 (11 - Lazer et al 2014) I would suggest that this empirical fragility arose from an excessive reliance on empirical regularities and an excessive commitment to inductive methods. Or, in other words, from too little attention being paid to the deeper medical causes of past flu outbreaks.

In the first case, empirical fragility arose from too tight a set of axioms and restrictions: too much weight was placed on theory relative to real-world correlations and historical experience. In the second case, empirical fragility arose from too loose a set of axioms and restrictions: observed empirical correlations were given too large a role relative to theory and causality.

In both cases, these errors could have been reduced if the inductive and deductive approaches had been used in a complementary or iterative fashion. This iterative approach has a strong pedigree in other disciplines. The history of progress in many scientific disciplines has involved a two-way learning process between theory and empirics, with theory motivating measurement in some cases and measurement motivating theory in others, in a continuous feedback loop. 12 (12 - Bacon (1620) summarises this well: “Those who have handled sciences have been either men of experiment or men of dogmas. The men of experiment are like the ant, they only collect and use; the reasoners resemble spiders, who make cobwebs out of their own substance. But the bee takes a middle course: it gathers its material from the flowers of the garden and of the field, but transforms and digests it by a power of its own.”)

One example of this approach, discussed by Governor Carney at the launch of the Bank's own data analytics programme, concerns the dynamics of planetary motion. 13 (13 - Carney 2015) It was Sir Isaac Newton (a former Master of the Royal Mint, so also in the business of making money) who developed the physical theory of celestial motion. But that theory was built on the empirical shoulders of another scientific giant, Johannes Kepler. When it came to planetary motion, empirics led theory, induction led deduction.

The same has been true, from time to time, of our understanding of the movements of the economy and financial markets. Keynesian and monetarist theories were built on the empirical experience of the Great Depression. The Phillips curve began life as a Kepler-style empirical law, and only later acquired a Newtonian theoretical foundation. Many of the finance puzzles that have haunted theorists for decades began life as empirical anomalies in asset markets. 14 (14 - Obstfeld and Rogoff (2001) discuss six major puzzles in international macroeconomics, such as the excess volatility of exchange rates relative to fundamentals.) In each case, empirics led theory, induction led deduction.

My conclusion from all this is clear. If this iterative learning process between empirics and theory is to continue bearing fruit in economics, deductive and inductive approaches will probably need broadly equal billing. If so, I think economics and finance would earn a high return on further intellectual investment in big data, and in the analytical methods that accompany it, in the years ahead.

Defining Big Data


If big data holds promise, it is probably helpful to start by defining what it is. That is not entirely straightforward. Like beauty, what counts as big data lies in the eye of the beholder. It is also a fluid concept. For example, it is clear that data no longer means just numbers, but words too. Indeed, recent years have seen a growth of research using semantic data, including in economics and finance.

What is less contentious is that the past decade or so has seen a quite extraordinary revolution in the creation, extraction and collection of data, broadly defined. This is partly the result of Moore's law and related advances in information technology. 15 (15 - Moore (1965) noted the annual doubling in the number of components per integrated circuit) Unlike oil, whose resources are finite, new data are being created at unprecedented speed and in virtually unlimited supply.

It is estimated that 90% of all data ever generated were created in the last two years. 16 (16 - SINTEF 2013) A good chunk of that comes from social media. About 1.5 billion people use Facebook daily and 2.2 billion monthly. In 2017 there were 4.4 billion smartphone subscriptions, more than one for every second person on the planet. By 2023 the number of smartphone subscriptions is forecast to reach 7.3 billion, almost one per person. 17 (17 - Ericsson Mobility Report 2017) An estimated 1.2 trillion photos were taken in 2017, around 25% of all photos ever taken. 18 (18 - See www.statista.com/chart/10913/number-of-photos-taken-worldwide )

Another view of this information revolution comes from looking at the number of data scientists. Using job-advert data from the Reed recruitment site, more than 300 vacancies for data scientists have recently been posted in the UK. 19 (19 - Using the dataset in Turrell et al (forthcoming)) Back in 2012 there were almost none. Estimates based on self-identification on LinkedIn suggest there may be more than 20,000 data scientists worldwide. 20 (20 - Dwoskin (2015). The true number of data scientists worldwide is highly uncertain. Many individuals work on data science without necessarily using that job title, but the opposite is also true.)

At the same time, there has been rapid growth in new methods for processing, filtering and extracting information from these data. Machine learning techniques are developing quickly. So-called “deep learning” methods complement existing approaches such as tree-based models, support vector machines and clustering techniques. 21 (21 - Chakraborty and Joseph 2017) Dictionary methods and vector space models are rapidly gaining ground in text and semantic analysis. 22 (22 - Bholat et al 2015)

All of these methods offer ways of extracting information and drawing robust inferences in situations where empirical relationships may be complex, non-linear and evolving, and where data arrive at different frequencies and in different formats. These approaches differ significantly from the classical econometric methods of inference and testing typically used in economics and finance.

This revolution in data, and in the methods for making sense of them, offers analytical riches. Extracting those riches, however, requires considerable care. For example, data privacy issues loom much larger with granular, in some cases personalised, data. These issues have rightly come to prominence recently. Safeguarding big data is accordingly one of the Bank's key priorities in its research.

The Promise of Big Data


To the extent that big data can be characterised, this is usually done using the “three Vs”: volume, velocity and variety. Using the three Vs as an organising framework, let me discuss some examples of how these data and methods have been used in recent Bank research to improve our understanding of how the economy and financial system work.

Volume


The statistical foundation of macroeconomic analysis, at least since the mid-20th century, has been the national accounts. The national accounts have always relied on eclectic data sources. 23 (23 - Coyle 2014) In the past, records of land use, crops and livestock were used to estimate agricultural output. Industrial output was measured from sources as varied as the number of iron blast furnaces and the books catalogued by the British Library. And the output of services was estimated from the tonnage of the merchant fleet. 24 (24 - Fouquet and Broadberry 2015)

With more data available than ever before, the use of new and eclectic data sources and methods is, fittingly, becoming more common in statistical offices. In the field of consumer price measurement, the MIT Billion Prices Project uses data from more than 1,000 online retailers in around 60 countries to collect 15 million prices on a daily basis. This approach has been found to provide more timely (and cheaper) consumer price information than traditional surveys. 25 (25 - Cavallo and Rigobon 2016) Online price data have also been found to improve short-term inflation forecasts in some markets. 26 (26 - Cœuré 2017)

In the same vein, the UK Office for National Statistics (ONS) is exploring the use of “web scraping” to complement existing price-collection methods. So far the focus has been on items such as food and clothing. Although it is early days, the potential gains in sample size and granularity look significant. For example, the ONS has so far been collecting around 7,000 price quotes per day for a set of grocery products, more than the current monthly collection for these products in the CPI. 27 (27 - See www.ons.gov.uk/economy/inflationandpriceindices/articles/researchindicesusingwebscrapedpricedata/august2017update )
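
To make the mechanics concrete, here is a minimal sketch, in Python with made-up price quotes rather than a real scraper, of how daily web-scraped prices could be aggregated into a chained, Jevons-style elementary index. The items, prices and unweighted aggregation are illustrative assumptions, not the ONS's actual methodology.

    import pandas as pd
    import numpy as np

    # Illustrative daily price quotes for a small basket of grocery items.
    # In practice these would come from a web scraper; here they are hard-coded.
    quotes = pd.DataFrame({
        "date":  pd.to_datetime(["2017-08-01"] * 3 + ["2017-08-02"] * 3),
        "item":  ["bread", "milk", "coffee"] * 2,
        "price": [1.00, 0.95, 2.50,   # day 1
                  1.02, 0.95, 2.45],  # day 2
    })

    # Geometric mean of day-on-day price relatives across items
    # (an elementary Jevons-type aggregate, unweighted for simplicity).
    wide = quotes.pivot(index="date", columns="item", values="price").sort_index()
    relatives = wide / wide.shift(1)                        # day-on-day price relatives
    daily_factor = np.exp(np.log(relatives).mean(axis=1))   # Jevons aggregate per day

    # Chain the daily factors into an index starting at 100.
    index = 100 * daily_factor.fillna(1).cumprod()
    print(index)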

As for measuring GDP, new sources and methods are emerging here too. One recent study used satellite imagery to measure the amount of man-made light emitted from different regions of the world. This was found to have a statistically significant relationship with economic activity. 28 (28 - Henderson, Storeygard and Weil (2011)) This approach could potentially help track activity in regions that are geographically remote, where statistical survey methods are weak, or where mis-measurement problems are acute.
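
The basic idea can be illustrated with a small sketch: regress (log) light emissions on (log) activity and then invert the fitted relationship to proxy activity from lights alone. The data below are synthetic and the elasticity is an invented assumption, not the estimate from the cited study.

    import numpy as np

    rng = np.random.default_rng(0)

    # Synthetic illustration: log GDP for 200 hypothetical regions, and night-time
    # light emissions that scale with activity plus noise (elasticity ~0.8 assumed).
    log_gdp = rng.normal(10.0, 1.0, size=200)
    log_lights = 0.8 * log_gdp + rng.normal(0.0, 0.3, size=200)

    # Elasticity of measured lights with respect to GDP via simple OLS (log-log).
    slope, intercept = np.polyfit(log_gdp, log_lights, deg=1)

    # Use the fitted relationship "in reverse" to proxy activity from lights alone,
    # as one might for a region with poor survey coverage.
    predicted_log_gdp = (log_lights - intercept) / slope
    corr = np.corrcoef(predicted_log_gdp, log_gdp)[0, 1]
    print(f"estimated elasticity: {slope:.2f}, proxy correlation: {corr:.2f}")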

A more mundane example, used by the UK's ONS and other statistical agencies, is so-called administrative data. This covers data collected by government bodies in the course of their activities, for example on tax receipts and benefits. In the UK, some of these data have recently been made available for wider use as part of a government open-data initiative, subject to rigorous checks.

One example is VAT data from SMEs in a number of industries, which the ONS has recently begun using to compile output-based estimates of GDP. As with prices, the gains in sample size and granularity from using such administrative data are potentially large. The ONS's monthly business survey typically draws on a sample of around 8,000 firms to represent this subgroup of SMEs. This is now being complemented by VAT returns from around 630,000 reporting units. 29 (29 - www.ons.gov.uk/economy/grossdomesticproductgdp/articles/vatturnoverinitialresearchanalysisuk/december )

These new data complement, rather than replace, existing survey methods. They have the potential to improve the timeliness and accuracy of national accounts data on aggregate economic trends. The ONS has its own data science campus to lead this effort. And new research institutions, such as the Alan Turing Institute, are doing excellent work applying new data and methods to economic measurement.

Another potentially fruitful area of research for tracking flows of activity in the economy is financial data. Almost all economic activity leaves a financial footprint on the balance sheet of some financial institution. Tracking money flows across financial institutions can help gauge the size of that footprint and thereby, indirectly, track economic activity.

In recent years, the Bank has drawn on the Financial Conduct Authority's Product Sales Database (PSD). This is a highly granular source of administrative data on owner-occupier mortgage products issued in the UK. It contains data on nearly 16 million mortgages since mid-2005. The PSD has given the Bank a new, higher-resolution lens for analysing household and housing-market behaviour.

For example, in 2014 the PSD was used by the Bank's Financial Policy Committee (FPC) to inform and calibrate its decisions on macroprudential limits on high loan-to-income mortgage lending to UK households. 30 (30 - June 2014 Financial Stability Report) Since then, we have used these data to track the characteristics of high loan-to-income and high loan-to-value mortgages over time. 31 (31 - Chakraborty, Gimpelewicz and Uluc 2017) PSD data have been used to understand pricing decisions in the UK housing market. 32 (32 - Bracke and Tenreyro (2016) and Benetton, Bracke and Garbarino (2018)) And they have also been used to calibrate an agent-based model of the UK housing market. 33 (33 - Baptista et al 2016)

In recent years, the Bank and the ONS have been developing a more complete set of data on flows of funds between institutions. The hope is that these data will help track not only portfolio shifts, but also how those shifts affect financial markets and the wider economy. For example, do portfolio reallocations by institutional investors move asset markets, and do they stimulate spending? 34 (34 - Bank of England and Procyclicality Working Group 2014) Answers to such questions help, for example, in assessing the effectiveness of quantitative easing. 35 (35 - For example, Albertazzi, Becker and Boucinha (2018) show evidence of the portfolio rebalancing channel from the ECB's asset purchase programme)

New, highly granular data are also emerging on payment, credit and banking flows. Some of these have been used to nowcast or track movements in economic activity, with some success. For example, in the United States a dataset of more than 12 billion credit and debit card transactions over a 34-month period was recently used to analyse consumption patterns by age, company size, metropolitan area and sector. 36 (36 - Farrell and Wheat 2015)
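
As a flavour of what such an exercise involves, the sketch below (in Python, on synthetic transactions with invented column names, not the actual card dataset) groups spending by the same kinds of dimensions mentioned above.

    import pandas as pd
    import numpy as np

    rng = np.random.default_rng(4)

    # Synthetic card transactions (columns are illustrative, not a real schema).
    n = 10_000
    tx = pd.DataFrame({
        "amount":   rng.lognormal(mean=3.0, sigma=1.0, size=n).round(2),
        "age_band": rng.choice(["18-34", "35-54", "55+"], n),
        "metro":    rng.choice(["New York", "Chicago", "Houston"], n),
        "sector":   rng.choice(["groceries", "fuel", "restaurants"], n),
    })

    # Spending patterns by the same kinds of cut used in the cited study:
    # total and average spend per age band, metro area and sector.
    patterns = (tx.groupby(["age_band", "metro", "sector"])["amount"]
                  .agg(total="sum", average="mean", transactions="count"))
    print(patterns.head(10))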

Over time, these types of data could perhaps be used to build a real-time map of financial and activity flows across the economy, in much the same way as is already done for flows of traffic, information or weather. Once mapped, these flows could then be modelled and modified through policy. I first spoke about this idea six years ago. Today it looks closer than ever to being within reach. 37 (37 - Ali, Haldane and Nahai-Williamson 2012)

These are all areas where DAFM can make an important contribution to efforts to improve the quality and timeliness of data on the macroeconomy and financial system. It is well known that the scope for improving the quality of national accounts data is very large. 38 (38 - For example, Bean 2016) And these measurement challenges will only grow as we move towards an increasingly digital and service-oriented economy.

Velocity


The second dimension of the big data revolution is higher frequency and greater timeliness. Higher-frequency data can provide a new or more accurate picture of developments in financial markets and the economy. They can also sometimes help solve the knotty identification problems that bedevil both big data approaches (as the Google Flu example showed) and classical econometric methods (as the DSGE example showed).

The crisis showed that, in stressed conditions, some of the world's largest and deepest financial markets could become illiquid. This caused some of these markets to seize up. In response, as one of its first acts, the G20 agreed in 2009 to collect far more data on transactions in these markets to help better understand their dynamics under stress. 39 (39 - See, for example, FSB 2010) These data are stored in trade repositories.

In recent years, these trade repositories have begun collecting data on a highly granular, trade-by-trade basis. That means they have quickly built up a large stock of data. For example, around 11 million reports are collected every working day for the foreign exchange market alone. They provide a rich source of data on high-frequency financial market dynamics and dislocations.

One example of such a dislocation came with the de-pegging of the Swiss franc in January 2015. This unexpected move caused large shifts in asset prices. The franc traced out a sharp V-shaped movement in the hours immediately after the de-pegging. By analysing trade repository data on Swiss franc-euro forward contracts, some of the driving forces behind these moves can be identified. 40 (40 - Cielinska et al (2017). Other recent research papers using trade repository data include Abad et al (2016) and Bonollo et al (2016))

For example, high-frequency movements in the Swiss franc can be set against trading volumes in forward contracts. Those trades can be further decomposed by counterparty, for example large dealer banks versus end-investors. This kind of decomposition shows that it was the withdrawal of liquidity by large dealer banks that caused the franc's overshoot, a classic signature of periods of market turmoil. 41 (41 - See, for example, Duffie, Gârleanu and Pedersen (2005) and Lagos, Rocheteau and Weill (2011)) The move partially reversed once dealers resumed market-making.

Trade repository data can also be used to assess whether the franc de-pegging had any lasting effect on market functioning. Bank research suggests it did, with persistent fragmentation in the franc forward market. Liquidity and inter-dealer activity were structurally lower, and market volatility persistently higher, after the episode.

The extra granularity of these data makes it possible to tell a quasi-causal story about the forces driving the V-shaped movement in asset markets after the de-pegging. Using parallel tick-by-tick price data and trade-by-trade data allows triggers and amplifiers to be identified in a way that would otherwise be impossible.
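
A stylised sketch of the kind of decomposition described above is given below, in Python on synthetic trade-level records; the column names, counterparty categories and 15-minute buckets are illustrative assumptions, not the trade repository's actual schema.

    import pandas as pd
    import numpy as np

    rng = np.random.default_rng(1)

    # Synthetic trade-level records around a hypothetical event time.
    n = 5_000
    trades = pd.DataFrame({
        "timestamp": pd.Timestamp("2015-01-15 09:30") +
                     pd.to_timedelta(rng.integers(0, 120, n), unit="min"),
        "counterparty_type": rng.choice(["dealer", "end_investor"], n, p=[0.6, 0.4]),
        "notional_eur": rng.lognormal(mean=13, sigma=1, size=n),
    })

    # Aggregate traded notional into 15-minute buckets, split by counterparty type.
    volumes = (trades
               .set_index("timestamp")
               .groupby("counterparty_type")
               .resample("15min")["notional_eur"]
               .sum()
               .unstack(level=0)
               .fillna(0))

    # Share of volume intermediated by dealers in each bucket: a collapse in this
    # share would be one symptom of dealers withdrawing liquidity.
    volumes["dealer_share"] = volumes["dealer"] / volumes.sum(axis=1)
    print(volumes.head())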

A second example of research using more timely data to improve our understanding of economic dynamics concerns the labour market. Understanding the joint behaviour of employment and wages remains one of the central questions of modern macroeconomics. Recently, these dynamics have been complicated by changes in the world of work, as automation reshapes both the nature and the structure of jobs.

Recent Bank research has used granular data on job vacancies to shed light on these dynamics. 42 (42 - Turrell et al (forthcoming)) The study analyses around 15 million vacancies over a ten-year period. Rather than classifying vacancies by sector, occupation or region, it applies machine learning methods to the text of job descriptions to classify and cluster vacancies. The result is a classification of labour demand built more directly from the job descriptions themselves.

This approach offers a different way of classifying and describing how the world of work is evolving, for example the types of skill demanded in an age of automation. The classification scheme has also proved useful in pinning down the relationship between labour demand and wages. Using classifications based on job descriptions helps identify a clearer relationship between labour demand and both offered and agreed wages.
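
As a flavour of the technique, here is a minimal sketch, in Python on a handful of invented job adverts, of clustering vacancies directly on the text of their descriptions; the TF-IDF features and k-means clustering are illustrative stand-ins for the richer machine learning pipeline used in the research.

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.cluster import KMeans

    # A handful of made-up job adverts (the research uses ~15 million of them).
    ads = [
        "analyse loan level data build statistical models in python",
        "machine learning engineer deploy models clean large datasets",
        "care assistant support elderly residents with daily living",
        "nurse provide patient care administer medication on the ward",
        "data scientist build forecasting models and dashboards",
        "healthcare assistant help patients with washing and meals",
    ]

    # Represent each advert by the words it uses, weighted by TF-IDF,
    # then cluster the adverts directly on that text representation.
    vectorizer = TfidfVectorizer(stop_words="english")
    X = vectorizer.fit_transform(ads)

    kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
    print("cluster labels:", kmeans.labels_)

    # Inspect the most characteristic words of each cluster.
    terms = vectorizer.get_feature_names_out()
    for k, centre in enumerate(kmeans.cluster_centers_):
        top = centre.argsort()[-4:][::-1]
        print(f"cluster {k}:", [terms[i] for i in top])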

Variety

One of the potentially most fruitful areas of big data research in macro and finance is the use of words, rather than numbers, as data. Semantic data and semantic search methods have a rich pedigree in other social sciences, such as sociology and psychology. Until now, though, their use in economics and finance has been relatively limited. 43 (43 - Notable examples include Schonhardt-Bailey (2013) and Goldsmith-Pinkham, Hirtle and Lucca (2016))

Like the other social sciences, economics and finance are concerned with human choices. And we know that people often rely on heuristics or stories, rather than statistics, when making sense of the world and taking decisions. Capturing those stories semantically is therefore important for understanding human behaviour and decision-making.

For example, the Bank has recently begun studying the language it uses in its external communications, whether with financial firms or with the general public. Michael McMahon of Oxford University and I recently assessed how the simplified wording of the Monetary Policy Committee (MPC) in the Inflation Report late last year improved public understanding of its monetary policy messages. 44 (44 - Haldane and McMahon (forthcoming))

A second example looks at a much less studied aspect of the Bank's decision-making: its supervision of financial firms. 45 (45 - Bholat et al 2017) It is based on textual analysis of the Bank's confidential Periodic Summary Meeting (PSM) letters sent to financial firms. These are perhaps the single most important letters the Prudential Regulation Authority (PRA) regularly sends to firms, setting out supervisors' assessment of the risks firms face and requiring actions to mitigate those risks. Using a machine learning method called random forests, the researchers analyse these letters and extract information about their tone and content.
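
For readers unfamiliar with the technique, the sketch below shows, in Python and on invented letter excerpts with made-up tone labels, how text features can feed a random forest classifier; it is only a toy stand-in for the analysis of the confidential PSM letters.

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.pipeline import make_pipeline

    # Invented letter excerpts and tone labels purely for illustration;
    # the real exercise uses confidential PSM letters and richer features.
    letters = [
        "the firm must take immediate action to strengthen its capital position",
        "we require a remediation plan for the weaknesses identified in governance",
        "the firm's risk management framework is broadly adequate",
        "supervisors are content with progress made against last year's actions",
        "urgent improvements are needed in the firm's liquidity risk controls",
        "no significant concerns were identified during the review period",
    ]
    tone = ["firm", "firm", "neutral", "neutral", "firm", "neutral"]

    # TF-IDF features of the letter text feed a random forest classifier.
    model = make_pipeline(
        TfidfVectorizer(stop_words="english"),
        RandomForestClassifier(n_estimators=200, random_state=0),
    )
    model.fit(letters, tone)

    new_letter = ["the firm must urgently improve its governance and controls"]
    print(model.predict(new_letter))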

This type of analysis has a number of policy applications. It can be used to assess whether the letters send clear and consistent supervisory messages to firms. For example, the tone and content of the letters can be compared with the Bank's internal assessments of firms' strengths and weaknesses. Are the two consistent across the Bank's supervisory framework? Broadly, the research finds that they are.

The approach can also be used to assess how the style of supervision has evolved over time. For example, how has it changed since the shift in supervisory model from the Financial Services Authority (FSA) to the PRA? The research finds that, comparing the two regimes, supervisory communications have become more forward-looking, formal and substantive, consistent with the PRA's new supervisory model.

This exercise, I think, is a good example of applying a new methodology (random forests) to an entirely new dataset (the Bank's supervisory assessments) in a policy area previously unexplored by researchers (the supervision of financial firms). It reaches conclusions that bear directly on policy questions. So I think it nicely illustrates the promise of big data.

My final example uses old data rather than new. But I think it is a good illustration of how new methods can also be used to understand the past. Long before the Bank had responsibility for monetary policy and financial stability, one of its key roles was to lend, as a last resort, to commercial banks facing liquidity pressures.

It is difficult to date precisely, but the Bank probably began conducting such operations in earnest around the time Britain faced a succession of banking panics, in 1847, 1857 and 1866. The Bank responded to these panics by providing liquidity support to banks. The lender of last resort, as Bagehot later came to call it, was born. 46 (46 - Bagehot 1873) Indeed, Bagehot went on to set out the principles for such lending: lend freely, at a penalty rate, against good collateral.

An interesting historical question, with relevance today, is whether the Bank actually adhered to these principles when lending as a last resort during the panics of 1847, 1857 and 1866. To assess this, we drew on data from the giant paper ledgers recording changes in the Bank's balance sheet, in which these interventions were recorded loan by loan, counterparty by counterparty, interest rate by interest rate. 47 (47 - Anson et al 2017)

Transcribing these data was helped by the fact that the handwritten ledger entries were made by a small number of clerks across the three crises, one of the incidental benefits of continuity of staff. While the data were largely transcribed by hand, the project has developed an image recognition system using a neural network algorithm, which we will use in future to turn the historical ledgers into 21st-century machine-readable data.
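
The Bank's system itself is not described in detail, but the general technique, training a neural network to classify small images of handwritten characters, can be sketched as follows in Python, using scikit-learn's bundled digits dataset as a stand-in for the ledger scans.

    from sklearn.datasets import load_digits
    from sklearn.model_selection import train_test_split
    from sklearn.neural_network import MLPClassifier

    # Stand-in for the ledger transcription problem: classify small images of
    # handwritten digits with a neural network (the Bank's actual architecture
    # is not public; this only illustrates the general technique).
    digits = load_digits()  # 8x8 grayscale images of handwritten digits
    X_train, X_test, y_train, y_test = train_test_split(
        digits.data / 16.0, digits.target, test_size=0.25, random_state=0)

    net = MLPClassifier(hidden_layer_sizes=(64,), max_iter=500, random_state=0)
    net.fit(X_train, y_train)

    print(f"held-out accuracy: {net.score(X_test, y_test):.3f}")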

These data on the Bank's historical lender-of-last-resort operations are novel and highly granular: big data from a bygone era. They show that the Bank's approach to last-resort lending changed significantly across the crises of the mid-19th century. By the time of the 1866 crisis, the Bank was more or less following the principles of last-resort lending later set out by Bagehot. This is another example of empirics leading theory.

Machine learning methods are also being applied to the statistics the Bank regularly collects and reports. In particular, these methods are used to identify errors or anomalies in the raw data submitted to the Bank. This makes data cleaning far more systematic and efficient than is possible with manual processes. Data analytics can also be used to match new sources of granular data. This not only provides another way of checking data quality, but can also yield insights that individual data sources cannot reveal on their own. 48 (48 - Bahaj, Foulis and Pinter (2017), for example, match firm-level accounting data, transaction-level house price data and loan-level residential mortgage data to show how the house price of an SME director can affect their firm's investment and wage bill.) At the Bank of England, as elsewhere, the robots are on the rise.
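
The precise tools the Bank uses for this screening are not described here, but one common approach, flagging reported values that sit far from the bulk of the data for manual review, can be sketched as follows in Python on synthetic regulatory returns.

    import numpy as np
    from sklearn.ensemble import IsolationForest

    rng = np.random.default_rng(2)

    # Synthetic "regulatory returns": two reported ratios per firm, with a few
    # implausible observations injected to mimic reporting errors.
    clean = rng.normal(loc=[0.15, 1.30], scale=[0.02, 0.10], size=(500, 2))
    errors = np.array([[0.95, 1.25],    # capital ratio reported as 95%
                       [0.14, 13.0]])   # liquidity ratio out by a factor of 10
    returns = np.vstack([clean, errors])

    # Flag observations that sit far from the bulk of the data for manual review.
    detector = IsolationForest(contamination=0.01, random_state=0).fit(returns)
    flags = detector.predict(returns)   # -1 marks suspected anomalies
    print("flagged rows:", np.where(flags == -1)[0])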

A look into the future


Looking ahead, there are many areas in which these new data sources and methods could be extended to improve the Bank's understanding of the economy and financial system. From a long list, let me discuss one that strikes me as particularly important.

Behavioural economics has, rightly, made a big splash over the past few years, changing the way economists think about how human decisions are made. Human decisions and actions deviate, often significantly and systematically, from the rational-expectations benchmark that is typically assumed. 49 (49 - Rotemberg (1984), for example, discusses the statistical rejection of rational expectations models for consumption and labor demand.) Rules of thumb and heuristics dominate human decision-making. And the expectations people form are often shaped as much by stories, emotions and the actions of others as by rational calculation.

This behaviour appears to matter both for individuals (micro) and for societies (macro). For example, the popular narratives that emerge in financial markets and in everyday public discourse have been found to be empirically important drivers of fluctuations in asset prices and economic activity. 50 (50 - Tuckett and Nyman (2017), Shiller (2017) and Nyman et al (2018)) These narratives may matter especially during periods of economic and financial stress, when emotions run high and social stories take on added significance.

And yet, when it comes to measuring such behaviour, whether at the micro or macro level, our existing methods are often poorly equipped. Capturing people's true sentiments and preferences is devilishly difficult. Traditional surveys of market participants or the general public tend to be biased in their sampling and in the framing of responses. As in quantum physics, the very act of observation can change behaviour.

These realities may call for exploring unconventional ways of revealing people's preferences and sentiment. As a recent example, data on music downloads from Spotify have been used, in tandem with semantic search methods applied to song lyrics, to build an indicator of people's mood. Interestingly, the resulting sentiment index tracks consumer spending at least as well as the Michigan Consumer Confidence Survey. 51 (51 - Sabouni 2018)
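
In the simplest possible terms, such an index might be built along the lines sketched below, in Python with a tiny made-up mood lexicon and invented lyrics and spending figures; the actual research uses far richer data and methods.

    import numpy as np

    # Tiny mood lexicon and invented weekly "most-streamed lyrics"; a real exercise
    # would use licensed lyrics, play counts and a proper sentiment model.
    positive = {"love", "happy", "dancing", "sunshine"}
    negative = {"lonely", "cry", "broken", "goodbye"}

    weekly_lyrics = [
        "dancing all night in the sunshine with you",
        "happy together love is all around",
        "lonely nights i cry over a broken heart",
        "goodbye my love these broken dreams",
    ]

    def mood_score(text: str) -> float:
        """Net share of mood words: (positive words - negative words) / total words."""
        words = text.split()
        return (sum(w in positive for w in words) -
                sum(w in negative for w in words)) / len(words)

    sentiment_index = np.array([mood_score(t) for t in weekly_lyrics])

    # Invented weekly consumer spending growth for the same weeks.
    spending_growth = np.array([0.6, 0.4, -0.2, -0.5])
    corr = np.corrcoef(sentiment_index, spending_growth)[0, 1]
    print(f"lyrics sentiment vs spending correlation: {corr:.2f}")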

And why stop at music? People's tastes in books, television and radio may also open a window into their souls. So, too, may their tastes in games. Indeed, I am intrigued by the potential of using gaming techniques not only to extract data on people's preferences, but also as a means of generating data on preferences and actions.

Existing models, empirical and theoretical, often make strong assumptions about agents' behaviour. Theoretical models rest on axiomatic assumptions. Empirical models rest on historical patterns of behaviour. These restrictions may or may not hold in future behaviour. If they do not, the model will break down out of sample, as the (deductive) DSGE model and the (inductive) Google Flu model both did.

Gaming environments could be used to understand behaviour in a less restricted way. People's behaviour would be observed directly in the act of playing which, provided that behaviour is a reasonable reflection of true behaviour, would give us new data. And because this is a virtual rather than a real world, one in which shocks can be controlled and administered, it could make it easier to resolve issues of causality and identification in responses to shocks, including policy shocks.

Multiplayer games with rudimentary economies, in which goods and money change hands between participants, already exist. They include EVE Online and World of Warcraft. Some economists have begun using gaming technology to understand behaviour. 52 (52 - For example, Lehdonvirta and Castronova (2014)) For example, Steven Levitt (of Freakonomics fame) has used gaming platforms to understand the demand curve for virtual goods. 53 (53 - Levitt et al (2016))

The idea here would be to use a dynamic, multiplayer game to study behaviour in a virtual economy. This would capture interactions between players, for example the emergence of popular narratives that shape spending or saving. And it could capture players' responses to policy interventions, for example their reactions to monetary or regulatory policy. Indeed, in this latter role the game could serve as a test-bed for policy actions: a large-scale, dynamic, digital focus group. 54 (54 - Yanis Varoufakis has previously been involved with a similar idea: uk.businessinsider.com/yanis-varoufakis-valve-gameeconomy-greek-finance-2015-2 )

Artificial intelligence researchers create virtual environments to accelerate the process of learning about the dynamics of a system. “Reinforcement learning” allows algorithms to learn and update from interactions between virtual players, rather than from limited historical experience. 55 (55 - See deepmind.com/blog/deep-reinforcement-learning for a discussion) At least in principle, a virtual economy would allow policymakers to engage in their own reinforcement learning, speeding up their discovery process about the behaviour of a complex economic and financial system.
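
To show mechanically what "learning from interactions rather than from history" means, here is a minimal tabular Q-learning sketch in Python: a notional policymaker repeatedly sets a rate in a toy, entirely invented virtual economy and learns which action keeps inflation on target. Nothing about the states, rewards or transition rule is taken from the speech; it is purely illustrative.

    import numpy as np

    rng = np.random.default_rng(3)

    # Toy "virtual economy": the state is inflation (0=low, 1=on target, 2=high);
    # the policymaker's action is to cut (0), hold (1) or raise (2) a notional rate.
    # Transition and reward rules are invented to demonstrate the mechanics only.
    n_states, n_actions = 3, 3

    def step(state, action):
        drift = action - 1  # raising the rate pushes inflation down next period
        next_state = int(np.clip(state - drift + rng.integers(-1, 2), 0, 2))
        reward = 1.0 if next_state == 1 else -1.0   # reward being on target
        return next_state, reward

    Q = np.zeros((n_states, n_actions))
    alpha, gamma, eps = 0.1, 0.9, 0.1    # learning rate, discount, exploration

    state = 1
    for _ in range(20_000):              # many simulated interactions, cheap in a virtual world
        if rng.random() < eps:
            action = int(rng.integers(n_actions))
        else:
            action = int(np.argmax(Q[state]))
        next_state, reward = step(state, action)
        # Standard Q-learning update towards reward plus discounted future value.
        Q[state, action] += alpha * (reward + gamma * Q[next_state].max() - Q[state, action])
        state = next_state

    print("learned policy (best action per state):", Q.argmax(axis=1))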

Conclusion


So will big data keep its promise? I am confident it will. Economics and finance need to keep investing in big data and data analytics to rebalance the methodological scales. And early research, including at the Bank, suggests the returns to doing so can be high, deepening our understanding of the economy and financial system.

Those returns will best be secured through close collaboration between statistical agencies, policymakers, the commercial sector, research centres and academia. The Bank of England can play a catalytic role in pooling this expertise. So too can DAFM. I wish DAFM every success and look forward to working with you.

References
Abad, J, Aldasoro, I, Aymanns, C, D'Errico, M, Rousová, L F, Hoffmann, P, Langfield, S, Neychev, M and Roukny, T (2016), «Shedding light on dark markets: First insights from the new EU-wide OTC derivatives dataset», ESRB Occasional Paper Series, No. 11.

Albertazzi, U, Becker, B and Boucinha, M (2018), «Portfolio rebalancing and the transmission of largescale asset programmes: evidence from the euro area», ECB Working Paper Series, No. 2125.

Ali, R, Haldane, A and Nahai-Williamson, P (2012), «Towards a common financial language», paper available at www.bankofengland.co.uk/paper/2012/towards-a-common-financial-language
Anderson, C (2008), «The End of Theory: The Data Deluge Makes The Scientific Method Obsolete», Wired Magazine, 23 June.

Anson, M, Bholat, D, Kang, M and Thomas, R (2017), «The Bank of England as lender of last resort: new historical evidence from daily transactional data», Bank of England Staff Working Paper, No. 691.

Bacon, F (1620), Novum Organum.

Bagehot, W (1873), Lombard Street: A Description of the Money Market, Henry S. King & Co.

Bahaj, S, Foulis, A and Pinter, G (2017), «Home values and firm behaviour», Bank of England Staff Working Paper, No. 679.

Bank of England and Procyclicality Working Group (2014), «Procyclicality and structural trends in investment allocation by insurance companies and pension funds», Discussion Paper, July.

Baptista, R, Farmer, JD, Hinterschweiger, M, Low, K, Tang, D and Uluc, A (2016), «Macroprudential policy in an agent-based model of the UK housing market», Bank of England Staff Working Paper, No. 619.

Bean, C (2016), «Independent Review of UK Economic Statistics», available at www.gov.uk/government/publications/independent-review-of-uk-economic-statistics-final-report
Benetton, M, Bracke, P and Garbarino, N (2018), «Down payment and mortgage rates: evidence from equity loans», Bank of England Staff Working Paper, No. 713.

Bholat, D, Brookes, J, Cai, C, Grundy, K and Lund, J (2017), «Sending firm messages: text mining letters from PRA supervisors to banks and building societies they regulate», Bank of England Staff Working Paper, No. 688.

Bholat, D, Hansen, S, Santos, P and Schonhardt-Bailey, C (2015), «Text mining for central banks», Bank of England Centre for Central Bank Studies Handbook.

Bonollo, M, Crimaldi, I, Flori, A, Gianfanga, L and Pammolli, F (2016), «Assessing financial distress dependencies in OTC markets: a new approach using trade repositories data», Financial Markets and Portfolio Management, Vol. 30, No. 4, pp. 397-426.

Bracke, P and Tenreyro, S (2016), «History dependence in the housing market», Bank of England Staff Working Paper, No. 630.

Carney, M (2015), speech at Launch Conference for One Bank Research Agenda, available at www.bankofengland.co.uk/speech/2015/one-bank-research-agenda-launch-conference

Cavallo, A and Rigobon, R (2016), «The Billion Prices Project: Using Online Prices for Measurement and Research», Journal of Economic Perspectives, Vol. 30, No. 2, pp. 151-78.

Chakraborty, C, Gimpelewicz, M and Uluc, A (2017), «A tiger by the tail: estimating the UK mortgage market vulnerabilities from loan-level data», Bank of England Staff Working Paper, No. 703.

Chakraborty, C and Joseph, A (2017), «Machine learning at central banks», Bank of England Staff Working Paper, No. 674.

Cielinska, O, Joseph, A, Shreyas, U, Tanner, J and Vasios, M (2017), «Gauging market dynamics using trade repository data: the case of the Swiss franc de-pegging», Bank of England Financial Stability Paper, No. 41.

Cœuré, B (2017), «Policy analysis with big data», speech at the conference on «Economic and Financial Regulation in the Era of Big Data».

Coyle, D (2014), GDP: A Brief but Affectionate History, Princeton University Press.

Duffie, D, Gârleanu, N and Pedersen, L (2005), «Over-the-Counter Markets», Econometrica, Vol. 73, No.6, pp. 1815-1847.

Dwoskin, E (2015), «New Report Puts Numbers on Data Scientist Trend», Wall Street Journal, 7 October.

Economist (2017), «The world's most valuable resource is no longer oil, but data», article on 6 May 2017.

Ericsson (2017), Ericsson Mobility Report, November 2017.

Farrell, D and Wheat, C (2015), «Profiles of Local Consumer Commerce», JPMorgan Chase & Co. Institute.

Financial Stability Board (2010), «Implementing OTC Derivatives Market Reforms», Financial Stability Board.

Fouquet, R and Broadberry, S (2015), «Seven Centuries of European Economic Growth and Decline», Journal of Economic Perspectives, Vol. 29, No. 4, pp. 227-244.

Ginsberg, J, Hohebbi, M, Patel, R, Brammer, L, Smolinski, M and Brilliant, L (2009), «Detecting influenza epidemics using search engine data», Nature, Vol. 457, pp. 1012-1014.

Goldsmith-Pinkham, P, Hirtle, B and Lucca, D (2016), «Parsing the Content of Bank Supervision», Federal Reserve Bank of New York Staff Reports, No. 770.

Haldane, A (2016), «The Dappled World», speech available at www.bankofengland.co.uk/speech/2016/the-dappled-world

Haldane, A and McMahon, M (forthcoming), «Central Bank Communication and the General Public», American Economic Review: Papers & Proceedings.

Henderson, V, Storeygard, A and Weil, D (2011), «A Bright Idea for Measuring Economic Growth», American Economic Review: Papers & Proceedings, Vol. 101, No. 3, pp. 194-99.

Henke, N, Bughin, J, Chui, M, Manyika, J, Saleh, T, Wiseman, B and Sethupathy, G (2016), «The Age of Analytics: Competing in a Data-Driven World», McKinsey Global Institute.

IMF (2018), «Cyclical Upswing, Structural Change», World Economic Outlook, April 2018.

Lagos, R, Rocheteau, G and Weill, P-O (2011), «Crises and liquidity in over-the-counter markets», Journal of Economic Theory, Vol. 146, No. 6, pp. 2169-2205.

Lazer, D, Kennedy, R, King, G and Vespignani, A (2014), «The Parable of Google Flu: Traps in Big Data Analysis», Science, Vol. 343, pp. 1203-1205.

Leamer, E (1983), «Let's Take the Con Out of Econometrics», American Economic Review, Vol. 73, No. 1, pp. 31-43.

Lehdonvirta, V and Castronova, E (2014), Virtual Economies: Design and Analysis, MIT Press.

Levitt, S, List, J, Neckermann, S and Nelson, D (2016), «Quantity discounts on a virtual good: The results of a massive pricing experiment at Kind Digital Entertainment», Proceedings of the National Academy of Sciences of the United States of America, Vol. 113, No. 27, pp. 7323-7328.

Moore, G (1965), «Cramming more components onto integrated circuits», Electronics, Vol. 38, No. 8.

Nyman, R, Kapadia, S, Tuckett, D, Gregory, D, Ormerod, P and Smith, R (2018), «News and narratives in financial systems: exploiting big data for systemic risk assessment», Bank of England Staff Working Paper, No. 704.

Obstfeld, M and Rogoff, K (2001), «The Six Major Puzzles in International Macroeconomics: Is There a Common Cause?», NBER Macroeconomics Annual, Vol. 15, MIT Press.

Popper, K (1934), Logik der Forschung, Akademie Verlag.

Popper, K (1959), The Logic of Scientific Discovery, Routledge.

Rotemberg, J (1984), «Interpreting the Statistical Failures of Some Rational Expectations Models», American Economic Review, Vol. 74, No. 2, pp. 188-193.

Sabouni, H (2018), «The Rhythm of Markets», mimeo.

Schonhardt-Bailey, C (2013), Deliberating American Monetary Policy: A Textual Analysis, MIT Press.

Schwab, K (2017), The Fourth Industrial Revolution, Portfolio Penguin.

Shiller, R (2017), «Narrative Economics», American Economic Review, Vol. 104, No. 4, pp. 967-1004.

SINTEF (2013), «Big Data, for better or worse: 90% of world»s data generated over last two years», ScienceDaily, 22 May.

Stiglitz, J (2018), «Where modern macroeconomics went wrong», Oxford Review of Economy Policy, Vol. 34, No. 1-2, pp. 70-106.

Tuckett, D and Nyman, R (2017), «The relative sentiment shift series for tracking the economy», mimeo.

Turrell, A, Speigner, B, Thurgood, J, Djumalieva, J and Copple, D (forthcoming), «Using Online Vacancies to Understand the UK Labour Market from the Bottom-Up», Bank of England Staff Working Paper.
