nep-big 2019-03-18 papers

on Big Data

Issue of 2019‒03‒18
fifteen papers chosen by
Tom Coupé
University of Canterbury

Shapley regressions: A framework for statistical inference on machine learning models By Andreas Joseph
Shapley regressions: a framework for statistical inference on machine learning models By Joseph, Andreas
Spatial inequality, geography and economic activity By Sandra Achten; Christian Leßmann
Nowcasting Recessions using the SVM Machine Learning Algorithm By Alexander James; Yaser S. Abu-Mostafa; Xiao Qiao
Investments in big data analytics and firm performance: an empirical investigation of direct and mediating effects By Elisabetta Raguseo; Claudio Vitari
Financial Applications of Gaussian Processes and Bayesian Optimization By Joan Gonzalvez; Edmond Lezmi; Thierry Roncalli; Jiali Xu
'Whatever it Takes' to Change Belief: Evidence from Twitter By Michael Stiefel; Rémi Vivès
'Whatever it Takes' to Change Belief: Evidence from Twitter By Michael Stiefel; Rémi Vivès
Measuring Monetary Policy Surprises Using Text Mining: The Case of Korea By Youngjoon Lee; Soohyon Kim; Ki Young Park
Optimal Probabilistic Record Linkage: Best Practice for Linking Employers in Survey and Administrative Data By John M. Abowd; Joelle Abramowitz; Margaret C. Levenstein; Kristin McCue; Dhiren Patki; Trivellore Raghunathan; Ann M. Rodgers; Matthew D. Shapiro; Nada Wasi
Tracking Uncertainty through the Relative Sentiment Shift Series By Seohyun Lee; Rickard Nyman
High-dimensional sparse financial networks through a regularised regression model By Bernardi, Mauro; Costola, Michele
Epidemiology of Inflation Expectations and Internet Search- An Analysis for India By Jha, Saakshi; Sahu, Sohini; Chattopadhyay, Siddhartha
Social Pressure or Rational Reactions to Incentives? A Historical Analysis of Reasons for Referee Bias in the Spanish Football By Tena Horrillo, Juan de Dios; Reade, J. James; Cabras, Stefano
Learning about digital trade: Privacy and e-commerce in CETA and TPP By Robert Wolfe

Shapley regressions: A framework for statistical inference on machine learning models

By:	Andreas Joseph
Abstract:	Machine learning models often excel in the accuracy of their predictions but are opaque due to their non-linear and non-parametric structure. This makes statistical inference challenging and disqualifies them from many applications where model interpretability is crucial. This paper proposes the Shapley regression framework as an approach for statistical inference on non-linear or non-parametric models. Inference is performed based on the Shapley value decomposition of a model, a pay-off concept from cooperative game theory. I show that universal approximators from machine learning are estimation consistent and introduce hypothesis tests for individual variable contributions, model bias and parametric functional forms. The inference properties of state-of-the-art machine learning models - like artificial neural networks, support vector machines and random forests - are investigated using numerical simulations and real-world data. The proposed framework is unique in the sense that it is identical to the conventional case of statistical inference on a linear model if the model is linear in parameters. This makes it a well-motivated extension to more general models and strengthens the case for the use of machine learning to inform decisions.
Date:	2019–03
URL:	http://d.repec.org/n?u=RePEc:arx:papers:1903.04209&r=all

Shapley regressions: a framework for statistical inference on machine learning models

By:	Joseph, Andreas (Bank of England)
Abstract:	Machine learning models often excel in the accuracy of their predictions but are opaque due to their non-linear and non-parametric structure. This makes statistical inference challenging and disqualifies them from many applications where model interpretability is crucial. This paper proposes the Shapley regression framework as an approach for statistical inference on non-linear or non-parametric models. Inference is performed based on the Shapley value decomposition of a model, a pay-off concept from cooperative game theory. I show that universal approximators from machine learning are estimation consistent and introduce hypothesis tests for individual variable contributions, model bias and parametric functional forms. The inference properties of state-of-the-art machine learning models — like artificial neural networks, support vector machines and random forests — are investigated using numerical simulations and real-world data. The proposed framework is unique in the sense that it is identical to the conventional case of statistical inference on a linear model if the model is linear in parameters. This makes it a well-motivated extension to more general models and strengthens the case for the use of machine learning to inform decisions.
Keywords:	Machine learning; statistical inference; Shapley values; numerical simulations; macroeconomics; time series
JEL:	C45 C52 C71 E47
Date:	2019–03–08
URL:	http://d.repec.org/n?u=RePEc:boe:boeewp:0784&r=all

Spatial inequality, geography and economic activity

By:	Sandra Achten; Christian Leßmann
Abstract:	We study the effect of spatial inequality on economic activity. Given that the relationship is highly simultaneous in nature, we use exogenous variation in geographic features to construct an instrument for spatial inequality, which is independent of any man-made factors. Inequality measures and instruments are calculated based on grid-level data for existing countries as well as for artificial countries. In the construction of the instrumental variable, we use both a parametric regression analysis and a random forest classification algorithm. Our IV regressions show a significant negative relationship between spatial inequality and economic activity. This result holds if we control for country-level averages of different geographic variables. Therefore, we conclude that geographic heterogeneity is an important determinant of economic activity.
Keywords:	regional inequality, spatial inequality, economic activity, development, geography, machine learning
JEL:	R12 O15
Date:	2019
URL:	http://d.repec.org/n?u=RePEc:ces:ceswps:_7547&r=all

Nowcasting Recessions using the SVM Machine Learning Algorithm

By:	Alexander James; Yaser S. Abu-Mostafa; Xiao Qiao
Abstract:	We introduce a novel application of Support Vector Machines (SVM), an important Machine Learning algorithm, to determine the beginning and end of recessions in real time. Nowcasting, "forecasting" a condition about the present time because the full information about it is not available until later, is key for recessions, which are only determined months after the fact. We show that SVM has excellent predictive performance for this task, and we provide implementation details to facilitate its use in similar problems in economics and finance.
Date:	2019–02
URL:	http://d.repec.org/n?u=RePEc:arx:papers:1903.03202&r=all

Investments in big data analytics and firm performance: an empirical investigation of direct and mediating effects

By:	Elisabetta Raguseo (Polito - Politecnico di Torino [Torino]); Claudio Vitari (IAE de Paris)
Date:	2017–11–06
URL:	http://d.repec.org/n?u=RePEc:hal:journl:halshs-01923259&r=all

Financial Applications of Gaussian Processes and Bayesian Optimization

By:	Joan Gonzalvez; Edmond Lezmi; Thierry Roncalli; Jiali Xu
Abstract:	In the last five years, the financial industry has been impacted by the emergence of digitalization and machine learning. In this article, we explore two methods that have undergone rapid development in recent years: Gaussian processes and Bayesian optimization. Gaussian processes can be seen as a generalization of Gaussian random vectors and are associated with the development of kernel methods. Bayesian optimization is an approach for performing derivative-free global optimization in a small dimension, and uses Gaussian processes to locate the global maximum of a black-box function. The first part of the article reviews these two tools and shows how they are connected. In particular, we focus on the Gaussian process regression, which is the core of Bayesian machine learning, and the issue of hyperparameter selection. The second part is dedicated to two financial applications. We first consider the modeling of the term structure of interest rates. More precisely, we test the fitting method and compare the GP prediction and the random walk model. The second application is the construction of trend-following strategies, in particular the online estimation of trend and covariance windows.
Date:	2019–03
URL:	http://d.repec.org/n?u=RePEc:arx:papers:1903.04841&r=all

'Whatever it Takes' to Change Belief: Evidence from Twitter

By:	Michael Stiefel (Department of Economics, University of Zurich); Rémi Vivès (Aix-Marseille Univ., CNRS, EHESS, Centrale Marseille, AMSE)
Abstract:	The sovereign debt literature emphasizes the possibility of avoiding a self-fulfilling default crisis if markets anticipate the central bank to act as lender of last resort. This paper investigates the extent to which changes in belief about an intervention of the European Central Bank (ECB) explain the sudden reduction of government bond spreads for the distressed countries in summer 2012. We study Twitter data and extract belief using machine learning techniques. We find evidence of strong increases in the perceived likelihood of ECB intervention and show that those increases explain subsequent decreases in the bond spreads of the distressed countries.
Keywords:	self-fulfilling default crisis, unconventional monetary policy, Twitter data
JEL:	E44 E58 D83 F34
Date:	2019–02
URL:	http://d.repec.org/n?u=RePEc:aim:wpaimx:1907&r=all

'Whatever it Takes' to Change Belief: Evidence from Twitter

By:	Michael Stiefel (Department of Economics, University of Zurich); Rémi Vivès (AMSE - Aix-Marseille Sciences Economiques - EHESS - École des hautes études en sciences sociales - AMU - Aix Marseille Université - ECM - Ecole Centrale de Marseille - CNRS - Centre National de la Recherche Scientifique)
Abstract:	The sovereign debt literature emphasizes the possibility of avoiding a self-fulfilling default crisis if markets anticipate the central bank to act as lender of last resort. This paper investigates the extent to which changes in belief about an intervention of the European Central Bank (ECB) explain the sudden reduction of government bond spreads for the distressed countries in summer 2012. We study Twitter data and extract belief using machine learning techniques. We find evidence of strong increases in the perceived likelihood of ECB intervention and show that those increases explain subsequent decreases in the bond spreads of the distressed countries.
Keywords:	self-fulfilling default crisis, unconventional monetary policy, Twitter data
Date:	2019–02
URL:	http://d.repec.org/n?u=RePEc:hal:wpaper:halshs-02053429&r=all

Measuring Monetary Policy Surprises Using Text Mining: The Case of Korea

By:	Youngjoon Lee (School of Business, Yonsei University); Soohyon Kim (Economic Research Institute, Bank of Korea); Ki Young Park (School of Economics, Yonsei University)
Abstract:	We propose a novel approach to measure monetary policy shocks using sentiment analysis. We quantify the tones of 24,079 news articles around 152 dates of Monetary Policy Board (MPB) meetings of the Bank of Korea (BOK) from March 2005 to November 2017. We then measure monetary policy surprises using the changes of those tones following monetary policy announcements and estimate the impact of monetary policy surprises on asset prices. Our measure of monetary policy surprises better explains changes in long-term rates, while changes in the Bank of Korea's base rate are more closely associated with changes in short-term rates (maturity of one year or less). Our results strongly suggest that using a text mining approach to measure monetary policy surprises sheds light on information related to forward guidance and market expectations on future monetary policy.
Keywords:	Monetary policy; Text mining; Central banking; Bank of Korea
JEL:	E43 E52 E58
Date:	2019–03–06
URL:	http://d.repec.org/n?u=RePEc:bok:wpaper:1911&r=all

Optimal Probabilistic Record Linkage: Best Practice for Linking Employers in Survey and Administrative Data

By:	John M. Abowd; Joelle Abramowitz; Margaret C. Levenstein; Kristin McCue; Dhiren Patki; Trivellore Raghunathan; Ann M. Rodgers; Matthew D. Shapiro; Nada Wasi
Abstract:	This paper illustrates an application of record linkage between a household-level survey and an establishment-level frame in the absence of unique identifiers. Linkage between frames in this setting is challenging because the distribution of employment across firms is highly asymmetric. To address these difficulties, this paper uses a supervised machine learning model to probabilistically link survey respondents in the Health and Retirement Study (HRS) with employers and establishments in the Census Business Register (BR) to create a new data source which we call the CenHRS. Multiple imputation is used to propagate uncertainty from the linkage step into subsequent analyses of the linked data. The linked data reveal new evidence that survey respondents’ misreporting and selective nonresponse about employer characteristics are systematically correlated with wages.
Keywords:	Probabilistic record linkage; survey data; administrative data; multiple imputation; measurement error; nonresponse
Date:	2019–03
URL:	http://d.repec.org/n?u=RePEc:cen:wpaper:19-08&r=all

Tracking Uncertainty through the Relative Sentiment Shift Series

By:	Seohyun Lee (Economic Research Institute, Bank of Korea); Rickard Nyman (University College London, Centre for the Study of Decision-Making Uncertainty)
Abstract:	We examine the causal dynamic relationship between economic policy uncertainty and economic activities, using a Local Projection model with external instruments. Based on the psychological theory of conviction narratives, we construct a Relative Sentiment Shift (RSS) index and use it as an instrumental variable that captures exogenous variations in economic policy uncertainty. Our empirical results suggest that an increase in economic policy uncertainty induces recessionary pressures in the economy: reductions in production and employment, a sharp stock market downturn, and a constrained financial market.
Keywords:	Economic narratives, Algorithmic text analysis, Uncertainty, Dynamic causal effect, Local projection, IV Regression
JEL:	E2 D81 C1
Date:	2019–03–06
URL:	http://d.repec.org/n?u=RePEc:bok:wpaper:1912&r=all

High-dimensional sparse financial networks through a regularised regression model

By:	Bernardi, Mauro; Costola, Michele
Abstract:	We propose a shrinkage and selection methodology specifically designed for network inference using high-dimensional data through a regularised linear regression model with a Spike-and-Slab prior on the parameters. The approach extends to the case where the error terms are heteroscedastic, by adding an ARCH-type equation through an approximate Expectation-Maximisation algorithm. The proposed model accounts for two sets of covariates. The first set contains predetermined variables which are not penalised in the model (i.e., the autoregressive component and common factors) while the second set of variables contains all the (lagged) financial institutions in the system, included with a given probability. The financial linkages are expressed in terms of inclusion probabilities resulting in a weighted directed network where the adjacency matrix is built "row by row". In the empirical application, we estimate the network over time using a rolling window approach on 1248 world financial firms (banks, insurers, brokers and other financial services), both active and dead, from 29 December 2000 to 6 October 2017 at a weekly frequency. Findings show that over time the shape of the out-degree distribution exhibits the typical behavior of financial stress indicators and represents a significant predictor of market returns at the first lag (one week) and the fourth lag (one month).
Keywords:	VAR estimation, Financial Networks, Bayesian inference, Sparsity, Spike-and-Slab prior, Stochastic Search Variable Selection, Expectation-Maximisation
Date:	2019
URL:	http://d.repec.org/n?u=RePEc:zbw:safewp:244&r=all

Epidemiology of Inflation Expectations and Internet Search- An Analysis for India

By:	Jha, Saakshi; Sahu, Sohini; Chattopadhyay, Siddhartha
Abstract:	This paper investigates how inflation expectations of individuals are formed in India. We investigate if the news on inflation plays a role in the formation of inflation expectations following the epidemiology-based work by Carroll (2003). The standard literature on this topic considers news coverage by the print and audio-visual media as the sources of formation of inflation expectations. Instead, we consider the Internet as a potential common source of information based on which agents form their expectations about future inflation. Based on data extracted from Google Trends, our results indicate that during the period 2006 to 2018, the Internet has indeed been a common source of information based on which agents have formed their expectations about future inflation, and the Internet search sentiment has had some impact on inflation expectations. Additionally, based on the inflation expectations series derived from the Google Trends data, we find that “information stickiness” is present in the system, since only a small fraction of the population updates its inflation expectations each period.
Keywords:	Inflation expectations, Epidemiology, Internet search, Google Trends, India.
JEL:	D84 E31 E58
Date:	2019–03–06
URL:	http://d.repec.org/n?u=RePEc:pra:mprapa:92666&r=all

Social Pressure or Rational Reactions to Incentives? A Historical Analysis of Reasons for Referee Bias in the Spanish Football

By:	Tena Horrillo, Juan de Dios; Reade, J. James; Cabras, Stefano
Abstract:	A relevant question in social science is whether cognitive bias can be instigated by social pressure or whether it is just a rational reaction to the incentives in place. Sport, and association football in particular, offers settings in which to gain insights into this question. In this paper we estimate the determinants of the length of time between referee appointments in Spanish soccer as a function of referee decisions in favour of the home and away team in the most recent match by means of a deep-learning model. This approach allows us to capture all interactions among a high-dimensional set of variables without the necessity of specifying them beforehand. Furthermore, deep-learning models are nowadays the state of the art among the predictive models needed, and used here, for estimating the effects of a cause. We do not find strong evidence of an incentive scheme that counteracts well-known home referee biases. Our results also suggest that referees are incentivised to deliver a moderate amount of surprise in the outcome of the game, which is consistent with the objective function of consumers and tournament organisers.
Keywords:	Sport; Social pressure; Referee bias; Deep-learning model; Causal analysis
Date:	2019–03
URL:	http://d.repec.org/n?u=RePEc:cte:wsrepe:28198&r=all

Learning about digital trade: Privacy and e-commerce in CETA and TPP

By:	Robert Wolfe
Abstract:	It is a truth universally acknowledged that every ambitious 21st century trade agreement is in want of a chapter on electronic commerce. One of the most politically sensitive and technically challenging issues is personal privacy, including cross-border transfer of information by electronic means, use and location of computing facilities, and personal information protection. States are learning to solve the problem of state responsibility for something that does not respect their borders while still allowing 21st century commerce to develop. A comparison of the Canada-European Union Comprehensive Economic and Trade Agreement (CETA) and the Trans-Pacific Partnership (TPP) allows us to see the evolution of the issues thought necessary for an e-commerce chapter, since both include Canada, and to see the differing priorities of the U.S. and the EU, since they are each signatory to one of the agreements, but not of the other. I conclude by seeking generalizations about why we see a mix of aspirational and obligatory provisions in free trade agreements. I suggest that the reasons are that governments are learning how to work with each other in a new domain, and learning about the trade implications of these issues.
Keywords:	digital trade, electronic commerce, trade agreements
Date:	2018–05
URL:	http://d.repec.org/n?u=RePEc:rsc:rsceui:2018/27&r=all

This nep-big issue is ©2019 by Tom Coupé. It is provided as is without any express or implied warranty. It may be freely redistributed in whole or in part for any purpose. If distributed in part, please include this notice.

General information on the NEP project can be found at http://nep.repec.org. For comments please write to the director of NEP, Marco Novarese at <director@nep.repec.org>. Put “NEP” in the subject, otherwise your mail may be rejected.

NEP’s infrastructure is sponsored by the School of Economics and Finance of Massey University in New Zealand.
