nep-big New Economics Papers
on Big Data
Issue of 2022‒05‒23
twelve papers chosen by
Tom Coupé
University of Canterbury

  1. How causal machine learning can leverage marketing strategies: Assessing and improving the performance of a coupon campaign By Henrika Langen; Martin Huber
  2. Interpretable Prediction of Urban Mobility Flows with Deep Neural Networks as Gaussian Processes By Aike Steentoft; Bu-Sung Lee; Markus Schläpfer
  3. Adversarial Estimators By Jonas Metzger
  4. The Innovation Index in Europe By Leogrande, Angelo; Laureti, Lucio; Costantiello, Alberto
  5. JAQ of All Trades: Job Mismatch, Firm Productivity and Managerial Quality By Luca Coraggio; Marco Pagano; Annalisa Scognamiglio; Joacim Tåg
  6. Advisory algorithms and liability rules By Marie Obidzinski; Yves Oytana
  7. AI, Ageing and Brain-Work Productivity: Technological Change in Professional Japanese Chess By Eiji Yamamura; Ryohei Hayashi
  8. More Than Words: Fed Chairs’ Communication During Congressional Testimonies By Michelle Alexopoulos; Xinfen Han; Oleksiy Kryvtsov; Xu Zhang
  9. THE EFFECTS OF THE ECB COMMUNICATIONS ON FINANCIAL MARKETS BEFORE AND DURING COVID-19 PANDEMIC By Luca Alfieri; Mustafa Hakan Eratalay; Darya Lapitskaya; Rajesh Sharma
  10. Measuring Firm Activity from Outer Space By Katarzyna A. Bilicka; André Seidel
  11. Diagnosis in a fish farmer’s backpack By Barnes, Andrew C.; Das, Suvra; Silayeva, Oleksandra; Wilkinson, Shaun; Cagua, Fernando; Delamare-Deboutteville, Jerome
  12. Privacy Costs and Consumer Data Acquisition: An Economic Analysis of Data Privacy Regulation By Zhijun Chen

  1. By: Henrika Langen; Martin Huber
    Abstract: We apply causal machine learning algorithms to assess the causal effect of a marketing intervention, namely a coupon campaign, on the sales of a retail company. Besides assessing the average impacts of different types of coupons, we also investigate the heterogeneity of causal effects across subgroups of customers, e.g. across clients with relatively high vs. low previous purchases. Finally, we use optimal policy learning to learn (in a data-driven way) which customer groups should be targeted by the coupon campaign in order to maximize the marketing intervention's effectiveness in terms of sales. Our study provides a use case for the application of causal machine learning in business analytics, in order to evaluate the causal impact of specific firm policies (like marketing campaigns) for decision support.
    Date: 2022–04
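The subgroup analysis described in the abstract can be illustrated with a minimal sketch (not the authors' code; the segment names and numbers below are hypothetical, and simple mean-differencing stands in for the causal machine learning machinery, which is valid only under randomized coupon assignment): estimate a conditional average treatment effect (CATE) per customer segment by differencing mean sales between couponed and non-couponed customers, then target only the segments with a positive estimated effect.

```python
from collections import defaultdict

def estimate_cate(records):
    """Estimate a conditional average treatment effect (CATE) per segment
    by differencing mean outcomes of treated vs. control units."""
    sums = defaultdict(lambda: {1: [0.0, 0], 0: [0.0, 0]})
    for segment, treated, sales in records:
        cell = sums[segment][treated]
        cell[0] += sales
        cell[1] += 1
    cate = {}
    for segment, cells in sums.items():
        mean_treated = cells[1][0] / cells[1][1]
        mean_control = cells[0][0] / cells[0][1]
        cate[segment] = mean_treated - mean_control
    return cate

def target_segments(cate):
    """Simple policy rule: coupon only segments with a positive estimated effect."""
    return {s for s, effect in cate.items() if effect > 0}

# Hypothetical data: (customer segment, received coupon?, sales)
data = [
    ("high_prev_purchases", 1, 120.0), ("high_prev_purchases", 0, 100.0),
    ("high_prev_purchases", 1, 130.0), ("high_prev_purchases", 0, 110.0),
    ("low_prev_purchases", 1, 40.0), ("low_prev_purchases", 0, 50.0),
    ("low_prev_purchases", 1, 45.0), ("low_prev_purchases", 0, 55.0),
]
cate = estimate_cate(data)
targets = target_segments(cate)
```

The optimal policy learning step in the paper does this kind of targeting in a data-driven way over many covariates; the positive-effect rule above is its simplest special case.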
  2. By: Aike Steentoft; Bu-Sung Lee; Markus Schläpfer
    Abstract: The ability to understand and predict the flows of people in cities is crucial for the planning of transportation systems and other urban infrastructures. Deep-learning approaches are powerful since they can capture non-linear relations between geographic features and the resulting mobility flow from a given origin location to a destination location. However, existing methods cannot quantify the uncertainty of the predictions, limiting their interpretability and thus their use for practical applications in urban infrastructure planning. To that end, we propose a Bayesian deep-learning approach that formulates deep neural networks as Gaussian processes and integrates automatic variable selection. Our method provides uncertainty estimates for the predicted origin-destination flows while also allowing to identify the most critical geographic features that drive the mobility patterns. The developed machine learning approach is applied to large-scale taxi trip data from New York City.
    Keywords: mobility, Bayesian deep learning, smart cities, transportation system planning
    JEL: C45 R41
    Date: 2022–05
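The paper's neural-networks-as-Gaussian-processes construction is involved, but the way a Gaussian process attaches an uncertainty estimate to each prediction can be shown with a plain one-dimensional GP regression sketch (an illustration, not the authors' method; the RBF kernel, noise level, and data points are made up): the posterior variance is small near observed points and reverts to the prior variance far from them.

```python
import math

def rbf(a, b, ls=1.0):
    """Squared-exponential (RBF) kernel on scalars."""
    return math.exp(-0.5 * ((a - b) / ls) ** 2)

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def gp_predict(xs, ys, x_star, noise=0.1):
    """Posterior mean and variance of a GP with RBF kernel at x_star."""
    n = len(xs)
    K = [[rbf(xs[i], xs[j]) + (noise if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    k_star = [rbf(x, x_star) for x in xs]
    alpha = solve(K, ys)     # K^{-1} y
    v = solve(K, k_star)     # K^{-1} k_*
    mean = sum(ks * a for ks, a in zip(k_star, alpha))
    var = rbf(x_star, x_star) - sum(ks * vi for ks, vi in zip(k_star, v))
    return mean, var

xs, ys = [0.0, 1.0, 2.0], [0.0, 1.0, 0.0]
m_near, v_near = gp_predict(xs, ys, 1.0)  # at a training point: small variance
m_far, v_far = gp_predict(xs, ys, 6.0)    # far from the data: variance near the prior
```

For origin-destination flows the inputs would be geographic features rather than a scalar, but the uncertainty mechanism is the same.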
  3. By: Jonas Metzger
    Abstract: We develop an asymptotic theory of adversarial estimators (`A-estimators'). Like maximum-likelihood-type estimators (`M-estimators'), both the estimator and estimand are defined as the critical points of a sample and population average respectively. A-estimators generalize M-estimators, as their objective is maximized by one set of parameters and minimized by another. The continuous-updating Generalized Method of Moments estimator, popular in econometrics and causal inference, is among the earliest members of this class which distinctly falls outside the M-estimation framework. Since the recent success of Generative Adversarial Networks, A-estimators have received considerable attention in both machine learning and causal inference contexts, where a flexible adversary can remove the need for researchers to manually specify which features of a problem are important. We present general results characterizing the convergence rates of A-estimators under both point-wise and partial identification, and derive the asymptotic root-n normality for plug-in estimates of smooth functionals of their parameters. All unknown parameters may contain functions which are approximated via sieves. While the results apply generally, we provide easily verifiable, low-level conditions for the case where the sieves correspond to (deep) neural networks. Our theory also yields the asymptotic normality of general functionals of neural network M-estimators (as a special case), overcoming technical issues previously identified by the literature. We examine a variety of A-estimators proposed across econometrics and machine learning and use our theory to derive novel statistical results for each of them. Embedding distinct A-estimators into the same framework, we notice interesting connections among them, providing intuition and formal justification for their recent success in practical applications.
    Date: 2022–04
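In the notation suggested by the abstract, an A-estimator and its estimand can be written as saddle points of a sample average and a population average respectively (a schematic rendering, not the paper's exact notation; $f$, $\Theta$, and $\Lambda$ are placeholder symbols for the criterion function and the two parameter sets):

```latex
\hat{\theta} \in \arg\max_{\theta \in \Theta} \; \min_{\lambda \in \Lambda} \;
\frac{1}{n} \sum_{i=1}^{n} f(\theta, \lambda, X_i),
\qquad
\theta_0 \in \arg\max_{\theta \in \Theta} \; \min_{\lambda \in \Lambda} \;
\mathbb{E}\!\left[ f(\theta, \lambda, X) \right].
```

Taking $\Lambda$ to be a singleton removes the inner minimization and recovers the usual M-estimation problem, which is the sense in which A-estimators generalize M-estimators.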
  4. By: Leogrande, Angelo; Laureti, Lucio; Costantiello, Alberto
    Abstract: The following article analyzes the determinants of the innovation index in Europe. The data refer to the European Innovation Scoreboard (EIS) of the European Commission for the period between 2010 and 2019 for 36 countries. The data are analyzed using the following econometric techniques: Panel Data with Random Effects, Panel Data with Fixed Effects, Dynamic Panel Data, Pooled OLS, and WLS. The results show that the Innovation Index is negatively connected to some variables, among which the most significant are "GDP per capita", "R&D expenditure public sector", "Venture capital", and "Tertiary education", and positively connected to others, among which the most relevant are "Government procurement of advanced technology products", "Average annual population growth", "Finance and support", "Human resources", "Marketing or organisational innovators", and "Linkages". A clustering was then carried out using the unsupervised k-Means algorithm, optimized with the Silhouette coefficient, which shows the presence of two clusters by value of the Innovation Index. Eight machine learning algorithms have been used for prediction with real data, and the Tree Ensemble Regression algorithm was chosen as the best performer. A further prediction was made with augmented data; there, the best-performing algorithm is Linear Regression, with the innovation index value predicted to grow by approximately 3.38%.
    Keywords: Innovation, and Invention: Processes and Incentives; Management of Technological Innovation and R&D; Diffusion Processes; Open Innovation.
    JEL: O30 O31 O32 O33 O34
    Date: 2022–04–23
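The clustering step the abstract describes, k-Means with the number of clusters chosen by the Silhouette coefficient, can be sketched from scratch as follows (an illustration, not the authors' pipeline; the one-dimensional index values below are made up, and the data are assumed well separated so the clustering is stable):

```python
import random

def kmeans(points, k, iters=50, seed=0):
    """Lloyd's k-Means on 1-D data; returns a cluster label per point."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        for i, p in enumerate(points):
            labels[i] = min(range(k), key=lambda c: abs(p - centers[c]))
        for c in range(k):
            members = [p for i, p in enumerate(points) if labels[i] == c]
            if members:
                centers[c] = sum(members) / len(members)
    return labels

def silhouette(points, labels, k):
    """Mean silhouette coefficient (b - a) / max(a, b) over all points."""
    total, counted = 0.0, 0
    for i, p in enumerate(points):
        own = [abs(p - q) for j, q in enumerate(points)
               if labels[j] == labels[i] and j != i]
        a = sum(own) / len(own) if own else 0.0
        others = []
        for c in range(k):
            if c == labels[i]:
                continue
            dists = [abs(p - q) for j, q in enumerate(points) if labels[j] == c]
            if dists:
                others.append(sum(dists) / len(dists))
        if not others:
            continue
        b = min(others)
        total += (b - a) / max(a, b)
        counted += 1
    return total / counted if counted else -1.0

# Hypothetical innovation-index values for eight countries.
index_values = [0.28, 0.31, 0.33, 0.35, 0.62, 0.66, 0.70, 0.73]
best_k = max(range(2, 5),
             key=lambda k: silhouette(index_values, kmeans(index_values, k), k))
```

On data with two well-separated groups, the silhouette score peaks at k = 2, mirroring the two clusters the authors report.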
  5. By: Luca Coraggio (University of Naples Federico II); Marco Pagano (University of Naples Federico II and EIEF); Annalisa Scognamiglio (University of Naples Federico II); Joacim Tåg (Research Institute of Industrial Economics (IFN))
    Abstract: Does the matching between workers and jobs help explain productivity differentials across firms? To address this question we develop a job-worker allocation quality measure (JAQ) by combining employer-employee administrative data with machine learning techniques. The proposed measure is positively and significantly associated with labor earnings over workers’ careers. At firm level, it features a robust positive correlation with firm productivity, and with managerial turnover leading to an improvement in the quality and experience of management. JAQ can be constructed for any employer-employee data including workers’ occupations, and used to explore the effect of corporate restructuring on workers’ allocation and careers.
    Date: 2022
  6. By: Marie Obidzinski (Université Paris Panthéon Assas, CRED EA 7321, 75005 Paris, France); Yves Oytana (CRESE EA3190, Univ. Bourgogne Franche-Comté, F-25000 Besançon, France)
    Abstract: We study the design of optimal liability rules when the use of an advisory algorithm by a human operator (she) may generate an external harm. An artificial intelligence (AI) manufacturer (he) chooses the level of quality with which the algorithm is developed and the price at which it is distributed. The AI gives a prediction about the state of the world to the human operator who buys it, who can then decide to exert a judgment effort to learn the payoffs in each possible state of the world. We show that when the human operator overestimates the algorithm's accuracy (overestimation bias), imposing a strict liability rule on her is not optimal, because the AI manufacturer will exploit the bias by under-investing in the quality of the algorithm. Conversely, imposing a strict liability rule on the AI manufacturer may not be optimal either, since it has the adverse effect of preventing the human operator from exercising her judgment effort. We characterize the liability sharing rule that achieves the highest possible quality level of the algorithm, while ensuring that the human operator exercises a judgment effort. We then show that, when it can be used, a negligence rule generally achieves the first-best optimum. To conclude, we discuss the pros and cons of each type of rule.
    Keywords: Liability rules, Decision-making, Artificial intelligence, Cognitive bias, Judgment, Prediction
    JEL: K4
    Date: 2022–05
  7. By: Eiji Yamamura; Ryohei Hayashi
    Abstract: Using Japanese professional chess (Shogi) players' records in a novel setting, this paper examines how, and to what extent, technological change influences the effects of ageing and innate ability on players' winning probability. We gathered games of professional Shogi players from 1968 to 2019. The major findings are: (1) the diffusion of artificial intelligence (AI) reduces the effect of innate ability, which narrows the performance gap among same-age players; (2) players' winning rates decline consistently from age 20 onward; (3) AI accelerated the age-related decline in the probability of winning, which widened the performance gap among players of different ages; (4) the effects of AI on the age-related decline in the probability of winning are observed for players with high innate skill but not for those with low innate skill. This implies that the diffusion of AI hastens players' retirement from active play, especially for those with high innate abilities. Thus, AI is a substitute for innate ability in brain-work productivity.
    Date: 2022–04
  8. By: Michelle Alexopoulos; Xinfen Han; Oleksiy Kryvtsov; Xu Zhang
    Abstract: We measure soft information contained in the congressional testimonies of U.S. Federal Reserve Chairs and analyze its effect on financial markets. Our measures of Fed Chairs’ emotions expressed in words, voice and facial expressions are created using machine learning. Increases in the Chair’s text-, voice-, or face-emotion indices during these testimonies generally raise the S&P 500 index and lower the VIX, indicating that these cues help shape market responses to Fed communications. These effects add up and propagate after the testimony, reaching magnitudes comparable to those after a policy rate cut. Markets respond most to the Chair’s emotions expressed about issues related to monetary policy.
    Keywords: Central bank research; Financial markets; Monetary policy communications
    JEL: E52 E58 E71
    Date: 2022–05
  9. By: Luca Alfieri; Mustafa Hakan Eratalay; Darya Lapitskaya; Rajesh Sharma
    Abstract: The paper aims to estimate the effects of the European Central Bank communications on the sectoral returns of STOXX Europe 600 from 2013 to 2021. Previous literature has investigated the effects of communications of central banks and checked their effects on macroeconomic and financial data. New opportunities offered by text mining analysis allow us to find new insights into these aspects. However, studies focusing on how text mining indices derived from central banks’ communications can affect different financial sectors are more limited. In this paper, we use different sentiment and topic indices derived from the European Central Bank’s speeches. The paper shows how these different topics and sentiment indices affect the returns on different financial sectors. Our results indicate that the topic of communications is more influential on returns of sectoral indices than the type of communications. Moreover, we find that monetary policy and financial stability topics are the most relevant. We also find that during the COVID-19 period, the number of negative speeches is relevant for almost all the sectoral index returns.
    Keywords: Monetary policy, Central banking, Text mining, COVID-19
    Date: 2022
  10. By: Katarzyna A. Bilicka; André Seidel
    Abstract: To understand how global firm networks operate, we need consistent information on their activities, unbiased by their reporting choices. In this paper, we collect a novel dataset on the light that factories emit at night for a large sample of car manufacturing plants. We show that nightlight data can measure activity at such a granular level, using annual firm financial data and high-frequency data related to Covid-19 pandemic production shocks. We use this data to quantify the extent of misreported global operations of these car manufacturing firms and examine differences between sources of nightlight.
    JEL: F23 H26 H32
    Date: 2022–04
  11. By: Barnes, Andrew C.; Das, Suvra; Silayeva, Oleksandra; Wilkinson, Shaun; Cagua, Fernando; Delamare-Deboutteville, Jerome
    Abstract: Fish underpin future nutritional security, supplying high quality protein, iron, iodine and vitamin A that are critical to childhood development and deficient in many staple foods. In 2018, 54.1 million tonnes of fish were produced by farming, generating US$138.5 billion and directly employing 19.3 million people, mostly in developing nations. With expansion and intensification, disease losses are increasing and are a priority for the FAO sub-committee on aquaculture. In most developing countries, disease mitigation comprises over-stocking to compensate, and use of readily available antibiotics. Indeed 67 different antimicrobials are used in the 11 major producing countries, contributing to the global pool of antimicrobial resistance (AMR). Accurate identification of the causes and sources of infectious disease is essential for implementation of evidence-based treatment, biosecurity and prevention. Pathogen genomics can provide sufficiently detailed information but has, to date, been too expensive and time consuming. Lab-in-a-backpack uses nanopore sequencing technology and low-cost, low-waste sample preparation to generate whole pathogen genome sequence data from diagnostic samples on the farm without laboratory support. Our simplified safe workflow includes a cloud-based identification tool that returns near real-time information about the pathogen using any laptop or smartphone. This enables evidence-based treatment, epidemiological tracing, AMR surveillance and the production of simple low-cost locally produced ‘autogenous’ vaccines to protect the next crop. These big-data-informed but locally implemented solutions align well with FAO’s recently proposed Progressive Management Pathway for Improving Aquaculture Biosecurity, and can deliver real advances in local economy, nutritional security, antimicrobial stewardship and animal welfare.
    Keywords: Food Consumption/Nutrition/Food Safety
    Date: 2021
  12. By: Zhijun Chen (Monash University, Department of Economics)
    Abstract: The General Data Protection Regulation (GDPR) aims to protect consumer data privacy; however, its adverse effects have been widely documented. We present a new model for the analysis of consumer data acquisition under privacy regulation. We treat both data and analytics as separate strategic variables and consider the heterogeneity of privacy costs across consumers. Using this model to examine the impact of the GDPR, we identify a market failure before the GDPR and find that the GDPR activates a market for data acquisition by imposing consent requirements on data acquisition. We further study the optimal design of the mechanism for consumer data acquisition and deliver important policy implications for implementing the social optimum.
    Keywords: Data acquisition, Privacy Costs, and Data Analytics
    JEL: D47 L11 L40 K21
    Date: 2022–04

This nep-big issue is ©2022 by Tom Coupé. It is provided as is without any express or implied warranty. It may be freely redistributed in whole or in part for any purpose. If distributed in part, please include this notice.
General information on the NEP project can be found at . For comments, please write to the director of NEP, Marco Novarese at <>. Put “NEP” in the subject line; otherwise your mail may be rejected.
NEP’s infrastructure is sponsored by the School of Economics and Finance of Massey University in New Zealand.