nep-big New Economics Papers
on Big Data
Issue of 2020‒05‒18
seventeen papers chosen by
Tom Coupé
University of Canterbury

  1. An Economic Approach to Regulating Algorithms By Ashesh Rambachan; Jon Kleinberg; Sendhil Mullainathan; Jens Ludwig
  2. The Impact of the Wuhan Covid-19 Lockdown on Air Pollution and Health: A Machine Learning and Augmented Synthetic Control Approach By Matthew A Cole; Robert J R Elliott; Bowen Liu
  3. The perils of misusing remote sensing data: The case of forest cover By Leopoldo Fergusson; Santiago Saavedra; Juan F. Vargas
  4. Measuring the Occupational Impact of AI: Tasks, Cognitive Abilities and AI Benchmarks By Songul Tolan; Annarosa Pesole; Fernando Martinez-Plumed; Enrique Fernandez-Macias; José Hernandez-Orallo; Emilia Gomez
  5. Estimating Full Lipschitz Constants of Deep Neural Networks By Calypso Herrera; Florian Krach; Josef Teichmann
  6. Denise: Deep Learning based Robust PCA for Positive Semidefinite Matrices By Calypso Herrera; Florian Krach; Josef Teichmann
  7. Expert Imitation in P2P Markets By Ge Gao; Mustafa Caglayan; Yuelei Li; Oleksandr Talavera
  8. Intelligence artificielle, croissance et emploi : le rôle des politiques By Philippe Aghion; Céline Antonin; Simon Bunel
  9. Digitising Agrifood: Pathways and Challenges By Renda, Andrea; Reynolds, Nicole; Laurer, Moritz; Cohen, Gal
  10. Political referenda and investment: evidence from Scotland By Azqueta-Gavaldon, Andres
  11. Artificial Intelligence and Cybersecurity - Task Force Evaluation of the HLEG Trustworthy AI Assessment List (Pilot Version) By Pupillo, Lorenzo; Ferreira, Afonso; Fantin, Stefano
  12. Local Governance Quality and the Environmental Cost of Forced Migration By Aksoy, Cevat Giray; Tumen, Semih
  13. Forecasting Foreign Exchange Rate Movements with k-Nearest-Neighbour, Ridge Regression and Feed-Forward Neural Networks By Milan Fičura
  14. Structural Regularization By Jiaming Mao; Zhesheng Zheng
  15. Drawing policy suggestions to fight Covid-19 from hardly reliable data. A machine-learning contribution on lockdowns analysis. By Bonacini, Luca; Gallo, Giovanni; Patriarca, Fabrizio
  16. Do ECB introductory statements help to predict monetary policy: evidence from tone analysis By Paweł Baranowski; Hamza Bennani; Wirginia Doryń
  17. Words and deeds in managing expectations: empirical evidence on an inflation targeting economy By Paweł Baranowski; Wirginia Doryń; Tomasz Łyziak; Ewa Stanisławska

  1. By: Ashesh Rambachan; Jon Kleinberg; Sendhil Mullainathan; Jens Ludwig
    Abstract: There is growing concern about "algorithmic bias" - that predictive algorithms used in decision-making might bake in or exacerbate discrimination in society. When will these "biases" arise? What should be done about them? We argue that such questions are naturally answered using the tools of welfare economics: a social welfare function for the policymaker, a private objective function for the algorithm designer and a model of their information sets and interaction. We build such a model that allows the training data to exhibit a wide range of "biases." Prevailing wisdom is that biased data change how the algorithm is trained and whether an algorithm should be used at all. In contrast, we find two striking irrelevance results. First, when the social planner builds the algorithm, her equity preference has no effect on the training procedure. So long as the data, however biased, contain signal, they will be used and the algorithm built on top will be the same. Any characteristic that is predictive of the outcome of interest, including group membership, will be used. Second, we study how the social planner regulates private (possibly discriminatory) actors building algorithms. Optimal regulation depends crucially on the disclosure regime. Absent disclosure, algorithms are regulated much like human decision-makers: disparate impact and disparate treatment rules dictate what is allowed. In contrast, under stringent disclosure of all underlying algorithmic inputs (data, training procedure and decision rule), once again we find an irrelevance result: private actors can use any predictive characteristic. Additionally, now algorithms strictly reduce the extent of discrimination against protected groups relative to a world in which humans make all the decisions. As these results run counter to prevailing wisdom on algorithmic bias, at a minimum, they provide a baseline set of assumptions that must be altered to generate different conclusions.
    JEL: C54 D6 J7 K00
    Date: 2020–05
  2. By: Matthew A Cole (University of Birmingham); Robert J R Elliott (University of Birmingham); Bowen Liu (University of Birmingham)
    Abstract: We quantify the impact of the Wuhan Covid-19 lockdown on concentrations of four air pollutants using a two-step approach. First, we use machine learning to remove the confounding effects of weather conditions on pollution concentrations. Second, we use a new Augmented Synthetic Control Method (Ben-Michael et al. 2019) to estimate the impact of the lockdown on weather-normalised pollution relative to a control group of cities that were not in lockdown. We find NO2 concentrations fell by as much as 24 µg/m³ during the lockdown (a reduction of 63% from the pre-lockdown level), while PM10 concentrations fell by a similar amount but for a shorter period. The lockdown had no discernible impact on concentrations of SO2 or CO. We calculate that the reduction of NO2 concentrations could have prevented as many as 496 deaths in Wuhan city, 3,368 deaths in Hubei province and 10,822 deaths in China as a whole.
    Keywords: Air pollution, Covid-19, machine learning, synthetic control, health.
    JEL: Q53 Q52 I18 I15 C21 C23
    Date: 2020–05
  3. By: Leopoldo Fergusson; Santiago Saavedra; Juan F. Vargas
    Abstract: Research on deforestation has grown exponentially due to the availability of satellite-based measures of forest cover. One of the most popular is Global Forest Change (GFC). Using GFC, we estimate that the Colombian civil conflict increases ‘forest cover’. Using an alternative source that validates the same remote sensing images on the ground, we find the opposite effect. This occurs because, in spite of its name, GFC measures tree cover, including vegetation other than native forest. Most users of GFC seem unaware of this. In our case, most of the conflicting results are explained by GFC’s misclassification of oil palm crops as ‘forest’. Our findings call for caution when using automated classification of imagery for specific research questions.
    Keywords: Forest Cover, Conflict, Measurement
    JEL: D74 Q23 Q34
    Date: 2020–05–07
  4. By: Songul Tolan (European Commission - JRC); Annarosa Pesole (European Commission - JRC); Fernando Martinez-Plumed (European Commission - JRC); Enrique Fernandez-Macias (European Commission - JRC); José Hernandez-Orallo (Universitat Politècnica de València); Emilia Gomez (European Commission - JRC)
    Abstract: In this paper we develop a framework for analysing the impact of AI on occupations. Leaving aside the debates on robotisation, digitalisation and online platforms as well as workplace automation, we focus on the occupational impact of AI that is driven by rapid progress in machine learning. In our framework we map 59 generic tasks from several worker surveys and databases to 14 cognitive abilities (that we extract from the cognitive science literature) and these to a comprehensive list of 328 AI benchmarks used to evaluate progress in AI techniques. The use of these cognitive abilities as an intermediate mapping, instead of mapping task characteristics to AI tasks, allows for an analysis of AI’s occupational impact that goes beyond automation. An application of our framework to occupational databases gives insights into the abilities through which AI is most likely to affect jobs and allows for a ranking of occupations with respect to AI impact. Moreover, we find that some jobs that were traditionally less affected by previous waves of automation may now be subject to relatively higher AI impact.
    Keywords: artificial intelligence, occupations, tasks
    Date: 2020–04
  5. By: Calypso Herrera (Department of Mathematics, ETH Zürich, Switzerland); Florian Krach (Department of Mathematics, ETH Zürich, Switzerland); Josef Teichmann (Department of Mathematics, ETH Zürich, Switzerland)
    Abstract: We estimate the Lipschitz constants of the gradient of a deep neural network and the network itself with respect to the full set of parameters. We first develop estimates for a deep feed-forward densely connected network and then, in a more general framework, for all neural networks that can be represented as solutions of controlled ordinary differential equations, where time appears as continuous depth. These estimates can be used to set the step size of stochastic gradient descent methods, which is illustrated for one example method.
    Date: 2020–04
  6. By: Calypso Herrera (Department of Mathematics, ETH Zürich, Switzerland); Florian Krach (Department of Mathematics, ETH Zürich, Switzerland); Josef Teichmann (Department of Mathematics, ETH Zürich, Switzerland)
    Abstract: We introduce Denise, a deep learning based algorithm for decomposing positive semidefinite matrices into the sum of a low rank plus a sparse matrix. The deep neural network is trained on a randomly generated dataset using the Cholesky factorization. This method, benchmarked on synthetic datasets as well as on some S&P500 stock returns covariance matrices, achieves comparable results to several state-of-the-art techniques, while outperforming all existing algorithms in terms of computational time. Finally, theoretical results concerning the convergence of the training are derived.
    Date: 2020–04
  7. By: Ge Gao (University of Birmingham); Mustafa Caglayan (Heriot-Watt University); Yuelei Li (Tianjin University); Oleksandr Talavera (University of Birmingham)
    Abstract: This paper investigates expert bidding imitation in peer-to-peer lending platforms. We employ data from, which contains information about 169,779 investors who placed 3,947,996 bids on 111,284 loan listings from 2010 to 2018. The experts are defined as investors who either have more central roles or who spend more time or money on the network. We find that an average investor mimics the bids of expert lenders. Inactive lenders learn top investors' lending behaviour through observational learning and then follow their actions, although they do not know the experts' identity. Finally, we show that experts rarely imitate other experts, yet they exhibit herding behaviour.
    Keywords: Peer-to-Peer Lending; Network Analysis; Expert Imitation; Big Data; Financial Technology
    JEL: G11 G21
    Date: 2020–05
  8. By: Philippe Aghion (Harvard University); Céline Antonin (Observatoire français des conjonctures économiques); Simon Bunel
    Abstract: In this article, we argue that the effects of artificial intelligence (AI) and automation on growth and employment depend largely on institutions and policies. Our analysis proceeds in two parts. In the first part, we show that AI can stimulate growth by replacing labour with capital, both in the production of goods and services and in the production of ideas. However, we argue that AI can inhibit growth if it is combined with inappropriate competition policy. In the second part, we discuss the effect of robotisation on employment in France over the period 1994-2014. Based on our empirical analysis of French data, we show first that robotisation reduces aggregate employment at the employment-zone level, and second that workers with a low level of education are more heavily penalised by robotisation than workers with a high level of education. This finding suggests that inappropriate labour market and education policies reduce the positive impact that AI and automation could have on employment.
    Keywords: Artificial intelligence; Growth; Automation; Robots; Employment
    JEL: J24 O3 O4
    Date: 2019–12
  9. By: Renda, Andrea; Reynolds, Nicole; Laurer, Moritz; Cohen, Gal
    Abstract: As climate change increasingly poses an existential risk for the Earth, scientists and policymakers turn to agriculture and food as areas for urgent and bold action, which need to return within acceptable Planet Boundaries. The links between agriculture, biodiversity and climate change have become so evident that scientists propose a Great Food Transformation towards a healthy diet by 2050 as a major way to save the planet. Achieving these milestones, however, is not easy, both based on current indicators and on the gloomy state of global dialogue in this domain. This is why digital technologies such as wireless connectivity, the Internet of Things, Artificial Intelligence and blockchain can and should come to the rescue. This report looks at the many ways in which digital solutions can be implemented on the ground to help the agrifood chain transform itself to achieve more sustainability. Together with the solution, we identify obstacles, challenges, gaps and possible policy recommendations. Action items are addressed at the European Union both as an actor of change at home, and in global governance, and are spread across ten areas, from boosting connectivity and data governance to actions aimed at empowering small farmers and end users.
    Date: 2019–12
  10. By: Azqueta-Gavaldon, Andres
    Abstract: We present evidence that referenda have a significant, detrimental effect on investment. Employing an unsupervised machine learning algorithm over the period 2008-2017, we construct three important uncertainty indices underlying reports in the Scottish news media: Scottish independence (IndyRef)-related uncertainty; Brexit-related uncertainty; and Scottish policy-related uncertainty. Examining the relationship of these indices with investment on a longitudinal panel of 3,589 Scottish firms, the evidence suggests that Brexit-related uncertainty is more strongly associated with investment than IndyRef-related uncertainty. Our preferred specification suggests that a one standard-deviation increase in Brexit uncertainty foreshadows a reduction in investment by 8% on average in the following year. In addition, we find that the uncertainty associated with the Scottish independence referendum, while negligible at the aggregate level, relates more strongly to the investment of listed firms as well as those operating on the border with England. Finally, we present evidence of greater sensitivity to these indices among firms that are financially constrained or whose investment is to a greater degree irreversible.
    Keywords: investment, machine learning, political uncertainty, textual-data
    JEL: C80 D80 E22 E66 G18 G31
    Date: 2020–05
  11. By: Pupillo, Lorenzo; Ferreira, Afonso; Fantin, Stefano
    Abstract: The Centre for European Policy Studies launched a Task Force on Artificial Intelligence (AI) and Cybersecurity in September 2019. The goal of this Task Force is to bring attention to the market, technical, ethical and governance challenges posed by the intersection of AI and cybersecurity, focusing both on AI for cybersecurity and on cybersecurity for AI. The Task Force is multi-stakeholder by design and composed of academics, industry players from various sectors, policymakers and civil society. The Task Force is currently discussing issues such as the state and evolution of the application of AI in cybersecurity and cybersecurity for AI; the debate on the role that AI could play in the dynamics between cyber attackers and defenders; the increasing need for sharing information on threats and how to deal with the vulnerabilities of AI-enabled systems; options for policy experimentation; and possible EU policy measures to ease the adoption of AI in cybersecurity in Europe. As part of such activities, this report aims at assessing the High-Level Expert Group (HLEG) on AI Ethics Guidelines for Trustworthy AI, presented on April 8, 2019. In particular, this report analyses and makes suggestions on the Trustworthy AI Assessment List (Pilot version), a non-exhaustive list aimed at helping the public and the private sector in operationalising Trustworthy AI. This report would like to contribute to this revision by addressing in particular the interplay between AI and cybersecurity. This evaluation has been made according to specific criteria: whether and how the items of the Assessment List refer to existing legislation (e.g. GDPR, EU Charter of Fundamental Rights); whether they refer to moral principles (but not laws); whether they consider that AI attacks are fundamentally different from traditional cyberattacks; whether they are compatible with different risk levels; whether they are flexible enough in terms of clear/easy measurement, implementation by AI developers and SMEs; and overall, whether they are likely to create obstacles for the industry.
    Date: 2020–01
  12. By: Aksoy, Cevat Giray (European Bank for Reconstruction and Development); Tumen, Semih (TED University)
    Abstract: Can high-quality local governance alleviate the environmental impact of large-scale refugee migration? The recent surge in refugee flows has brought additional challenges to local governments in Europe, the Middle East and certain regions of Africa and Asia. In this paper, we focus on the case of Syrian refugees in Turkey and show that the quality of local governance plays a critical role in mitigating the environmental deterioration. We employ text analysis methods to construct a unique data set on local governance quality from the independent audit reports on municipalities. Using a quasi-experimental econometric strategy, we show that the Syrian refugee influx has worsened environmental outcomes along several dimensions in Turkey. Specifically, we find that the deterioration in environmental outcomes is almost entirely driven by provinces with poor-quality governance. Those provinces fail to invest sufficiently in waste management practices and environmental services in response to increased refugee settlements. We argue that good local governance practices can smooth out the refugee integration process and complement the efforts of central governments.
    Keywords: Syrian refugees, environment, waste management, local governance, text analysis
    JEL: F22 H76 Q53
    Date: 2020–04
  13. By: Milan Fičura
    Abstract: Three different classes of data mining methods (k-Nearest Neighbour, Ridge Regression and Multilayer Perceptron Feed-Forward Neural Networks) are applied for the purpose of quantitative trading on 10 simulated time series, as well as real world time series of 10 currency exchange rates ranging from 1.11.1999 to 12.6.2015. Each method is tested in multiple variants. The k-NN algorithm is applied alternatively with the Euclidean, Manhattan, Mahalanobis and Maximum distance function. The Ridge Regression is applied as Linear and Quadratic, and the Feed-Forward Neural Network is applied with either 1, 2 or 3 hidden layers. In addition, Principal Component Analysis (PCA) is optionally applied for the dimensionality reduction of the predictor set, and the meta-parameters of the methods are optimized on the validation sample. In the simulation study, a Stochastic-Volatility Jump-Diffusion model, extended alternatively with 10 different non-linear conditional mean patterns, is used to simulate the asset price behaviour to which the tested methods are applied. The results show that no single method was able to profit on all of the non-linear patterns in the simulated time series, but instead different methods worked well for different patterns. Alternatively, past price movements and past returns were used as predictors. In the case when the past price movements were used, quadratic ridge regression achieved the most robust results, followed by some of the k-NN methods. In the case when past returns were used, k-NN based methods were the most consistently profitable, followed by the linear ridge regression and quadratic ridge regression. Neural networks, while being able to profit on some of the time series, did not achieve profit on most of the others. Furthermore, no evidence was found that the PCA method improves the results of the tested methods in a systematic way. In the second part of the study, the models were applied to empirical foreign exchange rate time series. Overall the profitability of the methods was rather low, with most of them ending with a loss on most of the currencies. The most profitable currency was EURUSD, followed by EURJPY, GBPJPY and EURGBP. The most successful methods were the linear ridge regression and the Manhattan distance based k-NN method, which both ended with profits for most of the time series (unlike the other methods). Finally, a forward selection procedure using the linear ridge regression was applied to extend the original predictor set with some technical indicators. The selection procedure achieved limited success in improving the out-of-sample results for the linear ridge regression model but not the other models.
    Keywords: Ridge regression, k-Nearest Neighbour, Artificial Neural Networks, Principal Component Analysis, Exchange rate forecasting, Investment strategy, Market efficiency
    JEL: C45 C63 G11 G14 G17
    Date: 2019–11–13
  14. By: Jiaming Mao; Zhesheng Zheng
    Abstract: We propose a novel method for modeling data by using structural models based on economic theory as regularizer for statistical models. We show that even if a structural model is misspecified, as long as it is informative about the data-generating mechanism, our method can outperform both the (misspecified) structural model and un-structural-regularized statistical models. Our method permits a Bayesian interpretation of theory as prior knowledge and can be used both for statistical prediction and causal inference. It contributes to transfer learning by showing how incorporating theory into statistical modeling can significantly improve out-of-domain predictions and offers a way to synthesize reduced-form and structural approaches to causal effect estimation. Simulation experiments demonstrate the potential of our method in various settings, including first-price auctions, dynamic models of entry and exit, and demand estimation with instrumental variables. Our method has potential applications not only in economics, but in other (social) scientific disciplines whose theoretical models offer important insight but are subject to significant misspecification concerns.
    Date: 2020–04
  15. By: Bonacini, Luca; Gallo, Giovanni; Patriarca, Fabrizio
    Abstract: Feedback control-based mitigation strategies for COVID-19 are threatened by the time span occurring before an infection is detected in official data. Such a delay also depends on behavioral, technological and procedural issues other than the incubation period. We provide a machine learning procedure to identify structural breaks in the dynamics of detected positive cases using territorial level panel data. In our case study, Italy, three structural breaks are found and they can be related to the three different national level restrictive measures: the school closure, the main lockdown and the shutdown of non-essential economic activities. This allows assessing the detection delays and their relevant variability among the different measures adopted and the relative effectiveness of each of them. Accordingly, we draw some policy suggestions to support feedback control-based mitigation policies so as to decrease their risk of failure, including the further role that wide swab campaigns may play in reducing the detection delay. Finally, by exploiting the huge heterogeneity in the features of Italian provinces, we stress some drawbacks of the restrictive measures' specific features and of their sequence of adoption, among them the side effects of the main lockdown on social and economic inequalities.
    Keywords: Covid-19, coronavirus, lockdown, feedback control, mitigation strategies
    JEL: C63 I14 I18
    Date: 2020
  16. By: Paweł Baranowski (Institute of Econometrics, University of Lodz); Hamza Bennani (EconomiX-CNRS, Universite Paris Nanterre); Wirginia Doryń (Institute of Economics, University of Lodz)
    Abstract: In this paper, we examine whether a tone shock derived from ECB communication helps to predict ECB monetary policy decisions. For that purpose, we first use a bag-of-words approach and several dictionaries on the ECB’s Introductory Statements to derive a measure of tone. Next, we orthogonalize the tone measure on the latest data available to market participants to compute the tone shock. Finally, we relate the tone shock to future ECB monetary policy decisions. We find that the tone shock is significantly and positively related to future ECB monetary policy decisions, even when controlling for market expectations of economic conditions and monetary policy and the ECB’s Governing Council inter-meeting communication. Further extensions show that the predictive power of the tone shock regarding future monetary policy decisions is robust to (i) the normalization of the tone measure, (ii) alternative market expectations about monetary policy and (iii) the macroeconomic variables used in the Taylor-type monetary policy. These findings thus highlight an additional channel by which ECB communication improves monetary policy predictability, and suggest that the ECB may have private information that it communicates through its Introductory Statements.
    Keywords: Central Bank Communication; European Central Bank; Tone; Forecasts; Taylor Rule
    JEL: E43 E52 E58
    Date: 2020
  17. By: Paweł Baranowski (University of Łódź); Wirginia Doryń (University of Łódź); Tomasz Łyziak (Narodowy Bank Polski); Ewa Stanisławska (Narodowy Bank Polski)
    Abstract: The conduct of monetary policy nowadays involves not only interest rate decisions but also central bank communication, aimed at managing the expectations of the private sector. In this paper, we apply an epidemiological model to private-sector experts’ forecasts regarding interest rates and inflation in Poland, an economy with over 20 years of inflation targeting history. We show that both of these factors affect interest rates and inflation expectations. Our study contributes to the literature by including a wide set of factors affecting expectations with a special focus on central bank decisions, projections and the tone of official documents. In general, the textual content of monetary policy minutes affects experts’ expectations more at the shortest horizons (nowcasts and one quarter ahead), while GDP and inflation projections released by the central bank play a larger role for slightly longer horizons (two quarters ahead or longer). As far as monetary policy actions are concerned, a positive interest rate surprise produces an upward shift in the whole path of interest rate expectations and leads to a decrease in one-year-ahead inflation expectations.
    Keywords: central bank communication, inflation expectations, interest rate expectations, text mining
    JEL: E52 E58
    Date: 2020
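
Illustration (not taken from any of the papers above): the dictionary-based tone measure and "tone shock" described in item 16 can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the authors' implementation. The word lists, the net-tone formula (positive minus negative counts over their sum), the sample sentences and the forecast numbers are all made up, and the names POSITIVE, NEGATIVE, tone and residuals are hypothetical. The shock is taken here as the residual from a one-regressor least-squares fit, which is one simple way to orthogonalize a series against data already known to markets.

```python
import re

# Hypothetical mini-dictionaries; real studies use published word lists.
POSITIVE = {"improve", "strong", "growth", "recovery"}
NEGATIVE = {"weak", "risk", "decline", "uncertainty"}


def tone(text):
    """Net tone in [-1, 1]: (positive - negative) / (positive + negative)."""
    words = re.findall(r"[a-z]+", text.lower())
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    # A text with no dictionary words is treated as neutral.
    return 0.0 if pos + neg == 0 else (pos - neg) / (pos + neg)


def residuals(y, x):
    """Residuals from an OLS fit of y on a single regressor x plus intercept."""
    n = len(y)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    var_x = sum((xi - mean_x) ** 2 for xi in x)
    cov_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]


# Made-up statements and made-up data assumed to be known to markets.
statements = [
    "Growth is strong and recovery continues",
    "Risk of decline remains amid uncertainty",
    "Strong growth but some risk",
]
inflation_forecast = [1.8, 1.2, 1.6]

tones = [tone(s) for s in statements]
# The part of tone not explained by the forecast is the illustrative "shock".
tone_shocks = residuals(tones, inflation_forecast)
```

By construction the residuals sum to zero and are uncorrelated with the regressor, so `tone_shocks` carries only the variation in tone that the illustrative forecast series does not explain. A real application would use several regressors and a proper regression library.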

This nep-big issue is ©2020 by Tom Coupé. It is provided as is without any express or implied warranty. It may be freely redistributed in whole or in part for any purpose. If distributed in part, please include this notice.
General information on the NEP project can be found at For comments please write to the director of NEP, Marco Novarese at <>. Put “NEP” in the subject, otherwise your mail may be rejected.
NEP’s infrastructure is sponsored by the School of Economics and Finance of Massey University in New Zealand.