nep-big New Economics Papers
on Big Data
Issue of 2022‒02‒21
twelve papers chosen by
Tom Coupé
University of Canterbury

  1. Explaining Machine Learning by Bootstrapping Partial Dependence Functions and Shapley Values By Thomas R. Cook; Greg Gupton; Zach Modig; Nathan M. Palmer
  2. A machine learning search for optimal GARCH parameters By Luke De Clerk; Sergey Savel'ev
  3. Reconstructing production networks using machine learning By Lafond, François; Farmer, J. Doyne; Mungo, Luca; Astudillo-Estévez, Pablo
  4. Socioeconomic disparities and COVID-19: the causal connections By Tannista Banerjee; Ayan Paul; Vishak Srikanth; Inga Strümke
  5. Deciding Not To Decide By Ellsaesser, Florian; Fioretti, Guido
  6. Effect of Toxic Review Content on Overall Product Sentiment By Mayukh Mukhopadhyay; Sangeeta Sahney
  7. Close Enough? A Large-Scale Exploration of Non-Experimental Approaches to Advertising Measurement By Brett R. Gordon; Robert Moakler; Florian Zettelmeyer
  8. Artificial Intelligence and Reduced SMEs' Business Risks. A Dynamic Capabilities Analysis During the COVID-19 Pandemic By Drydakis, Nick
  9. Fair learning with bagging By Jean-David Fermanian; Dominique Guegan
  10. Algorithm is Experiment: Machine Learning, Market Design, and Policy Eligibility Rules By Narita, Yusuke; Yata, Kohei
  11. Exchange rate and Economic Growth - a comparative analysis of the possible relationship between them By Pramanik, Subhajit
  12. Who Increases Emergency Department Use? New Insights from the Oregon Health Insurance Experiment By Augustine Denteh; Helge Liebert

  1. By: Thomas R. Cook; Greg Gupton; Zach Modig; Nathan M. Palmer
    Abstract: Machine learning and artificial intelligence methods are often referred to as “black boxes” when compared with traditional regression-based approaches. However, both traditional and machine learning methods are concerned with modeling the joint distribution between endogenous (target) and exogenous (input) variables. Where linear models describe the fitted relationship between the target and input variables via the slope of that relationship (coefficient estimates), the same fitted relationship can be described rigorously for any machine learning model by first-differencing the partial dependence functions. Bootstrapping these first-differenced functionals provides standard errors and confidence intervals for the estimated relationships. We show that this approach replicates the point estimates of OLS coefficients and demonstrate how this generalizes to marginal relationships in machine learning and artificial intelligence models. We further discuss the relationship of partial dependence functions to Shapley value decompositions and explore how they can be used to further explain model outputs.
    Keywords: Machine learning; Artificial intelligence; Explainable machine learning; Shapley values; Model interpretation
    JEL: C14 C15 C18
    Date: 2021–11–15
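The bootstrap-of-first-differences idea is easy to sketch. The toy example below (simulated data, with a simple OLS fit standing in for the fitted model; not the authors' code) shows the first-differenced partial dependence matching the OLS slope exactly in the linear case, with a bootstrap standard error for the estimated relationship:

```python
import random

random.seed(0)
n = 200
x = [random.gauss(0, 1) for _ in range(n)]
y = [2.0 * xi + random.gauss(0, 0.5) for xi in x]

def fit_ols(xs, ys):
    """Closed-form simple OLS: returns (intercept, slope)."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((a - mx) ** 2 for a in xs)
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope

def pd_first_difference(model, lo=-1.0, hi=1.0):
    """Partial dependence at two grid points, first-differenced."""
    b0, b1 = model
    return ((b0 + b1 * hi) - (b0 + b1 * lo)) / (hi - lo)

ols_slope = fit_ols(x, y)[1]
fd_slope = pd_first_difference(fit_ols(x, y))

# Bootstrap the first-differenced functional for a standard error.
draws = []
for _ in range(500):
    idx = [random.randrange(n) for _ in range(n)]
    draws.append(pd_first_difference(fit_ols([x[i] for i in idx],
                                             [y[i] for i in idx])))
mean_d = sum(draws) / len(draws)
se = (sum((d - mean_d) ** 2 for d in draws) / (len(draws) - 1)) ** 0.5
```

For a genuinely nonlinear learner the first difference would vary over the grid, which is exactly what the paper exploits to generalize coefficient-style interpretation.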
  2. By: Luke De Clerk; Sergey Savel'ev
    Abstract: Here, we use Machine Learning (ML) algorithms to update and improve the efficiency of fitting GARCH model parameters to empirical data. We employ an Artificial Neural Network (ANN) to predict the parameters of these models. We present a fitting algorithm for GARCH-normal(1,1) models that predicts one of the model's parameters, $\alpha_1$, and then uses the analytical expressions for the fourth-order standardised moment, $\Gamma_4$, and the unconditional second-order moment, $\sigma^2$, to fit the other two parameters, $\beta_1$ and $\alpha_0$, respectively. The speed of fitting and quick implementation of this approach allow for real-time tracking of GARCH parameters. We further show that different inputs to the ANN, namely higher-order standardised moments and the autocovariance of the time series, can be used to fit model parameters, but not always with the same level of accuracy.
    Date: 2022–01
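The moment-inversion step the abstract describes follows from standard GARCH(1,1)-normal moment formulas. In this hypothetical sketch a fixed value stands in for the ANN's prediction of $\alpha_1$; the round trip recovers $\beta_1$ and $\alpha_0$ from $\sigma^2$ and $\Gamma_4$:

```python
def garch_moments(alpha0, alpha1, beta1):
    """Unconditional variance and kurtosis of a GARCH(1,1) with normal shocks."""
    s = alpha1 + beta1
    sigma2 = alpha0 / (1.0 - s)
    gamma4 = 3.0 * (1.0 - s * s) / (1.0 - s * s - 2.0 * alpha1 ** 2)
    return sigma2, gamma4

def invert_moments(alpha1, sigma2, gamma4):
    """Given alpha_1 (e.g. predicted by the ANN) plus empirical sigma^2 and
    Gamma_4, recover beta_1 and alpha_0 from the analytical expressions."""
    s2 = 1.0 - 2.0 * gamma4 * alpha1 ** 2 / (gamma4 - 3.0)  # (alpha1+beta1)^2
    beta1 = s2 ** 0.5 - alpha1
    alpha0 = sigma2 * (1.0 - alpha1 - beta1)
    return alpha0, beta1

# Round trip: generate moments from known parameters, then recover them.
sigma2, gamma4 = garch_moments(0.1, 0.1, 0.8)
alpha0_hat, beta1_hat = invert_moments(0.1, sigma2, gamma4)
```

Because the inversion is closed-form, only $\alpha_1$ needs a learned predictor, which is what makes the real-time tracking mentioned in the abstract feasible.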
  3. By: Lafond, François; Farmer, J. Doyne; Mungo, Luca; Astudillo-Estévez, Pablo
    Abstract: The vulnerability of supply chains and their role in the propagation of shocks has been highlighted multiple times in recent years, including by the recent pandemic. However, while the importance of micro data is increasingly recognised, data at the firm-to-firm level remains scarcely available. In this study, we formulate supply chain networks' reconstruction as a link prediction problem and tackle it using machine learning, specifically Gradient Boosting. We test our approach on three different supply chain datasets and show that it works very well and outperforms three benchmarks. An analysis of features' importance suggests that the key data underlying our predictions are firms' industry, location, and size. To evaluate the feasibility of reconstructing a network when no production network data is available, we attempt to predict a dataset using a model trained on another dataset, showing that the model's performance, while still better than a random predictor, deteriorates substantially.
    Keywords: Supply chains, Network reconstruction, Link prediction, Machine learning
    JEL: C53 C67 C81
    Date: 2022–01
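Framing network reconstruction as link prediction can be sketched as follows. A hand-rolled logistic regression stands in for the paper's Gradient Boosting so the example stays dependency-free, and the firm-pair features and data are simulated, not drawn from any real production network:

```python
import random, math

random.seed(2)

def make_pair():
    """A candidate supplier-buyer pair: features and a simulated link label."""
    same_industry = 1.0 if random.random() < 0.5 else 0.0
    log_size = random.gauss(0, 1)  # combined (log) size of the pair
    logit = -1.5 + 2.0 * same_industry + 1.0 * log_size
    link = 1.0 if random.random() < 1 / (1 + math.exp(-logit)) else 0.0
    return [1.0, same_industry, log_size], link

train = [make_pair() for _ in range(600)]
test = [make_pair() for _ in range(400)]

# Batch gradient descent on the logistic (log) loss.
w = [0.0, 0.0, 0.0]
for _ in range(200):
    grad = [0.0, 0.0, 0.0]
    for feats, label in train:
        p = 1 / (1 + math.exp(-sum(wi * f for wi, f in zip(w, feats))))
        for j, f in enumerate(feats):
            grad[j] += (p - label) * f
    w = [wi - 0.5 * g / len(train) for wi, g in zip(w, grad)]

# AUC on held-out pairs: probability a true link outranks a non-link.
scores = [(sum(wi * f for wi, f in zip(w, feats)), label) for feats, label in test]
pos = [s for s, l in scores if l == 1.0]
neg = [s for s, l in scores if l == 0.0]
auc = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg) / (len(pos) * len(neg))
```

The paper's finding that industry, location, and size dominate corresponds here to the classifier recovering the signs of the pair-level feature coefficients.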
  4. By: Tannista Banerjee; Ayan Paul; Vishak Srikanth; Inga Strümke
    Abstract: The analysis of causation is a challenging task that can be approached in various ways. With the increasing use of machine learning based models in computational socioeconomics, explaining these models while taking causal connections into account is a necessity. In this work, we advocate the use of an explanatory framework from cooperative game theory augmented with $do$ calculus, namely causal Shapley values. Using causal Shapley values, we analyze socioeconomic disparities that have a causal link to the spread of COVID-19 in the USA. We study several phases of the disease spread to show how the causal connections change over time. We perform a causal analysis using random effects models and discuss the correspondence between the two methods to verify our results. We show the distinct advantages non-linear machine learning models have over linear models when performing a multivariate analysis, especially since the machine learning models can map out non-linear correlations in the data. In addition, the causal Shapley values allow for including the causal structure in the variable importance computed for the machine learning model.
    Date: 2022–01
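Shapley values average a feature's marginal contribution over all coalitions. The brute-force sketch below computes ordinary (marginal) Shapley values for a toy linear model and a tiny background sample; the causal Shapley values used in the paper replace this marginal averaging with $do$-interventions, which is not reproduced here:

```python
from itertools import combinations
from math import factorial

def f(x):
    """Toy model to explain (hypothetical, linear for checkability)."""
    return 2.0 * x[0] + 1.0 * x[1] + 0.5 * x[2]

background = [[0.0, 0.0, 0.0], [1.0, 2.0, -1.0],
              [-1.0, 1.0, 2.0], [2.0, -1.0, 1.0]]
x = [1.0, 1.0, 1.0]  # the instance whose prediction we decompose

def v(S):
    """Value of coalition S: average prediction with features in S fixed to x
    and the rest drawn from the background sample (marginal expectation)."""
    total = 0.0
    for b in background:
        z = [x[i] if i in S else b[i] for i in range(len(x))]
        total += f(z)
    return total / len(background)

def shapley(i, n=3):
    """Exact Shapley value of feature i by enumerating all coalitions."""
    phi = 0.0
    others = [j for j in range(n) if j != i]
    for k in range(n):
        for S in combinations(others, k):
            w = factorial(len(S)) * factorial(n - len(S) - 1) / factorial(n)
            phi += w * (v(set(S) | {i}) - v(set(S)))
    return phi

phis = [shapley(i) for i in range(3)]
```

For a linear model this reduces to coefficient times deviation from the background mean, and the attributions sum exactly to the prediction minus the baseline, the efficiency property that makes Shapley decompositions attractive for variable importance.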
  5. By: Ellsaesser, Florian; Fioretti, Guido
    Abstract: Sometimes unexpected, novel, inconceivable events enter our lives. The cause-effect mappings that usually guide our behaviour are destroyed. Surprised and shocked by possibilities that we had never imagined, we are unable to make any decision beyond mere routine. Among these are decisions, such as making investments, that are essential for the long-term survival of businesses as well as the economy at large. We submit that the standard machinery of utility maximization does not apply, but we propose measures inspired by scenario planning and graph analysis, pointing to solutions being explored in machine learning.
    Keywords: Uncertainty, Cognitive Maps, Machine Learning, Scenario Planning, Sense-Making, Bounded Rationality
    JEL: C8 C81 D8 D81
    Date: 2022–01–10
  6. By: Mayukh Mukhopadhyay; Sangeeta Sahney
    Abstract: Toxic content in online product reviews is a common phenomenon. Content is perceived as toxic when it is rude, disrespectful, or unreasonable and makes individuals leave the discussion. Machine learning algorithms help the sell-side community identify such toxic patterns and eventually moderate such inputs. Yet the extant literature provides little information about the sentiment of a prospective consumer towards a product after exposure to such toxic review content. In this study, we collect a balanced data set of review comments from 18 different players, segregated into three sectors, on the Google Play Store. We then calculate the sentence-level sentiment and toxicity score of each review. Finally, we use structural equation modelling to quantitatively study the influence of toxic content on overall product sentiment. We observe that comment toxicity negatively influences overall product sentiment but does not exhibit a mediating effect through reviewer score on sector-wise relative rating.
    Date: 2022–01
  7. By: Brett R. Gordon; Robert Moakler; Florian Zettelmeyer
    Abstract: Randomized controlled trials (RCTs) have become increasingly popular in both marketing practice and academia. However, RCTs are not always available as a solution for advertising measurement, necessitating the use of observational methods. We present the first large-scale exploration of two observational methods, double/debiased machine learning (DML) and stratified propensity score matching (SPSM). Specifically, we analyze 663 large-scale experiments at Facebook, each of which is described using over 5,000 user- and experiment-level features. Although DML performs better than SPSM, neither method performs well, despite using deep learning models to implement the propensity scores and outcome models. The median absolute percentage point difference in lift is 115%, 107%, and 62% for upper, mid, and lower funnel outcomes, respectively. These are large measurement errors, given that the median RCT lifts are 28%, 19%, and 6% for the funnel outcomes, respectively. We further leverage our large sample of experiments to characterize the circumstances under which each method performs comparatively better. However, broadly speaking, our results suggest that state-of-the-art observational methods are unable to recover the causal effect of online advertising at Facebook. We conclude that observational methods for estimating ad effectiveness may not work until advertising platforms log auction-specific features for modeling.
    Date: 2022–01
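The double/debiased machine learning (DML) estimator the paper evaluates can be sketched with two-fold cross-fitting. Linear nuisance models and a simulated linear data-generating process stand in here for the paper's deep learning models and Facebook data; this is an illustration of the partialling-out recipe, not the authors' implementation:

```python
import random

random.seed(3)
n = 400
theta_true = 1.5
x = [random.gauss(0, 1) for _ in range(n)]                       # confounder
d = [0.8 * xi + random.gauss(0, 1) for xi in x]                  # treatment
y = [theta_true * di + 2.0 * xi + random.gauss(0, 1)             # outcome
     for di, xi in zip(d, x)]

def fit_line(xs, ys):
    """Simple OLS nuisance model: returns (intercept, slope)."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((a - mx) ** 2 for a in xs)
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope

folds = [range(0, n // 2), range(n // 2, n)]
num = den = 0.0
for k in (0, 1):
    train, hold = folds[1 - k], folds[k]
    # Nuisance fits on the training fold: E[y|x] and E[d|x].
    ay, by = fit_line([x[i] for i in train], [y[i] for i in train])
    ad, bd = fit_line([x[i] for i in train], [d[i] for i in train])
    for i in hold:  # residual-on-residual on the held-out fold
        ry = y[i] - (ay + by * x[i])
        rd = d[i] - (ad + bd * x[i])
        num += rd * ry
        den += rd * rd
theta_hat = num / den
```

Cross-fitting keeps the nuisance estimation error from contaminating the effect estimate; the paper's point is that even with flexible nuisances this machinery can fail when key confounders (auction-level features) are unobserved.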
  8. By: Drydakis, Nick
    Abstract: The study utilises the International Labor Organization's SMEs COVID-19 pandemic business risks scale to determine whether Artificial Intelligence (AI) applications are associated with reduced business risks for SMEs. A new 10-item scale was developed to capture the use of AI applications in core services such as marketing and sales, pricing and cash flow. Data were collected from 317 SMEs between April and June 2020, with follow-up data gathered between October and December 2020 in London, England. AI applications to target consumers online, offer cash flow forecasting and facilitate HR activities are associated with reduced business risks caused by the COVID-19 pandemic for both small and medium enterprises. The study indicates that AI enables SMEs to boost their dynamic capabilities by leveraging technology to meet new types of demand, move at speed to pivot business operations, boost efficiency and thus, reduce their business risks.
    Keywords: SMEs,Business Risks,COVID-19 pandemic,Artificial Intelligence,Dynamic Capabilities
    JEL: O33 Q55 L26
    Date: 2022
  9. By: Jean-David Fermanian (Ensae-Crest); Dominique Guegan (UP1 - Université Paris 1 Panthéon-Sorbonne, CES - Centre d'économie de la Sorbonne - UP1 - Université Paris 1 Panthéon-Sorbonne - CNRS - Centre National de la Recherche Scientifique, University of Ca’ Foscari [Venice, Italy])
    Abstract: The central question of this paper is how to enhance supervised learning algorithms with fairness requirements, ensuring that no sensitive input "unfairly" influences the outcome of the learning algorithm. To attain this objective, we proceed in three steps. First, after introducing several notions of fairness in a uniform approach, we define a more general notion, conditional fairness, which encompasses most of the well-known fairness definitions. Second, we use an ensemble of binary and continuous classifiers to obtain an optimal fair predictive outcome via a post-processing procedure, without any transformation of the data or of the training algorithms. Finally, we introduce several tests to verify the fairness of the predictions. Some empirical results are provided to illustrate our approach.
    Keywords: fairness,nonparametric regression,classification,accuracy
    Date: 2021–11
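One common post-processing route to fairness, adjusting decision thresholds per group so selection rates match, likewise needs no retraining and no data transformation. The sketch below is illustrative only (simulated scores, demographic-parity criterion) and is not the authors' ensemble procedure:

```python
import random

random.seed(4)
scores_a = [random.gauss(0.60, 0.15) for _ in range(300)]  # group A scores
scores_b = [random.gauss(0.45, 0.15) for _ in range(300)]  # group B scores

def rate(scores, thr):
    """Selection rate: share of scores at or above the threshold."""
    return sum(s >= thr for s in scores) / len(scores)

# A single threshold of 0.5 yields unequal selection rates across groups.
gap_before = abs(rate(scores_a, 0.5) - rate(scores_b, 0.5))

def threshold_for_rate(scores, target):
    """Group-specific threshold: the (1 - target) empirical quantile."""
    srt = sorted(scores, reverse=True)
    k = max(1, round(target * len(srt)))
    return srt[k - 1]

# Target the pooled selection rate, then equalize it per group.
target = rate(scores_a + scores_b, 0.5)
thr_a = threshold_for_rate(scores_a, target)
thr_b = threshold_for_rate(scores_b, target)
gap_after = abs(rate(scores_a, thr_a) - rate(scores_b, thr_b))
```

Threshold adjustment enforces demographic parity exactly; the paper's conditional fairness notion generalizes this by conditioning the criterion on admissible covariates.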
  10. By: Narita, Yusuke; Yata, Kohei
    Abstract: Algorithms produce a growing portion of decisions and recommendations both in policy and business. Such algorithmic decisions are natural experiments (conditionally quasirandomly assigned instruments) since the algorithms make decisions based only on observable input variables. We use this observation to develop a treatment-effect estimator for a class of stochastic and deterministic decision-making algorithms. Our estimator is shown to be consistent and asymptotically normal for well-defined causal effects. A key special case of our estimator is a multidimensional regression discontinuity design. We apply our estimator to evaluate the effect of the Coronavirus Aid, Relief, and Economic Security (CARES) Act, where hundreds of billions of dollars worth of relief funding is allocated to hospitals via an algorithmic rule. Our estimates suggest that the relief funding has little effect on COVID-19-related hospital activity levels. Naive OLS and IV estimates exhibit substantial selection bias.
    Date: 2022–01
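The regression discontinuity special case is easy to sketch in one dimension: eligibility is a deterministic function of a score, and the causal effect is the jump at the cutoff, estimated from local linear fits on each side. The data below are simulated and are not the CARES Act allocation rule:

```python
import random

random.seed(5)
n = 1200
cutoff, bw, effect_true = 0.0, 0.5, 2.0
x = [random.uniform(-1, 1) for _ in range(n)]            # eligibility score
d = [1.0 if xi >= cutoff else 0.0 for xi in x]           # algorithmic rule
y = [1.0 + 0.5 * xi + effect_true * di + random.gauss(0, 0.5)
     for xi, di in zip(x, d)]

def ols(xs, ys):
    """Simple OLS: returns (intercept, slope)."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((a - mx) ** 2 for a in xs)
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope

# Local linear fits within the bandwidth on each side of the cutoff;
# the intercept (fitted value at the cutoff) captures each side's limit.
right = [(xi, yi) for xi, yi in zip(x, y) if cutoff <= xi <= cutoff + bw]
left = [(xi, yi) for xi, yi in zip(x, y) if cutoff - bw <= xi < cutoff]
b0_r, _ = ols([p[0] for p in right], [p[1] for p in right])
b0_l, _ = ols([p[0] for p in left], [p[1] for p in left])
rdd_effect = b0_r - b0_l
```

The paper's estimator generalizes this picture to multidimensional inputs and to stochastic algorithms, where the known assignment rule plays the role of the running variable.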
  11. By: Pramanik, Subhajit
    Abstract: In this article, a comparative analysis is carried out on the possibility of a correlation between economic growth and exchange rates. To represent growth, GDP and inflation are examined across different cases; likewise, for exchange rates, the nominal and real exchange rates are analysed in the corresponding cases. We also present a machine learning approach to finding the correlation over both short- and long-run time periods. Empirical tests and data analyses carried out by many economists are also brought together here to map the overall picture more clearly.
    Keywords: GDP, Machine Learning, Exchange rates, Nominal exchange rate, Real exchange rate, Inflation, Trade
    JEL: F0 F41 F43 F47
    Date: 2021–04
  12. By: Augustine Denteh (Department of Economics, Tulane University); Helge Liebert (Department of Economics, University of Zurich)
    Abstract: We provide new insights into the finding that Medicaid increased emergency department (ED) use from the Oregon experiment. Using nonparametric causal machine learning methods, we find economically meaningful treatment effect heterogeneity in the impact of Medicaid coverage on ED use. The effect distribution is widely dispersed, with significant positive effects concentrated among high-use individuals. A small group - about 14% of participants - in the right tail with significant increases in ED use drives the overall effect. The remainder of the individualized treatment effects is either indistinguishable from zero or negative. The average treatment effect is not representative of the individualized treatment effect for most people. We identify four priority groups with large and statistically significant increases in ED use - men, prior SNAP participants, adults less than 50 years old, and those with pre-lottery ED use classified as primary care treatable. Our results point to an essential role of intensive margin effects - Medicaid increases utilization among those already accustomed to ED use and who use the emergency department for all types of care. We leverage the heterogeneous effects to estimate optimal assignment rules to prioritize insurance applications in similar expansions.
    Date: 2022–01
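The headline finding, an average effect that masks a small high-response subgroup, can be sketched with simple subgroup difference-in-means. The paper's nonparametric causal machine learning is not reproduced here, and all numbers below are hypothetical simulated values:

```python
import random

random.seed(6)
n = 2000
rows = []
for _ in range(n):
    prior_use = 1 if random.random() < 0.15 else 0  # high prior ED users
    treated = 1 if random.random() < 0.5 else 0     # lottery assignment
    effect = 3.0 if prior_use else 0.0              # heterogeneous effect
    visits = 1.0 + effect * treated + random.gauss(0, 1)
    rows.append((prior_use, treated, visits))

def diff_means(subset):
    """Treatment-control difference in mean ED visits."""
    t = [v for _, tr, v in subset if tr == 1]
    c = [v for _, tr, v in subset if tr == 0]
    return sum(t) / len(t) - sum(c) / len(c)

ate = diff_means(rows)                                   # average effect
cate_high = diff_means([r for r in rows if r[0] == 1])   # high prior use
cate_low = diff_means([r for r in rows if r[0] == 0])    # everyone else
```

The average effect sits between the two subgroup effects and describes neither group well, which is the sense in which the paper argues the ATE is unrepresentative and uses the heterogeneity to build prioritization rules.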

This nep-big issue is ©2022 by Tom Coupé. It is provided as is without any express or implied warranty. It may be freely redistributed in whole or in part for any purpose. If distributed in part, please include this notice.
General information on the NEP project can be found at . For comments, please write to the director of NEP, Marco Novarese at <>. Put “NEP” in the subject; otherwise your mail may be rejected.
NEP’s infrastructure is sponsored by the School of Economics and Finance of Massey University in New Zealand.