nep-big New Economics Papers
on Big Data
Issue of 2021‒10‒18
fourteen papers chosen by
Tom Coupé
University of Canterbury

  1. Deep Learning of Potential Outcomes By Bernard Koch; Tim Sainburg; Pablo Geraldo; Song Jiang; Yizhou Sun; Jacob Gates Foster
  2. Machine Learning, Deep Learning, and Hedonic Methods for Real Estate Price Prediction By Mahdieh Yazdani
  3. Towards Robust Representation of Limit Orders Books for Deep Learning Models By Yufei Wu; Mahmoud Mahfouz; Daniele Magazzeni; Manuela Veloso
  4. Food Insecurity Through Machine Learning Lens: Identifying Vulnerable Households By Meerza, Syed Imran Ali; Meerza, Syed Irfan Ali; Ahamed, Afsana
  5. Hotel Preference Rank based on Online Customer Review By Muhammad Apriandito Arya Saputra; Andry Alamsyah; Fajar Ibnu Fatihan
  6. Dyadic Double/Debiased Machine Learning for Analyzing Determinants of Free Trade Agreements By Harold D Chiang; Yukun Ma; Joel Rodrigue; Yuya Sasaki
  7. Artificial intelligence and systemic risk By Danielsson, Jon; Macrae, Robert; Uthemann, Andreas
  8. Do robots dream of paying taxes? By Rebecca Christie
  9. But clouds got in my way: Bias and bias correction of VIIRS nighttime lights data in the presence of clouds By Ayush Patnaik; Ajay Shah; Anshul Tayal; Susan Thomas
  10. Efficient Estimation in NPIV Models: A Comparison of Various Neural Networks-Based Estimators By Jiafeng Chen; Xiaohong Chen; Elie Tamer
  11. Discovering new plausibility checks for supervisory data By Romano, Stefania; Martinez-Heras, Jose; Raponi, Francesco Natalini; Guidi, Gregorio; Gottron, Thomas
  12. Competition and Mergers with Strategic Data Intermediaries By David Bounie; Antoine Dubus; Patrick Waelbroeck
  13. Robots versus labor skills: a complementarity/substitutability analysis By M. Battisti; M. Del Gatto; A. F. Gravina; C. F. Parmeter
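  14. Wer setzt die richtigen Weichen für die Künstliche Intelligenz? Eine Analyse der Wahlprogramme der Parteien im deutschen Bundestag [Who is setting the right course for artificial intelligence? An analysis of the election manifestos of the parties in the German Bundestag] By Goecke, Henry; Rusche, Christian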

  1. By: Bernard Koch; Tim Sainburg; Pablo Geraldo; Song Jiang; Yizhou Sun; Jacob Gates Foster
    Abstract: This review systematizes the emerging literature on causal inference using deep neural networks under the potential outcomes framework. It provides an intuitive introduction to how deep learning can be used to estimate/predict heterogeneous treatment effects and to extend causal inference to settings where confounding is non-linear, time-varying, or encoded in text, networks, and images. To maximize accessibility, we also introduce prerequisite concepts from causal inference and deep learning. The survey differs from other treatments of deep learning and causal inference in its sharp focus on observational causal estimation, its extended exposition of key algorithms, and its detailed tutorials for implementing, training, and selecting among deep estimators in TensorFlow 2 available at al-Inference.
    Date: 2021–10
  2. By: Mahdieh Yazdani
    Abstract: In recent years, complaints about racial discrimination in home appraisals have been accumulating. For decades, appraisers have estimated the sale prices of residential properties by walking through and observing each property, collecting data, and applying hedonic pricing models. However, this method is costly and, by nature, subjective and biased. To minimize human involvement and bias in real estate appraisals and to improve the accuracy of market price prediction models, we design data-efficient learning machines that learn the relationships or patterns between the inputs (house features) and the output (house values). We compare the performance of several machine learning and deep learning algorithms, specifically artificial neural networks, random forest, and k-nearest neighbors, with that of the hedonic method for predicting house prices in Boulder, Colorado. Although the study covers houses in Boulder, it can be generalized to housing markets in other cities. The results indicate a non-linear association between dwelling features and dwelling prices. In light of these findings, the study shows that random forest and artificial neural network algorithms can be better alternatives to hedonic regression for predicting house prices in Boulder, Colorado.
    Date: 2021–10
  3. By: Yufei Wu; Mahmoud Mahfouz; Daniele Magazzeni; Manuela Veloso
    Abstract: The success of machine learning models relies heavily on the quality and robustness of their representations. A lack of attention to the robustness of representations may increase risks when data-driven machine learning models are used for trading in financial markets. In this paper, we focus on representations of limit order book (LOB) data and discuss the opportunities and challenges of representing such data effectively and robustly. We analyse the issues associated with the commonly used LOB representation for machine learning models from both theoretical and experimental perspectives. Based on this analysis, we propose new LOB representation schemes to improve the performance and robustness of machine learning models, and we present a guideline for future research in this area.
    Date: 2021–10
  4. By: Meerza, Syed Imran Ali; Meerza, Syed Irfan Ali; Ahamed, Afsana
    Keywords: Food Consumption/Nutrition/Food Safety, Agribusiness, Marketing
    Date: 2021–08
  5. By: Muhammad Apriandito Arya Saputra; Andry Alamsyah; Fajar Ibnu Fatihan
    Abstract: Top hotels are shifting to digital methods for understanding their customers in order to maintain and ensure satisfaction. Rather than relying on conventional written reviews or interviews, hotels are now investing heavily in artificial intelligence, particularly machine learning solutions. Analyzing online customer reviews lets companies make decisions more effectively than conventional analysis does. The purpose of this research is to measure hotel service quality. The proposed approach focuses on reviews, organized by service quality dimension, of the top five luxury hotels in Indonesia listed in the "Best of 2018" section of the online travel site TripAdvisor. We use a model based on a simple Bayesian classifier to assign each customer review to one of the service quality dimensions. The model separated the classes well, as measured by accuracy, kappa, recall, precision, and F-measure. To uncover latent topics in customers' opinions, we use topic modeling. We found that the most common issue concerns responsiveness, which received the lowest percentage of all dimensions. Our research gives end customers a faster overview of hotel rankings by service quality, based on a summary of previous online reviews.
    Date: 2021–10
  6. By: Harold D Chiang; Yukun Ma; Joel Rodrigue; Yuya Sasaki
    Abstract: This paper presents novel methods and theory for estimation and inference on parameters in econometric models that use machine learning for nuisance parameters when data are dyadic. We propose a dyadic cross-fitting method to remove over-fitting biases under arbitrary dyadic dependence. Together with Neyman orthogonal scores, this cross-fitting method enables root-$n$ consistent estimation and inference that are robust to dyadic dependence. We illustrate an application of our general framework to high-dimensional network link formation models. Applying this method to empirical data on international economic networks, we reexamine the determinants of free trade agreements (FTAs), viewed as links formed in dyads of world economies. We document that standard methods may lead to misleading conclusions about numerous classic determinants of FTA formation, owing to biased point estimates or standard errors that are too small.
    Date: 2021–10
  7. By: Danielsson, Jon; Macrae, Robert; Uthemann, Andreas
    Abstract: Artificial intelligence (AI) is rapidly changing how the financial system is operated, taking over core functions for both cost savings and operational efficiency reasons. AI will assist both risk managers and the financial authorities. However, it can destabilize the financial system, creating new tail risks and amplifying existing ones due to procyclicality, unknown-unknowns, the need for trust, and optimization against the system.
    Keywords: ES/K002309/1; EP/P031730/1; UKRI fund
    JEL: F3 G3
    Date: 2021–08–28
  8. By: Rebecca Christie
    Abstract: Robot taxes embody the more futuristic challenges of managing automation and legacy workers. As machines and artificial intelligence take on more roles that used to be performed by humans, policymakers and technologists are assessing the costs this transition imposes and what parts of society will pay them. A robot tax on companies that replace employees with automated systems is easy to dismiss in its most simplistic forms but should be...
    Date: 2021–10
  9. By: Ayush Patnaik (xKDR Forum); Ajay Shah (xKDR Forum); Anshul Tayal (xKDR Forum); Susan Thomas (xKDR Forum)
    Abstract: The VIIRS nighttime lights dataset constitutes progress in the measurement of night lights radiance, with monthly data at a pixel size of roughly 0.5 km × 0.5 km. We identify a downward bias in the reported radiance when the number of cloud-free images in a month is low. This bias is often large, ranging from -10% to -30%. We develop a cautious bias-correction scheme that partially addresses this problem. The scheme is applied to the pixel-level dataset to create an improved dataset. The bias-corrected data hew closer to the ground truth as seen in household survey data.
    JEL: C8 E0 E1 R1
    Date: 2021–10
  10. By: Jiafeng Chen; Xiaohong Chen; Elie Tamer
    Abstract: We investigate the computational performance of artificial neural networks (ANNs) in semi-nonparametric instrumental variables (NPIV) models with high-dimensional covariates that are relevant to empirical work in economics. We focus on efficient estimation of and inference on expectation functionals (such as weighted average derivatives) and use optimal criterion-based procedures (sieve minimum distance, or SMD) and novel efficient score-based procedures (ES). Both procedures use ANNs to approximate the unknown function. We then provide a detailed practitioner's recipe for implementing these two classes of estimators. This involves choosing tuning parameters both for the unknown functions (which include conditional expectations) and for estimating the optimal weights in SMD and the Riesz representers used with the ES estimators. Next, we conduct a large set of Monte Carlo experiments that compare finite-sample performance in complicated designs involving a large set of regressors (up to 13 continuous), various underlying nonlinearities, and covariate correlations. Some takeaways from our results: 1) tuning and optimization are delicate, especially as the problem is nonconvex; 2) the choice of ANN architecture does not seem to matter for the designs we consider, and given proper tuning, ANN methods perform well; 3) stable inference is more difficult to achieve with ANN estimators; 4) optimal SMD-based estimators perform adequately; 5) there seems to be a gap between implementation and approximation theory. Finally, we apply ANN NPIV to estimate average price elasticity and average derivatives in two demand examples.
    Date: 2021–10
  11. By: Romano, Stefania; Martinez-Heras, Jose; Raponi, Francesco Natalini; Guidi, Gregorio; Gottron, Thomas
    Abstract: In carrying out its banking supervision tasks as part of the Single Supervisory Mechanism (SSM), the European Central Bank (ECB) collects and disseminates data on significant and less significant institutions. To ensure harmonised supervisory reporting standards, the data are represented through the European Banking Authority’s data point model, which defines all the relevant business concepts and the validation rules. For the purpose of data quality assurance and assessment, ECB experts may implement additional plausibility checks on the data. The ECB is constantly seeking ways to improve these plausibility checks in order to detect suspicious or erroneous values and to provide high-quality data for the SSM.
    Keywords: machine learning, plausibility checks, quality assurance, supervisory data, validation rules
    JEL: C18 C63 C81 E58 G28
    Date: 2021–10
  12. By: David Bounie; Antoine Dubus; Patrick Waelbroeck
    Abstract: We analyze competition between data intermediaries collecting information on consumers, which they sell to firms for price discrimination purposes. We show that competition between data intermediaries benefits consumers by increasing competition between firms, and by reducing the amount of consumer data collected. We argue that merger policy guidelines should investigate the effect of the data strategies of large intermediaries on competition and consumer surplus in related markets.
    Keywords: data, mergers, competition, consumer surplus
    JEL: L13 L40 L86
    Date: 2021
  13. By: M. Battisti; M. Del Gatto; A. F. Gravina; C. F. Parmeter
    Abstract: The rise of artificial intelligence and automation is fueling anxiety about the replacement of workers with robots and digital technologies. Relying on a constructed (country-sector-year) measure of robotic capital (RK), we study the extent of complementarity/substitutability between robots and workers at different skill levels (i.e., high-, medium-, and low-skilled workers). The analysis points to a higher elasticity of substitution (EoS), i.e., higher substitutability, between RK and unskilled labor than between RK and skilled labor. Furthermore, we find evidence of polarizing effects: middle-skilled workers, typically employed in intermediate routine and/or codifiable tasks, are the most vulnerable to robotization. The results are robust to using different i) definitions of EoS; ii) computations of RK; iii) samples of countries and industries (WIOD vs. EU KLEMS data); and iv) skill groupings.
    Keywords: Automation; robotization; elasticity of substitution; technology; polarization
    Date: 2021
  14. By: Goecke, Henry; Rusche, Christian
    Abstract: Artificial intelligence (AI) is regarded in Germany as a crucial key technology (BMWi, 2019). As early as 2018, Chancellor Merkel stated in her video podcast that Germany must be a leader in AI 'if we want growth and, with it, new jobs and prosperity for all of us' (Die Bundeskanzlerin, 2018). At the level of the European Union (EU), the topic is considered similarly important. The EU thus aims to harness the potential of AI (Europäische Kommission, 2021) and to achieve global leadership in trustworthy AI (ibid., 4). In 2019, when presenting her priorities and goals, the newly elected President of the European Commission, Ursula von der Leyen, said in this regard: 'We must make our single market fit for the digital age, we must make optimal use of the benefits of artificial intelligence and big data, [...]' (Europäische Kommission, 2019). AI is therefore already on the agenda of leading politicians. [...]
    JEL: H54 O25 O38
    Date: 2021
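Item 5 (Saputra, Alamsyah and Fatihan) assigns each review to a service quality dimension with "a simple Bayesian classifier". The sketch below assumes this means a multinomial naive Bayes over word counts with add-one smoothing, which the abstract does not confirm. The class name `MultinomialNaiveBayes`, the toy reviews, and the second label `tangibles` are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter, defaultdict


class MultinomialNaiveBayes:
    """Hypothetical multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)      # label -> number of documents
        self.word_counts = defaultdict(Counter)  # label -> word frequencies
        self.vocab = set()
        for doc, label in zip(docs, labels):
            tokens = doc.lower().split()
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        self.total_docs = len(docs)
        return self

    def log_posterior(self, doc, label):
        # log P(label) + sum of log P(token | label), with add-one smoothing
        score = math.log(self.class_counts[label] / self.total_docs)
        denom = sum(self.word_counts[label].values()) + len(self.vocab)
        for token in doc.lower().split():
            if token in self.vocab:  # ignore words never seen in training
                score += math.log((self.word_counts[label][token] + 1) / denom)
        return score

    def predict(self, doc):
        # Pick the label with the highest (unnormalized) log posterior
        return max(self.class_counts, key=lambda c: self.log_posterior(doc, c))


# Toy, made-up training reviews labeled by an assumed service quality dimension
docs = [
    "staff were slow to respond to our request",
    "we waited an hour and the response was slow",
    "beautiful lobby and clean modern room",
    "the pool and room decor were stunning",
]
labels = ["responsiveness", "responsiveness", "tangibles", "tangibles"]

model = MultinomialNaiveBayes().fit(docs, labels)
print(model.predict("reception was slow to respond"))  # prints "responsiveness"
```

Working in log space avoids numerical underflow when many small word probabilities are multiplied, and smoothing keeps an unseen word-label combination from zeroing out a class.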
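Item 6 (Chiang, Ma, Rodrigue and Sasaki) relies on dyadic cross-fitting so that nuisance models are not trained on dyads sharing a node with the dyads they are evaluated on. The sketch below shows one way to build such folds for undirected dyads, assuming nodes are randomly split into K groups and each evaluation fold collects the dyads linking one pair of groups. It illustrates the idea and is not necessarily the paper's exact scheme; the function name `dyadic_cross_fit_folds` is hypothetical.

```python
import random
from itertools import combinations


def dyadic_cross_fit_folds(n_nodes, n_groups, seed=0):
    """Yield (eval_dyads, train_dyads) for undirected dyads (i, j) with i < j.

    Nodes are randomly split into n_groups groups. Each evaluation fold holds
    the dyads whose endpoints fall in groups (k, l) with k <= l; its training
    set keeps only dyads with no endpoint in group k or group l.
    """
    if n_groups < 3:
        # With 2 groups, the (0, 1) fold would leave no nodes to train on
        raise ValueError("need at least 3 groups")
    nodes = list(range(n_nodes))
    random.Random(seed).shuffle(nodes)
    group = {node: pos % n_groups for pos, node in enumerate(nodes)}
    dyads = list(combinations(range(n_nodes), 2))
    for k in range(n_groups):
        for l in range(k, n_groups):
            held_out = {k, l}
            eval_dyads = [(i, j) for i, j in dyads
                          if sorted((group[i], group[j])) == [k, l]]
            train_dyads = [(i, j) for i, j in dyads
                           if group[i] not in held_out and group[j] not in held_out]
            yield eval_dyads, train_dyads


folds = list(dyadic_cross_fit_folds(n_nodes=12, n_groups=3))
print([(len(ev), len(tr)) for ev, tr in folds])
# prints [(6, 28), (16, 6), (16, 6), (6, 28), (16, 6), (6, 28)]
```

Every dyad lands in exactly one evaluation fold. In a double/debiased machine learning workflow, a nuisance model would be fitted on each `train_dyads` set and the orthogonal score evaluated on the matching `eval_dyads`, then averaged across folds. Dropping every training dyad that touches group k or l is what removes shared-node dependence between training and evaluation data.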

This nep-big issue is ©2021 by Tom Coupé. It is provided as is without any express or implied warranty. It may be freely redistributed in whole or in part for any purpose. If distributed in part, please include this notice.
General information on the NEP project can be found at For comments please write to the director of NEP, Marco Novarese at <>. Put “NEP” in the subject, otherwise your mail may be rejected.
NEP’s infrastructure is sponsored by the School of Economics and Finance of Massey University in New Zealand.