nep-big New Economics Papers
on Big Data
Issue of 2022‒10‒03
seventeen papers chosen by
Tom Coupé
University of Canterbury

  1. Catching the Political Leader's Signals: Economic policy uncertainty and firm investment in China By ITO Asei; LIM Jaehwan; ZHANG Hongyong
  2. Using Natural Language Processing to Measure COVID19-Induced Economic Policy Uncertainty for Canada and the US By Shafiullah Qureshi; Ba Chu; Fanny S. Demers; Michel Demers
  3. Mitigation Strategies to Improve Reproducibility of Poverty Estimations From Remote Sensing Images Using Deep Learning By J. Machicao; A. Ben Abbes; L. Meneguzzi; P L P Corrêa; A. Specht; Romain David; G. Subsol; D. Vellenich; R. Devillers; S. Stall; N. Mouquet; M. Chaumont; L Berti‐equille; D. Mouillot
  4. Estimating Heterogeneous Bounds for Treatment Effects under Sample Selection and Non-response By Phillip Heiler
  5. Working Paper No. 355: The artificial intelligence (AI) data access regime: what are the factors affecting the access and sharing of industrial AI data? By Long, Vicky; Bjuggren, Per-Olof
  6. A review of Knowledge Graph and Graph Neural Network application By Elda Xhumari; Suela Maxhelaku; Endrit Xhina
  7. Properties of Aggregation Operators Relevant for Ethical Decision Making in Artificial Intelligence By Federico Fioravanti; Iyad Rahwan; Fernando Tohmé
  8. Artificial Collusion: Examining Supracompetitive Pricing by Q-learning Algorithms By Arnoud V. den Boer; Janusz M. Meylahn; Maarten Pieter Schinkel
  9. The Shifting Attention of Political Leaders: Evidence from Two Centuries of Presidential Speeches By Oscar Calvo-González; Axel Eizmendi; Germán Reyes
  10. Testing big data in a big crisis: Nowcasting under COVID-19 By Barbaglia, Luca; Frattarolo, Lorenzo; Onorante, Luca; Pericoli, Filippo Maria; Ratto, Marco; Tiozzo Pezzoli, Luca
  11. An Experimental Analysis of Investor Sentiment By Béatrice BOULU-RESHEF; Catherine BRUNEAU; Maxime NICOLAS; Thomas RENAULT
  12. The Efficient Market Hypothesis for Bitcoin in the context of neural networks By Mike Kraehenbuehl; Joerg Osterrieder
  13. AI for trading strategies By Danijel Jevtic; Romain Deleze; Joerg Osterrieder
  14. Big data analytics for supply chain risk management: research opportunities at process crossroads By Leonardo de Assis Santos; Leonardo Marques
  15. The Impact of New Doctorate Graduates on Innovation Systems in Europe By Leogrande, Angelo; Costantiello, Alberto; Laureti, Lucio
  16. The Impact of the #MeToo Movement on Language at Court -- A text-based causal inference approach By Henrika Langen
  17. Stock Market Prediction using Natural Language Processing -- A Survey By Om Mane; Saravanakumar kandasamy

  1. By: ITO Asei; LIM Jaehwan; ZHANG Hongyong
    Abstract: This study uses a text dataset of the Chinese President’s speeches and reports from November 2012 to December 2021 to construct an original economic policy uncertainty (EPU) index: President Xi Jinping’s EPU (XiEPU). XiEPU moderately correlates with a previous study’s representative EPU, showing notably different peaks. Our index spiked in April 2016 after a sharp decline in the Chinese stock market index and late 2020, reflecting the global COVID-19 pandemic. Using firm-level panel data, we find that a higher value of XiEPU is associated with a lower investment rate at the quarterly level and has a larger and longer-lasting effect than the existing China EPUs. Moreover, there are noteworthy heterogeneous effects among firms and periods. Specifically, we find a stronger effect of XiEPU on manufacturing sectors, a weaker effect on state-owned enterprises, and a stronger effect in the second term of Xi Jinping’s presidential tenure after November 2017.
    Date: 2022–08
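    The abstract does not describe how XiEPU is computed, so the following is only a minimal sketch of a generic keyword-count uncertainty index, not the authors' method. It assumes a document counts toward the index when it mentions at least one term from each of three term lists, and it scales the monthly shares so that their mean is 100. The term lists, the function names `mentions_all` and `epu_index`, and the sample texts are all hypothetical.

```python
from collections import defaultdict

# Hypothetical term lists; the abstract does not give the actual dictionaries.
ECONOMY = {"economy", "economic"}
POLICY = {"policy", "regulation", "reform"}
UNCERTAINTY = {"uncertain", "uncertainty", "risk"}


def mentions_all(text):
    """True if the text contains at least one term from each of the three lists."""
    words = set(text.lower().split())
    return bool(words & ECONOMY) and bool(words & POLICY) and bool(words & UNCERTAINTY)


def epu_index(docs):
    """docs: iterable of (month, text). Returns {month: index} scaled to mean 100."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for month, text in docs:
        totals[month] += 1
        hits[month] += mentions_all(text)  # bool counts as 0 or 1
    shares = {m: hits[m] / totals[m] for m in totals}
    mean_share = sum(shares.values()) / len(shares)
    if mean_share == 0:
        return {m: 0.0 for m in shares}
    return {m: 100.0 * s / mean_share for m, s in shares.items()}


# Made-up example texts, two per month.
docs = [
    ("2016-03", "economic reform proceeds steadily"),
    ("2016-03", "stable growth of the economy"),
    ("2016-04", "economic policy faces uncertainty and risk"),
    ("2016-04", "the economy and policy reform carry risk"),
]
index = epu_index(docs)
```

    In this toy input only the 2016-04 texts mention all three term groups, so the index is higher in that month than in 2016-03.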
  2. By: Shafiullah Qureshi (Department of Economics, Carleton University); Ba Chu (Department of Economics, Carleton University); Fanny S. Demers (Department of Economics, Carleton University); Michel Demers (Department of Economics, Carleton University)
    Abstract: We develop an economic policy uncertainty (EPU) index for Canada and the US using natural language processing (NLP) methods. Our EPU-NLP index is based on an application of several algorithms, including a rapid automatic keyword extraction algorithm (RAKE), a combination of the RoBERTa and the SentenceBERT algorithms, a PyLucene search engine, and the GrapeNLP local grammar engine. Classification-
    Date: 2022–01–18
  3. By: J. Machicao (USP - University of São Paulo, Escola Politecnica da Universidade de Sao Paulo [Sao Paulo]); A. Ben Abbes (FRB - Fondation pour la recherche sur la Biodiversité, UMA - Université de la Manouba [Tunisie]); L. Meneguzzi (USP - University of São Paulo, Escola Politecnica da Universidade de Sao Paulo [Sao Paulo]); P L P Corrêa (Polytechnic School of the University of São Paulo (Brazil) - USP - Universidade de São Paulo, Escola Politecnica da Universidade de Sao Paulo [Sao Paulo]); A. Specht (USQ - University of Southern Queensland); Romain David (ERINHA-AISBL - European Research Infrastructure on Highly Pathogenic Agents); G. Subsol (UMR 228 Espace-Dev, Espace pour le développement - IRD - Institut de Recherche pour le Développement - UPVD - Université de Perpignan Via Domitia - AU - Avignon Université - UR - Université de La Réunion - UG - Université de Guyane - UA - Université des Antilles - UM - Université de Montpellier); D. Vellenich (USP - University of São Paulo, Escola Politecnica da Universidade de Sao Paulo [Sao Paulo]); R. Devillers (UMR 228 Espace-Dev, Espace pour le développement - IRD - Institut de Recherche pour le Développement - UPVD - Université de Perpignan Via Domitia - AU - Avignon Université - UR - Université de La Réunion - UG - Université de Guyane - UA - Université des Antilles - UM - Université de Montpellier); S. Stall (American Geophysical Union); N. Mouquet (CESAB - Centre de Synthèse et d’Analyse sur la Biodiversité - FRB - Fondation pour la recherche sur la Biodiversité, UNIMES - Université de Nîmes); M. Chaumont (LIRMM - Laboratoire d'Informatique de Robotique et de Microélectronique de Montpellier - CNRS - Centre National de la Recherche Scientifique - UM - Université de Montpellier); L Berti‐equille (UMR 228 Espace-Dev, Espace pour le développement - IRD - Institut de Recherche pour le Développement - UPVD - Université de Perpignan Via Domitia - AU - Avignon Université - UR - Université de La Réunion - UG - Université de Guyane - UA - Université des Antilles - UM - Université de Montpellier); D. Mouillot (LSEA MARBEC - Laboratoire Service d' Experimentations Aquacoles [Palavas les Flots] - UMR MARBEC - MARine Biodiversity Exploitation and Conservation - IRD - Institut de Recherche pour le Développement - IFREMER - Institut Français de Recherche pour l'Exploitation de la Mer - CNRS - Centre National de la Recherche Scientifique - UM - Université de Montpellier, UM - Université de Montpellier)
    Abstract: The challenges of Reproducibility and Replicability (R & R) in computer science experiments have become a focus of attention in the last decade, as efforts to adhere to good research practices have increased. However, experiments using Deep Learning (DL) remain difficult to reproduce due to the complexity of the techniques used. Challenges such as estimating poverty indicators (e.g. wealth index levels) from remote sensing imagery, requiring the use of huge volumes of data across different geographic locations, would be impossible without the use of DL technology. To test the reproducibility of DL experiments, we report a review of the reproducibility of three DL experiments which analyse visual indicators from satellite and street imagery. For each experiment, we identify the challenges found in the datasets, methods and workflows used. As a result of this assessment we propose a checklist incorporating relevant FAIR principles to screen an experiment for its reproducibility. Based on the lessons learned from this study, we recommend a set of actions aimed to improve the reproducibility of such experiments and reduce the likelihood of wasted effort. We believe that the target audience is broad, from researchers seeking to reproduce an experiment, authors reporting an experiment, or reviewers seeking to assess the work of others.
    Keywords: Reproducibility,Replicability,Deep learning,Machine learning,FAIR,poverty indicators
    Date: 2022
  4. By: Phillip Heiler
    Abstract: In this paper we propose a method for nonparametric estimation and inference for heterogeneous bounds for causal effect parameters in general sample selection models where the initial treatment can affect whether a post-intervention outcome is observed or not. Treatment selection can be confounded by observable covariates while the outcome selection can be confounded by both observables and unobservables. The method provides conditional effect bounds as functions of policy relevant pre-treatment variables. It allows for conducting valid statistical inference on the unidentified conditional effect curves. We use a flexible semiparametric de-biased machine learning approach that can accommodate flexible functional forms and high-dimensional confounding variables between treatment, selection, and outcome processes. Easily verifiable high-level conditions for estimation and misspecification robust inference guarantees are provided as well.
    Date: 2022–09
  5. By: Long, Vicky (The Ratio Institute); Bjuggren, Per-Olof (The Ratio Institute)
    Abstract: This paper decomposes the factors that govern the access and sharing of machine-generated industrial data in the artificial intelligence era. Through a mapping of the key technological, institutional, and firm-level factors that affect the choice of governance structures, this study provides a synthesised view of AI data-sharing and coordination mechanisms. The question to be asked here is whether the hitherto de facto control—bilateral contracts and technical solution-dominating industrial practices in data sharing—can handle the long-run exchange needs or not.
    Keywords: Artificial intelligence (AI); governance structure; intellectual property rights (IPRs); data trade; industrial data
    JEL: D23 K10 K24 L14 L86 O30
    Date: 2022–05–05
  6. By: Elda Xhumari (University of Tirana, Faculty of Natural Sciences, Department of Informatics); Suela Maxhelaku (University of Tirana, Faculty of Natural Sciences, Department of Informatics); Endrit Xhina (University of Tirana, Faculty of Natural Sciences, Department of Informatics)
    Abstract: Many learning activities include working with graph data, which offers a wealth of relational information between parts. Modeling physical systems, learning molecular fingerprints, predicting protein interfaces, and diagnosing illnesses all need the use of a model that can learn from graph inputs. In other fields, such as learning from non-structural data such as texts and images, reasoning on extracted structures (such as phrase dependency trees and image scene graphs) is a major topic that requires graph reasoning models. Graph neural networks (GNNs) are neural models that use message transmission between graph nodes to represent graph dependency. Variants of GNNs have recently shown ground-breaking performance on a variety of deep learning tasks. This paper presents a review of the literature on Knowledge Graphs and Graph Neural Networks, with a particular focus on Graph Embeddings and Graph Neural Network applications as a powerful tool for organizing structured data and making sense of unstructured data, which can be applied to a variety of real-world problems.
    Keywords: Knowledge Graph, Graph Neural Network, DeepWalk, Node2Vec, Structural Deep Network Embedding
    JEL: C45
    Date: 2022–07
  7. By: Federico Fioravanti (Universidad Nacional del Sur/CONICET); Iyad Rahwan (Max Planck Institute for Human Development); Fernando Tohmé (Universidad Nacional del Sur/CONICET)
    Abstract: We present an axiomatic study of aggregation operators that could be applied to ethical AI decision making. The information is given here by different preferences over the decisions to be made by automated systems. We consider two different but very intuitive notions of preference of an alternative over another one, namely pairwise majority and position dominance. Preferences are represented by permutation processes over alternatives and aggregation rules are applied to obtain results that are socially considered to be ethically correct. We address the problem of the stability of the aggregation process, which is important when the information is variable. In this setting we find many aggregation rules that satisfy desirable properties for an autonomous system.
    Keywords: Aggregation Operators; Permutation Process; Decision Analysis
    Date: 2022–09
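    As an illustration of the pairwise majority notion named in the abstract, here is a minimal sketch under the usual social-choice reading: alternative a is preferred to b when a strict majority of rankings place a above b. The paper's formal definition may differ, and the function name `pairwise_majority` and the sample rankings are hypothetical.

```python
from itertools import combinations


def pairwise_majority(rankings):
    """Return the set of pairs (a, b) where a strict majority ranks a above b.

    rankings: list of tuples, each a permutation of the same alternatives,
    ordered from most to least preferred.
    """
    alternatives = rankings[0]
    wins = set()
    for a, b in combinations(alternatives, 2):
        a_above = sum(r.index(a) < r.index(b) for r in rankings)
        b_above = len(rankings) - a_above
        if a_above > b_above:
            wins.add((a, b))
        elif b_above > a_above:
            wins.add((b, a))
        # Ties add no pair in either direction.
    return wins


# Three voters with cyclic preferences over x, y, z.
rankings = [("x", "y", "z"), ("y", "z", "x"), ("z", "x", "y")]
relation = pairwise_majority(rankings)
```

    With these three rankings the relation is cyclic (x over y, y over z, z over x), which is one reason the choice of aggregation rule and its stability matter.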
  8. By: Arnoud V. den Boer (University of Amsterdam); Janusz M. Meylahn (University of Twente); Maarten Pieter Schinkel (University of Amsterdam)
    Abstract: We examine recent claims that a particular Q-learning algorithm used by competitors ‘autonomously’ and systematically learns to collude, resulting in supracompetitive prices and extra profits for the firms sustained by collusive equilibria. A detailed analysis of the inner workings of this algorithm reveals that there is no immediate reason for alarm. We set out what is needed to demonstrate the existence of a colluding price algorithm that does form a threat to competition.
    JEL: C63 L13 L44 K21
    Date: 2022–09–21
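    The abstract does not say which Q-learning variant is analysed. For orientation only, the sketch below shows the textbook tabular Q-learning update, Q(s,a) <- (1 - alpha) Q(s,a) + alpha (r + gamma max_a' Q(s',a')). The state labels, the price actions "low" and "high", and the parameter values are hypothetical and not taken from the paper.

```python
def q_update(q, state, action, reward, next_state, alpha=0.1, gamma=0.95):
    """Apply one tabular Q-learning update in place and return the new value.

    q: dict mapping state -> dict mapping action -> estimated value.
    alpha: learning rate; gamma: discount factor.
    """
    best_next = max(q[next_state].values())
    target = reward + gamma * best_next
    q[state][action] = (1 - alpha) * q[state][action] + alpha * target
    return q[state][action]


# Hypothetical two-state table with two price actions.
q = {
    "s0": {"low": 0.0, "high": 0.0},
    "s1": {"low": 1.0, "high": 2.0},
}
new_value = q_update(q, "s0", "high", reward=1.0, next_state="s1", alpha=0.5, gamma=0.5)
```

    Each firm in a pricing simulation would keep its own table and apply this update after observing its profit; whether the resulting prices are collusive is the question the paper examines.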
  9. By: Oscar Calvo-González; Axel Eizmendi; Germán Reyes
    Abstract: This paper proposes a novel methodology to measure political leaders' attention that combines text data with machine learning algorithms. We use this method on a hand-collected database of presidential “state-of-the-union”-type speeches spanning ten countries and two centuries to study the determinants of political attention, its shifts over time, and its impacts on countries' outcomes. We find that presidential attention can be characterized by a compact set of topics whose relative importance remains stable over long periods. Contrary to presidential rhetoric, using a differences-in-differences design, we show that presidents' attention has precisely-estimated null effects on growth and other policy outcomes.
    Date: 2022–09
  10. By: Barbaglia, Luca (European Commission); Frattarolo, Lorenzo (European Commission); Onorante, Luca (European Commission); Pericoli, Filippo Maria (European Monitoring Centre for Drugs and Drug Addiction); Ratto, Marco (European Commission); Tiozzo Pezzoli, Luca (European Commission)
    Abstract: During the COVID-19 pandemic, economists have struggled to obtain reliable economic predictions, with standard models becoming outdated and their forecasting performance deteriorating rapidly. This paper presents two novelties that could be adopted by forecasting institutions in unconventional times. The first innovation is the construction of an extensive data set for macroeconomic forecasting in Europe. We collect more than a thousand time series from conventional and unconventional sources, complementing traditional macroeconomic variables with timely big data indicators and assessing their added value at nowcasting. The second novelty consists of a methodology to merge an enormous amount of non-encompassing data with a large battery of classical and more sophisticated forecasting methods in a seamlessly dynamic Bayesian framework. Specifically, we introduce an innovative “selection prior” that is used not as a way to influence model outcomes, but as a selecting device among competing models. By applying this methodology to the COVID-19 crisis, we show which variables are good predictors for nowcasting Gross Domestic Product and draw lessons for dealing with possible future crises.
    Keywords: Bayesian Model Averaging, Big Data, COVID-19 Pandemic, Nowcasting
    JEL: C11 C30 E3 E37
    Date: 2022–08
  11. By: Béatrice BOULU-RESHEF; Catherine BRUNEAU; Maxime NICOLAS; Thomas RENAULT
    Keywords: Investor sentiment, Efficient market hypothesis, Natural language, Emojis, Social media, Experimental finance, Behavioral finance.
    Date: 2022
  12. By: Mike Kraehenbuehl; Joerg Osterrieder
    Abstract: This study examines the weak form of the efficient market hypothesis for Bitcoin using a feedforward neural network. Given the increasing popularity of cryptocurrencies in recent years, the question has arisen as to whether market inefficiencies could be exploited in Bitcoin. Several studies we refer to here discuss this topic in the context of Bitcoin using either statistical tests or machine learning methods, mostly relying exclusively on data from Bitcoin itself. Results regarding market efficiency vary from study to study. In this study, however, the focus is on applying various asset-related input features in a neural network. The aim is to investigate whether the prediction accuracy improves when adding equity stock indices (S&P 500, Russell 2000), currencies (EURUSD), the 10 Year US Treasury Note Yield, and the Gold & Silver producers index (XAU) to Bitcoin returns as input features. As expected, the results show that more features lead to higher training performance, from 54.6% prediction accuracy with one feature to 61% with six features. On the test set, however, adding further asset classes yields no increase in prediction accuracy with our neural network methodology. One feature set is able to partially outperform a buy-and-hold strategy, but the performance drops again as soon as another feature is added. This leads us to the partial conclusion that weak market inefficiencies for Bitcoin cannot be detected using neural networks and the given asset classes as input. Therefore, based on this study, we find evidence that the Bitcoin market is efficient in the sense of the efficient market hypothesis during the sample period. We encourage further research in this area, as much depends on the sample period chosen, the input features, the model architecture, and the hyperparameters.
    Date: 2022–06
  13. By: Danijel Jevtic; Romain Deleze; Joerg Osterrieder
    Abstract: In this bachelor thesis, we show how four different machine learning methods (Long Short-Term Memory, Random Forest, Support Vector Machine Regression, and k-Nearest Neighbor) perform compared to already successfully applied trading strategies such as Cross Signal Trading and a conventional statistical time series model ARMA-GARCH. The aim is to show that machine learning methods perform better than conventional methods in the crude oil market when used correctly. A more detailed performance analysis was made, showing the performance of the different models in different market phases so that the robustness of individual models in high and low volatility phases could be examined more closely. For further investigation, these models would also have to be analyzed in other markets.
    Date: 2022–06
  14. By: Leonardo de Assis Santos (UFRJ - Universidade Federal do Rio de Janeiro); Leonardo Marques (Audencia Business School)
    Abstract: Purpose: The purpose of this study is to map current knowledge on big data analytics (BDA) for supply chain risk management (SCRM) while providing future research needs. Design/methodology/approach: The research team systematically reviewed 53 articles published between 2015 and 2021 and further contrasted the synthesis of these articles with four in-depth interviews with BDA startups that provide solutions for SCRM. Findings: The analysis is framed in three perspectives. First, supply chain visibility, i.e. the number of tiers in the solutions; second, BDA analytical approach, i.e. descriptive, prescriptive or predictive approaches; third, the SCRM processes from risk monitoring to risk optimization. The study underlines that the forefront of innovation lies in multi-tiered, multi-directional solutions based on prescriptive BDA to support risk response and optimization (SCRM). In addition, we show that research on these innovations is scant, thus offering an important avenue for future studies. Originality/value: This study makes relevant contributions to the field. We offer a theoretical framework that highlights the key relationships between supply chain visibility, BDA approaches and SCRM processes. Despite being at the forefront of the innovation frontier, startups are still an under-explored agent. In times of major disruptions such as COVID-19 and the emergence of a plethora of new technologies that reshape businesses dynamically, future studies should map the key role of such actors in the advancement of SCRM.
    Date: 2022
  15. By: Leogrande, Angelo; Costantiello, Alberto; Laureti, Lucio
    Abstract: In this article we investigate the determinants of “New Doctorate Graduates” in Europe. We use data from the EIS-European Innovation Scoreboard of the European Commission for 36 countries in the period 2010-2019 with Pooled OLS, Dynamic Panel, WLS, Panel Data with Fixed Effects and Panel Data with Random Effects. We find that “New Doctorate Graduates” is positively associated, among others, with “Human Resources” and “Government Procurement of Advanced Technology Products” and negatively associated, among others, with “Total Entrepreneurial Activity” and “Innovation Index”. We apply k-Means clustering with both the Silhouette Coefficient and the Elbow Method and find that in both cases the optimal number of clusters is three. Furthermore, we use Network Analysis with the Manhattan distance and find seven network structures. Finally, we compare ten machine learning algorithms for predicting the value of “New Doctorate Graduates” with both Original Data (OD) and Augmented Data (AD). Results show that SGD (Stochastic Gradient Descent) is the best predictor for OD while Linear Regression performs better for AD.
    Keywords: Innovation, and Invention: Processes and Incentives; Management of Technological Innovation and R&D; Diffusion Processes; Open Innovation.
    JEL: O3 O30 O32
    Date: 2022–09–06
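    To make the Silhouette Coefficient mentioned in the abstract concrete, here is a minimal sketch that scores a given cluster assignment. It is not the authors' pipeline: the abstract does not state which distance was used for clustering, so the Manhattan distance here is an assumption borrowed from the network-analysis step, and the points, labels and function names are hypothetical. The function expects at least two clusters.

```python
def manhattan(p, q):
    """Manhattan (L1) distance between two equal-length points."""
    return sum(abs(a - b) for a, b in zip(p, q))


def mean_distance(p, others, dist):
    """Average distance from p to each point in others."""
    return sum(dist(p, q) for q in others) / len(others)


def silhouette(points, labels, dist=manhattan):
    """Mean silhouette coefficient of a labelling with at least two clusters."""
    scores = []
    for i, p in enumerate(points):
        own = [q for j, q in enumerate(points) if labels[j] == labels[i] and j != i]
        if not own:
            scores.append(0.0)  # common convention for singleton clusters
            continue
        a = mean_distance(p, own, dist)  # cohesion within own cluster
        b = min(  # separation from the nearest other cluster
            mean_distance(p, [q for j, q in enumerate(points) if labels[j] == c], dist)
            for c in set(labels)
            if c != labels[i]
        )
        scores.append(0.0 if max(a, b) == 0 else (b - a) / max(a, b))
    return sum(scores) / len(scores)


# Two well-separated pairs of points, labelled sensibly and badly.
points = [(0, 0), (0, 1), (10, 0), (10, 1)]
good = silhouette(points, [0, 0, 1, 1])
bad = silhouette(points, [0, 1, 0, 1])
```

    Choosing the number of clusters with this criterion amounts to running k-Means for several values of k and keeping the k whose labelling has the highest mean silhouette.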
  16. By: Henrika Langen
    Abstract: This study assesses the effect of the #MeToo movement on different quantifiers of the 2015-2020 judicial opinions in sexual violence related cases from 51 U.S. courts. The judicial opinions are vectorized into bag-of-words and tf-idf vectors in order to study their development over time. Further, different indicators quantify to what extent the judges use a language that implicitly shifts some blame from the victim(s) to the perpetrator(s). These indicators measure how the grammatical structure, the sentiment and the context of sentences mentioning the victim(s) and/or perpetrator(s) change over time. The causal effect of the #MeToo movement is estimated by means of Difference-in-Differences comparing the development of the language in opinions on sexual violence and other interpersonal crime related cases as well as a Panel Event Study approach. The results do not clearly identify a #MeToo-movement-induced change in the language in court but suggest that the movement may have accelerated the evolution of court language slightly, causing the effect to materialize with a significant time lag. Additionally, the study considers potential effect heterogeneity with respect to the judge's gender and his/her political affiliation. The study combines causal inference with text quantification methods that are commonly used for classification as well as with indicators from the fields of sentiment analysis, word embedding models and grammatical tagging.
    Date: 2022–09
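    The Difference-in-Differences comparison described in the abstract can be illustrated with the simplest two-group, two-period version: the change in the treated group minus the change in the comparison group. This is only a sketch of that basic estimator, not the study's specification, which also uses a Panel Event Study approach. The function name `did_estimate` and all outcome values are made up for illustration.

```python
from statistics import mean


def did_estimate(rows):
    """Two-by-two difference-in-differences on group means.

    rows: iterable of (treated, post, outcome), where treated and post are
    booleans. Every one of the four cells must contain at least one row.
    """
    cells = {(t, p): [] for t in (True, False) for p in (True, False)}
    for treated, post, outcome in rows:
        cells[(treated, post)].append(outcome)
    m = {cell: mean(values) for cell, values in cells.items()}
    treated_change = m[(True, True)] - m[(True, False)]
    control_change = m[(False, True)] - m[(False, False)]
    return treated_change - control_change


# Made-up language-indicator values: treated = sexual violence cases,
# control = other interpersonal crime cases; post = after the movement began.
rows = [
    (True, False, 1.0), (True, False, 3.0),
    (True, True, 4.0), (True, True, 6.0),
    (False, False, 1.0), (False, False, 1.0),
    (False, True, 2.0), (False, True, 2.0),
]
effect = did_estimate(rows)
```

    The estimate is only interpretable as a causal effect under the parallel-trends assumption, that is, if both groups would have evolved alike without the movement.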
  17. By: Om Mane; Saravanakumar kandasamy
    Abstract: The stock market is a network which provides a platform for almost all major economic transactions. While investing in the stock market is a good idea, investing in individual stocks may not be, especially for the casual investor. Smart stock-picking requires in-depth research and plenty of dedication. Predicting this stock value offers enormous arbitrage profit opportunities. This attractiveness of finding a solution has prompted researchers to find a way past problems like volatility, seasonality, and dependence on time. This paper surveys recent literature in the domain of natural language processing and machine learning techniques used to predict stock market movements. The main contributions of this paper include the sophisticated categorizations of many recent articles and the illustration of the recent trends of research in stock market prediction and its related areas.
    Date: 2022–08

This nep-big issue is ©2022 by Tom Coupé. It is provided as is without any express or implied warranty. It may be freely redistributed in whole or in part for any purpose. If distributed in part, please include this notice.
General information on the NEP project can be found at For comments please write to the director of NEP, Marco Novarese at <>. Put “NEP” in the subject, otherwise your mail may be rejected.
NEP’s infrastructure is sponsored by the School of Economics and Finance of Massey University in New Zealand.