nep-big New Economics Papers
on Big Data
Issue of 2022‒01‒31
twenty papers chosen by
Tom Coupé
University of Canterbury

  1. Do Political Actors Engage in Strategic Deception on Social Media? By Ricketts, Simon
  2. The Turing Trap: The Promise & Peril of Human-Like Artificial Intelligence By Erik Brynjolfsson
  3. Short-term Prediction of Bank Deposit Flows: Do Textual Features matter? By Katsafados, Apostolos; Anastasiou, Dimitris
  5. Machine Learning-Based Feasibility Checks for Dynamic Time Slot Management By van der Hagen, L.; Agatz, N.A.H.; Spliet, R.; Visser, T.R.; Kok, A.L.
  6. Trustworthy Autonomous Vehicles By FERNANDEZ LLORCA David; GOMEZ Emilia
  7. Using Large-Scale Social Media Data for Population-Level Mental Health Monitoring and Public Sentiment Assessment: A Case Study of Thailand By Suppawong Tuarob; Thanapon Noraset; Tanisa Tawichsri
  8. Accelerated American Option Pricing with Deep Neural Networks By David Anderson; Urban Ulrych
  9. Fair learning with bagging By Jean-David Fermanian; Dominique Guegan
  10. Machine Learning Based Semiparametric Time Series Conditional Variance: Estimation and Forecasting By Justin Dang; Aman Ullah
  11. A Year of Pandemic: Levels, Changes and Validity of Well-Being Data from Twitter. Evidence from Ten Countries By Sarracino, Francesco; Greyling, Talita; O'Connor, Kelsey J.; Peroni, Chiara; Rossouw, Stephanié
  12. Machines and Markets : Assessing the Impact of Algorithmic Trading on Financial Market Efficiency By Garg, Karan
  13. Cheap Talk in Corporate Climate Commitments: The Role of Active Institutional Ownership, Signaling, Materiality, and Sentiment By Julia Anna Bingler; Mathias Kraus; Markus Leippold; Nicolas Webersinke
  14. Predicting Specialty Coffee Auction Prices Using Machine Learning By Aldott, Zoltan
  15. Green Infrastructure and Air Pollution: Evidence from Highways Connecting Two Megacities in China By Yu, Bo; Tran, Trang; Lee, Wang-Sheng
  16. Corporate Disclosure: Facts or Opinions? By Shimon Kogan; Vitaly Meursault
  17. Deep Partial Hedging By Songyan Hou; Thomas Krabichler; Marcus Wunsch
  18. Forecasting Realized Volatility Using Machine Learning and Mixed-Frequency Data (the Case of the Russian Stock Market) By Vladimir Pyrlik; Pavel Elizarov; Aleksandra Leonova
  19. The Impact of Artificial Intelligence on Labor Markets in Developing Countries: A New Method with an Illustration for Lao PDR and Vietnam By Carbonero, Francesco; Davies, Jeremy; Ernst, Ekkehard; Fossen, Frank M.; Samaan, Daniel; Sorgner, Alina
  20. Intelligent Trading Systems: A Sentiment-Aware Reinforcement Learning Approach By Francisco Caio Lima Paiva; Leonardo Kanashiro Felizardo; Reinaldo Augusto da Costa Bianchi; Anna Helena Reali Costa

  1. By: Ricketts, Simon (Monash University)
    Abstract: We examine whether political actors engage in strategic deception on social media. We find evidence that certain groups of politicians engage in deception in response to an election. To infer deception, we construct a novel wealth inference model from the text of political social media accounts. The model uses machine learning and natural language processing and is accurate to within half an order of magnitude when compared to real wealth disclosures as required by law in the United States. Wealth exaggeration is not homogeneous; in an election year, the wealthiest political actors minimise their perceived wealth, while the poorest exaggerate their perceived wealth. We do not find evidence that there are differences in exaggeration due to sex, party or experience.
    Keywords: Strategic deception; wealth-inference; machine-learning; natural language processing; social media; election
    JEL: C55 D72
    Date: 2021
  2. By: Erik Brynjolfsson
    Abstract: In 1950, Alan Turing proposed an imitation game as the ultimate test of whether a machine was intelligent: could a machine imitate a human so well that its answers to questions are indistinguishable from a human's? Ever since, creating intelligence that matches human intelligence has implicitly or explicitly been the goal of thousands of researchers, engineers, and entrepreneurs. The benefits of human-like artificial intelligence (HLAI) include soaring productivity, increased leisure, and perhaps most profoundly, a better understanding of our own minds. But not all types of AI are human-like. In fact, many of the most powerful systems are very different from humans. So an excessive focus on developing and deploying HLAI can lead us into a trap. As machines become better substitutes for human labor, workers lose economic and political bargaining power and become increasingly dependent on those who control the technology. In contrast, when AI is focused on augmenting humans rather than mimicking them, then humans retain the power to insist on a share of the value created. Furthermore, augmentation creates new capabilities and new products and services, ultimately generating far more value than merely human-like AI. While both types of AI can be enormously beneficial, there are currently excess incentives for automation rather than augmentation among technologists, business executives, and policymakers.
    Date: 2022–01
  3. By: Katsafados, Apostolos; Anastasiou, Dimitris
    Abstract: The purpose of this study is twofold. First, to construct short-term prediction models for bank deposit flows in the Euro area peripheral countries, employing machine learning techniques. Second, to examine whether textual features enhance the predictive ability of our models. We find that Random Forest models including both textual features and macroeconomic variables outperform those that include only macro factors or textual features. Monetary policy authorities or macroprudential regulators could adopt our approach to timely predict potential excessive bank deposit outflows and assess the resilience of the whole banking sector in the Euro area peripheral countries.
    Keywords: Bank deposit flows; European banks; textual analysis; short-term prediction; machine learning
    JEL: C0 C22 C5 C51 C54 E44 E47 G10
    Date: 2022–01
  4. By: Maria Mercanti-Guérin (IAE Paris - Sorbonne Business School)
    Abstract: Digital twins are a digital representation of the physical world. Among the multiple applications of digital twins, housing is one of the first sectors concerned. The objective of this research is to determine whether interior designs conceived by digital twins are more readable by artificial intelligence than human-designed designs (1) and whether these designs generate more consumer preference (2). Our first experiment shows that AI generates fewer annotation or classification errors when analyzing designs conceived via digital twins. Our second experiment shows that consumers' attitudes are more favorable towards designs conceived by digital twins. Aesthetics and complexity, which are two dimensions of object creativity, are perceived negatively. Only the "novelty" dimension, which can be assimilated to modernity, explains a strong preference for this type of interior. A discussion on AI as a possible brake on creativity and reinforcement of the status quo bias is proposed.
    Keywords: status quo, creativity, AI, digital twins
    Date: 2021–11–25
  5. By: van der Hagen, L.; Agatz, N.A.H.; Spliet, R.; Visser, T.R.; Kok, A.L.
    Abstract: Online grocers typically let customers choose a delivery time slot to receive their goods. To ensure a reliable service, the retailer may want to close time slots as capacity fills up. The number of customers that can be served per slot largely depends on the specific order sizes and delivery locations. Conceptually, checking whether it is possible to serve a certain customer in a certain time slot given a set of already accepted customers involves solving a vehicle routing problem with time windows. This is challenging in practice as there is little time available and not all relevant information is known in advance. We explore the use of machine learning to support time slot decisions in this context. Our results on realistic instances using a commercial route solver suggest that machine learning can be a promising way to assess the feasibility of customer insertions. On large-scale routing problems it performs better than insertion heuristics.
    Keywords: time slot management, vehicle routing, supervised machine learning
    Date: 2022–01–17
  6. By: FERNANDEZ LLORCA David (European Commission - JRC); GOMEZ Emilia (European Commission - JRC)
    Abstract: This report aims to advance towards a general framework on Trustworthy AI for the specific domain of Autonomous Vehicles (AVs). The implementation and relevance of the assessment list established by the independent High Level Expert Group on Artificial Intelligence (AI HLEG) as a tool to translate the seven requirements that AI systems should meet in order to be trustworthy, defined in the Ethics Guidelines, are discussed in detail and contextualized for the field of AVs. The general behaviour of an AV depends on a set of multiple, complex, interrelated Artificial Intelligence (AI) based systems, each dealing with problems of different nature. The application context of AVs can intuitively be considered high-risk, and their adoption involves addressing significant technical, political and societal challenges. However, AVs could bring substantial benefits, improving safety, mobility, and the environment. Therefore, although challenging, it seems necessary to deepen the application of the assessment criteria of trustworthy AI for AVs.
    Keywords: AI, autonomous vehicles
    Date: 2021–12
  7. By: Suppawong Tuarob; Thanapon Noraset; Tanisa Tawichsri
    Abstract: Mental health problems are among the major public health concerns during the COVID-19 pandemic, given heightened uncertainties and drastic changes in lifestyles. However, mental health problem prevention and monitoring could be greatly improved given advancements in deep-learning techniques and readily available social media messages. This research uses deep learning algorithms to extract emotion, mood, and psychological cues from social media messages and then aggregates these signals to track population-level mental health. To verify the accuracy of our proposed approaches, we compared our findings to the actual number of patients treated for depression, attempted suicides, and self-harm cases reported by Thailand’s Department of Mental Health. We discovered a strong correlation between the predicted mental signals and actual depression, suicide, and self-harm (injured) cases. Finally, we also create a database and user-friendly interface to help researchers and policymakers explore our extracted mental signals for further applications such as policy sentiment assessment.
    Keywords: Mental Health; Natural Language Processing; Deep Learning; Social Networks
    JEL: I10
    Date: 2022–01
  8. By: David Anderson (University of Zurich); Urban Ulrych (University of Zurich - Department of Banking and Finance; Swiss Finance Institute)
    Abstract: Given the competitiveness of a market-making environment, the ability to speedily quote option prices consistent with an ever-changing market environment is essential. Thus, the smallest acceleration or improvement over traditional pricing methods is crucial to avoid arbitrage. We propose a novel method for accelerating the pricing of American options to near-instantaneous using a feed-forward neural network. This neural network is trained over the chosen (e.g., Heston) stochastic volatility specification. Such an approach facilitates parameter interpretability, as generally required by the regulators, and establishes our method in the area of eXplainable Artificial Intelligence (XAI) for finance. We show that the proposed deep explainable pricer induces a speed-accuracy trade-off compared to the typical Monte Carlo or Partial Differential Equation-based pricing methods. Moreover, the proposed approach allows for pricing derivatives with path-dependent and more complex payoffs and is, given the sufficient accuracy of computation and its tractable nature, applicable in a market-making environment.
    Keywords: American Option Pricing, Deep Neural Networks, Explainable Artificial Intelligence, Speed-Accuracy Trade-Off, Market Making, Heston Model, Computational Finance.
    JEL: C45 C63 G13
    Date: 2022–01
  9. By: Jean-David Fermanian (Ensae-Crest); Dominique Guegan (UP1 - Université Paris 1 Panthéon-Sorbonne, CES - Centre d'économie de la Sorbonne - UP1 - Université Paris 1 Panthéon-Sorbonne - CNRS - Centre National de la Recherche Scientifique, University of Ca’ Foscari [Venice, Italy])
    Abstract: The central question of this paper is how to enhance supervised learning algorithms with a fairness requirement ensuring that any sensitive input does not "unfairly" influence the outcome of the learning algorithm. To attain this objective we proceed in three steps. First, after introducing several notions of fairness in a uniform approach, we introduce a more general notion through a conditional fairness definition which encompasses most of the well-known fairness definitions. Second, we use an ensemble of binary and continuous classifiers to get an optimal solution for a fair predictive outcome using a related post-processing procedure without any transformation on the data, nor on the training algorithms. Finally, we introduce several tests to verify the fairness of the predictions. Some empirics are provided to illustrate our approach.
    Keywords: fairness, nonparametric regression, classification, accuracy
    Date: 2021–11
  10. By: Justin Dang (UCR); Aman Ullah (Department of Economics, University of California Riverside)
    Abstract: This paper proposes a new combined semiparametric estimator of the conditional variance that takes the product of a parametric estimator and a nonparametric estimator based on machine learning. A popular kernel based machine learning algorithm, known as kernel regularized least squares estimator, is used to estimate the nonparametric component. We discuss how to estimate the semiparametric estimator using real data and how to use this estimator to make forecasts for the conditional variance. Simulations are conducted to show the dominance of the proposed estimator in terms of mean squared error. An empirical application using S&P 500 daily returns is analyzed, and the semiparametric estimator effectively forecasts future volatility.
    Keywords: Conditional variance; Nonparametric estimator; Semiparametric models; Forecasting; Machine Learning
    JEL: C01 C14 C51
    Date: 2021–01
  11. By: Sarracino, Francesco; Greyling, Talita (University of Johannesburg); O'Connor, Kelsey J. (STATEC Research – National Institute of Statistics and Economic Studies); Peroni, Chiara (STATEC Research – National Institute of Statistics and Economic Studies); Rossouw, Stephanié (Auckland University of Technology)
    Abstract: In this article, we describe how well-being changed during 2020 in ten countries, namely Australia, Belgium, France, Germany, Great Britain, Italy, Luxembourg, New Zealand, South Africa, and Spain. Our measure of well-being is the Gross National Happiness (GNH), a country-level index built applying sentiment analysis to data from Twitter. We aim to describe how GNH changed during the pandemic within countries, assess its validity as a measure of well-being, and analyse its correlations. We take advantage of a unique dataset of daily observations about GNH, generalised trust and trust in national institutions, fear concerning the economy, loneliness, infection rate, policy stringency and distancing. To assess the validity of the data sourced from Twitter, we exploit various survey data sources, such as the Eurobarometer and consumer satisfaction, and Big Data sources, such as Google Trends. Results indicate that sentiment analysis of tweets can provide reliable and timely information on well-being. This can be particularly useful for informing decision-making in a timely manner.
    Keywords: happiness, COVID-19, Big Data, Twitter, Sentiment Analysis, well-being, public policy, trust, fear, loneliness
    JEL: C55 I10 I31 H12
    Date: 2021–11
  12. By: Garg, Karan (University of Warwick)
    Abstract: The rise of machine learning has revolutionised finance. Institutions across the world have increasingly turned to data science and machine learning to create trading models without the need for human intervention. This has had various implications for the financial markets that they operate in, including market efficiency. This paper simulates a financial market with agent-based modelling and Monte-Carlo style simulations, to motivate a qualitative discussion about the implications of increased algorithmic trading on financial market efficiency. It finds that algorithmic traders (ATs) can seemingly increase market efficiency through better liquidity management and more complete extraction of information from prices. However, this also comes with increased instability and potential convergence to an unstable equilibrium. The Adaptive Market Hypothesis (Lo, 2004) is suggested as an alternative framework for analysing AT behaviour.
    Keywords: Neural Networks; Agent-Based Modelling; Efficient Market Hypothesis; Stock Market Simulation; Financial Regulation
    JEL: C45 C53 G14 G17 G18
    Date: 2021
  13. By: Julia Anna Bingler (ETH Zürich - CER-ETH - Center of Economic Research at ETH Zurich); Mathias Kraus (University of Erlangen-Nuremberg-Friedrich Alexander Universität Erlangen Nürnberg); Markus Leippold (University of Zurich; Swiss Finance Institute - University of Zurich); Nicolas Webersinke (Friedrich-Alexander-Universität Erlangen-Nürnberg)
    Abstract: Corporate climate disclosures based on the TCFD recommendations are considered an important prerequisite to managing climate-related financial risks. At the same time, current disclosures are imprecise, inaccurate, and greenwashing-prone. Yet, existing research on this matter suffers from small samples or inaccuracies. Therefore, we introduce a scalable deep learning approach to enable comprehensive climate disclosure analyses of large samples by fine-tuning the ClimateBert model. Our model significantly outperforms previous approaches. We then extract the amount of cheap talk, defined as the share of precise versus imprecise climate commitments, of 14,584 annual reports of the MSCI World index firms from 2010 to 2020. Finally, we use this data to test various hypotheses on the drivers of cheap talk. We find that institutional ownership, targeted institutional investor engagement, materiality and downside risk disclosures are associated with less cheap talk. Signaling by publicly supporting the TCFD is associated with more cheap talk.
    Keywords: Corporate climate disclosures, voluntary reporting, commitments, TCFD recommendations, textual analysis, natural language processing.
    JEL: G2 G38 C8 M48
    Date: 2022–01
  14. By: Aldott, Zoltan (University of Warwick)
    Abstract: This paper aims to contribute to the coffee pricing literature pertaining to the Cup of Excellence (CoE) competitions by revising the feature set used and extending the modelling approach using machine learning. The specific dataset used is merged from data provided by the Alliance for Coffee Excellence and information collected through scraping public information from the Cup of Excellence website. The paper compares popular supervised learning algorithms exploring multiple interpretations of tasting notes to attain an efficient predictive model of prices. The algorithms compared include OLS, regularised linear algorithms, the decision tree, as well as bagging and gradient-boosting ensemble methods. The best-performing models are further optimised using hyperparameter tuning and the most efficient one is selected. Based on a gradient-boosting regression, the final model is analysed to find the key relationships driving model predictions. Permutation feature importance and accumulated local effects analyses are used to provide insights into the non-linearities present in the data generating process.
    Keywords: specialty coffee; machine learning; prediction; Coffee Taster’s Flavor Wheel; Cup of Excellence
    JEL: C53 C81 D44 Q11
    Date: 2021
  15. By: Yu, Bo (Deakin University); Tran, Trang (University of Maryland at College Park); Lee, Wang-Sheng (Monash University)
    Abstract: Following market liberalisation, the vehicle population in China has increased dramatically over the past few decades. This paper examines the causal impact on air pollution of the opening in 2015 of a heavily used high-speed rail line connecting two megacities in China, Chengdu and Chongqing. We use high-frequency and high spatial resolution data to track pollution along major highways linking the two cities. Our approach involves the use of an augmented regression discontinuity in time approach applied to data that have been through a meteorological normalisation process. This deweathering process involves applying machine learning techniques to account for changes in meteorology in air quality time series data. Our estimates show that air pollution is reduced by 7.6% along the main affected highway. We simultaneously find increased levels of ozone pollution, which is likely due to the reduction in nitrogen dioxide levels that occurred. These findings are supported using a difference-in-difference approach.
    Keywords: air pollution, China, green infrastructure, high-speed railway, regression discontinuity, machine learning
    JEL: L92 O18 Q53 Q54 R41
    Date: 2021–11
  16. By: Shimon Kogan; Vitaly Meursault
    Abstract: A large body of literature documents the link between textual communication (e.g., news articles, earnings calls) and firm fundamentals, either through pre-defined “sentiment” dictionaries or through machine learning approaches. Surprisingly, little is known about why textual communication matters. In this paper, we take a step in that direction by developing a new methodology to automatically classify statements into objective (“facts”) and subjective (“opinions”) and apply it to transcripts of earnings calls. The large scale estimation suggests several novel results: (1) Facts and opinions are both prominent parts of corporate disclosure, taking up roughly equal parts, (2) higher prevalence of opinions is associated with investor disagreement, (3) anomaly returns are realized around the disclosure of opinions rather than facts, and (4) facts have a much stronger correlation with contemporaneous financial performance but facts and opinions have an equally strong association with financial results for the next quarter.
    Keywords: Subjectivity; Machine Learning; NLP; Text Analysis
    JEL: C00 G12 G14
    Date: 2021–11–26
  17. By: Songyan Hou; Thomas Krabichler; Marcus Wunsch
    Abstract: Using techniques from deep learning (cf. [Büh+19]), we show that neural networks can be trained successfully to replicate the modified payoff functions that were first derived in the context of partial hedging by [FL00]. Not only does this approach better accommodate the realistic setting of hedging in discrete time, it also allows for the inclusion of transaction costs as well as general market dynamics.
    Date: 2021–12
  18. By: Vladimir Pyrlik; Pavel Elizarov; Aleksandra Leonova
    Abstract: We assess the performance of selected machine learning algorithms (lasso, random forest, gradient boosting, and long short-term memory) in forecasting the daily realized volatility of returns of selected top stocks in the Russian stock market in comparison with a heterogeneous autoregressive realized volatility benchmark in 2018-2020. We seek to improve the predictive power of the models by including various economic indicators that carry information about future volatility. We find that lasso delivers a good combination of easy implementation and forecast precision. The other algorithms require fine-tuning and frequent re-training, otherwise they are likely to fail to outperform the benchmark often enough. Only the basic lagged log-RV values are significant explanatory variables in terms of the benchmark in-sample quality. Many economic indicators of mixed frequencies improve the predictive power of lasso though, including calendar and overnight effects, financial spillovers from local and global markets, and various macroeconomic indicators.
    Keywords: heterogeneous autoregressive model; machine learning; lasso; gradient boosting; random forest; long short-term memory; realized volatility; Russian stock market; mixed-frequency data
    Date: 2021–11
  19. By: Carbonero, Francesco (University of Turin); Davies, Jeremy (East Village Software Consultants); Ernst, Ekkehard (ILO International Labour Organization); Fossen, Frank M. (University of Nevada, Reno); Samaan, Daniel (ILO International Labour Organization); Sorgner, Alina (John Cabot University)
    Abstract: AI is transforming labor markets around the world. Existing research has focused on advanced economies but has neglected developing economies. Different impacts of AI on labor markets in different countries arise not only from heterogeneous occupational structures, but also from the fact that occupations vary across countries in their composition of tasks. We propose a new methodology to translate existing measures of AI impacts that were developed for the US to countries at various levels of economic development. Our method assesses semantic similarities between textual descriptions of work activities in the US and workers' skills elicited in surveys for other countries. We implement the approach using the measure of suitability of work activities for machine learning provided by Brynjolfsson et al. (2018) for the US and the World Bank's STEP survey for Lao PDR and Viet Nam. Our approach allows characterizing the extent to which workers and occupations in a given country are subject to destructive digitalization, which puts workers at risk of being displaced, in contrast to transformative digitalization, which tends to benefit workers. We find that workers in Lao PDR are less likely than in Viet Nam to be in the "machine terrain", where workers will have to adapt to occupational transformations due to AI and are at risk of being partially displaced. Our method based on semantic textual similarities using SBERT is advantageous compared to approaches transferring AI impact scores across countries using crosswalks of occupational codes.
    Keywords: artificial intelligence, machine learning, digitalization, labor, skills, developing countries
    JEL: J22 J23 O14 O33
    Date: 2021–12
  20. By: Francisco Caio Lima Paiva; Leonardo Kanashiro Felizardo; Reinaldo Augusto da Costa Bianchi; Anna Helena Reali Costa
    Abstract: The feasibility of making profitable trades on a single asset on stock exchanges based on pattern identification has long attracted researchers. Reinforcement Learning (RL) and Natural Language Processing have gained notoriety in these single-asset trading tasks, but only a few works have explored their combination. Moreover, some issues are still not addressed, such as extracting market sentiment momentum through the explicit capture of sentiment features that reflect the market condition over time and assessing the consistency and stability of RL results in different situations. Filling this gap, we propose the Sentiment-Aware RL (SentARL) intelligent trading system that improves profit stability by leveraging market mood through an adaptive amount of past sentiment features drawn from textual news. We evaluated SentARL across twenty assets, two transaction costs, and five different periods and initializations to show its consistent effectiveness against baselines. Subsequently, this thorough assessment allowed us to identify the boundary between news coverage and market sentiment regarding the correlation of price-time series above which SentARL's effectiveness is outstanding.
    Date: 2021–11

This nep-big issue is ©2022 by Tom Coupé. It is provided as is without any express or implied warranty. It may be freely redistributed in whole or in part for any purpose. If distributed in part, please include this notice.
General information on the NEP project can be found at For comments please write to the director of NEP, Marco Novarese at <>. Put “NEP” in the subject, otherwise your mail may be rejected.
NEP’s infrastructure is sponsored by the School of Economics and Finance of Massey University in New Zealand.