nep-big New Economics Papers
on Big Data
Issue of 2023‒06‒26
twenty-one papers chosen by
Tom Coupé
University of Canterbury

  1. Fed Communication, News, Twitter, and Echo Chambers By Bennett Schmanski; Chiara Scotti; Clara Vega
  2. More than Words: Twitter Chatter and Financial Market Sentiment By Andrea Ajello; Diego Silva; Travis Adams; Francisco Vazquez-Grande
  3. Machine Learning and Deep Learning Forecasts of Electricity Imbalance Prices By Sinan Deng; John Inekwe; Vladimir Smirnov; Andrew Wait; Chao Wang
  4. Deep Learning for Solving and Estimating Dynamic Macro-Finance Models By Benjamin Fan; Edward Qiao; Anran Jiao; Zhouzhou Gu; Wenhao Li; Lu Lu
  5. Gated Deeper Models are Effective Factor Learners By Jingjing Guo
  6. Machine learning and physician prescribing: a path to reduced antibiotic use By Michael Allan Ribers; Hannes Ullrich
  7. Contingent valuation machine learning (CVML): A novel method for estimating citizens’ willingness-to-pay for safer and cleaner environment By Khuc, Quy Van; Tran, Duc-Trung
  8. From risk mitigation to employee action along the machine learning pipeline: A paradigm shift in European regulatory perspectives on automated decision-making systems in the workplace By Mollen, Anne; Hondrich, Lukas
  9. Practical and Ethical Perspectives on AI-Based Employee Performance Evaluation By Pletcher, Scott Nicholas
  10. Mind Your Language: Market Responses to Central Bank Speeches By Maximilian Ahrens; Deniz Erdemlioglu; Michael McMahon; Christopher J. Neely; Xiye Yang
  11. Backward Hedging for American Options with Transaction Costs By Ludovic Goudenège; Andrea Molent; Antonino Zanette
  12. A Behaviorally-Validated Warm Glow Questionnaire By Carpenter, Jeffrey P.; Lyford, Alex; Zhang, Mingfang
  13. Density forecasts of inflation using Gaussian process regression models By Petar Soric; Enric Monte; Salvador Torra; Oscar Claveria
  14. Reinforcement Learning and Portfolio Allocation: Challenging Traditional Allocation Methods By Lavko, Matus; Klein, Tony; Walther, Thomas
  15. From Alchemy to Analytics: Unleashing the Potential of Technical Analysis in Predicting Noble Metal Price Movement By Marcin Chlebus; Artur Nowak
  16. Measuring Job Loss during the Pandemic Recession in Real Time with Twitter Data By Anbar Aizenman; Connor M. Brennan; Tomaz Cajner; Cynthia L. Doniger; Jacob Williams
  17. Analyzing Climate Change Policy Narratives with the Character-Role Narrative Framework By Kai Gehring; Matteo Grigoletto
  18. Occupational segregation in the digital economy? A Natural Language Processing approach using UK Web Data By Occhini, Giulia; Tranos, Emmanouil; Wolf, Levi John
  19. Efficient Learning of Nested Deep Hedging using Multiple Options By Masanori Hirano; Kentaro Imajo; Kentaro Minami; Takuya Shimada
  20. Uncertainty about the War in Ukraine: Measurement and Effects on the German Business Cycle By Moritz Grebe; Sinem Kandemir; Peter Tillmann
  21. Minority Ethnic Vulnerabilities in the Use of Digital Housing Services Across Age Groups By Hasan, Sacha; Yuan, Yingfang

  1. By: Bennett Schmanski; Chiara Scotti; Clara Vega
    Abstract: We estimate monetary policy surprises (sentiment) from the perspective of three different textual sources: direct central bank communication (FOMC statements and press conferences), news articles, and Twitter posts on FOMC announcement days. Textual sentiment is highly correlated across sources, but there are times when news and Twitter sentiment substantially disagree with the sentiment conveyed by the central bank. We find that sentiment estimated from news articles correlates better with daily U.S. Treasury yield changes than sentiment extracted directly from Fed communication, and better predicts revisions in economic forecasts and FOMC decisions. Twitter sentiment is also useful, but slightly less so than news sentiment. These results suggest that news coverage and tweets are not simply an echo chamber but provide additional useful information. We use the theoretical model of Sastry (2022) to guide our empirical analysis and test three mechanisms that can explain what drives monetary policy surprises extracted from different sources: asymmetric information (the central bank has better information than journalists and Tweeters), erroneous beliefs of journalists (and Tweeters) about the monetary policy rule, and differing confidence in public information between the central bank and journalists (Tweeters). Our empirical results suggest that the last of these is the most likely mechanism.
    Keywords: Monetary policy; Public information; Price discovery
    JEL: C53 D83 E27 E37 E44 E47 E50 G10
    Date: 2023–05–26
  2. By: Andrea Ajello; Diego Silva; Travis Adams; Francisco Vazquez-Grande
    Abstract: We build a new measure of credit and financial market sentiment using Natural Language Processing on Twitter data. We find that the Twitter Financial Sentiment Index (TFSI) correlates highly with corporate bond spreads and other price- and survey-based measures of financial conditions. We document that overnight Twitter financial sentiment helps predict next-day stock market returns. Most notably, we show that the index contains information that helps forecast changes in the U.S. monetary policy stance: a deterioration in Twitter financial sentiment the day before an FOMC statement release predicts the size of restrictive monetary policy shocks. Finally, we document that sentiment worsens in response to an unexpected tightening of monetary policy.
    Keywords: Financial Market Sentiment; Monetary policy; Natural Language Processing; Stock Returns; Twitter
    JEL: D53 C58 C55 E52
    Date: 2023–05–23
  3. By: Sinan Deng; John Inekwe; Vladimir Smirnov; Andrew Wait; Chao Wang
    Abstract: In this paper, we propose a seasonal attention mechanism and evaluate its effectiveness using a Bidirectional Long Short-Term Memory (BiLSTM) model. We compare its performance with alternative deep learning and machine learning models in forecasting the balancing settlement prices in the electricity market of Great Britain. Critically, the Seasonal Attention-Based BiLSTM framework provides superior forecasts of extreme prices, with an out-of-sample gain in predictability of 25-37% compared with models in the literature. Our forecasting techniques could help both market participants, to better manage their risk and allocate their assets, and policy makers, to operate the system at lower cost.
    Keywords: forecasting; electricity; balance settlement prices; Long Short-Term Memory; machine learning.
    Date: 2023–06
  4. By: Benjamin Fan; Edward Qiao; Anran Jiao; Zhouzhou Gu; Wenhao Li; Lu Lu
    Abstract: We develop a methodology that uses deep learning to simultaneously solve and estimate canonical continuous-time general equilibrium models in financial economics. We illustrate our method with two examples: (1) industrial dynamics of firms and (2) macroeconomic models with financial frictions. Through these applications, we illustrate the advantages of our method: generality, simultaneous solution and estimation, use of state-of-the-art machine-learning techniques, and the ability to handle large state spaces. The method is versatile and can be applied to a wide variety of problems.
    Date: 2023–05
  5. By: Jingjing Guo
    Abstract: Precisely forecasting the excess returns of an asset (e.g., Tesla stock) is beneficial to all investors. However, the unpredictability of market dynamics, influenced by human behavior, makes this a challenging task. In prior research, researchers have manually crafted a number of factors as signals to guide their investment process. In contrast, this paper views the problem from a different perspective: we use a deep learning model to combine those human-designed factors to predict the trend of excess returns. To this end, we present a 5-layer deep neural network that generates more meaningful factors in a 2048-dimensional space. Modern network design techniques are used to make training more robust and to reduce overfitting. Additionally, we propose a gated network that dynamically filters out features learned from noise, resulting in improved performance. We evaluate our model on over 2,000 stocks from the Chinese market using their records from the most recent three years. The experimental results show that the proposed gated activation layer and the deep neural network effectively overcome the problem and contribute to the superior performance of our model. In summary, the proposed model exhibits promising results and could potentially benefit investors seeking to optimize their investment strategies.
    Date: 2023–05
  6. By: Michael Allan Ribers; Hannes Ullrich
    Abstract: Inefficient human decisions are driven by biases and limited information. Health care is one leading example where machine learning is hoped to deliver efficiency gains. Antibiotic resistance constitutes a major challenge to health care systems due to human antibiotic overuse. We investigate how a policy leveraging the strengths of a machine learning algorithm and physicians can provide new opportunities to reduce antibiotic use. We focus on urinary tract infections in primary care, a leading cause of antibiotic use, where physicians often prescribe before attaining diagnostic certainty. Symptom assessment and rapid testing provide diagnostic information with limited accuracy, while laboratory testing can diagnose bacterial infections only with considerable delay. Linking Danish administrative and laboratory data, we optimize policy rules that base initial prescription decisions on machine learning predictions and delegate decisions to physicians where these benefit most from private information at the point of care. The policy shows the potential to reduce antibiotic prescribing by 8.1 percent and overprescribing by 20.3 percent without assigning fewer prescriptions to patients with bacterial infections. We find that human-algorithm complementarity is essential to achieve efficiency gains.
    Date: 2023–06–05
  7. By: Khuc, Quy Van; Tran, Duc-Trung
    Abstract: This paper introduces an advanced method that integrates contingent valuation and machine learning (CVML) to estimate residents’ demand for mitigating environmental pollution and climate change. To be precise, CVML is an innovative hybrid machine-learning model that can leverage a limited amount of survey data for prediction and data enrichment purposes. The model comprises two interconnected modules: Module I, an unsupervised learning algorithm, and Module II, a supervised learning algorithm. Module I is responsible for clustering the data (x^sur) into groups based on common characteristics, thereby grouping the corresponding dependent variable (y^sur) values as well. Using a 2019 survey on air pollution in Hanoi as an example, we find that CVML can predict households’ willingness-to-pay for air pollution mitigation with a high degree of accuracy (i.e., over 90%). This finding suggests that CVML is a powerful and practical method that could potentially be widely applied in the fields of environmental economics and sustainability science in years to come.
    Date: 2023–05–17
  8. By: Mollen, Anne; Hondrich, Lukas
    Abstract: Automated decision-making (ADM) systems in the workplace aggravate the power imbalance between employees and employers by making potentially crucial decisions about employees. Current approaches focus on risk mitigation to safeguard employee interests. While limiting risks remains important, employee representatives should be able to include their interests in the decision-making of ADM systems. This paper introduces the concept of the Machine Learning Pipeline to demonstrate how these interests can be implemented in practice and to point to the necessary structural transformations.
    Keywords: Artificial Intelligence, EU regulation, workplace, democracy, employee representatives
    Date: 2023
  9. By: Pletcher, Scott Nicholas
    Abstract: For most people, job performance evaluations are just another expected part of the employee experience. While these evaluations take different forms depending on the occupation, the usual objective is to align the employee’s activities with the values and objectives of the wider organization. Of course, pursuing this objective involves a whole host of complex skills and abilities, which sometimes pose challenges to leaders and organizations. Automation has long been a favored tool of businesses to help bring consistency, efficiency, and accuracy to various processes, including many human capital management (HCM) processes. Recent improvements in artificial intelligence (AI) approaches have enabled new options for its use in the HCM space. One such use case is assisting leaders in evaluating their employees’ performance. While using technology to measure and evaluate worker production is not novel, AI algorithms now make it possible to go beyond piecemeal work and make inferences about an employee’s economic impact, emotional state, aptitude for leadership, and likelihood of leaving. Many organizations are eager to use these tools, which can potentially save time and money, and are keen on removing the bias or inconsistency that humans can introduce into the employee evaluation process. However, these AI models often consist of large, complex neural networks in which transparency and explainability are not easily achieved. These black-box systems might do a reasonable job, but what are the implications of faceless algorithms making life-changing decisions for employees?
    Date: 2023–04–28
  10. By: Maximilian Ahrens; Deniz Erdemlioglu; Michael McMahon; Christopher J. Neely; Xiye Yang
    Abstract: Researchers have carefully studied post-meeting central bank communication and have found that it often moves markets, but they have paid less attention to the more frequent speeches by central bankers. We create a novel dataset of US Federal Reserve speeches and use supervised multimodal natural language processing methods to identify how monetary policy news affects financial volatility and tail risk through implied changes in forecasts of GDP, inflation, and unemployment. We find that news in central bankers’ speeches can help explain volatility and tail risk in both equity and bond markets. We also find that markets attend to these signals more closely during abnormal GDP and inflation regimes. Our results challenge the conventional view that central bank communication primarily resolves uncertainty.
    Keywords: central bank communication; multimodal machine learning; natural language processing; speech analysis; high-frequency data; volatility; tail risk
    JEL: E50 E52 C45 C53 G10 G12 G14
    Date: 2023–05–31
  11. By: Ludovic Goudenège; Andrea Molent; Antonino Zanette
    Abstract: In this article, we introduce an algorithm called Backward Hedging, designed for hedging European and American options while considering transaction costs. The optimal strategy is determined by minimizing an appropriate loss function, which is based on either a risk measure or the mean squared error of the hedging strategy at maturity. By appropriately reformulating this loss function, we can address its minimization by moving backward in time. The approach avoids machine learning and instead relies on traditional optimization techniques, Monte Carlo simulations, and interpolations on a grid. Comparisons with the Deep Hedging algorithm in various numerical experiments showcase the efficiency and accuracy of the proposed method.
    Date: 2023–05
  12. By: Carpenter, Jeffrey P. (Middlebury College); Lyford, Alex (Middlebury College); Zhang, Mingfang (Middlebury College)
    Abstract: Measuring the social preferences of economic agents using experiments has become commonplace. This process, while incentive compatible, is costly and time-consuming, making it infeasible in many settings. We combine standard altruism and warm glow choice experiments with a battery of candidate survey questions to construct behaviorally-validated questionnaires. We use machine learning to create parsimonious 3-question modules that reliably replicate existing results on general altruism and provide an alternative method for collecting warm glow preferences.
    Keywords: experiment, altruism, warm glow, survey validation
    JEL: C91 D64 D91 H41
    Date: 2023–06
  13. By: Petar Soric (University of Zagreb); Enric Monte (Polytechnic University of Catalunya); Salvador Torra (Riskcenter-IREA, University of Barcelona); Oscar Claveria (AQR-IREA, University of Barcelona)
    Abstract: The present study uses Gaussian process regression models to generate density forecasts of inflation within the New Keynesian Phillips curve (NKPC) framework. The NKPC is a structural model of inflation dynamics, in which we include the output gap, inflation expectations, world fuel prices and money market interest rates as predictors. We estimate country-specific time series models for the 19 Euro Area (EA) countries. Unlike other machine learning models, Gaussian process regression allows confidence intervals to be estimated for the predictions. The performance of the proposed model is assessed in a one-step-ahead forecasting exercise. The results highlight the recent inflationary pressures and show the potential of Gaussian process regression for forecasting purposes.
    Keywords: Machine learning, Gaussian process regression, Time-series analysis, Economic forecasting, Inflation, New Keynesian Phillips curve
    JEL: C45 C51 C53 E31
    Date: 2022–07
  14. By: Lavko, Matus; Klein, Tony; Walther, Thomas
    Abstract: We test the out-of-sample trading performance of model-free reinforcement learning (RL) agents and compare it with the performance of equally-weighted portfolios and traditional mean-variance (MV) optimization benchmarks. By dividing the constituents of European and U.S. indices into factor datasets, we expose the RL-generated portfolios to different scenarios defined by these factor environments. The RL approach is empirically evaluated on a selection of measures and probabilistic assessments. Trained only on price data and features constructed from these prices, the RL models yield better risk-adjusted returns and probabilistic Sharpe ratios than the MV specifications. However, this performance varies across factor environments. RL models partially uncover the nonlinear structure of the stochastic discount factor. We further demonstrate that RL models are successful at reducing left-tail risks in out-of-sample settings. These results indicate that these models are indeed useful in portfolio management applications.
    Keywords: Asset Allocation, Reinforcement Learning, Machine Learning, Portfolio Theory, Diversification
    JEL: G11 C44 C55 C58
    Date: 2023
  15. By: Marcin Chlebus (University of Warsaw, Faculty of Economic Sciences); Artur Nowak (University of Warsaw, Faculty of Economic Sciences)
    Abstract: Algorithmic trading has been a central theme in numerous research papers, combining knowledge from the fields of finance and mathematics. This thesis aimed to apply basic Technical Analysis indicators to predicting the price movements of three noble metals (gold, silver, and platinum) in the form of multi-class classification. The task was performed using four algorithms: Logistic Regression, k-Nearest Neighbors, Random Forest and XGBoost. The study incorporated feature filtering methods such as Kendall-tau filtering and PCA, as well as five different data frequencies: 1, 5, 10, 15 and 20 trading days. From a total of 40 potential models for each metal, the best one was selected and evaluated using data from the period 2018-2022. The results revealed that models using only Technical Analysis indicators were able to predict price movements to a significant extent, leading to investment strategies that outperformed the market in two out of three cases.
    Keywords: precious metals, algotrading, machine learning, multiclass classification, logistic regression, nearest neighbors, random forest, xgboost
    JEL: C38 C51 C52 C58 G17
    Date: 2023
  16. By: Anbar Aizenman; Connor M. Brennan; Tomaz Cajner; Cynthia L. Doniger; Jacob Williams
    Abstract: We present an indicator of job loss derived from Twitter data, based on a neural network fine-tuned with transfer learning to classify whether or not a tweet is related to job loss. We show that our Twitter-based measure of job loss is well correlated with, and predictive of, other measures of unemployment available in the official statistics, with the added benefits of real-time availability and daily frequency. These findings are especially strong for the period of the Pandemic Recession, when our Twitter indicator continued to track job loss well but other real-time measures, such as unemployment insurance claims, provided an imperfect signal of job loss. Additionally, we find that our Twitter job loss indicator provides incremental information, beyond what weekly unemployment insurance claims offer, in predicting official unemployment flows in a given month.
    Keywords: Job Loss; Natural Language Processing; Neural Networks
    JEL: J63
    Date: 2023–05–23
  17. By: Kai Gehring; Matteo Grigoletto
    Abstract: Understanding behavioral aspects of collective decision-making is an important challenge for economics, and narratives are a crucial group-based mechanism that influences human decision-making. This paper introduces the Character-Role Narrative Framework as a tool to systematically analyze narratives, and applies it to study US climate change policy on Twitter over the 2010-2021 period. We build on the idea of the so-called drama triangle, which suggests that, within the context of a topic, the essence of a narrative is captured by its characters in one of three essential roles: hero, villain, and victim. We show how this intuitive framework can be easily integrated into an empirical pipeline and scaled up to large text corpora using supervised machine learning. In our application to US climate change policy narratives, we find strong changes in the frequency of simple and complex character-role narratives over time. Using contagiousness, popularity, and sparking conversation as three distinct dimensions of virality, we show that narratives that are simple, feature human characters and emphasize villains tend to be more viral. Focusing on Donald Trump as an example of a populist leader, we demonstrate that populism is linked to a higher share of such simple, human, and villain-focused narratives.
    Keywords: narrative economics, text-as-data, machine learning, large language models, climate change, virality, populism
    JEL: C80 D72 H10 P16 Q54
    Date: 2023
  18. By: Occhini, Giulia; Tranos, Emmanouil; Wolf, Levi John (University of Bristol)
    Abstract: This paper investigates whether and how occupational segregation affects the digital economy. Despite the continuous growth of entrepreneurial activity in the digital sphere, little is known about the demographic characteristics of the people actively engaging with it and benefiting from it. Further, while popular discourse portrays the digital sphere as a “level playing field” for economic engagement, the literature has yet to test these claims empirically. Gaining a better understanding of whether occupational segregation is replicated in the digital sphere can help us bridge new types of digital inequality and demystify meritocratic narratives around success in the digital economy. To address this question, we use textual data extracted from UK commercial websites and model digital economic activities with Natural Language Processing techniques. We compare our findings across gender and ethnicity groups, adopting a research framework informed by intersectionality theory. Our results indicate that occupational segregation persists in the digital economy, as male and female entrepreneurs tend to engage in economic activities stereotypically associated with their gender. However, we do not find the same results when comparing the entrepreneurial outputs of female and male entrepreneurs of colour. Our results pave the way for more research in entrepreneurship using Natural Language Processing, textual data and analyses at the intersectional level.
    Date: 2023–05–18
  19. By: Masanori Hirano; Kentaro Imajo; Kentaro Minami; Takuya Shimada
    Abstract: Deep hedging is a framework for hedging derivatives in the presence of market frictions. In this study, we focus on the problem of hedging a given target option by using multiple options. To extend the deep hedging framework to this setting, the options used as hedging instruments also have to be priced during training. While one might use a classical pricing model such as the Black-Scholes formula, ignoring frictions can offer arbitrage opportunities, which are undesirable for deep hedging learning. The goal of this study is to develop a nested deep hedging method. That is, we develop a fully deep approach to deep hedging in which the hedging instruments are also priced by deep neural networks that are aware of frictions. However, since the prices of hedging instruments have to be calculated under many different conditions, the entire learning process can be computationally intractable. To overcome this problem, we propose an efficient learning method for nested deep hedging. Our method consists of three techniques to circumvent computational intractability, each of which reduces redundant computations during training. We show through experiments that Black-Scholes pricing of hedging instruments can admit significant arbitrage opportunities, which are not observed when the pricing is performed by deep hedging. We also demonstrate that our proposed method successfully reduces hedging risks compared to a baseline method that does not use options as hedging instruments.
    Date: 2023–05
  20. By: Moritz Grebe (Justus Liebig University Giessen); Sinem Kandemir (Justus Liebig University Giessen); Peter Tillmann (Justus Liebig University Giessen)
    Abstract: We assemble a data set of more than eight million German Twitter posts related to the war in Ukraine. Based on state-of-the-art methods of text analysis, we construct a daily index of uncertainty about the war as perceived by German Twitter. The approach also allows us to separate this index into uncertainty about sanctions against Russia, energy policy and other dimensions. We then estimate a VAR model with daily financial and macroeconomic data and identify an exogenous uncertainty shock. The increase in uncertainty has strong effects on financial markets and causes a significant decline in economic activity as well as an increase in expected inflation. We find the effects of uncertainty to be particularly strong in the first months of the war.
    Keywords: war, Twitter, geopolitical risk, machine learning, business cycle
    JEL: D8 E3 G1
    Date: 2023
  21. By: Hasan, Sacha; Yuan, Yingfang
    Abstract: Despite the accelerated digitalisation of social housing services, there has been little focused attention on the harms that are likely to arise from the systemic inequalities encountered by minoritised ethnic (ME) communities in the UK. Within this context, we employ an intersectional framework to underline the centrality of age to ME vulnerabilities, including lack of digital literacy and of proficiency in English, in the access, use and outcomes of digitalised social housing services. We draw our findings from an interdisciplinary sentiment analysis of 100 interviews with ME individuals in Glasgow, Bradford, Manchester and Tower Hamlets, which extracts vulnerabilities and assesses their intensities across different ME age groups, and from a qualitative analysis of a subsample of 21 interviews. This illustrates similarities and differences between machine learning (ML) and inductive coding in the sentiment analysis of these vulnerabilities, offering an example of an ML-supported approach to qualitative data analysis in housing studies.
    Date: 2023–06–02
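
    Item 13 above relies on the fact that Gaussian process regression yields a full predictive distribution, not just a point forecast. The sketch below illustrates only that generic property and is not the authors' NKPC specification. It assumes a single scalar predictor, a squared-exponential kernel and hand-picked hyperparameters (the names `length`, `signal_var` and `noise_var` and their default values are hypothetical, not estimates), and returns the Gaussian predictive mean and variance from which a 95% interval follows.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def rbf(a: float, b: float, length: float, signal_var: float) -> float:
    """Squared-exponential (RBF) covariance between two scalar inputs."""
    return signal_var * math.exp(-0.5 * ((a - b) / length) ** 2)


def cholesky(A: Matrix) -> Matrix:
    """Lower-triangular L with A = L L^T (A must be symmetric positive definite)."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(A[i][i] - s) if i == j else (A[i][j] - s) / L[j][j]
    return L


def solve_lower(L: Matrix, b: List[float]) -> List[float]:
    """Solve L z = b by forward substitution."""
    z: List[float] = []
    for i in range(len(b)):
        z.append((b[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i])
    return z


def solve_upper_t(L: Matrix, z: List[float]) -> List[float]:
    """Solve L^T x = z by back substitution (L is lower-triangular)."""
    n = len(z)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (z[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x


def gp_predict(x_train: List[float], y_train: List[float], x_new: float,
               length: float = 1.0, signal_var: float = 1.0,
               noise_var: float = 0.1) -> Tuple[float, float]:
    """Mean and variance of the Gaussian predictive density of y at x_new."""
    n = len(x_train)
    # Noisy training covariance K + noise_var * I.
    K = [[rbf(x_train[i], x_train[j], length, signal_var) + (noise_var if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    L = cholesky(K)
    alpha = solve_upper_t(L, solve_lower(L, y_train))  # (K + noise_var*I)^{-1} y
    k_star = [rbf(x_new, xi, length, signal_var) for xi in x_train]
    mean = sum(k * a for k, a in zip(k_star, alpha))
    v = solve_lower(L, k_star)  # L^{-1} k_*, so v.v = k_*^T (K + noise_var*I)^{-1} k_*
    latent_var = rbf(x_new, x_new, length, signal_var) - sum(vi * vi for vi in v)
    return mean, latent_var + noise_var  # add observation noise for a forecast of y


def interval_95(mean: float, var: float) -> Tuple[float, float]:
    """Central 95% interval of a Gaussian predictive density."""
    sd = math.sqrt(var)
    return mean - 1.96 * sd, mean + 1.96 * sd


if __name__ == "__main__":
    # Synthetic toy numbers (not data from the paper).
    x = [0.0, 0.5, 1.0, 1.5, 2.0]
    y = [1.0, 1.2, 1.9, 2.4, 2.6]
    m, v = gp_predict(x, y, 2.5)
    print(m, interval_95(m, v))
```

    Far from the training inputs the predictive variance reverts to `signal_var + noise_var`, which is what lets the interval widen when a forecast extrapolates. In practice the hyperparameters would be estimated, for example by maximising the marginal likelihood; fixing them here keeps the sketch short.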
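
    Item 14 above reports probabilistic Sharpe ratios without defining them. The sketch below assumes the common definition associated with Bailey and López de Prado, which we cannot confirm is the variant the authors used: PSR(SR*) = Φ((SR − SR*)·√(n − 1) / √(1 − γ3·SR + (γ4 − 1)/4·SR²)), where SR is the observed per-period Sharpe ratio, SR* a benchmark, n the number of returns, γ3 the skewness and γ4 the (non-excess) kurtosis. The function name, the zero risk-free rate and the use of population moments are assumptions of this sketch.

```python
import math
from statistics import mean, pstdev
from typing import Sequence


def normal_cdf(z: float) -> float:
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def probabilistic_sharpe_ratio(returns: Sequence[float], sr_benchmark: float = 0.0) -> float:
    """Estimated probability that the true per-period Sharpe ratio exceeds sr_benchmark.

    Assumes a zero risk-free rate and uses population (biased) moments.
    """
    n = len(returns)
    if n < 2:
        raise ValueError("need at least two returns")
    mu = mean(returns)
    sd = pstdev(returns)
    if sd == 0.0:
        raise ValueError("returns have zero variance")
    sr = mu / sd  # observed, non-annualised Sharpe ratio
    skew = sum(((r - mu) / sd) ** 3 for r in returns) / n
    kurt = sum(((r - mu) / sd) ** 4 for r in returns) / n  # non-excess: 3 for a normal
    se_scale = math.sqrt(1.0 - skew * sr + (kurt - 1.0) / 4.0 * sr ** 2)
    return normal_cdf((sr - sr_benchmark) * math.sqrt(n - 1) / se_scale)


if __name__ == "__main__":
    # Synthetic per-period returns, for illustration only.
    rets = [0.02, -0.01, 0.03, 0.0, 0.01, -0.02, 0.025]
    print(probabilistic_sharpe_ratio(rets, 0.0))
```

    Unlike a raw Sharpe ratio, this quantity penalises short samples and negatively skewed or fat-tailed returns, which is why it is a natural companion to the left-tail-risk results mentioned in the abstract.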
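
    Item 15 above mentions Kendall-tau filtering of Technical Analysis features. The abstract does not give the variant or cut-off, so the following is a hypothetical sketch rather than the authors' procedure: it computes Kendall's tau-b (which corrects for ties, common when the target is a small set of ordered classes such as down/flat/up) between each candidate feature and the target, and keeps the features whose absolute correlation meets an assumed threshold. The function names and the default threshold of 0.1 are illustrative.

```python
import math
from typing import Dict, List, Sequence


def kendall_tau_b(x: Sequence[float], y: Sequence[float]) -> float:
    """Kendall's tau-b rank correlation (O(n^2) pairwise version, tie-corrected)."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    concordant = discordant = ties_x_only = ties_y_only = 0
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            dx = x[j] - x[i]
            dy = y[j] - y[i]
            if dx == 0 and dy == 0:
                continue  # tied in both: enters neither numerator nor denominator
            if dx == 0:
                ties_x_only += 1
            elif dy == 0:
                ties_y_only += 1
            elif (dx > 0) == (dy > 0):
                concordant += 1
            else:
                discordant += 1
    denom = math.sqrt((concordant + discordant + ties_x_only) *
                      (concordant + discordant + ties_y_only))
    return (concordant - discordant) / denom if denom > 0 else 0.0


def kendall_filter(features: Dict[str, List[float]], target: List[float],
                   threshold: float = 0.1) -> List[str]:
    """Keep feature names whose |tau-b| with the target is at least `threshold`."""
    return [name for name, values in features.items()
            if abs(kendall_tau_b(values, target)) >= threshold]


if __name__ == "__main__":
    # Synthetic features and an ordered class target (-1 = down, 0 = flat, 1 = up).
    target = [-1, 0, 1, 1, 0, -1, 1]
    feats = {"rsi": [30, 50, 70, 65, 48, 25, 72], "noise": [5, 1, 4, 2, 3, 6, 0]}
    print(kendall_filter(feats, target, threshold=0.3))
```

    Because tau-b depends only on rankings, the filter is insensitive to monotone rescaling of indicators, one plausible reason to prefer it to a Pearson correlation when screening heterogeneous technical indicators.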

This nep-big issue is ©2023 by Tom Coupé. It is provided as is without any express or implied warranty. It may be freely redistributed in whole or in part for any purpose. If distributed in part, please include this notice.
General information on the NEP project can be found at For comments please write to the director of NEP, Marco Novarese at <>. Put “NEP” in the subject, otherwise your mail may be rejected.
NEP’s infrastructure is sponsored by the School of Economics and Finance of Massey University in New Zealand.