nep-big New Economics Papers
on Big Data
Issue of 2024‒03‒25
fifteen papers chosen by
Tom Coupé, University of Canterbury


  1. Extending the Scope of Inference About Predictive Ability to Machine Learning Methods By Juan Carlos Escanciano; Ricardo Parra
  2. On the Potential of Network-Based Features for Fraud Detection By Catayoun Azarm; Erman Acar; Mickey van Zeelt
  3. CNN-DRL with Shuffled Features in Finance By Sina Montazeri; Akram Mirzaeinia; Amir Mirzaeinia
  4. CaT-GNN: Enhancing Credit Card Fraud Detection via Causal Temporal Graph Neural Networks By Yifan Duan; Guibin Zhang; Shilong Wang; Xiaojiang Peng; Wang Ziqi; Junyuan Mao; Hao Wu; Xinke Jiang; Kun Wang
  5. Cross-Temporal Forecast Reconciliation at Digital Platforms with Machine Learning By Jeroen Rombouts; Marie Ternes; Ines Wilms
  6. MDGNN: Multi-Relational Dynamic Graph Neural Network for Comprehensive and Dynamic Stock Investment Prediction By Hao Qian; Hongting Zhou; Qian Zhao; Hao Chen; Hongxiang Yao; Jingwei Wang; Ziqi Liu; Fei Yu; Zhiqiang Zhang; Jun Zhou
  7. Novelty in Content Creation: Experimental Results Using Image Recognition on a Large Social Network By Huang, Justin; Kaul, Rupali; Narayanan, Sridhar
  8. Causes and Consequences of Multiple Dismissals: Evidence from Italian Football League By Kaori Narita; J.D. Tena; Babatunde Buraimo
  9. Machine Learning and Data-Driven Approaches in Spatial Statistics: A Case Study of Housing Price Estimation By Sarah Soleiman; Julien Randon-Furling; Marie Cottrell
  10. Cyber risk and the cross-section of stock returns By Daniel Celeny; Loïc Maréchal
  11. Machine Learning Who to Nudge: Causal vs Predictive Targeting in a Field Experiment on Student Financial Aid Renewal By Athey, Susan; Keleher, Niall; Spiess, Jann
  12. Reinforcement Learning for Optimal Execution when Liquidity is Time-Varying By Andrea Macrì; Fabrizio Lillo
  13. Quantifying neural network uncertainty under volatility clustering By Steven Y. K. Wong; Jennifer S. K. Chan; Lamiae Azizi
  14. The Impact of Monetary and Fiscal Stimulus on Stock Returns During the COVID-19 Pandemic By Chinmaya Behera; Badri Narayan Rath; Pramod Kumar Mishra
  15. Introducing Textual Measures of Central Bank Policy-Linkages Using ChatGPT By Leek, Lauren Caroline; Bischl, Simeon; Freier, Maximilian

  1. By: Juan Carlos Escanciano; Ricardo Parra
    Abstract: Though out-of-sample forecast evaluation is systematically employed with modern machine learning methods and there exists a well-established classic inference theory for predictive ability, see, e.g., West (1996, Asymptotic Inference About Predictive Ability, Econometrica, 64, 1067-1084), such theory is not directly applicable to modern machine learners such as the Lasso in the high dimensional setting. We investigate under which conditions such extensions are possible. Two key properties for standard out-of-sample asymptotic inference to be valid with machine learning are (i) a zero-mean condition for the score of the prediction loss function; and (ii) a fast rate of convergence for the machine learner. Monte Carlo simulations confirm our theoretical findings. For accurate finite sample inferences with machine learning, we recommend a small out-of-sample vs in-sample size ratio. We illustrate the wide applicability of our results with a new out-of-sample test for the Martingale Difference Hypothesis (MDH). We obtain the asymptotic null distribution of our test and use it to evaluate
    Date: 2024–02
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2402.12838&r=big
  2. By: Catayoun Azarm; Erman Acar; Mickey van Zeelt
    Abstract: Online transaction fraud presents substantial challenges to businesses and consumers, risking significant financial losses. Conventional rule-based systems struggle to keep pace with evolving fraud tactics, leading to high false positive rates and missed detections. Machine learning techniques offer a promising solution by leveraging historical data to identify fraudulent patterns. This article explores using the personalised PageRank (PPR) algorithm to capture the social dynamics of fraud by analysing relationships between financial accounts. The primary objective is to compare the performance of traditional features with the addition of PPR in fraud detection models. Results indicate that integrating PPR enhances the model's predictive power, surpassing the baseline model. Additionally, the PPR feature provides unique and valuable information, evidenced by its high feature importance score. Feature stability analysis confirms consistent feature distributions across training and test datasets.
    Date: 2024–02
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2402.09495&r=big
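Illustration (not taken from the paper): the abstract above describes using personalised PageRank over relationships between accounts as a model feature. A minimal sketch of how such a score could be computed by power iteration is given below. The toy transfer graph, the seed account, the damping value `alpha=0.85` and the function name `personalized_pagerank` are all assumptions made for this example; the authors' actual graph construction and parameters are not stated in the abstract.

```python
def personalized_pagerank(edges, seeds, alpha=0.85, iters=100):
    """Power iteration for personalised PageRank on a directed graph.

    edges: list of (source, target) account pairs (hypothetical transfers).
    seeds: accounts that receive the restart mass, e.g. known fraud cases.
    alpha: probability of following an edge rather than restarting.
    """
    seeds = set(seeds)
    nodes = sorted({n for edge in edges for n in edge} | seeds)
    out = {n: [] for n in nodes}
    for src, dst in edges:
        out[src].append(dst)
    # Restart distribution: uniform over the seed accounts, zero elsewhere.
    restart = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    rank = dict(restart)
    for _ in range(iters):
        nxt = {n: (1.0 - alpha) * restart[n] for n in nodes}
        for n in nodes:
            if out[n]:
                share = alpha * rank[n] / len(out[n])
                for dst in out[n]:
                    nxt[dst] += share
            else:
                # Dangling node: send its mass back to the seed accounts.
                for s in nodes:
                    nxt[s] += alpha * rank[n] * restart[s]
        rank = nxt
    return rank


# Hypothetical toy graph: A -> B -> C -> A, plus D -> A; account A is the seed.
edges = [("A", "B"), ("B", "C"), ("C", "A"), ("D", "A")]
scores = personalized_pagerank(edges, seeds={"A"})
```

In a setup like this, each account's score would be appended to its other features before model training; accounts that the seed cannot reach (here D) receive a score of zero.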
  3. By: Sina Montazeri; Akram Mirzaeinia; Amir Mirzaeinia
    Abstract: Prior work observed that applying a Convolutional Neural Network (CNN) agent in Deep Reinforcement Learning to financial data resulted in an enhanced reward. In this study, a specific permutation is applied to the feature vector, generating a CNN matrix that places more pertinent features in close proximity. Our comprehensive experimental evaluations demonstrate a substantial enhancement in reward attainment.
    Date: 2024–01
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2402.03338&r=big
  4. By: Yifan Duan; Guibin Zhang; Shilong Wang; Xiaojiang Peng; Wang Ziqi; Junyuan Mao; Hao Wu; Xinke Jiang; Kun Wang
    Abstract: Credit card fraud poses a significant threat to the economy. While Graph Neural Network (GNN)-based fraud detection methods perform well, they often overlook the causal effect of a node's local structure on predictions. This paper introduces a novel method for credit card fraud detection, the Causal Temporal Graph Neural Network (CaT-GNN), which leverages causal invariant learning to reveal inherent correlations within transaction data. By decomposing the problem into discovery and intervention phases, CaT-GNN identifies causal nodes within the transaction graph and applies a causal mixup strategy to enhance the model's robustness and interpretability. CaT-GNN consists of two key components: Causal-Inspector and Causal-Intervener. The Causal-Inspector utilizes attention weights in the temporal attention mechanism to identify causal and environment nodes without introducing additional parameters. Subsequently, the Causal-Intervener performs a causal mixup enhancement on environment nodes based on the set of nodes. Evaluated on three datasets, including a private financial dataset and two public datasets, CaT-GNN demonstrates superior performance over existing state-of-the-art methods. Our findings highlight the potential of integrating causal reasoning with graph neural networks to improve fraud detection capabilities in financial transactions.
    Date: 2024–02
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2402.14708&r=big
  5. By: Jeroen Rombouts; Marie Ternes; Ines Wilms
    Abstract: Platform businesses operate on a digital core and their decision making requires high-dimensional accurate forecast streams at different levels of cross-sectional (e.g., geographical regions) and temporal aggregation (e.g., minutes to days). It also necessitates coherent forecasts across all levels of the hierarchy to ensure aligned decision making across different planning units such as pricing, product, controlling and strategy. Given that platform data streams feature complex characteristics and interdependencies, we introduce a non-linear hierarchical forecast reconciliation method that produces cross-temporal reconciled forecasts in a direct and automated way through the use of popular machine learning methods. The method is sufficiently fast to allow forecast-based high-frequency decision making that platforms require. We empirically test our framework on a unique, large-scale streaming dataset from a leading on-demand delivery platform in Europe.
    Date: 2024–02
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2402.09033&r=big
  6. By: Hao Qian; Hongting Zhou; Qian Zhao; Hao Chen; Hongxiang Yao; Jingwei Wang; Ziqi Liu; Fei Yu; Zhiqiang Zhang; Jun Zhou
    Abstract: The stock market is a crucial component of the financial system, but predicting the movement of stock prices is challenging due to the dynamic and intricate relations arising from various aspects such as economic indicators, financial reports, global news, and investor sentiment. Traditional sequential methods and graph-based models have been applied in stock movement prediction, but they have limitations in capturing the multifaceted and temporal influences in stock price movements. To address these challenges, the Multi-relational Dynamic Graph Neural Network (MDGNN) framework is proposed, which utilizes a discrete dynamic graph to comprehensively capture multifaceted relations among stocks and their evolution over time. The representation generated from the graph offers a complete perspective on the interrelationships among stocks and associated entities. Additionally, the power of the Transformer structure is leveraged to encode the temporal evolution of multiplex relations, providing a dynamic and effective approach to predicting stock investment. Further, our proposed MDGNN framework achieves the best performance in public datasets compared with state-of-the-art (SOTA) stock investment methods.
    Date: 2024–01
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2402.06633&r=big
  7. By: Huang, Justin (U of Michigan); Kaul, Rupali (INSEAD); Narayanan, Sridhar (Stanford U)
    Abstract: Social networks utilize award recognition and front pages to motivate user content creation, facilitate consumer discovery of content, and provide attention and recognition to the best content. Past research shows that such attention and recognition increase the volume of content shared on the networks. But how do these affect the nature of content shared on platforms? Do they cause creators to share content similar to that which received attention and recognition? Or do creators take risks and create different content? We investigate these questions in the context of a digital art-sharing social network. We implement a randomized controlled experiment to induce exogenous variation in attention and recognition provided to users' content. Using a transfer learning machine learning algorithm, we convert complex images into lower-level features to analyze changes in content novelty. We find that awarded creators produce more novel content, relative to both the awarded content and their past work. This result is robust to a variety of ways in which we classify image content. Our results illustrate the importance of tools that induce attention and recognition to the creation and development of diverse content by social media creators and give insights into factors that motivate content creation.
    Date: 2023–11
    URL: http://d.repec.org/n?u=RePEc:ecl:stabus:4040&r=big
  8. By: Kaori Narita; J.D. Tena; Babatunde Buraimo
    Abstract: Previous research in leadership succession focuses on establishing whether such an event has a positive impact on the subsequent performance of an organisation. However, factors that can affect the effectiveness of leadership change are not well understood. The aim of this study is to identify the causes of first and second within-season head coach dismissals and estimate the impact of the two types of dismissal on field performance using data from the Italian professional football league (Serie A). We employ inverse propensity score weighting together with machine learning techniques in order to mitigate selection bias. Our analysis shows that the determinants of the two decisions are not identical in that the second replacement is likely to be taken with greater caution. Despite this, we find some positive effects of first dismissals on subsequent performance, whilst the second dismissals do not appear to make any difference. These findings suggest that frequent changes in leadership are not favourable options even when a recent replacement has not improved the situation. This could be because the potential benefit of leadership replacement may be counteracted by disruptive effects, or a source of underperformance may lie elsewhere rather than a manager.
    Keywords: leadership succession, machine learning, inverse propensity score weighting, football managers
    JEL: J63 M51 Z22
    Date: 2022–11
    URL: http://d.repec.org/n?u=RePEc:liv:livedp:202226&r=big
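Illustration (not taken from the paper): the abstract above relies on inverse propensity score weighting to mitigate selection bias. A minimal sketch of a normalised (Hajek-type) weighted estimate of an average treatment effect follows. It assumes the propensity scores are already available; in the paper they are estimated with machine learning techniques, which this sketch does not attempt to reproduce. The function name `ipw_ate` and all numbers are hypothetical.

```python
def ipw_ate(treated, outcome, propensity):
    """Normalised inverse-propensity-weighted average treatment effect.

    treated:    1 if the unit received the treatment (e.g. a dismissal), else 0.
    outcome:    observed outcome for each unit (e.g. points per game).
    propensity: probability of treatment for each unit, strictly in (0, 1).
    """
    # Treated units are weighted by 1/p, untreated units by 1/(1-p).
    w1 = [t / p for t, p in zip(treated, propensity)]
    w0 = [(1 - t) / (1 - p) for t, p in zip(treated, propensity)]
    mean1 = sum(w * y for w, y in zip(w1, outcome)) / sum(w1)
    mean0 = sum(w * y for w, y in zip(w0, outcome)) / sum(w0)
    return mean1 - mean0


# Hypothetical data: with a constant propensity of 0.5 the estimate
# reduces to the plain difference in group means.
treated = [1, 0, 1, 0]
outcome = [3.0, 1.0, 5.0, 2.0]
effect_estimate = ipw_ate(treated, outcome, [0.5, 0.5, 0.5, 0.5])
```

When propensities differ across units, observations that were unlikely to end up in their observed group receive larger weights, which is how the weighting rebalances the comparison.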
  9. By: Sarah Soleiman (SAMM - Statistique, Analyse et Modélisation Multidisciplinaire (SAmos-Marin Mersenne) - UP1 - Université Paris 1 Panthéon-Sorbonne); Julien Randon-Furling (SAMM - Statistique, Analyse et Modélisation Multidisciplinaire (SAmos-Marin Mersenne) - UP1 - Université Paris 1 Panthéon-Sorbonne); Marie Cottrell (SAMM - Statistique, Analyse et Modélisation Multidisciplinaire (SAmos-Marin Mersenne) - UP1 - Université Paris 1 Panthéon-Sorbonne)
    Date: 2022–07–06
    URL: http://d.repec.org/n?u=RePEc:hal:journl:hal-03900972&r=big
  10. By: Daniel Celeny; Loïc Maréchal
    Abstract: We extract firms' cyber risk with a machine learning algorithm measuring the proximity between their disclosures and a dedicated cyber corpus. Our approach outperforms dictionary methods, uses full disclosure and not devoted-only sections, and generates a cyber risk measure uncorrelated with other firms' characteristics. We find that a portfolio of US-listed stocks in the high cyber risk quantile generates an excess return of 18.72% p.a. Moreover, a long-short cyber risk portfolio has a significant and positive risk premium of 6.93% p.a., robust to all factors' benchmarks. Finally, using a Bayesian asset pricing method, we show that our cyber risk factor is the essential feature that allows any multi-factor model to price the cross-section of stock returns.
    Date: 2024–02
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2402.04775&r=big
  11. By: Athey, Susan (Stanford U); Keleher, Niall (Innovations for Poverty Action, New Haven); Spiess, Jann (Stanford U)
    Abstract: In many settings, interventions may be more effective for some individuals than others, so that targeting interventions may be beneficial. We analyze the value of targeting in the context of a large-scale field experiment with over 53,000 college students, where the goal was to use "nudges" to encourage students to renew their financial-aid applications before a non-binding deadline. We begin with baseline approaches to targeting. First, we target based on a causal forest that estimates heterogeneous treatment effects and then assigns students to treatment according to those estimated to have the highest treatment effects. Next, we evaluate two alternative targeting policies, one targeting students with low predicted probability of renewing financial aid in the absence of the treatment, the other targeting those with high probability. The predicted baseline outcome is not the ideal criterion for targeting, nor is it a priori clear whether to prioritize low, high, or intermediate predicted probability. Nonetheless, targeting on low baseline outcomes is common in practice, for example because the relationship between individual characteristics and treatment effects is often difficult or impossible to estimate with historical data. We propose hybrid approaches that incorporate the strengths of both predictive approaches (accurate estimation) and causal approaches (correct criterion); we show that targeting intermediate baseline outcomes is most effective, while targeting based on low baseline outcomes is detrimental. In one year of the experiment, nudging all students improved early filing by an average of 6.4 percentage points over a baseline average of 37% filing, and we estimate that targeting half of the students using our preferred policy attains around 75% of this benefit.
    Date: 2023–10
    URL: http://d.repec.org/n?u=RePEc:ecl:stabus:4146&r=big
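Illustration (not taken from the paper): the abstract above contrasts targeting on estimated treatment effects with targeting on predicted baseline outcomes. The sketch below shows only the selection step for two such policies on made-up numbers. It does not estimate anything: the per-student values stand in for the output of a causal forest and of a baseline prediction model, and the helper name `target_top_fraction` is hypothetical.

```python
def target_top_fraction(scores, fraction=0.5):
    """Return the (sorted) indices of the highest-scoring units."""
    k = int(len(scores) * fraction)
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])


# Hypothetical per-student estimates, invented for this example.
effect = [0.01, 0.09, 0.08, 0.02]     # estimated effect of a nudge
baseline = [0.10, 0.45, 0.55, 0.90]   # predicted renewal probability without a nudge

# Causal targeting: nudge the half with the highest estimated effect.
causal = target_top_fraction(effect)
# Predictive targeting: nudge the half with the lowest baseline probability.
low_baseline = target_top_fraction([-b for b in baseline])

# Total estimated effect captured by each policy on this toy data.
gain_causal = sum(effect[i] for i in causal)
gain_low_baseline = sum(effect[i] for i in low_baseline)
```

On this toy data the largest effects sit at intermediate baseline probabilities, so the low-baseline policy captures less of the estimated benefit; the numbers were chosen to mirror the pattern the abstract reports and carry no evidential weight of their own.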
  12. By: Andrea Macrì; Fabrizio Lillo
    Abstract: Optimal execution is an important problem faced by any trader. Most solutions are based on the assumption of constant market impact, while liquidity is known to be dynamic. Moreover, models with time-varying liquidity typically assume that it is observable, despite the fact that, in reality, it is latent and hard to measure in real time. In this paper we show that the use of Double Deep Q-learning, a form of Reinforcement Learning based on neural networks, is able to learn optimal trading policies when liquidity is time-varying. Specifically, we consider an Almgren-Chriss framework with temporary and permanent impact parameters following several deterministic and stochastic dynamics. Using extensive numerical experiments, we show that the trained algorithm learns the optimal policy when the analytical solution is available, and overcomes benchmarks and approximated solutions when the solution is not available.
    Date: 2024–02
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2402.12049&r=big
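Illustration (not taken from the paper): the abstract above uses Double Deep Q-learning. The feature that distinguishes it from plain deep Q-learning is how the learning target is formed: the online network selects the next action and a separate target network evaluates it. A minimal sketch of that target computation follows, with plain lists standing in for the two networks' outputs. The function name, the discount factor and all values are hypothetical, and nothing here reflects the authors' state, action or reward design.

```python
def double_dqn_target(reward, next_q_online, next_q_target, gamma=0.99, done=False):
    """Double Q-learning target for a single transition.

    next_q_online: action values for the next state from the online network.
    next_q_target: action values for the next state from the target network.
    """
    if done:
        # No bootstrapping once the episode (e.g. the execution horizon) ends.
        return reward
    # Select the action with the online network...
    best_action = max(range(len(next_q_online)), key=next_q_online.__getitem__)
    # ...but evaluate it with the target network.
    return reward + gamma * next_q_target[best_action]


# Hypothetical values: the online network prefers action 1.
target = double_dqn_target(
    reward=1.0,
    next_q_online=[0.2, 0.9, 0.5],
    next_q_target=[1.0, 0.4, 2.0],
    gamma=0.5,
)
```

A plain Q-learning target would instead take the maximum of the target network's own values (2.0 here), which tends to overestimate action values; decoupling selection from evaluation is meant to reduce that bias.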
  13. By: Steven Y. K. Wong; Jennifer S. K. Chan; Lamiae Azizi
    Abstract: Time-series with time-varying variance pose a unique challenge to uncertainty quantification (UQ) methods. Time-varying variance, such as volatility clustering as seen in financial time-series, can lead to large mismatch between predicted uncertainty and forecast error. Building on recent advances in neural network UQ literature, we extend and simplify Deep Evidential Regression and Deep Ensembles into a unified framework to deal with UQ under the presence of volatility clustering. We show that a Scale Mixture Distribution is a simpler alternative to the Normal-Inverse-Gamma prior that provides favorable complexity-accuracy trade-off. To illustrate the performance of our proposed approach, we apply it to two sets of financial time-series exhibiting volatility clustering: cryptocurrencies and U.S. equities.
    Date: 2024–02
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2402.14476&r=big
  14. By: Chinmaya Behera (Economics and General Management, Goa Institute of Management, Goa, India, (Corresponding author: Goa Institute of Management, Poriem, Sattari, Goa)); Badri Narayan Rath (Department of Liberal Arts, IIT Hyderabad, Kandi, Sangareddy, India); Pramod Kumar Mishra (School of Management, University of Hyderabad, Telangana, India)
    Abstract: We contribute to the literature by investigating the impact of monetary and fiscal stimulus and the exchange rate on stock returns during the COVID-19 pandemic in Australia, China, India, and Indonesia. By employing a machine learning approach, we find that monetary stimulus boosts stock returns in Indonesia. In contrast, fiscal stimulus adversely affected stock returns in Australia. The exchange rate positively impacts stock returns for both India and Indonesia during the COVID-19 pandemic. However, the findings from this study reveal that both monetary and fiscal stimulus have no effect on stock market returns in the case of China and India. Policymakers need better strategies to counter extreme events such as pandemics. Our model is robust to alternative model specifications.
    Keywords: Monetary Policy, Fiscal Policy, Stock Return, Machine Learning, COVID-19
    JEL: G11 G15 G18
    Date: 2023–09
    URL: http://d.repec.org/n?u=RePEc:mad:wpaper:2023-247&r=big
  15. By: Leek, Lauren Caroline (European University Institute); Bischl, Simeon; Freier, Maximilian
    Abstract: While institutionally independent, monetary policy-makers do not operate in a vacuum. The policy choices of a central bank are intricately linked to government policies and financial markets. We present novel indices of monetary, fiscal and financial policy-linkages based on central bank communication, namely, speeches by 118 central banks worldwide from 1997 to mid-2023. Our indices measure not only instances of monetary, fiscal or financial dominance but, importantly, also identify communication that aims to coordinate monetary policy with the government and financial markets. To create our indices, we use a Large Language Model (ChatGPT 3.5-0301) and provide transparent prompt-engineering steps, considering both accuracy on the basis of a manually coded dataset as well as efficiency regarding token usage. We also test several model improvements and provide descriptive statistics of the trends of the indices over time and across central banks including correlations with political-economic variables.
    Date: 2024–02–14
    URL: http://d.repec.org/n?u=RePEc:osf:socarx:78wnp&r=big

This nep-big issue is ©2024 by Tom Coupé. It is provided as is without any express or implied warranty. It may be freely redistributed in whole or in part for any purpose. If distributed in part, please include this notice.
General information on the NEP project can be found at https://nep.repec.org. For comments please write to the director of NEP, Marco Novarese at <director@nep.repec.org>. Put “NEP” in the subject, otherwise your mail may be rejected.
NEP’s infrastructure is sponsored by the School of Economics and Finance of Massey University in New Zealand.