nep-big New Economics Papers
on Big Data
Issue of 2026–04–20
eight papers chosen by
Tom Coupé, University of Canterbury


  1. Learning Preferences from Conjoint Data: A Structural Deep Learning Approach By Avidit Acharya; Jens Hainmueller; Yiqing Xu
  2. The Monetary Policy Statement Database: An LLM Application to Global Financial Conditions By Cory Baird; Jonathan Benchimol; Wook Sohn; Vira Vyshnevska; Iegor Vyshnevskyi
  3. Instructing LLMs to Negotiate using Reinforcement Learning with Verifiable Rewards By Shuze Daniel Liu; Claire Chen; Jiabao Sean Xiao; Lei Lei; Yuheng Zhang; Yisong Yue; David Simchi-Levi
  4. Machine Learning Forecasting of U.S. Stock Market Volatility: The Role of Stock and Oil Bubbles By Onur Polat; Rangan Gupta; Dhanashree Somani; Sayar Karmakar
  5. The Acoustic Camouflage Phenomenon: Re-evaluating Speech Features for Financial Risk Prediction By Dhruvin Dungrani; Disha Dungrani
  6. Dynamic Forecasting and Temporal Feature Evolution of Stock Repurchases in Listed Companies Using Attention-Based Deep Temporal Networks By Xiang Ao; Jingxuan Zhang; Xinyu Zhao
  7. LR-Robot: An Human-in-the-Loop LLM Framework for Systematic Literature Reviews with Applications in Financial Research By Wei Wei; Jin Zheng; Zining Wang; Weibin Feng
  8. Is Bitcoin A Hedge Against Central Banking? Evidence from AI-Driven Monetary Policy Expectations By Maxime L. D. Nicolas; François Sicard; Marion Laboure; Zixin Sun; Anahí Rodríguez-Martínez

  1. By: Avidit Acharya; Jens Hainmueller; Yiqing Xu
    Abstract: Conjoint experiments randomize multidimensional profiles, offering a powerful design for recovering structural preference parameters -- including marginal rates of substitution, willingness to pay, and the distribution of preferences across a population. Yet the dominant approach in political science has focused on nonparametric causal estimands that do not leverage this potential. We propose a structural approach that embeds a deep neural network within a random utility logit model, allowing preference parameters to vary as a fully flexible function of respondent characteristics. The neural network addresses the concern that a parametric specification may not capture the true data generating process, while double/debiased machine learning provides valid inference on average preference parameters. We apply our method to three prominent conjoint studies and find rich preference heterogeneity masked by reduced-form averages: a near-zero gender effect coexists with 83% preferring female candidates, opposition to undemocratic behavior is near-universal but varies sharply in intensity, and progressive tax preferences cut across every partisan subgroup.
    Date: 2026–04
    URL: https://d.repec.org/n?u=RePEc:arx:papers:2604.10845
  2. By: Cory Baird; Jonathan Benchimol; Wook Sohn; Vira Vyshnevska; Iegor Vyshnevskyi
    Abstract: This study introduces the Monetary Policy Statement Database (MPSD), comprising 6,693 statements from 51 central banks worldwide (1990-2024). We develop a reproducible pipeline combining standard natural language preprocessing with large language model (LLM) tools for cross-country analysis. Four key findings emerge. First, statements lengthened substantially after the Global Financial Crisis while readability improved modestly. Second, inflation references comove across countries during global inflation episodes. Third, LLM-based question answering and aspect-based sentiment reveal that central banks attribute global financial conditions primarily to broad U.S. macroeconomic developments rather than to Federal Reserve policy actions specifically. Fourth, using a benchmark dictionary-based sentiment index and LLM-derived aspect-based sentiment indicators, Granger causality tests suggest that statement sentiment predicts the Global Financial Cycle rather than merely responding to it. The MPSD and accompanying codebase support reproducible research on monetary policy communication and international transmission.
    Keywords: central bank communication, large language models, text analysis, generative database, machine learning
    JEL: C55 C63 E52 E58 G15
    Date: 2026–04
    URL: https://d.repec.org/n?u=RePEc:een:camaaa:2026-25
  3. By: Shuze Daniel Liu; Claire Chen; Jiabao Sean Xiao; Lei Lei; Yuheng Zhang; Yisong Yue; David Simchi-Levi
    Abstract: The recent advancement of Large Language Models (LLMs) has established their potential as autonomous interactive agents. However, they often struggle in strategic games of incomplete information, such as bilateral price negotiation. In this paper, we investigate if Reinforcement Learning from Verifiable Rewards (RLVR) can effectively teach LLMs to negotiate. Specifically, we explore the strategic behaviors that emerge during the learning process. We introduce a framework that trains a mid-sized buyer agent against a regulated LLM seller across a wide distribution of real-world products. By grounding reward signals directly in the maximization of economic surplus and strict adherence to private budget constraints, we reveal a novel four-phase strategic evolution. The agent progresses from naive bargaining to using aggressive starting prices, moves through a phase of deadlock, and ultimately develops sophisticated persuasive skills. Our results demonstrate that this verifiable training allows a 30B agent to significantly outperform frontier models over ten times its size in extracting surplus. Furthermore, the trained agent generalizes robustly to stronger counterparties unseen during training and remains effective even when facing hostile, adversarial seller personas.
    Date: 2026–04
    URL: https://d.repec.org/n?u=RePEc:arx:papers:2604.09855
  4. By: Onur Polat (Institute of Informatics, Hacettepe University, Beytepe Campus, 06800 Cankaya, Ankara, Turkiye); Rangan Gupta (Department of Economics, University of Pretoria, Private Bag X20, Hatfield 0028, South Africa); Dhanashree Somani (Department of Statistics, University of Florida, 230 Newell Drive, Gainesville, FL, 32601, USA); Sayar Karmakar (Department of Statistics, University of Florida, 230 Newell Drive, Gainesville, FL, 32601, USA)
    Abstract: This study examines the predictive power of multi-scale positive and negative speculative bubbles in equity and energy markets for S&P 500 realized variance across horizons from 1 to 24 months. Using a hierarchical modeling framework and machine learning estimators, the analysis evaluates whether stock and oil bubbles provide incremental information beyond macroeconomic variables and financial uncertainty. Applying Clark and West's (2007) tests for nested model comparisons, the results reveal a hierarchy in predictive content that varies by forecast horizon. At the 1-month horizon, neither stock nor oil bubbles improves forecast accuracy. At the 3-month horizon, oil bubbles emerge as the dominant predictor; the Bayesian Regularized Neural Network (BRNN) estimator achieves a statistically significant improvement when oil bubbles are included with stock bubbles, resulting in a 30.7 percent reduction in mean squared error (MSE). At the 6-month horizon, stock bubbles become more important, with both the Gradient Boosting Machine (GBM) and BRNN estimators showing significant improvements. For longer horizons, oil bubbles remain relevant, but their predictive value depends on the estimator: BRNN captures oil bubble effects at 12 months, while GBM does so at 24 months. These findings highlight the importance of horizon-specific model selection and indicate a complex transmission of speculative shocks across asset classes.
    Keywords: Stock Market Realized Variance, Stock and Oil Bubbles, Machine Learning, Forecasting
    JEL: C22 C53 G10 Q51
    Date: 2026–04
    URL: https://d.repec.org/n?u=RePEc:pre:wpaper:202611
  5. By: Dhruvin Dungrani; Disha Dungrani
    Abstract: In computational paralinguistics, detecting cognitive load and deception from speech signals is a heavily researched domain. Recent efforts have attempted to apply these acoustic frameworks to corporate earnings calls to predict catastrophic stock market volatility. In this study, we empirically investigate the limits of acoustic feature extraction (pitch, jitter, and hesitation) when applied to highly trained speakers in in-the-wild teleconference environments. Utilizing a two-stream late-fusion architecture, we contrast an acoustic-based stream with a baseline Natural Language Processing (NLP) stream. The isolated NLP model achieved a recall of 66.25% for tail-risk downside events. Surprisingly, integrating acoustic features via late fusion significantly degraded performance, reducing recall to 47.08%. We identify this degradation as Acoustic Camouflage, where media-trained vocal regulation introduces contradictory noise that disrupts multimodal meta-learners. We present these findings as a boundary condition for speech processing applications in high-stakes financial forecasting.
    Date: 2026–04
    URL: https://d.repec.org/n?u=RePEc:arx:papers:2604.14619
  6. By: Xiang Ao; Jingxuan Zhang; Xinyu Zhao
    Abstract: Accurately predicting stock repurchases is crucial for quantitative investment and risk management, yet traditional static models fail to capture the complex temporal dependencies of corporate financial conditions. This paper proposes a dynamic early warning system integrating economic theory with deep temporal networks. Using Chinese A-share panel data (2014-2024), we employ a hybrid Temporal Convolutional Network (TCN) and Attention-based LSTM to capture long- and short-term financial evolutionary patterns. Rolling-window cross-validation demonstrates our model significantly outperforms static baselines like Logistic Regression and XGBoost. Furthermore, utilizing Explainable AI (XAI), we reveal the temporal dynamics of repurchase decisions: prolonged "undervaluation" serves as the long-term underlying motive, while a sharp increase in "cash flow" acts as the decisive short-term trigger. This study provides a robust deep learning paradigm for financial forecasting and offers dynamic empirical support for classic corporate finance hypotheses.
    Date: 2026–03
    URL: https://d.repec.org/n?u=RePEc:arx:papers:2604.09650
  7. By: Wei Wei; Jin Zheng; Zining Wang; Weibin Feng
    Abstract: The exponential growth of financial research has rendered traditional systematic literature reviews (SLRs) increasingly impractical, as manual screening and narrative synthesis struggle to keep pace with the scale and complexity of modern scholarship. Existing artificial intelligence (AI) and natural language processing (NLP) approaches often produce outputs that are efficient but contextually limited, still requiring substantial expert oversight. To address these challenges, we propose LR-Robot, a novel framework in which domain experts define multidimensional classification taxonomies and prompt constraints that encode conceptual boundaries, large language models (LLMs) execute scalable classification across large corpora, and systematic human-in-the-loop evaluation ensures reliability before full-dataset deployment. The framework further leverages retrieval-augmented generation (RAG) to support downstream analyses including temporal evolution tracking and label-enhanced citation networks. We demonstrate the framework on a corpus of 12,666 option pricing articles spanning 50 years, designing a four-dimensional taxonomy and systematically evaluating up to eleven mainstream LLMs across classification tasks of varying complexity. The results reveal the current capabilities of AI in understanding and synthesizing literature, uncover emerging trends, reveal structural research patterns, and highlight core research directions. By accelerating labor-intensive review stages while preserving interpretive accuracy, LR-Robot provides a practical, customizable, and high-quality approach for AI-assisted SLRs.
    Date: 2026–04
    URL: https://d.repec.org/n?u=RePEc:arx:papers:2604.14793
  8. By: Maxime L. D. Nicolas; François Sicard; Marion Laboure; Zixin Sun; Anahí Rodríguez-Martínez
    Abstract: This study investigates the transmission of monetary policy narratives to Bitcoin prices, distinguishing the impact of ex-ante expectations from ex-post interest rate implementation. We introduce a high-frequency Monetary Policy Expectations (MPE) index, using a Large Language Model (LLM)-based classification of 118,000+ market messages to achieve a precise hawkish/dovish decomposition. Results from a framework combining Long Short-Term Memory (LSTM) networks with SHapley Additive exPlanations (SHAP) indicate that Bitcoin functions as a sensitive barometer of central bank signaling; specifically, hawkish narratives consistently trigger negative price responses independently of actual Federal Funds Rate adjustments. We demonstrate that the MPE index Granger-causes Bitcoin returns at short-to-medium horizons, establishing linear predictive causality, while the LSTM-SHAP framework reveals pronounced non-linear, macroeconomic regime-dependent interactions. These findings highlight Bitcoin's structural sensitivity to global monetary discourse, establishing LLM-derived sentiment as a potent leading macroeconomic indicator for the digital asset landscape.
    Date: 2026–04
    URL: https://d.repec.org/n?u=RePEc:arx:papers:2604.08825

This nep-big issue is ©2026 by Tom Coupé. It is provided as is without any express or implied warranty. It may be freely redistributed in whole or in part for any purpose. If distributed in part, please include this notice.
General information on the NEP project can be found at https://nep.repec.org. For comments please write to the director of NEP, Marco Novarese at <director@nep.repec.org>. Put “NEP” in the subject, otherwise your mail may be rejected.
NEP’s infrastructure is sponsored by the Griffith Business School of Griffith University in Australia.