nep-big New Economics Papers
on Big Data
Issue of 2026–04–06
eight papers chosen by
Tom Coupé, University of Canterbury


  1. Using Transformers and Reinforcement Learning as Narrative Filters in Macroeconomics By Vegard H. Larsen; Leif Anders Thorsrud
  2. Large Language Models and Stock Investing: Is the Human Factor Required? By Ricardo Crisostomo; Diana Mykhalyuk
  3. Fake Date Tests: Can We Trust In-sample Accuracy of LLMs in Macroeconomic Forecasting? By Alexander Eliseev; Sergei Seleznev
  4. Learning to Aggregate Zero-Shot LLM Agents for Corporate Disclosure Classification By Kemal Kirtac
  5. Central bank communication on financial stability – A shadowed sibling? By Martin, Reiner; Klacso, Jan; Mohácsi, Piroska Nagy; Evdokimova, Tatiana; Ponomarenko, Olga
  6. Designing Agentic AI-Based Screening for Portfolio Investment By Mehmet Caner; Agostino Capponi; Nathan Sun; Jonathan Y. Tan
  7. AI-Driven Demand Forecasting and Its Impact on Inventory Optimization By Abdelfatah, Omar Sharafeldin Mohamed
  8. LLM-Based Measurement of Latent Attributes in Trade Data By DiGiuseppe, Matthew; Fu, Xuelong; Flynn, Michael E

  1. By: Vegard H. Larsen; Leif Anders Thorsrud
    Abstract: Building on recent advances in Natural Language Processing and modeling of sequences, we study how a multimodal Transformer-based deep learning architecture can be used for measurement and structural narrative attribution in macroeconomics. The framework we propose combines (news) text and (macroeconomic) time series information using cross-attention mechanisms, easily incorporates differences in data frequencies and reporting delays, and can be used together with Reinforcement Learning to produce structurally coherent summaries of high-frequency news flows. Applied and tested on both simulated and real-world data out-of-sample, the results we obtain are encouraging.
    Date: 2026–02
    URL: https://d.repec.org/n?u=RePEc:bny:wpaper:0147
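The cross-attention mechanism the abstract mentions — time-series queries attending over news-text tokens — can be sketched minimally as follows. This is a single-head, projection-free illustration with random embeddings, not the authors' multimodal architecture; all array shapes and names are invented for the example.

```python
import numpy as np

def cross_attention(q, k, v):
    """Scaled dot-product cross-attention: time-series queries (q)
    attend over text-token keys (k) and values (v)."""
    scores = q @ k.T / np.sqrt(q.shape[1])
    # Row-wise softmax (stabilized by subtracting the row max).
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ v

rng = np.random.default_rng(2)
series_emb = rng.normal(size=(4, 16))   # 4 macro time steps
text_emb = rng.normal(size=(10, 16))    # 10 news tokens
out = cross_attention(series_emb, text_emb, text_emb)
print(out.shape)  # one text-informed vector per time step
```

Because the queries and keys come from different modalities, this layout also accommodates mismatched lengths — e.g., monthly macro observations attending over a much longer stream of news tokens.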
  2. By: Ricardo Crisostomo; Diana Mykhalyuk
    Abstract: This paper investigates whether large language models (LLMs) can generate reliable stock market predictions. We evaluate four state-of-the-art models – ChatGPT, Gemini, DeepSeek, and Perplexity – across three prompting strategies: a naive query, a structured approach, and chain-of-thought reasoning. Our results show that LLM-generated recommendations are hindered by recurring reasoning failures, including financial misconceptions, carryover errors, and reliance on outdated or hallucinated information. When appropriately guided and supervised, LLMs demonstrate the capacity to outperform the market, but realizing LLMs' full potential requires substantial human oversight. We also find that grounding stock recommendations in official regulatory filings increases their forecasting accuracy. Overall, our findings underscore the need for robust safeguards and validation when deploying LLMs in financial markets.
    Date: 2026–03
    URL: https://d.repec.org/n?u=RePEc:arx:papers:2603.19944
  3. By: Alexander Eliseev (Bank of Russia, Russian Federation); Sergei Seleznev (Bank of Russia, Russian Federation)
    Abstract: Large language models (LLMs) are a type of machine learning tool that economists have started to apply in their empirical research. One such application is macroeconomic forecasting with backtesting of LLMs, even though they are trained on the same data that is used to estimate their forecasting performance. Can these in-sample accuracy results be extrapolated to the model’s out-of-sample performance? To answer this question, we develop a family of prompt sensitivity tests and, within it, two specific tests that we call the fake date tests. These tests aim to detect two types of biases in LLMs’ in-sample forecasts: lookahead bias and context bias. According to the empirical results, none of the modern LLMs tested in this study passed our tests, signaling the presence of biases in their in-sample forecasts.
    Keywords: large language models, macroeconomic forecasting, lookahead bias, context bias
    JEL: C12 C52 C53
    Date: 2026–03
    URL: https://d.repec.org/n?u=RePEc:bkr:wpaper:wps167
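One schematic reading of a prompt sensitivity test of this kind: if a model's forecast for the same target changes when only the stated "current date" in the prompt changes, the model is conditioning on the date context rather than on economic information. Everything below — `forecast_fn`, the tolerance, and the leaky stub standing in for an LLM call — is an illustrative invention; the paper's actual test design is more elaborate.

```python
def fake_date_test(forecast_fn, prompt, true_date, fake_date, tol=1e-6):
    """Return True if the forecast is sensitive to the stated date,
    i.e. evidence of lookahead/context bias in an in-sample backtest."""
    f_true = forecast_fn(prompt.format(date=true_date))
    f_fake = forecast_fn(prompt.format(date=fake_date))
    return abs(f_true - f_fake) > tol

# Stub "model" that improperly conditions on the stated date:
leaky = lambda p: 2.0 if "2020" in p else 3.5
print(fake_date_test(leaky,
                     "Today is {date}. Forecast next-quarter inflation.",
                     "2020-01-01", "2018-01-01"))  # bias detected
```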
  4. By: Kemal Kirtac
    Abstract: This paper studies whether a lightweight trained aggregator can combine diverse zero-shot large language model judgments into a stronger downstream signal for corporate disclosure classification. Zero-shot LLMs can read disclosures without task-specific fine-tuning, but their predictions often vary across prompts, reasoning styles, and model families. I address this problem with a multi-agent framework in which three zero-shot agents independently read each disclosure and output a sentiment label, a confidence score, and a short rationale. A logistic meta-classifier then aggregates these signals to predict next-day stock return direction. I use a sample of 18,420 U.S. corporate disclosures issued by Nasdaq and S&P 500 firms between 2018 and 2024, matched to next-day stock returns. Results show that the trained aggregator outperforms all single agents, majority vote, confidence-weighted voting, and a FinBERT baseline. Balanced accuracy rises from 0.561 for the best single agent to 0.612 for the trained aggregator, with the largest gains in disclosures combining strong current performance with weak guidance or elevated risk. The results suggest that zero-shot LLM agents capture complementary financial signals and that supervised aggregation can turn cross-agent disagreement into a more useful classification target.
    Date: 2026–03
    URL: https://d.repec.org/n?u=RePEc:arx:papers:2603.20965
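The aggregation step — a logistic meta-classifier over each agent's (label, confidence) pair — can be sketched as below. The feature layout, the synthetic data, and the use of scikit-learn's `LogisticRegression` are assumptions for illustration, not the paper's exact setup.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy data: for each disclosure, 3 agents emit a sentiment label
# (-1/0/+1) and a confidence in [0.5, 1.0].
rng = np.random.default_rng(0)
n = 200
labels = rng.choice([-1, 0, 1], size=(n, 3))
conf = rng.uniform(0.5, 1.0, size=(n, 3))
X = np.hstack([labels, conf])            # 6 features per disclosure

# Toy target: next-day return direction, loosely tied to agent votes.
y = (labels.sum(axis=1) + rng.normal(0, 1, n) > 0).astype(int)

# The meta-classifier learns how much to trust each agent's vote,
# which is how cross-agent disagreement becomes usable signal.
meta = LogisticRegression(max_iter=1000).fit(X, y)
print(meta.score(X, y))  # in-sample accuracy of the aggregator
```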
  5. By: Martin, Reiner; Klacso, Jan; Mohácsi, Piroska Nagy; Evdokimova, Tatiana; Ponomarenko, Olga
    Abstract: Central bank communication on financial stability has been less studied than communication on monetary policy. Our paper aims to contribute to the growing literature in this area. Our focus is the region of Central Europe, where financial sectors are intertwined through close cross-border ownership, and about half of the countries are members of the euro area. Using large language models (LLMs) combined with country-specific contextual analysis, we study executive summaries of Financial Stability Reports (FSRs) published since the early 2000s by seven Central, Eastern, and Southeastern European (CESEE) central banks, as well as by Austria and the European Central Bank (ECB). We construct a novel financial stability sentiment index and document that central bank communication is strongly risk-focused, most notably in the case of the ECB. In addition, prior to the Global Financial Crisis, the Austrian central bank was much less concerned than other central banks in the region, even though Austria plays a pivotal role in the region’s financial system. Our analysis of the link between financial stability sentiment communication and macroprudential policy action highlights that many central banks actively use and communicate about borrower-based measures, while most countries activated non-zero counter-cyclical capital buffers belatedly or not at all. Finally, comparing central banks’ communication on financial stability and monetary policy, we find that the FSRs of euro area national central banks and the ECB communicated about the rising risks of post-Covid inflation in a timely manner, ahead of the ECB’s monetary policy communication.
    Keywords: central banking, Central Europe, communication, euro area, European Central Bank, financial policy, macroprudential policy
    JEL: C55 E58 E61 H12 D83
    Date: 2026–04
    URL: https://d.repec.org/n?u=RePEc:srk:srkwps:2026154
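A sentiment index of the kind the abstract describes is often a simple net balance of signed classifications. The sketch below is one generic construction — per-sentence LLM labels reduced to (pos − neg)/(pos + neg) — and is not the authors' actual index.

```python
def net_sentiment(labels):
    """Net sentiment of a report from per-sentence sentiment labels.

    labels: list of 'positive' / 'negative' / 'neutral' strings
    (assumed here to come from an LLM classifier).
    Returns (pos - neg) / (pos + neg), or 0.0 if no signed sentences.
    """
    pos = labels.count("positive")
    neg = labels.count("negative")
    return (pos - neg) / (pos + neg) if pos + neg else 0.0

# A strongly risk-focused FSR summary scores negative:
print(net_sentiment(["negative", "negative", "neutral", "positive"]))
```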
  6. By: Mehmet Caner; Agostino Capponi; Nathan Sun; Jonathan Y. Tan
    Abstract: We introduce a new agentic artificial intelligence (AI) platform for portfolio management. Our architecture consists of three layers. First, two large language model (LLM) agents are assigned specialized tasks: one agent screens for firms with desirable fundamentals, while a sentiment analysis agent screens for firms with desirable news. Second, these agents deliberate to generate and agree upon buy and sell signals from a large portfolio, substantially narrowing the pool of candidate assets. Finally, we apply a high-dimensional precision matrix estimation procedure to determine optimal portfolio weights. A defining theoretical feature of our framework is that the number of assets in the portfolio is itself a random variable, realized through the screening process. We introduce the concept of sensible screening and establish that, under mild screening errors, the squared Sharpe ratio of the screened portfolio consistently estimates its target. Empirically, our method achieves superior Sharpe ratios relative to an unscreened baseline portfolio and to conventional screening approaches, evaluated on S&P 500 data over the period 2020–2024.
    Date: 2026–03
    URL: https://d.repec.org/n?u=RePEc:arx:papers:2603.23300
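The final layer — portfolio weights from an estimated precision matrix — can be sketched with the classical global minimum-variance formula w = P·1 / (1′P1). The LLM screening layers are omitted here, and Ledoit–Wolf shrinkage stands in for the paper's high-dimensional precision-matrix procedure; the return data are synthetic.

```python
import numpy as np
from sklearn.covariance import LedoitWolf

# Toy daily returns for a screened set of assets (in the paper, the
# screening step determines which, and how many, assets enter here).
rng = np.random.default_rng(1)
returns = rng.normal(0.0005, 0.01, size=(500, 20))

# Shrinkage covariance estimate, then its inverse (precision matrix).
prec = np.linalg.inv(LedoitWolf().fit(returns).covariance_)

# Global minimum-variance weights: w = P @ 1 / (1' P 1).
ones = np.ones(prec.shape[0])
w = prec @ ones / (ones @ prec @ ones)
print(w.sum())  # weights sum to 1 by construction
```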
  7. By: Abdelfatah, Omar Sharafeldin Mohamed
    Abstract: This research article investigates the transformative impact of Artificial Intelligence (AI) and Machine Learning (ML) on demand forecasting and subsequent inventory optimization. Utilizing a mixed-methods approach—including a survey of 204 supply chain professionals and 22 executive interviews—the study quantifies how advanced models like LSTM, XGBoost, and ensemble methods outperform traditional statistical approaches (e.g., ARIMA, Exponential Smoothing). Key findings include a 31.2% average reduction in Mean Absolute Percentage Error (MAPE) across the sample; significant downstream improvements, namely a 24.7% increase in inventory turnover and a 19.4% reduction in safety stock; and the identification of model sophistication, data richness, and integration depth as the primary predictors of success. The paper introduces a three-stage AI Forecasting Maturity Model and the AI Forecasting–Inventory Performance (AFIP) framework to guide practitioners in transitioning from basic statistical augmentation to probabilistic AI optimization.
    Date: 2026–03–21
    URL: https://d.repec.org/n?u=RePEc:osf:socarx:uw57j_v1
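For reference, the headline metric (MAPE) is straightforward to compute; the demand numbers below are made up purely to show the comparison the abstract reports.

```python
def mape(actual, forecast):
    """Mean Absolute Percentage Error, in percent (assumes no zeros
    in the actual series)."""
    errors = [abs((a - f) / a) for a, f in zip(actual, forecast)]
    return 100 * sum(errors) / len(errors)

# Hypothetical demand series with a naive and an improved forecast.
actual   = [100, 120, 90, 110]
naive    = [110, 100, 100, 100]
improved = [102, 118, 92, 108]
print(mape(actual, naive), mape(actual, improved))
```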
  8. By: DiGiuseppe, Matthew (Leiden University); Fu, Xuelong; Flynn, Michael E (Kansas State University)
    Abstract: Trade data are available at a high level of disaggregation, allowing scholars to examine flows of highly specific goods. Yet the sheer number of goods classifications (5,000+) makes it difficult to analyze trade flows and tariff policy at a mid-level of aggregation beyond a few existing categorizations. Here, we outline a method that can scale – not merely classify – traded goods on researcher-defined dimensions that are orthogonal to existing classification schemes. We propose that the embedded knowledge in large language models (LLMs) can be used to conduct pairwise comparisons (PWCs) of Harmonized System (HS) product descriptions by determining their relative proximity to a specific concept. A Bayesian Bradley–Terry model then uses these PWCs to place individual items on a latent scale of interest. These estimates and their associated uncertainty can then be used for downstream descriptive or causal analysis.
    Date: 2026–03–27
    URL: https://d.repec.org/n?u=RePEc:osf:socarx:t8wdg_v1
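Scaling items from pairwise comparisons with a Bradley–Terry model can be sketched as below. This is a plain maximum-likelihood fit via the standard MM iteration, standing in for the paper's Bayesian version (which also yields uncertainty estimates); the wins matrix is an invented toy example.

```python
import numpy as np

def bradley_terry(wins, iters=200):
    """MM fit of Bradley-Terry strengths from a wins matrix.

    wins[i, j] = number of pairwise comparisons item i won against j.
    Returns strengths normalized to sum to 1.
    """
    n = wins.shape[0]
    p = np.ones(n)
    games = wins + wins.T                 # comparisons per pair
    for _ in range(iters):
        denom = (games / (p[:, None] + p[None, :] + 1e-12)).sum(axis=1)
        p = wins.sum(axis=1) / denom      # MM update
        p /= p.sum()
    return p

# Toy example: three HS-style items, item 0 judged closest to the
# concept of interest in most of the LLM pairwise comparisons.
wins = np.array([[0, 8, 9],
                 [2, 0, 6],
                 [1, 4, 0]], dtype=float)
print(bradley_terry(wins))
```

The fitted strengths then serve as the latent scale: items that win more comparisons against stronger opponents end up higher on the scale.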

This nep-big issue is ©2026 by Tom Coupé. It is provided as is without any express or implied warranty. It may be freely redistributed in whole or in part for any purpose. If distributed in part, please include this notice.
General information on the NEP project can be found at https://nep.repec.org. For comments please write to the director of NEP, Marco Novarese at <director@nep.repec.org>. Put “NEP” in the subject, otherwise your mail may be rejected.
NEP’s infrastructure is sponsored by the Griffith Business School of Griffith University in Australia.