nep-for New Economics Papers
on Forecasting
Issue of 2026–04–06
nineteen papers chosen by
Malte Knüppel, Deutsche Bundesbank


  1. Forecast collapse of transformer-based models under squared loss in financial time series By Pierre Andreoletti
  2. A Controlled Comparison of Deep Learning Architectures for Multi-Horizon Financial Forecasting: Evidence from 918 Experiments By Nabeel Ahmad Saidd
  3. Network Structure in UK Payment Flows: Evidence on Economic Interdependencies and Implications for Real-Time Measurement By Aditya Humnabadkar
  4. Fake Date Tests: Can We Trust In-sample Accuracy of LLMs in Macroeconomic Forecasting? By Alexander Eliseev; Sergei Seleznev
  5. Regularized Random Subspace Regressions By Yilin Xiao; Jamie L. Cross
  6. AI-Driven Demand Forecasting and Its Impact on Inventory Optimization By Abdelfatah, Omar Sharafeldin Mohamed
  7. Beating the "Pros" with a Semi-structural Model of their own Inflation Forecasts By Sergio Lago Alves; Waldyr Areosa; Carlos Carvalho
  8. Do Prediction Markets Forecast Cryptocurrency Volatility? Evidence from Kalshi Macro Contracts By Hardhik Mohanty; Bhaskar Krishnamachari
  9. Modeling and Forecasting Tail Risk Spillovers: A Component-Based CAViaR Approach By Demetrio Lacava
  10. GARP-EFM: Improving Foundation Models with Revealed Preference Structure By Victor H. Aguiar; Nail Kashaev
  11. Forecasting duration in high-frequency financial data using a self-exciting flexible residual point process By Kyungsub Lee
  12. Multivariate GARCH and portfolio variance prediction: A forecast reconciliation perspective By Massimiliano Caporin; Daniele Girolimetto; Emanuele Lopetuso
  13. Nowcasting Growth Using the Bayesian Structural Time Series Model By Ms. Sunwoo Lee
  14. Large Language Models and Stock Investing: Is the Human Factor Required? By Ricardo Crisostomo; Diana Mykhalyuk
  15. Forecasting Out-of-Time Credit Scoring Model Risk By Valter T. Yoshida Jr.; Rafael Schiozer; Alan de Genaro; Toni R.E. dos Santos
  16. Deflating Bank Transaction Data for GDP Nowcasting: Whether and How to Use Inflation Lags By Kris Boudt; Arno De Block; Feliciaan De Palmenaer; Elsa Laura Verbeken
  17. Does speculation in futures markets improve commodity hedging decisions? By A. Fernandez-Perez; A.-M. Fuertes; J. Miffre
  18. An Auditable AI Agent Loop for Empirical Economics: A Case Study in Forecast Combination By Minchul Shin
  19. Anomaly prediction in XRP price with topological features By Illia Donhauzer; Pierluigi Cesana; Tomoyuki Shirai; Yuichi Ikeda

  1. By: Pierre Andreoletti (IDP)
    Abstract: We study trajectory forecasting under squared loss for time series with weak conditional structure, using highly expressive prediction models. Building on the classical characterization of squared-loss risk minimization, we emphasize regimes in which the conditional expectation of future trajectories is effectively degenerate, leading to trivial Bayes-optimal predictors (flat for prices and zero for returns in standard financial settings). In this regime, increased model expressivity does not improve predictive accuracy but instead introduces spurious trajectory fluctuations around the optimal predictor. These fluctuations arise from the reuse of noise and result in increased prediction variance without any reduction in bias. This provides a process-level explanation for the degradation of Transformer-based forecasts on financial time series. We complement these theoretical results with numerical experiments on high-frequency EUR/USD exchange rate data, analyzing the distribution of trajectory-level forecasting errors. The results show that Transformer-based models yield larger errors than a simple linear benchmark on a large majority of forecasting windows, consistent with the variance-driven mechanism identified by the theory.
    Date: 2026–03
    URL: https://d.repec.org/n?u=RePEc:arx:papers:2604.00064
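A minimal Python simulation of the variance mechanism, under stated assumptions: prices follow a driftless Gaussian random walk, so the squared-loss optimal trajectory forecast is flat at the last observed value, and the "expressive" forecast is a hypothetical stand-in that adds zero-mean fluctuations to that flat path. It is not the paper's Transformer experiment.

```python
import numpy as np

rng = np.random.default_rng(0)
n_windows, horizon = 5000, 20

# Future paths of a driftless random walk starting from the last observed value
last_price = 0.0
increments = rng.normal(0.0, 1.0, size=(n_windows, horizon))
future_paths = last_price + np.cumsum(increments, axis=1)

# Bayes-optimal forecast under squared loss: the conditional mean, i.e. a flat path
flat_forecast = np.full((n_windows, horizon), last_price)

# Hypothetical over-expressive forecast: flat path plus spurious zero-mean wiggles
spurious = rng.normal(0.0, 0.5, size=(n_windows, horizon))
wiggly_forecast = flat_forecast + spurious

def trajectory_mse(forecast, target):
    """Squared error averaged over the horizon, one value per forecasting window."""
    return ((forecast - target) ** 2).mean(axis=1)

mse_flat = trajectory_mse(flat_forecast, future_paths)
mse_wiggly = trajectory_mse(wiggly_forecast, future_paths)

print(f"mean MSE flat:   {mse_flat.mean():.3f}")
print(f"mean MSE wiggly: {mse_wiggly.mean():.3f}")
print(f"share of windows where flat wins: {(mse_flat < mse_wiggly).mean():.2%}")
```

In expectation, the gap between the two average errors equals the variance of the added fluctuations (0.25 here), so the extra flexibility adds variance without reducing bias.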
  2. By: Nabeel Ahmad Saidd
    Abstract: Multi-horizon price forecasting is central to portfolio allocation, risk management, and algorithmic trading, yet deep learning architectures have proliferated faster than rigorous financial benchmarks can evaluate them. This study provides a controlled comparison of nine architectures (Autoformer, DLinear, iTransformer, LSTM, ModernTCN, N-HiTS, PatchTST, TimesNet, and TimeXer) spanning Transformer, MLP, CNN, and RNN families across cryptocurrency, forex, and equity index markets at 4-hour and 24-hour horizons. A total of 918 experiments were conducted under a strict five-stage protocol including fixed-seed Bayesian hyperparameter optimization, configuration freezing per asset class, multi-seed retraining, uncertainty aggregation, and statistical validation. ModernTCN achieves the best mean rank (1.333) with a 75 percent first-place rate, followed by PatchTST (2.000). Results reveal a clear three-tier ranking structure and show that architecture explains nearly all performance variance, while seed randomness is negligible. Rankings remain stable across horizons despite 2 to 2.5 times error amplification. Directional accuracy remains near 50 percent across all configurations, indicating that MSE-trained models lack directional skill at hourly resolution. The findings highlight the importance of architectural inductive bias over raw parameter count and provide reproducible guidance for multi-step financial forecasting.
    Date: 2026–02
    URL: https://d.repec.org/n?u=RePEc:arx:papers:2603.16886
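The two headline metrics can be sketched in Python as follows. Measuring direction relative to the last observed value, and averaging per-experiment ranks, are assumptions about how such metrics are usually computed rather than the paper's code; ties are broken arbitrarily by the sort.

```python
import numpy as np

def directional_accuracy(y_prev, y_true, y_pred):
    """Share of forecasts whose predicted move (relative to the last observed
    value y_prev) has the same sign as the realized move."""
    true_dir = np.sign(np.asarray(y_true) - np.asarray(y_prev))
    pred_dir = np.sign(np.asarray(y_pred) - np.asarray(y_prev))
    return float(np.mean(true_dir == pred_dir))

def mean_ranks(errors):
    """errors: dict model -> list of errors, one per experiment (lower is better).
    Returns each model's average rank across experiments (1 = best)."""
    names = list(errors)
    table = np.array([errors[m] for m in names])        # shape (models, experiments)
    ranks = table.argsort(axis=0).argsort(axis=0) + 1    # rank within each experiment
    return {m: float(r.mean()) for m, r in zip(names, ranks)}

# Illustrative numbers only
da = directional_accuracy([100, 100, 100, 100], [101, 99, 102, 98],
                          [100.5, 100.2, 99.0, 97.0])
ranks = mean_ranks({"A": [0.10, 0.20, 0.15],
                    "B": [0.12, 0.18, 0.20],
                    "C": [0.30, 0.25, 0.10]})
print(da, ranks)
```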
  3. By: Aditya Humnabadkar
    Abstract: Network analysis of inter-industry payment flows reveals structural economic relationships invisible to traditional bilateral measurement approaches, with significant implications for real-time economic monitoring. Analysing 532,346 UK payment records (2017–2024) across 89 industry sectors, we demonstrate that graph-theoretic features, including centrality measures and clustering coefficients, improve payment flow forecasting by 8.8 percentage points beyond traditional time-series methods. Critically, network features prove most valuable during economic disruptions: during the COVID-19 pandemic, when traditional forecasting accuracy collapsed (R² falling from 0.38 to 0.19), network-enhanced models maintained substantially better performance, with network contributions reaching +13.8 percentage points. The analysis identifies Financial Services, Wholesale Trade, and Professional Services as structurally central industries whose network positions indicate systemic importance beyond their transaction volumes. Network density increased by 12.5% over the sample period, with visible disruption during 2020 followed by a recovery exceeding pre-pandemic integration levels. These findings suggest payment network monitoring could enhance official statistics production by providing leading indicators of structural economic change and improving nowcasting accuracy during periods when traditional temporal patterns prove unreliable.
    Date: 2026–04
    URL: https://d.repec.org/n?u=RePEc:arx:papers:2604.02068
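Graph features of the kind named in the abstract (centrality, clustering, density) can be computed from payment records with standard-library Python. This minimal sketch assumes, for simplicity, an undirected and unweighted network with a hypothetical `min_amount` threshold; the paper's exact feature definitions may differ.

```python
from itertools import combinations

def build_graph(flows, min_amount=0.0):
    """flows: iterable of (payer_sector, payee_sector, amount).
    Returns an undirected adjacency dict: two sectors are linked if a payment
    above min_amount flows in either direction."""
    adj = {}
    for payer, payee, amount in flows:
        adj.setdefault(payer, set())
        adj.setdefault(payee, set())
        if payer != payee and amount > min_amount:
            adj[payer].add(payee)
            adj[payee].add(payer)
    return adj

def density(adj):
    """Observed links divided by the n(n-1)/2 possible links."""
    n = len(adj)
    links = sum(len(nbrs) for nbrs in adj.values()) / 2
    return links / (n * (n - 1) / 2) if n > 1 else 0.0

def degree_centrality(adj):
    """Number of neighbours divided by the n-1 possible neighbours."""
    n = len(adj)
    return {v: len(nbrs) / (n - 1) for v, nbrs in adj.items()}

def clustering(adj, v):
    """Share of pairs of v's neighbours that are themselves linked."""
    nbrs = adj[v]
    k = len(nbrs)
    if k < 2:
        return 0.0
    closed = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
    return closed / (k * (k - 1) / 2)

# Toy records with illustrative sector names and amounts
flows = [("Finance", "Wholesale", 50), ("Finance", "Professional", 30),
         ("Wholesale", "Professional", 20), ("Finance", "Retail", 10),
         ("Retail", "Finance", 5)]
g = build_graph(flows)
print(density(g), degree_centrality(g), clustering(g, "Finance"))
```

Tracking these quantities month by month would give feature series of the kind the abstract adds to time-series forecasts.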
  4. By: Alexander Eliseev (Bank of Russia, Russian Federation); Sergei Seleznev (Bank of Russia, Russian Federation)
    Abstract: Large language models (LLMs) are a type of machine learning tool that economists have started to apply in their empirical research. One such application is macroeconomic forecasting with backtesting of LLMs, even though they are trained on the same data that is used to estimate their forecasting performance. Can these in-sample accuracy results be extrapolated to the model’s out-of-sample performance? To answer this question, we developed a family of prompt sensitivity tests, two members of which we call fake date tests. These tests aim to detect two types of biases in LLMs’ in-sample forecasts: lookahead bias and context bias. According to the empirical results, none of the modern LLMs tested in this study passed our tests, signaling the presence of biases in their in-sample forecasts.
    Keywords: large language models, macroeconomic forecasting, lookahead bias, context bias
    JEL: C12 C52 C53
    Date: 2026–03
    URL: https://d.repec.org/n?u=RePEc:bkr:wpaper:wps167
  5. By: Yilin Xiao; Jamie L. Cross
    Abstract: We propose a new class of Regularized Random Subspace Regressions (RRSRs) that combine the variance reduction benefits of regularized estimators with the nonlinearities of random subspace ensembles. The approach introduces regularization in the selection of predictor subspaces, in coefficient estimation within each subspace, or in both, yielding a flexible family of models that nests both standard random subspace regression (RSR) and standard penalized regressions as special cases. Using the FRED-MD database as a large predictor space, we show that RRSRs consistently outperform traditional RSR and several widely used econometric and machine learning benchmarks when forecasting four key macroeconomic indicators: inflation, output, unemployment, and the federal funds rate. The most systematic gains arise from the double-regularized specification, underscoring the value of applying shrinkage jointly to subspace selection and coefficient estimation.
    Date: 2026–01
    URL: https://d.repec.org/n?u=RePEc:bny:wpaper:0146
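A minimal Python sketch of the idea, assuming the simplest variant: uniformly random subspaces of `k` predictors, with ridge shrinkage only in the coefficient-estimation step. The paper's regularized subspace selection and its double-regularized specification are not reproduced; `k`, `lam`, and `n_draws` are illustrative tuning parameters.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Closed-form ridge coefficients (no intercept; assumes demeaned, standardized data)."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

def random_subspace_ridge(X, y, X_new, k, n_draws=500, lam=1.0, seed=0):
    """Average of ridge forecasts, each fitted on a random subset of k predictors."""
    rng = np.random.default_rng(seed)
    p = X.shape[1]
    preds = np.zeros(X_new.shape[0])
    for _ in range(n_draws):
        cols = rng.choice(p, size=k, replace=False)   # random predictor subspace
        beta = ridge_fit(X[:, cols], y, lam)           # shrinkage within the subspace
        preds += X_new[:, cols] @ beta
    return preds / n_draws

# Illustrative data: 50 candidate predictors, only the first 5 matter
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 50))
beta_true = np.zeros(50)
beta_true[:5] = 1.0
y = X @ beta_true + rng.normal(size=200)
forecast = random_subspace_ridge(X[:150], y[:150], X[150:], k=10, lam=5.0)
```

Setting `lam=0` gives plain random subspace regression, while `k` equal to the number of predictors with a single draw gives ordinary ridge, mirroring the nesting described in the abstract.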
  6. By: Abdelfatah, Omar Sharafeldin Mohamed
    Abstract: This research article investigates the transformative impact of Artificial Intelligence (AI) and Machine Learning (ML) on demand forecasting and subsequent inventory optimization. Using a mixed-methods approach, including a survey of 204 supply chain professionals and 22 executive interviews, the study quantifies how advanced models such as LSTM, XGBoost, and ensemble methods outperform traditional statistical approaches (e.g., ARIMA, Exponential Smoothing). Key findings include a 31.2% average reduction in Mean Absolute Percentage Error (MAPE) across the sample; significant downstream improvements, namely a 24.7% increase in inventory turnover and a 19.4% reduction in safety stock; and the identification of model sophistication, data richness, and integration depth as the primary predictors of success. The paper introduces a three-stage AI Forecasting Maturity Model and the AI Forecasting–Inventory Performance (AFIP) framework to guide practitioners in transitioning from basic statistical augmentation to probabilistic AI optimization.
    Date: 2026–03–21
    URL: https://d.repec.org/n?u=RePEc:osf:socarx:uw57j_v1
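MAPE, and the "average reduction" in it quoted above, take only a few lines of Python. The demand and forecast figures below are made up for illustration and do not come from the survey.

```python
def mape(actual, forecast):
    """Mean absolute percentage error, in percent (actual values must be non-zero)."""
    return 100.0 * sum(abs((a - f) / a) for a, f in zip(actual, forecast)) / len(actual)

def relative_reduction(baseline, improved):
    """Percentage reduction of an error metric relative to a baseline value."""
    return 100.0 * (baseline - improved) / baseline

demand = [120, 80, 100, 150]       # illustrative actual demand
arima_fc = [100, 90, 110, 120]     # illustrative statistical forecast
ml_fc = [115, 84, 104, 140]        # illustrative ML forecast

base = mape(demand, arima_fc)
new = mape(demand, ml_fc)
print(f"MAPE statistical: {base:.2f}%, ML: {new:.2f}%, "
      f"reduction: {relative_reduction(base, new):.1f}%")
```

Because MAPE divides by actual demand, it is undefined for periods with zero demand, a case worth handling explicitly for intermittent items.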
  7. By: Sergio Lago Alves; Waldyr Areosa; Carlos Carvalho
    Abstract: Professional inflation forecasts contain valuable information but exhibit information frictions. We extract improved forecasts by explicitly modeling these frictions using US Survey of Professional Forecasters data, and find that forecast rigidity increases systematically with horizon, rising from near zero for backcasts to 0.81 beyond two quarters. In pseudo-real-time tests, our Resetting Nowcasts reduce mean squared errors by 50 percent relative to SPF averages. We derive a novel theoretical criterion showing that improved forecasts dominate when disagreement lies within an optimal interval determined by simple sufficient statistics, easily computable from any survey microdata. The criterion determines in advance the horizons where improved forecasts should dominate, without estimating friction parameters. This generalizes easily to other surveys and variables, providing a tractable method for identifying which forecast horizons offer the greatest potential for improvement.
    Date: 2026–04
    URL: https://d.repec.org/n?u=RePEc:bcb:wpaper:643
  8. By: Hardhik Mohanty; Bhaskar Krishnamachari
    Abstract: Daily probability changes in Kalshi macro prediction markets forecast cryptocurrency realized volatility through two distinct channels. The monetary policy channel, measured by Fed rate repricing on KXFED contracts, predicts Bitcoin volatility in sample with t = 3.63 and p
    Date: 2026–04
    URL: https://d.repec.org/n?u=RePEc:arx:papers:2604.01431
  9. By: Demetrio Lacava
    Abstract: This paper introduces a new extension of the Conditional Autoregressive Value at Risk (CAViaR) model aimed at improving tail risk forecasting across assets. The proposed component-based model, CAViaR with Spillover Effects (CAViaR-SE), decomposes the conditional Value at Risk into a proper-risk component and a spillover component driven by a linear combination of tail risks from influential assets. These assets are selected via a recursive partial correlation algorithm, allowing multiple spillover sources with minimal parameterization. The spillover component acts as a predictable quantile shifter, directly affecting the conditional quantile dynamics rather than the volatility scale. Empirical results on Dow Jones Industrial Average stocks show that spillover effects account for a substantial share of total tail risk and significantly improve out-of-sample tail risk forecasts. Backtesting procedures, together with Model Confidence Set (MCS) analysis, confirm that CAViaR-SE provides well-calibrated risk measures and statistically superior forecasts compared to standard and augmented CAViaR models.
    Date: 2026–03
    URL: https://d.repec.org/n?u=RePEc:arx:papers:2603.25217
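A minimal Python sketch of the building blocks, assuming the symmetric absolute value (SAV) CAViaR specification of Engle and Manganelli with an additive spillover term. How the paper selects and weights spillover sources (its recursive partial correlation algorithm) is not reproduced, so `spill` is a hypothetical input and the parameter values are illustrative.

```python
import numpy as np

def caviar_se_path(y, spill, params, q0):
    """Lower-tail quantile path (negative for small tau):
    q_t = b0 + b1 * q_{t-1} + b2 * |y_{t-1}| + g * spill_{t-1}.
    `spill` is a hypothetical lagged tail-risk signal from other assets."""
    b0, b1, b2, g = params
    q = np.empty(len(y))
    q[0] = q0
    for t in range(1, len(y)):
        q[t] = b0 + b1 * q[t - 1] + b2 * abs(y[t - 1]) + g * spill[t - 1]
    return q

def pinball_loss(y, q, tau):
    """Average quantile (tick) loss; its minimizer is the tau-quantile, which is
    why it serves both to estimate CAViaR models and to compare their forecasts."""
    u = np.asarray(y, dtype=float) - np.asarray(q, dtype=float)
    return float(np.mean((tau - (u < 0).astype(float)) * u))

def hit_rate(y, q):
    """Share of returns below the quantile forecast; should be close to tau."""
    return float(np.mean(np.asarray(y) < np.asarray(q)))

rng = np.random.default_rng(0)
y = rng.standard_normal(20000)
spill = np.zeros_like(y)          # no spillover in this toy run
q = caviar_se_path(y, spill, params=(-0.05, 0.9, -0.1, 0.3), q0=-1.6)
```

Setting `g = 0` recovers the plain SAV model, which is the natural benchmark for judging whether the spillover term improves out-of-sample pinball loss and hit rates.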
  10. By: Victor H. Aguiar; Nail Kashaev
    Abstract: Modern pretrained time-series foundation models can forecast without task-specific training, but they do not fully incorporate economic behavior. We show that teaching them basic economic logic improves how they predict demand using an experimental panel. We fine-tune Amazon Chronos-2, a transformer-based probabilistic time-series model, on synthetic data generated from utility-maximizing agents. We exploit Afriat's theorem, which guarantees that demand satisfies the Generalized Axiom of Revealed Preference (GARP) if and only if it can be generated by maximizing some utility function subject to a budget constraint. GARP is a simple condition to check that allows us to generate time series from a large class of utilities efficiently. The fine-tuned model serves as a rationality-constrained forecasting prior: it learns price-quantity relations from GARP-consistent synthetic histories and then uses those relations to predict the choices of real consumers. We find that fine-tuning on GARP-consistent synthetic data substantially improves prediction relative to zero-shot Chronos-2 at all forecast horizons we study. Our results show that economic theory can be used to generate structured synthetic data that improves foundation-model predictions when the theory implies observable patterns in the data.
    Date: 2026–03
    URL: https://d.repec.org/n?u=RePEc:arx:papers:2603.23993
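GARP is indeed simple to check: a standard textbook approach builds the direct revealed-preference relation and takes its transitive closure with Warshall's algorithm. The Python sketch below is such a generic check, not the authors' data-generation code; a filter like this could screen synthetic price-quantity histories before fine-tuning.

```python
import numpy as np

def satisfies_garp(prices, quantities):
    """prices, quantities: arrays of shape (T, n_goods), one row per observation.
    Returns True if the data contain no GARP violation."""
    P = np.asarray(prices, dtype=float)
    X = np.asarray(quantities, dtype=float)
    cost = P @ X.T                  # cost[t, s] = p_t . x_s
    own = np.diag(cost)             # own[t] = p_t . x_t, expenditure at t
    # Direct revealed preference: x_t R0 x_s if x_s was affordable when x_t was chosen
    R = own[:, None] >= cost
    # Transitive closure via Warshall's algorithm
    for k in range(len(own)):
        R = R | (R[:, [k]] & R[[k], :])
    # strict[t, s]: p_s . x_s > p_s . x_t, i.e. x_t was strictly cheaper at s's prices
    strict = own[None, :] > cost.T
    return not np.any(R & strict)

# Each consumer chooses the bundle that was unaffordable to the other: consistent
print(satisfies_garp([[1, 2], [2, 1]], [[2, 1], [1, 2]]))
# Each bundle is revealed preferred to, yet strictly cheaper than, the other: violation
print(satisfies_garp([[1, 2], [2, 1]], [[1, 2], [2, 1]]))
```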
  11. By: Kyungsub Lee
    Abstract: This paper presents a method for forecasting limit order book durations using a self-exciting flexible residual point process. High-frequency events in modern exchanges exhibit heavy-tailed interarrival times, posing a significant challenge for accurate prediction. The proposed approach incorporates the empirical distributional features of interarrival times while preserving the self-exciting and decay structure. This work also examines the stochastic stability of the process, which can be interpreted as a general state-space Markov chain. Under suitable conditions, the process is irreducible, aperiodic, positive Harris recurrent, and has a stationary distribution. An empirical study demonstrates that the model achieves strong predictive performance compared with several alternative approaches when forecasting durations in ultra-high-frequency trading data.
    Date: 2026–03
    URL: https://d.repec.org/n?u=RePEc:arx:papers:2604.00346
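For readers unfamiliar with self-exciting processes, the Python sketch below shows the textbook exponential-kernel Hawkes intensity and its log-likelihood, where each event temporarily raises the arrival rate, which then decays. The paper's flexible residual distribution is not reproduced; `mu`, `alpha`, and `beta` are the usual baseline rate, jump size, and decay parameters, and the event times are illustrative.

```python
import math

def exp_hawkes_intensity(t, event_times, mu, alpha, beta):
    """lambda(t) = mu + sum over past events t_i < t of alpha * exp(-beta * (t - t_i))."""
    return mu + sum(alpha * math.exp(-beta * (t - ti)) for ti in event_times if ti < t)

def exp_hawkes_loglik(event_times, horizon, mu, alpha, beta):
    """Log-likelihood on [0, horizon] for sorted event times, using the O(n)
    recursion A_i = exp(-beta * (t_i - t_{i-1})) * (1 + A_{i-1}), with A_1 = 0."""
    loglik, A, prev = 0.0, 0.0, None
    for ti in event_times:
        if prev is not None:
            A = math.exp(-beta * (ti - prev)) * (1.0 + A)
        loglik += math.log(mu + alpha * A)
        prev = ti
    # Compensator: integral of lambda(t) over [0, horizon]
    compensator = mu * horizon + (alpha / beta) * sum(
        1.0 - math.exp(-beta * (horizon - ti)) for ti in event_times)
    return loglik - compensator

times = [0.5, 0.7, 0.75, 2.0, 2.1]
ll = exp_hawkes_loglik(times, horizon=3.0, mu=0.8, alpha=0.5, beta=2.0)
print(ll)
```

The process is stationary when `alpha / beta < 1`; durations between events follow from the intensity, which is what a duration forecast ultimately targets.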
  12. By: Massimiliano Caporin; Daniele Girolimetto; Emanuele Lopetuso
    Abstract: We assess the advantage of combining univariate and multivariate portfolio risk forecasts with the aid of forecast reconciliation techniques. In our analyses, we assume knowledge of portfolio weights, a standard assumption in portfolio risk management applications. With an extensive simulation experiment, we show that, if the true covariance is known, forecast reconciliation improves over a standard multivariate approach, in particular when the adopted multivariate model is misspecified. However, if noisy proxies are used, correctly specified and misspecified models (for instance, models neglecting spillovers) turn out to be indistinguishable in several cases, with forecast reconciliation still providing improvements. The noise in the covariance proxy plays a crucial role in determining the improvement from both forecast reconciliation and correct model specification. An empirical analysis shows how forecast reconciliation can be adopted with real data to improve traditional GARCH-based portfolio variance forecasts.
    Date: 2026–03
    URL: https://d.repec.org/n?u=RePEc:arx:papers:2603.17463
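Because the portfolio variance w'Hw is linear in the entries of a covariance forecast H, a univariate portfolio variance forecast and a multivariate covariance forecast can be reconciled with standard linear tools. The Python sketch below assumes the simplest OLS (identity-weighted) projection; the paper may use other weighting schemes, and nothing here enforces positive definiteness of the reconciled matrix. All numbers are illustrative.

```python
import numpy as np

def vech(H):
    """Stack the lower triangle of H (diagonal and below), row by row."""
    n = H.shape[0]
    return np.array([H[i, j] for i in range(n) for j in range(i + 1)])

def portfolio_aggregation_vector(w):
    """Coefficients a such that w' H w = a . vech(H)."""
    n = len(w)
    return np.array([w[i] * w[j] * (1.0 if i == j else 2.0)
                     for i in range(n) for j in range(i + 1)])

def ols_reconcile(port_var_fc, H_fc, w):
    """Project the stacked base forecasts [portfolio variance, vech(H)] onto the
    coherent subspace where portfolio variance equals w' H w."""
    a = portfolio_aggregation_vector(w)
    S = np.vstack([a, np.eye(len(a))])             # summing matrix: top row aggregates
    y_hat = np.concatenate([[port_var_fc], vech(H_fc)])
    y_tilde = S @ np.linalg.solve(S.T @ S, S.T @ y_hat)
    return y_tilde[0], y_tilde[1:]                 # reconciled variance, reconciled vech(H)

w = np.array([0.6, 0.4])                           # known portfolio weights
H_fc = np.array([[0.04, 0.01], [0.01, 0.09]])      # multivariate (e.g. GARCH) forecast
port_rec, vech_rec = ols_reconcile(0.035, H_fc, w) # 0.035: univariate portfolio forecast
print(port_rec, w @ H_fc @ w)
```

The reconciled portfolio variance lands between the univariate forecast and the value implied by the multivariate model, and the reconciled covariance entries imply exactly that variance.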
  13. By: Ms. Sunwoo Lee
    Abstract: In light of recent global shocks and rising external volatility, there is a growing need to effectively monitor short-term economic fluctuations, especially in countries with limited access to high-frequency growth data. This paper examines the application of the Bayesian Structural Time Series (BSTS) model to nowcasting quarterly economic growth in Tanzania, leveraging a range of high-frequency economic indicators. The BSTS model provides a flexible framework that incorporates trends, seasonal variations, and regression effects, while its spike-and-slab variable selection helps identify relevant indicators. This paper outlines a framework for model selection and evaluation, including robustness checks and sensitivity analysis, and demonstrates the model’s relative performance. Additionally, the model’s capacity to adapt to longer forecast horizons and dynamic regressors enhances its utility for understanding growth trends in changing economic environments.
    Keywords: Nowcasting; Bayesian models; economic activity; GDP; low-income countries
    Date: 2026–03–20
    URL: https://d.repec.org/n?u=RePEc:imf:imfwpa:2026/049
  14. By: Ricardo Crisostomo; Diana Mykhalyuk
    Abstract: This paper investigates whether large language models (LLMs) can generate reliable stock market predictions. We evaluate four state-of-the-art models (ChatGPT, Gemini, DeepSeek, and Perplexity) across three prompting strategies: a naive query, a structured approach, and chain-of-thought reasoning. Our results show that LLM-generated recommendations are hindered by recurring reasoning failures, including financial misconceptions, carryover errors, and reliance on outdated or hallucinated information. When appropriately guided and supervised, LLMs demonstrate the capacity to outperform the market, but realizing LLMs' full potential requires substantial human oversight. We also find that grounding stock recommendations in official regulatory filings increases their forecasting accuracy. Overall, our findings underscore the need for robust safeguards and validation when deploying LLMs in financial markets.
    Date: 2026–03
    URL: https://d.repec.org/n?u=RePEc:arx:papers:2603.19944
  15. By: Valter T. Yoshida Jr.; Rafael Schiozer; Alan de Genaro; Toni R.E. dos Santos
    Abstract: This paper addresses the challenge of forecasting the best-performing credit scoring model in out-of-time settings, focusing on the decision between segmented (bank-specific) and full data (financial system-wide) models. Building upon the Credit Scoring Model Risk (CSMR) metric, defined as one minus the correlation between observed defaults and predicted scores, we highlight the instability of in-sample performance measures when applied to evolving loan portfolios and changing macroeconomic conditions. We propose three complementary approaches to predict out-of-time model performance: (i) an analytical method based on the Copas shrinkage concept, utilizing estimated covariances and prediction variances; (ii) a Monte Carlo simulation leveraging average model predictions to simulate default events; and (iii) a Bayesian estimation framework for covariances grounded in conditional expectations of predictions given default. Empirical analysis using a large Brazilian loan dataset reveals that segmented models outperform full data models in in-sample contexts but not consistently out-of-time. Among the approaches, the Monte Carlo simulation achieved the highest accuracy (70.8%) in forecasting the superior out-of-time model, followed by the Bayesian method (66.7%) and the analytical shrinkage approach (54.2%). The study underscores the importance of considering population shifts via the Population Stability Index (PSI) to detect model decalibration and overfitting. The proposed methodologies offer practitioners and regulators practical tools for informed model selection, enhancing predictive reliability over time amid portfolio and economic dynamics.
    Date: 2026–04
    URL: https://d.repec.org/n?u=RePEc:bcb:wpaper:645
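Both diagnostics named in the abstract are short formulas. The Python sketch below implements CSMR exactly as defined above and the usual PSI formula over score bins; treating scores as predicted default probabilities (higher means riskier) and the `eps` guard for empty bins are assumptions, and the inputs are illustrative.

```python
import numpy as np

def csmr(defaults, scores):
    """Credit Scoring Model Risk: 1 - corr(observed defaults, predicted scores).
    Assumes scores are predicted default probabilities (higher = riskier)."""
    return 1.0 - float(np.corrcoef(defaults, scores)[0, 1])

def psi(expected_share, actual_share, eps=1e-6):
    """Population Stability Index over score bins:
    sum((actual - expected) * ln(actual / expected)); eps guards against empty bins."""
    e = np.clip(np.asarray(expected_share, dtype=float), eps, None)
    a = np.clip(np.asarray(actual_share, dtype=float), eps, None)
    return float(np.sum((a - e) * np.log(a / e)))

risk = csmr([0, 0, 1, 1], [0.1, 0.2, 0.8, 0.9])
shift = psi([0.25, 0.25, 0.25, 0.25], [0.10, 0.20, 0.30, 0.40])
print(risk, shift)
```

A commonly cited industry rule of thumb, not taken from the paper, reads a PSI below 0.1 as stable and above 0.25 as a significant population shift.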
  16. By: Kris Boudt; Arno De Block; Feliciaan De Palmenaer; Elsa Laura Verbeken
    Abstract: Bank transaction data are increasingly used to nowcast real GDP growth due to their high frequency and broad coverage. A key challenge is the choice of an appropriate price deflator to transform nominal transaction values into real terms, as transaction values reflect invoiced amounts that are observed with a delay and based on prices quoted in earlier periods. This timing mismatch complicates the use of contemporaneous inflation measures. We find that using one-quarter lagged inflation, in particular the lagged GDP deflator or an equally weighted estimate based on the first lags of price indices, consistently outperforms both the benchmark model that does not adjust for inflation and models using contemporaneous inflation, across different settings and periods. At its best, the model using the one-quarter lag of the GDP deflator outperforms the benchmark in 68% of cases and achieves a maximum RMSFE reduction of 5.5%. The equally weighted prediction of models using the one-quarter lag of price indices improves on the benchmark in 54% of cases and attains a maximum RMSFE reduction of 3.78%. These findings suggest that relying on the most recent inflation data or nowcasting delayed figures like the GDP deflator may be unnecessary or even counterproductive, as lagged inflation data often offer more stable and informative signals for real-time analysis.
    URL: https://d.repec.org/n?u=RePEc:rug:rugwps:26/1139
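The timing choice can be illustrated with a short Python helper, assuming quarterly growth rates and a simple ratio deflation; the paper's exact transformation and its equally weighted combination across price indices are not reproduced, and the figures are illustrative.

```python
def deflate_growth(nominal_growth, inflation, lag=1):
    """Real growth from nominal growth using inflation lagged by `lag` quarters:
    (1 + g_nom_t) / (1 + pi_{t-lag}) - 1.
    Returns None for the first `lag` quarters, where no lagged inflation exists."""
    out = []
    for t, g in enumerate(nominal_growth):
        if t < lag:
            out.append(None)
        else:
            out.append((1 + g) / (1 + inflation[t - lag]) - 1)
    return out

nominal = [0.020, 0.025, 0.018]      # illustrative quarterly nominal transaction growth
inflation = [0.010, 0.012, 0.008]    # illustrative quarterly deflator inflation
real_lagged = deflate_growth(nominal, inflation, lag=1)
real_contemporaneous = deflate_growth(nominal, inflation, lag=0)
print(real_lagged, real_contemporaneous)
```

With `lag=1`, each quarter's transactions are deflated by the price change of the previous quarter, matching the idea that invoices reflect prices quoted earlier.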
  17. By: A. Fernandez-Perez; A.-M. Fuertes; J. Miffre (Audencia Business School)
    Abstract: This paper presents a comprehensive analysis of traditional versus selective hedging strategies in commodity futures markets. Traditional hedging aims solely to reduce spot price risk, while selective hedging also seeks to enhance returns by predicting movements in commodity futures prices. We construct selective hedges using a range of forecasting techniques, from simple historical averages to advanced machine learning models, and evaluate their performance based on the expected mean-variance utility of hedge portfolio returns. Out-of-sample results for 24 commodities do not favor selective hedging over traditional hedging, as the former increases risk without delivering additional returns. These findings are robust across various hedge reformulations, expanding estimation windows, and rebalancing frequencies.
    Keywords: Commodity futures markets, Expected utility, Selective hedging, Traditional hedging
    Date: 2026–03
    URL: https://d.repec.org/n?u=RePEc:hal:journl:hal-05563835
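The distinction between the two strategies can be written down compactly. Assuming a hedger who is long the spot commodity and short `h` units of futures, with mean-variance utility and risk aversion `gamma`, the Python sketch below computes the traditional minimum-variance hedge ratio and a selective hedge that tilts it using a forecast of the futures return. This is a textbook formulation, not necessarily the paper's exact hedge construction.

```python
import numpy as np

def min_variance_hedge_ratio(spot_returns, futures_returns):
    """Traditional hedge: h* = Cov(r_s, r_f) / Var(r_f), which minimizes the
    variance of the hedged return r_s - h * r_f."""
    cov = np.cov(spot_returns, futures_returns)    # 2x2 sample covariance matrix
    return cov[0, 1] / cov[1, 1]

def selective_hedge_ratio(h_mv, futures_forecast, futures_var, risk_aversion):
    """Maximizer of E[r_s - h r_f] - (gamma / 2) Var(r_s - h r_f) given a forecast
    E[r_f]: h = h_mv - E[r_f] / (gamma * Var(r_f)). A zero forecast gives h_mv."""
    return h_mv - futures_forecast / (risk_aversion * futures_var)

def mean_variance_utility(returns, risk_aversion):
    """Sample analogue of E[r] - (gamma / 2) * Var(r) for hedged portfolio returns."""
    r = np.asarray(returns, dtype=float)
    return r.mean() - 0.5 * risk_aversion * r.var(ddof=1)

futures = np.array([0.01, -0.02, 0.015, 0.005, -0.01])   # illustrative returns
spot = 2.0 * futures
h_mv = min_variance_hedge_ratio(spot, futures)
h_sel = selective_hedge_ratio(h_mv, futures_forecast=0.001,
                              futures_var=futures.var(ddof=1), risk_aversion=3.0)
print(h_mv, h_sel)
```

A positive forecast of futures returns shrinks the short futures position; comparing `mean_variance_utility` of the two hedged return series out of sample is the kind of evaluation the abstract describes.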
  18. By: Minchul Shin
    Abstract: AI coding agents make empirical specification search fast and cheap, but they also widen hidden researcher degrees of freedom. Building on an open-source agent-loop architecture, this paper adapts that framework to an empirical economics workflow and adds a post-search holdout evaluation. In a forecast-combination illustration, multiple independent agent runs outperform standard benchmarks in the original rolling evaluation, but not all continue to do so on a post-search holdout. Logged search and holdout evaluation together make adaptive specification search more transparent and help distinguish robust improvements from sample-specific discoveries.
    Date: 2026–03
    URL: https://d.repec.org/n?u=RePEc:arx:papers:2603.17381
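The evaluation logic (search on one sample, then judge the chosen specification on a holdout the search never saw) can be sketched for the simplest forecast-combination problem, a single weight between two forecasts. The grid search is a hypothetical stand-in for the agent's specification search, and the data are simulated for illustration.

```python
import numpy as np

def combo_mse(w, f1, f2, y):
    """MSE of the combination w * f1 + (1 - w) * f2."""
    return float(np.mean((w * f1 + (1 - w) * f2 - y) ** 2))

def search_then_holdout(f1, f2, y, split, grid=np.linspace(0, 1, 101)):
    """Choose the weight minimizing MSE on observations before `split`
    (the search sample), then report MSE on the untouched holdout."""
    search, hold = slice(0, split), slice(split, len(y))
    search_mse = [combo_mse(w, f1[search], f2[search], y[search]) for w in grid]
    w_best = float(grid[int(np.argmin(search_mse))])
    return {
        "weight": w_best,
        "search_mse": min(search_mse),
        "holdout_mse": combo_mse(w_best, f1[hold], f2[hold], y[hold]),
        "holdout_equal_weight_mse": combo_mse(0.5, f1[hold], f2[hold], y[hold]),
    }

rng = np.random.default_rng(2)
y = rng.normal(size=200)
f1 = y + rng.normal(scale=0.8, size=200)   # two noisy, illustrative forecasts
f2 = y + rng.normal(scale=1.0, size=200)
print(search_then_holdout(f1, f2, y, split=150))
```

The search-sample MSE is optimistic by construction; only the holdout comparison against the equal-weight benchmark indicates whether the chosen weight is a robust improvement.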
  19. By: Illia Donhauzer; Pierluigi Cesana; Tomoyuki Shirai; Yuichi Ikeda
    Abstract: The aim of this research is to study XRP cryptoasset price dynamics, with a particular focus on forecasting atypical price movements. Recent studies suggest that topological properties of transaction graphs are highly informative for understanding cryptocurrency price behavior. In this work, we show that specific topological properties of the XRP transaction graphs provide important information about extreme XRP price surges, and can be used for more competitive prediction of anomalous price dynamics.
    Date: 2026–03
    URL: https://d.repec.org/n?u=RePEc:arx:papers:2603.18021
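The abstract does not specify which topological features are used. As a purely illustrative assumption, the simplest topological summaries of a transaction graph are its Betti numbers: connected components (beta0) and independent cycles (beta1). The standard-library Python sketch below computes both for one time window; richer features such as persistent homology would need a dedicated library.

```python
def graph_betti_numbers(edges):
    """Transaction graph given as (sender, receiver) pairs, treated as an undirected
    simple graph: beta0 = connected components, beta1 = E - V + beta0 (cycle rank)."""
    parent = {}

    def find(v):
        parent.setdefault(v, v)
        while parent[v] != v:
            parent[v] = parent[parent[v]]    # path halving
            v = parent[v]
        return v

    for s, r in edges:                       # register every account, even self-loops
        find(s)
        find(r)
    undirected = {frozenset(e) for e in edges if e[0] != e[1]}   # drop repeats, self-loops
    for e in undirected:
        a, b = tuple(e)
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb                  # merge the two components
    n_vertices = len(parent)
    beta0 = len({find(v) for v in list(parent)})
    beta1 = len(undirected) - n_vertices + beta0
    return beta0, beta1

# Toy window: a payment triangle, a separate pair, a repeat, and a self-transfer
edges = [("a", "b"), ("b", "c"), ("c", "a"), ("d", "e"), ("a", "b"), ("b", "a"), ("f", "f")]
print(graph_betti_numbers(edges))
```

Computing these numbers window by window would yield a feature series that could be fed to a classifier of anomalous price moves.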

This nep-for issue is ©2026 by Malte Knüppel. It is provided as is without any express or implied warranty. It may be freely redistributed in whole or in part for any purpose. If distributed in part, please include this notice.
General information on the NEP project can be found at https://nep.repec.org. For comments please write to the director of NEP, Marco Novarese at <director@nep.repec.org>. Put “NEP” in the subject, otherwise your mail may be rejected.
NEP’s infrastructure is sponsored by the Griffith Business School of Griffith University in Australia.