nep-for New Economics Papers
on Forecasting
Issue of 2026–04–20
fourteen papers chosen by
Malte Knüppel, Deutsche Bundesbank


  1. Forecasting Oil Prices Across the Distribution: A Quantile VAR Approach By Hilde C. Bjornland; Nicolas Hardy; Dimitris Korobilis
  2. Regime-Aware Specialist Routing for Volatility Forecasting By Tenghan Zhong
  3. When 3% Means Nothing: Calibrating Escalation Limits to a Bank’s Own Forecasting Error Distribution By Marcin Dec
  4. PolyBench: Benchmarking LLM Forecasting and Trading Capabilities on Live Prediction Market Data By Pu Cheng; Juncheng Liu; Yunshen Long
  5. Machine Learning Forecasting of U.S. Stock Market Volatility: The Role of Stock and Oil Bubbles By Onur Polat; Rangan Gupta; Dhanashree Somani; Sayar Karmakar
  6. Target-Driven Bayesian Stacking of Realized and Implied Volatility Forecasts By Guo, Hongfei; Marín Díazaraque, Juan Miguel; Veiga, Helena
  7. Who Saw It Coming? Historical Experience and the 2021 Inflation Forecast Failure By Dalibor Stevanovic
  8. Unexpecting the Expected in Real-Time Inflation Forecasting: The Inflation Expectations Channel? By Nicolás Bonino-Gayoso; Mónica Correa-López
  9. Dynamic Forecasting and Temporal Feature Evolution of Stock Repurchases in Listed Companies Using Attention-Based Deep Temporal Networks By Xiang Ao; Jingxuan Zhang; Xinyu Zhao
  10. When Forecast Accuracy Fails: Rank Correlation and Decision Quality in Multi-Market Battery Storage Optimization By Alessandro Falezza
  11. Forecasting the Economic Effects of AI By Ezra Karger; Otto Kuusela; Jason Abaluck; Kevin A. Bryan; Basil Halperin; Todd R. Jones; Connacher Murphy; Philip Trammell; Matt Reynolds; Dan Mayland; Ria Viswanathan; Ananaya Mittal; Rebecca Ceppas de Castro; Josh Rosenberg; Philip Tetlock
  12. Which Voices Move Markets? Speaker Identity and the Cross-Section of Post-Earnings Returns By Karmanpartap Singh Sidhu; Junyi Fan; Maryam Pishgar
  13. Nowcasting and Forecasting Russian Regional CPI: Sparse Models and the Time-Varying Value of Online Data By Fantazzini, Dean; Kurbatskii, Alexey
  14. Exploiting Heterogeneity in the Survey of Professional Forecasters By Tae-Hwy Lee; Saerom Lee

  1. By: Hilde C. Bjornland; Nicolas Hardy; Dimitris Korobilis
    Abstract: We develop a Quantile Bayesian Vector Autoregression (QBVAR) to forecast real oil prices across different quantiles of the conditional distribution. The model allows predictor effects to vary across quantiles, capturing asymmetries that standard mean-focused approaches miss. Using monthly data from 1975 to 2025, we document three findings. First, the QBVAR improves median forecasts by 2-5% relative to Bayesian VARs, demonstrating that quantile-specific dynamics matter even for point prediction. Second, uncertainty and financial condition variables strongly predict downside risk, with left-tail forecast improvements of 10-25% that intensify during crisis episodes. Third, right-tail forecasting remains difficult; stochastic volatility models dominate for upside risk, though forecast combinations that include the QBVAR recover these losses. The results show that modeling the conditional distribution yields substantial gains for tail risk assessment, particularly during major oil market disruptions.
    Date: 2026–04
    URL: https://d.repec.org/n?u=RePEc:arx:papers:2604.12927
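A note on how quantile forecasts like those in entry 1 are often scored: the abstract does not name its evaluation metric, so the pinball (quantile) loss below is an assumption, and the function name `pinball_loss` is hypothetical. The sketch only illustrates why a left-tail forecast is judged asymmetrically.

```python
def pinball_loss(y_true, q_pred, tau):
    """Average pinball (quantile) loss at quantile level tau in (0, 1).

    Assumed metric for illustration; not taken from the paper.
    """
    if not 0.0 < tau < 1.0:
        raise ValueError("tau must lie strictly between 0 and 1")
    total = 0.0
    for y, q in zip(y_true, q_pred):
        diff = y - q
        # Outcomes above the forecast cost tau per unit,
        # outcomes below it cost (1 - tau) per unit.
        total += tau * diff if diff >= 0 else (tau - 1.0) * diff
    return total / len(y_true)


# For a 10% quantile forecast, overshooting the outcome by one unit
# costs nine times as much as undershooting it by one unit.
loss_outcome_above = pinball_loss([1.0], [0.0], tau=0.1)
loss_outcome_below = pinball_loss([0.0], [1.0], tau=0.1)
```

Percentage improvements such as those quoted in the abstract would then be relative reductions in an average loss of this kind against a benchmark model, if this is indeed the metric used.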
  2. By: Tenghan Zhong
    Abstract: Volatility forecasting becomes challenging when market conditions change and model performance varies across regimes. Motivated by this instability, we develop a regime-aware specialist routing framework for ETF volatility forecasting. The framework uses online risk-sensitive evaluation and state-dependent gating to combine different forecasting specialists across calm and stressed market states. Using a daily panel of six ETFs under a rolling walk-forward design, we find that the strongest forecaster is regime-dependent rather than global. Relative to the rolling-best baseline, the proposed routing framework reduces high-volatility forecast loss by about 24% and underprediction loss by about 22%. These results suggest that specialist routing provides a practical adaptive forecasting architecture for changing market conditions.
    Date: 2026–04
    URL: https://d.repec.org/n?u=RePEc:arx:papers:2604.10402
  3. By: Marcin Dec (Group for Research in Applied Economics (GRAPE))
    Abstract: Forecasting accuracy for Net Interest Income (NII) and Interest Rate Risk in the Banking Book (IRRBB) is central to banks’ earnings stability, balance-sheet management, and supervisory credibility. Yet many institutions continue to apply fixed deviation thresholds (for example, 3/4/5%) to govern forecast performance, even though forecast uncertainty widens with the horizon and may exhibit heavy-tailed behavior. Such limits therefore lack a consistent probabilistic interpretation and often misalign with the statistical properties of the underlying forecasting process. This paper develops an integrated, probability-coherent framework for monitoring NII forecast errors and assessing IRRBB limit breaches. First, drawing on the Federal Reserve’s use of Root Mean Squared Error (RMSE) and fan charts to communicate forecast uncertainty, we construct horizon-specific, quantile-anchored thresholds that preserve consistent meaning across forecast horizons. The framework incorporates interval-forecast evaluation (unconditional and conditional coverage tests), quantile elicitability, bias-dispersion decomposition, and extreme-value modeling of rare outcomes. Second, we extend the methodology to IRRBB by quantifying the probability that limits on changes in NII (ΔNII) are breached solely due to forecast or model uncertainty.
    Keywords: Net Interest Income forecasting, Interest rate risk in the banking book (IRRBB), Quantile-based escalation thresholds, Forecast uncertainty
    JEL: G21 G32 C58 G28
    Date: 2026
    URL: https://d.repec.org/n?u=RePEc:fme:wpaper:114
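To make the idea of horizon-specific, quantile-anchored thresholds in entry 3 concrete, here is a minimal sketch. It is not the paper's procedure: the nearest-rank empirical quantile, the function name `quantile_thresholds`, and the toy error values are all assumptions chosen for illustration.

```python
import math


def quantile_thresholds(errors_by_horizon, level=0.95):
    """Map each forecast horizon to an empirical quantile of |error|.

    Uses the nearest-rank rule (an assumption): the threshold is the
    smallest observed absolute error that at least `level` of the
    sample does not exceed.
    """
    thresholds = {}
    for horizon, errors in errors_by_horizon.items():
        abs_sorted = sorted(abs(e) for e in errors)
        rank = max(math.ceil(level * len(abs_sorted)) - 1, 0)
        thresholds[horizon] = abs_sorted[rank]
    return thresholds


# Hypothetical past forecast errors (in percent) at 1- and 4-quarter horizons.
history = {
    1: [1.0, -2.0, 3.0, -4.0],
    4: [2.0, -4.0, 6.0, -8.0],
}
limits = quantile_thresholds(history, level=0.5)
```

With thresholds built this way, a breach at any horizon has the same nominal probability, whereas a single fixed limit is exceeded more often at long horizons, where the errors are wider.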
  4. By: Pu Cheng; Juncheng Liu; Yunshen Long
    Abstract: Predicting real-world events from live market signals demands systems that fuse qualitative news with quantitative order-book dynamics under strict temporal discipline -- a challenge existing benchmarks fail to capture. We present PolyBench, a multimodal benchmark derived from Polymarket that records point-in-time cross-sections of 38,666 binary prediction markets spanning 4,997 events, synchronously coupling each snapshot with a Central Limit Order Book (CLOB) state and a real-time news stream. Using PolyBench, we evaluate seven state-of-the-art Large Language Models -- spanning open- and closed-source families -- generating 36,165 predictions under identical, timestamp-locked market states collected between February 6 and 12, 2026. Our multidimensional framework assesses directional accuracy, our proposed Confidence-Weighted Return (CWR), Annualized Percentage Yield (APY), and Sharpe ratio via realistic order-book execution simulation. The results reveal a pronounced performance divergence: only two of seven models achieve positive financial returns -- MiMo-V2-Flash at 17.6% CWR and Gemini-3-Flash at 6.2% CWR -- while the remaining five incur losses despite uniformly high stated confidence. These findings highlight the gap between surface-level language fluency and genuine probabilistic reasoning under live market uncertainty, and establish PolyBench as a contamination-proof, financially-grounded evaluation standard for future LLM research. Our dataset and code are available at https://github.com/Poly Bench/PolyBench.
    Date: 2026–04
    URL: https://d.repec.org/n?u=RePEc:arx:papers:2604.14199
  5. By: Onur Polat (Institute of Informatics, Hacettepe University, Beytepe Campus, 06800 Cankaya, Ankara, Turkiye); Rangan Gupta (Department of Economics, University of Pretoria, Private Bag X20, Hatfield 0028, South Africa); Dhanashree Somani (Department of Statistics, University of Florida, 230 Newell Drive, Gainesville, FL, 32601, USA); Sayar Karmakar (Department of Statistics, University of Florida, 230 Newell Drive, Gainesville, FL, 32601, USA)
    Abstract: This study examines the predictive power of multi-scale positive and negative speculative bubbles in equity and energy markets for S&P 500 realized variance across horizons from 1 to 24 months. Using a hierarchical modeling framework and machine learning estimators, the analysis evaluates whether stock and oil bubbles provide incremental information beyond macroeconomic variables and financial uncertainty. Applying Clark and West's (2007) tests for nested model comparisons, the results reveal a hierarchy in predictive content that varies by forecast horizon. At the 1-month horizon, neither stock nor oil bubbles improve forecast accuracy. At the 3-month horizon, oil bubbles emerge as the dominant predictor; the Bayesian Regularized Neural Network (BRNN) estimator achieves a statistically significant improvement when oil bubbles are included with stock bubbles, resulting in a 30.7 percent reduction in mean squared error (MSE). At the 6-month horizon, stock bubbles become more important, with both the Gradient Boosting Machine (GBM) and BRNN estimators showing significant improvements. For longer horizons, oil bubbles remain relevant, but their predictive value depends on the estimator: BRNN captures oil bubble effects at 12 months, while GBM does so at 24 months. These findings highlight the importance of horizon-specific model selection and indicate a complex transmission of speculative shocks across asset classes.
    Keywords: Stock Market Realized Variance, Stock and Oil Bubbles, Machine Learning, Forecasting
    JEL: C22 C53 G10 Q51
    Date: 2026–04
    URL: https://d.repec.org/n?u=RePEc:pre:wpaper:202611
  6. By: Guo, Hongfei; Marín Díazaraque, Juan Miguel; Veiga, Helena
    Abstract: We propose target-driven Bayesian stacking for a fixed six-model ensemble of GARCH and stochastic-volatility forecasts with realised- and VIX-based extensions. Two rolling stacking rules target either log predictive density or QLIKE. In S&P 500, the objective changes the preferred information channel: LPD stacking remains centred on GARCH-RV, whereas QLIKE stacking shifts toward GARCH-VIX. Across 56 rolling windows, the QLIKE stack improves certainty-equivalent returns by roughly one to one-and-a-half percentage points per year, depending on the investor's risk aversion. In the 30 windows where the QLIKE stack assigns material weight to implied volatility models, the gain exceeds two percentage points per year with a 90% win rate. However, LPD stacking delivers tighter 5% Value-at-Risk calibration.
    Keywords: Bayesian stacking; QLIKE; Implied volatility; Realised variance; Value-at-risk; Volatility forecasting
    JEL: C11 G17 C53
    Date: 2026–04–15
    URL: https://d.repec.org/n?u=RePEc:cte:wsrepe:49851
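For readers unfamiliar with the QLIKE target named in entry 6, the sketch below uses one common normalisation of the loss, ratio minus log(ratio) minus 1, where ratio is realised variance divided by forecast variance. The abstract does not say which variant the authors use, so this form and the function name `qlike` are assumptions.

```python
import math


def qlike(realized_var, forecast_var):
    """Average QLIKE loss: rv/h - log(rv/h) - 1 (assumed normalisation).

    The loss is zero when the forecast equals the realised variance
    and positive otherwise.
    """
    total = 0.0
    for rv, h in zip(realized_var, forecast_var):
        if rv <= 0 or h <= 0:
            raise ValueError("variances must be strictly positive")
        ratio = rv / h
        total += ratio - math.log(ratio) - 1.0
    return total / len(realized_var)


# Halving the true variance is penalised more than doubling it.
under = qlike([1.0], [0.5])  # forecast too low
over = qlike([1.0], [2.0])   # forecast too high
```

This asymmetry, with underprediction punished harder than overprediction of the same proportional size, is one reason a QLIKE-targeted stack can settle on different weights than a stack targeting log predictive density.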
  7. By: Dalibor Stevanovic
    Abstract: This paper studies the 2021 U.S. inflation forecasting failure. I show that the failure was primarily driven by sample composition rather than functional-form misspecification: estimation samples dominated by the Great Moderation underweight supply-shock regimes, and expectations anchored to that regime were slow to recognize the shift. Three historically informed adjustments, an intercept correction, a similarity re-estimation on 1970s data, and a kernel-weighted estimator, substantially close the forecast gap, and the gains extend to eight additional U.S. price indices. Household survey respondents over 60, whose lifetime includes the 1970s, reported higher inflation expectations from early 2021, consistent with experience-based learning; younger cohorts remained anchored to the prevailing regime. A controlled experiment with large language models conditioned on "experienced" and "young" professional personas confirms that experiential priors generate significant forecast differences under a common training leakage assumption. Across all three exercises, the source of the prior mattered more than the sophistication of the model.
    Date: 2026–04
    URL: https://d.repec.org/n?u=RePEc:arx:papers:2604.14467
  8. By: Nicolás Bonino-Gayoso (Universidad Complutense de Madrid); Mónica Correa-López (Banco de España)
    Abstract: This paper empirically explores the pass-through channel of inflation expectations to inflation by looking at a real-time macroeconomic forecasting exercise conducted by an exogenous observer. Models that are informed either by households’ updated beliefs about future inflation or, especially, by services firms’ expected changes in their own prices can systematically predict core inflation more accurately – and do so in a stable way – than a class of commonly used models that do not use this information. Qualitative updates in household and firm price surveys emerge as relevant signals of consumer and firm behavior, since they influence aggregate inflation dynamics. These results point to an economically meaningful pass-through channel of short-term inflation expectations to inflation.
    Keywords: inflation, inflation expectations, Phillips curve, real-time forecasting
    JEL: E31 E37 E52
    Date: 2026–03
    URL: https://d.repec.org/n?u=RePEc:bde:wpaper:2613
  9. By: Xiang Ao; Jingxuan Zhang; Xinyu Zhao
    Abstract: Accurately predicting stock repurchases is crucial for quantitative investment and risk management, yet traditional static models fail to capture the complex temporal dependencies of corporate financial conditions. This paper proposes a dynamic early warning system integrating economic theory with deep temporal networks. Using Chinese A-share panel data (2014-2024), we employ a hybrid Temporal Convolutional Network (TCN) and Attention-based LSTM to capture long- and short-term financial evolutionary patterns. Rolling-window cross-validation demonstrates our model significantly outperforms static baselines like Logistic Regression and XGBoost. Furthermore, utilizing Explainable AI (XAI), we reveal the temporal dynamics of repurchase decisions: prolonged "undervaluation" serves as the long-term underlying motive, while a sharp increase in "cash flow" acts as the decisive short-term trigger. This study provides a robust deep learning paradigm for financial forecasting and offers dynamic empirical support for classic corporate finance hypotheses.
    Date: 2026–03
    URL: https://d.repec.org/n?u=RePEc:arx:papers:2604.09650
  10. By: Alessandro Falezza
    Abstract: Battery energy storage systems (BESS) participating in multi-market electricity trading require price forecasts to optimize dispatch decisions. A widely held assumption is that forecast accuracy, measured by standard metrics such as mean absolute error (MAE), drives trading performance. We challenge this assumption using a hierarchical three-layer optimization system trading simultaneously on frequency containment reserve (FCR), automatic frequency restoration reserve (aFRR), day-ahead, and continuous intraday (XBID) markets in Germany and Switzerland over 2020-2025, with real market data from Regelleistung.net and Swissgrid. We find that rank correlation (Kendall tau), rather than MAE, is the primary predictor of intraday dispatch value: forecasts above an empirical threshold of tau approximately 0.85-0.95 capture up to 97-100% of perfect-foresight revenue, while persistence forecasts with near-zero tau capture only 33%. This threshold is stable across market regimes and volatility levels, and reflects the ordinal structure of the dispatch problem. Furthermore, under reserve market constraints, FCR capacity revenue exceeds XBID by 6.5x per MW, making capacity allocation -- not forecast accuracy -- the primary driver of total revenue. In the Swiss market, hydrological surplus anomalies are significantly associated with balancing market revenue (p = 0.0005), a mechanism absent from existing German-focused literature. These findings reframe forecast evaluation for BESS operators: the relevant question is not what the MAE is, but whether the forecast achieves tau-sufficiency.
    Date: 2026–04
    URL: https://d.repec.org/n?u=RePEc:arx:papers:2604.12082
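The contrast between MAE and rank correlation in entry 10 can be seen in a toy example. The sketch below is not the paper's optimization system: the prices are invented, ties are ignored (Kendall tau-a), and the function names are hypothetical.

```python
def mean_absolute_error(actual, forecast):
    """Average absolute gap between forecast and actual values."""
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)


def kendall_tau_a(actual, forecast):
    """Kendall tau-a: (concordant - discordant) / number of pairs."""
    n = len(actual)
    score = 0
    for i in range(n):
        for j in range(i + 1, n):
            product = (actual[i] - actual[j]) * (forecast[i] - forecast[j])
            if product > 0:
                score += 1   # pair ordered the same way in both series
            elif product < 0:
                score -= 1   # pair ordered oppositely
    return score / (n * (n - 1) / 2)


# Invented hourly prices and two invented forecasts.
prices = [30.0, 31.0, 32.0, 33.0]
biased_but_ordered = [40.0, 41.0, 42.0, 43.0]   # wrong level, right ranking
close_but_reversed = [33.0, 32.0, 31.0, 30.0]   # right level, wrong ranking

mae_ordered = mean_absolute_error(prices, biased_but_ordered)
tau_ordered = kendall_tau_a(prices, biased_but_ordered)
mae_reversed = mean_absolute_error(prices, close_but_reversed)
tau_reversed = kendall_tau_a(prices, close_but_reversed)
```

The first forecast has the larger MAE but ranks every hour correctly, so a dispatcher that charges in the cheapest hours and discharges in the dearest would pick the right ones; the second has the smaller MAE but would pick exactly the wrong ones.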
  11. By: Ezra Karger; Otto Kuusela; Jason Abaluck; Kevin A. Bryan; Basil Halperin; Todd R. Jones; Connacher Murphy; Philip Trammell; Matt Reynolds; Dan Mayland; Ria Viswanathan; Ananaya Mittal; Rebecca Ceppas de Castro; Josh Rosenberg; Philip Tetlock
    Abstract: We elicit forecasts of how AI will affect the U.S. economy, comparing the beliefs of five groups: academic economists, employees at AI companies, policy researchers focused on AI, highly accurate forecasters, and the general public. The median respondent in each group expects substantial advances in AI capabilities by 2030, small declines in labor force participation consistent with demographic shifts, and an annual GDP growth rate of 2.5%, which exceeds both the typical medium-run (2.0%) and long-run (1.7%) baseline forecasts from government agencies and private-sector forecasters. Conditional on a “rapid” AI progress scenario, in which AI systems surpass human performance on many cognitive and physical tasks, experts forecast substantial, though not historically unprecedented, economic shifts: annualized GDP growth rising to around 4% and the labor force participation rate falling from its current level of 62% to 55% by 2050, with roughly half of that decline—equivalent to around 10 million lost jobs—attributable to AI. A variance decomposition suggests that expert disagreement about these effects is driven primarily by different beliefs about the economic effects of highly capable AI systems rather than by disagreement about the pace of AI progress. These forecasts map onto notably different policy preferences across groups: experts strongly favor targeted measures such as worker retraining, whereas the general public supports both targeted programs and broader interventions, including a job guarantee and universal basic income.
    JEL: C83 E27 J21 O33 O47
    Date: 2026–04
    URL: https://d.repec.org/n?u=RePEc:nbr:nberwo:35046
  12. By: Karmanpartap Singh Sidhu; Junyi Fan; Maryam Pishgar
    Abstract: We utilize FinBERT, a domain-specific transformer model, to parse 6.5 million sentences from 16,428 S&P 500 quarterly earnings call transcripts (2015-2025) and demonstrate that post-earnings stock returns are not equally affected by all speakers in a conference call. Our section-weighted sentiment, with empirically derived speaker weights (Analyst 49%, CFO 30%, Executive 16%, Other 5%), achieves an out-of-sample Spearman IC of 0.142 versus 0.115 in-sample, generates monthly long-short alpha of 2.03% unexplained by the Fama-French five-factor model (t = 6.49), and remains significant after controlling for standardized unexpected earnings (SUE). FinBERT section-weighted sentiment entirely subsumes the Loughran-McDonald dictionary approach (FinBERT t = 5.90; LM t = 0.86 in the combined specification). Signal decay analysis and cumulative abnormal return charts confirm gradual price adjustment consistent with sluggish assimilation of soft information. All results undergo rigorous out-of-sample validation with an explicit temporal split, yielding improved rather than deteriorated predictive power.
    Date: 2026–04
    URL: https://d.repec.org/n?u=RePEc:arx:papers:2604.13260
  13. By: Fantazzini, Dean; Kurbatskii, Alexey
    Abstract: This paper investigates the utility of Google Trends data for nowcasting and forecasting regional Consumer Price Indices (CPIs) in Russia. For nowcasting, we compare random walk, ARIMA, and Autoregressive Distributed Lag (ARDL) models, with and without search data. For forecasting, we evaluate ten approaches, including Vector Autoregression (VAR) with Hierarchical Lasso (HLag), dynamic factor models, and shrinkage methods. Results show that for nowcasting, multivariate ARDL models with macroeconomic data consistently outperform simpler ones, while Google Trends adds positive but limited value. In forecasting, search data offers negligible average improvement due to a structural break in early 2022: its predictive power was significant before the geopolitical shift but degraded sharply afterward. Instead, the VAR model with HLag sparsity and comprehensive macroeconomic data consistently proves superior. A robustness check with random forests confirms the advantage of the sparse structured approach. The study highlights the nuanced role of online data and the importance of sparse models for robust forecasting in Russian regions.
    Keywords: Nowcasting and Forecasting; Google Trends; Russian Regions; ARDL; VAR; Hierarchical Lasso; Random Forests; Regional CPI; Nonparametric Shrinkage
    JEL: C14 C32 C53 C55 E31 E37 R11
    Date: 2026
    URL: https://d.repec.org/n?u=RePEc:pra:mprapa:128456
  14. By: Tae-Hwy Lee (Department of Economics, University of California Riverside); Saerom Lee (University of California, Riverside)
    Abstract: The mean response in the Survey of Professional Forecasters (SPF) is widely used to summarize individual forecasts. In this paper, we propose a novel summary forecast that enhances the predictive power of the mean response by selectively incorporating idiosyncratic signals. Our framework is motivated by the observation that while individual forecasts are highly correlated—suggesting a factor structure—they also exhibit significant heterogeneity. We treat the mean response as the primary common factor and define heterogeneity as the idiosyncratic component of each individual forecast after accounting for this commonality. Employing a factor-adjusted regularized framework, we integrate informative idiosyncratic components to improve the mean response. Using SPF data from the Federal Reserve Bank of Philadelphia and the European Central Bank, we show that incorporating these idiosyncratic components leads to significant predictive gains over the mean response.
    Keywords: mean response; heterogeneity; common components; idiosyncratic components
    JEL: C22 C32
    Date: 2026–04
    URL: https://d.repec.org/n?u=RePEc:ucr:wpaper:202602

This nep-for issue is ©2026 by Malte Knüppel. It is provided as is without any express or implied warranty. It may be freely redistributed in whole or in part for any purpose. If distributed in part, please include this notice.
General information on the NEP project can be found at https://nep.repec.org. For comments please write to the director of NEP, Marco Novarese at <director@nep.repec.org>. Put “NEP” in the subject, otherwise your mail may be rejected.
NEP’s infrastructure is sponsored by the Griffith Business School of Griffith University in Australia.