nep-for New Economics Papers
on Forecasting
Issue of 2026–02–16
seventeen papers chosen by
Malte Knüppel, Deutsche Bundesbank


  1. Regularized Ensemble Forecasting for Learning Weights from Historical and Current Forecasts By Han Su; Xiaojia Guo; Xiaoke Zhang
  2. Forecasting Realized Volatility of State-Level Stock Markets of the United States: The Role of Sentiment By Giovanni Bonaccolto; Massimiliano Caporin; Oguzhan Cepni; Rangan Gupta
  3. Test-Time Adaptation for Non-stationary Time Series: From Synthetic Regime Shifts to Financial Markets By Yurui Wu; Qingying Deng; Wonou Chung; Mairui Li
  4. Forecasting Oil Consumption: The Statistical Review of World Energy Meets Machine Learning By Jan Ditzen; Erkal Ersoy; Haoyang Li; Francesco Ravazzolo
  5. ChatMacro: Evaluating Inflation Forecasts of Generative AI By M.Jahangir Alam; Shane Boyle; Huiyu Li; Tatevik Sekhposyan
  6. Predictive Synthesis under Sporadic Participation: Evidence from Inflation Density Surveys By Matthew C. Johnson; Matteo Luciani; Minzhengxiong Zhang; Kenichiro McAlinn
  7. Exploring the Interpretability of Forecasting Models for Energy Balancing Market By Oskar Våle; Shiliang Zhang; Sabita Maharjan; Gro Klæboe
  8. Nowcasting Economic Growth with Machine Learning and Satellite Data By Eurydice Fotopoulou; Iyke Maduako; M. Belen Sbrancia; Prachi Srivastava
  9. Adaptive Benign Overfitting (ABO): Overparameterized RLS for Online Learning in Non-stationary Time-series By Luis Ontaneda Mijares; Nick Firoozye
  10. Bitcoin Price Prediction using Machine Learning and Combinatorial Fusion Analysis By Yuanhong Wu; Wei Ye; Jingyan Xu; D. Frank Hsu
  11. The Strategic Foresight of LLMs: Evidence from a Fully Prospective Venture Tournament By Felipe A. Csaszar; Aticus Peterson; Daniel Wilde
  12. Generative AI for Stock Selection By Keywan Christian Rasekhschaffe
  13. Extracting Risk Free Interest Rate Expectations in a Less Liquid Government Bond Markets By Marcin Dec
  14. VS-LTGARCHX: A Flexible Variable Selection in Log-TGARCHX Models By Samir Orujov; Victor Elvira; Audrey Poterie; Farid Rajabov; Francois Septier
  15. Understanding and Forecasting Inflation in Timor-Leste By Kohei Asao; Raju Huidrom
  16. Semantic Similarity Measures in Newspaper Text for Detecting and Predicting Disruptive Institutional Events By Mayoral, L.; Mueller, H.; Philipp, M.; Rauh, C.; Vassallo, R.
  17. Monitoring Macroeconomic Prospects with a Meta VAR-E Dashboard By Kevin Lee; Kalvinder Shields

  1. By: Han Su; Xiaojia Guo; Xiaoke Zhang
    Abstract: Combining forecasts from multiple experts often yields more accurate results than relying on a single expert. In this paper, we introduce a novel regularized ensemble method that extends the traditional linear opinion pool by leveraging both current forecasts and historical performances to set the weights. Unlike existing approaches that rely only on either the current forecasts or past accuracy, our method accounts for both sources simultaneously. It learns weights by minimizing the variance of the combined forecast (or its transformed version) while incorporating a regularization term informed by historical performances. We also show that this approach has a Bayesian interpretation. Different distributional assumptions within this Bayesian framework yield different functional forms for the variance component and the regularization term, adapting the method to various scenarios. In empirical studies on Walmart sales and macroeconomic forecasting, our ensemble outperforms leading benchmark models both when experts' full forecasting histories are available and when experts enter and exit over time, resulting in incomplete historical records. Throughout, we provide illustrative examples that show how the optimal weights are determined and, based on the empirical results, we discuss where the framework's strengths lie and when experts' past versus current forecasts are more informative.
    Date: 2026–02
    URL: https://d.repec.org/n?u=RePEc:arx:papers:2602.11379
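    The idea of trading off the variance of a combined forecast against a penalty informed by past performance can be illustrated with a generic quadratic program. The sketch below is not the authors' objective: it assumes a hypothetical covariance matrix `sigma`, hypothetical prior weights `prior_w` derived from historical accuracy, and a ridge-type penalty with strength `lam`. It minimizes w' sigma w + lam * ||w - prior_w||^2 subject to the weights summing to one, which has a closed-form solution via a Lagrange multiplier.

```python
import numpy as np

def regularized_pool_weights(sigma, prior_w, lam):
    """Minimize w' sigma w + lam * ||w - prior_w||^2 subject to sum(w) = 1.

    All three inputs are illustrative assumptions, not quantities defined
    in the abstract. The first-order condition is
        (sigma + lam * I) w = lam * prior_w + mu * 1,
    and mu is chosen so that the weights sum to one.
    """
    sigma = np.asarray(sigma, dtype=float)
    prior_w = np.asarray(prior_w, dtype=float)
    k = len(prior_w)
    a = sigma + lam * np.eye(k)
    ones = np.ones(k)
    a_inv_prior = np.linalg.solve(a, lam * prior_w)
    a_inv_ones = np.linalg.solve(a, ones)
    mu = (1.0 - ones @ a_inv_prior) / (ones @ a_inv_ones)
    return a_inv_prior + mu * a_inv_ones

if __name__ == "__main__":
    sigma = np.array([[1.0, 0.2], [0.2, 4.0]])   # made-up covariance
    prior = np.array([0.5, 0.5])                 # made-up historical weights
    for lam in (0.0, 1.0, 100.0):
        print(lam, regularized_pool_weights(sigma, prior, lam))
```

    With `lam = 0` the sketch returns the classical minimum-variance weights, and as `lam` grows the weights approach `prior_w`. No non-negativity constraint is imposed, so weights can be negative in this simplified version.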
  2. By: Giovanni Bonaccolto (Department of Economics and Law, "Kore" University of Enna, Piazza dell'Universita, 94100 Enna, Italy); Massimiliano Caporin (Department of Statistical Sciences, University of Padova, Via Cesare Battisti 241/243, Padova, Italy); Oguzhan Cepni (Ostim Technical University, Ankara, Turkiye; University of Edinburgh Business School, Centre for Business, Climate Change, and Sustainability; Department of Economics, Copenhagen Business School, Denmark); Rangan Gupta (Department of Economics, University of Pretoria, Private Bag X20, Hatfield 0028, South Africa)
    Abstract: We investigate whether sentiment innovations help forecast realized volatility in U.S. state-level stock markets. We combine 5-minute intraday data for 50 U.S. states with a daily state-level Twitter-based sentiment index over the period August 2011 to August 2024. Realized variance, skewness, and kurtosis are constructed using intermittency-adjusted estimators that account for sparse trading and zero returns. We adopt a Heterogeneous Autoregressive framework and enrich it with higher-order realized moments and changes in state-level sentiment, estimating the models via weighted least squares to mitigate heteroskedasticity effects. Out-of-sample performance is assessed in a rolling-window forecasting design for daily, weekly, and monthly horizons, and formal forecast comparisons are conducted using Diebold-Mariano and Clark-West tests. Our results confirm that the Heterogeneous Autoregressive components remain the dominant drivers of realized volatility dynamics across all horizons. Importantly, tail-risk information, proxied by realized kurtosis, delivers the most systematic and economically meaningful improvements in predictive accuracy, particularly at short horizons. Sentiment changes exhibit an episodic but non-negligible predictive footprint: while their average in-sample contribution is limited, they enhance forecast performance for a subset of states, especially when combined with higher-moment information in richer specifications. Overall, our findings highlight that integrating intraday distributional characteristics and sentiment innovations can improve volatility forecasting at the regional level, albeit in a state- and horizon-dependent manner.
    Keywords: State-level stock markets, Sentiment, HAR-RV, Realized moments, Forecast evaluation
    JEL: C53 C58 G11 G17
    Date: 2026–02
    URL: https://d.repec.org/n?u=RePEc:pre:wpaper:202603
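    For readers unfamiliar with the Heterogeneous Autoregressive (HAR) setup, a baseline version regresses realized variance on its previous-day value and on its averages over the previous 5 and 22 trading days. The sketch below is a simplified illustration only: it uses ordinary least squares on simulated data rather than the weighted least squares, higher-order moments, and sentiment terms described in the abstract, and the function names are hypothetical.

```python
import numpy as np

def har_design(rv):
    """Build baseline HAR regressors: constant, lagged daily RV,
    mean of the last 5 days, mean of the last 22 days."""
    rv = np.asarray(rv, dtype=float)
    rows, target = [], []
    for t in range(22, len(rv)):
        daily = rv[t - 1]
        weekly = rv[t - 5:t].mean()
        monthly = rv[t - 22:t].mean()
        rows.append([1.0, daily, weekly, monthly])
        target.append(rv[t])
    return np.array(rows), np.array(target)

def fit_har(rv):
    """Estimate the four HAR coefficients by ordinary least squares."""
    x, y = har_design(rv)
    beta, *_ = np.linalg.lstsq(x, y, rcond=None)
    return beta

def forecast_next(rv, beta):
    """One-step-ahead forecast from the most recent 22 observations."""
    rv = np.asarray(rv, dtype=float)
    x_next = np.array([1.0, rv[-1], rv[-5:].mean(), rv[-22:].mean()])
    return float(x_next @ beta)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    rv = np.abs(rng.normal(size=300)) + 0.1   # simulated, not market data
    beta = fit_har(rv)
    print("coefficients:", beta)
    print("next-day forecast:", forecast_next(rv, beta))
```

    Extra predictors such as realized kurtosis or a sentiment change would enter as additional columns of the design matrix under this layout.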
  3. By: Yurui Wu; Qingying Deng; Wonou Chung; Mairui Li
    Abstract: Time series encountered in practice are rarely stationary. When the data distribution changes, a forecasting model trained on past observations can lose accuracy. We study a small-footprint test-time adaptation (TTA) framework for causal time-series forecasting and direction classification. The backbone is frozen, and only normalization affine parameters are updated using recent unlabeled windows. For classification we minimize entropy and enforce temporal consistency; for regression we minimize prediction variance across weak time-preserving augmentations and optionally distill from an EMA teacher. A quadratic drift penalty and an uncertainty-triggered fallback keep updates stable. We evaluate this framework in two stages: synthetic regime shifts on ETT benchmarks, and daily equity and FX series (SPY, QQQ, EUR/USD) across pandemic, high-inflation, and recovery regimes. On synthetic gradual drift, normalization-based TTA improves forecasting error, while in financial markets a simple batch-normalization statistics update is a robust default and more aggressive norm-only adaptation can even hurt. Our results provide practical guidance for deploying TTA on non-stationary time series.
    Date: 2026–01
    URL: https://d.repec.org/n?u=RePEc:arx:papers:2602.00073
  4. By: Jan Ditzen; Erkal Ersoy; Haoyang Li; Francesco Ravazzolo
    Abstract: This paper studies whether a small set of dominant countries can account for most of the dynamics of regional oil demand and improve forecasting performance. We focus on dominant drivers within the OECD and a broad GVAR sample covering over 90% of world GDP. Our approach identifies dominant drivers from a high-dimensional concentration matrix estimated row by row using two complementary variable-selection methods, LASSO and the one-covariate-at-a-time multiple testing (OCMT) procedure. Dominant countries are selected by ordering the columns of the concentration matrix by their norms and applying a criterion based on consecutive norm ratios, combined with economically motivated restrictions to rule out pseudo-dominance. The United States emerges as a global dominant driver, while France and Japan act as robust regional hubs representing European and Asian components, respectively. Including these dominant drivers as regressors for all countries yields statistically significant forecast gains over autoregressive benchmarks and country-specific LASSO models, particularly during periods of heightened global volatility. The proposed framework is flexible and can be applied to other macroeconomic and energy variables with network structure or spatial dependence.
    Date: 2026–02
    URL: https://d.repec.org/n?u=RePEc:arx:papers:2602.01963
  5. By: M.Jahangir Alam; Shane Boyle; Huiyu Li; Tatevik Sekhposyan
    Abstract: Recent research suggests that generic large language models (LLMs) can match the accuracy of traditional methods when forecasting macroeconomic variables in pseudo out-of-sample settings generated via prompts. This paper assesses the out-of-sample forecasting accuracy of LLMs by eliciting real-time forecasts of U.S. inflation from ChatGPT. We find that out-of-sample predictions are largely inaccurate and stale, even though forecasts generated in pseudo out-of-sample environments are comparable to existing benchmarks. Our results underscore the importance of out-of-sample benchmarking for LLM predictions.
    Keywords: large language models; generative AI; inflation forecasting
    JEL: C45 E31 E37
    Date: 2026–02–05
    URL: https://d.repec.org/n?u=RePEc:fip:fedfwp:102407
  6. By: Matthew C. Johnson; Matteo Luciani; Minzhengxiong Zhang; Kenichiro McAlinn
    Abstract: Central banks rely on density forecasts from professional surveys to assess inflation risks and communicate uncertainty. A central challenge in using these surveys is irregular participation: forecasters enter and exit, skip rounds, and reappear after long gaps. In the European Central Bank's Survey of Professional Forecasters, turnover and missingness vary substantially over time, causing the set of submitted predictions to change from quarter to quarter. Standard aggregation rules -- such as equal-weight pooling, renormalization after dropping missing forecasters, or ad hoc imputation -- can generate artificial jumps in combined predictions driven by panel composition rather than economic information, complicating real-time interpretation and obscuring forecaster performance. We develop coherent Bayesian updating rules for forecast combination under sporadic participation that maintain a well-defined latent predictive state for each forecaster even when their forecast is unobserved. Rather than relying on renormalization or imputation, the combined predictive distribution is updated through the implied conditional structure of the panel. This approach isolates genuine performance differences from mechanical participation effects and yields interpretable dynamics in forecaster influence. In the ECB survey, it improves predictive accuracy relative to equal-weight benchmarks and delivers smoother and better-calibrated inflation density forecasts, particularly during periods of high turnover.
    Date: 2026–02
    URL: https://d.repec.org/n?u=RePEc:arx:papers:2602.05226
  7. By: Oskar Våle; Shiliang Zhang; Sabita Maharjan; Gro Klæboe
    Abstract: The balancing market in the energy sector plays a critical role in physically and financially balancing supply and demand. Modeling dynamics in the balancing market can provide valuable insights and prognoses for power grid stability and secure energy supply. While complex machine learning models can achieve high accuracy, their black-box nature severely limits model interpretability. In this paper, we explore the trade-off between model accuracy and interpretability for the energy balancing market. In particular, we take the example of forecasting the manual frequency restoration reserve (mFRR) activation price in the balancing market using real market data from different energy price zones. We explore the interpretability of mFRR forecasting using two models: the extreme gradient boosting (XGBoost) machine and the explainable boosting machine (EBM). We also integrate the two models, and we benchmark all the models against a baseline naive model. Our results show that EBM provides forecasting accuracy comparable to XGBoost while yielding a considerable level of interpretability. Our analysis also underscores the challenge of accurately predicting the mFRR price in instances when the activation price deviates significantly from the spot price. Importantly, EBM's interpretability features reveal insights into non-linear mFRR price drivers and regional market dynamics. Our study demonstrates that EBM is a viable and valuable interpretable alternative to complex black-box AI models for forecasting in the balancing market.
    Date: 2026–01
    URL: https://d.repec.org/n?u=RePEc:arx:papers:2602.00049
  8. By: Eurydice Fotopoulou; Iyke Maduako; M. Belen Sbrancia; Prachi Srivastava
    Abstract: The absence of reliable data on fundamental economic indicators (e.g. real GDP), combined with structural shifts in the economy, can severely constrain the ability to conduct accurate macroeconomic analysis and forecasting. This paper explores alternatives to address data limitations by integrating machine learning and satellite data to estimate real GDP. Specifically, it finds that incorporating satellite-based nightlight data into a random forest model significantly improves the accuracy of quarterly GDP growth estimates compared with models relying solely on traditional indicators. This empirical application contributes to the emerging nowcasting field to enhance economic forecasting in economies with significant data gaps.
    Keywords: Macroeconomic forecast; Machine learning; Nowcasting; GDP; Satellite data; Random Forest
    Date: 2026–01–30
    URL: https://d.repec.org/n?u=RePEc:imf:imfwpa:2026/020
  9. By: Luis Ontaneda Mijares; Nick Firoozye
    Abstract: Overparameterized models have recently challenged conventional learning theory by exhibiting improved generalization beyond the interpolation limit, a phenomenon known as benign overfitting. This work introduces Adaptive Benign Overfitting (ABO), extending the recursive least-squares (RLS) framework to this regime through a numerically stable formulation based on orthogonal-triangular updates. A QR-based exponentially weighted RLS (QR-EWRLS) algorithm is introduced, combining random Fourier feature mappings with forgetting-factor regularization to enable online adaptation under non-stationary conditions. The orthogonal decomposition prevents the numerical divergence associated with covariance-form RLS while retaining adaptability to evolving data distributions. Experiments on nonlinear synthetic time series confirm that the proposed approach maintains bounded residuals and stable condition numbers while reproducing the double-descent behavior characteristic of overparameterized models. Applications to forecasting foreign exchange and electricity demand show that ABO is highly accurate (comparable to baseline kernel methods) while achieving speed improvements of between 20 and 40 percent. The results provide a unified view linking adaptive filtering, kernel approximation, and benign overfitting within a stable online learning framework.
    Date: 2026–01
    URL: https://d.repec.org/n?u=RePEc:arx:papers:2601.22200
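    The combination of random Fourier features with exponentially weighted recursive least squares can be sketched in a few lines. Note the substitution: the code below uses the textbook covariance-form RLS update, which is the variant the abstract says can diverge numerically, and not the QR-based formulation the paper proposes. Class and function names, the bandwidth, the forgetting factor, and the initialization constant `delta` are all illustrative assumptions.

```python
import numpy as np

class EWRLS:
    """Covariance-form RLS with a forgetting factor (not the QR variant)."""

    def __init__(self, dim, forgetting=0.99, delta=1e3):
        self.lam = forgetting
        self.w = np.zeros(dim)
        self.P = delta * np.eye(dim)   # large delta means weak initial ridge

    def update(self, z, y):
        """Predict with the current weights, then update them with (z, y)."""
        pred = float(self.w @ z)
        Pz = self.P @ z
        gain = Pz / (self.lam + z @ Pz)
        self.w = self.w + gain * (y - pred)
        self.P = (self.P - np.outer(gain, Pz)) / self.lam
        return pred

def make_rff(input_dim, n_features, bandwidth, rng):
    """Random Fourier features approximating a Gaussian kernel."""
    omega = rng.normal(scale=1.0 / bandwidth, size=(n_features, input_dim))
    phase = rng.uniform(0.0, 2.0 * np.pi, size=n_features)

    def features(x):
        return np.sqrt(2.0 / n_features) * np.cos(omega @ x + phase)

    return features

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    features = make_rff(input_dim=1, n_features=100, bandwidth=0.5, rng=rng)
    model = EWRLS(dim=100, forgetting=0.98)
    errors = []
    for t in range(500):
        x = rng.uniform(-1.0, 1.0, size=1)
        y = np.sin(3.0 * x[0] + 0.002 * t)   # slowly drifting target
        errors.append(y - model.update(features(x), y))
    print("mean squared error, last 100 steps:", np.mean(np.square(errors[-100:])))
```

    A forgetting factor below one discounts old observations geometrically, which is what lets the filter follow the drifting target in the demo.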
  10. By: Yuanhong Wu; Wei Ye; Jingyan Xu; D. Frank Hsu
    Abstract: In this work, we propose to apply a new model fusion and learning paradigm, known as Combinatorial Fusion Analysis (CFA), to the field of Bitcoin price prediction. Price prediction for financial products has long been a major topic in finance, since successful prediction can yield significant profit. Every machine learning model has its own strengths and weaknesses, which hinders progress toward robustness. CFA has been used to enhance models by leveraging the rank-score characteristic (RSC) function and cognitive diversity in the combination of a moderate set of diverse and relatively well-performing models. Our method utilizes both score and rank combinations as well as other weighted combination techniques. Key metrics such as RMSE and MAPE are used to evaluate our method's performance. Our proposal achieves a MAPE of 0.19%. The proposed method greatly improves upon individual model performance and outperforms other Bitcoin price prediction models.
    Date: 2026–01
    URL: https://d.repec.org/n?u=RePEc:arx:papers:2602.00037
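    The building blocks named in the abstract (score combination, rank combination, the rank-score characteristic function, and cognitive diversity) can be illustrated on toy data. This is a minimal sketch under stated assumptions, not the paper's pipeline: scores are assumed to be already normalized to a common scale, rank 1 is assigned to the highest score, and cognitive diversity is computed as the root mean squared difference between two RSC functions, which is one common convention. All function names are hypothetical.

```python
def ranks(scores):
    """Rank items by score, 1 = highest (ties broken by position)."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    result = [0] * len(scores)
    for position, item in enumerate(order, start=1):
        result[item] = position
    return result

def rsc(scores):
    """Rank-score characteristic: the score found at rank 1, 2, ..., n."""
    return sorted(scores, reverse=True)

def cognitive_diversity(scores_a, scores_b):
    """Root mean squared gap between two RSC functions (assumed convention)."""
    fa, fb = rsc(scores_a), rsc(scores_b)
    return (sum((x - y) ** 2 for x, y in zip(fa, fb)) / len(fa)) ** 0.5

def score_combination(systems):
    """Average the scores each system gives to every item."""
    return [sum(col) / len(systems) for col in zip(*systems)]

def rank_combination(systems):
    """Average the ranks each system gives to every item (lower is better)."""
    rank_lists = [ranks(s) for s in systems]
    return [sum(col) / len(rank_lists) for col in zip(*rank_lists)]

if __name__ == "__main__":
    model_a = [0.2, 0.9, 0.5]   # made-up normalized scores for three items
    model_b = [0.4, 0.1, 0.5]
    print("score combination:", score_combination([model_a, model_b]))
    print("rank combination:", rank_combination([model_a, model_b]))
    print("cognitive diversity:", cognitive_diversity(model_a, model_b))
```

    Score and rank combinations can order items differently when the two models' RSC functions differ, which is the situation in which comparing both is informative.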
  11. By: Felipe A. Csaszar; Aticus Peterson; Daniel Wilde
    Abstract: Can artificial intelligence outperform humans at strategic foresight -- the capacity to form accurate judgments about uncertain, high-stakes outcomes before they unfold? We address this question through a fully prospective prediction tournament using live Kickstarter crowdfunding projects. Thirty U.S.-based technology ventures, launched after the training cutoffs of all models studied, were evaluated while fundraising remained in progress and outcomes were unknown. A diverse suite of frontier and open-weight large language models (LLMs) completed 870 pairwise comparisons, producing complete rankings of predicted fundraising success. We benchmarked these forecasts against 346 experienced managers recruited via Prolific and three MBA-trained investors working under monitored conditions. The results are striking: human evaluators achieved rank correlations with actual outcomes between 0.04 and 0.45, while several frontier LLMs exceeded 0.60, with the best (Gemini 2.5 Pro) reaching 0.74 -- correctly ordering nearly four of every five venture pairs. These differences persist across multiple performance metrics and robustness checks. Neither wisdom-of-the-crowd ensembles nor human-AI hybrid teams outperformed the best standalone model.
    Date: 2026–02
    URL: https://d.repec.org/n?u=RePEc:arx:papers:2602.01684
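    The link between a rank correlation and the share of correctly ordered pairs can be made concrete. The abstract does not say which rank correlation is reported, so the sketch below does not try to reproduce the 0.74 figure. It only shows, for rankings without ties, how the share of concordant pairs is computed and how Kendall's tau relates to it (tau = 2 * share - 1). It also notes that 870 equals 30 * 29, the number of ordered pairs among 30 items; whether the tournament was built that way is an assumption here.

```python
from itertools import combinations

def concordant_fraction(predicted_rank, actual_rank):
    """Share of item pairs that both rankings order the same way (no ties)."""
    pairs = list(combinations(range(len(predicted_rank)), 2))
    agree = sum(
        (predicted_rank[i] - predicted_rank[j]) * (actual_rank[i] - actual_rank[j]) > 0
        for i, j in pairs
    )
    return agree / len(pairs)

def kendall_tau(predicted_rank, actual_rank):
    """Kendall's tau for tie-free rankings: 2 * concordant share - 1."""
    return 2.0 * concordant_fraction(predicted_rank, actual_rank) - 1.0

n_ventures = 30
unordered_pairs = n_ventures * (n_ventures - 1) // 2   # each pair counted once
ordered_pairs = n_ventures * (n_ventures - 1)          # each pair in both orders

if __name__ == "__main__":
    predicted = [1, 2, 3, 4]   # toy rankings, not tournament data
    actual = [1, 2, 4, 3]
    print("concordant share:", concordant_fraction(predicted, actual))
    print("Kendall tau:", kendall_tau(predicted, actual))
    print("pairs among 30 items:", unordered_pairs, "unordered,", ordered_pairs, "ordered")
```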
  12. By: Keywan Christian Rasekhschaffe
    Abstract: We study whether generative AI can automate feature discovery in U.S. equities. Using large language models with retrieval-augmented generation and structured/programmatic prompting, we synthesize economically motivated features from analyst, options, and price-volume data. These features are then used as inputs to a tabular machine-learning model to forecast short-horizon returns. Across multiple datasets, AI-generated features are consistently competitive with baselines, with Sharpe improvements ranging from 14% to 91% depending on dataset and configuration. Retrieval quality is pivotal: better knowledge bases materially improve outcomes. The AI-generated signals are weakly correlated with traditional features, supporting combination. Overall, generative AI can meaningfully augment feature discovery when retrieval quality is controlled, producing interpretable signals while reducing manual engineering effort.
    Date: 2026–01
    URL: https://d.repec.org/n?u=RePEc:arx:papers:2602.00196
  13. By: Marcin Dec (Group for Research in Applied Economics (GRAPE))
    Abstract: This paper shows that in a less liquid government bond market, filtering term premia through a regression-based Adrian, Crump & Moench (ACM) framework yields risk-neutral short rate expectations that match, and often rival, the accuracy of the Survey of Professional Forecasters (SPF). Using monthly zero-coupon yields, we extract a model-consistent risk-free yield curve whose implied forward rates exhibit forecasting performance comparable to SPF paths across horizons up to three years. Crucially, these expectations can be generated daily, providing far higher-frequency information than SPF’s quarterly releases. We find that term premia are negligible at the short end but rise with maturity, and that the level factor, despite capturing most yield variance, does not command a price of risk. Cointegration tests indicate that SPF forecasts contain no incremental information beyond the filtered curve. The results highlight a practical advantage: once premia are removed, the yield curve becomes a reliable, high-frequency source of monetary policy expectations suitable for policy analysis and market surveillance.
    Keywords: Term Premia Extraction, Risk Neutral Interest Rate Expectations, Yield Curve Decomposition, Survey of Professional Forecasters
    JEL: E43 G12 G17
    Date: 2026
    URL: https://d.repec.org/n?u=RePEc:fme:wpaper:113
  14. By: Samir Orujov (LMBA - Laboratoire de Mathématiques de Bretagne Atlantique - UBS - Université de Bretagne Sud - UBO EPE - Université de Brest - CNRS - Centre National de la Recherche Scientifique); Victor Elvira (The University of Edinburgh, Institut TELECOM/TELECOM Lille1 - IMT - Institut Mines-Télécom [Paris], CRIStAL - Centre de Recherche en Informatique, Signal et Automatique de Lille - UMR 9189 - Centrale Lille - Université de Lille - CNRS - Centre National de la Recherche Scientifique); Audrey Poterie (LMBA - Laboratoire de Mathématiques de Bretagne Atlantique - UBS - Université de Bretagne Sud - UBO EPE - Université de Brest - CNRS - Centre National de la Recherche Scientifique); Farid Rajabov (UCL - University College London [UCL]); Francois Septier (LMBA - Laboratoire de Mathématiques de Bretagne Atlantique - UBS - Université de Bretagne Sud - UBO EPE - Université de Brest - CNRS - Centre National de la Recherche Scientifique, UBS - Université de Bretagne Sud)
    Abstract: The log-TGARCHX model is less restrictive in terms of the inclusion of exogenous variables and asymmetry lags compared to the GARCHX model. Nevertheless, adding fewer (or more) covariates than necessary may lead to under- or overfitting, respectively. In this context, we propose a new algorithm, called VS-LTGARCHX, which incorporates a variable selection procedure into the log-TGARCHX estimation process. Furthermore, the VS-LTGARCHX algorithm is applied to extremely volatile BTC markets using 42 conditioning variables. Interestingly, our results show that the VS-LTGARCHX models outperform benchmark models, namely the log-GARCH(1, 1) and log-TGARCHX(1, 1) models, in one-step-ahead forecasting.
    Keywords: variable selection, Bitcoin volatility, log-GARCHX, GARCH
    Date: 2025–05–16
    URL: https://d.repec.org/n?u=RePEc:hal:journl:hal-04283159
  15. By: Kohei Asao; Raju Huidrom
    Abstract: This paper presents a comprehensive analysis of inflation in Timor-Leste—a post-conflict, low-income economy and small developing state that is fully dollarized. We find that Timorese inflation was high until about mid-2010 and was strongly influenced by swings in global food prices given its high share of food in the CPI basket and heavy reliance on food imports. But inflation has been relatively low and stable in the past decade relative to peers—a period that also broadly coincided with moderate global food prices. We develop an empirical model for Timorese inflation that distills the role of these underlying drivers, and which can be deployed for forecasting inflation.
    Keywords: Inflation; Phillips curve; Low-Income Country; Fragile and Conflict State; Dollarization; Timor-Leste
    Date: 2026–02–06
    URL: https://d.repec.org/n?u=RePEc:imf:imfwpa:2026/024
  16. By: Mayoral, L.; Mueller, H.; Philipp, M.; Rauh, C.; Vassallo, R.
    Abstract: This article proposes a semantic-similarity approach to detecting and predicting rare events in newspaper text and applies it to institutional disruptions. Using a global news corpus covering more than 170 countries, we measure the similarity of headlines to event-specific prototypes in embedding space and aggregate these signals to identify disruptions to political institutions. We combine these text-based measures with supervised nowcasting and targeted human verification to expand existing datasets on military coups, irregular term-limit extensions, and weakening of the judiciary. The resulting event data are then used to forecast the likelihood of disruptions up to 12 months ahead, providing a high-frequency and scalable tool for monitoring institutional risk. As an illustration of its empirical value, we document that coups are followed by large and persistent declines in economic growth. More broadly, the framework can be adapted to detect and track a wide range of economic and political events and policy actions from news text in real time and in historical archives.
    Keywords: Political Institutions, Autocratization, Military Coups, Term Limit Evasion, Judiciary Weakening, Semantic Similarity, Embeddings, Nowcasting, Forecasting
    JEL: C53 C55 D72 P16
    Date: 2026–01–14
    URL: https://d.repec.org/n?u=RePEc:cam:camdae:2609
  17. By: Kevin Lee; Kalvinder Shields
    Abstract: The time series properties of output and price inflation can be accurately captured using VAR-E's, Vector-Autoregressive models of actual and expected measures of the series where the latter are provided by surveys. The paper proposes a method for estimating VAR-E's that accommodate individuals' real-time understanding of the macroeconomy and which deliver forecasts in a way that is useful to decision-makers. It notes the sort of statistics and figures that might be reported in a 'dashboard' to monitor the health of the macroeconomy, and this is illustrated using the actual and expected data produced by the Bank of England's Decision-Maker Panel.
    Keywords: learning, expectations, surveys, forecasts, decision-making
    JEL: C32 D84 E31 E32
    Date: 2026–02
    URL: https://d.repec.org/n?u=RePEc:een:camaaa:2026-10

This nep-for issue is ©2026 by Malte Knüppel. It is provided as is without any express or implied warranty. It may be freely redistributed in whole or in part for any purpose. If distributed in part, please include this notice.
General information on the NEP project can be found at https://nep.repec.org. For comments please write to the director of NEP, Marco Novarese at <director@nep.repec.org>. Put “NEP” in the subject, otherwise your mail may be rejected.
NEP’s infrastructure is sponsored by the Griffith Business School of Griffith University in Australia.