|
on Forecasting |
| By: | Han Su; Xiaojia Guo; Xiaoke Zhang |
| Abstract: | Combining forecasts from multiple experts often yields more accurate results than relying on a single expert. In this paper, we introduce a novel regularized ensemble method that extends the traditional linear opinion pool by leveraging both current forecasts and historical performances to set the weights. Unlike existing approaches that rely only on either the current forecasts or past accuracy, our method accounts for both sources simultaneously. It learns weights by minimizing the variance of the combined forecast (or its transformed version) while incorporating a regularization term informed by historical performances. We also show that this approach has a Bayesian interpretation. Different distributional assumptions within this Bayesian framework yield different functional forms for the variance component and the regularization term, adapting the method to various scenarios. In empirical studies on Walmart sales and macroeconomic forecasting, our ensemble outperforms leading benchmark models both when experts' full forecasting histories are available and when experts enter and exit over time, resulting in incomplete historical records. Throughout, we provide illustrative examples that show how the optimal weights are determined and, based on the empirical results, we discuss where the framework's strengths lie and when experts' past versus current forecasts are more informative. |
| Date: | 2026–02 |
| URL: | https://d.repec.org/n?u=RePEc:arx:papers:2602.11379 |
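The idea in the abstract above, choosing weights that trade off the variance of the combined forecast against a penalty informed by past performance, can be illustrated with a small sketch. The abstract does not state the exact objective, so the quadratic form below is an assumption rather than the authors' method: it minimizes `w' Sigma w + lam * ||w - w_prior||^2` subject to the weights summing to one, where `Sigma` (a covariance matrix), `w_prior` (weights derived from historical accuracy), `lam`, and the function name are all hypothetical.

```python
import numpy as np


def regularized_pool_weights(sigma, w_prior, lam):
    """Weights minimizing w' sigma w + lam * ||w - w_prior||^2 with sum(w) == 1.

    Illustrative assumption only: a ridge-style penalty toward prior weights,
    solved in closed form with a Lagrange multiplier for the sum constraint.
    """
    sigma = np.asarray(sigma, dtype=float)
    w_prior = np.asarray(w_prior, dtype=float)
    n = sigma.shape[0]
    a = sigma + lam * np.eye(n)          # matrix of the stationarity condition
    ones = np.ones(n)
    a_inv_prior = np.linalg.solve(a, w_prior)
    a_inv_ones = np.linalg.solve(a, ones)
    # Multiplier chosen so that the weights sum to one.
    nu = (1.0 - lam * (ones @ a_inv_prior)) / (ones @ a_inv_ones)
    return lam * a_inv_prior + nu * a_inv_ones


if __name__ == "__main__":
    sigma = np.diag([1.0, 4.0])          # hypothetical covariance of two experts
    prior = np.array([0.5, 0.5])         # hypothetical history-based weights
    for lam in (0.0, 1.0, 100.0):
        print(lam, regularized_pool_weights(sigma, prior, lam))
```

With `lam = 0` the sketch reduces to plain minimum-variance weights; as `lam` grows, the weights shrink toward `w_prior`.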
| By: | Giovanni Bonaccolto (Department of Economics and Law, "Kore" University of Enna, Piazza dell'Universita, 94100 Enna, Italy); Massimiliano Caporin (Department of Statistical Sciences, University of Padova, Via Cesare Battisti 241/243, Padova, Italy); Oguzhan Cepni (Ostim Technical University, Ankara, Turkiye; University of Edinburgh Business School, Centre for Business, Climate Change, and Sustainability; Department of Economics, Copenhagen Business School, Denmark); Rangan Gupta (Department of Economics, University of Pretoria, Private Bag X20, Hatfield 0028, South Africa) |
| Abstract: | We investigate whether sentiment innovations help forecast realized volatility in U.S. state-level stock markets. We combine 5-minute intraday data for 50 U.S. states with a daily state-level Twitter-based sentiment index over the period August 2011 to August 2024. Realized variance, skewness, and kurtosis are constructed using intermittency-adjusted estimators that account for sparse trading and zero returns. We adopt a Heterogeneous Autoregressive framework and enrich it with higher-order realized moments and changes in state-level sentiment, estimating the models via weighted least squares to mitigate heteroskedasticity effects. Out-of-sample performance is assessed in a rolling-window forecasting design for daily, weekly, and monthly horizons, and formal forecast comparisons are conducted using Diebold-Mariano and Clark-West tests. Our results confirm that the Heterogeneous Autoregressive components remain the dominant drivers of realized volatility dynamics across all horizons. Importantly, tail-risk information, proxied by realized kurtosis, delivers the most systematic and economically meaningful improvements in predictive accuracy, particularly at short horizons. Sentiment changes exhibit an episodic but non-negligible predictive footprint: while their average in-sample contribution is limited, they enhance forecast performance for a subset of states, especially when combined with higher-moment information in richer specifications. Overall, our findings highlight that integrating intraday distributional characteristics and sentiment innovations can improve volatility forecasting at the regional level, albeit in a state- and horizon-dependent manner. |
| Keywords: | State-level stock markets, Sentiment, HAR-RV, Realized moments, Forecast evaluation |
| JEL: | C53 C58 G11 G17 |
| Date: | 2026–02 |
| URL: | https://d.repec.org/n?u=RePEc:pre:wpaper:202603 |
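For readers unfamiliar with the Heterogeneous Autoregressive (HAR) setup mentioned in the abstract above, the following minimal sketch builds the standard daily, weekly, and monthly regressors from a realized-variance series and fits them by ordinary least squares. It is an illustration under assumptions, not the paper's specification: the 5-day and 22-day windows are the conventional choices, plain OLS stands in for the weighted least squares the authors use, the higher-moment and sentiment regressors are omitted, and the function names are hypothetical.

```python
import numpy as np


def har_design(rv, weekly=5, monthly=22):
    """Build HAR regressors [1, daily, weekly mean, monthly mean] and next-day targets."""
    rv = np.asarray(rv, dtype=float)
    rows, targets = [], []
    # Start once a full monthly window is available; stop one step before the end
    # so that every row has a next-day target.
    for t in range(monthly - 1, len(rv) - 1):
        daily = rv[t]
        week = rv[t - weekly + 1 : t + 1].mean()
        month = rv[t - monthly + 1 : t + 1].mean()
        rows.append([1.0, daily, week, month])
        targets.append(rv[t + 1])
    return np.array(rows), np.array(targets)


def fit_har(rv):
    """Least-squares HAR coefficients (OLS here, swapped in for the paper's WLS)."""
    x, y = har_design(rv)
    beta, *_ = np.linalg.lstsq(x, y, rcond=None)
    return beta


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    rv = np.abs(rng.normal(size=500)) + 0.1   # synthetic placeholder series
    print(fit_har(rv))
```

Additional predictors, such as realized kurtosis or sentiment changes, would enter as extra columns of the design matrix.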
| By: | Yurui Wu; Qingying Deng; Wonou Chung; Mairui Li |
| Abstract: | Time series encountered in practice are rarely stationary. When the data distribution changes, a forecasting model trained on past observations can lose accuracy. We study a small-footprint test-time adaptation (TTA) framework for causal time-series forecasting and direction classification. The backbone is frozen, and only normalization affine parameters are updated using recent unlabeled windows. For classification we minimize entropy and enforce temporal consistency; for regression we minimize prediction variance across weak time-preserving augmentations and optionally distill from an EMA teacher. A quadratic drift penalty and an uncertainty-triggered fallback keep updates stable. We evaluate this framework in two stages: synthetic regime shifts on ETT benchmarks, and daily equity and FX series (SPY, QQQ, EUR/USD) across pandemic, high-inflation, and recovery regimes. On synthetic gradual drift, normalization-based TTA improves forecasting error, while in financial markets a simple batch-normalization statistics update is a robust default and more aggressive norm-only adaptation can even hurt. Our results provide practical guidance for deploying TTA on non-stationary time series. |
| Date: | 2026–01 |
| URL: | https://d.repec.org/n?u=RePEc:arx:papers:2602.00073 |
| By: | Jan Ditzen; Erkal Ersoy; Haoyang Li; Francesco Ravazzolo |
| Abstract: | This paper studies whether a small set of dominant countries can account for most of the dynamics of regional oil demand and improve forecasting performance. We focus on dominant drivers within the OECD and a broad GVAR sample covering over 90% of world GDP. Our approach identifies dominant drivers from a high-dimensional concentration matrix estimated row by row using two complementary variable-selection methods, LASSO and the one-covariate-at-a-time multiple testing (OCMT) procedure. Dominant countries are selected by ordering the columns of the concentration matrix by their norms and applying a criterion based on consecutive norm ratios, combined with economically motivated restrictions to rule out pseudo-dominance. The United States emerges as a global dominant driver, while France and Japan act as robust regional hubs representing European and Asian components, respectively. Including these dominant drivers as regressors for all countries yields statistically significant forecast gains over autoregressive benchmarks and country-specific LASSO models, particularly during periods of heightened global volatility. The proposed framework is flexible and can be applied to other macroeconomic and energy variables with network structure or spatial dependence. |
| Date: | 2026–02 |
| URL: | https://d.repec.org/n?u=RePEc:arx:papers:2602.01963 |
| By: | M. Jahangir Alam; Shane Boyle; Huiyu Li; Tatevik Sekhposyan |
| Abstract: | Recent research suggests that generic large language models (LLMs) can match the accuracy of traditional methods when forecasting macroeconomic variables in pseudo out-of-sample settings generated via prompts. This paper assesses the out-of-sample forecasting accuracy of LLMs by eliciting real-time forecasts of U.S. inflation from ChatGPT. We find that out-of-sample predictions are largely inaccurate and stale, even though forecasts generated in pseudo out-of-sample environments are comparable to existing benchmarks. Our results underscore the importance of out-of-sample benchmarking for LLM predictions. |
| Keywords: | large language models; generative AI; inflation forecasting |
| JEL: | C45 E31 E37 |
| Date: | 2026–02–05 |
| URL: | https://d.repec.org/n?u=RePEc:fip:fedfwp:102407 |
| By: | Matthew C. Johnson; Matteo Luciani; Minzhengxiong Zhang; Kenichiro McAlinn |
| Abstract: | Central banks rely on density forecasts from professional surveys to assess inflation risks and communicate uncertainty. A central challenge in using these surveys is irregular participation: forecasters enter and exit, skip rounds, and reappear after long gaps. In the European Central Bank's Survey of Professional Forecasters, turnover and missingness vary substantially over time, causing the set of submitted predictions to change from quarter to quarter. Standard aggregation rules -- such as equal-weight pooling, renormalization after dropping missing forecasters, or ad hoc imputation -- can generate artificial jumps in combined predictions driven by panel composition rather than economic information, complicating real-time interpretation and obscuring forecaster performance. We develop coherent Bayesian updating rules for forecast combination under sporadic participation that maintain a well-defined latent predictive state for each forecaster even when their forecast is unobserved. Rather than relying on renormalization or imputation, the combined predictive distribution is updated through the implied conditional structure of the panel. This approach isolates genuine performance differences from mechanical participation effects and yields interpretable dynamics in forecaster influence. In the ECB survey, it improves predictive accuracy relative to equal-weight benchmarks and delivers smoother and better-calibrated inflation density forecasts, particularly during periods of high turnover. |
| Date: | 2026–02 |
| URL: | https://d.repec.org/n?u=RePEc:arx:papers:2602.05226 |
| By: | Oskar Våle; Shiliang Zhang; Sabita Maharjan; Gro Klæboe |
| Abstract: | The balancing market in the energy sector plays a critical role in physically and financially balancing the supply and demand. Modeling dynamics in the balancing market can provide valuable insights and prognosis for power grid stability and secure energy supply. While complex machine learning models can achieve high accuracy, their black-box nature severely limits the model interpretability. In this paper, we explore the trade-off between model accuracy and interpretability for the energy balancing market. Particularly, we take the example of forecasting manual frequency restoration reserve (mFRR) activation price in the balancing market using real market data from different energy price zones. We explore the interpretability of mFRR forecasting using two models: extreme gradient boosting (XGBoost) machine and explainable boosting machine (EBM). We also integrate the two models, and we benchmark all the models against a baseline naive model. Our results show that EBM provides forecasting accuracy comparable to XGBoost while yielding a considerable level of interpretability. Our analysis also underscores the challenge of accurately predicting the mFRR price for the instances when the activation price deviates significantly from the spot price. Importantly, EBM's interpretability features reveal insights into non-linear mFRR price drivers and regional market dynamics. Our study demonstrates that EBM is a viable and valuable interpretable alternative to complex black-box AI models in the forecast for the balancing market. |
| Date: | 2026–01 |
| URL: | https://d.repec.org/n?u=RePEc:arx:papers:2602.00049 |
| By: | Eurydice Fotopoulou; Iyke Maduako; M. Belen Sbrancia; Prachi Srivastava |
| Abstract: | The absence of reliable data on fundamental economic indicators (e.g. real GDP), combined with structural shifts in the economy, can severely constrain the ability to conduct accurate macroeconomic analysis and forecasting. This paper explores alternatives to address data limitations by integrating machine learning and satellite data to estimate real GDP. Specifically, it finds that incorporating satellite-based nightlight data into a random forest model significantly improves the accuracy of quarterly GDP growth estimates compared with models relying solely on traditional indicators. This empirical application contributes to the emerging nowcasting field to enhance economic forecasting in economies with significant data gaps. |
| Keywords: | Macroeconomic forecast; Machine learning; Nowcasting; GDP; Satellite data; Random Forest |
| Date: | 2026–01–30 |
| URL: | https://d.repec.org/n?u=RePEc:imf:imfwpa:2026/020 |
| By: | Luis Ontaneda Mijares; Nick Firoozye |
| Abstract: | Overparameterized models have recently challenged conventional learning theory by exhibiting improved generalization beyond the interpolation limit, a phenomenon known as benign overfitting. This work introduces Adaptive Benign Overfitting (ABO), extending the recursive least-squares (RLS) framework to this regime through a numerically stable formulation based on orthogonal-triangular updates. A QR-based exponentially weighted RLS (QR-EWRLS) algorithm is introduced, combining random Fourier feature mappings with forgetting-factor regularization to enable online adaptation under non-stationary conditions. The orthogonal decomposition prevents the numerical divergence associated with covariance-form RLS while retaining adaptability to evolving data distributions. Experiments on nonlinear synthetic time series confirm that the proposed approach maintains bounded residuals and stable condition numbers while reproducing the double-descent behavior characteristic of overparameterized models. Applications to forecasting foreign exchange and electricity demand show that ABO is highly accurate (comparable to baseline kernel methods) while achieving speed improvements of between 20 and 40 percent. The results provide a unified view linking adaptive filtering, kernel approximation, and benign overfitting within a stable online learning framework. |
| Date: | 2026–01 |
| URL: | https://d.repec.org/n?u=RePEc:arx:papers:2601.22200 |
| By: | Yuanhong Wu; Wei Ye; Jingyan Xu; D. Frank Hsu |
| Abstract: | In this work, we propose to apply a new model fusion and learning paradigm, known as Combinatorial Fusion Analysis (CFA), to the field of Bitcoin price prediction. Price prediction of financial products has always been a major topic in finance, as successful prediction of the price can yield significant profit. Every machine learning model has its own strengths and weaknesses, which hinders progress toward robustness. CFA has been used to enhance models by leveraging the rank-score characteristic (RSC) function and cognitive diversity in the combination of a moderate set of diverse and relatively well-performing models. Our method utilizes both score and rank combinations as well as other weighted combination techniques. Key metrics such as RMSE and MAPE are used to evaluate our methodology's performance. Our proposal achieves a notable MAPE of 0.19%. The proposed method greatly improves upon individual model performance and outperforms other Bitcoin price prediction models. |
| Date: | 2026–01 |
| URL: | https://d.repec.org/n?u=RePEc:arx:papers:2602.00037 |
| By: | Felipe A. Csaszar; Aticus Peterson; Daniel Wilde |
| Abstract: | Can artificial intelligence outperform humans at strategic foresight -- the capacity to form accurate judgments about uncertain, high-stakes outcomes before they unfold? We address this question through a fully prospective prediction tournament using live Kickstarter crowdfunding projects. Thirty U.S.-based technology ventures, launched after the training cutoffs of all models studied, were evaluated while fundraising remained in progress and outcomes were unknown. A diverse suite of frontier and open-weight large language models (LLMs) completed 870 pairwise comparisons, producing complete rankings of predicted fundraising success. We benchmarked these forecasts against 346 experienced managers recruited via Prolific and three MBA-trained investors working under monitored conditions. The results are striking: human evaluators achieved rank correlations with actual outcomes between 0.04 and 0.45, while several frontier LLMs exceeded 0.60, with the best (Gemini 2.5 Pro) reaching 0.74 -- correctly ordering nearly four of every five venture pairs. These differences persist across multiple performance metrics and robustness checks. Neither wisdom-of-the-crowd ensembles nor human-AI hybrid teams outperformed the best standalone model. |
| Date: | 2026–02 |
| URL: | https://d.repec.org/n?u=RePEc:arx:papers:2602.01684 |
| By: | Keywan Christian Rasekhschaffe |
| Abstract: | We study whether generative AI can automate feature discovery in U.S. equities. Using large language models with retrieval-augmented generation and structured/programmatic prompting, we synthesize economically motivated features from analyst, options, and price-volume data. These features are then used as inputs to a tabular machine-learning model to forecast short-horizon returns. Across multiple datasets, AI-generated features are consistently competitive with baselines, with Sharpe improvements ranging from 14% to 91% depending on dataset and configuration. Retrieval quality is pivotal: better knowledge bases materially improve outcomes. The AI-generated signals are weakly correlated with traditional features, supporting combination. Overall, generative AI can meaningfully augment feature discovery when retrieval quality is controlled, producing interpretable signals while reducing manual engineering effort. |
| Date: | 2026–01 |
| URL: | https://d.repec.org/n?u=RePEc:arx:papers:2602.00196 |
| By: | Marcin Dec (Group for Research in Applied Economics (GRAPE)) |
| Abstract: | This paper shows that in a less liquid government bond market, filtering term premia through a regression-based Adrian, Crump & Moench (ACM) framework yields risk-neutral short rate expectations that match, and often rival, the accuracy of Survey of Professional Forecasters (SPF). Using monthly zero-coupon yields, we extract a model-consistent risk-free yield curve whose implied forward rates exhibit forecasting performance comparable to SPF paths across horizons up to three years. Crucially, these expectations can be generated daily, providing far higher-frequency information than SPF’s quarterly releases. We find that term premia are negligible at the short end but rise with maturity, and that the level factor, despite capturing most yield variance, does not command a price of risk. Cointegration tests indicate that SPF forecasts contain no incremental information beyond the filtered curve. The results highlight a practical advantage: once premia are removed, the yield curve becomes a reliable, high-frequency source of monetary policy expectations suitable for policy analysis and market surveillance. |
| Keywords: | Term Premia Extraction, Risk Neutral Interest Rate Expectations, Yield Curve Decomposition, Survey of Professional Forecasters |
| JEL: | E43 G12 G17 |
| Date: | 2026 |
| URL: | https://d.repec.org/n?u=RePEc:fme:wpaper:113 |
| By: | Samir Orujov (LMBA - Laboratoire de Mathématiques de Bretagne Atlantique - UBS - Université de Bretagne Sud - UBO EPE - Université de Brest - CNRS - Centre National de la Recherche Scientifique); Victor Elvira (The University of Edinburgh, Institut TELECOM/TELECOM Lille1 - IMT - Institut Mines-Télécom [Paris], CRIStAL - Centre de Recherche en Informatique, Signal et Automatique de Lille - UMR 9189 - Centrale Lille - Université de Lille - CNRS - Centre National de la Recherche Scientifique); Audrey Poterie (LMBA - Laboratoire de Mathématiques de Bretagne Atlantique - UBS - Université de Bretagne Sud - UBO EPE - Université de Brest - CNRS - Centre National de la Recherche Scientifique); Farid Rajabov (UCL - University College London [UCL]); Francois Septier (LMBA - Laboratoire de Mathématiques de Bretagne Atlantique - UBS - Université de Bretagne Sud - UBO EPE - Université de Brest - CNRS - Centre National de la Recherche Scientifique, UBS - Université de Bretagne Sud) |
| Abstract: | The log-TGARCHX model is less restrictive in terms of the inclusion of exogenous variables and asymmetry lags compared to the GARCHX model. Nevertheless, adding fewer (or more) covariates than necessary may lead to under- or overfitting, respectively. In this context, we propose a new algorithm, called VS-LTGARCHX, which incorporates a variable selection procedure into the log-TGARCHX estimation process. Furthermore, the VS-LTGARCHX algorithm is applied to extremely volatile BTC markets using 42 conditioning variables. Interestingly, our results show that the VS-LTGARCHX models outperform benchmark models, namely the log-GARCH(1, 1) and log-TGARCHX(1, 1) models, in one-step-ahead forecasting. |
| Keywords: | variable selection, Bitcoin volatility, log-GARCHX, GARCH |
| Date: | 2025–05–16 |
| URL: | https://d.repec.org/n?u=RePEc:hal:journl:hal-04283159 |
| By: | Kohei Asao; Raju Huidrom |
| Abstract: | This paper presents a comprehensive analysis of inflation in Timor-Leste—a post-conflict, low-income economy and small developing state that is fully dollarized. We find that Timorese inflation was high until about mid-2010 and was strongly influenced by swings in global food prices given its high share of food in the CPI basket and heavy reliance on food imports. But inflation has been relatively low and stable in the past decade relative to peers—a period that also broadly coincided with moderate global food prices. We develop an empirical model for Timorese inflation that distills the role of these underlying drivers, and which can be deployed for forecasting inflation. |
| Keywords: | Inflation; Phillips curve; Low-Income Country; Fragile and Conflict State; Dollarization; Timor-Leste |
| Date: | 2026–02–06 |
| URL: | https://d.repec.org/n?u=RePEc:imf:imfwpa:2026/024 |
| By: | Mayoral, L.; Mueller, H.; Philipp, M.; Rauh, C.; Vassallo, R. |
| Abstract: | This article proposes a semantic-similarity approach to detecting and predicting rare events in newspaper text and applies it to institutional disruptions. Using a global news corpus covering more than 170 countries, we measure the similarity of headlines to event-specific prototypes in embedding space and aggregate these signals to identify disruptions to political institutions. We combine these text-based measures with supervised nowcasting and targeted human verification to expand existing datasets on military coups, irregular term-limit extensions, and weakening of the judiciary. The resulting event data are then used to forecast the likelihood of disruptions up to 12 months ahead, providing a high-frequency and scalable tool for monitoring institutional risk. As an illustration of its empirical value, we document that coups are followed by large and persistent declines in economic growth. More broadly, the framework can be adapted to detect and track a wide range of economic and political events and policy actions from news text in real time and in historical archives. |
| Keywords: | Political Institutions, Autocratization, Military Coups, Term Limit Evasion, Judiciary Weakening, Semantic Similarity, Embeddings, Nowcasting, Forecasting |
| JEL: | C53 C55 D72 P16 |
| Date: | 2026–01–14 |
| URL: | https://d.repec.org/n?u=RePEc:cam:camdae:2609 |
| By: | Kevin Lee; Kalvinder Shields |
| Abstract: | The time series properties of output and price inflation can be accurately captured using VAR-E's, Vector-Autoregressive models of actual and expected measures of the series where the latter are provided by surveys. The paper proposes a method for estimating VAR-E's that accommodate individuals' real-time understanding of the macroeconomy and which deliver forecasts in a way that is useful to decision-makers. It notes the sort of statistics and figures that might be reported in a 'dashboard' to monitor the health of the macroeconomy, and this is illustrated using the actual and expected data produced by the Bank of England's Decision-Maker Panel. |
| Keywords: | learning, expectations, surveys, forecasts, decision-making |
| JEL: | C32 D84 E31 E32 |
| Date: | 2026–02 |
| URL: | https://d.repec.org/n?u=RePEc:een:camaaa:2026-10 |