|
on Forecasting |
| By: | Simon Hirsch; Florian Ziel |
| Abstract: | Electricity price forecasting supports decision-making in energy markets and asset operation. Probabilistic forecasts are increasingly adopted to explicitly quantify uncertainty, typically issued as quantile predictions or ensembles of the full predictive distribution. However, how improvements in statistical forecast quality translate into economic value remains unclear. Battery storage arbitrage in day-ahead markets is a popular application-based benchmark for this purpose. We analyze quantile-based trading strategies (QBTS) and identify two critical flaws: they do not incentivize honest probabilistic forecasting and they ignore the intertemporal dependence structure of electricity prices. We therefore frame battery optimization as a stochastic program based on fully probabilistic forecasts and examine decision quality measurement for risk-neutral and risk-averse settings under different uncertainty models. Our discussion touches both sides of the coin: how reliable is the economic evaluation of forecasting models through (simplified) application studies, and how do improvements in statistical forecast quality for stochastic programs relate to decision quality and economic performance? We provide theoretical justification and empirical evidence from a case study on the German electricity market. Our results highlight the pitfalls of ranking forecasting models through battery trading strategies. We conclude with implications for evaluation practice and directions for future research in application-based forecast assessment. |
| Date: | 2026–04 |
| URL: | https://d.repec.org/n?u=RePEc:arx:papers:2604.19580 |
| By: | Hilde C. Bjørnland; Nicolás Hardy; Dimitris Korobilis |
| Abstract: | We develop a Quantile Bayesian Vector Autoregression (QBVAR) to forecast real oil prices across different quantiles of the conditional distribution. The model allows predictor effects to vary across quantiles, capturing asymmetries that standard mean-focused approaches miss. Using monthly data from 1975 to 2025, we document three findings. First, the QBVAR improves median forecasts by 2-5% relative to Bayesian VARs, demonstrating that quantile-specific dynamics matter even for point prediction. Second, uncertainty and financial condition variables strongly predict downside risk, with left-tail forecast improvements of 10-25% that intensify during crisis episodes. Third, right-tail forecasting remains difficult; stochastic volatility models dominate for upside risk, though forecast combinations that include the QBVAR recover these losses. The results show that modeling the conditional distribution yields substantial gains for tail risk assessment, particularly during major oil market disruptions. |
| Date: | 2026–04 |
| URL: | https://d.repec.org/n?u=RePEc:bny:wpaper:0148 |
| By: | Hardy, Nicolas; Korobilis, Dimitris |
| Abstract: | Composite quantile regression (CQR) is a robust and efficient estimator under heavy-tailed and contaminated errors. Existing Bayesian extensions rely on working likelihoods that require latent-variable augmentation and can deliver poorly calibrated credible intervals. We develop generalized Bayesian CQR, which exponentiates the composite quantile loss directly, targeting the same objective as frequentist CQR. Because generalized Bayes replaces point optimization with posterior averaging over the loss surface, it is especially relevant under heavy-tailed errors where the composite quantile loss flattens near its minimum. In generalized Bayes, posterior dispersion depends on a learning rate that we calibrate by matching marginal variances to their frequentist sandwich counterparts. The resulting credible intervals achieve near-nominal coverage in cross-sectional settings and substantially reduce the undercoverage of i.i.d. intervals under serial dependence, with a residual shortfall under high persistence that mirrors the finite-sample bias of frequentist HAC inference. The calibration has a closed-form solution under flat priors and extends to normal and spike-and-slab LASSO priors for shrinkage and variable selection. Sampling uses standard Metropolis-Hastings with no latent variables, achieving roughly 100-fold computational gains over likelihood-based Bayesian CQR at a common quantile grid. Monte Carlo experiments show competitive or improved point estimation relative to frequentist CQR, reliable coverage, and robust variable selection across Gaussian, heavy-tailed, and contaminated error distributions. An equity premium forecasting application demonstrates that the efficiency and robustness gains translate into economically meaningful improvements in out-of-sample portfolio performance. |
| Keywords: | Composite quantile regression, Gibbs posterior, Generalized Bayes, Learning rate calibration, Equity premium forecasting, Spike-and-slab priors |
| JEL: | C11 C14 C21 C52 C53 E37 G17 |
| Date: | 2026–04–14 |
| URL: | https://d.repec.org/n?u=RePEc:pra:mprapa:128752 |
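The idea of exponentiating the composite quantile loss can be illustrated with a small sketch. It is not the authors' implementation: the function names, the flat prior, and the default `learning_rate` are assumptions made for illustration, and the abstract's calibration of the learning rate is not reproduced here. The sketch only shows how a common slope vector and quantile-specific intercepts enter one summed check loss, and how that loss becomes an unnormalised log posterior.

```python
def check_loss(u, tau):
    """Quantile check (pinball) loss: u * tau if u >= 0, else u * (tau - 1)."""
    return u * (tau - 1.0) if u < 0 else u * tau


def composite_quantile_loss(y, X, beta, intercepts, taus):
    """Sum the check loss over observations and quantile levels.

    All quantile levels share the slope vector `beta`; each level tau_k
    has its own intercept. `X` is a list of rows (lists of regressors).
    """
    total = 0.0
    for y_i, x_i in zip(y, X):
        fit = sum(b * x for b, x in zip(beta, x_i))
        for b_k, tau in zip(intercepts, taus):
            total += check_loss(y_i - b_k - fit, tau)
    return total


def log_gibbs_posterior(y, X, beta, intercepts, taus, learning_rate=1.0):
    """Unnormalised log Gibbs posterior under a flat prior (an assumption).

    The learning rate scales the loss and therefore the posterior spread;
    here it is a plain user-supplied number.
    """
    return -learning_rate * composite_quantile_loss(y, X, beta, intercepts, taus)
```

A function like `log_gibbs_posterior` could serve as the target density of a random-walk Metropolis-Hastings sampler, since it needs no latent variables.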
| By: | Nicolas Hardy; Dimitris Korobilis |
| Abstract: | Composite quantile regression (CQR) is a robust and efficient estimator under heavy-tailed and contaminated errors. Existing Bayesian extensions rely on working likelihoods that require latent-variable augmentation and can deliver poorly calibrated credible intervals. We develop generalized Bayesian CQR, which exponentiates the composite quantile loss directly, targeting the same objective as frequentist CQR. Because generalized Bayes replaces point optimization with posterior averaging over the loss surface, it is especially relevant under heavy-tailed errors where the composite quantile loss flattens near its minimum. In generalized Bayes, posterior dispersion depends on a learning rate that we calibrate by matching marginal variances to their frequentist sandwich counterparts. The resulting credible intervals achieve near-nominal coverage in cross-sectional settings and substantially reduce the undercoverage of i.i.d. intervals under serial dependence, with a residual shortfall under high persistence that mirrors the finite-sample bias of frequentist HAC inference. The calibration has a closed-form solution under flat priors and extends to normal and spike-and-slab LASSO priors for shrinkage and variable selection. Sampling uses standard Metropolis-Hastings with no latent variables, achieving roughly 100-fold computational gains over likelihood-based Bayesian CQR at a common quantile grid. Monte Carlo experiments show competitive or improved point estimation relative to frequentist CQR, reliable coverage, and robust variable selection across Gaussian, heavy-tailed, and contaminated error distributions. An equity premium forecasting application demonstrates that the efficiency and robustness gains translate into economically meaningful improvements in out-of-sample portfolio performance. |
| Date: | 2026–04 |
| URL: | https://d.repec.org/n?u=RePEc:bny:wpaper:0149 |
| By: | Ramón Talvi Robledo; Christopher Rauh; Ben Seimon; Hannes Mueller; Laura Mayoral |
| Abstract: | Forced displacement is an important policy challenge, yet forecasting is hindered by sparse, annually observed flow data and reporting delays. This article proposes a forecasting method for country outflows and dyadic flows tailored to this sparse data setting. We combine slow-moving structural predictors with high-frequency text-based signals, compress high-dimensional news into low-dimensional topic representations via Latent Dirichlet Allocation to mitigate overfitting, and estimate a stacked ensemble of gradient-boosted trees that captures non-linear origin–destination interactions while making optimal use of the available data. We further apply conformal prediction to construct statistically valid prediction intervals for bilateral flows. Analysis of the text component shows that destination-specific search intensity of migration terms is a central predictor of subsequent dyadic displacement flows. |
| Keywords: | conformal prediction, dyadic, early warning, forced displacement, forecasting, Google Trends, machine learning |
| JEL: | P16 C53 D72 |
| Date: | 2026–04 |
| URL: | https://d.repec.org/n?u=RePEc:bge:wpaper:1573 |
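The abstract does not say which conformal variant the authors use. As a rough illustration only, the sketch below assumes split conformal prediction with absolute-residual scores on a held-out calibration set; the function name and arguments are hypothetical and not taken from the paper.

```python
import math


def split_conformal_interval(cal_true, cal_pred, new_pred, alpha=0.1):
    """Return a (lower, upper) prediction interval around `new_pred`.

    Assumes split conformal prediction: the calibration points were not
    used to fit the model and are exchangeable with the new observation.
    """
    # Nonconformity scores: absolute residuals on the calibration set.
    scores = sorted(abs(t - p) for t, p in zip(cal_true, cal_pred))
    n = len(scores)
    # Finite-sample rank ceil((n + 1) * (1 - alpha)) among the sorted scores.
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        # Too few calibration points for this level: interval is unbounded.
        return (-math.inf, math.inf)
    q = scores[rank - 1]
    return (new_pred - q, new_pred + q)
```

Under the stated exchangeability assumption, an interval built this way covers the new observation with probability at least 1 - alpha, regardless of the underlying forecasting model.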
| By: | Dalibor Stevanovic |
| Abstract: | This paper studies the 2021 U.S. inflation forecasting failure. The author shows that the failure was primarily driven by sample composition rather than functional-form misspecification: estimation samples dominated by the Great Moderation underweight supply-shock regimes, and expectations anchored to that regime were slow to recognize the shift. Three historically informed adjustments, an intercept correction, a similarity re-estimation on 1970s data, and a kernel-weighted estimator, substantially close the forecast gap, and the gains extend to eight additional U.S. price indices. Household survey respondents over 60, whose lifetime includes the 1970s, reported higher inflation expectations from early 2021, consistent with experience-based learning; younger cohorts remained anchored to the prevailing regime. A controlled experiment with large language models conditioned on “experienced” and “young” professional personas confirms that experiential priors generate significant forecast differences under a common training leakage assumption. Across all three exercises, the source of the prior mattered more than the sophistication of the model. |
| Keywords: | Inflation forecasting, regime change, historical analogy, experience-based learning, expectations anchoring, large language models |
| JEL: | C22 C53 D84 E31 E37 |
| Date: | 2026–04–22 |
| URL: | https://d.repec.org/n?u=RePEc:cir:cirwor:2026s-06 |
| By: | Dalibor Stevanovic (University of Quebec in Montreal) |
| Abstract: | This paper studies the 2021 U.S. inflation forecasting failure. I show that the failure was primarily driven by sample composition rather than functional-form misspecification: estimation samples dominated by the Great Moderation underweight supply-shock regimes, and expectations anchored to that regime were slow to recognize the shift. Three historically informed adjustments, an intercept correction, a similarity re-estimation on 1970s data, and a kernel-weighted estimator, substantially close the forecast gap, and the gains extend to eight additional U.S. price indices. Household survey respondents over 60, whose lifetime includes the 1970s, reported higher inflation expectations from early 2021, consistent with experience-based learning; younger cohorts remained anchored to the prevailing regime. A controlled experiment with large language models conditioned on “experienced” and “young” professional personas confirms that experiential priors generate significant forecast differences under a common training leakage assumption. Across all three exercises, the source of the prior mattered more than the sophistication of the model. |
| Keywords: | Inflation forecasting, regime change, historical analogy, experience-based learning, expectations anchoring, large language models |
| JEL: | C22 C53 D84 E31 E37 |
| Date: | 2026–04 |
| URL: | https://d.repec.org/n?u=RePEc:bbh:wpaper:26-02 |
| By: | Haibin Jiao |
| Abstract: | Shanghai Composite Index prediction has become a hot issue for many investors and academic researchers. Deep learning models are widely applied in multivariate time series forecasting, including recurrent neural networks (RNN), convolutional neural networks (CNN), and transformers. Specifically, the Transformer encoder, with its unique attention mechanism and parallel processing capabilities, has become an important tool in time series prediction, and has an advantage in dealing with long sequence dependencies and multivariate data correlations. Drawing on the strengths of various models, we propose the CNN-Transformer-LSTM Networks (CTLNet). This paper explores the application of CTLNet for Shanghai Composite Index prediction and the comparative experiments show that the proposed model outperforms state-of-the-art baselines. |
| Date: | 2026–04 |
| URL: | https://d.repec.org/n?u=RePEc:arx:papers:2604.16835 |
| By: | Gould, Elliot (Interdisciplinary MetaResearch Group (SCORE Project)); Gray, Charles T.; Willcox, Aaron (Melbourne University); O'Dea, Rose E; Groenewegen, Rebecca; Wilkinson, David Peter |
| Abstract: | Structured elicitation protocols, such as the IDEA protocol, are used to elicit probabilistic judgements from multiple domain experts about uncertain events across fields including ecology, biosecurity risk assessment, and metascience. Individual expert judgements must subsequently be mathematically aggregated into a single group forecast. While the simplest case involves combining a set of point-estimates from multiple individuals, this process is further complicated when judgements include uncertainty bounds, or when elicitation is conducted across multiple rounds. This paper presents aggreCAT, an open-source R package that provides 29 aggregation methods for combining individual expert judgements into a single probabilistic estimate, accommodating designs ranging from single-round point estimates to multi-round three-point elicitation. The package follows tidy data principles, enabling straightforward integration with existing R workflows for application at scale. Methods range from unweighted arithmetic combinations to performance-weighted schemes and Bayesian models, with weights derived from uncertainty intervals, shifts in judgements between elicitation rounds, and breadth of expert reasoning. We provide worked examples illustrating the mechanics of representative aggregation methods, a general workflow for batch aggregation across multiple forecasts and methods, and built-in functions for evaluating and visualising forecast performance against known outcomes. aggreCAT fills a substantive gap in open software for mathematically aggregating expert judgement, and is intended to support researchers and decision analysts in rapidly and rigorously synthesising outputs from structured elicitation exercises. |
| Date: | 2026–04–14 |
| URL: | https://d.repec.org/n?u=RePEc:osf:metaar:74tfv_v2 |
| By: | Yonggeun Jung |
| Abstract: | Satellite data are increasingly used to measure economic activity, yet port-level trade remains largely unmeasured from space. This paper combines synthetic aperture radar imagery, nighttime lights, and port characteristics to measure monthly port-level maritime trade using only publicly available data. The model achieves strong out-of-sample accuracy for U.S. ports, with satellite signals and port attributes playing complementary roles. While absolute levels are difficult to extrapolate beyond the training domain, percentage changes are reliably recovered, as we confirm through a leave-one-region-out exercise and Monte Carlo simulation. Applying the framework to Russian ports after the 2022 sanctions, we detect shifts consistent with trade reorientation toward the Far East. The approach complements AIS-based methods by remaining robust to strategic signal manipulation. |
| Date: | 2026–04 |
| URL: | https://d.repec.org/n?u=RePEc:arx:papers:2604.15444 |
| By: | Sotirios D. Nikolopoulos |
| Abstract: | Adaptive specification search generates statistically significant backtests even under martingale-difference nulls. We introduce a falsification audit testing complete predictive workflows against synthetic reference classes, including zero-predictability environments and microstructure placebos. Workflows generating significant walk-forward evidence in these environments are falsified. For passing workflows, we quantify selection-induced performance inflation using an absolute magnitude gap linking optimized in-sample evidence to disjoint walk-forward realizations, adjusted for effective multiplicity. Simulations validate extreme-value scaling under correlated searches and demonstrate detection power under genuine structure. Empirical case studies confirm that many apparent findings represent methodological artifacts rather than genuine predictability. |
| Date: | 2026–04 |
| URL: | https://d.repec.org/n?u=RePEc:arx:papers:2604.15531 |
| By: | Philipp Wegmuller; Jan P.A.M. Jacobs; Marc Burri |
| Abstract: | We compile a harmonised real-time vintage dataset of quarterly GDP releases for 12 countries as well as the European Union and euro area aggregates, and evaluate the quality of flash estimates relative to later releases and more mature benchmarks. We document substantial cross-country heterogeneity in revision behaviour, with revision magnitudes increasing markedly during periods of elevated volatility. We further show that revision-aware state-space methods can, in some settings and depending on the evaluation benchmark, improve upon the raw flash release as an approximation to more mature GDP growth. Overall, the results highlight the trade-off between timeliness and precision in early national-accounts data and show that the real-time reliability of flash GDP depends importantly on benchmark choice, revision dynamics, and national compilation practices. |
| Keywords: | GDP, advanced releases, revisions, nowcasting, state-space models |
| JEL: | E23 E32 E37 |
| Date: | 2026–04 |
| URL: | https://d.repec.org/n?u=RePEc:een:camaaa:2026-26 |