nep-for New Economics Papers
on Forecasting
Issue of 2022‒04‒04
eight papers chosen by
Rob J Hyndman
Monash University

  1. Volatility forecasting with machine learning and intraday commonality By Chao Zhang; Yihuang Zhang; Mihai Cucuringu; Zhongmin Qian
  2. Forecasting US Inflation Using Bayesian Nonparametric Models By Todd E. Clark; Florian Huber; Gary Koop; Massimiliano Marcellino
  3. Forecasting GDP growth using stock returns in Japan: A factor-augmented MIDAS approach By Morita, Hiroshi
  4. Nowcasting real GDP in Tunisia using large datasets and mixed-frequency models By Hagher Ben Rhomdhane; Brahim Mehdi Benlallouna
  5. Explainable Artificial Intelligence: interpreting default forecasting models based on Machine Learning By Giuseppe Cascarino; Mirko Moscatelli; Fabio Parlapiano
  6. Optimal Forecast under Structural Breaks By Tae-Hwy Lee; Shahnaz Parsaeian; Aman Ullah
  7. Variational Bayes in State Space Models: Inferential and Predictive Accuracy By David T. Frazier; Gael M. Martin; Ruben Loaiza-Maya
  8. Predicting refugee flows from Ukraine with an approach to Big (Crisis) Data: a new opportunity for refugee and humanitarian studies By Jurić, Tado

  1. By: Chao Zhang; Yihuang Zhang; Mihai Cucuringu; Zhongmin Qian
    Abstract: We apply machine learning models to forecast intraday realized volatility (RV), by exploiting commonality in intraday volatility via pooling stock data together, and by incorporating a proxy for the market volatility. Neural networks dominate linear regressions and tree models in terms of performance, due to their ability to uncover and model complex latent interactions among variables. Our findings remain robust when we apply trained models to new stocks that have not been included in the training set, thus providing new empirical evidence for a universal volatility mechanism among stocks. Finally, we propose a new approach to forecasting one-day-ahead RVs using past intraday RVs as predictors, and highlight interesting diurnal effects that aid the forecasting mechanism. The results demonstrate that the proposed methodology yields superior out-of-sample forecasts over a strong set of traditional baselines that only rely on past daily RVs.
    Date: 2022–02
  2. By: Todd E. Clark; Florian Huber; Gary Koop; Massimiliano Marcellino
    Abstract: The relationship between inflation and predictors such as unemployment is potentially nonlinear with a strength that varies over time, and prediction errors may be subject to large, asymmetric shocks. Inspired by these concerns, we develop a model for inflation forecasting that is nonparametric both in the conditional mean and in the error using Gaussian and Dirichlet processes, respectively. We discuss how both these features may be important in producing accurate forecasts of inflation. In a forecasting exercise involving CPI inflation, we find that our approach has substantial benefits, both overall and in the left tail, with nonparametric modeling of the conditional mean being of particular importance.
    Keywords: nonparametric regression; Gaussian process; Dirichlet process mixture; inflation forecasting
    JEL: C11 C32 C53
    Date: 2022–03–02
  3. By: Morita, Hiroshi
    Abstract: Asset prices reflect expectations of future economic conditions. In this study, we use the property of asset prices, especially stock prices, to forecast the GDP growth rate in Japan. For optimal use of the rich time-series and cross-sectional information of stock prices, we combine MIDAS (mixed-data sampling) regression and factor analysis to examine which dimensions of information contribute to the accuracy of the GDP growth rate forecast. Our results show that the use of factors significantly improves forecast accuracy and that extracting factors from a broader set of stock prices further improves accuracy. This highlights the important role of cross-sectional stock market information in forecasting macroeconomic activity.
    Keywords: Forecasting, MIDAS regression, factor model, stock returns
    JEL: C22 C53 E37
    Date: 2022–03
  4. By: Hagher Ben Rhomdhane (Central Bank of Tunisia); Brahim Mehdi Benlallouna (Central Bank of Tunisia)
    Abstract: This study aims to construct a new monthly leading indicator for Tunisian economic activity and to forecast Tunisian quarterly real GDP (RGDP) using several mixed-frequency models. These include a mixed dynamic factor model, unrestricted mixed-data sampling (UMIDAS), and a three-pass regression filter (3PRF) developed at the Central Bank of Tunisia, based on a monthly/quarterly set of economic and financial indicators as predictors. Our methodology is based on direct and indirect approaches. The direct approach nowcasts aggregate RGDP. The indirect approach is a disaggregated approach based on the output side of GDP (manufacturing, non-manufacturing, and services) using a set of available monthly indicators by sector. Furthermore, mixed-frequency dynamic factor models and unrestricted MIDAS perform well in terms of root mean squared errors compared to the benchmark VAR(2) model. The forecast errors derived from the disaggregated approach during the recent COVID period are smaller than those derived from classical models such as VAR(2). In our model, we used indicators such as electricity consumption by sector, stock market index detailed by sector, and international economic surveys to capture the pandemic effect. The financial variables improve forecasting for all horizons. Additionally, we find that it is better to employ several UMIDAS-ARs, one for each component of GDP at constant prices, and to pool the results rather than relying on aggregated GDP, specifically in volatile times.
    Keywords: Mixed-Frequency Data Sampling; Nowcasting; short-term forecasting
    JEL: E37 C55 F17 O11
    Date: 2022–03–07
  5. By: Giuseppe Cascarino (Bank of Italy); Mirko Moscatelli (Bank of Italy); Fabio Parlapiano (Bank of Italy)
    Abstract: Forecasting models based on machine learning (ML) algorithms have been shown to outperform traditional models in several applications. The lack of an easily interpretable functional form, however, is a major challenge for their adoption, especially when a knowledge of the estimated relationships and an explanation of individual forecasts are needed, for instance due to regulatory requirements or when forecasts are used in policy making. We apply some of the most established methods from the eXplainable Artificial Intelligence (XAI) literature to shed light on the random forest corporate default forecasting model in Moscatelli et al. (2019) applied to Italian non-financial firms. The methods provide insight into the relative importance of financial and credit variables to predict firms’ financial distress. We complement the analysis by showing how the importance of these variables in explaining default risk changes over time in the period 2009-19. When financial conditions deteriorate, the variables characterized by a more complex relationship with financial distress, such as firms’ liquidity and indebtedness indicators, become more important in predicting borrowers’ defaults. We also discuss how ML models could enhance the accuracy of credit assessment for those borrowers with less developed credit relationships, such as smaller firms.
    Keywords: explainable artificial intelligence, model-agnostic explainability, artificial intelligence, machine learning, credit scoring, fintech
    JEL: G2 C52 C55 D83
    Date: 2022–03
  6. By: Tae-Hwy Lee (Department of Economics, University of California at Riverside, CA 92521); Shahnaz Parsaeian (Department of Economics, University of Kansas, Lawrence, KS 66045); Aman Ullah (Department of Economics, University of California at Riverside, CA 92521)
    Abstract: This paper develops an optimal combined estimator to forecast out-of-sample under structural breaks. When it comes to forecasting, using only the post-break observations after the most recent break point may not be optimal. In this paper we propose a new estimation method that exploits the pre-break information. In particular, we show how to combine the estimator using the full sample (i.e., both the pre-break and post-break data) and the estimator using only the post-break sample. The full-sample estimator is efficient but inconsistent when there is a break. The post-break estimator is consistent but inefficient. Hence, depending on the severity of the breaks, the full-sample estimator and the post-break estimator can be combined to balance consistency and efficiency. We derive the Stein-like combined estimator of the full-sample and the post-break estimators to balance the bias-variance trade-off. The combination weight depends on the break severity, which we measure by the Wu-Hausman statistic. We examine the properties of the proposed method analytically in theory, numerically in simulation, and empirically in forecasting real output growth across nine industrial economies.
    Keywords: Structural breaks, Combined estimator
    Date: 2021–01
  7. By: David T. Frazier; Gael M. Martin; Ruben Loaiza-Maya
    Abstract: Using theoretical and numerical results, we document the accuracy of commonly applied variational Bayes methods across a range of state space models. The results demonstrate that, in terms of accuracy on fixed parameters, there is a clear hierarchy in terms of the methods, with approaches that do not approximate the states yielding superior accuracy over methods that do. We also document numerically that the inferential discrepancies between the various methods often yield only small discrepancies in predictive accuracy over small out-of-sample evaluation periods. Nevertheless, in certain settings, these predictive discrepancies can become meaningful over a longer out-of-sample period. This finding indicates that the invariance of predictive results to inferential inaccuracy, which has been an oft-touted point made by practitioners seeking to justify the use of variational inference, is not ubiquitous and must be assessed on a case-by-case basis.
    Keywords: state space models, variational inference, probabilistic forecasting, Bayesian consistency, scoring rules
    Date: 2022
  8. By: Jurić, Tado
    Abstract: Background: This paper shows that Big Data and the so-called tools of digital demography, such as Google Trends (GT) and insights from social networks such as Instagram, Twitter and Facebook, can be useful for determining, estimating, and predicting the forced migration flows to the EU caused by the war in Ukraine. Objective: The objective of this study was to test the usefulness of Google Trends indexes to predict further forced migration from Ukraine to the EU (mainly to Germany) and gain demographic insights from social networks into the age and gender structure of refugees. Methods: The primary methodological concept of our approach is to monitor the digital trace of Internet searches in Ukrainian, Russian and English with the Google Trends analytical tool. Initially, keywords were chosen that are most predictive, specific, and common enough to predict the forced migration from Ukraine. We requested the data before and during the war outbreak and divided the keyword frequency for each migration-related query to standardise the data. We compared this search frequency index with official statistics from UNHCR to test the significance of the results and correlations and to test the model's predictive potential. Since UNHCR does not yet have complete data on the demographic structure of refugees, to fill this gap, we used three other alternative Big Data sources: Facebook, Twitter and Instagram. Results: All tested migration-related search queries about emigration planning from Ukraine show a positive linear association between the Google index and data from official UNHCR statistics; R² = 0.1211 for searches in Russian and R² = 0.1831 for searches in Ukrainian. We note that Ukrainians use the Russian language more often than Ukrainian to search for terms. Increases in migration-related search activities in Ukraine such as граница (Rus. border), кордону (Ukr. border); Польща (Poland); Германия (Rus. Germany), Німеччина (Ukr. Germany) and Угорщина and Венгрия (Hungary) correlate strongly with official UNHCR data for externally displaced persons from Ukraine. All three languages show that the interest in Poland is the highest. When refugees arrive in nearby countries, the search for terms related to Germany, such as crossing the border + Germany, etc., is proliferating. This result confirms our hypothesis that one-third of all refugees will cross into Germany. According to Big Data insights, the estimated total number of expected refugees is 5.4 million. The age group most represented is between 24 and 45 years (data for children are unavailable), and over 65% are women. Conclusion: The increase in migration-related search queries is correlated with the rise in the number of refugees from Ukraine in the EU. Thus this method allows reliable forecasts. Understanding the consequences of forced migration from Ukraine is crucial to enabling UNHCR and governments to develop optimal humanitarian strategies and prepare for refugee reception and possible integration. The benefit of this method is reliable estimates and forecasting that can allow governments and UNHCR to prepare and better respond to the recent humanitarian crisis.
    Keywords: refugee, Ukraine, Big Data, forced migration, Google Trends, UNHCR
    Date: 2022
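
Illustrative note on item 1: the abstract forecasts realized volatility (RV) and uses past intraday RVs as predictors of the next day's RV. The sketch below shows one common way such quantities are built, summing squared log returns over a window. It is an assumption-laden illustration, not the authors' pipeline: the price series is made up, the bucket size is arbitrary, the function names are hypothetical, and the paper may define RV differently (for example as the square root of this sum).

```python
import math

def log_returns(prices):
    """Log returns between consecutive prices."""
    return [math.log(b / a) for a, b in zip(prices, prices[1:])]

def realized_variance(returns):
    """Sum of squared returns over the window."""
    return sum(r * r for r in returns)

def bucketed_rv(returns, bucket_size):
    """Realized variance for consecutive intraday buckets of `bucket_size` returns."""
    return [realized_variance(returns[i:i + bucket_size])
            for i in range(0, len(returns), bucket_size)]

# Hypothetical intraday prices for one stock on one day (illustrative only).
prices = [100.0, 100.5, 100.2, 101.0, 100.8, 100.9, 101.4]
rets = log_returns(prices)        # 6 intraday returns
intraday = bucketed_rv(rets, 2)   # 3 intraday buckets, candidate predictors
daily = realized_variance(rets)   # daily measure; equals the sum of the buckets
```

Because the buckets partition the day, the intraday values add up to the daily value, so a model fed the buckets sees the same total plus its time-of-day profile.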

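Illustrative note on item 6: the abstract combines a full-sample estimator with a post-break estimator, with a weight driven by a Wu-Hausman statistic. The sketch below shows the general idea for a single-regressor model through the origin with one known break date. It is not the authors' estimator: the weight rule w = min(1, tau / H), the default tau = 1.0, the simulated data, and all function names are assumptions made for this example.

```python
import random

def slope(x, y):
    """OLS slope for a regression through the origin: y = b * x + e."""
    return sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)

def combined_slope(x, y, break_idx, tau=1.0):
    """Shrink the post-break slope toward the full-sample slope.

    Returns (combined_slope, weight_on_full_sample).
    Requires 0 < break_idx < len(x) - 1 so both variances are defined.
    """
    x_post, y_post = x[break_idx:], y[break_idx:]
    b_full = slope(x, y)
    b_post = slope(x_post, y_post)
    # Residual variance from the post-break fit (one estimated parameter).
    resid = [yi - b_post * xi for xi, yi in zip(x_post, y_post)]
    s2 = sum(r * r for r in resid) / (len(x_post) - 1)
    # Variance of the post-break slope minus variance of the full-sample slope.
    v_gap = s2 / sum(a * a for a in x_post) - s2 / sum(a * a for a in x)
    # Hausman-type distance: large when the two slopes disagree (severe break).
    h = (b_post - b_full) ** 2 / v_gap
    # Hypothetical weight rule: full sample when h is small, post-break when large.
    w = 1.0 if h <= tau else tau / h
    return w * b_full + (1.0 - w) * b_post, w

# Simulated data with a large break in the slope at t = 100 (illustrative only).
random.seed(0)
x = [random.gauss(0, 1) for _ in range(200)]
y = [(2.0 if t < 100 else -2.0) * xt + random.gauss(0, 1) for t, xt in enumerate(x)]
b, w = combined_slope(x, y, break_idx=100)
```

With a break this large the distance is big, so almost all weight goes to the post-break estimate; with no break the weight moves toward the more efficient full-sample estimate.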
This nep-for issue is ©2022 by Rob J Hyndman. It is provided as is without any express or implied warranty. It may be freely redistributed in whole or in part for any purpose. If distributed in part, please include this notice.
General information on the NEP project can be found at For comments please write to the director of NEP, Marco Novarese at <>. Put “NEP” in the subject, otherwise your mail may be rejected.
NEP’s infrastructure is sponsored by the School of Economics and Finance of Massey University in New Zealand.