nep-for New Economics Papers
on Forecasting
Issue of 2019‒08‒19
three papers chosen by
Rob J Hyndman
Monash University

  1. Machine Learning for Forecasting Excess Stock Returns – The Five-Year-View By Ioannis Kyriakou; Parastoo Mousavi; Jens Perch Nielsen; Michael Scholz
  2. A Vine-copula extension for the HAR model By Martin Magris
  3. Clustering, Forecasting and Cluster Forecasting: using k-medoids, k-NNs and random forests for cluster selection By Dinesh Reddy Vangumalli; Konstantinos Nikolopoulos; Konstantia Litsiou

  1. By: Ioannis Kyriakou (Cass Business School, City, University of London, UK); Parastoo Mousavi (Cass Business School, City, University of London, UK); Jens Perch Nielsen (Cass Business School, City, University of London, UK); Michael Scholz (University of Graz, Austria)
    Abstract: In this paper, we apply machine learning to forecast stock returns in excess of different benchmarks, including the short-term interest rate, long-term interest rate, earnings-by-price ratio, and inflation. In particular, we adopt and implement a fully nonparametric smoother with the covariates and the smoothing parameter chosen by cross-validation. We find that for both one-year and five-year returns, the term spread is, overall, the most powerful predictive variable for excess stock returns. Different combinations of covariates then achieve higher predictability at different forecast horizons. Nevertheless, the set of earnings-by-price and term spread predictors under the inflation benchmark strikes the right balance between the one-year and five-year horizons.
    Keywords: Benchmark; Cross-validation; Prediction; Stock returns; Long-term forecasts; Overlapping returns; Autocorrelation
    JEL: C14 C53 C58 G17 G22
    Date: 2019–08
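    The core idea of the first paper, a fully nonparametric smoother whose bandwidth is chosen by cross-validation, can be illustrated with a minimal sketch. This is not the authors' implementation: it uses a simple Nadaraya-Watson smoother with one predictor, leave-one-out cross-validation, and simulated data standing in for excess returns.

```python
import numpy as np

def nw_smoother(x_train, y_train, x_eval, h):
    """Nadaraya-Watson kernel regression with a Gaussian kernel and bandwidth h."""
    d = (x_eval[:, None] - x_train[None, :]) / h
    w = np.exp(-0.5 * d ** 2)              # Gaussian kernel weights
    return (w @ y_train) / w.sum(axis=1)

def loo_cv_bandwidth(x, y, bandwidths):
    """Pick the bandwidth minimising leave-one-out squared prediction error."""
    best_h, best_err = None, np.inf
    n = len(x)
    for h in bandwidths:
        err = 0.0
        for i in range(n):
            mask = np.arange(n) != i       # drop observation i
            pred = nw_smoother(x[mask], y[mask], x[i:i + 1], h)[0]
            err += (y[i] - pred) ** 2
        if err < best_err:
            best_h, best_err = h, err
    return best_h

# Toy data: a nonlinear signal plus noise, standing in for excess returns
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 80)
y = np.sin(2 * x) + 0.2 * rng.standard_normal(80)
h = loo_cv_bandwidth(x, y, [0.05, 0.1, 0.2, 0.4])
fit = nw_smoother(x, y, x, h)
```

The paper additionally selects which covariates enter the smoother, which amounts to running the same cross-validation over subsets of predictors and keeping the subset with the lowest validation error.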
  2. By: Martin Magris
    Abstract: The heterogeneous autoregressive (HAR) model is revised by modeling the joint distribution of the four partial-volatility terms involved therein, namely today's, yesterday's, last week's and last month's volatility components. The joint distribution relies on a (C-)Vine copula construction, allowing volatility forecasts to be conveniently extracted from the conditional expectation of today's volatility given its past terms. The proposed empirical application involves more than seven years of high-frequency transaction prices for ten stocks and evaluates the in-sample, out-of-sample and one-step-ahead forecast performance of our model for daily realized-kernel measures. The model proposed in this paper is shown to outperform the HAR counterpart under different models for marginal distributions, copula construction methods, and forecasting settings.
    Date: 2019–07
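    For context on the second paper, the baseline HAR model it extends regresses tomorrow's realized volatility on today's value and its trailing weekly (5-day) and monthly (22-day) averages. Below is a sketch of that baseline fitted by OLS; the vine-copula extension in the paper replaces this linear conditional mean with one derived from a copula-based joint distribution. The simulated series is a placeholder for realized-kernel data.

```python
import numpy as np

def har_features(rv):
    """Build daily, weekly (5-day) and monthly (22-day) average RV regressors."""
    rows, target = [], []
    for t in range(21, len(rv) - 1):
        rows.append([1.0,                      # intercept
                     rv[t],                    # daily component
                     rv[t - 4:t + 1].mean(),   # weekly average
                     rv[t - 21:t + 1].mean()]) # monthly average
        target.append(rv[t + 1])
    return np.array(rows), np.array(target)

# Simulated persistent, positive volatility series (placeholder data)
rng = np.random.default_rng(1)
e = rng.standard_normal(500)
rv = np.empty(500)
rv[0] = 1.0
for t in range(1, 500):
    rv[t] = 0.1 + 0.9 * rv[t - 1] + 0.1 * abs(e[t])

X, y = har_features(rv)
beta, *_ = np.linalg.lstsq(X, y, rcond=None)   # OLS HAR coefficients
forecast = X[-1] @ beta                        # one-step-ahead forecast
```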
  3. By: Dinesh Reddy Vangumalli (Oracle America Inc); Konstantinos Nikolopoulos (Bangor University); Konstantia Litsiou (Manchester Metropolitan University)
    Abstract: When facing a forecasting task involving a large number of time series, data analysts regularly employ one of two methodological approaches: either select a single forecasting method for the entire dataset (aggregate selection), or use the best forecasting method for each time series (individual selection). There is evidence in the predictive analytics literature that the former is more robust than the latter, since individual selection tends to overfit models to the data. A third approach is to first identify homogeneous clusters within the dataset, and then select a single forecasting method for each cluster (cluster selection). This research examines the performance of three well-known machine learning clustering methods: k-medoids, k-NN and random forests. We then forecast every cluster with the best possible method, and compare the performance to that of aggregate selection. The aforementioned methods are very often used for classification tasks, but since in our case there is no set of predefined classes, the methods are used for pure clustering. The evaluation is performed on the 645 yearly series of the M3 competition. The empirical evidence suggests that: a) random forests provide the best clusters for the sequential forecasting task, and b) cluster selection has the potential to outperform aggregate selection.
    Keywords: Clustering; k-medoids; Nearest Neighbors; Random Forests; Forecasting
    Date: 2019–08
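    The cluster-selection idea in the third paper can be sketched as follows. This is an illustrative toy, not the paper's pipeline: series are summarised by a few hand-picked features, grouped by a plain k-medoids loop, and each cluster is then assigned whichever of two simple forecasting methods (naive and drift, both hypothetical stand-ins for the candidate pool) has the lower one-step holdout error.

```python
import numpy as np

def features(s):
    """Per-series features: level, trend slope, lag-1 autocorrelation."""
    t = np.arange(len(s))
    return np.array([s.mean(), np.polyfit(t, s, 1)[0],
                     np.corrcoef(s[:-1], s[1:])[0, 1]])

def k_medoids(X, k, iters=20, seed=0):
    """Plain k-medoids on Euclidean distances; returns cluster labels."""
    rng = np.random.default_rng(seed)
    D = np.linalg.norm(X[:, None] - X[None, :], axis=2)
    medoids = rng.choice(len(X), k, replace=False)
    for _ in range(iters):
        labels = D[:, medoids].argmin(axis=1)
        new = medoids.copy()
        for j in range(k):
            members = np.flatnonzero(labels == j)
            if len(members):                    # keep old medoid if cluster empty
                within = D[np.ix_(members, members)].sum(axis=1)
                new[j] = members[within.argmin()]
        if np.array_equal(new, medoids):
            break
        medoids = new
    return D[:, medoids].argmin(axis=1)

def naive(s):  # last observed value
    return s[-1]

def drift(s):  # last value plus average historical change
    return s[-1] + (s[-1] - s[0]) / (len(s) - 1)

# Toy dataset: five trending and five flat series (stand-ins for M3 series)
rng = np.random.default_rng(2)
series = ([10 + 0.8 * np.arange(20) + rng.standard_normal(20) for _ in range(5)]
          + [10 + rng.standard_normal(20) for _ in range(5)])
labels = k_medoids(np.array([features(s) for s in series]), k=2)

# Within each cluster, select the method with the lower one-step holdout error
for j in sorted(set(labels)):
    members = [series[i] for i in range(len(series)) if labels[i] == j]
    errs = {m.__name__: np.mean([(s[-1] - m(s[:-1])) ** 2 for s in members])
            for m in (naive, drift)}
    chosen = min(errs, key=errs.get)
```

The paper's contribution is in how the clusters are formed (comparing k-medoids, k-NN and random forests used for pure clustering); the per-cluster method selection step is the same regardless of the clustering algorithm.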

This nep-for issue is ©2019 by Rob J Hyndman. It is provided as is without any express or implied warranty. It may be freely redistributed in whole or in part for any purpose. If distributed in part, please include this notice.
General information on the NEP project can be found at . For comments, please write to the director of NEP, Marco Novarese, at <>. Put “NEP” in the subject line; otherwise your mail may be rejected.
NEP’s infrastructure is sponsored by the School of Economics and Finance of Massey University in New Zealand.