nep-ecm New Economics Papers
on Econometrics
Issue of 2022‒11‒28
twenty-one papers chosen by
Sune Karlsson
Örebro universitet

  1. On Estimation and Inference of Large Approximate Dynamic Factor Models via the Principal Component Analysis By Matteo Barigozzi
  2. Low-rank Panel Quantile Regression: Estimation and Inference By Yiren Wang; Liangjun Su; Yichong Zhang
  3. Network Synthetic Interventions: A Framework for Panel Data with Network Interference By Anish Agarwal; Sarah Cen; Devavrat Shah; Christina Lee Yu
  4. Estimation of Heterogeneous Treatment Effects Using a Conditional Moment Based Approach By Xiaolin Sun
  5. Weak Identification in Low-Dimensional Factor Models with One or Two Factors By Gregory Cox
  6. Allowing for weak identification when testing GARCH-X type models By Philipp Ketz
  7. Efficient variational approximations for state space models By Rubén Loaiza-Maya; Didier Nibbering
  8. Unit Averaging for Heterogeneous Panels By Christian Brownlees; Vladislav Morozov
  9. Eigenvalue tests for the number of latent factors in short panels By Alain-Philippe Fortin; Patrick Gagliardini; Olivier Scaillet
  10. Online Instrumental Variable Regression: Regret Analysis and Bandit Feedback By Riccardo Della Vecchia; Debabrota Basu
  11. Reservoir Computing for Macroeconomic Forecasting with Mixed Frequency Data By Giovanni Ballarin; Petros Dellaportas; Lyudmila Grigoryeva; Marcel Hirt; Sophie van Huellen; Juan-Pablo Ortega
  12. The Anatomy of Out-of-Sample Forecasting Accuracy By Daniel Borup; Philippe Goulet Coulombe; Erik Christian Montes Schütte; David E. Rapach; Sander Schwenk-Nebbe
  13. Modelling the Bitcoin prices and the media attention to Bitcoin via the jump-type processes By Ekaterina Morozova; Vladimir Panov
  14. Differentiable State-Space Models and Hamiltonian Monte Carlo Estimation By David Childers; Jesús Fernández-Villaverde; Jesse Perla; Christopher Rackauckas; Peifan Wu
  15. There’s More in the Data! Using Month-Specific Information to Estimate Changes Before and After Major Life Events By Hudde, Ansgar; Jacob, Marita
  16. Uncertainty, Skewness and the Business Cycle - Through the MIDAS Lens By Efrem Castelnuovo; Lorenzo Mori
  17. Forecasting Inflation: A GARCH-in-Mean-Level Model with Time Varying Predictability. By Alessandra Canepa; Karanasos, Menelaos; Paraskevopoulos, Athanasios; Chini, Emilio Zanetti
  18. How to Measure Agreement, Consensus, and Polarization in Ordinal Data By Aeppli, Clem; Ruedin, Didier
  19. A New Test for Market Efficiency and Uncovered Interest Parity By Richard T. Baillie; Francis X. Diebold; George Kapetanios; Kun Ho Kim
  20. A parametric approach to the estimation of convex risk functionals based on Wasserstein distance By Max Nendel; Alessandro Sgarabottolo
  21. Do Pre-Registration and Pre-analysis Plans Reduce p-Hacking and Publication Bias? By Brodeur, Abel; Cook, Nikolai; Hartley, Jonathan; Heyes, Anthony

  1. By: Matteo Barigozzi
    Abstract: This paper revisits and provides an alternative derivation of the asymptotic results for the Principal Components estimator of a large approximate factor model as considered in Stock and Watson (2002), Bai (2003), and Forni et al. (2009). Results are derived under a minimal set of assumptions with a special focus on the time series setting, which is usually considered in almost all recent empirical applications. Hence, $n$ and $T$ are not treated symmetrically: the former is the dimension of the considered vector of time series, while the latter is the sample size and is therefore relevant only for estimation purposes, not for studying the properties of the model at the population level. As a consequence, following Stock and Watson (2002) and Forni et al. (2009), estimation is based on the classical $n \times n$ sample covariance matrix. As expected, all asymptotic results we derive are equivalent to those stated in Bai (2003), where, however, a $T\times T$ covariance matrix is considered as a starting point. A series of useful complementary results is also given. In particular, we give some alternative sets of primitive conditions for mean-squared consistency of the sample covariance matrix of the factors, of the idiosyncratic components, and of the observed time series. We also give more intuitive asymptotic expansions for the estimators showing that PCA is equivalent to OLS as long as $\sqrt{T}/n\to 0$ and $\sqrt{n}/T\to 0$; that is, loadings are estimated in a time series regression as if the factors were known, while factors are estimated in a cross-sectional regression as if the loadings were known. The issue of testing multiple restrictions on the loadings as well as building joint confidence intervals for the factors is discussed.
    Date: 2022–11
  2. By: Yiren Wang; Liangjun Su; Yichong Zhang
    Abstract: In this paper, we propose a class of low-rank panel quantile regression models which allow for unobserved slope heterogeneity over both individuals and time. We estimate the heterogeneous intercept and slope matrices via nuclear norm regularization followed by sample splitting, row- and column-wise quantile regressions and debiasing. We show that the estimators of the factors and factor loadings associated with the intercept and slope matrices are asymptotically normally distributed. In addition, we develop two specification tests: one for the null hypothesis that the slope coefficient is a constant over time and/or individuals under the case that true rank of slope matrix equals one, and the other for the null hypothesis that the slope coefficient exhibits an additive structure under the case that the true rank of slope matrix equals two. We illustrate the finite sample performance of estimation and inference via Monte Carlo simulations and real datasets.
    Date: 2022–10
  3. By: Anish Agarwal; Sarah Cen; Devavrat Shah; Christina Lee Yu
    Abstract: We propose a generalization of the synthetic controls and synthetic interventions methodology to incorporate network interference. We consider the estimation of unit-specific treatment effects from panel data where there are spillover effects across units and in the presence of unobserved confounding. Key to our approach is a novel latent factor model that takes into account network interference and generalizes the factor models typically used in panel data settings. We propose an estimator, "network synthetic interventions", and show that it consistently estimates the mean outcomes for a unit under an arbitrary sequence of treatments for itself and its neighborhood, given certain observation patterns hold in the data. We corroborate our theoretical findings with simulations.
    Date: 2022–10
  4. By: Xiaolin Sun
    Abstract: We propose a new estimator for heterogeneous treatment effects in a partially linear model (PLM) with many exogenous covariates and a possibly endogenous treatment variable. The PLM has a parametric part that includes the treatment and the interactions between the treatment and exogenous characteristics, and a nonparametric part that contains those characteristics and many other covariates. The new estimator is a combination of a Robinson transformation to partial out the nonparametric part of the model, the Smooth Minimum Distance (SMD) approach to exploit all the information of the conditional mean independence restriction, and a Neyman-orthogonalized first-order condition (FOC). With the SMD method, our estimator identifies both parameters using only one valid binary instrument. Under a sparsity assumption, regularized machine learning methods (i.e., the Lasso) allow us to choose a relatively small number of polynomials of covariates. The Neyman-orthogonalized FOC reduces the effect of the regularization bias on estimates of the parameters of interest. Our new estimator allows for many covariates and is less biased, consistent, and $\sqrt{n}$-asymptotically normal under standard regularity conditions. Our simulations show that our estimator behaves well with different sets of instruments, whereas GMM-type estimators do not. We estimate the heterogeneous treatment effects of Medicaid on individual outcome variables from the Oregon Health Insurance Experiment. We find that using our new method with only one valid instrument produces more significant and more reliable results for heterogeneous treatment effects of health insurance programs on economic outcomes than using GMM-type estimators.
    Date: 2022–10
  5. By: Gregory Cox
    Abstract: This paper describes how to reparameterize low-dimensional factor models to fit the weak identification theory developed for generalized method of moments (GMM) models. Identification conditions in low-dimensional factor models can be close to failing in a similar way to identification conditions in instrumental variables or GMM models. Weak identification estimation theory requires a reparameterization to separate the weakly identified parameters from the strongly identified parameters. Furthermore, identification-robust hypothesis tests benefit from a reparameterization that makes the nuisance parameters strongly identified. We describe such a reparameterization in low-dimensional factor models with one or two factors. Simulations show that identification-robust hypothesis tests that require the reparameterization are less conservative than identification-robust hypothesis tests that use the original parameterization. The simulations also show that estimates of the number of factors frequently include weakly identified factors. An empirical application to a factor model of parental investments in children is included.
    Date: 2022–11
  6. By: Philipp Ketz
    Abstract: In this paper, we use the results in Andrews and Cheng (2012), extended to allow for parameters to be near or at the boundary of the parameter space, to derive the asymptotic distributions of the two test statistics that are used in the two-step (testing) procedure proposed by Pedersen and Rahbek (2019). The latter aims at testing the null hypothesis that a GARCH-X type model, with exogenous covariates (X), reduces to a standard GARCH type model, while allowing the "GARCH parameter" to be unidentified. We then provide a characterization result for the asymptotic size of any test for testing this null hypothesis before numerically establishing a lower bound on the asymptotic size of the two-step procedure at the 5% nominal level. This lower bound exceeds the nominal level, revealing that the two-step procedure does not control asymptotic size. In a simulation study, we show that this finding is relevant for finite samples, in that the two-step procedure can suffer from overrejection in finite samples. We also propose a new test that, by construction, controls asymptotic size and is found to be more powerful than the two-step procedure when the "ARCH parameter" is "very small" (in which case the two-step procedure underrejects).
    Date: 2022–10
  7. By: Rubén Loaiza-Maya; Didier Nibbering
    Abstract: Variational Bayes methods are a scalable estimation approach for many complex state space models. However, existing methods exhibit a trade-off between accurate estimation and computational efficiency. This paper proposes a variational approximation that mitigates this trade-off. This approximation is based on importance densities that have been proposed in the context of efficient importance sampling. By directly conditioning on the observed data, the proposed method produces an accurate approximation to the exact posterior distribution. Because the steps required for its calibration are computationally efficient, the approach is faster than existing variational Bayes methods. The proposed method can be applied to any state space model that has a closed-form measurement density function and a state transition distribution that belongs to the exponential family of distributions. We illustrate the method in numerical experiments with stochastic volatility models and a macroeconomic empirical application using a high-dimensional state space model.
    Date: 2022–10
  8. By: Christian Brownlees; Vladislav Morozov
    Abstract: In this work we introduce a unit averaging procedure to efficiently recover unit-specific parameters in a heterogeneous panel model. The procedure consists in estimating the parameter of a given unit using a weighted average of all the unit-specific parameter estimators in the panel. The weights of the average are determined by minimizing an MSE criterion. We analyze the properties of the minimum MSE unit averaging estimator in a local heterogeneity framework inspired by the literature on frequentist model averaging. The analysis of the estimator covers both the cases in which the cross-sectional dimension of the panel is fixed and large. In both cases, we obtain the local asymptotic distribution of the minimum MSE unit averaging estimators and of the associated weights. A GDP nowcasting application for a panel of European countries showcases the benefits of the procedure.
    Date: 2022–10
  9. By: Alain-Philippe Fortin; Patrick Gagliardini; Olivier Scaillet
    Abstract: This paper studies new tests for the number of latent factors in a large cross-sectional factor model with small time dimension. These tests are based on the eigenvalues of variance-covariance matrices of (possibly weighted) asset returns, and rely on either the assumption of spherical errors, or instrumental variables for factor betas. We establish the asymptotic distributional results using expansion theorems based on perturbation theory for symmetric matrices. Our framework accommodates semi-strong factors in the systematic components. We propose a novel statistical test for weak factors against strong or semi-strong factors. We provide an empirical application to US equity data. Evidence for a different number of latent factors in market downturns versus market upturns is statistically ambiguous in the considered subperiods. In particular, our results contradict the common wisdom of a single-factor model in bear markets.
    Date: 2022–10
  10. By: Riccardo Della Vecchia (Scool - Inria Lille - Nord Europe - Inria - Institut National de Recherche en Informatique et en Automatique - CRIStAL - Centre de Recherche en Informatique, Signal et Automatique de Lille - UMR 9189 - Centrale Lille - Université de Lille - CNRS - Centre National de la Recherche Scientifique); Debabrota Basu (Scool - Inria Lille - Nord Europe - Inria - Institut National de Recherche en Informatique et en Automatique - CRIStAL - Centre de Recherche en Informatique, Signal et Automatique de Lille - UMR 9189 - Centrale Lille - Université de Lille - CNRS - Centre National de la Recherche Scientifique)
    Abstract: The independence of noise and covariates is a standard assumption in the online linear regression and linear bandit literature. This assumption and the ensuing analysis are invalid in the case of endogeneity, i.e., when the noise and covariates are correlated. In this paper, we study the online setting of instrumental variable (IV) regression, which is widely used in economics to tackle endogeneity. Specifically, we analyse and upper bound the regret of the Two-Stage Least Squares (2SLS) approach to IV regression in the online setting. Our analysis shows that Online 2SLS (O2SLS) achieves $O(d^2 \log^2 T)$ regret after $T$ interactions, where $d$ is the dimension of the covariates. Following that, we leverage O2SLS as an oracle to design OFUL-IV, a linear bandit algorithm. OFUL-IV can tackle endogeneity and achieves $O(d \sqrt{T} \log T)$ regret. For datasets with endogeneity, we experimentally demonstrate that O2SLS and OFUL-IV incur lower regret than the state-of-the-art algorithms for both the online linear regression and linear bandit settings.
    Keywords: Causality,Instrumental Variables,Online linear regression,Online learning,Bandit / imperfect feedback,Linear bandits,Regret Bounds,Econometrics,Two-stage regression
    Date: 2022–10–26
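As a toy illustration of the batch estimator that O2SLS extends to the online setting (this is not the authors' online algorithm; the simulated design, variable names, and noise scales below are illustrative assumptions), two-stage least squares can be sketched in a few lines:

```python
import numpy as np

def two_stage_least_squares(y, X, Z):
    """Batch 2SLS: first stage projects X onto the instruments Z,
    second stage regresses y on the fitted values."""
    X_hat = Z @ np.linalg.solve(Z.T @ Z, Z.T @ X)     # first-stage fitted values
    return np.linalg.solve(X_hat.T @ X, X_hat.T @ y)  # second-stage coefficients

# Endogenous design: the unobserved confounder u enters both x and y
rng = np.random.default_rng(0)
n, beta_true = 5000, 2.0
z = rng.normal(size=(n, 1))                  # instrument: moves x, independent of u
u = rng.normal(size=(n, 1))                  # unobserved confounder
x = z + u + 0.1 * rng.normal(size=(n, 1))
y = beta_true * x + u + 0.1 * rng.normal(size=(n, 1))

beta_ols = np.linalg.solve(x.T @ x, x.T @ y)[0, 0]   # biased under endogeneity
beta_2sls = two_stage_least_squares(y, x, z)[0, 0]   # close to beta_true
```

In this design OLS is biased upward because x and the error are correlated, while the instrumented estimate recovers the true coefficient; the online variant studied in the paper updates these quantities recursively as data arrive.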
  11. By: Giovanni Ballarin; Petros Dellaportas; Lyudmila Grigoryeva; Marcel Hirt; Sophie van Huellen; Juan-Pablo Ortega
    Abstract: Macroeconomic forecasting has recently started embracing techniques that can deal with large-scale datasets and series with unequal release periods. The aim is to exploit the information contained in heterogeneous data sampled at different frequencies to improve forecasting exercises. Currently, MIxed-DAta Sampling (MIDAS) and Dynamic Factor Models (DFM) are the two main state-of-the-art approaches that allow modeling series with non-homogeneous frequencies. We introduce a new framework called the Multi-Frequency Echo State Network (MFESN), which originates from a relatively novel machine learning paradigm called reservoir computing (RC). Echo State Networks are recurrent neural networks with random weights and a trainable readout. They are formulated as nonlinear state-space systems with random state coefficients where only the observation map is subject to estimation. This feature makes the estimation of MFESNs considerably more efficient than that of DFMs. In addition, the MFESN modeling framework can incorporate many series, as opposed to MIDAS models, which are prone to the curse of dimensionality. Our discussion encompasses hyperparameter tuning, penalization, and nonlinear multistep forecast computation. In passing, a new DFM aggregation scheme with Almon exponential structure is also presented, bridging MIDAS and dynamic factor models. All methods are compared in extensive multistep forecasting exercises targeting US GDP growth. We find that our ESN models achieve comparable or better performance than MIDAS and DFMs at a much lower computational cost.
    Date: 2022–11
  12. By: Daniel Borup; Philippe Goulet Coulombe; Erik Christian Montes Schütte; David E. Rapach; Sander Schwenk-Nebbe
    Abstract: We develop metrics based on Shapley values for interpreting time-series forecasting models, including “black-box” models from machine learning. Our metrics are model agnostic, so that they are applicable to any model (linear or nonlinear, parametric or nonparametric). Two of the metrics, iShapley-VI and oShapley-VI, measure the importance of individual predictors in fitted models for explaining the in-sample and out-of-sample predicted target values, respectively. The third metric is the performance-based Shapley value (PBSV), our main methodological contribution. PBSV measures the contributions of individual predictors in fitted models to the out-of-sample loss and thereby anatomizes out-of-sample forecasting accuracy. In an empirical application forecasting US inflation, we find important discrepancies between individual predictor relevance according to the in-sample iShapley-VI and out-of-sample PBSV. We use simulations to analyze potential sources of the discrepancies, including overfitting, structural breaks, and evolving predictor volatilities.
    Keywords: variable importance; out-of-sample performance; Shapley value; loss function; machine learning; inflation
    JEL: C22 C45 C53 E37 G17
    Date: 2022–11–07
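The Shapley logic behind metrics of this kind can be illustrated with a generic exact attribution over a small predictor set. The value function below (negative out-of-sample squared error of an OLS refit for each coalition of predictors) is a simplified stand-in chosen for this sketch, not the authors' PBSV; the simulated data and names are illustrative:

```python
import numpy as np
from itertools import combinations
from math import factorial

def shapley_importance(value, features):
    """Exact Shapley attribution of value(coalition) over all 2^p coalitions."""
    p = len(features)
    phi = {}
    for i in features:
        rest = [f for f in features if f != i]
        total = 0.0
        for k in range(p):                       # coalition sizes 0..p-1
            for S in combinations(rest, k):
                w = factorial(k) * factorial(p - k - 1) / factorial(p)
                total += w * (value(frozenset(S) | {i}) - value(frozenset(S)))
        phi[i] = total
    return phi

# Illustrative data: predictor 0 matters most, predictor 2 is pure noise
rng = np.random.default_rng(1)
X = rng.normal(size=(400, 3))
y = 2.0 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=400)
tr, te = slice(0, 300), slice(300, None)         # in-sample / out-of-sample split

def value(S):
    """Negative out-of-sample MSE of an OLS fit on the predictor subset S."""
    cols = sorted(S)
    if not cols:
        pred = np.full(y[te].size, y[tr].mean())  # empty coalition: mean forecast
    else:
        b = np.linalg.lstsq(X[tr][:, cols], y[tr], rcond=None)[0]
        pred = X[te][:, cols] @ b
    return -np.mean((y[te] - pred) ** 2)

phi = shapley_importance(value, [0, 1, 2])
```

By the efficiency property, the attributions sum to the gap between the full model's value and the empty baseline, so each predictor's share of out-of-sample performance can be read off directly; exact enumeration is only feasible for small predictor sets.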
  13. By: Ekaterina Morozova; Vladimir Panov
    Abstract: In this paper, we present a new bivariate model for the joint description of the Bitcoin prices and the media attention to Bitcoin. Our model is based on the class of Lévy processes and is able to realistically reproduce the jump-type dynamics of the considered time series. We focus on the low-frequency setup, which for Lévy-based models is substantially more difficult than the high-frequency case. We design a semiparametric estimation procedure for statistical inference on the parameters and the Lévy measures of the considered processes. We show that the dynamics of the market attention can be effectively modelled by Lévy processes with finite Lévy measures, and propose a data-driven procedure for the description of the Bitcoin prices.
    Date: 2022–10
  14. By: David Childers; Jesús Fernández-Villaverde; Jesse Perla; Christopher Rackauckas; Peifan Wu
    Abstract: We propose a methodology to take dynamic stochastic general equilibrium (DSGE) models to the data based on the combination of differentiable state-space models and the Hamiltonian Monte Carlo (HMC) sampler. First, we introduce a method for implicit automatic differentiation of perturbation solutions of DSGE models with respect to the model's parameters. We can use the resulting output for various tasks requiring gradients, such as building an HMC sampler, to estimate first- and second-order approximations of DSGE models. The availability of derivatives also enables a general filter-free method to estimate nonlinear, non-Gaussian DSGE models by sampling the joint likelihood of parameters and latent states. We show that the gradient-based joint likelihood sampling approach is superior in efficiency and robustness to standard Metropolis-Hastings samplers by estimating a canonical real business cycle model, a real small open economy model, and a medium-scale New Keynesian DSGE model.
    JEL: C01 C10 C11 E0
    Date: 2022–10
  15. By: Hudde, Ansgar; Jacob, Marita (University of Cologne)
    Abstract: Sociological research is increasingly using panel data to examine changes in diverse outcomes over life course events. Most of these studies have one striking similarity: they analyse changes between yearly time intervals. In this paper, we present a simple but effective method to model such trajectories more precisely using available data. The approach exploits month-specific information regarding interview and life-event dates. Using fixed effects regression models, we calculate monthly dummy estimates around life events and then run nonparametric smoothing to create smoothed monthly estimates. We test the approach using Monte Carlo simulations and GSOEP data. Monte Carlo simulations show that the newly proposed smoothed monthly estimates outperform yearly dummy estimates, especially when there is rapid change or discontinuities in trends at the event. In the real data analyses, the novel approach reports an amplitude of change roughly twice as large as that from yearly estimates, as well as greater gender differences. It also reveals a discontinuity in trajectories at bereavement, but not at childbirth. Our proposed method can be applied to several available data sets and a variety of outcomes and life events. Thus, for research on changes around life events, it serves as a powerful new tool in the researcher’s toolbox.
    Date: 2022–10–18
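The two-step idea, month-specific estimates followed by nonparametric smoothing, can be sketched as follows. This is a minimal illustration under an assumed simulated panel (all names, scales, and the simple within-month means standing in for fixed-effects dummy estimates are assumptions of this sketch):

```python
import numpy as np

# Simulated person-month data: the outcome jumps by 1 at the event (month 0)
rng = np.random.default_rng(2)
months = np.arange(-24, 25)                            # event time in months
true = np.where(months < 0, 0.0, 1.0) + 0.01 * months  # level shift plus trend
y = np.repeat(true, 50) + rng.normal(scale=0.5, size=months.size * 50)
obs_month = np.repeat(months, 50)                      # 50 observations per month

# Step 1: month-specific dummy estimates (here: simple within-month means)
dummies = np.array([y[obs_month == m].mean() for m in months])

# Step 2: local-mean smoothing, run separately on each side of the event so
# a discontinuity at month 0 is preserved rather than smoothed away
def smooth(vals, bw=3):
    return np.array([vals[max(0, i - bw):i + bw + 1].mean()
                     for i in range(vals.size)])

pre = smooth(dummies[months < 0])
post = smooth(dummies[months >= 0])
jump = post[0] - pre[-1]   # estimated change at the event
```

Smoothing each side separately is the design choice that lets the method detect discontinuities (as the paper finds at bereavement) instead of averaging them into a gradual slope.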
  16. By: Efrem Castelnuovo; Lorenzo Mori
    Abstract: We employ a mixed-frequency quantile regression approach to model the time-varying conditional distribution of the US real GDP growth rate. We show that monthly information on the US financial cycle improves the predictive power of an otherwise quarterly-only model. We combine selected quantiles of the estimated conditional distribution to produce measures of uncertainty and skewness. Embedding these measures in a VAR framework, we show that unexpected changes in uncertainty are associated with an increase in (left) skewness and a downturn in real activity. Empirical findings related to VAR impulse responses and forecast error variance decomposition are shown to depend on the inclusion/omission of monthly-level information on financial conditions when estimating real GDP growth’s conditional density. Effects are significantly downplayed if we consider a quarterly-only quantile regression model. A counterfactual simulation conducted by shutting down the endogenous response of skewness to uncertainty shocks shows that skewness substantially amplifies the recessionary effects of uncertainty.
    Keywords: Uncertainty, skewness, quantile regressions, VARs, MIDAS
    JEL: C32 E32
    Date: 2022–10
  17. By: Alessandra Canepa; Karanasos, Menelaos; Paraskevopoulos, Athanasios; Chini, Emilio Zanetti (University of Turin)
    Abstract: In this paper we employ an autoregressive GARCH-in-mean-level process with variable coefficients to forecast inflation and investigate the behavior of its persistence in the United States. We propose new measures of time-varying persistence, which not only distinguish between changes in the dynamics of inflation and its volatility, but also allow for feedback between the two variables. Since it is clear from our analysis that predictability is closely interlinked with (first-order) persistence, we coin the term persistapredictability. Our empirical results suggest that the proposed model has good forecasting properties.
    Date: 2022–09
  18. By: Aeppli, Clem; Ruedin, Didier (University of Neuchâtel)
    Abstract: Different measures exist to capture agreement, consensus, concentration, dispersion, and polarization in ordinal data. We compare consensus scores across specific situations for a better understanding of how different measures work in practice: constructed cases, simulated data where we know the underlying distribution, and empirical data. Although researchers have solved the ‘problem’ of measuring agreement, consensus, and polarization several times, we highlight similarities and equivalence across some existing approaches, while others differ substantially. The choice of method can lead to substantively different conclusions, and we recommend that researchers use a combination of measures and use graphics to examine the distribution qualitatively.
    Date: 2022–10–24
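One standard consensus measure for ordinal data, the Tastle-Wierman consensus score, illustrates the kind of statistic such comparisons cover. The sketch below assumes equally spaced categories and is a single illustrative measure, not the paper's full battery:

```python
import numpy as np

def consensus(freqs, categories=None):
    """Tastle-Wierman consensus for ordinal frequency counts: 1 means all
    responses in one category, 0 means an even split between the extremes."""
    p = np.asarray(freqs, dtype=float)
    p = p / p.sum()                                  # relative frequencies
    x = (np.arange(1, p.size + 1) if categories is None
         else np.asarray(categories, dtype=float))  # category positions
    mu = np.sum(p * x)                               # distribution mean
    width = x.max() - x.min()
    with np.errstate(divide="ignore", invalid="ignore"):
        terms = p * np.log2(1.0 - np.abs(x - mu) / width)
    return 1.0 + np.nansum(terms)                    # nansum drops 0 * log(0) terms
```

For example, `consensus([0, 0, 10, 0, 0])` returns 1.0 (full agreement) and `consensus([5, 0, 0, 0, 5])` returns 0.0 (maximal polarization), so the score doubles as an inverse polarization measure on the same scale.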
  19. By: Richard T. Baillie; Francis X. Diebold; George Kapetanios; Kun Ho Kim
    Abstract: We suggest a new single-equation test for Uncovered Interest Parity (UIP) based on a dynamic regression approach. The method provides consistent and asymptotically efficient parameter estimates, and is not dependent on assumptions of strict exogeneity. This new approach is asymptotically more efficient than the common approach of using OLS with HAC robust standard errors in the static forward premium regression. The coefficient estimates when spot return changes are regressed on the forward premium are all positive and remarkably stable across currencies. These estimates are considerably larger than those of previous studies, which frequently find negative coefficients. The method also has the advantage of showing dynamic effects of risk premia, or other events that may lead to rejection of UIP or the efficient markets hypothesis.
    JEL: C22 F30
    Date: 2022–11
  20. By: Max Nendel; Alessandro Sgarabottolo
    Abstract: In this paper, we explore a static setting for the assessment of risk in the context of mathematical finance and actuarial science that takes into account model uncertainty in the distribution of a possibly infinite-dimensional risk factor. We allow for perturbations around a baseline model, measured via Wasserstein distance, and we investigate to what extent this form of probabilistic imprecision can be parametrized. The aim is to come up with a convex risk functional that incorporates a safety margin with respect to nonparametric uncertainty and still can be approximated through parametrized models. The particular form of the parametrization allows us to develop a numerical method, based on neural networks, which gives both the value of the risk functional and the optimal perturbation of the reference measure. Moreover, we study the problem under additional constraints on the perturbations, namely, a mean and a martingale constraint. We show that, in both cases, under suitable conditions on the loss function, it is still possible to estimate the risk functional by passing to a parametric family of perturbed models, which again allows for a numerical approximation via neural networks.
    Date: 2022–10
  21. By: Brodeur, Abel; Cook, Nikolai; Hartley, Jonathan; Heyes, Anthony
    Abstract: Randomized controlled trials (RCTs) are increasingly prominent in economics, with pre-registration and pre-analysis plans (PAPs) promoted as important in ensuring the credibility of findings. We investigate whether these tools reduce the extent of p-hacking and publication bias by collecting and studying the universe of test statistics, 15,992 in total, from RCTs published in 15 leading economics journals from 2018 through 2021. In our primary analysis, we find no meaningful difference in the distribution of test statistics from pre-registered studies, compared to their non-pre-registered counterparts. However, pre-registered studies that have a complete PAP are significantly less p-hacked. These results point to the importance of PAPs, rather than pre-registration in itself, in ensuring credibility.
    Date: 2022–08–11

This nep-ecm issue is ©2022 by Sune Karlsson. It is provided as is without any express or implied warranty. It may be freely redistributed in whole or in part for any purpose. If distributed in part, please include this notice.
General information on the NEP project can be found at For comments please write to the director of NEP, Marco Novarese at <>. Put “NEP” in the subject, otherwise your mail may be rejected.
NEP’s infrastructure is sponsored by the School of Economics and Finance of Massey University in New Zealand.