nep-ecm New Economics Papers
on Econometrics
Issue of 2019‒12‒16
twenty-two papers chosen by
Sune Karlsson
Örebro universitet

  1. Simple Semiparametric Estimation of Ordered Response Models: with an Application to the Interdependence Duration Models By Ruixuan Liu; Zhengfei Yu
  2. Likelihood Induced by Moment Functions Using Particle Filter: a Comparison of Particle GMM and Standard MCMC Methods By Fabio Franco
  3. Penalized Estimation of Panel Vector Autoregressive Models By Schnücker, A.M.
  4. Wavelet Estimation for Dynamic Factor Models with Time-Varying Loadings By Duván Humberto Cataño; Carlos Vladimir Rodríguez-Caballero; Daniel Peña
  5. Wild Bootstrap for Fuzzy Regression Discontinuity Designs: Obtaining Robust Bias-Corrected Confidence Intervals By He, Yang; Bartalotti, Otávio
  6. Nonlinear Cointegrating Power Function Regression with Endogeneity By Zhishui Hu; Peter C.B. Phillips; Qiying Wang
  7. Covariates impacts in compositional models and simplicial derivatives By Thomas-Agnan, Christine; Morais, Joanna
  8. High-Dimensional Forecasting in the Presence of Unit Roots and Cointegration By Stephan Smeekes; Etienne Wijler
  9. Accelerated Failure Time Models with Log-concave Errors By Ruixuan Liu; Zhengfei Yu
  10. Estimating The Anomaly Base Rate By Alexander M. Chinco; Andreas Neuhierl; Michael Weber
  11. Inference under random limit bootstrap measures By Giuseppe Cavaliere; Iliyan Georgiev
  12. On the Measurement of Bias in Geographically Weighted Regression Models By Yu, Hanchen; Fotheringham, Alexander Stewart; Li, Ziqi; Oshan, Taylor M.; Wolf, Levi John
  13. Triple the gamma -- A unifying shrinkage prior for variance and variable selection in sparse state space and TVP models By Annalisa Cadonna; Sylvia Frühwirth-Schnatter; Peter Knaus
  14. Recursive Differencing for Estimating Semiparametric Models By Chan Shen; Roger Klein
  15. A comparison of machine learning model validation schemes for non-stationary time series data By Schnaubelt, Matthias
  16. Mean-shift least squares model averaging By Kenichiro McAlinn; Kosaku Takanashi
  17. Clustering and External Validity in Randomized Controlled Trials with Stochastic Potential Outcomes By Antoine Deeb; Clément de Chaisemartin
  18. Statistical mechanics and time-series analysis by Lévy-parameters with the possibility of real-time application By Alexander Jurisch
  19. Information Theoretic Estimation of Econometric Functions By Millie Yi Mao; Aman Ullah
  20. Refinements of Barndorff-Nielsen and Shephard model: an analysis of crude oil price with machine learning By Indranil SenGupta; William Nganje; Erik Hanson
  21. Nonparametric sign prediction of high-dimensional correlation matrix coefficients By Christian Bongiorno; Damien Challet
  22. Cointegration, root functions and minimal bases By Massimo Franchi; Paolo Paruolo

  1. By: Ruixuan Liu; Zhengfei Yu
    Abstract: We propose two simple semiparametric estimation methods for ordered response models with an unknown error distribution. The proposed methods do not require users to choose any tuning parameter and they automatically incorporate the monotonicity restriction of the unknown distribution function. Fixing finite dimensional parameters in the model, we construct nonparametric maximum likelihood estimates (NPMLE) for the error distribution based on the related binary choice data or the entire ordered response data. We then obtain estimates for finite dimensional parameters based on moment conditions given the estimated distribution function. Our semiparametric approaches deliver root-n consistent and asymptotically normal estimators of the regression coefficient and threshold parameter. We also develop valid bootstrap procedures for inference. We apply our methods to the interdependent durations model in Honore and de Paula (2010), where the social interaction effect is directly related to the threshold parameter in the corresponding ordered response model. The advantages of our methods are borne out in simulation studies and a real data application to the joint retirement decision of married couples.
    Date: 2019–11
  2. By: Fabio Franco (University of Rome "Tor Vergata")
    Abstract: Particle filtering is a useful statistical tool which can be used to make inference on the latent variables and the structural parameters of state space models by employing it inside MCMC algorithms (Flury and Shephard, 2011). It relies on only two assumptions (Gordon et al., 1993): (a) the ability to simulate from the dynamics of the model; (b) the predictive measurement density can be computed. In practice the second assumption may not be obvious, and implementations of the particle filter can become difficult to conduct. Gallant, Giacomini and Ragusa (2016) have recently developed a particle filter which does not rely on the structural form of the measurement equation. This method uses a set of moment conditions to induce the likelihood function of a structural model under a GMM criterion. The semiparametric structure allows the use of particle filtering where the standard techniques are not applicable or are difficult to implement. On the other hand, the GMM representation is less efficient than the standard technique and in some cases it can affect the proper functioning of the particle filter and in turn deliver poor estimates. The contribution of this paper is to provide a comparison between the standard techniques, such as the Kalman filter and the standard bootstrap particle filter, and the method proposed by Gallant et al. (2016) in order to measure the performance of the particle filter with the GMM representation.
    Keywords: Bootstrap particle filter, GMM likelihood representation, Metropolis-Hastings algorithm, Kalman filter, nonlinear/non-Gaussian state space models.
    JEL: C4 C8
    Date: 2019–12–04
  3. By: Schnücker, A.M.
    Abstract: This paper proposes LASSO estimation specific for panel vector autoregressive (PVAR) models. The penalty term allows for shrinkage for different lags, for shrinkage towards homogeneous coefficients across panel units, for penalization of lags of variables belonging to another cross-sectional unit, and for varying penalization across equations. The penalty parameters therefore build on time series and cross-sectional properties that are commonly found in PVAR models. Simulation results point towards advantages of using the proposed LASSO for PVAR models over ordinary least squares in terms of forecast accuracy. An empirical forecasting application with five countries supports these findings.
    Keywords: Model selection, multi-country model, shrinkage estimation
    JEL: C13 C32 C33
    Date: 2019–11–01
  4. By: Duván Humberto Cataño (University of Antioquia); Carlos Vladimir Rodríguez-Caballero (ITAM and CREATES); Daniel Peña (Universidad Carlos III de Madrid)
    Abstract: We introduce a non-stationary high-dimensional factor model with time-varying loadings. We propose an estimation procedure based on two stages. First, we estimate common factors by principal components. Afterwards, in the second step, treating the factor estimates as observed, the time-varying loadings are estimated by an iterative procedure of generalized least squares using wavelet functions. We investigate the finite sample features of the proposed methodology by some Monte Carlo simulations. Finally, we use this methodology to study the electricity prices and loads of the Nord Pool power market.
    Keywords: Factor models, wavelet functions, generalized least squares, electricity prices and loads
    JEL: C13 C32 Q43
    Date: 2019–12–09
  5. By: He, Yang (Microsoft Corporation); Bartalotti, Otávio (Iowa State University)
    Abstract: This paper develops a novel wild bootstrap procedure to construct valid robust bias-corrected (RBC) confidence intervals (CIs) for fuzzy regression discontinuity designs, providing an intuitive complement to existing RBC methods. The CIs generated by this procedure are valid under conditions similar to the procedures proposed by Calonico et al. (2014) and related literature. Simulations provide evidence that this new method is at least as accurate as the plug-in analytical corrections when applied to a variety of data generating processes featuring endogeneity and clustering. Finally, we demonstrate its empirical relevance by revisiting Angrist and Lavy's (1999) analysis of class size on student outcomes.
    Keywords: fuzzy regression discontinuity, robust confidence intervals, wild bootstrap, average treatment effect
    JEL: C14 C21 C26
    Date: 2019–11
  6. By: Zhishui Hu (University of Science and Technology of China); Peter C.B. Phillips (Cowles Foundation, Yale University); Qiying Wang (School of Mathematics and Statistics, The University of Sydney)
    Abstract: This paper develops an asymptotic theory for nonlinear cointegrating power function regression. The framework extends earlier work on the deterministic trend case and allows for both endogeneity and heteroskedasticity, which makes the models and inferential methods relevant to many empirical economic and financial applications, including predictive regression. Accompanying the asymptotic theory of nonlinear regression, the paper establishes some new results on weak convergence to stochastic integrals that go beyond the usual semi-martingale structure and considerably extend existing limit theory, complementing other recent findings on stochastic integral asymptotics. The paper also provides a general framework for extremum estimation limit theory that encompasses stochastically nonstationary time series and should be of wide applicability.
    Keywords: Nonlinear power regression, Least squares estimation, Nonstationarity, Endogeneity, Heteroscedasticity
    JEL: C13 C22
    Date: 2019–12
  7. By: Thomas-Agnan, Christine; Morais, Joanna
    Abstract: In the framework of Compositional Data Analysis, vectors carrying relative information, also called compositional vectors, can appear in regression models either as dependent or as explanatory variables. In some situations, they can be on both sides of the regression equation. Measuring the marginal impacts of covariates in these types of models is not straightforward since a change in one component of a closed composition automatically affects the rest of the composition. J. Morais, C. Thomas-Agnan and M. Simioni [Austrian Journal of Statistics, 47(5), 1-25, 2018] have shown how to measure, compute and interpret these marginal impacts in the case of linear regression models with compositions on both sides of the equation. The resulting natural interpretation is in terms of an elasticity, a quantity commonly used in econometrics and marketing applications. They also demonstrate the link between these elasticities and simplicial derivatives. The aim of this contribution is to extend these results to other situations, namely when the compositional vector is on a single side of the regression equation. In these cases, the marginal impact is related to a semi-elasticity and also linked to some simplicial derivative. Moreover we consider the possibility that a total variable is used as an explanatory variable, with several possible interpretations of this total and we derive the elasticity formulas in that case.
    Keywords: compositional regression model; marginal effects; simplicial derivative; elasticity; semi-elasticity.
    JEL: C10 C25 C35 C46 M31 D12
    Date: 2019–12
  8. By: Stephan Smeekes; Etienne Wijler
    Abstract: We investigate how the possible presence of unit roots and cointegration affects forecasting with Big Data. As most macroeconomic time series are very persistent and may contain unit roots, a proper handling of unit roots and cointegration is of paramount importance for macroeconomic forecasting. The high-dimensional nature of Big Data complicates the analysis of unit roots and cointegration in two ways. First, transformations to stationarity require performing many unit root tests, increasing room for errors in the classification. Second, modelling unit roots and cointegration directly is more difficult, as standard high-dimensional techniques such as factor models and penalized regression are not directly applicable to (co)integrated data and need to be adapted. We provide an overview of both issues and review methods proposed to address these issues. These methods are also illustrated with two empirical applications.
    Date: 2019–11
  9. By: Ruixuan Liu; Zhengfei Yu
    Abstract: We study accelerated failure time (AFT) models in which the survivor function of the additive error term is log-concave. The log-concavity assumption covers large families of commonly-used distributions and also represents the aging or wear-out phenomenon of the baseline duration. For right-censored failure time data, we construct semi-parametric maximum likelihood estimates of the finite dimensional parameter and establish the large sample properties. The shape restriction is incorporated via a nonparametric maximum likelihood estimator (NPMLE) of the hazard function. Our approach guarantees the uniqueness of a global solution for the estimating equations and delivers semiparametric efficient estimates. Simulation studies and empirical applications demonstrate the usefulness of our method.
    Date: 2019–11
  10. By: Alexander M. Chinco; Andreas Neuhierl; Michael Weber
    Abstract: The academic literature literally contains hundreds of variables that seem to predict the cross-section of expected returns. This so-called "anomaly zoo" has caused many to question whether researchers are using the right tests of statistical significance. But, here's the thing: even if researchers use the right tests, they will still draw the wrong conclusions from their econometric analyses if they start out with the wrong priors---i.e., if they start out with incorrect beliefs about the ex ante probability of encountering a tradable anomaly. So, what are the right priors? What is the correct anomaly base rate? We develop a first way to estimate the anomaly base rate by combining two key insights: 1) Empirical-Bayes methods capture the implicit process by which researchers form priors based on their past experience with other variables in the anomaly zoo. 2) Under certain conditions, there is a one-to-one mapping between these prior beliefs and the best-fit tuning parameter in a penalized regression. We study trading-strategy performance to verify our estimation results. If you trade on two variables with similar one-month-ahead return forecasts in different anomaly-base-rate regimes (low vs. high), the variable in the low base-rate regime consistently underperforms the otherwise identical variable in the high base-rate regime.
    JEL: C12 C52 G11
    Date: 2019–11
  11. By: Giuseppe Cavaliere; Iliyan Georgiev
    Abstract: Asymptotic bootstrap validity is usually understood as consistency of the distribution of a bootstrap statistic, conditional on the data, for the unconditional limit distribution of a statistic of interest. From this perspective, randomness of the limit bootstrap measure is regarded as a failure of the bootstrap. We show that such limiting randomness does not necessarily invalidate bootstrap inference if validity is understood as control over the frequency of correct inferences in large samples. We first establish sufficient conditions for asymptotic bootstrap validity in cases where the unconditional limit distribution of a statistic can be obtained by averaging a (random) limiting bootstrap distribution. Further, we provide results ensuring the asymptotic validity of the bootstrap as a tool for conditional inference, the leading case being that where a bootstrap distribution estimates consistently a conditional (and thus, random) limit distribution of a statistic. We apply our framework to several inference problems in econometrics, including linear models with possibly non-stationary regressors, functional CUSUM statistics, conditional Kolmogorov-Smirnov specification tests, the 'parameter on the boundary' problem and tests for constancy of parameters in dynamic econometric models.
    Date: 2019–11
  12. By: Yu, Hanchen; Fotheringham, Alexander Stewart; Li, Ziqi; Oshan, Taylor M.; Wolf, Levi John (University of Bristol)
    Abstract: Under the realization that Geographically Weighted Regression (GWR) is a data-borrowing technique, this paper derives expressions for the amount of bias introduced to local parameter estimates by borrowing data from locations where the processes might be different from those at the regression location. This is done for both GWR and Multiscale GWR (MGWR). We demonstrate the accuracy of our expressions for bias through a comparison with empirically derived estimates based on a simulated data set with known local parameter values. By being able to compute the bias in both models we are able to demonstrate the superiority of MGWR. We then demonstrate the utility of a corrected Akaike Information Criterion statistic in finding optimal bandwidths in both GWR and MGWR as a trade-off between minimizing both bias and uncertainty. We further show how bias in one set of local parameter estimates can affect the bias in another set of local estimates. The bias derived from borrowing data from other locations appears to be very small.
    Date: 2019–07–18
  13. By: Annalisa Cadonna; Sylvia Frühwirth-Schnatter; Peter Knaus
    Abstract: Time-varying parameter (TVP) models are very flexible in capturing gradual changes in the effect of a predictor on the outcome variable. However, in particular when the number of predictors is large, there is a known risk of overfitting and poor predictive performance, since the effect of some predictors is constant over time. We propose a prior for variance shrinkage in TVP models, called triple gamma. The triple gamma prior encompasses a number of priors that have been suggested previously, such as the Bayesian lasso, the double gamma prior and the Horseshoe prior. We present the desirable properties of such a prior and its relationship to Bayesian Model Averaging for variance selection. The features of the triple gamma prior are then illustrated in the context of time varying parameter vector autoregressive models, both for simulated datasets and for a series of macroeconomic variables in the Euro Area.
    Date: 2019–12
  14. By: Chan Shen (Pennsylvania State University); Roger Klein (Rutgers University)
    Abstract: Controlling the bias is central to estimating semiparametric models. Many methods have been developed to control bias in estimating conditional expectations while maintaining a desirable variance order. However, these methods typically do not perform well at moderate sample sizes. Moreover, and perhaps related to their performance, non-optimal windows are selected with undersmoothing needed to ensure the appropriate bias order. In this paper, we propose a recursive differencing estimator for conditional expectations. When this method is combined with a bias control targeting the derivative of the semiparametric expectation, we are able to obtain asymptotic normality under optimal windows. As suggested by the structure of the recursion, in a wide variety of triple index designs, the proposed bias control performs much better at moderate sample sizes than regular or higher order kernels and local polynomials.
    Keywords: semiparametric model, bias reduction, conditional expectation
    JEL: C1 C14
    Date: 2019–11–24
  15. By: Schnaubelt, Matthias
    Abstract: Machine learning is increasingly applied to time series data, as it constitutes an attractive alternative to forecasts based on traditional time series models. For independent and identically distributed observations, cross-validation is the prevalent scheme for estimating out-of-sample performance in both model selection and assessment. For time series data, however, it is unclear whether forward-validation schemes, i.e., schemes that keep the temporal order of observations, should be preferred. In this paper, we perform a comprehensive empirical study of eight common validation schemes. We introduce a study design that perturbs global stationarity by introducing a slow evolution of the underlying data-generating process. Our results demonstrate that, even for relatively small perturbations, commonly used cross-validation schemes often yield estimates with the largest bias and variance, and forward-validation schemes yield better estimates of the out-of-sample error. We provide an interpretation of these results in terms of an additional evolution-induced bias and the sample-size dependent estimation error. Using a large-scale financial data set, we demonstrate the practical significance in a replication study of a statistical arbitrage problem. We conclude with some general guidelines on the selection of suitable validation schemes for time series data.
    Keywords: machine learning, model selection, model validation, time series, cross-validation
    Date: 2019
  16. By: Kenichiro McAlinn; Kosaku Takanashi
    Abstract: This paper proposes a new estimator for selecting weights to average over least squares estimates obtained from a set of models. Our proposed estimator builds on the Mallows model average (MMA) estimator of Hansen (2007), but, unlike MMA, simultaneously controls for location bias and regression error through a common constant. We show that our proposed estimator, the mean-shift Mallows model average (MSA) estimator, is asymptotically optimal to the original MMA estimator in terms of mean squared error. A simulation study is presented, where we show that our proposed estimator uniformly outperforms the MMA estimator.
    Date: 2019–12
  17. By: Antoine Deeb; Clément de Chaisemartin
    Abstract: In the literature studying randomized controlled trials (RCTs), it is often assumed that the potential outcomes of units participating in the experiment are deterministic. This assumption is unlikely to hold, as stochastic shocks may take place during the experiment. In this paper, we consider the case of an RCT with individual-level treatment assignment, and we allow for individual-level and cluster-level (e.g. village-level) shocks to affect the potential outcomes. We show that one can draw inference on two estimands: the ATE conditional on the realizations of the cluster-level shocks, using heteroskedasticity-robust standard errors; the ATE netted out of those shocks, using cluster-robust standard errors. By clustering, researchers can test if the treatment would still have had an effect, had the stochastic shocks that occurred during the experiment been different.
    Date: 2019–12
  18. By: Alexander Jurisch
    Abstract: We develop a method that relates the truncated cumulant-function of the fourth order with the Lévian cumulant-function. This gives us explicit formulas for the Lévy-parameters, which allow a real-time analysis of the state of a random-motion. Cumbersome procedures like maximum-likelihood or least-squares methods are unnecessary. Furthermore, we treat the Lévy-system in terms of statistical mechanics and work out its thermodynamic properties. This also includes a discussion of the fractal nature of relativistic corrections. As examples for a time-series analysis, we apply our results to the time-series of the German DAX and the American S&P-500.
    Date: 2019–02
  19. By: Millie Yi Mao (Azusa Pacific University); Aman Ullah (Department of Economics, University of California Riverside)
    Abstract: This chapter introduces an information theoretic approach to specify econometric functions as an alternative to avoid parametric assumptions. We investigate the performances of the information theoretic method in estimating the regression (conditional mean) and response (derivative) functions. We have demonstrated that they are easy to implement, and are advantageous over parametric models and nonparametric kernel techniques.
    Keywords: Information theory, Maximum entropy distributions, Econometric functions, Conditional mean
    Date: 2019–11
  20. By: Indranil SenGupta; William Nganje; Erik Hanson
    Abstract: A commonly used stochastic model for derivative and commodity market analysis is the Barndorff-Nielsen and Shephard (BN-S) model. Though this model is very efficient and analytically tractable, it suffers from the absence of long range dependence and many other issues. For this paper, the analysis is restricted to crude oil price dynamics. A simple way of improving the BN-S model with the implementation of various machine learning algorithms is proposed. This refined BN-S model is more efficient and has fewer parameters than other models which are used in practice as improvements of the BN-S model. The procedure and the model show the application of data science for extracting a "deterministic component" out of processes that are usually considered to be completely stochastic. Empirical applications validate the efficacy of the proposed model for long range dependence.
    Date: 2019–11
  21. By: Christian Bongiorno (MICS - Mathématiques et Informatique pour la Complexité et les Systèmes - CentraleSupélec); Damien Challet (MICS - Mathématiques et Informatique pour la Complexité et les Systèmes - CentraleSupélec)
    Abstract: We introduce a method to predict which correlation matrix coefficients are likely to change their signs in the future in the high-dimensional regime, i.e. when the number of features is larger than the number of samples per feature. The stability of correlation signs, two-by-two relationships, is found to depend on three-by-three relationships inspired by Heider social cohesion theory in this regime. We apply our method to US and Hong Kong equities historical data to illustrate how the structure of correlation matrices influences the stability of the sign of their coefficients.
    Date: 2019–10–28
  22. By: Massimo Franchi ("Sapienza" University of Rome); Paolo Paruolo (European Commission, Joint Research Centre)
    Abstract: This paper discusses the concept of cointegrating space for systems integrated of order higher than 1. It is first observed that the notions of (polynomial) cointegrating vectors and of root functions coincide. Second, the cointegrating space is defined as a subspace of the space of rational vectors. Third, it is shown that canonical sets of root functions can be used to generate a basis of the cointegrating space. Fourth, results on how to reduce bases of rational vector spaces to polynomial bases with minimal order (i.e. minimal bases) are shown to imply the separation of cointegrating vectors that potentially do not involve differences of the process from the ones that require them. Finally, it is argued that minimality of polynomial bases and economic identification of cointegrating vectors can be properly combined.
    Keywords: VAR, Cointegration, I(d), Vector spaces.
    JEL: C12 C33 C55
    Date: 2019–12

This nep-ecm issue is ©2019 by Sune Karlsson. It is provided as is without any express or implied warranty. It may be freely redistributed in whole or in part for any purpose. If distributed in part, please include this notice.
General information on the NEP project can be found at For comments please write to the director of NEP, Marco Novarese at <>. Put “NEP” in the subject, otherwise your mail may be rejected.
NEP’s infrastructure is sponsored by the School of Economics and Finance of Massey University in New Zealand.