nep-ecm New Economics Papers
on Econometrics
Issue of 2019‒06‒24
twenty papers chosen by
Sune Karlsson
Örebro universitet

  1. Saddlepoint Approximations for Spatial Panel Data Models By Chaonan Jiang; Davide La Vecchia; Elvezio Ronchetti; O. Scaillet
  2. The Confidence Interval Method for Selecting Valid Instrumental Variables By Frank Windmeijer; Xiaoran Liang; Fernando P Hartwig; Jack Bowden
  3. Estimation and Inference for Multi-dimensional Heterogeneous Panel Datasets with Hierarchical Multi-factor Error Structure By George Kapetanios; Laura Serlenga; Yongcheol Shin
  4. The multivariate simultaneous unobserved components model and identification via heteroskedasticity By Mengheng Li; Ivan Mendieta-Munoz
  5. Posterior Average Effects By Stéphane Bonhomme; Martin Weidner
  6. A Correction for Regression Discontinuity Designs with Group-Specific Mismeasurement of the Running Variable By Bartalotti, Otávio; Brummet, Quentin; Dieterle, Steven G.
  7. High-Dimensional Functional Factor Models By Marc Hallin; Gilles Nisol; Shahin Tavakoli
  8. Uniform Consistency of Marked and Weighted Empirical Distributions of Residuals By Vanessa Berenguer-Rico; Søren Johansen; Bent Nielsen
  9. Detecting p-hacking By Graham Elliott; Nikolay Kudrin; Kaspar Wuthrich
  10. On the Properties of the Synthetic Control Estimator with Many Periods and Many Controls By Bruno Ferman
  11. lpdensity: Local Polynomial Density Estimation and Inference By Matias D. Cattaneo; Michael Jansson; Xinwei Ma
  12. Statistical Tests for Cross-Validation of Kriging Models By Kleijnen, Jack; van Beers, W.C.M.
  13. Online Block Layer Decomposition schemes for training Deep Neural Networks By Laura Palagi; Ruggiero Seccia
  14. Nonparametric estimation in a regression model with additive and multiplicative noise By Christophe Chesneau; Salima El Kolei; Junke Kou; Fabien Navarro
  15. Pareto Models for Top Incomes By Arthur Charpentier; Emmanuel Flachaire
  16. A Flexible Regime Switching Model for Asset Returns By Marc S. Paolella; Pawel Polak; Patrick S. Walker
  17. Partial Identification of Population Average and Quantile Treatment Effects in Observational Data under Sample Selection By Christelis, Dimitris; Messina, Julián
  18. Sentiment-Driven Stochastic Volatility Model: A High-Frequency Textual Tool for Economists By Jozef Barunik; Cathy Yi-Hsuan Chen; Jan Vecer
  19. Export sophistication: A dynamic panel data approach By Evzen Kocenda; Karen Poghosyan
  20. Score estimation of monotone partially linear index model By Taisuke Otsu; Mengshan Xu

  1. By: Chaonan Jiang (University of Geneva - Geneva School of Economics and Management); Davide La Vecchia (University of Geneva - Geneva School of Economics and Management - Research Center for Statistics); Elvezio Ronchetti (University of Geneva - Research Center for Statistics); O. Scaillet (University of Geneva GSEM and GFRI; Swiss Finance Institute; University of Geneva - Research Center for Statistics)
    Abstract: We develop new higher-order asymptotic techniques for the Gaussian maximum likelihood estimator of the parameters in a spatial panel data model, with fixed effects, time-varying covariates, and spatially correlated errors. We introduce a new saddlepoint density and tail area approximation to improve on the accuracy of the extant asymptotics. It features relative error of order O(m^-1) for m = n(T-1), with n being the cross-sectional dimension and T the time-series dimension. The main theoretical tool is the tilted-Edgeworth technique. It yields a density approximation that is always non-negative, does not need resampling, and is accurate in the tails. We provide an algorithm to implement our saddlepoint approximation and we illustrate the good performance of our method via numerical examples. Monte Carlo experiments show that, for the spatial panel data model with fixed effects and T = 2, the saddlepoint approximation yields accuracy improvements over the routinely applied first-order asymptotics and Edgeworth expansions, in small to moderate sample sizes, while preserving analytical tractability. An empirical application on the investment-saving relationship in OECD countries shows disagreement between testing results based on first-order asymptotics and saddlepoint techniques, which questions some implications based on the former.
    Keywords: Spatial statistics, Panel data, Small samples, Saddlepoint approximation
    JEL: C21 C23 C52
    Date: 2019–03
  2. By: Frank Windmeijer; Xiaoran Liang; Fernando P Hartwig; Jack Bowden
    Abstract: We propose a new method, the confidence interval (CI) method, to select valid instruments from a set of potential instruments that may contain invalid ones, for instrumental variables estimation of the causal effect of an exposure on an outcome. Invalid instruments are such that they fail the exclusion restriction and enter the model as explanatory variables. The CI method is based on the confidence intervals of the per-instrument causal effect estimates. Each instrument-specific causal effect estimate is obtained whilst treating all other instruments as invalid. The CI method selects the largest group with all confidence intervals overlapping with each other as the set of valid instruments. Under a plurality rule, we show that the resulting IV, or two-stage least squares (2SLS), estimator has oracle properties, meaning that it has the same limiting distribution as the oracle 2SLS estimator with the set of invalid instruments known. This result is the same as for the hard thresholding with voting (HT) method of Guo et al. (2018). Unlike the HT method, the number of instruments selected as valid by the CI method is guaranteed to be monotonically decreasing for decreasing values of the tuning parameter, which determines the width of the confidence intervals. For the CI method, we can therefore use a downward testing procedure based on the Sargan test for overidentifying restrictions. In a simulation design similar to that of Guo et al. (2018), we find better estimation and inference properties for the CI method than for the HT method, and in an application to the effect of BMI on blood pressure we find that the CI method is better able to detect invalid instruments.
    Keywords: Causal inference; Instrumental variables; Invalid instruments
    Date: 2019–06–17
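The selection rule above can be sketched in a few lines. The Python code below is a simplified illustration, not the authors' implementation: per-instrument effects are estimated by plain Wald ratios (the paper instead conditions on the remaining instruments), and the largest mutually overlapping group of confidence intervals is found using the fact that intervals pairwise overlap if and only if they share a common point. All function names are hypothetical.

```python
import numpy as np

def per_instrument_estimates(Z, X, Y):
    """Wald-ratio causal-effect estimate and a rough delta-method SE for
    each instrument taken alone. A simplification of the just-identified
    estimates in the paper, which also control for the other instruments."""
    n, L = Z.shape
    est, se = np.empty(L), np.empty(L)
    for j in range(L):
        z = Z[:, j] - Z[:, j].mean()
        num = z @ Y / n                      # cov(Z_j, Y)
        den = z @ X / n                      # cov(Z_j, X)
        est[j] = num / den
        resid = Y - Y.mean() - est[j] * (X - X.mean())
        se[j] = np.sqrt((z**2 @ resid**2) / n) / (np.sqrt(n) * abs(den))
    return est, se

def ci_select(est, se, z_crit=1.96):
    """Largest group of instruments whose confidence intervals all overlap.
    In 1D a set of intervals pairwise overlaps iff they share a common
    point, and an optimal common point can be taken at some lower bound."""
    lo, hi = est - z_crit * se, est + z_crit * se
    return max((np.flatnonzero((lo <= p) & (p <= hi)) for p in lo), key=len)
```

In a toy simulation with six instruments, two of which have a direct effect on the outcome, the two invalid instruments produce Wald ratios shifted away from the causal effect, so their intervals fall outside the largest overlapping group.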
  3. By: George Kapetanios (King’s College London); Laura Serlenga (University of Bari "Aldo Moro"); Yongcheol Shin (University of York)
    Abstract: Given the growing availability of large datasets and following recent research trends on multi-dimensional modelling, we develop three-dimensional (3D) panel data models with hierarchical error components that allow for strong cross-sectional dependence through unobserved heterogeneous global and local factors. We propose consistent estimation procedures by extending the common correlated effects (CCE) estimation approach proposed by Pesaran (2006). The standard CCE approach needs to be modified in order to account for the hierarchical factor structure in 3D panels. Further, we provide the associated asymptotic theory, including new nonparametric variance estimators. The validity of the proposed approach is confirmed by Monte Carlo simulation studies. We also demonstrate the empirical usefulness of the proposed approach through an application to a 3D panel gravity model of bilateral export flows.
    Keywords: Multi-dimensional Panel Data Models, Cross-sectional Error Dependence, Unobserved Heterogeneous Global and Local Factors, Multilateral Resistance, The Gravity Model of Bilateral Export Flows
    JEL: C13 C33 F14
    Date: 2019–06
  4. By: Mengheng Li (University of Technology Sydney); Ivan Mendieta-Munoz (University of Utah)
    Abstract: We propose a multivariate simultaneous unobserved components framework to determine the two-sided interactions between structural trend and cycle innovations. We relax the standard assumption in unobserved components models that trends are only driven by permanent shocks and cycles are only driven by transitory shocks by considering the possible spillover effects between structural innovations. The direction of spillover has a structural interpretation, whose identification is achieved via heteroskedasticity. We provide identifiability conditions and develop an efficient Bayesian MCMC procedure for estimation. Empirical implementations for both Okun’s law and the Phillips curve show evidence of significant spillovers between trend and cycle components.
    Keywords: Unobserved components; identification via heteroskedasticity; trends and cycles; permanent and transitory shocks; state space models; spillover structural effects
    JEL: C11 C32 E31 E32 E52
    Date: 2019–06–04
  5. By: Stéphane Bonhomme; Martin Weidner
    Abstract: Economists are often interested in computing averages with respect to a distribution of unobservables. Examples are moments or distributions of individual fixed-effects, average partial effects in discrete choice models, or counterfactual policy simulations based on a structural model. We consider posterior estimators of such effects, where the average is computed conditional on the observation sample. While in various settings it is common to "shrink" individual estimates -- e.g., of teacher value-added or hospital quality -- toward a common mean to reduce estimation noise, a study of the frequentist properties of posterior average estimators is lacking. We establish two robustness properties of posterior estimators under misspecification of the assumed distribution of unobservables: they are optimal in terms of local worst-case bias, and their global bias is no larger than twice the minimum worst-case bias that can be achieved within a large class of estimators. These results provide a theoretical foundation for the use of posterior average estimators. In addition, our theory suggests a simple measure of the information contained in the posterior conditioning. For illustration, we consider two empirical settings: the estimation of the distribution of neighborhood effects in the US, and the estimation of the densities of permanent and transitory components in a model of income dynamics.
    Date: 2019–06
  6. By: Bartalotti, Otávio (Iowa State University); Brummet, Quentin (NORC at the University of Chicago); Dieterle, Steven G. (University of Edinburgh)
    Abstract: When the running variable in a regression discontinuity (RD) design is measured with error, identification of the local average treatment effect of interest will typically fail. While the form of this measurement error varies across applications, in many cases the measurement error structure is heterogeneous across different groups of observations. We develop a novel measurement error correction procedure capable of addressing heterogeneous mismeasurement structures by leveraging auxiliary information. We also provide adjusted asymptotic variance and standard errors that take into consideration the variability introduced by the estimation of nuisance parameters, and honest confidence intervals that account for potential misspecification. Simulations provide evidence that the proposed procedure corrects the bias introduced by heterogeneous measurement error and achieves empirical coverage closer to nominal test size than "naïve" alternatives. Two empirical illustrations demonstrate that correcting for measurement error can either reinforce the results of a study or provide a new empirical perspective on the data.
    Keywords: nonclassical measurement error, regression discontinuity, heterogeneous measurement error
    JEL: C21 C14 I12 J65
    Date: 2019–05
  7. By: Marc Hallin; Gilles Nisol; Shahin Tavakoli
    Abstract: In this paper, we set up the theoretical foundations for a high-dimensional functional factor model approach in the analysis of large panels of functional time series (FTS). We first establish a representation result stating that, if the first r eigenvalues of the covariance operator of a cross-section of N FTS are unbounded as N diverges and the (r+1)th one is bounded, then each FTS can be represented as the sum of a common component driven by r factors, common to (almost) all the series, and a weakly cross-correlated idiosyncratic component (all the eigenvalues of the idiosyncratic covariance operator are bounded as N → ∞). Our model and theory are developed in a general Hilbert space setting that allows for panels mixing functional and scalar time series. We then turn to the estimation of the factors, their loadings, and the common components. We derive consistency results in the asymptotic regime where the number N of series and the number T of time observations diverge, thus exemplifying the “blessing of dimensionality” that explains the success of factor models in the context of high-dimensional (scalar) time series. Our results encompass the scalar case, for which they reproduce and extend, under weaker conditions, well-established results (Bai & Ng 2002). We provide numerical illustrations that corroborate the convergence rates predicted by the theory and give a finer understanding of the interplay between N and T for estimation purposes. We conclude with an empirical illustration on a dataset of intraday S&P100 and Eurostoxx 50 stock returns, along with their scalar overnight returns.
    Keywords: Functional time series, High-dimensional time series, Factor model, Panel data, Functional data analysis
    Date: 2019–06
  8. By: Vanessa Berenguer-Rico (University of Oxford); Søren Johansen (University of Copenhagen and CREATES); Bent Nielsen (University of Oxford)
    Abstract: A uniform weak consistency theory is presented for the marked and weighted empirical distribution function of residuals. New and weaker sufficient conditions for uniform consistency are derived. The theory allows for a wide variety of regressors and error distributions. We apply the theory to 1-step Huber-skip estimators. These estimators describe the widespread practice of removing outlying observations from an initial estimation of the model of interest and updating the estimation in a second step by applying least squares to the selected observations. Two results are presented. First, we give new and weaker conditions for consistency of the estimators. Second, we analyze the gauge, which is the rate of false detection of outliers, and which can be used to decide the cut-off in the rule for selecting outliers.
    Keywords: 1-step Huber skip, Asymptotic theory, Empirical processes, Gauge, Marked and Weighted Empirical processes, Non-stationarity, Robust Statistics, Stationarity.
    JEL: C01 C22
    Date: 2019–05–24
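The 1-step procedure the abstract describes (fit, flag outliers, refit on the retained observations) is easy to state in code. A minimal Python sketch, assuming a normal-theory cut-off; the function name is hypothetical:

```python
import numpy as np

def huber_skip_1step(X, y, c=2.576):
    """1-step Huber-skip: initial OLS, flag observations whose
    standardized residual exceeds c, re-run OLS on the rest.
    c = 2.576 gives a nominal gauge (false outlier detection rate)
    of about 1% under normal errors."""
    beta0, *_ = np.linalg.lstsq(X, y, rcond=None)
    r = y - X @ beta0
    sigma = np.sqrt(r @ r / (len(y) - X.shape[1]))   # residual scale
    keep = np.abs(r) <= c * sigma                    # outlier selection rule
    beta1, *_ = np.linalg.lstsq(X[keep], y[keep], rcond=None)
    return beta1, keep
```

With gross contamination, the refit coefficients recover the clean-data fit while only a small fraction of good observations is discarded, which is the trade-off the gauge quantifies.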
  9. By: Graham Elliott; Nikolay Kudrin; Kaspar Wuthrich
    Abstract: We analyze what can be learned from tests for p-hacking based on distributions of t-statistics and p-values across multiple studies. We analytically characterize restrictions on these distributions that conform with the absence of p-hacking. This forms a testable null hypothesis and suggests statistical tests for p-hacking. We extend our results to p-hacking when there is also publication bias, and also consider what types of distributions arise under the alternative hypothesis that researchers engage in p-hacking. We show that the power of statistical tests for detecting p-hacking is low even if p-hacking is quite prevalent.
    Date: 2019–06
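The monotonicity restrictions derived in the paper yield sharper tests, but the basic flavor can be conveyed by the classical caliper-style binomial comparison of p-values just below and just above the 0.05 threshold. The Python sketch below is that cruder diagnostic, not one of the authors' tests; the function name is hypothetical:

```python
import numpy as np
from math import comb

def caliper_test(pvals, alpha=0.05, h=0.01):
    """Caliper-style binomial test: under a continuous p-curve the counts
    just below and just above the significance threshold should be roughly
    equal; a large excess just below suggests p-hacking. Returns the
    one-sided exact binomial p-value."""
    below = int(np.sum((pvals >= alpha - h) & (pvals < alpha)))
    above = int(np.sum((pvals >= alpha) & (pvals < alpha + h)))
    n = below + above
    if n == 0:
        return 1.0
    # P(X >= below) for X ~ Binomial(n, 1/2)
    return sum(comb(n, k) for k in range(below, n + 1)) / 2**n
```

A pronounced spike of reported p-values just under 0.05 yields a tiny test p-value, while balanced counts around the threshold do not.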
  10. By: Bruno Ferman
    Abstract: We consider the asymptotic properties of the Synthetic Control (SC) estimator when both the number of pre-treatment periods and control units are large. If potential outcomes follow a linear factor model, we provide conditions under which the factor loadings of the SC unit converge in probability to the factor loadings of the treated unit. This happens when there are weights diluted among many control units such that a weighted average of the factor loadings of the control units reconstructs the factor loadings of the treated unit. In this case, the SC estimator is asymptotically unbiased even when treatment assignment is correlated with time-varying unobservables. This result can be valid even when the number of control units is larger than the number of pre-treatment periods.
    Date: 2019–06
  11. By: Matias D. Cattaneo; Michael Jansson; Xinwei Ma
    Abstract: Density estimation and inference methods are widely used in empirical work. When the data has compact support, as all empirical applications de facto do, conventional kernel-based density estimators are inapplicable near or at the boundary because of their well-known boundary bias. Alternative smoothing methods are available to handle boundary points in density estimation, but they all require additional tuning parameter choices or other typically ad hoc modifications depending on the evaluation point and/or approach considered. This article discusses the R and Stata package lpdensity implementing a novel local polynomial density estimator proposed in Cattaneo, Jansson and Ma (2019), which is boundary adaptive, fully data-driven and automatic, and requires only the choice of one tuning parameter. The methods implemented also cover local polynomial estimation of the cumulative distribution function and density derivatives, as well as several other theoretical and methodological results. In addition to point estimation and graphical procedures, the package offers consistent variance estimators, mean squared error optimal bandwidth selection, and robust bias-corrected inference. A comparison with several other density estimation packages and functions available in R using a Monte Carlo experiment is provided.
    Date: 2019–06
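The package itself is in R and Stata; the underlying idea — fit a local polynomial to the empirical distribution function and read the density off the slope coefficient — can be sketched in a few lines of Python. This is a bare illustration with a fixed bandwidth and a triangular kernel, not the package's implementation, and the function name is hypothetical:

```python
import numpy as np

def lpdensity_sketch(x, grid, h, p=2):
    """Local polynomial density estimation à la Cattaneo-Jansson-Ma:
    at each grid point, fit a weighted polynomial of order p to the
    empirical CDF and return the first-derivative coefficient, which
    estimates the density. Boundary adaptive by construction, since
    the ECDF fit uses only the data actually present in the window."""
    x = np.sort(np.asarray(x, dtype=float))
    n = len(x)
    Fhat = np.arange(1, n + 1) / n               # ECDF at the data points
    out = np.empty(len(grid))
    for k, x0 in enumerate(grid):
        u = (x - x0) / h
        w = np.clip(1.0 - np.abs(u), 0.0, None)  # triangular kernel weights
        Xd = np.stack([(x - x0) ** j for j in range(p + 1)], axis=1)
        sw = np.sqrt(w)
        beta = np.linalg.lstsq(sw[:, None] * Xd, sw * Fhat, rcond=None)[0]
        out[k] = beta[1]                         # slope of local fit = density
    return out
```

At an interior point this behaves like a standard kernel estimator; at a support boundary no special correction is needed, which is the boundary-adaptivity the abstract emphasizes.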
  12. By: Kleijnen, Jack (Tilburg University, Center For Economic Research); van Beers, W.C.M. (Tilburg University, Center For Economic Research)
    Abstract: We derive new statistical tests for leave-one-out cross-validation of Kriging models. Graphically, we present these tests as scatterplots augmented with confidence intervals. We may wish to avoid extrapolation, which we define as prediction of the output for a point that is a vertex of the convex hull of the given input combinations. Moreover, we may use bootstrapping to estimate the true variance of the Kriging predictor. The resulting tests (with or without extrapolation or bootstrapping) have type-I and type-II error probabilities, which we estimate through Monte Carlo experiments. To illustrate the application of our tests, we use an example with two inputs and the popular borehole example with eight inputs.
    Keywords: validation; cross-validation; Kriging; Gaussian process; extrapolation; convex hull; Monte Carlo Technique
    JEL: C0 C1 C9 C15 C44
    Date: 2019
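Leave-one-out cross-validation of a simple Kriging model is easy to prototype. The Python sketch below assumes a zero-mean Gaussian process with a Gaussian correlation function and fixed hyperparameters (unlike the estimated models in the paper; names hypothetical); it returns the LOO predictions and Kriging standard errors that would populate the scatterplots with confidence intervals described in the abstract:

```python
import numpy as np

def kriging_loo(X, y, length_scale=1.0, nugget=1e-6):
    """Leave-one-out cross-validation for simple Kriging: for each point i,
    predict y_i from the other points and return the prediction together
    with the Kriging standard error."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    K = np.exp(-d2 / (2.0 * length_scale**2))       # Gaussian correlation
    n = len(y)
    pred, se = np.empty(n), np.empty(n)
    for i in range(n):
        m = np.arange(n) != i                       # leave point i out
        Kmm = K[np.ix_(m, m)] + nugget * np.eye(n - 1)
        k = K[m, i]
        a = np.linalg.solve(Kmm, k)                 # Kriging weights
        pred[i] = a @ y[m]
        se[i] = np.sqrt(max(K[i, i] - k @ a, 0.0))  # Kriging variance
    return pred, se
```

Note that when the left-out point is a vertex of the convex hull of the inputs (here, the two endpoints of a 1D design), the LOO prediction is the extrapolation case the abstract flags.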
  13. By: Laura Palagi (Department of Computer, Control and Management Engineering Antonio Ruberti (DIAG), University of Rome La Sapienza, Rome, Italy); Ruggiero Seccia (Department of Computer, Control and Management Engineering Antonio Ruberti (DIAG), University of Rome La Sapienza, Rome, Italy)
    Abstract: Deep Feedforward Neural Networks' (DFNNs) weights estimation relies on the solution of a very large nonconvex optimization problem that may have many local (non-global) minimizers, saddle points, and large plateaus. Furthermore, the time needed to find good solutions to the training problem depends heavily on both the number of samples and the number of weights (variables). In this work, we show how Block Coordinate Descent (BCD) methods can be applied to improve the performance of state-of-the-art algorithms by avoiding bad stationary points and flat regions. We first describe a batch BCD method able to effectively tackle difficulties due to the network's depth; we then extend the algorithm by proposing an online BCD scheme able to scale with respect to both the number of variables and the number of samples. We present extensive numerical results on standard datasets using different deep networks, and show how applying (online) BCD methods to the training phase of DFNNs outperforms standard batch/online algorithms, improving both the training phase and the generalization performance of the networks.
    Keywords: Deep Feedforward Neural Networks ; Block coordinate decomposition ; Online Optimization ; Large scale optimization
    Date: 2019
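A batch version of the layer-wise decomposition idea fits in a few lines. In the Python sketch below (a hypothetical toy, not the paper's algorithm), the output layer of a one-hidden-layer network is a linear least-squares block given the hidden activations, so it is minimized exactly; the hidden layer then takes one gradient step with the output block held fixed:

```python
import numpy as np

def train_bcd(X, y, width=16, epochs=50, lr=0.01, seed=0):
    """Toy batch block layer decomposition for a one-hidden-layer net:
    alternate exact minimization over the output block (W2, b2) with a
    gradient step on the hidden block (W1, b1). Returns the MSE per epoch."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W1 = rng.normal(size=(d, width))
    b1 = np.zeros(width)
    y = y.reshape(-1, 1)
    losses = []
    for _ in range(epochs):
        # Block 1: output layer is linear in (W2, b2) -> solve exactly.
        H = np.tanh(X @ W1 + b1)
        Haug = np.hstack([H, np.ones((n, 1))])
        beta = np.linalg.lstsq(Haug, y, rcond=None)[0]
        W2 = beta[:-1]
        out = Haug @ beta
        losses.append(float(((out - y) ** 2).mean()))
        # Block 2: one gradient step on the hidden layer, output block fixed.
        g = 2.0 * (out - y) / n              # d(MSE)/d(out)
        gH = (g @ W2.T) * (1.0 - H**2)       # backprop through tanh
        W1 -= lr * X.T @ gH
        b1 -= lr * gH.sum(0)
    return losses
```

Solving the output block exactly is what makes each epoch cheap and stable here; the paper's online scheme additionally decomposes over samples.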
  14. By: Christophe Chesneau; Salima El Kolei; Junke Kou; Fabien Navarro
    Abstract: In this paper, we consider an unknown functional estimation problem in a general nonparametric regression model with the characteristic of having both multiplicative and additive noise. We propose two wavelet estimators, which, to our knowledge, are new in this general context. We prove that they achieve fast convergence rates under the mean integrated square error over Besov spaces. The rates obtained have the particularity of being established under weak conditions on the model. A numerical study in a context comparable to stochastic frontier estimation (with the difference that the boundary is not necessarily a production function) supports the theory.
    Date: 2019–06
  15. By: Arthur Charpentier (CREM - Centre de recherche en économie et management - UNICAEN - Université de Caen Normandie - NU - Normandie Université - UR1 - Université de Rennes 1 - UNIV-RENNES - Université de Rennes - CNRS - Centre National de la Recherche Scientifique, UQAM - Département de mathématiques [Montréal] - UQAM - Université du Québec à Montréal); Emmanuel Flachaire (EUREQUA - Equipe Universitaire de Recherche en Economie Quantitative - UP1 - Université Panthéon-Sorbonne - CNRS - Centre National de la Recherche Scientifique)
    Abstract: Top incomes are often modelled with the Pareto distribution. To date, economists have mostly used the Pareto Type I distribution to model the upper tail of the income and wealth distribution. It is a parametric distribution with an attractive property that can easily be linked to economic theory. In this paper, we first show that modelling top incomes with the Pareto Type I distribution can lead to severe over-estimation of inequality, even with millions of observations. We then show that the Generalized Pareto distribution and, even more so, the Extended Pareto distribution are much less sensitive to the choice of threshold and thus provide more reliable results. We discuss different types of bias that can be encountered in empirical studies and provide some guidance for practice. To illustrate, two applications are investigated: the distribution of income in South Africa in 2012 and the distribution of wealth in the United States in 2013.
    Keywords: Pareto distribution,top incomes,inequality measures
    Date: 2019–05–31
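The Pareto Type I fit the abstract warns about is a one-liner, which is precisely why its threshold sensitivity matters in practice. A minimal Python sketch (Pareto Type I only; the maximum-likelihood/Hill estimator of the tail index and the implied top-tail Gini coefficient 1/(2α−1); function name hypothetical):

```python
import numpy as np

def hill_alpha(x, threshold):
    """Maximum-likelihood (Hill) estimate of the Pareto Type I tail index
    for observations above a threshold, with the implied top-tail Gini
    coefficient 1/(2*alpha - 1). Re-running this over a range of
    thresholds exposes the sensitivity the paper documents."""
    x = np.asarray(x, dtype=float)
    tail = x[x > threshold]
    alpha = len(tail) / np.sum(np.log(tail / threshold))
    gini = 1.0 / (2.0 * alpha - 1.0)
    return alpha, gini
```

On genuinely Pareto data the estimate is stable across thresholds; on real income data, plotting alpha against the threshold typically reveals the instability that motivates the Generalized and Extended Pareto alternatives.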
  16. By: Marc S. Paolella (University of Zurich - Department of Banking and Finance; Swiss Finance Institute); Pawel Polak (University of Zurich; Ecole Polytechnique Fédérale de Lausanne - Ecole Polytechnique Fédérale de Lausanne); Patrick S. Walker (University of Zurich, Department of Banking and Finance)
    Abstract: A non-Gaussian multivariate regime switching dynamic correlation model for financial asset returns is proposed. It incorporates the multivariate generalized hyperbolic law for the conditional distribution of returns. All model parameters are estimated consistently using a new two-stage expectation-maximization algorithm that also allows for incorporation of shrinkage estimation via quasi-Bayesian priors. It is shown that use of Markov switching correlation dynamics not only leads to highly accurate risk forecasts, but also potentially reduces the regulatory capital requirements during periods of distress. In terms of portfolio performance, the new regime switching model delivers consistently higher Sharpe ratios and smaller losses than the equally weighted portfolio and all competing models. Finally, the regime forecasts are employed in a dynamic risk control strategy that avoids most losses during the financial crisis and vastly improves risk-adjusted returns.
    Keywords: sGARCH; Markov Switching; Multivariate Generalized Hyperbolic Distribution; Portfolio Optimization; Value-at-Risk
    JEL: C32 C51 C53 G11 G17 G32
    Date: 2019–05
  17. By: Christelis, Dimitris; Messina, Julián
    Abstract: We partially identify population treatment effects in observational data under sample selection, without the benefit of random treatment assignment. We provide bounds both for the average and the quantile population treatment effects, combining assumptions for the selected and the non-selected subsamples. We show how different assumptions help narrow identification regions, and illustrate our methods by partially identifying the effect of maternal education on the 2015 PISA math test scores in Brazil. We find that while sample selection considerably increases the uncertainty around the effect of maternal education, it is still possible to calculate informative identification regions.
    JEL: C21 C24 I2
    Date: 2019–06
  18. By: Jozef Barunik; Cathy Yi-Hsuan Chen; Jan Vecer
    Abstract: We propose how to quantify high-frequency market sentiment using high-frequency news from the NASDAQ news platform and support vector machine classifiers. News arrives at markets randomly, and the resulting news sentiment behaves like a stochastic process. To characterize the joint evolution of sentiment, price, and volatility, we introduce a unified continuous-time sentiment-driven stochastic volatility model. We provide closed-form formulas for moments of the volatility and news sentiment processes and study the news impact. Further, we implement a simulation-based method to calibrate the parameters. Empirically, we document that news sentiment raises the threshold of volatility reversion, sustaining high market volatility.
    Date: 2019–05
  19. By: Evzen Kocenda (Institute of Economic Studies, Faculty of Social Sciences, Charles University in Prague, Czech Republic); Karen Poghosyan (Central Bank of Armenia, Economic Research Department, Yerevan, Armenia)
    Abstract: In this paper we analyze export sophistication based on a large panel dataset (2001–2015; 101 countries) and using various estimation algorithms. Using Monte Carlo simulations, we evaluate the bias properties of estimators and show that GMM-type estimators outperform instrumental variable and fixed-effects estimators. Based on our analysis, we document that GDP per capita and the size of the economy exhibit significant and positive effects on export sophistication, while weak institutional quality exhibits a negative effect. We also show that export sophistication is path-dependent and stable even during a major economic crisis, which is especially important for emerging and developing economies.
    Keywords: international trade; export sophistication; emerging and developing economies; specialization; dynamic panel data; Monte-Carlo simulation; panel data estimators
    JEL: C52 C53 F14 F47 O19
    Date: 2017–11
  20. By: Taisuke Otsu; Mengshan Xu
    Date: 2019–05

This nep-ecm issue is ©2019 by Sune Karlsson. It is provided as is without any express or implied warranty. It may be freely redistributed in whole or in part for any purpose. If distributed in part, please include this notice.
General information on the NEP project can be found at For comments please write to the director of NEP, Marco Novarese at <>. Put “NEP” in the subject, otherwise your mail may be rejected.
NEP’s infrastructure is sponsored by the School of Economics and Finance of Massey University in New Zealand.