nep-ecm New Economics Papers
on Econometrics
Issue of 2022‒08‒22
sixteen papers chosen by
Sune Karlsson
Örebro universitet

  1. Detection of multiple structural breaks in large covariance matrices By Li, Yu-Ning; Li, Degui; Fryzlewicz, Piotr
  2. Detecting Grouped Local Average Treatment Effects and Selecting True Instruments By Nicolas Apfel; Helmut Farbmacher; Rebecca Groh; Martin Huber; Henrika Langen
  4. On the instrumental variable estimation with many weak and invalid instruments By Yiqi Lin; Frank Windmeijer; Xinyuan Song; Qingliang Fan
  5. A sufficient and necessary condition for identification of binary choice models with fixed effects By Yinchu Zhu
  6. Kolmogorov-Smirnov type test for generated variables By Otsu, Taisuke; Taniguchi, Go
  7. A General Limit Theory for Nonlinear Functionals of Nonstationary Time Series By Qiying Wang; Peter C. B. Phillips
  8. Estimating Continuous Treatment Effects in Panel Data using Machine Learning with an Agricultural Application By Sylvia Klosin; Max Vilgalys
  9. High-dimensional changepoint estimation with heterogeneous missingness By Follain, Bertille; Wang, Tengyao; Samworth, Richard J.
  10. Bootstrap Inference Under Cross Sectional Dependence By Timothy Conley; Sílvia Gonçalves; Min Seong Kim; Benoit Perron
  11. Model diagnostics of discrete data regression: a unifying framework using functional residuals By Zewei Lin; Dungang Liu
  12. Econometric Analysis of Asset Price Bubbles By Shuping Shi; Peter C. B. Phillips
  13. Estimating Inequality with Missing Incomes By Paolo Brunori; Pedro Salas-Rojo; Paolo Verme
  14. The Virtue of Complexity Everywhere By Bryan T. Kelly; Semyon Malamud; Kangying Zhou
  15. A rank similarity test for quantile treatment effects in conjunction with propensity score matching: An application to heterogeneous crop yield impacts of agricultural credit By Shukla, Sumedha; Arora, Gaurav
  16. LASSO Principal Component Averaging -- a fully automated approach for point forecast pooling By Bartosz Uniejewski; Katarzyna Maciejowska

  1. By: Li, Yu-Ning; Li, Degui; Fryzlewicz, Piotr
    Abstract: This paper studies multiple structural breaks in large contemporaneous covariance matrices of high-dimensional time series satisfying an approximate factor model. The breaks in the second order moment structure of the common components are due to sudden changes in either factor loadings or covariance of latent factors, requiring appropriate transformation of the factor models to facilitate estimation of the (transformed) common factors and factor loadings via the classical principal component analysis. With the estimated factors and idiosyncratic errors, an easy-to-implement CUSUM-based detection technique is introduced to consistently estimate the location and number of breaks and correctly identify whether they originate in the common or idiosyncratic error components. The algorithms of Wild Binary Segmentation for Covariance (WBS-Cov) and Wild Sparsified Binary Segmentation for Covariance (WSBS-Cov) are used to estimate breaks in the common and idiosyncratic error components, respectively. Under some technical conditions, the asymptotic properties of the proposed methodology are derived with near-optimal rates (up to a logarithmic factor) achieved for the estimated breaks. Monte-Carlo simulation studies are conducted to examine the finite-sample performance of the developed method and its comparison with other existing approaches. We finally apply our method to study the contemporaneous covariance structure of daily returns of S&P 500 constituents and identify a few breaks including those occurring during the 2007–2008 financial crisis and the recent coronavirus (COVID-19) outbreak. An R package “BSCOV” is provided to implement the proposed algorithms.
    Keywords: approximate factor models; Binary segmentation; CUSUM; large covariance matrix; principal component analysis; structural breaks; SRG1920/ 100603; EP/L014246/1
    JEL: C1
    Date: 2022–05–18
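    As a rough illustration of the CUSUM idea mentioned in the abstract, the sketch below locates a single change in the mean of a scalar series. It is a simplified stand-in under stated assumptions, not the WBS-Cov or WSBS-Cov algorithms of the paper and not the BSCOV package: the function name `cusum_changepoint` and the toy series are hypothetical, and the paper works with covariance matrices of estimated factors rather than a univariate mean.

    ```python
    import math


    def cusum_changepoint(x):
        """Return (k_hat, max_stat) for a single change in mean.

        For each candidate split k, the statistic is the absolute difference
        between the left and right sample means, scaled by sqrt(k (n - k) / n).
        """
        n = len(x)
        total = sum(x)
        best_k, best_stat = None, -1.0
        left = 0.0
        for k in range(1, n):
            left += x[k - 1]
            right = total - left
            stat = math.sqrt(k * (n - k) / n) * abs(left / k - right / (n - k))
            if stat > best_stat:
                best_k, best_stat = k, stat
        return best_k, best_stat


    # Toy series (hypothetical): the mean shifts from 0 to 1 after 10 observations.
    series = [0.0] * 10 + [1.0] * 10
    k_hat, stat = cusum_changepoint(series)
    ```

    Binary segmentation procedures apply a statistic of this kind recursively on sub-intervals to find multiple breaks; the single-break version is shown only to make the statistic concrete.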
  2. By: Nicolas Apfel; Helmut Farbmacher; Rebecca Groh; Martin Huber; Henrika Langen
    Abstract: In the context of an endogenous binary treatment with heterogeneous effects and multiple instruments, we propose a two-step procedure to identify complier groups with identical local average treatment effects (LATE), despite relying on distinct instruments and even if several instruments violate the identifying assumptions. Our procedure is based on the fact that the LATE is homogeneous for any two or multiple instruments which (i) satisfy the LATE assumptions (instrument validity and treatment monotonicity in the instrument) and (ii) generate identical complier groups in terms of treatment propensities given the respective instruments. Under the (plurality) assumption that for each set of instruments with identical treatment propensities, those instruments satisfying the LATE assumptions constitute the relative majority, our procedure permits identifying these true instruments in a data driven way. We also provide a simulation study investigating the finite sample properties of our approach and an empirical application investigating the effect of incarceration on recidivism in the US with judge assignments serving as instruments.
    Date: 2022–07
  3. By: Galina Besstremyannaya (National Research University Higher School of Economics); Sergei Golovan (National Research University Higher School of Economics)
    Abstract: The purpose of the paper is to enable inference in the case of quantile regression with endogenous covariates and clustered data. We prove that the instrumental variable quantile regression estimator is consistent when errors are correlated within clusters. We derive an asymptotic distribution for the estimator, which may be used for inference for a given quantile index τ. As regards inference based on the entire instrumental variable quantile regression process, we prove that cluster-based bootstrapping of a statistic of a certain class offers a computationally tractable approach for implementing asymptotic tests. Our theoretical results concerning the asymptotic properties of the instrumental variable quantile regression estimator for clustered data are supported by simulation analysis. The empirical part of the paper applies the technique to estimation of the earnings equations of US men and women where female labor supply is endogenous and subject to the shock of World War II.
    Keywords: quantile regression, endogeneity, clustered data, instrumental variables
    JEL: C21 C23 C26 D12
    Date: 2022
  4. By: Yiqi Lin; Frank Windmeijer; Xinyuan Song; Qingliang Fan
    Abstract: We discuss the fundamental issue of identification in linear instrumental variable (IV) models with unknown IV validity. We revisit the popular majority and plurality rules and show that no identification condition can be "if and only if" in general. With the assumption of the "sparsest rule", which is equivalent to the plurality rule but becomes operational in computation algorithms, we investigate and prove the advantages of non-convex penalized approaches over other IV estimators based on two-step selections, in terms of selection consistency and accommodation for individually weak IVs. Furthermore, we propose a surrogate sparsest penalty that aligns with the identification condition and provides oracle sparse structure simultaneously. Desirable theoretical properties are derived for the proposed estimator with weaker IV strength conditions compared to the previous literature. Finite sample properties are demonstrated using simulations and the selection and estimation method is applied to an empirical study concerning the effect of trade on economic growth.
    Date: 2022–07
  5. By: Yinchu Zhu
    Abstract: We study the identification of binary choice models with fixed effects. We provide a condition called sign saturation and show that this condition is sufficient for the identification of the model. In particular, we can guarantee identification even with bounded regressors. We also show that without this condition, the model is never identified even if the errors are known to have the logistic distribution. A test is provided to check the sign saturation condition and can be implemented using existing algorithms for the maximum score estimator.
    Date: 2022–06
  6. By: Otsu, Taisuke; Taniguchi, Go
    Abstract: Distribution homogeneity testing, particularly based on the Kolmogorov-Smirnov statistic, has been applied in various empirical studies. In empirical economic analysis, it is often the case that economic variables of interest are obtained as estimated values or residuals of preliminary model fits, called generated variables. In this paper, we extend the Kolmogorov-Smirnov type homogeneity test to accommodate such generated variables, and propose an asymptotically valid bootstrap inference procedure. A small simulation study illustrates that it is crucial for reliable inference to account for estimation errors in the generated variables. The proposed method is applied to compare the total factor productivities across different countries.
    JEL: J1
    Date: 2020–10–01
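    For readers unfamiliar with the statistic the abstract builds on, the following sketch computes the classical two-sample Kolmogorov-Smirnov distance, that is, the largest gap between two empirical distribution functions. It covers only the textbook statistic: the correction for generated variables and the bootstrap procedure proposed in the paper are not reproduced, and the function name `ks_two_sample` is hypothetical.

    ```python
    def ks_two_sample(a, b):
        """Largest absolute gap between the empirical CDFs of samples a and b."""
        a_sorted, b_sorted = sorted(a), sorted(b)
        n, m = len(a_sorted), len(b_sorted)
        i = j = 0
        d = 0.0
        while i < n and j < m:
            # Move both empirical CDFs past the next smallest observed value.
            v = min(a_sorted[i], b_sorted[j])
            while i < n and a_sorted[i] <= v:
                i += 1
            while j < m and b_sorted[j] <= v:
                j += 1
            d = max(d, abs(i / n - j / m))
        return d


    # Toy samples (hypothetical values).
    d_disjoint = ks_two_sample([1, 2, 3], [4, 5, 6])
    d_interleaved = ks_two_sample([1, 3], [2, 4])
    ```

    When the two samples are residuals or fitted values from a first-stage model, the abstract's point is that the null distribution of this distance changes, so standard critical values would not apply.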
  7. By: Qiying Wang (University of Sydney); Peter C. B. Phillips (Cowles Foundation, Yale University, University of Auckland, Singapore Management University, University of Southampton)
    Abstract: Limit theory is provided for a wide class of covariance functionals of a nonstationary process and stationary time series. The results are relevant to estimation and inference in nonlinear nonstationary regressions that involve unit root, local unit root or fractional processes and they include both parametric and nonparametric regressions. Self normalized versions of these statistics are considered that are useful in inference. Numerical evidence reveals a strong bimodality in the finite sample distributions that persists for very large sample sizes although the limit theory is Gaussian. New self normalized versions are introduced that deliver improved approximations.
    Keywords: Endogeneity, Limit theory, Local time, Nonlinear functional, Nonstationarity, Sample covariance, Zero energy
    JEL: C22 C23
    Date: 2022–07
  8. By: Sylvia Klosin; Max Vilgalys
    Abstract: This paper introduces and proves asymptotic normality for a new semi-parametric estimator of continuous treatment effects in panel data. Specifically, we estimate an average derivative of the regression function. Our estimator uses the panel structure of data to account for unobservable time-invariant heterogeneity and machine learning methods to flexibly estimate functions of high-dimensional inputs. We construct our estimator using tools from the double de-biased machine learning (DML) literature. We show the performance of our method in Monte Carlo simulations and also apply our estimator to real-world data and measure the impact of extreme heat on United States (U.S.) agriculture. We use the estimator on a county-level dataset of corn yields and weather variation, measuring the elasticity of yield with respect to a marginal increase in extreme heat exposure. In our preferred specification, the difference between the estimates from OLS and our method is both statistically and economically significant. We find a significantly higher degree of impact, corresponding to an additional $1.18 billion in annual damages by the year 2050 under median climate scenarios. We find little evidence that this elasticity is changing over time.
    Date: 2022–07
  9. By: Follain, Bertille; Wang, Tengyao; Samworth, Richard J.
    Abstract: We propose a new method for changepoint estimation in partially observed, high-dimensional time series that undergo a simultaneous change in mean in a sparse subset of coordinates. Our first methodological contribution is to introduce a ‘MissCUSUM’ transformation (a generalisation of the popular cumulative sum statistics), that captures the interaction between the signal strength and the level of missingness in each coordinate. In order to borrow strength across the coordinates, we propose to project these MissCUSUM statistics along a direction found as the solution to a penalised optimisation problem tailored to the specific sparsity structure. The changepoint can then be estimated as the location of the peak of the absolute value of the projected univariate series. In a model that allows different missingness probabilities in different component series, we identify that the key interaction between the missingness and the signal is a weighted sum of squares of the signal change in each coordinate, with weights given by the observation probabilities. More specifically, we prove that the angle between the estimated and oracle projection directions, as well as the changepoint location error, are controlled with high probability by the sum of two terms, both involving this weighted sum of squares, and representing the error incurred due to noise and the error due to missingness respectively. A lower bound confirms that our changepoint estimator, which we call MissInspect, is optimal up to a logarithmic factor. The striking effectiveness of the MissInspect methodology is further demonstrated both on simulated data, and on an oceanographic data set covering the Neogene period.
    Keywords: changepoint estimation; missing data; high-dimensional data; segmentation; sparsity; EP/N031938/1; EP/P031447/1; EP/T02772X/1; H2020 European Research Council (GrantNumber(s): 101019498); Wiley deal
    JEL: C1
    Date: 2022–07–11
  10. By: Timothy Conley (Western University); Sílvia Gonçalves (McGill University); Min Seong Kim (University of Connecticut); Benoit Perron (Université de Montréal)
    Abstract: In this paper, we introduce a method of generating bootstrap samples with unknown patterns of cross sectional/spatial dependence which we call the spatial dependent wild bootstrap. This method is a spatial counterpart to the wild dependent bootstrap of Shao (2010) and generates data by multiplying a vector of independently and identically distributed external variables by the eigendecomposition of a bootstrap kernel. We prove the validity of our method for studentized and unstudentized statistics under a linear array representation of the data. Simulation experiments document the potential for improved inference with our approach. We illustrate our method in a firm-level regression application investigating the relationship between firms’ sales growth and the import activity in their local markets using unique firm-level and imports data for Canada.
    Keywords: bootstrap, cross sectional dependence, spatial HAC, eigendecomposition, economic distance
    JEL: C12 C32 C38 C52
    Date: 2022–07
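    The construction described in the abstract, correlated bootstrap multipliers obtained by applying a factor of a kernel matrix to i.i.d. draws, can be sketched as follows. Several choices here are assumptions made for illustration and are not taken from the paper: the Bartlett (triangular) kernel, the one-dimensional locations, the bandwidth, and all function names. The abstract uses an eigendecomposition; this sketch swaps in a Cholesky factor, which yields multipliers with the same covariance matrix when the kernel matrix is positive definite and keeps the example free of third-party libraries.

    ```python
    import math
    import random


    def bartlett_kernel_matrix(locations, bandwidth):
        """Kernel weights that decay linearly with distance (assumed kernel)."""
        n = len(locations)
        return [[max(0.0, 1.0 - abs(locations[i] - locations[j]) / bandwidth)
                 for j in range(n)] for i in range(n)]


    def cholesky(k):
        """Lower-triangular factor low with low * low^T = k (k positive definite)."""
        n = len(k)
        low = [[0.0] * n for _ in range(n)]
        for i in range(n):
            for j in range(i + 1):
                s = sum(low[i][p] * low[j][p] for p in range(j))
                if i == j:
                    low[i][j] = math.sqrt(k[i][i] - s)
                else:
                    low[i][j] = (k[i][j] - s) / low[j][j]
        return low


    def bootstrap_residuals(residuals, low, rng):
        """Multiply each residual by a multiplier correlated across units."""
        n = len(residuals)
        z = [rng.gauss(0.0, 1.0) for _ in range(n)]  # i.i.d. external draws
        w = [sum(low[i][p] * z[p] for p in range(i + 1)) for i in range(n)]
        return [e * wi for e, wi in zip(residuals, w)]


    # Toy inputs (hypothetical): four units on a line, bandwidth 2.
    locations = [0.0, 1.0, 2.0, 3.0]
    K = bartlett_kernel_matrix(locations, bandwidth=2.0)
    chol = cholesky(K)
    e_star = bootstrap_residuals([0.3, -0.1, 0.2, -0.4], chol, random.Random(0))
    ```

    Because the multipliers have covariance K, nearby units receive similar perturbations, which is how a bootstrap sample can mimic cross-sectional dependence without specifying its form.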
  11. By: Zewei Lin; Dungang Liu
    Abstract: Model diagnostics is an indispensable component of regression analysis, yet it is not well addressed in standard textbooks on generalized linear models. The lack of exposition is attributed to the fact that when outcome data are discrete, classical methods (e.g., Pearson/deviance residual analysis and goodness-of-fit tests) have limited utility in model diagnostics and treatment. This paper establishes a novel framework for model diagnostics of discrete data regression. Unlike the literature defining a single-valued quantity as the residual, we propose to use a function as a vehicle to retain the residual information. In the presence of discreteness, we show that such a functional residual is appropriate for summarizing the residual randomness that cannot be captured by the structural part of the model. We establish its theoretical properties, which lead to new diagnostic tools including the functional-residual-vs-covariate plot and Function-to-Function (Fn-Fn) plot. Our numerical studies demonstrate that the use of these tools can reveal a variety of model misspecifications, such as not properly including a higher-order term, an explanatory variable, an interaction effect, a dispersion parameter, or a zero-inflation component. The functional residual yields, as a byproduct, Liu-Zhang's surrogate residual mainly developed for cumulative link models for ordinal data (Liu and Zhang, 2018, JASA). As a general notion, it considerably broadens the diagnostic scope as it applies to virtually all parametric models for binary, ordinal and count data, all in a unified diagnostic scheme.
    Date: 2022–07
  12. By: Shuping Shi (Macquarie University); Peter C. B. Phillips (Cowles Foundation, Yale University, University of Auckland, Singapore Management University, University of Southampton)
    Abstract: In the presence of bubbles, asset prices consist of a fundamental and a bubble component, with the bubble component following an explosive dynamic. The general idea for bubble identification is to apply explosive root tests to a proxy of the unobservable bubble. Three notable proxies are the real asset prices, log price-payoff ratios, and estimated non-fundamental components. The rationale for all three proxy choices rests on the definition of bubbles, which has been presented in various forms in the literature. This chapter provides a theoretical framework that incorporates several definitions of bubbles (and fundamentals) and offers guidance for selecting proxies. For explosive root tests, we introduce the recursive evolving test of Phillips et al. (2015b,c) along with its asymptotic properties. This procedure can serve as a real-time monitoring device and has been shown to outperform several other tests. Like all other recursive testing procedures, the PSY algorithm faces the issue of multiplicity in testing that contaminates conventional significance values. To address this issue, we propose a multiple-testing algorithm to determine appropriate test critical values and show its satisfactory performance in finite samples by simulations. To illustrate, we conduct a pseudo real-time bubble monitoring exercise in the S&P 500 stock market from January 1990 to June 2020. The empirical results reveal the importance of using a good proxy for bubbles and addressing the multiplicity issue.
    Keywords: Bubbles; econometrics identification; market fundamental; explosive root; multiplicity; S&P 500 composite index
    JEL: C15 C22
    Date: 2022–06
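    To make "explosive root test" concrete, the sketch below computes a right-tailed Dickey-Fuller t-statistic on expanding windows and takes the supremum. It is a simplified illustration under stated assumptions: there is no lag augmentation, only the window end varies (the recursive evolving procedure in the abstract also varies the window start), no critical values or multiplicity correction are supplied, and the function names and the simulated series are hypothetical.

    ```python
    import math
    import random


    def df_tstat(y):
        """t-statistic on beta in: y[t] - y[t-1] = alpha + beta * y[t-1] + error."""
        x = y[:-1]
        d = [y[t] - y[t - 1] for t in range(1, len(y))]
        n = len(d)
        xbar = sum(x) / n
        dbar = sum(d) / n
        sxx = sum((xi - xbar) ** 2 for xi in x)
        beta = sum((xi - xbar) * (di - dbar) for xi, di in zip(x, d)) / sxx
        rss = sum((di - dbar - beta * (xi - xbar)) ** 2 for xi, di in zip(x, d))
        se = math.sqrt(rss / (n - 2) / sxx)
        return beta / se


    def sup_df(y, min_window):
        """Supremum of the statistic over windows y[0:end], end >= min_window."""
        return max(df_tstat(y[:end]) for end in range(min_window, len(y) + 1))


    # Simulated explosive series (hypothetical): autoregressive root 1.05.
    rng = random.Random(0)
    explosive = [50.0]
    for _ in range(100):
        explosive.append(1.05 * explosive[-1] + rng.gauss(0.0, 1.0))
    sup_stat = sup_df(explosive, min_window=20)
    ```

    Large positive values point toward explosive behaviour. Because the same data are tested repeatedly across windows, conventional critical values are not appropriate, which is the multiplicity issue the abstract addresses.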
  13. By: Paolo Brunori (III LSE & University of Florence); Pedro Salas-Rojo (III LSE); Paolo Verme (World Bank)
    Abstract: The measurement of income inequality is affected by missing observations, especially if they are concentrated in the tails of an income distribution. This paper conducts an experiment to test how the different correction methods proposed by the statistical, econometric and machine learning literature address measurement biases of inequality due to item non-response. We take a baseline survey and artificially corrupt the data employing several alternative non-linear functions that simulate patterns of income non-response, and show how biased inequality statistics can be when item non-responses are ignored. The comparative assessment of correction methods indicates that most methods are able to partially correct for missing data biases. Sample reweighting based on probabilities of non-response produces inequality estimates quite close to true values in most simulated missing data patterns. Matching and Pareto corrections can also be effective in correcting for selected missing data patterns. Other methods, such as single and multiple imputation and machine learning methods, are less effective. A final discussion provides some elements that help explain these findings.
    Keywords: Inequality, item non-response, missing, prediction
    JEL: D63 C83 C01
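    The reweighting correction mentioned in the abstract can be illustrated with a weighted Gini coefficient in which each respondent is weighted by the inverse of an assumed response probability. This is a minimal sketch, not the paper's experimental design: the incomes and response probabilities are made-up numbers, the probabilities are treated as known rather than estimated, and the function name `weighted_gini` is hypothetical.

    ```python
    def weighted_gini(incomes, weights):
        """Gini coefficient from the weighted mean absolute difference."""
        total_w = sum(weights)
        mean = sum(w * x for x, w in zip(incomes, weights)) / total_w
        diff = sum(wi * wj * abs(xi - xj)
                   for xi, wi in zip(incomes, weights)
                   for xj, wj in zip(incomes, weights))
        return diff / (2.0 * total_w ** 2 * mean)


    # Hypothetical respondents; the highest income is assumed least likely to respond.
    observed = [10.0, 20.0, 100.0]
    response_prob = [0.9, 0.9, 0.3]
    weights = [1.0 / p for p in response_prob]

    naive = weighted_gini(observed, [1.0] * len(observed))  # ignores non-response
    reweighted = weighted_gini(observed, weights)           # inverse-probability weights
    ```

    Comparing `naive` with `reweighted` shows how much the estimate moves once under-represented respondents are scaled up; how close either comes to the truth depends on how well the response probabilities are modelled.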
  14. By: Bryan T. Kelly (Yale SOM; AQR Capital Management, LLC; National Bureau of Economic Research (NBER)); Semyon Malamud (Ecole Polytechnique Federale de Lausanne; Centre for Economic Policy Research (CEPR); Swiss Finance Institute); Kangying Zhou (Yale School of Management)
    Abstract: We investigate the performance of non-linear return prediction models in the high complexity regime, i.e., when the number of model parameters exceeds the number of observations. We document a "virtue of complexity" in all asset classes that we study (US equities, international equities, bonds, commodities, currencies, and interest rates). Specifically, return prediction R² and optimal portfolio Sharpe ratio generally increase with model parameterization for every asset class. The virtue of complexity is present even in extremely data-scarce environments, e.g., for predictive models with fewer than twenty observations and tens of thousands of predictors. The empirical association between model complexity and out-of-sample model performance exhibits a striking consistency with theoretical predictions.
    Keywords: Portfolio choice, machine learning, random matrix theory, benign overfit, overparameterization
    JEL: C3 C58 C61 G11 G12 G14
    Date: 2022–07
  15. By: Shukla, Sumedha; Arora, Gaurav
    Keywords: Research Methods/Statistical Methods, Agricultural and Food Policy, Agricultural Finance
    Date: 2022–08
  16. By: Bartosz Uniejewski; Katarzyna Maciejowska
    Abstract: This paper develops a novel, fully automated forecast averaging scheme, which combines the LASSO estimation method with Principal Component Averaging (PCA). LASSO-PCA (LPCA) explores a pool of predictions based on a single model but calibrated to windows of different sizes. It uses information criteria to select tuning parameters and hence reduces the impact of researchers' ad hoc decisions. The method is applied to average predictions of hourly day-ahead electricity prices over 650 point forecasts obtained with various lengths of calibration windows. It is evaluated on four European and American markets with an out-of-sample period of almost two and a half years and compared to other semi- and fully automated methods, such as simple mean, AW/WAW, LASSO and PCA. The results indicate that the LASSO averaging is very efficient in terms of forecast error reduction, whereas the PCA method is robust to the selection of the specification parameter. LPCA inherits the advantages of both methods and outperforms other approaches in terms of MAE, remaining insensitive to the choice of a tuning parameter.
    Date: 2022–07

This nep-ecm issue is ©2022 by Sune Karlsson. It is provided as is without any express or implied warranty. It may be freely redistributed in whole or in part for any purpose. If distributed in part, please include this notice.
General information on the NEP project can be found at For comments please write to the director of NEP, Marco Novarese at <>. Put “NEP” in the subject, otherwise your mail may be rejected.
NEP’s infrastructure is sponsored by the School of Economics and Finance of Massey University in New Zealand.