
on Econometrics 
By:  Li, YuNing; Li, Degui; Fryzlewicz, Piotr 
Abstract:  This paper studies multiple structural breaks in large contemporaneous covariance matrices of high-dimensional time series satisfying an approximate factor model. The breaks in the second-order moment structure of the common components are due to sudden changes in either factor loadings or covariance of latent factors, requiring appropriate transformation of the factor models to facilitate estimation of the (transformed) common factors and factor loadings via the classical principal component analysis. With the estimated factors and idiosyncratic errors, an easy-to-implement CUSUM-based detection technique is introduced to consistently estimate the location and number of breaks and correctly identify whether they originate in the common or idiosyncratic error components. The algorithms of Wild Binary Segmentation for Covariance (WBS-Cov) and Wild Sparsified Binary Segmentation for Covariance (WSBS-Cov) are used to estimate breaks in the common and idiosyncratic error components, respectively. Under some technical conditions, the asymptotic properties of the proposed methodology are derived with near-optimal rates (up to a logarithmic factor) achieved for the estimated breaks. Monte-Carlo simulation studies are conducted to examine the finite-sample performance of the developed method and its comparison with other existing approaches. We finally apply our method to study the contemporaneous covariance structure of daily returns of S&P 500 constituents and identify a few breaks including those occurring during the 2007–2008 financial crisis and the recent coronavirus (COVID-19) outbreak. An R package “BSCOV” is provided to implement the proposed algorithms. 
Keywords:  approximate factor models; Binary segmentation; CUSUM; large covariance matrix; principal component analysis; structural breaks; SRG1920/ 100603; EP/L014246/1 
JEL:  C1 
Date:  2022–05–18 
URL:  http://d.repec.org/n?u=RePEc:ehl:lserod:115026&r= 
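The CUSUM idea mentioned in the abstract above can be illustrated in its simplest form. The following is a minimal sketch, assuming a single change in the mean of a univariate series; it is not the WBS-Cov or WSBS-Cov procedure of the paper, and the function names `cusum_statistics` and `estimate_break` are hypothetical.

```python
import math


def cusum_statistics(x):
    """Return |CUSUM| values for candidate break points b = 1, ..., n-1."""
    n = len(x)
    total = sum(x)
    stats = []
    left = 0.0
    for b in range(1, n):
        left += x[b - 1]          # sum of the first b observations
        right = total - left      # sum of the remaining n - b observations
        # Scaled difference between the sample means before and after b.
        scale = math.sqrt(b * (n - b) / n)
        stats.append(abs(scale * (left / b - right / (n - b))))
    return stats


def estimate_break(x):
    """Estimate one break as the location where |CUSUM| peaks."""
    stats = cusum_statistics(x)
    best = max(range(len(stats)), key=stats.__getitem__)
    return best + 1, stats[best]  # break after the first `best + 1` points


# Toy series (an assumption for illustration): the mean shifts after 10 points.
series = [0.0] * 10 + [1.0] * 10
location, value = estimate_break(series)
print(location, value)
```

In practice the peak value would be compared with a threshold before declaring a break, and binary segmentation variants apply the same statistic recursively on sub-intervals.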
By:  Nicolas Apfel; Helmut Farbmacher; Rebecca Groh; Martin Huber; Henrika Langen 
Abstract:  In the context of an endogenous binary treatment with heterogeneous effects and multiple instruments, we propose a two-step procedure to identify complier groups with identical local average treatment effects (LATE), despite relying on distinct instruments and even if several instruments violate the identifying assumptions. Our procedure is based on the fact that the LATE is homogeneous for any two or multiple instruments which (i) satisfy the LATE assumptions (instrument validity and treatment monotonicity in the instrument) and (ii) generate identical complier groups in terms of treatment propensities given the respective instruments. Under the (plurality) assumption that for each set of instruments with identical treatment propensities, those instruments satisfying the LATE assumptions constitute the relative majority, our procedure permits identifying these true instruments in a data-driven way. We also provide a simulation study investigating the finite-sample properties of our approach and an empirical application investigating the effect of incarceration on recidivism in the US with judge assignments serving as instruments. 
Date:  2022–07 
URL:  http://d.repec.org/n?u=RePEc:arx:papers:2207.04481&r= 
By:  Galina Besstremyannaya (National Research University Higher School of Economics); Sergei Golovan (National Research University Higher School of Economics) 
Abstract:  The purpose of the paper is to enable inference in the case of quantile regression with endogenous covariates and clustered data. We prove that the instrumental variable quantile regression estimator is consistent when errors are correlated within clusters. We derive an asymptotic distribution for the estimator, which may be used for inference for a given quantile index. As regards inference based on the entire instrumental variable quantile regression process, we prove that cluster-based bootstrapping of a statistic of a certain class offers a computationally tractable approach for implementing asymptotic tests. Our theoretical results concerning the asymptotic properties of the instrumental variable quantile regression estimator for clustered data are supported by simulation analysis. The empirical part of the paper applies the technique to estimation of the earning equations of US men and women where female labor supply is endogenous and subject to the shock of World War II. 
Keywords:  quantile regression, endogeneity, clustered data, instrumental variables 
JEL:  C21 C23 C26 D12 
Date:  2022 
URL:  http://d.repec.org/n?u=RePEc:hig:wpaper:255/ec/2022&r= 
By:  Yiqi Lin; Frank Windmeijer; Xinyuan Song; Qingliang Fan 
Abstract:  We discuss the fundamental issue of identification in linear instrumental variable (IV) models with unknown IV validity. We revisit the popular majority and plurality rules and show that no identification condition can be "if and only if" in general. With the assumption of the "sparsest rule", which is equivalent to the plurality rule but becomes operational in computation algorithms, we investigate and prove the advantages of non-convex penalized approaches over other IV estimators based on two-step selections, in terms of selection consistency and accommodation for individually weak IVs. Furthermore, we propose a surrogate sparsest penalty that aligns with the identification condition and provides oracle sparse structure simultaneously. Desirable theoretical properties are derived for the proposed estimator with weaker IV strength conditions compared to the previous literature. Finite-sample properties are demonstrated using simulations and the selection and estimation method is applied to an empirical study concerning the effect of trade on economic growth. 
Date:  2022–07 
URL:  http://d.repec.org/n?u=RePEc:arx:papers:2207.03035&r= 
By:  Yinchu Zhu 
Abstract:  We study the identification of binary choice models with fixed effects. We provide a condition called sign saturation and show that this condition is sufficient for the identification of the model. In particular, we can guarantee identification even with bounded regressors. We also show that without this condition, the model is never identified even if the errors are known to have the logistic distribution. A test is provided to check the sign saturation condition and can be implemented using existing algorithms for the maximum score estimator. 
Date:  2022–06 
URL:  http://d.repec.org/n?u=RePEc:arx:papers:2206.10475&r= 
By:  Otsu, Taisuke; Taniguchi, Go 
Abstract:  Distribution homogeneity testing, particularly based on the Kolmogorov-Smirnov statistic, has been applied in various empirical studies. In empirical economic analysis, it is often the case that economic variables of interest are obtained as estimated values or residuals of preliminary model fits, called generated variables. In this paper, we extend the Kolmogorov-Smirnov-type homogeneity test to accommodate such generated variables, and propose an asymptotically valid bootstrap inference procedure. A small simulation study illustrates that it is crucial for reliable inference to account for estimation errors in the generated variables. The proposed method is applied to compare the total factor productivities across different countries. 
JEL:  J1 
Date:  2020–10–01 
URL:  http://d.repec.org/n?u=RePEc:ehl:lserod:105571&r= 
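For readers unfamiliar with the statistic named in the abstract above, the following is a minimal sketch of the classical two-sample Kolmogorov-Smirnov distance, assuming both samples are directly observed. It does not implement the paper's extension to generated variables or its bootstrap; the function name `ks_two_sample` is hypothetical.

```python
def ks_two_sample(a, b):
    """Largest absolute gap between the empirical CDFs of samples a and b."""
    a, b = sorted(a), sorted(b)
    n, m = len(a), len(b)
    i = j = 0
    d = 0.0
    while i < n and j < m:
        v = min(a[i], b[j])
        # Step both empirical CDFs past every observation equal to v.
        while i < n and a[i] == v:
            i += 1
        while j < m and b[j] == v:
            j += 1
        d = max(d, abs(i / n - j / m))
    # Once one sample is exhausted its CDF equals 1, so the gap can only shrink.
    return d


# Toy samples (assumptions for illustration).
d_shifted = ks_two_sample([1, 2, 3, 4], [3, 4, 5, 6])
d_same = ks_two_sample([1, 2, 3], [1, 2, 3])
d_disjoint = ks_two_sample([1, 2], [3, 4])
print(d_shifted, d_same, d_disjoint)
```

When the inputs are residuals or fitted values rather than raw data, the abstract's point is that critical values for this statistic need to account for the first-stage estimation error.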
By:  Qiying Wang (University of Sydney); Peter C. B. Phillips (Cowles Foundation, Yale University, University of Auckland, Singapore Management University, University of Southampton) 
Abstract:  Limit theory is provided for a wide class of covariance functionals of a nonstationary process and stationary time series. The results are relevant to estimation and inference in nonlinear nonstationary regressions that involve unit root, local unit root or fractional processes and they include both parametric and nonparametric regressions. Self-normalized versions of these statistics are considered that are useful in inference. Numerical evidence reveals a strong bimodality in the finite-sample distributions that persists for very large sample sizes although the limit theory is Gaussian. New self-normalized versions are introduced that deliver improved approximations. 
Keywords:  Endogeneity, Limit theory, Local time, Nonlinear functional, Nonstationarity, Sample covariance, Zero energy 
JEL:  C22 C23 
Date:  2022–07 
URL:  http://d.repec.org/n?u=RePEc:cwl:cwldpp:2337&r= 
By:  Sylvia Klosin; Max Vilgalys 
Abstract:  This paper introduces and proves asymptotic normality for a new semi-parametric estimator of continuous treatment effects in panel data. Specifically, we estimate an average derivative of the regression function. Our estimator uses the panel structure of data to account for unobservable time-invariant heterogeneity and machine learning methods to flexibly estimate functions of high-dimensional inputs. We construct our estimator using tools from the double debiased machine learning (DML) literature. We show the performance of our method in Monte Carlo simulations and also apply our estimator to real-world data and measure the impact of extreme heat in United States (U.S.) agriculture. We use the estimator on a county-level dataset of corn yields and weather variation, measuring the elasticity of yield with respect to a marginal increase in extreme heat exposure. In our preferred specification, the difference between the estimates from OLS and our method is both statistically and economically significant. We find a significantly higher degree of impact, corresponding to an additional $1.18 billion in annual damages by the year 2050 under median climate scenarios. We find little evidence that this elasticity is changing over time. 
Date:  2022–07 
URL:  http://d.repec.org/n?u=RePEc:arx:papers:2207.08789&r= 
By:  Follain, Bertille; Wang, Tengyao; Samworth, Richard J. 
Abstract:  We propose a new method for changepoint estimation in partially observed, high-dimensional time series that undergo a simultaneous change in mean in a sparse subset of coordinates. Our first methodological contribution is to introduce a ‘MissCUSUM’ transformation (a generalisation of the popular cumulative sum statistics), that captures the interaction between the signal strength and the level of missingness in each coordinate. In order to borrow strength across the coordinates, we propose to project these MissCUSUM statistics along a direction found as the solution to a penalised optimisation problem tailored to the specific sparsity structure. The changepoint can then be estimated as the location of the peak of the absolute value of the projected univariate series. In a model that allows different missingness probabilities in different component series, we identify that the key interaction between the missingness and the signal is a weighted sum of squares of the signal change in each coordinate, with weights given by the observation probabilities. More specifically, we prove that the angle between the estimated and oracle projection directions, as well as the changepoint location error, are controlled with high probability by the sum of two terms, both involving this weighted sum of squares, and representing the error incurred due to noise and the error due to missingness respectively. A lower bound confirms that our changepoint estimator, which we call MissInspect, is optimal up to a logarithmic factor. The striking effectiveness of the MissInspect methodology is further demonstrated both on simulated data, and on an oceanographic data set covering the Neogene period. 
Keywords:  changepoint estimation; missing data; high-dimensional data; segmentation; sparsity; EP/N031938/1; EP/P031447/1; EP/T02772X/1; H2020 European Research Council (GrantNumber(s): 101019498); Wiley deal 
JEL:  C1 
Date:  2022–07–11 
URL:  http://d.repec.org/n?u=RePEc:ehl:lserod:115014&r= 
By:  Timothy Conley (Western University); Sílvia Gonçalves (McGill University); Min Seong Kim (University of Connecticut); Benoit Perron (Université de Montréal) 
Abstract:  In this paper, we introduce a method of generating bootstrap samples with unknown patterns of cross-sectional/spatial dependence which we call the spatial dependent wild bootstrap. This method is a spatial counterpart to the wild dependent bootstrap of Shao (2010) and generates data by multiplying a vector of independently and identically distributed external variables by the eigendecomposition of a bootstrap kernel. We prove the validity of our method for studentized and unstudentized statistics under a linear array representation of the data. Simulation experiments document the potential for improved inference with our approach. We illustrate our method in a firm-level regression application investigating the relationship between firms’ sales growth and the import activity in their local markets using unique firm-level and imports data for Canada. 
Keywords:  bootstrap, cross-sectional dependence, spatial HAC, eigendecomposition, economic distance 
JEL:  C12 C32 C38 C52 
Date:  2022–07 
URL:  http://d.repec.org/n?u=RePEc:uct:uconnp:202214&r= 
By:  Zewei Lin; Dungang Liu 
Abstract:  Model diagnostics is an indispensable component of regression analysis, yet it is not well addressed in standard textbooks on generalized linear models. The lack of exposition is attributed to the fact that when outcome data are discrete, classical methods (e.g., Pearson/deviance residual analysis and goodness-of-fit tests) have limited utility in model diagnostics and treatment. This paper establishes a novel framework for model diagnostics of discrete data regression. Unlike the literature defining a single-valued quantity as the residual, we propose to use a function as a vehicle to retain the residual information. In the presence of discreteness, we show that such a functional residual is appropriate for summarizing the residual randomness that cannot be captured by the structural part of the model. We establish its theoretical properties, which lead to new diagnostic tools including the functional-residual-vs-covariate plot and the Function-to-Function (Fn-Fn) plot. Our numerical studies demonstrate that the use of these tools can reveal a variety of model misspecifications, such as not properly including a higher-order term, an explanatory variable, an interaction effect, a dispersion parameter, or a zero-inflation component. The functional residual yields, as a byproduct, Liu-Zhang's surrogate residual mainly developed for cumulative link models for ordinal data (Liu and Zhang, 2018, JASA). As a general notion, it considerably broadens the diagnostic scope as it applies to virtually all parametric models for binary, ordinal and count data, all in a unified diagnostic scheme. 
Date:  2022–07 
URL:  http://d.repec.org/n?u=RePEc:arx:papers:2207.04299&r= 
By:  Shuping Shi (Macquarie University); Peter C. B. Phillips (Cowles Foundation, Yale University, University of Auckland, Singapore Management University, University of Southampton) 
Abstract:  In the presence of bubbles, asset prices consist of a fundamental and a bubble component, with the bubble component following an explosive dynamic. The general idea for bubble identification is to apply explosive root tests to a proxy of the unobservable bubble. Three notable proxies are the real asset prices, log price-payoff ratios, and estimated non-fundamental components. The rationale for all three proxy choices rests on the definition of bubbles, which has been presented in various forms in the literature. This chapter provides a theoretical framework that incorporates several definitions of bubbles (and fundamentals) and offers guidance for selecting proxies. For explosive root tests, we introduce the recursive evolving test of Phillips et al. (2015b,c) along with its asymptotic properties. This procedure can serve as a real-time monitoring device and has been shown to outperform several other tests. Like all other recursive testing procedures, the PSY algorithm faces the issue of multiplicity in testing that contaminates conventional significance values. To address this issue, we propose a multiple-testing algorithm to determine appropriate test critical values and show its satisfactory performance in finite samples by simulations. To illustrate, we conduct a pseudo real-time bubble monitoring exercise in the S&P 500 stock market from January 1990 to June 2020. The empirical results reveal the importance of using a good proxy for bubbles and addressing the multiplicity issue. 
Keywords:  Bubbles; econometrics identification; market fundamental; explosive root; multiplicity; S&P 500 composite index 
JEL:  C15 C22 
Date:  2022–06 
URL:  http://d.repec.org/n?u=RePEc:cwl:cwldpp:2331&r= 
By:  Paolo Brunori (III LSE & University of Florence); Pedro Salas-Rojo (III LSE); Paolo Brunori (World Bank) 
Abstract:  The measurement of income inequality is affected by missing observations, especially if they are concentrated in the tails of an income distribution. This paper conducts an experiment to test how the different correction methods proposed by the statistical, econometric and machine learning literature address measurement biases of inequality due to item nonresponse. We take a baseline survey and artificially corrupt the data employing several alternative non-linear functions that simulate patterns of income nonresponse, and show how biased inequality statistics can be when item nonresponses are ignored. The comparative assessment of correction methods indicates that most methods are able to partially correct for missing data biases. Sample reweighting based on probabilities of nonresponse produces inequality estimates quite close to true values in most simulated missing data patterns. Matching and Pareto corrections can also be effective in correcting for selected missing data patterns. Other methods, such as Single and Multiple imputations and Machine Learning methods, are less effective. A final discussion provides some elements that help explain these findings. 
Keywords:  Inequality, item nonresponse, missing, prediction 
JEL:  D63 C83 C01 
URL:  http://d.repec.org/n?u=RePEc:inq:inqwps:ecineq&r= 
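The reweighting correction described in the abstract above can be sketched as follows. This is a minimal illustration, assuming the response probabilities are known (in practice they would be estimated) and using a standard weighted Gini formula based on the Lorenz curve; the function name `weighted_gini` and the toy numbers are hypothetical and are not taken from the paper.

```python
def weighted_gini(incomes, weights):
    """Gini coefficient of a weighted sample via the trapezoid Lorenz-curve formula."""
    pairs = sorted(zip(incomes, weights))      # order by income
    total_weight = sum(w for _, w in pairs)
    total_income = sum(y * w for y, w in pairs)
    cumulative = 0.0                           # weighted income up to the previous unit
    area = 0.0
    for y, w in pairs:
        previous = cumulative
        cumulative += y * w
        area += w * (previous + cumulative)
    return 1.0 - area / (total_weight * total_income)


# Toy survey (assumption): the high earner responds with probability 0.5,
# everyone else with probability 1, so respondents get weights 1 / probability.
observed = [1.0, 1.0, 3.0]
response_prob = [1.0, 1.0, 0.5]
ipw = [1.0 / p for p in response_prob]

naive = weighted_gini(observed, [1.0] * len(observed))
corrected = weighted_gini(observed, ipw)
print(naive, corrected)
```

With these weights the corrected estimate matches the Gini of the full population `[1, 1, 3, 3]`, which is the sense in which reweighting undoes selective nonresponse when the response probabilities are right.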
By:  Bryan T. Kelly (Yale SOM; AQR Capital Management, LLC; National Bureau of Economic Research (NBER)); Semyon Malamud (Ecole Polytechnique Federale de Lausanne; Centre for Economic Policy Research (CEPR); Swiss Finance Institute); Kangying Zhou (Yale School of Management) 
Abstract:  We investigate the performance of non-linear return prediction models in the high complexity regime, i.e., when the number of model parameters exceeds the number of observations. We document a "virtue of complexity" in all asset classes that we study (US equities, international equities, bonds, commodities, currencies, and interest rates). Specifically, return prediction R² and optimal portfolio Sharpe ratio generally increase with model parameterization for every asset class. The virtue of complexity is present even in extremely data-scarce environments, e.g., for predictive models with fewer than twenty observations and tens of thousands of predictors. The empirical association between model complexity and out-of-sample model performance exhibits a striking consistency with theoretical predictions. 
Keywords:  Portfolio choice, machine learning, random matrix theory, benign overfit, overparameterization 
JEL:  C3 C58 C61 G11 G12 G14 
Date:  2022–07 
URL:  http://d.repec.org/n?u=RePEc:chf:rpseri:rp2257&r= 
By:  Shukla, Sumedha; Arora, Gaurav 
Keywords:  Research Methods/Statistical Methods, Agricultural and Food Policy, Agricultural Finance 
Date:  2022–08 
URL:  http://d.repec.org/n?u=RePEc:ags:aaea22:322569&r= 
By:  Bartosz Uniejewski; Katarzyna Maciejowska 
Abstract:  This paper develops a novel, fully automated forecast averaging scheme, which combines the LASSO estimation method with Principal Component Averaging (PCA). LASSO-PCA (LPCA) explores a pool of predictions based on a single model but calibrated to windows of different sizes. It uses information criteria to select tuning parameters and hence reduces the impact of researchers' ad hoc decisions. The method is applied to average predictions of hourly day-ahead electricity prices over 650 point forecasts obtained with various lengths of calibration windows. It is evaluated on four European and American markets with an out-of-sample period of almost two and a half years and compared to other semi- and fully automated methods, such as simple mean, AW/WAW, LASSO and PCA. The results indicate that LASSO averaging is very efficient in terms of forecast error reduction, whereas the PCA method is robust to the selection of the specification parameter. LPCA inherits the advantages of both methods and outperforms other approaches in terms of MAE, remaining insensitive to the choice of a tuning parameter. 
Date:  2022–07 
URL:  http://d.repec.org/n?u=RePEc:arx:papers:2207.04794&r= 
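The idea of pooling predictions from one model calibrated on windows of different lengths, as described in the abstract above, can be sketched with the simple-mean benchmark. This is only an illustration under stated assumptions: the base model (mean of the last `window` observations), the function names and the toy prices are all hypothetical, and the LASSO and PCA weighting steps of LPCA are not implemented.

```python
def window_forecast(history, window):
    """Hypothetical base model: next value = mean of the last `window` observations."""
    recent = history[-window:]
    return sum(recent) / len(recent)


def average_over_windows(history, windows):
    """Combine forecasts from several calibration windows with equal weights."""
    forecasts = [window_forecast(history, w) for w in windows]
    return sum(forecasts) / len(forecasts), forecasts


# Toy price series (assumption for illustration).
prices = [40.0, 42.0, 44.0, 50.0, 48.0, 46.0]
combined, individual = average_over_windows(prices, windows=[2, 3, 6])
print(individual, combined)
```

Data-driven schemes replace the equal weights with weights estimated from past forecast errors, which is where penalised regression or principal components would enter.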