
on Econometrics 
By:  Wang, Wenjie 
Abstract:  Bootstrap procedures based on instrumental variable (IV) estimates or t-statistics are generally invalid when the instruments are weak. The bootstrap may even fail when applied to identification-robust test statistics. For subvector inference based on the Anderson-Rubin (AR) statistic, Wang and Doko Tchatoka (2018) show that the residual bootstrap is inconsistent under weak IVs. In particular, the residual bootstrap relies on an estimator of the structural parameters to generate bootstrap pseudo-data, and this estimator is inconsistent under weak IVs. It is thus tempting to consider the nonparametric bootstrap instead. In this note, under the assumptions of conditional homoskedasticity and one nuisance structural parameter, we investigate bootstrap consistency for the subvector AR statistic based on the nonparametric i.i.d. bootstrap and its recentered version proposed by Hall and Horowitz (1996). We find that both procedures are inconsistent under weak IVs: although able to mimic the weak-identification situation in the data, both procedures entail approximation errors, which lead to a discrepancy between the bootstrap world and the original sample. In particular, both bootstrap tests can be very conservative under weak IVs. 
Keywords:  Nonparametric Bootstrap; Weak Identification; Weak Instrument; Subvector Inference; Anderson-Rubin Test. 
JEL:  C1 C12 C13 C26 
Date:  2020–03–07 
URL:  http://d.repec.org/n?u=RePEc:pra:mprapa:99109&r=all 
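The nonparametric i.i.d. bootstrap studied in this note resamples whole observations with replacement. As a generic illustration only (not the paper's AR procedure; the function name and the toy statistic are ours), a minimal pairs-bootstrap sketch:

```python
import numpy as np

def pairs_bootstrap(data, statistic, n_boot=999, seed=0):
    """Nonparametric i.i.d. (pairs) bootstrap: resample whole rows of the
    data with replacement and recompute the statistic on each pseudo-sample."""
    rng = np.random.default_rng(seed)
    n = data.shape[0]
    stats = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)  # i.i.d. draws with replacement
        stats[b] = statistic(data[idx])
    return stats

# toy use: bootstrap distribution of a sample mean
rng = np.random.default_rng(1)
x = rng.normal(size=(200, 1))
boot = pairs_bootstrap(x, lambda d: d.mean())
crit = np.quantile(np.abs(boot - x.mean()), 0.95)  # symmetric 5% critical value
```

The note's point is that even a scheme this faithful to the sampling process can fail to reproduce the null distribution of the subvector AR statistic under weak instruments.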
By:  Rui Duan; C. Jason Liang; Pamela Shaw; Cheng Yong Tang; Yong Chen 
Abstract:  Practical problems with missing data are common, and statistical methods have been developed to preserve the validity and/or efficiency of statistical procedures in their presence. A central and long-standing question concerns the mechanism governing data missingness: correctly identifying the appropriate mechanism is crucial for conducting proper practical investigations. The conventional notions comprise three classes: missing completely at random, missing at random, and missing not at random. In this paper, we present a new hypothesis testing approach for deciding between missing at random and missing not at random. Since the potential alternatives to missing at random are broad, we focus our investigation on a general class of models with instrumental variables for data missing not at random. Our setting is broadly applicable because the model for the missing data is nonparametric, requiring no explicit specification of the missingness mechanism. The foundational idea is to develop appropriate discrepancy measures between estimators whose properties differ significantly only when missing at random does not hold. We show that our new hypothesis testing approach achieves an objective, data-oriented choice between missing at random and missing not at random. We demonstrate the feasibility, validity, and efficacy of the new test through theoretical analysis, simulation studies, and a real data analysis. 
Date:  2020–03 
URL:  http://d.repec.org/n?u=RePEc:arx:papers:2003.11181&r=all 
By:  Víctor Morales-Oñate; Federico Crudu; Moreno Bevilacqua 
Abstract:  In this paper we propose a spatio-temporal blockwise Euclidean likelihood method for the estimation of covariance models when dealing with large spatio-temporal Gaussian data. The method uses moment conditions coming from the score of the pairwise composite likelihood. The blockwise approach guarantees considerable computational improvements over the standard pairwise composite likelihood method. In order to further speed up computation, we consider a general-purpose graphics processing unit implementation using OpenCL. We derive the asymptotic properties of the proposed estimator and illustrate the finite sample properties of our methodology by means of a simulation study highlighting the computational gains of the OpenCL graphics processing unit implementation. Finally, we apply our estimation method to a wind component data set. 
Keywords:  Composite likelihood; Euclidean likelihood; Gaussian random fields; Parallel computing; OpenCL 
JEL:  C14 C21 C23 
Date:  2020–03 
URL:  http://d.repec.org/n?u=RePEc:usi:wpaper:822&r=all 
By:  Weber, Frank; Glass, Anne; Kundt, Günther; Knapp, Guido; Ickstadt, Katja 
Abstract:  There exists a variety of interval estimators for the overall treatment effect in a random-effects meta-analysis. A recent literature review summarizing existing methods suggested that in most situations, the Hartung-Knapp/Sidik-Jonkman (HKSJ) method was preferable. However, a quantitative comparison of those methods in a common simulation study is still lacking. Thus, we conduct such a simulation study for continuous and binary outcomes, focusing on the medical field for application. Based on the literature review and some new theoretical considerations, a practicable number of interval estimators is selected for this comparison: the classical normal-approximation interval using the DerSimonian-Laird heterogeneity estimator, the HKSJ interval using either the Paule-Mandel or the Sidik-Jonkman heterogeneity estimator, the Skovgaard higher-order profile likelihood interval, a parametric bootstrap interval, and a Bayesian interval using different priors. We evaluate the performance measures (coverage and interval length) at specific points in the parameter space, i.e. not averaging over a prior distribution. In this sense, our study is conducted from a frequentist point of view. We confirm the main finding of the literature review, the general recommendation of the HKSJ method (here with the Sidik-Jonkman heterogeneity estimator). For meta-analyses including only two studies, the considerable length of the HKSJ interval limits its practical use. In this case, the Bayesian interval using a weakly informative prior for the heterogeneity may help. Our recommendations are illustrated using a real-world meta-analysis dealing with the efficacy of an intramyocardial bone marrow stem cell transplantation during coronary artery bypass grafting. 
Date:  2020–03–16 
URL:  http://d.repec.org/n?u=RePEc:osf:osfxxx:5zbh6&r=all 
By:  Natalia Bailey; George Kapetanios; M. Hashem Pesaran 
Abstract:  This paper proposes an estimator of factor strength and establishes its consistency and asymptotic distribution. The proposed estimator is based on the number of statistically significant factor loadings, taking account of the multiple testing problem. We focus on the case where the factors are observed, which is of primary interest in many applications in macroeconomics and finance. We also consider using cross-section averages as a proxy in the case of unobserved common factors, although a fundamental factor identification issue arises when there is more than one unobserved common factor. We investigate the small sample properties of the proposed estimator by means of Monte Carlo experiments under a variety of scenarios. In general, we find that the estimator, and the associated inference, perform well. The test is conservative under the null hypothesis but nevertheless has excellent power properties, especially when the factor strength is sufficiently high. Application of the proposed estimation strategy to factor models of asset returns shows that out of 146 factors recently considered in the finance literature, only the market factor is truly strong, while all other factors are at best semi-strong, with their strength varying considerably over time. Similarly, we only find evidence of semi-strong factors in an updated version of the Stock and Watson (2012) macroeconomic dataset. 
Keywords:  factor models, factor strength, measures of pervasiveness, cross-sectional dependence, market factor 
JEL:  C38 E20 G20 
Date:  2020 
URL:  http://d.repec.org/n?u=RePEc:ces:ceswps:_8146&r=all 
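The abstract's idea of counting significant loadings, with a multiple-testing adjustment, can be sketched for an observed factor. This is a stylized illustration only: the Bonferroni cutoff and the mapping from the count D to a strength exponent via ln(D)/ln(n) are simplifying assumptions, not the paper's exact estimator.

```python
import numpy as np
from statistics import NormalDist

def factor_strength(returns, factor, p=0.05):
    """Stylized factor-strength estimate: regress each unit's return on the
    observed factor, count loadings significant at a Bonferroni-adjusted
    level, and map the count D into an exponent via ln(D)/ln(n)."""
    T, n = returns.shape
    X = np.column_stack([np.ones(T), factor])
    XtX_inv = np.linalg.inv(X.T @ X)
    crit = NormalDist().inv_cdf(1 - p / (2 * n))  # Bonferroni two-sided cutoff
    D = 0
    for i in range(n):
        beta = XtX_inv @ (X.T @ returns[:, i])
        resid = returns[:, i] - X @ beta
        se = np.sqrt(resid @ resid / (T - 2) * XtX_inv[1, 1])
        if abs(beta[1] / se) > crit:
            D += 1
    return np.log(max(D, 1)) / np.log(n)

# toy check: a pervasive (strong) factor should yield strength near 1
rng = np.random.default_rng(0)
T, n = 300, 50
f = rng.normal(size=T)
R = np.outer(f, rng.uniform(0.8, 1.2, size=n)) + 0.3 * rng.normal(size=(T, n))
alpha_hat = factor_strength(R, f)
```

A weak factor, loading on only a small subset of the n units, would yield an exponent well below one.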
By:  Tae-Hwan Kim (School of Economics, Yonsei University); Christophe Muller (AMSE - Aix-Marseille Sciences Economiques - EHESS - École des hautes études en sciences sociales - AMU - Aix-Marseille Université - ECM - École Centrale de Marseille - CNRS - Centre National de la Recherche Scientifique) 
Abstract:  In this paper, we propose a new variance reduction method for quantile regressions with endogeneity problems, for alpha-mixing or m-dependent covariates and error terms. First, we derive the asymptotic distribution of two-stage quantile estimators based on the fitted-value approach under very general conditions. Second, we exhibit an inconsistency transmission property derived from the asymptotic representation of our estimator. Third, using a reformulation of the dependent variable, we improve the efficiency of the two-stage quantile estimators by exploiting a trade-off between an inconsistency confined to the intercept estimator and a reduction in the variance of the slope estimator. Monte Carlo simulation results show the good performance of our approach. In particular, by combining quantile regressions with first-stage trimmed least-squares estimators, we obtain more accurate slope estimates than 2SLS, 2SLAD and other estimators for a broad set of distributions. Finally, we apply our method to food demand equations in Egypt. 
Keywords:  Two-stage estimation; Variance reduction; Quantile regression; Asymptotic bias 
Date:  2020 
URL:  http://d.repec.org/n?u=RePEc:hal:journl:hal02084505&r=all 
By:  Huang, Wenxin (Shanghai Jiao Tong University); Jin, Sainan (School of Economics, Singapore Management University); Phillips, Peter C.B. (Yale University); Su, Liangjun (School of Economics, Singapore Management University) 
Abstract:  This paper proposes a novel Lasso-based approach to handle unobserved parameter heterogeneity and cross-section dependence in nonstationary panel models. In particular, a penalized principal component (PPC) method is developed to estimate group-specific long-run relationships and unobserved common factors and jointly to identify the unknown group membership. The PPC estimators are shown to be consistent under weakly dependent innovation processes, but they suffer from an asymptotically non-negligible bias arising from correlations between the nonstationary regressors and the unobserved stationary common factors and/or the equation errors. To remedy this shortcoming, we provide three bias-correction procedures under which the estimators are recentered about zero as both dimensions (N and T) of the panel tend to infinity. We establish a mixed normal limit theory for the estimators of the group-specific long-run coefficients, which permits inference using standard test statistics. Simulations suggest good finite sample performance of the proposed method. An empirical application of the methodology to international R&D spillovers offers a convincing explanation for the growth convergence puzzle through the heterogeneous impact of R&D spillovers. 
Keywords:  Nonstationarity; Parameter heterogeneity; Latent group patterns; Penalized principal component; Cross-section dependence; Classifier Lasso; R&D spillovers 
JEL:  C13 C33 C38 C51 F43 O32 O40 
Date:  2020–03–24 
URL:  http://d.repec.org/n?u=RePEc:ris:smuesw:2020_007&r=all 
By:  Su, Liangjun (School of Economics, Singapore Management University); Wang, Xia (Lingnan (University) College, Sun Yat-sen University) 
Abstract:  We note that Su and Wang (2017, On Time-varying Factor Models: Estimation and Testing, Journal of Econometrics 198, 84-101) ignore the bias terms when estimating time-varying factor models. In this note, we correct the theoretical results on the estimation of time-varying factor models. The asymptotic results for testing the correct specification of time-invariant factor loadings are not affected. 
Keywords:  Approximation error; Bias; Correction; Factor Model; Time-varying 
JEL:  C12 C14 C33 C38 
Date:  2020–02–27 
URL:  http://d.repec.org/n?u=RePEc:ris:smuesw:2020_008&r=all 
By:  Shantanu Gupta; Zachary C. Lipton; David Childers 
Abstract:  Given a causal graph, the do-calculus can express treatment effects as functionals of the observational joint distribution that can be estimated empirically. Sometimes the do-calculus identifies multiple valid formulae, prompting us to compare the statistical properties of the corresponding estimators. For example, the back-door formula applies when all confounders are observed and the front-door formula applies when an observed mediator transmits the causal effect. In this paper, we investigate the overidentified scenario where both confounders and mediators are observed, rendering both estimators valid. Addressing the linear Gaussian causal model, we derive the finite-sample variance of both estimators and demonstrate that either estimator can dominate the other by an unbounded constant factor depending on the model parameters. Next, we derive an optimal estimator, which leverages all observed variables to strictly outperform the back-door and front-door estimators. We also present a procedure for combining two datasets, with confounders observed in one and mediators in the other. Finally, we evaluate our methods on both simulated data and the IHDP and JTPA datasets. 
Date:  2020–03 
URL:  http://d.repec.org/n?u=RePEc:arx:papers:2003.11991&r=all 
By:  Ben Jann 
Abstract:  In Jann (2019) I provided some reflections on influence functions for linear regression (with an application to regression adjustment). Based on an analogy to variance estimation in the generalized method of moments (GMM), I extend the discussion in this paper to maximum-likelihood models such as logistic regression and then provide influence functions for a variety of treatment effect estimators such as inverse-probability weighting (IPW), regression adjustment (RA), inverse-probability weighted regression adjustment (IPWRA), exact matching (EM), Mahalanobis distance matching (MD), and entropy balancing (EB). The goal of this exercise is to provide a framework for standard error estimation for all these estimators. 
Keywords:  influence function, sampling variance, standard error, generalized method of moments, maximum likelihood, logistic regression, inverse-probability weighting, inverse-probability weighted regression adjustment, exact matching, Mahalanobis distance matching, entropy balancing, average treatment effect, causal inference 
JEL:  C01 C12 C13 C21 C25 C31 C83 C87 
Date:  2020–03–28 
URL:  http://d.repec.org/n?u=RePEc:bss:wpaper:35&r=all 
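Among the estimators listed in the abstract, IPW is the simplest to illustrate. The following sketch (our own minimal version, not the paper's Stata implementation) estimates the propensity score by a hand-rolled Newton-Raphson logit and forms the inverse-probability-weighted ATE:

```python
import numpy as np

def fit_logit(X, d, iters=25):
    """Logistic regression by Newton-Raphson (no external dependencies)."""
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        grad = X.T @ (d - p)
        hess = X.T @ (X * (p * (1 - p))[:, None])
        beta += np.linalg.solve(hess, grad)
    return beta

def ipw_ate(y, d, X):
    """Inverse-probability-weighted estimate of the average treatment effect,
    with propensity scores from a logit of treatment on covariates."""
    Xc = np.column_stack([np.ones(len(y)), X])
    p = 1.0 / (1.0 + np.exp(-Xc @ fit_logit(Xc, d)))
    return np.mean(d * y / p) - np.mean((1 - d) * y / (1 - p))

# simulated check with a known treatment effect of 2
rng = np.random.default_rng(42)
n = 20000
x = rng.normal(size=n)
d = (rng.uniform(size=n) < 1.0 / (1.0 + np.exp(-0.5 * x))).astype(float)
y = x + 2.0 * d + 0.5 * rng.normal(size=n)
ate = ipw_ate(y, d, x)
```

The point of the influence-function framework is that the sampling variance of such an estimator can be computed from the per-observation influence contributions, accounting for the estimated propensity score.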
By:  Tyrel Stokes; Russell Steele; Ian Shrier 
Abstract:  Recent theoretical work in causal inference has explored an important class of variables which, when conditioned on, may further amplify existing unmeasured confounding bias (bias amplification). Despite this theoretical work, existing simulations of bias amplification in clinical settings have suggested that it may matter less in practice than the theoretical literature indicates. We resolve this tension using tools from the semiparametric regression literature, which lead to a general characterization in terms of the geometry of OLS estimators that allows us to extend current results to a larger class of DAGs, functional forms, and distributional assumptions. We further use these results to understand the limitations of current simulation approaches and to propose a new framework for performing causal simulation experiments to compare estimators. We then evaluate the challenges and benefits of extending this simulation approach to the context of a real clinical data set with a binary treatment, laying the groundwork for a principled approach to sensitivity analysis for bias amplification in the presence of unmeasured confounding. 
Date:  2020–03 
URL:  http://d.repec.org/n?u=RePEc:arx:papers:2003.08449&r=all 
By:  Yuan, Huiling; Zhou, Yong; Xu, Lu; Sun, Yulei; Cui, Xiangyu 
Abstract:  Volatility asymmetry is a hot topic in high-frequency financial markets. In this paper, we propose a new econometric model that describes volatility asymmetry based on both high-frequency and low-frequency historical data. After providing the quasi-maximum likelihood estimators for the parameters, we establish their asymptotic properties. We also conduct a series of simulation studies to check the finite sample performance and volatility forecasting performance of the proposed methodologies. An empirical application demonstrates that the new model has stronger volatility prediction power than the GARCH-Itô model in the literature. 
Date:  2020–03–27 
URL:  http://d.repec.org/n?u=RePEc:osf:socarx:hkzdr&r=all 
By:  Einmahl, John (Tilburg University, Center For Economic Research); Ferreira, Ana; de Haan, Laurens; Neves, C.; Zhou, C. 
Abstract:  The statistical theory of extremes is extended to observations that are non-stationary and not independent. The non-stationarity over time and space is controlled via the scedasis (tail scale) in the marginal distributions. Spatial dependence stems from multivariate extreme value theory. We establish asymptotic theory for both the weighted sequential tail empirical process and the weighted tail quantile process based on all observations, taken over time and space. The results yield two statistical tests for homoscedasticity in the tail, one in space and one in time. Further, we show that the common extreme value index can be estimated via a pseudo-maximum likelihood procedure based on pooling all (non-stationary and dependent) observations. Our leading example and application is rainfall in Northern Germany. 
Keywords:  Multivariate extreme value statistics; non-identical distributions; sequential tail empirical process; testing 
JEL:  C12 C13 C14 
Date:  2020 
URL:  http://d.repec.org/n?u=RePEc:tiu:tiucen:ae5818cdf07142759577d7b816807429&r=all 
By:  LeSage, James P.; Fischer, Manfred M. 
Abstract:  In this paper, we introduce a model of trade flows between countries over time that allows for network dependence in flows, based on sociocultural connectivity structures. We show that conventional multidimensional fixed effects model specifications exhibit cross-sectional dependence between countries that should be modeled to avoid simultaneity bias. Given that the source of network interaction is unknown, we propose a panel gravity model that examines multiple network interaction structures, using Bayesian model probabilities to determine those most consistent with the sample data. This is accomplished with computationally efficient Markov Chain Monte Carlo estimation methods that produce a Monte Carlo integration estimate of the log-marginal likelihood that can be used for model comparison. Application of the model to a panel of trade flows points to network spillover effects, suggesting the presence of network dependence and biased estimates from conventional trade flow specifications. The most important sources of network dependence were found to be membership in trade organizations, historical colonial ties, common currency, and spatial proximity of countries. 
Keywords:  origin-destination panel data flows, cross-sectional dependence, MCMC estimation, log-marginal likelihood, gravity models of trade, sociocultural distance 
Date:  2020–03–30 
URL:  http://d.repec.org/n?u=RePEc:wiw:wus046:7534&r=all 
By:  Davide Viviano 
Abstract:  This paper discusses the problem of the design of experiments under network interference. We allow for a possibly fully connected network and a general class of estimands, which encompasses average treatment and average spillover effects, as well as estimands obtained from interactions of the two. We discuss a near-optimal design mechanism, where the experimenter optimizes over participants and treatment assignments to minimize the variance of the estimators of interest, using a first-wave experiment to estimate the variance. We guarantee valid asymptotic inference on causal effects using either parametric or nonparametric estimators under the proposed experimental design, allowing for local dependence of potential outcomes, arbitrary dependence of the treatment assignment indicators, and spillovers across units. We showcase asymptotic optimality and finite-sample upper bounds on the regret of the proposed design mechanism. Simulations illustrate the advantage of the method over state-of-the-art methodologies. 
Date:  2020–03 
URL:  http://d.repec.org/n?u=RePEc:arx:papers:2003.08421&r=all 
By:  Elena Dumitrescu; Sullivan Hué; Christophe Hurlin (University of Orléans - LEO); Sessi Tokpavi (LEO - Laboratoire d'économie d'Orléans - UO - Université d'Orléans - CNRS - Centre National de la Recherche Scientifique) 
Abstract:  Decision trees and related ensemble methods like random forest are state-of-the-art tools in the field of machine learning for credit scoring. Although they are shown to outperform logistic regression, they lack interpretability, and this drastically reduces their use in the credit risk management industry, where decision-makers and regulators need transparent score functions. This paper proposes to get the best of both worlds, introducing a new, simple and interpretable credit scoring method which uses information from decision trees to improve the performance of logistic regression. Formally, rules extracted from various short-depth decision trees built with pairs of predictive variables are used as predictors in a penalized or regularized logistic regression. By modeling such univariate and bivariate threshold effects, we achieve a significant improvement in model performance for the logistic regression while preserving its simple interpretation. Applications using simulated data and four real credit default datasets show that our new method outperforms traditional logistic regressions. Moreover, it compares competitively to random forest, while providing an interpretable scoring function. JEL Classification: G10, C25, C53 
Keywords:  Credit scoring, Machine Learning, Risk management, Interpretability, Econometrics 
Date:  2020–03–13 
URL:  http://d.repec.org/n?u=RePEc:hal:wpaper:hal02507499&r=all 
By:  Vladimir Hlasny (The World Bank Group) 
Abstract:  Approximating the top of income distributions with smooth parametric forms is valuable for descriptive purposes, as well as for correcting income distributions for various top-income measurement and sampling problems. The proliferation of distinct branches of modeling literature over the past decades has given rise to the need to survey the alternative modeling options and develop systematic tools to discriminate among them. This paper reviews the state of methodological and empirical knowledge regarding the adoptable distribution functions, and lists references and statistical programs allowing practitioners to apply these parametric models to microdata in household income surveys, administrative registers, or grouped-records data from national accounts statistics. Implications for modeling the distribution of other economic outcomes, including consumption and wealth, are drawn. For incomes, recent consensus shows that among the many candidate distribution functions, only a handful have proved consistently successful, namely the generalized Pareto and the 3–4 parameter distributions in the generalized beta family, including the Singh-Maddala and GB2 distributions. Understanding these functions in relation to other known alternatives is one contribution of this review. 
Keywords:  statistical size distribution of incomes, top incomes measurement, parametric estimation, extreme value theory, Pareto, inequality 
JEL:  C1 D31 D63 
URL:  http://d.repec.org/n?u=RePEc:tul:ceqwps:90&r=all 
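The simplest member of the family of models surveyed here is the Pareto tail, whose exponent has a closed-form maximum-likelihood estimator. A minimal sketch (the function name and threshold choice are ours; real applications would also choose the threshold carefully):

```python
import numpy as np

def pareto_tail_mle(incomes, threshold):
    """Maximum-likelihood estimate of the Pareto tail exponent alpha for
    observations above a fixed threshold:
    alpha_hat = k / sum(log(x_i / threshold)) over the k tail observations."""
    x = np.asarray(incomes, dtype=float)
    tail = x[x > threshold]
    return len(tail) / np.sum(np.log(tail / threshold))

# check on simulated Pareto(alpha = 2) incomes via inverse-transform sampling
rng = np.random.default_rng(7)
u = rng.uniform(size=50000)
incomes = 10000.0 * (1.0 - u) ** (-1.0 / 2.0)  # Pareto tail above 10,000
alpha_hat = pareto_tail_mle(incomes, 10000.0)
```

The richer 3-4 parameter families discussed in the survey (Singh-Maddala, GB2) require numerical likelihood maximization rather than a closed form.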
By:  Eric Blankmeyer 
Abstract:  An economic model of crime is used to explore the consistent estimation of a simultaneous linear equation without recourse to instrumental variables. A maximum-likelihood procedure (NISE) is introduced, and its results are compared to ordinary least squares and two-stage least squares. The paper is motivated by previous research on the crime model and by the well-known practical problem that valid instruments are frequently unavailable. 
Date:  2020–03 
URL:  http://d.repec.org/n?u=RePEc:arx:papers:2003.07860&r=all 
By:  Beltempo, Marc (McGill University); Bresson, Georges (University of Paris 2); Etienne, JeanMichel (Université ParisSud); Lacroix, Guy (Université Laval) 
Abstract:  The paper investigates the effects of nursing overtime on nosocomial infections and medical accidents in a neonatal intensive care unit (NICU). The literature lacks clear evidence on this issue, and we conjecture that this may be due to empirical and methodological factors. We thus focus on a single NICU, thereby removing much of the variation in specialty mixes (neonatologists, fellows, residents, nurse practitioners) that is observed across units. We model the occurrences of both outcomes using a sample of 3,979 neonates, which represents over 84,846 observations (infant/days). We use a semiparametric panel data Logit model with random coefficients. The nonparametric components of the model allow us to unearth potentially highly nonlinear relationships between the outcomes and various policy-relevant covariates. We use the mean field variational Bayes approximation method to estimate the models. Our results show unequivocally that both health outcomes are affected by nursing overtime. Furthermore, both are highly sensitive to infant and NICU-related characteristics. 
Keywords:  neonatal health outcomes, nursing overtime, semiparametric panel data logit model, mean field variational Bayes, random coefficients 
JEL:  I1 J2 C11 C14 C23 
Date:  2020–03 
URL:  http://d.repec.org/n?u=RePEc:iza:izadps:dp13046&r=all 
By:  Daniel Bartl; Ludovic Tangpi 
Abstract:  Consider the problem of computing the riskiness $\rho(F(S))$ of a financial position $F$ written on the underlying $S$ with respect to a general law invariant risk measure $\rho$; for instance, $\rho$ can be the average value at risk. In practice the true distribution of $S$ is typically unknown and one needs to resort to historical data for the computation. In this article we investigate rates of convergence of $\rho(F(S_N))$ to $\rho(F(S))$, where $S_N$ is distributed as the empirical measure of $S$ with $N$ observations. We provide (sharp) non-asymptotic rates for both the deviation probability and the expectation of the estimation error. Our framework further allows for hedging, and the convergence rates we obtain depend neither on the dimension of the underlying stocks nor on the number of options available for trading. 
Date:  2020–03 
URL:  http://d.repec.org/n?u=RePEc:arx:papers:2003.10479&r=all 
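The plug-in estimator studied in this abstract, for the average value at risk example, amounts to averaging the worst losses in the empirical sample. A minimal sketch (the function name and the floor-based discretization are our own simplifications):

```python
import numpy as np

def empirical_avar(losses, alpha=0.05):
    """Plug-in estimator of the average value at risk (expected shortfall)
    at level alpha: the average of the worst floor(alpha * N) losses, a
    simple discretization of (1/alpha) * integral_0^alpha VaR_u(L) du."""
    x = np.sort(np.asarray(losses, dtype=float))[::-1]  # worst losses first
    k = max(1, int(np.floor(alpha * len(x))))
    return x[:k].mean()

# sanity check on standard normal losses, where AVaR_0.05 is about 2.063
rng = np.random.default_rng(3)
est = empirical_avar(rng.normal(size=200000), alpha=0.05)
```

The paper's contribution is quantifying how fast such empirical estimates converge to the true risk, without dimension dependence.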
By:  Wolf, Elias; Mokinski, Frieder; Schüler, Yves 
Abstract:  We show that one should not use the one-sided Hodrick-Prescott filter (HP1s) as the real-time version of the two-sided Hodrick-Prescott filter (HP2s): First, in terms of the extracted cyclical component, HP1s fails to remove low-frequency fluctuations to the same extent as HP2s. Second, HP1s dampens fluctuations at all frequencies, even those it is meant to extract. As a remedy, we propose two small adjustments to HP1s that align its properties closely with HP2s: (1) a lower value for the smoothing parameter and (2) a multiplicative rescaling of the extracted cyclical component. For example, for HP2s with a smoothing parameter of 1,600, the adjusted one-sided HP filter uses a smoothing parameter of 650 and rescales the extracted cyclical component by a factor of 1.1513. Using simulated and empirical data, we illustrate the relevance of the adjustments. For instance, financial cycles may appear 1.7 times more volatile than business cycles when in fact the volatilities differ only marginally. 
Keywords:  Real-time analysis, detrending, business cycles, financial cycles 
JEL:  C10 E32 E58 G01 
Date:  2020 
URL:  http://d.repec.org/n?u=RePEc:zbw:bubdps:112020&r=all 
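The adjusted one-sided filter described in this abstract can be sketched directly: run the two-sided filter on each expanding sample, keep the endpoint, and apply the two proposed adjustments. The parameter values (650 and 1.1513) are taken from the abstract; the function names and the warm-up length `min_obs` are our own assumptions.

```python
import numpy as np

def hp_trend(y, lam):
    """Two-sided HP trend: solve (I + lam * D'D) tau = y, where D is the
    second-difference matrix."""
    n = len(y)
    D = np.diff(np.eye(n), n=2, axis=0)
    return np.linalg.solve(np.eye(n) + lam * (D.T @ D), y)

def adjusted_one_sided_hp(y, lam=650.0, rescale=1.1513, min_obs=12):
    """One-sided HP cycle with the two proposed adjustments: for each date t,
    run the two-sided filter on data up to t only and keep the last point,
    using lambda = 650 and rescaling the cycle by 1.1513 so that it mimics
    the two-sided filter with lambda = 1,600."""
    y = np.asarray(y, dtype=float)
    cycle = np.full(len(y), np.nan)  # undefined during the warm-up window
    for t in range(min_obs - 1, len(y)):
        cycle[t] = y[t] - hp_trend(y[: t + 1], lam)[-1]
    return rescale * cycle

# quarterly-style toy series: linear trend plus a business-cycle sine wave
t = np.arange(80)
y = 0.02 * t + 0.5 * np.sin(2 * np.pi * t / 32)
cyc = adjusted_one_sided_hp(y)
```

Solving the dense linear system each period is fine for illustration; production code would use sparse or recursive updates.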