
on Econometrics 
By:  Fabrizio Iacone (Università degli Studi di Milano); Morten Ørregaard Nielsen (Queen's University and CREATES); A.M. Robert Taylor (University of Essex) 
Abstract:  Lobato and Robinson (1998) develop semiparametric tests for the null hypothesis that a series is weakly autocorrelated, or I(0), about a constant level, against fractionally integrated alternatives. These tests have the advantage that the user is not required to specify a parametric model for any weak autocorrelation present in the series. We extend this approach in two distinct ways. First, we show that it can be generalised to allow for testing of the null hypothesis that a series is I(\delta) for any \delta lying in the usual stationary and invertible region of the parameter space. Second, it is well known in the literature that long memory and level breaks can be mistaken for one another, with unmodelled level breaks rendering fractional integration tests highly unreliable. We therefore extend the Lobato and Robinson (1998) approach to allow for the possibility of changes in level at unknown points in the series. We show that the resulting statistics have standard limiting null distributions, and that the tests based on these statistics attain the same asymptotic local power functions as infeasible tests based on the unobserved errors, and hence there is no loss in asymptotic local power from allowing for level breaks, even where none is present. We report results from a Monte Carlo study into the finite-sample behaviour of our proposed tests, as well as several empirical examples. 
Keywords:  fractional integration, level breaks, Lagrange multiplier testing principle, spurious long memory, local Whittle likelihood, conditional heteroskedasticity 
JEL:  C22 
Date:  2020–06 
URL:  http://d.repec.org/n?u=RePEc:qed:wpaper:1431&r=all 
By:  Linton, O.; Tang, H. 
Abstract:  We propose a new estimator, the quadratic form estimator, of the Kronecker product model for covariance matrices. We show that this estimator has good properties in the large dimensional case (i.e., the cross-sectional dimension n is large relative to the sample size T). In particular, the quadratic form estimator is consistent in a relative Frobenius norm sense provided log^3 n/T → 0. We obtain the limiting distributions of the Lagrange multiplier (LM) and Wald tests under both the null and local alternatives concerning the mean vector μ. Testing linear restrictions of μ is also investigated. Finally, our methodology performs well in finite-sample situations both when the Kronecker product model is true and when it is not true. 
Keywords:  Covariance matrix, Kronecker product, Quadratic form, Lagrange multiplier test, Wald test 
Date:  2020–06–01 
URL:  http://d.repec.org/n?u=RePEc:cam:camdae:2050&r=all 
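The abstract above concerns the authors' quadratic form estimator, which is not reproduced here. As a hedged sketch of the Kronecker product model itself, the classical nearest-Kronecker-product projection (Van Loan's rearrangement, an assumption of this sketch rather than the paper's estimator) fits S ≈ A ⊗ B via a rank-one SVD of a rearranged matrix:

```python
import numpy as np

def nearest_kronecker(S, n1, n2):
    """Best Frobenius-norm approximation S ~ kron(A, B) with A (n1 x n1)
    and B (n2 x n2). Van Loan's rearrangement stacks vec of each
    (n2 x n2) block of S into a matrix R; if S = kron(A, B) exactly,
    R = vec(A) vec(B)^T is rank one, so the leading singular pair
    recovers A and B up to a shared scale."""
    R = np.empty((n1 * n1, n2 * n2))
    for i in range(n1):
        for j in range(n1):
            R[i * n1 + j] = S[i*n2:(i+1)*n2, j*n2:(j+1)*n2].reshape(-1)
    U, s, Vt = np.linalg.svd(R, full_matrices=False)
    A = np.sqrt(s[0]) * U[:, 0].reshape(n1, n1)
    B = np.sqrt(s[0]) * Vt[0].reshape(n2, n2)
    return A, B

# Sanity check: recover an exact Kronecker structure.
A0 = np.eye(3) + 0.3 * np.ones((3, 3))
B0 = np.diag([1.0, 2.0])
S = np.kron(A0, B0)
A, B = nearest_kronecker(S, 3, 2)
print(np.allclose(np.kron(A, B), S))  # True
```

In the large-dimensional setting of the paper, S would be a sample covariance matrix with n = n1 × n2 rows and columns.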
By:  Christoph Breunig; Xiaohong Chen 
Abstract:  This paper proposes simple, data-driven, optimal rate-adaptive inferences on a structural function in semi-nonparametric conditional moment restrictions. We consider two types of hypothesis tests based on leave-one-out sieve estimators. A structure-space test (ST) uses a quadratic distance between the structural functions of endogenous variables; while an image-space test (IT) uses a quadratic distance of the conditional moment from zero. For both tests, we analyze their respective classes of nonparametric alternative models that are separated from the null hypothesis by the minimax rate of testing. That is, the sum of the type I and the type II errors of the test, uniformly over the class of nonparametric alternative models, cannot be improved by any other test. Our new minimax rate of ST differs from the known minimax rate of estimation in nonparametric instrumental variables (NPIV) models. We propose computationally simple and novel exponential scan data-driven choices of sieve regularization parameters and adjusted chi-squared critical values. The resulting tests attain the minimax rate of testing, and hence optimally adapt to the unknown smoothness of functions and are robust to the unknown degree of ill-posedness (endogeneity). Data-driven confidence sets are easily obtained by inverting the adaptive ST. Monte Carlo studies demonstrate that our adaptive ST has good size and power properties in finite samples for testing monotonicity or equality restrictions in NPIV models. Empirical applications to nonparametric multi-product demands with endogenous prices are presented. 
Date:  2020–06 
URL:  http://d.repec.org/n?u=RePEc:arx:papers:2006.09587&r=all 
By:  Rustam Ibragimov; Jihyun Kim; Anton Skrobotov 
Abstract:  We propose two robust methods for testing hypotheses on unknown parameters of predictive regression models under heterogeneous and persistent volatility as well as endogenous, persistent and/or fat-tailed regressors and errors. The proposed robust testing approaches are applicable both in the case of discrete and continuous time models. Both of the methods use the Cauchy estimator to effectively handle the problems of endogeneity, persistence and/or fat-tailedness in regressors and errors. The difference between our two methods is how the heterogeneous volatility is controlled. The first method relies on robust t-statistic inference using group estimators of a regression parameter of interest proposed in Ibragimov and Müller (2010). It is simple to implement, but requires the exogenous volatility assumption. To relax the exogenous volatility assumption, we propose another method which relies on the nonparametric correction of volatility. The proposed methods perform well compared with widely used alternative inference procedures in terms of their finite-sample properties. 
Date:  2020–06 
URL:  http://d.repec.org/n?u=RePEc:arx:papers:2006.01191&r=all 
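The paper's robust t-statistic grouping and nonparametric volatility corrections are not reproduced here; the common core of both methods is the Cauchy estimator, which instruments a persistent regressor with its own sign. A minimal sketch under an assumed zero-intercept predictive regression:

```python
import numpy as np

def cauchy_estimator(y, x_lag):
    """Cauchy (sign-instrument) estimator of b in y_t = b * x_{t-1} + e_t.
    Using sign(x_{t-1}) as an instrument tames persistence, endogeneity
    and fat tails in the regressor."""
    z = np.sign(x_lag)
    return np.sum(z * y) / np.sum(z * x_lag)

rng = np.random.default_rng(1)
T = 5000
x = np.cumsum(rng.standard_normal(T))        # persistent (unit-root) regressor
y = 0.5 * x[:-1] + rng.standard_normal(T - 1)
b_hat = cauchy_estimator(y, x[:-1])
print(round(b_hat, 2))  # close to the true value 0.5
```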
By:  Jochmans, K. 
Abstract:  We consider inference in linear regression models that is robust to heteroskedasticity and the presence of many control variables. When the number of control variables increases at the same rate as the sample size, the usual heteroskedasticity-robust estimators of the covariance matrix are inconsistent. Hence, tests based on these estimators are size distorted even in large samples. An alternative covariance-matrix estimator for such a setting is presented that complements recent work by Cattaneo, Jansson and Newey (2018). We provide high-level conditions for our approach to deliver (asymptotically) size-correct inference as well as more primitive conditions for three special cases. Simulation results and an empirical illustration to inference on the union premium are also provided. 
Keywords:  heteroskedasticity, inference, many regressors, statistical leverage 
JEL:  C12 
Date:  2020–04–28 
URL:  http://d.repec.org/n?u=RePEc:cam:camdae:2033&r=all 
By:  Zhang, Haoran; Chen, Yunxiao; Li, Xiaoou 
Abstract:  We revisit a singular value decomposition (SVD) algorithm given in Chen et al. (2019b) for exploratory Item Factor Analysis (IFA). This algorithm estimates a multidimensional IFA model by SVD and was used to obtain a starting point for joint maximum likelihood estimation in Chen et al. (2019b). Thanks to the analytic and computational properties of SVD, this algorithm guarantees a unique solution and has a computational advantage over other exploratory IFA methods. Its computational advantage becomes significant when the numbers of respondents, items, and factors are all large. This algorithm can be viewed as a generalization of principal component analysis (PCA) to binary data. In this note, we provide the statistical underpinning of the algorithm. In particular, we show its statistical consistency under the same double asymptotic setting as in Chen et al. (2019b). We also demonstrate how this algorithm provides a scree plot for investigating the number of factors and provide its asymptotic theory. Further extensions of the algorithm are discussed. Finally, simulation studies suggest that the algorithm has good finite sample performance. 
Keywords:  exploratory item factor analysis; IFA; singular value decomposition; double asymptotics; generalised PCA for binary data 
JEL:  C1 
Date:  2020–05–26 
URL:  http://d.repec.org/n?u=RePEc:ehl:lserod:104166&r=all 
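The exact algorithm of Chen et al. (2019b) is not reproduced here. The following rough sketch conveys only the generic "PCA for binary data" idea behind it: a rank truncation of the 0/1 matrix estimates the probability matrix, an entrywise logit maps it to the natural-parameter scale, and a final truncation gives the low-rank estimate. Retaining one extra component at the probability stage (to absorb the mean level) is an assumption of this sketch:

```python
import numpy as np

def binary_svd_estimate(Y, k, eps=0.01):
    """Sketch of SVD-based low-rank estimation for an N x J binary matrix:
    (1) rank-(k+1) SVD of Y approximates the probability matrix,
    (2) clip to [eps, 1-eps] and apply the entrywise logit,
    (3) rank-k truncation yields the natural-parameter estimate."""
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    r = k + 1                                     # extra component for the mean level
    P = U[:, :r] @ np.diag(s[:r]) @ Vt[:r]        # step 1: probabilities
    P = np.clip(P, eps, 1 - eps)
    M = np.log(P / (1 - P))                       # step 2: entrywise logit
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return U[:, :k] @ np.diag(s[:k]) @ Vt[:k]     # step 3: rank-k estimate

rng = np.random.default_rng(2)
N, J, k = 2000, 100, 2
M0 = rng.standard_normal((N, k)) @ rng.standard_normal((k, J))  # true low-rank logits
Y = (rng.random((N, J)) < 1 / (1 + np.exp(-M0))).astype(float)
M_hat = binary_svd_estimate(Y, k)
c = float(np.corrcoef(M_hat.ravel(), M0.ravel())[0, 1])
print(c > 0.5)
```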
By:  Sobin Joseph; Lekhapriya Dheeraj Kashyap; Shashi Jain 
Abstract:  The multidimensional Hawkes process (MHP) is a class of self- and mutually exciting point processes that finds a wide range of applications, from the prediction of earthquakes to the modelling of order books in high-frequency trading. This paper makes two major contributions. First, we find an unbiased estimator of the log-likelihood of the Hawkes process, enabling efficient use of the stochastic gradient descent method for maximum likelihood estimation. Second, we propose a specific single-hidden-layer neural network for the nonparametric estimation of the underlying kernels of the MHP. We evaluate the proposed model on both synthetic and real datasets, and find that the method has comparable or better performance than existing estimation methods. The use of a shallow neural network ensures that we do not compromise the interpretability of the Hawkes model, while retaining the flexibility to estimate any non-standard Hawkes excitation kernel. 
Date:  2020–06 
URL:  http://d.repec.org/n?u=RePEc:arx:papers:2006.02460&r=all 
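The paper's unbiased estimator and neural-network kernels are not reproduced here. As background, the exact log-likelihood that such estimators approximate can be evaluated in linear time for a univariate Hawkes process with exponential kernel alpha*exp(-beta*t), via the standard recursion:

```python
import math

def hawkes_loglik(times, T, mu, alpha, beta):
    """Exact log-likelihood of a univariate Hawkes process with intensity
    mu + alpha * sum_{t_i < t} exp(-beta*(t - t_i)), observed on [0, T].
    `times` must be sorted; R accumulates the excitation recursively."""
    ll = 0.0
    R = 0.0
    prev = None
    for t in times:
        if prev is not None:
            R = math.exp(-beta * (t - prev)) * (1.0 + R)
        ll += math.log(mu + alpha * R)
        prev = t
    # compensator: integral of the intensity over [0, T]
    comp = mu * T + (alpha / beta) * sum(1.0 - math.exp(-beta * (T - t)) for t in times)
    return ll - comp

# Degenerate check: with alpha = 0 this is a Poisson(mu) log-likelihood.
times = [0.5, 1.2, 3.0]
print(hawkes_loglik(times, 5.0, mu=1.0, alpha=0.0, beta=1.0))  # 3*log(1) - 5 = -5.0
```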
By:  Paolo Frumento; Matteo Bottai; Iván Fernández-Val 
Abstract:  In ordinary quantile regression, quantiles of different order are estimated one at a time. An alternative approach, which is referred to as quantile regression coefficients modeling (QRCM), is to model quantile regression coefficients as parametric functions of the order of the quantile. In this paper, we describe how the QRCM paradigm can be applied to longitudinal data. We introduce a two-level quantile function, in which two different quantile regression models are used to describe the (conditional) distribution of the within-subject response and that of the individual effects. We propose a novel type of penalized fixed-effects estimator, and discuss its advantages over standard methods based on $\ell_1$ and $\ell_2$ penalization. We provide model identifiability conditions, derive asymptotic properties, describe goodness-of-fit measures and model selection criteria, present simulation results, and discuss an application. The proposed method has been implemented in the R package qrcm. 
Date:  2020–05 
URL:  http://d.repec.org/n?u=RePEc:arx:papers:2006.00160&r=all 
By:  Yuya Sasaki; Yulong Wang 
Abstract:  Common approaches to statistical inference for structural and reduced-form parameters in empirical economic analysis are based on the root-n asymptotic normality of the GMM and M estimators. The canonical root-n asymptotic normality for these classes of estimators requires at least the second moment of the score to be bounded. In this article, we present a method of testing this condition for the asymptotic normality of the GMM and M estimators. Our test has a uniform size control over the set of data generating processes compatible with the root-n asymptotic normality. Simulation studies support this theoretical result. Applying the proposed test to the market share data from the Dominick's Finer Foods retail chain, we find that a common ad hoc procedure to deal with zero market shares results in a failure of the root-n asymptotic normality. 
Date:  2020–06 
URL:  http://d.repec.org/n?u=RePEc:arx:papers:2006.02541&r=all 
By:  Joris Pinkse; Karl Schurter 
Abstract:  We estimate the density and its derivatives using a local polynomial approximation to the logarithm of an unknown density $f$. The estimator is guaranteed to be nonnegative and achieves the same optimal rate of convergence in the interior as well as at the boundary of the support of $f$. The estimator is therefore well-suited to applications in which nonnegative density estimates are required, such as in semiparametric maximum likelihood estimation. In addition, we show that our estimator compares favorably with other kernel-based methods, both in terms of asymptotic performance and computational ease. Simulation results confirm that our method can perform similarly in finite samples to these alternative methods when they are used with optimal inputs, i.e. an Epanechnikov kernel and an optimally chosen bandwidth sequence. Further simulation evidence demonstrates that, if the researcher modifies the inputs and chooses a larger bandwidth, our approach can even improve upon these optimized alternatives, asymptotically. We provide code in several languages. 
Date:  2020–06 
URL:  http://d.repec.org/n?u=RePEc:arx:papers:2006.01328&r=all 
By:  Bao-Gen Li; Dian-Yi Ling; Zu-Guo Yu 
Abstract:  When common factors strongly influence two cross-correlated time series recorded in complex natural and social systems, the results will be biased if we use multifractal detrended cross-correlation analysis (MF-DXA) without considering these common factors. Based on multifractal temporally weighted detrended cross-correlation analysis (MF-TWXDFA) proposed by our group and multifractal partial cross-correlation analysis (MF-DPXA) proposed by Qian et al., we propose a new method, multifractal temporally weighted detrended partial cross-correlation analysis (MF-TWDPCCA), to quantify the intrinsic power-law cross-correlation of two non-stationary time series affected by common external factors. We use MF-TWDPCCA to characterize the intrinsic cross-correlations between two simultaneously recorded time series by removing the effects of other potential time series. To test the performance of MF-TWDPCCA, we apply it, MF-TWXDFA and MF-DPXA to artificially simulated series. These numerical tests demonstrate that MF-TWDPCCA can accurately detect the intrinsic cross-correlations between two simultaneously recorded series. To further show its utility, we apply MF-TWDPCCA to time series from stock markets and find significant multifractal power-law cross-correlations between stock returns. A new partial cross-correlation coefficient is defined to quantify the level of intrinsic cross-correlation between two time series. 
Date:  2020–05 
URL:  http://d.repec.org/n?u=RePEc:arx:papers:2006.09154&r=all 
By:  Nicklas Werge (LPSM); Olivier Wintenberger (LPSM) 
Abstract:  The Quasi-Maximum Likelihood (QML) procedure is widely used for statistical inference due to its robustness against overdispersion. However, while there are extensive references on non-recursive QML estimation, recursive QML estimation has attracted little attention until recently. In this paper, we investigate the convergence properties of the QML procedure in a general conditionally heteroscedastic time series model, extending the classical offline optimization routines to recursive approximation. We propose an adaptive recursive estimation routine for GARCH models using the technique of Variance Targeting Estimation (VTE) to alleviate the convergence difficulties encountered in the usual QML estimation. Finally, empirical results demonstrate a favorable trade-off between the ability to adapt to time-varying estimates and the stability of the estimation routine. 
Date:  2020–06 
URL:  http://d.repec.org/n?u=RePEc:arx:papers:2006.02077&r=all 
By:  Miranda Gualdrón, Karen Alejandra; Ruiz Ortega, Esther; Poncela Blanco, Maria Pilar 
Abstract:  Dynamic Factor Models, which assume the existence of a small number of unobserved latent factors that capture the comovements in a system of variables, are the main "big data" tool used by empirical macroeconomists during the last 30 years. One important tool to extract the factors is based on Kalman filter and smoothing procedures that can cope with missing data, mixed-frequency data, time-varying parameters, nonlinearities, nonstationarity and many other characteristics often observed in real systems of economic variables. This paper surveys the literature on latent common factors extracted using Kalman filter and smoothing procedures in the context of Dynamic Factor Models. Signal extraction and parameter estimation issues are analyzed separately. Identification issues are also tackled in both stationary and nonstationary models. Finally, empirical applications are surveyed in both cases. 
Keywords:  State-Space Model; Identification; EM Algorithm; Dynamic Factor Model 
Date:  2020–06–25 
URL:  http://d.repec.org/n?u=RePEc:cte:wsrepe:30644&r=all 
By:  Juan Carlos Escanciano 
Abstract:  This paper provides new uniform rate results for kernel estimators of absolutely regular stationary processes that are uniform in the bandwidth and in infinite-dimensional classes of dependent variables and regressors. Our results are useful for establishing asymptotic theory for two-step semiparametric estimators in time series models. We apply our results to obtain nonparametric estimates and their rates for Expected Shortfall processes. 
Date:  2020–05 
URL:  http://d.repec.org/n?u=RePEc:arx:papers:2005.09951&r=all 
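As an illustration of the kind of two-step nonparametric object such rate results cover, here is a simple kernel-weighted (Nadaraya-Watson style) conditional Expected Shortfall estimate; this is a hedged sketch, not the estimator analyzed in the paper:

```python
import numpy as np

def nw_expected_shortfall(x, y, x0, alpha=0.05, h=0.3):
    """Kernel estimate of the conditional Expected Shortfall of y given
    x = x0: weight observations with a Gaussian kernel in x, find the
    kernel-weighted alpha-quantile, and average the tail below it."""
    w = np.exp(-0.5 * ((x - x0) / h) ** 2)
    w /= w.sum()
    order = np.argsort(y)
    cw = np.cumsum(w[order])
    tail = order[cw <= alpha]
    if tail.size == 0:          # degenerate case: keep the single worst outcome
        tail = order[:1]
    return float(np.average(y[tail], weights=w[tail]))

# Homoskedastic check: for y ~ N(0,1) independent of x, ES_0.05 is about -2.06.
rng = np.random.default_rng(6)
n = 20000
x = rng.random(n)
y = rng.standard_normal(n)
es = nw_expected_shortfall(x, y, x0=0.5)
print(round(es, 2))
```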
By:  Ghattas Badih (I2M - Institut de Mathématiques de Marseille - AMU - Aix Marseille Université - ECM - École Centrale de Marseille - CNRS - Centre National de la Recherche Scientifique); Michel Pierre (I2M - Institut de Mathématiques de Marseille - AMU - Aix Marseille Université - ECM - École Centrale de Marseille - CNRS - Centre National de la Recherche Scientifique; CEReSS - Centre d'études et de recherche sur les services de santé et la qualité de vie - AMU - Aix Marseille Université); Boyer Laurent (CEReSS - Centre d'études et de recherche sur les services de santé et la qualité de vie - AMU - Aix Marseille Université) 
Abstract:  We consider different approaches for assessing variable importance in clustering. We focus on clustering using binary decision trees (CUBT), which is a nonparametric top-down hierarchical clustering method designed for both continuous and nominal data. We suggest a measure of variable importance for this method similar to the one used in Breiman's classification and regression trees. This score is useful to rank the variables in a dataset, to determine which variables are the most important or to detect the irrelevant ones. We analyze both stability and efficiency of this score on different data simulation models in the presence of noise, and compare it to other classical variable importance measures. Our experiments show that variable importance based on CUBT is much more efficient than other approaches in a large variety of situations. 
Keywords:  Variable ranking, Variable importance, Unsupervised learning, CUBT, Deviance 
Date:  2019–03 
URL:  http://d.repec.org/n?u=RePEc:hal:journl:hal02007388&r=all 
By:  Xu Cheng (University of Pennsylvania); Winston Wei Dou (University of Pennsylvania); Zhipeng Liao (University of California, Los Angeles) 
Abstract:  This paper shows that robust inference under weak identification is important to the evaluation of many influential macro asset pricing models, including long-run risk models, disaster risk models, and multifactor linear asset pricing models. Building on recent developments in the conditional inference literature, we provide a new specification test by simulating the critical value conditional on a sufficient statistic. This sufficient statistic can be intuitively interpreted as a measure capturing the macroeconomic information decoupled from the underlying content of asset pricing theories. Macro-finance decoupling is an effective way to improve the power of our specification test when asset pricing theories are difficult to refute due to an imbalance in the information content about the key model parameters between macroeconomic moment restrictions and asset pricing cross-equation restrictions. 
Keywords:  Asset Pricing, Conditional Inference, Disaster Risk, Long-Run Risk, Factor Models, Specification Test, Weak Identification 
JEL:  C12 C32 C52 G12 
Date:  2020–05–24 
URL:  http://d.repec.org/n?u=RePEc:pen:papers:20019&r=all 
By:  Kaeding, Matthias 
Abstract:  We model the log-cumulative baseline hazard for the Cox model via Bayesian, monotonic P-splines. This approach permits fast computation, accounting for arbitrary censorship and the inclusion of nonparametric effects. We leverage the computational efficiency to simplify effect interpretation for metric and non-metric variables by combining the restricted mean survival time approach with partial dependence plots. This allows effect interpretation in terms of survival times. Monte Carlo simulations indicate that the proposed methods work well. We illustrate our approach using a large data set of real estate advertisements. 
Keywords:  Bayesian survival analysis, nonparametric modeling, penalized spline, restricted mean survival time 
JEL:  C11 C14 C41 
Date:  2020 
URL:  http://d.repec.org/n?u=RePEc:zbw:rwirep:850&r=all 
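The Bayesian P-spline machinery is not sketched here; however, the restricted mean survival time the authors use for effect interpretation is simply the area under the survival curve up to a horizon t*, which takes a few lines given any fitted step-function survival curve:

```python
import numpy as np

def rmst(times, surv, t_star):
    """Restricted mean survival time up to t_star: the area under the
    survival curve S(t), given here as a right-continuous step function
    with value surv[i] on [times[i], times[i+1]); S(t) = 1 before the
    first time point."""
    grid = np.concatenate([[0.0], times, [t_star]])
    vals = np.concatenate([[1.0], surv])
    grid = np.clip(grid, 0.0, t_star)
    return float(np.sum(vals * np.diff(grid)))

# Exponential survival S(t) = exp(-t) on a fine grid: RMST(2) ~ 1 - exp(-2).
t = np.linspace(0.001, 2.0, 4000)
S = np.exp(-t)
print(round(rmst(t, S, 2.0), 3))  # ~ 0.865 (exact value 1 - e^{-2} ~ 0.8647)
```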
By:  Martin Huber 
Abstract:  The estimation of the causal effect of an endogenous treatment based on an instrumental variable (IV) is often complicated by attrition, sample selection, or nonresponse in the outcome of interest. To tackle the latter problem, the latent ignorability (LI) assumption imposes that attrition/sample selection is independent of the outcome conditional on the treatment compliance type (i.e. how the treatment behaves as a function of the instrument), the instrument, and possibly further observed covariates. As a word of caution, this note formally discusses the strong behavioral implications of LI in rather standard IV models. We also provide an empirical illustration based on the Job Corps experimental study, in which the sensitivity of the estimated program effect to LI and alternative assumptions about outcome attrition is investigated. 
Date:  2020–06 
URL:  http://d.repec.org/n?u=RePEc:arx:papers:2006.01703&r=all 
By:  Marin Drlje 
Abstract:  A large literature estimates various school admission and graduation effects by employing variation in student admission scores around schools’ admission cutoffs, assuming (quasi-)random school assignment close to the cutoffs. In this paper, I present evidence suggesting that the samples corresponding to typical applications of the regression discontinuity design (RDD) fail to satisfy these assumptions. I distinguish ex-post randomization (as in admission lotteries applicable to those at the margin of admission) from ex-ante randomization, reflecting uncertainty about the market structure of applicants, which can be naturally quantified by resampling from the applicant population. Using data from the Croatian centralized college-admission system, I show that these ex-ante admission probabilities differ dramatically between treated and non-treated students within typical RDD bandwidths. Such unbalanced admission probability distributions suggest that bandwidths (and sample sizes) should be drastically reduced to avoid selection bias. I also show that a sizeable fraction of quasi-randomized assignments occur outside of the typical RDD bandwidths, suggesting that these bandwidths are also inefficient. As an alternative, I propose a new estimator, the Propensity Score Discontinuity Design (PSDD), based on all observations with random assignments, which compares outcomes of applicants matched on ex-ante admission probabilities, conditional on admission scores. 
Keywords:  RDD; PSDD; school admission effects; lottery 
JEL:  C01 C51 
Date:  2020–05 
URL:  http://d.repec.org/n?u=RePEc:cer:papers:wp658&r=all 
By:  Marcin Chlebus (Faculty of Economic Sciences, University of Warsaw); Maciej Stefan Świtała (Faculty of Economic Sciences, University of Warsaw) 
Abstract:  The paper considers the broad idea of topic modelling and its applications. The aim of the research was to identify mutual tendencies in econometric and machine learning abstracts. Different topic models were compared in terms of their performance and interpretability; the former was measured with a newly introduced approach. Summaries collected from esteemed journals were analysed with the LSA, LDA and CTM algorithms. The obtained results make it possible to identify similar trends in both corpora. The probabilistic models (LDA and CTM) outperform the semantic alternative (LSA). It appears that econometrics and machine learning are fields that consider problems that are rather homogeneous at the conceptual level. However, they differ in terms of the tools used and their dominance in particular areas. 
Keywords:  abstracts, comparison, interpretability, tendencies, topics 
JEL:  A12 C18 C38 C52 C61 
Date:  2020 
URL:  http://d.repec.org/n?u=RePEc:war:wpaper:202016&r=all 
By:  Paul Hünermund; Beyers Louw 
Abstract:  Control variables are included in regression analyses to estimate the causal effect of a treatment variable of interest on an outcome. In this note we argue that control variables are unlikely to have a causal interpretation themselves, however. We therefore suggest refraining from discussing their marginal effects in the results sections of empirical research papers. 
Date:  2020–05 
URL:  http://d.repec.org/n?u=RePEc:arx:papers:2005.10314&r=all 
By:  Jochmans, K. 
Abstract:  Identification of peer effects is complicated by the fact that the individuals under study may self-select their peers. Random assignment to peer groups has proven useful to sidestep such a concern. In the absence of a formal randomization mechanism it needs to be argued that assignment is `as good as' random. This paper introduces a simple yet powerful test to do so. We provide theoretical results for this test and explain why it dominates existing alternatives. Asymptotic power calculations and an analysis of the assignment mechanism of players to playing partners in tournaments of the Professional Golfers' Association are used to illustrate these claims. Our approach can equally be used to test for the presence of peer effects. To illustrate this we test for the presence of peer effects in the classroom using kindergarten data collected within Project STAR. We find no evidence of peer effects once we control for classroom fixed effects and a set of student characteristics. 
Keywords:  asymptotic power, bias, peer effects, random assignment 
JEL:  C12 C21 
Date:  2020–04–06 
URL:  http://d.repec.org/n?u=RePEc:cam:camdae:2024&r=all 
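The paper's test, which it argues dominates existing alternatives, is not reproduced here. A generic permutation diagnostic conveys the basic logic: under "as good as" random assignment, the dispersion of group means of any predetermined characteristic should look like a draw from its permutation distribution (the statistic and group sizes below are illustrative assumptions):

```python
import numpy as np

def random_assignment_pvalue(x, groups, n_perm=2000, seed=0):
    """Permutation p-value for 'as good as random' assignment: compare the
    variance of group means of a predetermined characteristic x with its
    distribution under random reshuffling of x across individuals."""
    rng = np.random.default_rng(seed)

    def stat(xv):
        return np.var([xv[groups == g].mean() for g in np.unique(groups)])

    t_obs = stat(x)
    perm = np.array([stat(rng.permutation(x)) for _ in range(n_perm)])
    return float(np.mean(perm >= t_obs))

rng = np.random.default_rng(5)
groups = np.repeat(np.arange(20), 10)         # 20 groups of 10 individuals
x_random = rng.standard_normal(200)           # assignment unrelated to x
x_sorted = np.sort(rng.standard_normal(200))  # x clusters within groups: non-random
p_random = random_assignment_pvalue(x_random, groups)
p_sorted = random_assignment_pvalue(x_sorted, groups)
print(p_random, p_sorted)  # the sorted (non-random) assignment is rejected
```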
By:  Henrik Kleven 
Abstract:  This paper reviews and generalizes the sufficient statistics approach to policy evaluation. The idea of the approach is that the welfare effect of policy changes can be expressed in terms of estimable reduced-form elasticities, allowing for policy evaluation without estimating the structural primitives of fully specified models. The approach relies on three assumptions: that policy changes are small, that government policy is the only source of market imperfection, and that a set of high-level restrictions on the environment and on preferences can be used to reduce the number of elasticities to be estimated. We generalize the approach in all three dimensions. It is possible to develop transparent sufficient statistics formulas under very general conditions, but the estimation requirements increase greatly. Starting from such general formulas makes clear that feasible empirical implementations are in fact structural approaches. 
JEL:  D01 D04 D1 D6 H0 H2 H3 J08 J2 J38 
Date:  2020–05 
URL:  http://d.repec.org/n?u=RePEc:nbr:nberwo:27242&r=all 
By:  Patrick Chang; Etienne Pienaar; Tim Gebbie 
Abstract:  On different time intervals it can be useful to empirically determine whether the measurement process being observed is fundamental and representative of actual discrete events, or whether these measurements can still be faithfully represented as random samples of some underlying continuous process. As sampling timescales become smaller for a continuous-time process, one can expect to continue to measure correlations, even as the sampling intervals become very small. With a discrete event process, however, one can expect the correlation measurements to break down quickly. This is a theoretically well-explored problem. Here we concern ourselves with a simulation-based empirical investigation that uses the Epps effect as a discriminator between situations where the underlying system is discrete, e.g. a D-type Hawkes process, and situations where it can still be appropriate to represent the problem with a continuous-time random process that is being asynchronously sampled, e.g. an asynchronously sampled set of correlated Brownian motions. We derive a method to compensate for the Epps effect arising from asynchrony and then use this to discriminate. We compare the correction on a simple continuous Brownian price path model and a Hawkes price model when the sampling is either a simple homogeneous Poisson process or a Hawkes sampling process. This suggests that Epps curves can sometimes provide insight into whether discrete data are in fact observables realised from fundamental codependent discrete processes, or merely samples of some correlated continuous-time process. 
Date:  2020–05 
URL:  http://d.repec.org/n?u=RePEc:arx:papers:2005.10568&r=all 
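The paper's compensation method is not reproduced here, but the Epps effect itself is easy to generate: measured correlations between asynchronously observed, truly correlated Brownian motions shrink toward zero as the sampling interval shrinks. A minimal sketch (Bernoulli-thinned observation times stand in for Poisson sampling):

```python
import numpy as np

rng = np.random.default_rng(4)
n, rho, lam = 100_000, 0.8, 0.05   # fine grid, true correlation, observation rate

# Correlated Brownian increments on the fine grid
z = rng.standard_normal((2, n))
dx = z[0]
dy = rho * z[0] + np.sqrt(1 - rho**2) * z[1]
X, Y = np.cumsum(dx), np.cumsum(dy)

# Asynchronous observation times: keep each grid point independently with prob lam
obs_x = rng.random(n) < lam
obs_y = rng.random(n) < lam

def prev_tick(path, observed, delta):
    """Previous-tick-interpolate an asynchronously observed path onto a
    regular grid with spacing delta, then return its increments."""
    idx = np.where(observed)[0]
    grid = np.arange(0, n, delta)
    pos = np.searchsorted(idx, grid, side="right") - 1   # last obs at or before t
    vals = np.where(pos >= 0, path[idx[np.maximum(pos, 0)]], 0.0)
    return np.diff(vals)

corrs = {}
for delta in (5, 50, 500):
    rx = prev_tick(X, obs_x, delta)
    ry = prev_tick(Y, obs_y, delta)
    corrs[delta] = np.corrcoef(rx, ry)[0, 1]
print(corrs)  # correlation is biased toward 0 at small delta: the Epps effect
```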