nep-ecm 2013-02-03 papers

on Econometrics

Issue of 2013‒02‒03
fourteen papers chosen by
Sune Karlsson
Orebro University

Inference for Multi-Dimensional High-Frequency Data: Equivalence of Methods, Central Limit Theorems, and an Application to Conditional Independence Testing By Markus Bibinger; Per A. Mykland
Analysis of discrete dependent variable models with spatial correlation By Liesenfeld, Roman; Richard, Jean-François; Vogler, Jan
Testing Exclusion Restrictions in Nonseparable Triangular Models By Joeri Smits; Jeffrey S. Racine
GMM Efficiency and IPW Estimation for Nonsmooth Functions By Otávio Bartalotti
Central limit theorems and multiplier bootstrap when p is much larger than n By Victor Chernozhukov; Denis Chetverikov; Kengo Kato
Fixed Bandwidth Asymptotics for Regression Discontinuity Designs By Otávio Bartalotti
Specification for Partially Identified Models defined by Moment Inequalities By Federico Bugni; Ivan Canay; Xiaoxia Shi
Infinite Order Cross-Validated Local Polynomial Regression By Peter G. Hall; Jeffrey S. Racine
Gaussian approximation of suprema of empirical processes By Victor Chernozhukov; Denis Chetverikov; Kengo Kato
The Smooth Colonel and the Reverend Find Common Ground By Nicholas M. Kiefer; Jeffrey S. Racine
Small Sample Bootstrap Inference of Level Relationships in the Presence of Autocorrelated Errors: A Large Scale Simulation Study and an Application in Energy Demand By A. Talha Yalta
Bayesian Credit Ratings (new version) By Paola Cerchiello; Paolo Giudici
The fixed effects estimator of technical efficiency By Wikström, Daniel
Kernel Factory: An Ensemble of Kernel Machines By M. BALLINGS; D. VAN DEN POEL

Inference for Multi-Dimensional High-Frequency Data: Equivalence of Methods, Central Limit Theorems, and an Application to Conditional Independence Testing

By:	Markus Bibinger; Per A. Mykland
Abstract:	We find the asymptotic distribution of the multi-dimensional multi-scale and kernel estimators for high-frequency financial data with microstructure. Sampling times are allowed to be asynchronous. The central limit theorem is shown to have a feasible version. In the process, we show that the classes of multi-scale and kernel estimators for smoothing noise perturbation are asymptotically equivalent in the sense of having the same asymptotic distribution for corresponding kernel and weight functions. We also include the analysis for the Hayashi-Yoshida estimator in the absence of microstructure. The theory leads to multi-dimensional stable central limit theorems for the respective estimators and hence allows statistical inference to be drawn for a broad class of multivariate models and linear functions of the recorded components. This paves the way to tests and confidence intervals in risk measurement for arbitrary portfolios composed of assets observed at high frequency. As an application, we extend the approach to cover more complex functions in order to construct a test of the hypothesis that correlated assets are independent conditional on a common factor.
Keywords:	asymptotic distribution theory, asynchronous observations, conditional independence, high-frequency data, microstructure noise, multivariate limit theorems
JEL:	C14 C32 C58 G10
Date:	2013–01
URL:	http://d.repec.org/n?u=RePEc:hum:wpaper:sfb649dp2013-006&r=ecm
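
The Hayashi-Yoshida estimator mentioned in the abstract above can be illustrated with a minimal sketch. It assumes noise-free prices and the usual textbook definition (sum the products of increments over every pair of overlapping observation intervals). The function name and the toy inputs are hypothetical and are not taken from the paper.

```python
def hayashi_yoshida(times_x, values_x, times_y, values_y):
    """Covariation estimate for two asynchronously observed series.

    Adds dx_i * dy_j for every pair of observation intervals
    (t_{i-1}, t_i] and (s_{j-1}, s_j] that overlap.
    """
    total = 0.0
    for i in range(1, len(times_x)):
        dx = values_x[i] - values_x[i - 1]
        a0, a1 = times_x[i - 1], times_x[i]
        for j in range(1, len(times_y)):
            b0, b1 = times_y[j - 1], times_y[j]
            # The two intervals overlap when their intersection is non-empty.
            if max(a0, b0) < min(a1, b1):
                total += dx * (values_y[j] - values_y[j - 1])
    return total


# Synchronous sampling: reduces to the realized covariance.
sync = hayashi_yoshida([0, 1, 2], [0.0, 1.0, 3.0], [0, 1, 2], [0.0, 2.0, 5.0])

# Asynchronous sampling: one long X interval overlaps both Y intervals.
async_ = hayashi_yoshida([0, 2], [0.0, 3.0], [0, 1, 2], [0.0, 2.0, 5.0])
```

With synchronous timestamps the double loop only pairs matching intervals, so no interpolation or time alignment is needed in either case. The quadratic loop is chosen for readability, not speed.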

Analysis of discrete dependent variable models with spatial correlation

By:	Liesenfeld, Roman; Richard, Jean-François; Vogler, Jan
Abstract:	In this paper we consider ML estimation for a broad class of parameter-driven models for discrete dependent variables with spatial correlation. Under this class of models, which includes spatial discrete choice models, spatial Tobit models and spatial count data models, the dependent variable is driven by a latent stochastic state variable which is specified as a linear spatial regression model. The likelihood is a high-dimensional integral whose dimension depends on the sample size. For its evaluation we propose to use efficient importance sampling (EIS). The specific spatial EIS implementation we develop exploits the sparsity of the precision (or covariance) matrix of the errors in the reduced-form state equation typically encountered in spatial settings, which keeps numerically accurate EIS likelihood evaluation computationally feasible even for large sample sizes. The proposed ML approach based upon spatial EIS is illustrated with estimation of a spatial probit for US presidential voting decisions and spatial count data models (Poisson and Negbin) for firm location choices.
Keywords:	Count data models,Discrete choice models,Firm location choice,Importance sampling,Monte Carlo integration,Spatial econometrics
JEL:	C15 C21 C25 D22 R12
Date:	2013
URL:	http://d.repec.org/n?u=RePEc:zbw:cauewp:201301&r=ecm

Testing Exclusion Restrictions in Nonseparable Triangular Models

By:	Joeri Smits; Jeffrey S. Racine
Abstract:	In recent years, estimators for nonseparable models have been developed that rely on (an) instrumental variable(s) for identification. The exclusion restriction in triangular models can be reformulated and causally decomposed under the Settable Systems extension to the Pearl Causal Model due to Chalak & White (2012). We propose two new ways of testing the exclusion restriction, one based on testing conditional independence nonparametrically and one based on multivariate conditional mutual information. Unlike existing tests for overidentifying restrictions, the proposed tests are applicable in the just identified case. An important field of application is randomized trials with partial compliance, since for that case, the exclusion restriction is not only refutable, but also confirmable. The first approach, conditional independence testing, is illustrated by the application of the nonparametric test of equality of conditional densities of Li, Maasoumi & Racine (2009) to examples from medicine and economics.
Keywords:	instrumental variables, nonparametric identification, causal inference.
Date:	2013–01
URL:	http://d.repec.org/n?u=RePEc:mcm:deptwp:2013-02&r=ecm

GMM Efficiency and IPW Estimation for Nonsmooth Functions

By:	Otávio Bartalotti (Department of Economics, Tulane University)
Abstract:	In a GMM setting this paper analyzes the problem in which we have two sets of moment conditions, where two sets of parameters enter into one set of moment conditions, while only one set of parameters enters into the other, extending Prokhorov and Schmidt's (2009) redundancy results to nonsmooth objective functions, and obtains relatively efficient estimates of interesting parameters in the presence of nuisance parameters. One-step GMM estimation for both sets of parameters is asymptotically more efficient than two-step procedures. These results are applied to Wooldridge's (2007) inverse probability weighted estimator (IPW), generalizing the framework to deal with missing data in this context. Two-step estimation of beta_0 is more efficient than using known probabilities of selection, but this is dominated by one-step joint estimation. Examples for missing data quantile regression and instrumental variable quantile regression are provided.
Keywords:	generalized method of moments, nonsmooth objective functions, inverse probability weighting, missing data, quantile regression
JEL:	C13
Date:	2013–01
URL:	http://d.repec.org/n?u=RePEc:tul:wpaper:1301&r=ecm

Central limit theorems and multiplier bootstrap when p is much larger than n

By:	Victor Chernozhukov (Institute for Fiscal Studies and MIT); Denis Chetverikov; Kengo Kato
Abstract:	We derive a central limit theorem for the maximum of a sum of high dimensional random vectors. More precisely, we establish conditions under which the distribution of the maximum is approximated by the maximum of a sum of the Gaussian random vectors with the same covariance matrices as the original vectors. The key innovation of our result is that it applies even if the dimension of random vectors (p) is much larger than the sample size (n). In fact, the growth of p could be exponential in some fractional power of n. We also show that the distribution of the maximum of a sum of the Gaussian random vectors with unknown covariance matrices can be estimated by the distribution of the maximum of the (conditional) Gaussian process obtained by multiplying the original vectors with i.i.d. Gaussian multipliers. We call this procedure the “multiplier bootstrap”. Here too, the growth of p could be exponential in some fractional power of n. We prove that our distributional approximations, either Gaussian or conditional Gaussian, yield a high-quality approximation for the distribution of the original maximum, often with at most a polynomial approximation error. These results are of interest in numerous econometric and statistical applications. In particular, we demonstrate how our central limit theorem and the multiplier bootstrap can be used for high dimensional estimation, multiple hypothesis testing, and adaptive specification testing. All of our results contain non-asymptotic bounds on approximation errors.
Date:	2012–12
URL:	http://d.repec.org/n?u=RePEc:ifs:cemmap:45/12&r=ecm
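
The multiplier bootstrap described in the abstract above can be sketched in a few lines. This is a minimal illustration only: the function name, the default number of draws, and the empirical centering of each coordinate are assumptions made here, not details taken from the paper.

```python
import math
import random


def multiplier_bootstrap_quantile(data, alpha=0.05, draws=500, seed=0):
    """Bootstrap (1 - alpha) quantile of max_j n^{-1/2} * sum_i e_i * x_ij.

    data  : list of n rows, each a list of p coordinates
    e_i   : i.i.d. standard Gaussian multipliers, redrawn for every replication
    Rows are centered at the coordinate-wise sample mean (an assumption of
    this sketch; the abstract speaks of multiplying the original vectors).
    """
    rng = random.Random(seed)
    n, p = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(p)]
    centered = [[row[j] - means[j] for j in range(p)] for row in data]

    stats = []
    for _ in range(draws):
        multipliers = [rng.gauss(0.0, 1.0) for _ in range(n)]
        sums = [0.0] * p
        for weight, row in zip(multipliers, centered):
            for j in range(p):
                sums[j] += weight * row[j]
        stats.append(max(sums) / math.sqrt(n))

    stats.sort()
    index = min(draws - 1, math.ceil((1 - alpha) * draws) - 1)
    return stats[index]


# Toy data with more coordinates than observations (p = 100 > n = 40).
_rng = random.Random(42)
toy = [[_rng.gauss(0.0, 1.0) for _ in range(100)] for _ in range(40)]
critical_value = multiplier_bootstrap_quantile(toy)
```

The data are held fixed and only the multipliers are redrawn, so each replication costs one pass over the n-by-p array regardless of how p compares with n.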

Fixed Bandwidth Asymptotics for Regression Discontinuity Designs

By:	Otávio Bartalotti (Department of Economics, Tulane University)
Abstract:	The standard "small-h" asymptotics in the regression discontinuity (RD) literature assumes the bandwidth, h, around the discontinuity shrinks as n goes to infinity. In practice, however, the researcher has to choose an h>0. This paper derives the fixed-h asymptotic distribution of local polynomial estimators in the context of RD, better approximating the estimator's behavior and improving inference. Conditions are provided under which fixed-h and small-h approximations are the same. Feasible estimators for fixed-h standard errors are proposed and incorporate theoretical gains, improving over small-h especially in the presence of heteroskedasticity. For rectangular kernels, the fixed-h standard errors simplify to the usual heteroskedastic robust standard errors.
Keywords:	regression discontinuity design, average treatment effect, fixed bandwidth asymptotics, local polynomial estimators
JEL:	C12 C21
Date:	2013–01
URL:	http://d.repec.org/n?u=RePEc:tul:wpaper:1302&r=ecm
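
To make the setting of the abstract above concrete, here is a minimal sketch of a local linear RD point estimate with a rectangular kernel and a bandwidth h chosen by the researcher. With a rectangular kernel the fit reduces to ordinary least squares on the observations within h of the cutoff, separately on each side. The function names and the toy data are hypothetical, and no standard errors are computed.

```python
def fit_at_cutoff(xs, ys, cutoff):
    """Least-squares fit of y = a + b * (x - cutoff); returns a."""
    n = len(xs)
    d = [x - cutoff for x in xs]
    mean_d = sum(d) / n
    mean_y = sum(ys) / n
    sxx = sum((v - mean_d) ** 2 for v in d)
    if n < 2 or sxx == 0.0:
        raise ValueError("need at least two distinct x values on each side")
    slope = sum((v - mean_d) * (y - mean_y) for v, y in zip(d, ys)) / sxx
    return mean_y - slope * mean_d


def rd_estimate(x, y, cutoff, h):
    """Jump in the fitted regression function at the cutoff.

    Only observations with |x - cutoff| <= h are used, each with equal
    weight (rectangular kernel).
    """
    left = [(xi, yi) for xi, yi in zip(x, y) if cutoff - h <= xi < cutoff]
    right = [(xi, yi) for xi, yi in zip(x, y) if cutoff <= xi <= cutoff + h]
    if not left or not right:
        raise ValueError("no observations within the bandwidth on one side")
    a_left = fit_at_cutoff([p[0] for p in left], [p[1] for p in left], cutoff)
    a_right = fit_at_cutoff([p[0] for p in right], [p[1] for p in right], cutoff)
    return a_right - a_left


# Toy data: y = 1 + 2x below the cutoff, y = 4 + 2x at or above it.
x = [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0]
y = [-3.0, -1.0, 0.0, 5.0, 6.0, 8.0]
jump = rd_estimate(x, y, cutoff=0.0, h=1.5)
```

Because every retained observation has equal weight, the bandwidth enters only through which observations are kept. That is the sense in which the rectangular kernel case maps onto a standard regression.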

Specification for Partially Identified Models defined by Moment Inequalities

By:	Federico Bugni (Institute for Fiscal Studies and Duke University); Ivan Canay (Institute for Fiscal Studies and Northwestern University); Xiaoxia Shi
Abstract:	This paper studies the problem of specification testing in partially identified models defined by a finite number of moment equalities and inequalities (i.e., (in)equalities). Under the null hypothesis, there is at least one parameter value that simultaneously satisfies all of the moment (in)equalities whereas under the alternative hypothesis there is no such parameter value. While this problem has not been directly addressed in the literature (except in particular cases), several papers have suggested implementing this inferential problem by checking whether confidence intervals for the parameters of interest are empty or not. We propose two hypothesis tests that use the infimum of the sample criterion function over the parameter space as the test statistic together with two different critical values. We obtain two main results. First, we show that the two tests we propose are asymptotically size correct in a uniform sense. Second, we show our tests are more powerful than the test that checks whether the confidence set for the parameters of interest is empty or not.
Keywords:	Partial Identification, Moment Inequalities, Specification Tests, Hypothesis Testing.
JEL:	C01 C12 C15
Date:	2013–01
URL:	http://d.repec.org/n?u=RePEc:ifs:cemmap:01/13&r=ecm

Infinite Order Cross-Validated Local Polynomial Regression

By:	Peter G. Hall; Jeffrey S. Racine
Abstract:	Many practical problems require nonparametric estimates of regression functions, and local polynomial regression has emerged as a leading approach. In applied settings practitioners often adopt either the local constant or local linear variants, or choose the order of the local polynomial to be slightly greater than the order of the maximum derivative estimate required. But such ad hoc determination of the polynomial order may not be optimal in general, while the joint determination of the polynomial order and bandwidth presents some interesting theoretical and practical challenges. In this paper we propose a data-driven approach towards the joint determination of the polynomial order and bandwidth, provide theoretical underpinnings, and demonstrate that improvements in both finite-sample efficiency and rates of convergence can thereby be obtained. In the case where the true data generating process (DGP) is in fact a polynomial whose order does not depend on the sample size, our method is capable of attaining the √n rate often associated with correctly specified parametric models, while the estimator is shown to be uniformly consistent for a much larger class of DGPs. Theoretical underpinnings are provided, finite-sample properties are examined, and an application highlights finite-sample improvements arising from the use of the proposed method.
Keywords:	model selection, efficiency, rates of convergence
Date:	2013–01
URL:	http://d.repec.org/n?u=RePEc:mcm:deptwp:2013-05&r=ecm

Gaussian approximation of suprema of empirical processes

By:	Victor Chernozhukov (Institute for Fiscal Studies and MIT); Denis Chetverikov; Kengo Kato
Abstract:	We develop a new direct approach to approximating suprema of general empirical processes by a sequence of suprema of Gaussian processes, without taking the route of approximating empirical processes themselves in the sup-norm. We prove an abstract approximation theorem that is applicable to a wide variety of problems, primarily in statistics. Notably, the bound in the main approximation theorem is non-asymptotic and the theorem does not require uniform boundedness of the class of functions. The proof of the approximation theorem builds on a new coupling inequality for maxima of sums of random vectors, the proof of which depends on an effective use of Stein's method for normal approximation, and some new empirical processes techniques. We study applications of this approximation theorem to local empirical processes and series estimation in nonparametric regression where the classes of functions change with the sample size and are not Donsker-type. Importantly, our new technique is able to prove the Gaussian approximation for the supremum type statistics under considerably weak regularity conditions, especially concerning the bandwidth and the number of series functions, in those examples.
Date:	2012–12
URL:	http://d.repec.org/n?u=RePEc:ifs:cemmap:44/12&r=ecm

The Smooth Colonel and the Reverend Find Common Ground

By:	Nicholas M. Kiefer; Jeffrey S. Racine
Abstract:	A class of kernel regression estimators is developed for a broad class of hierarchical models including the pooled regression estimator, the fixed-effect model familiar from panel data, etc. Separate shrinking is allowed for each coefficient. Regressors may be continuous or discrete. The estimator is motivated as an intuitive and appealing generalization of existing methods. It is then supported by demonstrating that it can be realized as a posterior mean in the Lindley & Smith (1972) framework. The model is extended to nonparametric hierarchical regression based on B-splines.
Date:	2013–01
URL:	http://d.repec.org/n?u=RePEc:mcm:deptwp:2013-03&r=ecm

Small Sample Bootstrap Inference of Level Relationships in the Presence of Autocorrelated Errors: A Large Scale Simulation Study and an Application in Energy Demand

By:	A. Talha Yalta
Date:	2013–01
URL:	http://d.repec.org/n?u=RePEc:tob:wpaper:1301&r=ecm

Bayesian Credit Ratings (new version)

By:	Paola Cerchiello (Department of Economics and Management, University of Pavia); Paolo Giudici (Department of Economics and Management, University of Pavia)
Abstract:	In this contribution we aim at improving ordinal variable selection in the context of causal models. In this regard, we propose an approach that provides a formal inferential tool to compare the explanatory power of each covariate, and, therefore, to select an effective model for classification purposes. Our proposed model is Bayesian nonparametric, and, thus, keeps the amount of model specification to a minimum. We consider the case in which information from the covariates is at the ordinal level. A notable instance is the situation in which ordinal variables result from rankings of companies that are to be evaluated according to different macro and micro economic aspects, leading to ordinal covariates that correspond to various ratings, which entail different magnitudes of the probability of default. For each given covariate, we suggest partitioning the statistical units into as many groups as the number of observed levels of the covariate. We then assume individual defaults to be homogeneous within each group and heterogeneous across groups. Our aim is to compare and, therefore, select the partition structures resulting from the consideration of different explanatory covariates. The metric we choose for variable comparison is the calculation of the posterior probability of each partition. The application of our proposal to a European credit risk database shows that it performs well, leading to a coherent and clear method for variable averaging the estimated default probabilities.
Date:	2013–01
URL:	http://d.repec.org/n?u=RePEc:pav:demwpp:030&r=ecm

The fixed effects estimator of technical efficiency

By:	Wikström, Daniel
Abstract:	Firms and organizations, public or private, often operate on markets characterized by non-competitiveness. For example, agricultural activities in the western world are heavily subsidized and electricity is supplied by firms with market power. In general it is probably more difficult to find firms that act on highly competitive markets than firms that do not. Measuring the different types of inefficiency that arise from this lack of competitiveness has been an ongoing issue, since at least the 1950s when several definitions of inefficiency were proposed and since the late 1970s as stochastic frontier analysis. In all three articles presented in this thesis the stochastic frontier analysis approach is considered. Furthermore, in all three articles the focus is on technical inefficiency. The ways to estimate technical inefficiency, based on stochastic frontier models, are numerous. However, the focus in this thesis is on fixed effects panel data estimators. This is mainly for two reasons. First, the fixed effects analysis does not demand explicit distributional assumptions of the inefficiency and the random error of the model. Secondly, the analysis does not require the random effects assumption of independence between the firm specific inefficiency and the inputs selected by the very same firm. These two properties are exclusive to fixed effects estimation, compared to other stochastic frontier estimators. There are of course flaws attached to fixed effects analysis as well, and the contribution of this thesis is to probe some of these flaws, and to propose improvements and tools to identify the worst case scenarios. For example, the fixed effects estimator is seriously upward biased in some cases, i.e. inefficiency is overestimated. This could lead to false conclusions, for example that subsidies in agriculture lead to severely inefficient farmers even if these farmers in reality are quite homogeneous.
In this thesis estimators to reduce bias as well as mean square error are proposed and statistical diagnostics are designed to identify worst case scenarios for the fixed effects estimator as well as for other estimators. The findings can serve as important tools for the applied researcher, to obtain better approximations of technical inefficiency.
Date:	2012–11–16
URL:	http://d.repec.org/n?u=RePEc:sua:ekonwp:9101&r=ecm
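
The upward bias mentioned in the abstract above can be illustrated with a small simulation. The sketch assumes one common form of the fixed effects approach, in which each firm's inefficiency is measured as the gap between the largest estimated firm effect and its own. The abstract does not state the exact estimator, so this form, the function name, and the simulated panel are all assumptions made for illustration.

```python
import random


def fe_inefficiency(panel):
    """Inefficiency relative to the best-performing firm in the sample.

    panel : dict mapping firm -> list of output residuals over time
            (output net of the fitted input contribution).
    The firm effect is estimated by the time average, and the largest
    estimated effect is treated as the frontier.
    """
    alpha_hat = {firm: sum(v) / len(v) for firm, v in panel.items()}
    frontier = max(alpha_hat.values())
    return {firm: frontier - a for firm, a in alpha_hat.items()}


# Simulated panel of 20 identical firms observed for 5 periods:
# the true inefficiency is zero for every firm, only noise differs.
rng = random.Random(1)
panel = {f"firm{i}": [rng.gauss(0.0, 1.0) for _ in range(5)] for i in range(20)}

u_hat = fe_inefficiency(panel)
mean_u = sum(u_hat.values()) / len(u_hat)
```

Every firm in the simulation is equally efficient, yet `mean_u` is strictly positive: taking the maximum of noisy firm effects places the estimated frontier above the true one, so measured inefficiency is overstated when firms are homogeneous.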

Kernel Factory: An Ensemble of Kernel Machines

By:	M. BALLINGS; D. VAN DEN POEL
Abstract:	We propose an ensemble method for kernel machines. The training data is randomly split into a number of mutually exclusive partitions defined by a row and column parameter. Each partition forms an input space and is transformed by a kernel function into a kernel matrix K. Subsequently, each K is used as training data for a base binary classifier (Random Forest). This results in a number of predictions equal to the number of partitions. A weighted average combines the predictions into one final prediction. To optimize the weights, a genetic algorithm is used. This approach has the advantage of simultaneously promoting (1) diversity, (2) accuracy, and (3) computational speed. (1) Diversity is fostered because the individual K’s are based on a subset of features and observations, (2) accuracy is sought by optimizing the weights with the genetic algorithm, and (3) computational speed is obtained because the computation of each K can be parallelized. Using five times two-fold cross validation we benchmark the classification performance of Kernel Factory against Random Forest and Kernel-Induced Random Forest (KIRF). We find that Kernel Factory has significantly better performance than Kernel-Induced Random Forest. When the right kernel is specified, Kernel Factory is also significantly better than Random Forest. In addition, an open-source R software package of the algorithm (kernelFactory) is available from CRAN.
Date:	2012–12
URL:	http://d.repec.org/n?u=RePEc:rug:rugwps:12/825&r=ecm

This nep-ecm issue is ©2013 by Sune Karlsson. It is provided as is without any express or implied warranty. It may be freely redistributed in whole or in part for any purpose. If distributed in part, please include this notice.

General information on the NEP project can be found at http://nep.repec.org. For comments please write to the director of NEP, Marco Novarese at <director@nep.repec.org>. Put “NEP” in the subject, otherwise your mail may be rejected.

NEP’s infrastructure is sponsored by the School of Economics and Finance of Massey University in New Zealand.
