
on Econometrics 
By:  Markus Bibinger; Per A. Mykland 
Abstract:  We find the asymptotic distribution of the multidimensional multiscale and kernel estimators for high-frequency financial data with microstructure. Sampling times are allowed to be asynchronous. The central limit theorem is shown to have a feasible version. In the process, we show that the classes of multiscale and kernel estimators for smoothing noise perturbation are asymptotically equivalent, in the sense of having the same asymptotic distribution for corresponding kernel and weight functions. We also include the analysis for the Hayashi-Yoshida estimator in the absence of microstructure. The theory leads to multidimensional stable central limit theorems for the respective estimators and hence allows one to draw statistical inference for a broad class of multivariate models and linear functions of the recorded components. This paves the way to tests and confidence intervals in risk measurement for arbitrary portfolios composed of assets observed at high frequency. As an application, we extend the approach to cover more complex functions and to construct a test of the hypothesis that correlated assets are independent conditional on a common factor. 
Keywords:  asymptotic distribution theory, asynchronous observations, conditional independence, high-frequency data, microstructure noise, multivariate limit theorems 
JEL:  C14 C32 C58 G10 
Date:  2013–01 
URL:  http://d.repec.org/n?u=RePEc:hum:wpaper:sfb649dp2013006&r=ecm 
By:  Liesenfeld, Roman; Richard, Jean-François; Vogler, Jan 
Abstract:  In this paper we consider ML estimation for a broad class of parameter-driven models for discrete dependent variables with spatial correlation. Under this class of models, which includes spatial discrete choice models, spatial Tobit models and spatial count data models, the dependent variable is driven by a latent stochastic state variable which is specified as a linear spatial regression model. The likelihood is a high-dimensional integral whose dimension depends on the sample size. For its evaluation we propose to use efficient importance sampling (EIS). The specific spatial EIS implementation we develop exploits the sparsity of the precision (or covariance) matrix of the errors in the reduced-form state equation typically encountered in spatial settings, which keeps numerically accurate EIS likelihood evaluation computationally feasible even for large sample sizes. The proposed ML approach based upon spatial EIS is illustrated with estimation of a spatial probit for US presidential voting decisions and spatial count data models (Poisson and Negbin) for firm location choices. 
Keywords:  Count data models, Discrete choice models, Firm location choice, Importance sampling, Monte Carlo integration, Spatial econometrics 
JEL:  C15 C21 C25 D22 R12 
Date:  2013 
URL:  http://d.repec.org/n?u=RePEc:zbw:cauewp:201301&r=ecm 
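The EIS machinery above is model-specific, but the underlying idea of evaluating a likelihood integral by importance sampling can be illustrated in a few lines. The sketch below is not the paper's EIS algorithm: it applies plain importance sampling to a one-dimensional toy state-space model (all names and parameter values are invented for illustration), chosen so that the marginal likelihood is available in closed form as a check.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy model: latent state s ~ N(0, 1), observation y | s ~ N(s, 0.5^2).
# The likelihood p(y) = integral of p(y|s) p(s) ds is known in closed
# form here (y ~ N(0, 1 + 0.25)), which lets us check the IS estimate.
def norm_pdf(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

y = 0.8
# Importance sampling with proposal q = N(y, 1): draw latent states,
# then average the weighted integrand p(y|s) p(s) / q(s).
s = rng.normal(y, 1.0, size=100_000)
weights = norm_pdf(y, s, 0.5) * norm_pdf(s, 0.0, 1.0) / norm_pdf(s, y, 1.0)
is_estimate = weights.mean()

exact = norm_pdf(y, 0.0, np.sqrt(1.25))
```

EIS goes further by iteratively choosing the proposal to minimize the variance of the log-weights, which is what keeps the high-dimensional spatial integral tractable.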
By:  Joeri Smits; Jeffrey S. Racine 
Abstract:  In recent years, estimators for nonseparable models have been developed that rely on one or more instrumental variables for identification. The exclusion restriction in triangular models can be reformulated and causally decomposed under the Settable Systems extension of the Pearl Causal Model due to Chalak & White (2012). We propose two new ways of testing the exclusion restriction, one based on testing conditional independence nonparametrically and one based on multivariate conditional mutual information. Unlike existing tests of overidentifying restrictions, the proposed tests are applicable in the just-identified case. An important field of application is randomized trials with partial compliance, since in that case the exclusion restriction is not only refutable but also confirmable. The first approach, conditional independence testing, is illustrated by applying the nonparametric test of equality of conditional densities of Li, Maasoumi & Racine (2009) to examples from medicine and economics. 
Keywords:  instrumental variables, nonparametric identification, causal inference. 
Date:  2013–01 
URL:  http://d.repec.org/n?u=RePEc:mcm:deptwp:201302&r=ecm 
By:  Otávio Bartalotti (Department of Economics, Tulane University) 
Abstract:  In a GMM setting, this paper analyzes the problem in which we have two sets of moment conditions, where two sets of parameters enter one set of moment conditions while only one set of parameters enters the other. It extends Prokhorov and Schmidt's (2009) redundancy results to non-smooth objective functions and obtains relatively efficient estimates of the parameters of interest in the presence of nuisance parameters. One-step GMM estimation of both sets of parameters is asymptotically more efficient than two-step procedures. These results are applied to Wooldridge's (2007) inverse probability weighted (IPW) estimator, generalizing the framework to deal with missing data in this context. Two-step estimation of beta_0 is more efficient than using known probabilities of selection, but both are dominated by one-step joint estimation. Examples for missing-data quantile regression and instrumental variable quantile regression are provided. 
Keywords:  generalized method of moments, nonsmooth objective functions, inverse probability weighting, missing data, quantile regression 
JEL:  C13 
Date:  2013–01 
URL:  http://d.repec.org/n?u=RePEc:tul:wpaper:1301&r=ecm 
By:  Victor Chernozhukov (Institute for Fiscal Studies and MIT); Denis Chetverikov; Kengo Kato 
Abstract:  We derive a central limit theorem for the maximum of a sum of high-dimensional random vectors. More precisely, we establish conditions under which the distribution of the maximum is approximated by the maximum of a sum of Gaussian random vectors with the same covariance matrices as the original vectors. The key innovation of our result is that it applies even if the dimension of the random vectors (p) is much larger than the sample size (n). In fact, the growth of p can be exponential in some fractional power of n. We also show that the distribution of the maximum of a sum of Gaussian random vectors with unknown covariance matrices can be estimated by the distribution of the maximum of the (conditional) Gaussian process obtained by multiplying the original vectors with i.i.d. Gaussian multipliers. We call this procedure the "multiplier bootstrap". Here too, the growth of p can be exponential in some fractional power of n. We prove that our distributional approximations, either Gaussian or conditional Gaussian, yield a high-quality approximation for the distribution of the original maximum, often with at most a polynomial approximation error. These results are of interest in numerous econometric and statistical applications. In particular, we demonstrate how our central limit theorem and the multiplier bootstrap can be used for high-dimensional estimation, multiple hypothesis testing, and adaptive specification testing. All of our results contain non-asymptotic bounds on approximation errors. 
Date:  2012–12 
URL:  http://d.repec.org/n?u=RePEc:ifs:cemmap:45/12&r=ecm 
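As a rough illustration of the bootstrap described above (a minimal sketch under one reading of the procedure, not the authors' implementation; the data and all tuning choices are invented), one can approximate the distribution of the maximum of a normalized sum by recomputing it with i.i.d. Gaussian multipliers applied to the centered observations:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy data: n observations of a p-dimensional vector, with p > n.
n, p = 200, 500
X = rng.standard_normal((n, p))

# Statistic of interest: max over coordinates of the normalized sum.
T = np.max(X.sum(axis=0) / np.sqrt(n))

# Multiplier bootstrap: multiply the centered rows by i.i.d. N(0,1)
# draws and recompute the max statistic; the resulting draws
# approximate the distribution of T.
Xc = X - X.mean(axis=0)
B = 1000
E = rng.standard_normal((B, n))          # multipliers, one row per replication
boot = np.max(E @ Xc / np.sqrt(n), axis=1)

# Bootstrap 95% critical value for T
crit = np.quantile(boot, 0.95)
```

Note that each bootstrap replication only requires a matrix product, which is what makes the procedure practical even when p greatly exceeds n.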
By:  Otávio Bartalotti (Department of Economics, Tulane University) 
Abstract:  The standard "small-h" asymptotics in the regression discontinuity (RD) literature assumes that the bandwidth h around the discontinuity shrinks as n goes to infinity. In practice, however, the researcher has to choose an h>0. This paper derives the fixed-h asymptotic distribution of local polynomial estimators in the context of RD, better approximating the estimator's behavior and improving inference. Conditions are provided under which the fixed-h and small-h approximations coincide. Feasible estimators of the fixed-h standard errors are proposed that incorporate the theoretical gains, improving over small-h inference, especially in the presence of heteroskedasticity. For rectangular kernels, the fixed-h standard errors simplify to the usual heteroskedasticity-robust standard errors. 
Keywords:  regression discontinuity design, average treatment effect, fixed bandwidth asymptotics, local polynomial estimators 
JEL:  C12 C21 
Date:  2013–01 
URL:  http://d.repec.org/n?u=RePEc:tul:wpaper:1302&r=ecm 
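The rectangular-kernel case mentioned in the abstract is easy to illustrate: with a fixed bandwidth h, the local linear RD estimator reduces to two boundary OLS fits, and heteroskedasticity-robust (HC0) standard errors apply directly. The following is an illustrative sketch on simulated data, not the paper's estimator or code; all numbers and names are invented.

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy sharp RD design: running variable x, cutoff 0, true jump tau = 1.
n, tau, h = 2000, 1.0, 0.5
x = rng.uniform(-1, 1, n)
y = 0.5 * x + tau * (x >= 0) + rng.normal(0, 0.3, n)

def local_linear(side_x, side_y):
    """OLS of y on (1, x) with the HC0 (robust) variance of the intercept."""
    Z = np.column_stack([np.ones_like(side_x), side_x])
    ZtZ_inv = np.linalg.inv(Z.T @ Z)
    beta = ZtZ_inv @ Z.T @ side_y
    u = side_y - Z @ beta
    V = ZtZ_inv @ (Z.T * u**2) @ Z @ ZtZ_inv   # HC0 sandwich
    return beta[0], V[0, 0]

# Rectangular kernel: keep observations within bandwidth h of the cutoff,
# fit a local linear regression on each side, and take the intercept gap.
left = (x >= -h) & (x < 0)
right = (x >= 0) & (x <= h)
a_l, v_l = local_linear(x[left], y[left])
a_r, v_r = local_linear(x[right], y[right])

tau_hat = a_r - a_l         # estimated jump at the cutoff
se = np.sqrt(v_l + v_r)     # robust standard error of the jump
```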
By:  Federico Bugni (Institute for Fiscal Studies and Duke University); Ivan Canay (Institute for Fiscal Studies and Northwestern University); Xiaoxia Shi 
Abstract:  This paper studies the problem of specification testing in partially identified models defined by a finite number of moment equalities and inequalities (i.e., (in)equalities). Under the null hypothesis, there is at least one parameter value that simultaneously satisfies all of the moment (in)equalities, whereas under the alternative hypothesis there is no such parameter value. While this problem has not been directly addressed in the literature (except in particular cases), several papers have suggested implementing this inferential problem by checking whether confidence sets for the parameters of interest are empty or not. We propose two hypothesis tests that use the infimum of the sample criterion function over the parameter space as the test statistic, together with two different critical values. We obtain two main results. First, we show that the two tests we propose are asymptotically size correct in a uniform sense. Second, we show that our tests are more powerful than the test that checks whether the confidence set for the parameters of interest is empty or not. 
Keywords:  Partial Identification, Moment Inequalities, Specification Tests, Hypothesis Testing. 
JEL:  C01 C12 C15 
Date:  2013–01 
URL:  http://d.repec.org/n?u=RePEc:ifs:cemmap:01/13&r=ecm 
By:  Peter G. Hall; Jeffrey S. Racine 
Abstract:  Many practical problems require nonparametric estimates of regression functions, and local polynomial regression has emerged as a leading approach. In applied settings, practitioners often adopt either the local constant or local linear variant, or choose the order of the local polynomial to be slightly greater than the order of the highest derivative to be estimated. Such ad hoc determination of the polynomial order may not be optimal in general, while the joint determination of the polynomial order and bandwidth presents some interesting theoretical and practical challenges. In this paper we propose a data-driven approach to the joint determination of the polynomial order and bandwidth, provide theoretical underpinnings, and demonstrate that improvements in both finite-sample efficiency and rates of convergence can thereby be obtained. In the case where the true data generating process (DGP) is in fact a polynomial whose order does not depend on the sample size, our method is capable of attaining the √n rate often associated with correctly specified parametric models, while the estimator is shown to be uniformly consistent for a much larger class of DGPs. Finite-sample properties are examined, and an application highlights the improvements arising from the use of the proposed method. 
Keywords:  model selection, efficiency, rates of convergence 
Date:  2013–01 
URL:  http://d.repec.org/n?u=RePEc:mcm:deptwp:201305&r=ecm 
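A naive version of the joint determination described above is a grid search over the polynomial order and bandwidth using leave-one-out cross-validation. The sketch below is only a plausible illustration on a toy DGP, not the authors' data-driven method; the grid, kernel, and DGP are all invented for the example.

```python
import numpy as np

rng = np.random.default_rng(3)

# Toy DGP: cubic mean function plus noise.
n = 150
x = rng.uniform(-1, 1, n)
y = x**3 - x + rng.normal(0, 0.2, n)

def locpoly_fit(x0, xs, ys, order, h):
    """Local polynomial estimate of the regression function at x0
    with a Gaussian kernel; returns the fitted value (the intercept)."""
    w = np.exp(-0.5 * ((xs - x0) / h) ** 2)
    Z = np.vander(xs - x0, order + 1, increasing=True)
    WZ = Z * w[:, None]
    # Weighted least squares via the normal equations Z'WZ b = Z'Wy.
    beta = np.linalg.lstsq(WZ.T @ Z, WZ.T @ ys, rcond=None)[0]
    return beta[0]

def loo_cv(order, h):
    """Leave-one-out cross-validation score for a given (order, h) pair."""
    idx = np.arange(n)
    preds = np.array([
        locpoly_fit(x[i], x[idx != i], y[idx != i], order, h) for i in idx
    ])
    return np.mean((y - preds) ** 2)

# Joint grid search over polynomial order and bandwidth.
grid = [(p, h) for p in range(0, 4) for h in (0.1, 0.25, 0.5, 1.0)]
best = min(grid, key=lambda ph: loo_cv(*ph))
```

The paper's point is precisely that treating (order, bandwidth) as a joint choice, rather than fixing the order ad hoc, can improve both efficiency and the attainable rate.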
By:  Victor Chernozhukov (Institute for Fiscal Studies and MIT); Denis Chetverikov; Kengo Kato 
Abstract:  We develop a new direct approach to approximating suprema of general empirical processes by a sequence of suprema of Gaussian processes, without taking the route of approximating the empirical processes themselves in the sup-norm. We prove an abstract approximation theorem that is applicable to a wide variety of problems, primarily in statistics. Notably, the bound in the main approximation theorem is non-asymptotic, and the theorem does not require uniform boundedness of the class of functions. The proof of the approximation theorem builds on a new coupling inequality for maxima of sums of random vectors, whose proof depends on an effective use of Stein's method for normal approximation, and on some new empirical process techniques. We study applications of this approximation theorem to local empirical processes and to series estimation in nonparametric regression, where the classes of functions change with the sample size and are not of Donsker type. Importantly, in these examples our new technique delivers the Gaussian approximation for supremum-type statistics under weak regularity conditions, especially concerning the bandwidth and the number of series functions. 
Date:  2012–12 
URL:  http://d.repec.org/n?u=RePEc:ifs:cemmap:44/12&r=ecm 
By:  Nicholas M. Kiefer; Jeffrey S. Racine 
Abstract:  A class of kernel regression estimators is developed for a broad class of hierarchical models, including the pooled regression estimator, the fixed-effect model familiar from panel data, etc. Separate shrinking is allowed for each coefficient. Regressors may be continuous or discrete. The estimator is motivated as an intuitive and appealing generalization of existing methods. It is then supported by demonstrating that it can be realized as a posterior mean in the Lindley & Smith (1972) framework. The model is extended to nonparametric hierarchical regression based on B-splines. 
Date:  2013–01 
URL:  http://d.repec.org/n?u=RePEc:mcm:deptwp:201303&r=ecm 
By:  A. Talha Yalta 
Date:  2013–01 
URL:  http://d.repec.org/n?u=RePEc:tob:wpaper:1301&r=ecm 
By:  Paola Cerchiello (Department of Economics and Management, University of Pavia); Paolo Giudici (Department of Economics and Management, University of Pavia) 
Abstract:  In this contribution we aim at improving ordinal variable selection in the context of causal models. In this regard, we propose an approach that provides a formal inferential tool to compare the explanatory power of each covariate and, therefore, to select an effective model for classification purposes. Our proposed model is Bayesian nonparametric and thus keeps the amount of model specification to a minimum. We consider the case in which the information from the covariates is at the ordinal level. A notable instance of this is the situation in which ordinal variables result from rankings of companies that are to be evaluated according to different macroeconomic and microeconomic aspects, leading to ordinal covariates that correspond to various ratings, which entail different magnitudes of the probability of default. For each given covariate, we suggest partitioning the statistical units into as many groups as the number of observed levels of the covariate. We then assume individual defaults to be homogeneous within each group and heterogeneous across groups. Our aim is to compare and, therefore, select the partition structures resulting from the consideration of different explanatory covariates. The metric we choose for variable comparison is the posterior probability of each partition. The application of our proposal to a European credit risk database shows that it performs well, leading to a coherent and clear method for variable selection and for averaging the estimated default probabilities. 
Date:  2013–01 
URL:  http://d.repec.org/n?u=RePEc:pav:demwpp:030&r=ecm 
By:  Wikström, Daniel 
Abstract:  Firms and organizations, public or private, often operate in markets characterized by non-competitiveness. For example, agricultural activities in the western world are heavily subsidized, and electricity is supplied by firms with market power. In general, it is probably more difficult to find firms that act on highly competitive markets than firms that do not. Measuring the different types of inefficiency due to this lack of competitiveness has been an ongoing issue since at least the 1950s, when several definitions of inefficiency were proposed, and since the late 1970s under the heading of stochastic frontier analysis. All three articles presented in this thesis adopt the stochastic frontier analysis approach, and all three focus on technical inefficiency. The ways to estimate technical inefficiency based on stochastic frontier models are numerous. However, the focus in this thesis is on fixed effects panel data estimators, mainly for two reasons. First, fixed effects analysis does not demand explicit distributional assumptions about the inefficiency and the random error of the model. Secondly, the analysis does not require the random effects assumption of independence between the firm-specific inefficiency and the inputs selected by the very same firm. These two properties are exclusive to fixed effects estimation compared to other stochastic frontier estimators. There are, of course, flaws attached to fixed effects analysis as well, and the contribution of this thesis is to probe some of these flaws and to propose improvements and tools to identify the worst-case scenarios. For example, the fixed effects estimator is seriously upward biased in some cases, i.e., inefficiency is overestimated. This could lead to false conclusions, such as that subsidies in agriculture lead to severely inefficient farmers even when those farmers are in reality quite homogeneous. 
In this thesis, estimators that reduce bias as well as mean squared error are proposed, and statistical diagnostics are designed to identify worst-case scenarios for the fixed effects estimator as well as for other estimators. The findings can serve as important tools for the applied researcher seeking better approximations of technical inefficiency. 
Date:  2012–11–16 
URL:  http://d.repec.org/n?u=RePEc:sua:ekonwp:9101&r=ecm 
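The fixed effects approach discussed above can be illustrated with the classic within-estimator treatment of a stochastic frontier panel, where inefficiency is measured relative to the best firm in the sample (in the spirit of Schmidt and Sickles, 1984). The sketch below uses a simulated toy panel, not the thesis's estimators; it also hints at the upward bias noted in the abstract, since the sample maximum tends to overstate the frontier.

```python
import numpy as np

rng = np.random.default_rng(4)

# Toy panel: N firms over T periods, log-linear frontier, firm-specific
# inefficiency u_i >= 0 entering as a (negative) fixed effect.
N, T = 50, 10
u = rng.exponential(0.3, N)                  # true inefficiency
x = rng.normal(0, 1, (N, T))
y = 1.0 + 0.6 * x - u[:, None] + rng.normal(0, 0.1, (N, T))

# Within (fixed effects) estimator: demean within firms, then OLS.
# No distributional assumption on u is needed, and u may be correlated
# with x -- the two advantages highlighted in the abstract.
xw = x - x.mean(axis=1, keepdims=True)
yw = y - y.mean(axis=1, keepdims=True)
beta_hat = (xw * yw).sum() / (xw**2).sum()

# Recover firm effects and measure inefficiency relative to the
# best-performing firm in the sample.
alpha_hat = (y - beta_hat * x).mean(axis=1)
u_hat = alpha_hat.max() - alpha_hat
```

Because `alpha_hat.max()` is a sample maximum, `u_hat` is biased upward, especially when firms are nearly homogeneous, which is exactly the failure mode the thesis's diagnostics are designed to flag.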
By:  M. BALLINGS; D. VAN DEN POEL 
Abstract:  We propose an ensemble method for kernel machines. The training data is randomly split into a number of mutually exclusive partitions defined by a row and a column parameter. Each partition forms an input space and is transformed by a kernel function into a kernel matrix K. Subsequently, each K is used as training data for a base binary classifier (Random Forest). This results in a number of predictions equal to the number of partitions. A weighted average combines the predictions into one final prediction. To optimize the weights, a genetic algorithm is used. This approach has the advantage of simultaneously promoting (1) diversity, (2) accuracy, and (3) computational speed. Diversity is fostered because the individual K's are based on a subset of features and observations; accuracy is sought by optimizing the weights with the genetic algorithm; and computational speed is obtained because the computation of each K can be parallelized. Using five times two-fold cross-validation, we benchmark the classification performance of Kernel Factory against Random Forest and Kernel-Induced Random Forest (KIRF). We find that Kernel Factory performs significantly better than Kernel-Induced Random Forest. When the right kernel is specified, Kernel Factory is also significantly better than Random Forest. In addition, an open-source R package implementing the algorithm (kernelFactory) is available from CRAN. 
Date:  2012–12 
URL:  http://d.repec.org/n?u=RePEc:rug:rugwps:12/825&r=ecm 
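As a rough sketch of the partition-and-combine idea (not the kernelFactory implementation: the base learner here is a regularized least-squares classifier instead of Random Forest, the combination weights are equal rather than optimized by a genetic algorithm, and only a column split is shown), consider:

```python
import numpy as np

rng = np.random.default_rng(5)

# Toy binary classification data.
n, d = 300, 8
X = rng.standard_normal((n, d))
y_cls = (X[:, 0] + X[:, 1] > 0).astype(float)

def rbf_kernel(A, B, gamma=0.5):
    """Gaussian (RBF) kernel matrix between rows of A and rows of B."""
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * sq)

# Split the features into column partitions (2 blocks here); each block's
# kernel matrix K becomes the training input of one base learner.
col_blocks = np.array_split(np.arange(d), 2)
models = []
for cols in col_blocks:
    K = rbf_kernel(X[:, cols], X[:, cols])
    # Ridge-regularized least squares as a stand-in base classifier.
    coef = np.linalg.lstsq(K + 1e-3 * np.eye(n), y_cls, rcond=None)[0]
    models.append((cols, coef))

def predict(Xnew):
    preds = []
    for cols, coef in models:
        Knew = rbf_kernel(Xnew[:, cols], X[:, cols])
        preds.append(Knew @ coef)
    # Weighted average of base predictions (equal weights in this sketch;
    # the paper optimizes these weights with a genetic algorithm).
    return (np.mean(preds, axis=0) > 0.5).astype(float)

acc = (predict(X) == y_cls).mean()   # in-sample accuracy of the ensemble
```

Each block's kernel matrix can be computed independently, which is the source of the parallelism claimed in the abstract.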