Econometrics
http://lists.repec.org/mailman/listinfo/nep-ecm
2017-10-01
Kernel-Based Inference In Time-Varying Coefficient Cointegrating Regression
http://d.repec.org/n?u=RePEc:cwl:cwldpp:3009&r=ecm
This paper studies nonlinear cointegrating models with time-varying coefficients and multiple nonstationary regressors using classic kernel smoothing methods to estimate the coefficient functions. Extending earlier work on nonstationary kernel regression to take account of practical features of the data, we allow the regressors to be cointegrated and to embody a mixture of stochastic and deterministic trends, complications which result in asymptotic degeneracy of the kernel-weighted signal matrix. To address these complications, new local and global rotation techniques are introduced to transform the covariate space to accommodate multiple scenarios of induced degeneracy. Under certain regularity conditions we derive asymptotic results that differ substantially from existing kernel regression asymptotics, leading to new limit theory under multiple convergence rates. For the practically important case of endogenous nonstationary regressors we propose a fully-modified kernel estimator whose limit distribution theory corresponds to the prototypical pure (i.e., exogenous covariate) cointegration case, thereby facilitating inference using a generalized Wald-type test statistic. These results substantially generalize econometric estimation and testing techniques in the cointegration literature to accommodate time variation and complications of co-moving regressors. Finally, an empirical illustration using aggregate US data on consumption, income, and interest rates is provided.
Degui Li
Peter C.B. Phillips
Jiti Gao
Cointegration, FM-kernel estimation, Generalized Wald test, Global rotation, Kernel degeneracy, Local rotation, Super-consistency, Time-varying coefficients
2017-09
Applications of James-Stein Shrinkage (I): Variance Reduction without Bias
http://d.repec.org/n?u=RePEc:arx:papers:1708.06436&r=ecm
In a linear regression model with homoscedastic Normal noise, I consider James-Stein type shrinkage in the estimation of nuisance parameters associated with control variables. For at least three control variables and exogenous treatment, I show that the standard least-squares estimator is dominated with respect to squared-error loss in the treatment effect even among unbiased estimators and even when the target parameter is low-dimensional. I construct the dominating estimator by a variant of James-Stein shrinkage in an appropriate high-dimensional Normal-means problem; it can be understood as an invariant generalized Bayes estimator with an uninformative (improper) Jeffreys prior in the target parameter.
Jann Spiess
2017-08
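For readers unfamiliar with the shrinkage idea named in the abstract above, the following is a minimal sketch of the textbook positive-part James-Stein estimator in a Normal-means problem. It is not the variant constructed in the paper. The function name `james_stein` and the assumption of a known common variance `sigma2` are illustrative choices only.

```python
def james_stein(x, sigma2=1.0):
    """Positive-part James-Stein estimate of theta when x ~ N(theta, sigma2 * I).

    Every coordinate is shrunk toward zero by a common data-driven factor.
    The classical dominance result needs at least three coordinates.
    """
    p = len(x)
    if p < 3:
        raise ValueError("James-Stein shrinkage needs at least three coordinates")
    norm_sq = sum(v * v for v in x)
    if norm_sq == 0.0:
        return [0.0] * p
    # Shrinkage factor 1 - (p - 2) * sigma2 / ||x||^2, truncated at zero.
    factor = max(0.0, 1.0 - (p - 2) * sigma2 / norm_sq)
    return [factor * v for v in x]


if __name__ == "__main__":
    print(james_stein([3.0, 4.0, 0.0]))
```

The truncation at zero prevents the sign flip that the untruncated factor produces when the observed vector is very close to the origin.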
Latent Variable Nonparametric Cointegrating Regression
http://d.repec.org/n?u=RePEc:cwl:cwldpp:3011&r=ecm
This paper studies the asymptotic properties of empirical nonparametric regressions that partially misspecify the relationships between nonstationary variables. In particular, we analyze nonparametric kernel regressions in which a potential nonlinear cointegrating regression is misspecified through the use of a proxy regressor in place of the true regressor. Such regressions arise naturally in linear and nonlinear regressions where the regressor suffers from measurement error or where the true regressor is a latent variable. The model considered allows for endogenous regressors as the latent variable and proxy variables that cointegrate asymptotically with the true latent variable. Such a framework includes correctly specified systems as well as misspecified models in which the actual regressor serves as a proxy variable for the true regressor. The system is therefore intermediate between nonlinear nonparametric cointegrating regression (Wang and Phillips, 2009a, 2009b) and completely misspecified nonparametric regressions in which the relationship is entirely spurious (Phillips, 2009). The asymptotic results relate to recent work on dynamic misspecification in nonparametric nonstationary systems by Kasparis and Phillips (2012) and Duffy (2014). The limit theory accommodates regressor variables with autoregressive roots that are local to unity and whose errors are driven by long memory and short memory innovations, thereby encompassing applications with a wide range of economic and financial time series.
Qiying Wang
Peter C.B. Phillips
Ioannis Kasparis
Cointegrating regression, Kernel regression, Latent variable, Local time, Misspecification, Nonlinear nonparametric nonstationary regression
2017-09
$L_2$Boosting for Economic Applications
http://d.repec.org/n?u=RePEc:arx:papers:1702.03244&r=ecm
In recent years, more and more high-dimensional data sets, in which the number of parameters $p$ is high compared to the number of observations $n$ or even exceeds it, have become available to applied researchers. Boosting algorithms represent one of the major advances in machine learning and statistics in recent years and are suitable for the analysis of such data sets. While Lasso has been applied very successfully to high-dimensional data sets in Economics, boosting has been underutilized in this field, although it has proven very powerful in fields like Biostatistics and Pattern Recognition. We attribute this to missing theoretical results for boosting. The goal of this paper is to fill this gap and show that boosting is a competitive method for inference of a treatment effect or instrumental variable (IV) estimation in a high-dimensional setting. First, we present the $L_2$Boosting with componentwise least squares algorithm and variants tailored to regression problems, which are the workhorse for most Econometric problems. Then we show how $L_2$Boosting can be used for estimation of treatment effects and IV estimation. We highlight the methods and illustrate them with simulations and empirical examples. For further results and technical details we refer to Luo and Spindler (2016, 2017) and to the online supplement of the paper.
Ye Luo
Martin Spindler
2017-02
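The componentwise least squares algorithm mentioned in the abstract above can be sketched roughly as follows. This is a generic pure-Python illustration of $L_2$Boosting, not the authors' code. The function name `l2_boost`, the step size `nu`, and the fixed iteration count `n_steps` are assumptions made for the example.

```python
def l2_boost(X, y, n_steps=200, nu=0.1):
    """Componentwise least-squares L2Boosting (illustrative sketch).

    X is a list of n rows (each a list of p floats); y is a list of n floats.
    Returns the coefficient vector after at most n_steps iterations.
    No intercept is fitted; centre y and the columns of X beforehand if needed.
    """
    n, p = len(X), len(X[0])
    cols = [[X[i][j] for i in range(n)] for j in range(p)]
    col_ss = [sum(v * v for v in c) for c in cols]
    beta = [0.0] * p
    resid = list(y)
    for _ in range(n_steps):
        best_j, best_b, best_gain = None, 0.0, 0.0
        for j in range(p):
            if col_ss[j] == 0.0:
                continue
            # Univariate least-squares fit of the current residuals on column j.
            b = sum(c * r for c, r in zip(cols[j], resid)) / col_ss[j]
            gain = b * b * col_ss[j]  # reduction in the residual sum of squares
            if gain > best_gain:
                best_j, best_b, best_gain = j, b, gain
        if best_j is None:
            break  # residuals are orthogonal to every column
        # Move only a fraction nu of the way toward the selected fit.
        beta[best_j] += nu * best_b
        resid = [r - nu * best_b * c for r, c in zip(resid, cols[best_j])]
    return beta


if __name__ == "__main__":
    X = [[1.0, 1.0], [1.0, -1.0], [-1.0, 1.0], [-1.0, -1.0]]
    y = [2.0, 2.0, -2.0, -2.0]  # y equals 2 times the first column
    print(l2_boost(X, y))
```

In this sketch the number of iterations acts as the regularization parameter: stopping early leaves most coefficients at zero, which is what makes the method usable when $p$ is large relative to $n$.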
Testing the causality of Hawkes processes with time reversal
http://d.repec.org/n?u=RePEc:arx:papers:1709.08516&r=ecm
We show that univariate and symmetric multivariate Hawkes processes are only weakly causal: the true log-likelihoods of real and reversed event time vectors are almost equal, thus parameter estimation via maximum likelihood only weakly depends on the direction of the arrow of time. In ideal (synthetic) conditions, tests of goodness of parametric fit unambiguously reject backward event times, which implies that inferring kernels from time-symmetric quantities, such as the autocovariance of the event rate, only rarely produces statistically significant fits. Finally, we find that fitting financial data with many-parameter kernels may yield significant fits for both arrows of time for the same event time vector, sometimes favouring the backward time direction. This goes to show that a significant fit of Hawkes processes to real data with flexible kernels does not imply a definite arrow of time unless one tests it.
Marcus Cordi
Damien Challet
Ioane Muni Toke
2017-09
Weighting for External Validity
http://d.repec.org/n?u=RePEc:nbr:nberwo:23826&r=ecm
External validity is a fundamental challenge in treatment effect estimation. Even when researchers credibly identify average treatment effects – for example through randomized experiments – the results may not extrapolate to the population of interest for a given policy question. If the population and sample differ only in the distribution of observed variables this problem has a well-known solution: reweight the sample to match the population. In many cases, however, the population and sample differ along dimensions unobserved by the researcher. We provide a tractable framework for thinking about external validity in such cases. Our approach relies on the fact that when the sample is drawn from the same support as the population of interest there exist weights which, if known, would allow us to reweight the sample to match the population. These weights are larger in a stochastic sense when the sample is more selected, and their correlation with a given variable reflects the intensity of selection along this dimension. We suggest natural benchmarks for assessing external validity, discuss implementation, and apply our results to data from several recent experiments.
Isaiah Andrews
Emily Oster
2017-09
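The "well-known solution" for observed differences described in the abstract above, reweighting the sample to match the population, can be illustrated with a small post-stratification sketch. The function names, the single discrete covariate, and the toy data are all hypothetical; the paper's framework for unobserved differences is not implemented here.

```python
from collections import Counter


def poststratification_weights(groups, population_shares):
    """Weight each sampled unit by (population share) / (sample share) of its group."""
    n = len(groups)
    sample_shares = {g: c / n for g, c in Counter(groups).items()}
    return [population_shares[g] / sample_shares[g] for g in groups]


def weighted_ate(outcomes, treated, weights):
    """Difference in weighted mean outcomes between treated (1) and control (0) units."""
    def wmean(flag):
        num = sum(w * y for y, t, w in zip(outcomes, treated, weights) if t == flag)
        den = sum(w for t, w in zip(treated, weights) if t == flag)
        return num / den
    return wmean(1) - wmean(0)


if __name__ == "__main__":
    # Hypothetical experiment in which group "a" is over-represented in the sample.
    groups = ["a"] * 6 + ["b"] * 2
    treated = [1, 1, 1, 0, 0, 0, 1, 0]
    outcomes = [2.0, 2.0, 2.0, 1.0, 1.0, 1.0, 6.0, 1.0]
    weights = poststratification_weights(groups, {"a": 0.5, "b": 0.5})
    print(weighted_ate(outcomes, treated, [1.0] * len(groups)))  # unweighted sample ATE
    print(weighted_ate(outcomes, treated, weights))              # reweighted ATE
```

In the toy data the treatment effect is 1 in group "a" and 5 in group "b", so the reweighted estimate moves from the sample average toward the equal-shares population average.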
Stochastic Frontier Analysis: Foundations and Advances
http://d.repec.org/n?u=RePEc:mia:wpaper:2017-10&r=ecm
This chapter reviews some of the most important developments in the econometric estimation of productivity and efficiency surrounding the stochastic frontier model. We highlight endogeneity issues, recent advances in generalized panel data stochastic frontier models, nonparametric estimation of the frontier, quantile estimation and distribution-free methods. An emphasis is placed on highlighting recent research and providing broad coverage, while details are left for further reading in the extensive (though not exhaustive) list of references provided.
Subal C. Kumbhakar
Christopher F. Parmeter
Valentin Zelenyuk
Efficiency, Productivity, Panel Data, Endogeneity, Nonparametric, Determinants of Inefficiency, Quantile, Identification. Publication Status: Submitted
2017-09-20
Mis-classified, Binary, Endogenous Regressors: Identification and Inference
http://d.repec.org/n?u=RePEc:nbr:nberwo:23814&r=ecm
This paper studies identification and inference for the effect of a mis-classified, binary, endogenous regressor when a discrete-valued instrumental variable is available. We begin by showing that the only existing point identification result for this model is incorrect. We go on to derive the sharp identified set under mean independence assumptions for the instrument and measurement error, and show that these fail to point identify the effect of interest. This motivates us to consider alternative and slightly stronger assumptions: we show that adding second and third moment independence assumptions suffices to identify the model. We then turn our attention to inference. We show that both our model, and related models from the literature that assume regressor exogeneity, suffer from weak identification when the effect of interest is small. To address this difficulty, we exploit the inequality restrictions that emerge from our derivation of the sharp identified set under mean independence only. These restrictions remain informative irrespective of the strength of identification. Combining these with the moment equalities that emerge from our identification result, we propose a robust inference procedure using tools from the moment inequality literature. Our method performs well in simulations.
Francis J. DiTraglia
Camilo García-Jimeno
2017-09
Applications of James-Stein Shrinkage (II): Bias Reduction in Instrumental Variable Estimation
http://d.repec.org/n?u=RePEc:arx:papers:1708.06443&r=ecm
In a two-stage linear regression model with Normal noise, I consider James-Stein type shrinkage in the estimation of the first-stage instrumental variable coefficients. For at least four instrumental variables and a single endogenous regressor, I show that the standard two-stage least-squares estimator is dominated with respect to bias. I construct the dominating estimator by a variant of James-Stein shrinkage in a first-stage high-dimensional Normal-means problem followed by a control-function approach in the second stage; it preserves invariances of the structural instrumental variable equations.
Jann Spiess
2017-08
Estimating Difference-in-Differences in the Presence of Spillovers
http://d.repec.org/n?u=RePEc:pra:mprapa:81604&r=ecm
I propose a method for difference-in-differences (DD) estimation in situations where the stable unit treatment value assumption is violated locally. This is relevant for a wide variety of cases where spillovers may occur between quasi-treatment and quasi-control areas in a (natural) experiment. A flexible methodology is described to test for such spillovers, and to consistently estimate treatment effects in their presence. This spillover-robust DD method results in two classes of estimands: treatment effects, and “close” to treatment effects. The methodology outlined describes a versatile and non-arbitrary procedure to determine the distance over which treatments propagate, where distance can be defined in many ways, including as a multi-dimensional measure. This methodology is illustrated by simulation, and by its application to estimates of the impact of state-level text-messaging bans on fatal vehicle accidents. Extending existing DD estimates, I document that reforms travel over roads, and have spillover effects in neighbouring non-affected counties. Text messaging laws appear to continue to alter driving behaviour as much as 30 km outside of affected jurisdictions.
Clarke, Damian
Policy evaluation, difference-in-differences, spillovers, natural experiments, SUTVA
2017-09
Machine Learning Tests for Effects on Multiple Outcomes
http://d.repec.org/n?u=RePEc:arx:papers:1707.01473&r=ecm
A core challenge in the analysis of experimental data is that the impact of some intervention is often not entirely captured by a single, well-defined outcome. Instead there may be a large number of outcome variables that are potentially affected and of interest. In this paper, we propose a data-driven approach rooted in machine learning to the problem of testing effects on such groups of outcome variables. It is based on two simple observations. First, the 'false-positive' problem that arises with a group of outcomes is similar to the concern of 'over-fitting,' which has been the focus of a large literature in statistics and computer science. We can thus leverage sample-splitting methods from the machine-learning playbook that are designed to control over-fitting to ensure that statistical models express generalizable insights about treatment effects. The second simple observation is that the question whether treatment affects a group of variables is equivalent to the question whether treatment is predictable from these variables better than some trivial benchmark (provided treatment is assigned randomly). This formulation allows us to leverage data-driven predictors from the machine-learning literature to flexibly mine for effects, rather than rely on more rigid approaches like multiple-testing corrections and pre-analysis plans. We formulate a specific methodology and present three kinds of results: first, our test is exactly sized for the null hypothesis of no effect; second, a specific version is asymptotically equivalent to a benchmark joint Wald test in a linear regression; and third, this methodology can guide inference on where an intervention has effects. Finally, we argue that our approach can naturally deal with typical features of real-world experiments, and be adapted to baseline balance checks.
Jens Ludwig
Sendhil Mullainathan
Jann Spiess
2017-07
Regular Variation of Popular GARCH Processes Allowing for Distributional Asymmetry
http://d.repec.org/n?u=RePEc:fip:fedgfe:2017-95&r=ecm
Linear GARCH(1,1) and threshold GARCH(1,1) processes are established as regularly varying, meaning their heavy tails are Pareto-like, under conditions that allow the innovations of the respective processes to be skewed. Skewness is considered a stylized fact for many financial returns assumed to follow GARCH-type processes. The result in this note aids in establishing the asymptotic properties of certain GARCH estimators proposed in the literature.
Todd Prono
GARCH; Pareto tail; Heavy tail; Regular variation; Threshold GARCH
2017-09-22
On Distribution and Quantile Functions, Ranks and Signs in R_d
http://d.repec.org/n?u=RePEc:eca:wpaper:2013/258262&r=ecm
Unlike the real line, the d-dimensional space $\mathbb{R}^d$, for d ≥ 2, is not canonically ordered. As a consequence, such fundamental and strongly order-related univariate concepts as quantile and distribution functions, and their empirical counterparts, involving ranks and signs, do not canonically extend to the multivariate context. Palliating that lack of a canonical ordering has remained an open problem for more than half a century, and has generated an abundant literature, motivating, among others, the development of statistical depth and copula-based methods. We show here that, unlike the many definitions that have been proposed in the literature, the measure transportation-based ones introduced in Chernozhukov et al. (2017) enjoy all the properties (distribution-freeness and preservation of semiparametric efficiency) that make univariate quantiles and ranks successful tools for semiparametric statistical inference. We therefore propose a new center-outward definition of multivariate distribution and quantile functions, along with their empirical counterparts, for which we establish a Glivenko-Cantelli result. Our approach, based on results by McCann (1995), is geometric rather than analytical and, contrary to the Monge-Kantorovich one in Chernozhukov et al. (2017) (which assumes compact supports or finite second-order moments), does not require any moment assumptions. The resulting ranks and signs are shown to be strictly distribution-free, and maximal invariant under the action of transformations (namely, the gradients of convex functions, which thus are playing the role of order-preserving transformations) generating the family of absolutely continuous distributions; this, in view of a general result by Hallin and Werker (2003), implies preservation of semiparametric efficiency. The resulting quantiles are equivariant under the same transformations, which confirms the order-preserving nature of gradients of convex functions.
Marc Hallin
multivariate distribution function; multivariate quantiles; multivariate ranks; multivariate signs; multivariate order-preserving transformation; Glivenko-Cantelli; invariance/equivariance; gradient of convex function
2017-09
A Bayesian Approach to Backtest Overfitting
http://d.repec.org/n?u=RePEc:fau:wpaper:wp2017_18&r=ecm
Quantitative investment strategies are often selected from a broad class of candidate models estimated and tested on historical data. Standard statistical techniques for preventing model overfitting, such as out-of-sample back-testing, turn out to be unreliable when selection is based on the results of too many models tested on the holdout sample. There is an ongoing discussion of how to estimate the probability of back-test overfitting and how to adjust expected performance indicators such as the Sharpe ratio to properly reflect the effect of multiple testing. We propose a Bayesian approach that consistently yields the desired robust estimates based on an MCMC simulation. The approach is tested on a class of technical trading strategies where a seemingly profitable strategy can be selected in the naive approach.
Jiri Witzany
Backtest, multiple testing, bootstrapping, cross-validation, probability of backtest overfitting, investment strategy, optimization, Sharpe ratio, Bayesian probability, MCMC
2017-09
New copulas based on general partitions-of-unity and their applications to risk management (part II)
http://d.repec.org/n?u=RePEc:arx:papers:1709.07682&r=ecm
We present a constructive and self-contained approach to data driven infinite partition-of-unity copulas that were recently introduced in the literature. In particular, we consider negative binomial and Poisson copulas and present a solution to the problem of fitting such copulas to highly asymmetric data in arbitrary dimensions.
Dietmar Pfeifer
Andreas Mändle
Olena Ragulina
2017-09
Quantile Factor Models
http://d.repec.org/n?u=RePEc:cte:werepe:25299&r=ecm
In this paper we introduce Quantile Factor Models (QFM) as a novel concept at the interface of the theory of factor models and quantile regressions (QR). The basic insight is that a few unobserved common factors may shift not just the mean but other parts of the distributions of observed variables in a panel dataset of dimension N × T. When the factors shifting the means and the quantiles of the observed variables coincide, a simple two-step procedure is proposed to estimate the common factors and the quantile factor loadings (QFL). We derive new conditions on N and T ensuring uniform consistency and weak convergence, which allow inference on the entire QFL processes. When the two sets of factors differ, we develop an iterative procedure that consistently estimates both (potentially) quantile-dependent factors and QFL at a given quantile by minimizing a check-function criterion as in QR but with unobserved regressors. Simulation results confirm a satisfactory performance and illustrate the advantages of our QFM estimation approach in finite samples. Empirical applications of our estimation procedures to several well-known large panel datasets provide strong evidence that extra factors shifting quantiles, and not just the means, could be relevant in applied work on factor structures.
Gonzalo, Jesús
Dolado, Juan J.
Chen, Liang
Incidental parameters; Generated regressors; Quantile regression; Factor models
2017-09-09
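The check-function criterion referred to in the Quantile Factor Models abstract above is the standard quantile-regression loss. As a minimal sketch with observed data only (no factors, no regressors), the code below shows that minimizing the summed check loss over a constant recovers a sample quantile. The function names are illustrative and the paper's iterative procedure is not reproduced.

```python
def check_loss(u, tau):
    """Quantile-regression check function rho_tau(u) = u * (tau - 1{u < 0})."""
    return u * (tau - (1.0 if u < 0 else 0.0))


def quantile_by_check_loss(values, tau):
    """Return the smallest data point that minimizes the summed check loss."""
    candidates = sorted(values)
    return min(candidates, key=lambda q: sum(check_loss(v - q, tau) for v in values))


if __name__ == "__main__":
    data = [5.0, 1.0, 3.0, 2.0, 4.0]
    print(quantile_by_check_loss(data, 0.5))
    print(quantile_by_check_loss(data, 0.9))
```

Restricting the search to the data points is enough here because the summed check loss is piecewise linear in the candidate value, so a minimizer always exists among the observations.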
Inference for Impulse Responses under Model Uncertainty
http://d.repec.org/n?u=RePEc:arx:papers:1709.09583&r=ecm
In many macroeconomic applications, impulse responses and their (bootstrap) confidence intervals are constructed by estimating a VAR model in levels - thus ignoring uncertainty regarding the true (unknown) cointegration rank. While it is well known that using a wrong cointegration rank leads to invalid (bootstrap) inference, we demonstrate that even if the rank is consistently estimated, ignoring uncertainty regarding the true rank can make inference highly unreliable for sample sizes encountered in macroeconomic applications. We investigate the effects of rank uncertainty in a simulation study, comparing several methods designed for handling model uncertainty. We propose a new method - Weighted Inference by Model Plausibility (WIMP) - that takes rank uncertainty into account in a fully data-driven way and outperforms all other methods considered in the simulation study. The WIMP method is shown to deliver intervals that are robust to rank uncertainty, yet allow for meaningful inference, approaching fixed rank intervals when evidence for a particular rank is strong. We study the potential ramifications of rank uncertainty on applied macroeconomic analysis by re-assessing the effects of fiscal policy shocks based on a variety of identification schemes that have been considered in the literature. We demonstrate how sensitive the results are to the treatment of the cointegration rank, and show how formally accounting for rank uncertainty can affect the conclusions.
Lenard Lieb
Stephan Smeekes
2017-09
Indirect Inference with Importance Sampling: An Application to Women's Wage Growth
http://d.repec.org/n?u=RePEc:iza:izadps:dp11004&r=ecm
This paper has two main parts. In the first, we describe a method that smooths the objective function in a general class of indirect inference models. Our smoothing procedure makes use of importance sampling weights in estimation of the auxiliary model on simulated data. The importance sampling weights are constructed from likelihood contributions implied by the structural model. Since this approach does not require transformations of endogenous variables in the structural model, we avoid the potential approximation errors that may arise in other smoothing approaches for indirect inference. We show that our alternative smoothing method yields consistent estimates. The second part of the paper applies the method to estimating the effect of women's fertility on their human capital accumulation. We find that the curvature in the wage profile is determined primarily by curvature in the human capital accumulation function as a function of previous human capital, as opposed to being driven primarily by age. We also find a modest effect of fertility induced nonemployment spells on human capital accumulation. We estimate that the difference in wages among prime age women would be approximately 3% higher if the relationship between fertility and working were eliminated.
Sauer, Robert M.
Taber, Christopher
indirect inference, simulation estimation, wage growth, women
2017-09
Comparing 2SLS vs 2SRI for Binary Outcomes and Binary Exposures
http://d.repec.org/n?u=RePEc:nbr:nberwo:23840&r=ecm
This study uses Monte Carlo simulations to examine the ability of the two-stage least-squares (2SLS) estimator and two-stage residual inclusion (2SRI) estimators with varying forms of residuals to estimate the local average and population average treatment effect parameters in models with a binary outcome, an endogenous binary treatment, and a single binary instrument. The rarity of the outcome and the treatment are varied across simulation scenarios. Results show that 2SLS generates consistent estimates of the LATE and biased estimates of the ATE across all scenarios. 2SRI approaches, in general, produce biased estimates of both LATE and ATE under all scenarios. 2SRI using generalized residuals minimizes the bias in ATE estimates. Use of 2SLS and 2SRI is illustrated in an empirical application estimating the effects of long-term care insurance on a variety of binary healthcare utilization outcomes among the near-elderly using the Health and Retirement Study.
Anirban Basu
Norma Coe
Cole G. Chapman
2017-09
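With a single binary instrument and no covariates, as in the setting of the abstract above, the 2SLS estimate reduces to the Wald ratio of the reduced-form difference to the first-stage difference. The sketch below computes that ratio. The function name `wald_estimate` and the toy data are hypothetical, and the 2SRI estimators studied in the paper are not shown.

```python
def wald_estimate(y, d, z):
    """Wald / IV estimate with binary instrument z, treatment d, and outcome y.

    Computes (E[y | z=1] - E[y | z=0]) / (E[d | z=1] - E[d | z=0]).
    """
    def mean_where(values, flag):
        picked = [v for v, zi in zip(values, z) if zi == flag]
        return sum(picked) / len(picked)

    reduced_form = mean_where(y, 1) - mean_where(y, 0)
    first_stage = mean_where(d, 1) - mean_where(d, 0)
    if first_stage == 0:
        raise ValueError("instrument does not shift treatment: first stage is zero")
    return reduced_form / first_stage


if __name__ == "__main__":
    z = [0, 0, 0, 0, 1, 1, 1, 1]
    d = [0, 0, 0, 1, 0, 1, 1, 1]
    y = [0, 0, 1, 1, 0, 1, 1, 1]
    print(wald_estimate(y, d, z))
```

Under the usual instrument validity and monotonicity assumptions this ratio targets the LATE for compliers rather than the population ATE, which is the distinction the simulation results in the abstract turn on.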
Sharp bounds for the Roy model
http://d.repec.org/n?u=RePEc:arx:papers:1709.09284&r=ecm
We analyze the empirical content of the Roy model, stripped down to its essential features, namely sector specific unobserved heterogeneity and self-selection on the basis of potential outcomes. We characterize sharp bounds on the joint distribution of potential outcomes and the identifying power of exclusion restrictions. The latter include variables that affect market conditions only in one sector and variables that affect sector selection only. Special emphasis is put on the case of binary outcomes, which has received little attention in the literature to date. For richer sets of outcomes, we emphasize the distinction between pointwise sharp bounds and functional sharp bounds, and its importance, when constructing sharp bounds on functional features, such as inequality measures. We analyze a Roy model of college major choice in Canada within this framework, and we take a new look at the under-representation of women in Science, Technology, Engineering or Mathematics (STEM).
Ismael Mourifie
Marc Henry
Romuald Meango
2017-09
Model Averaging and its Use in Economics
http://d.repec.org/n?u=RePEc:pra:mprapa:81568&r=ecm
The method of model averaging has become an important tool to deal with model uncertainty, in particular in empirical settings with large numbers of potential models and relatively limited numbers of observations, as are common in economics. Model averaging is a natural response to model uncertainty in a Bayesian framework, so most of the paper deals with Bayesian model averaging. In addition, frequentist model averaging methods are discussed. Numerical methods to implement these methods are explained, and I point the reader to some freely available computational resources. The main focus is on the problem of variable selection in linear regression models, but the paper also discusses other, more challenging, settings. Some of the applied literature is reviewed with particular emphasis on applications in economics. The role of the prior assumptions in Bayesian procedures is highlighted, and some recommendations for applied users are provided.
Steel, Mark F. J.
Bayesian methods; Model uncertainty; Normal linear model; Prior specification; Robustness
2017-09-19
Fixed Effect Estimation of Large T Panel Data Models
http://d.repec.org/n?u=RePEc:arx:papers:1709.08980&r=ecm
This article reviews recent advances in fixed effect estimation of panel data models for long panels, where the number of time periods is relatively large. We focus on semiparametric models with unobserved individual and time effects, where the distribution of the outcome variable conditional on covariates and unobserved effects is specified parametrically, while the distribution of the unobserved effects is left unrestricted. Compared to existing reviews on long panels (Arellano and Hahn 2007; a section in Arellano and Bonhomme 2011) we discuss models with both individual and time effects, split-panel Jackknife bias corrections, unbalanced panels, distribution and quantile effects, and other extensions. Understanding and correcting the incidental parameter bias caused by the estimation of many fixed effects is our main focus, and the unifying theme is that the order of this bias is given by the simple formula p/n for all models discussed, with p the number of estimated parameters and n the total sample size.
Iván Fernández-Val
Martin Weidner
2017-09
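The split-panel jackknife mentioned in the abstract above removes the leading bias term by combining the full-sample estimate with estimates from the two half panels. The sketch below is deliberately simplified to a single series split in time, with a generic `estimator` callable; in a panel the split would be along the time dimension for all units. The names are illustrative.

```python
def half_panel_jackknife(estimator, series):
    """Half-panel jackknife: 2 * theta_full - (theta_first_half + theta_second_half) / 2.

    If the estimator's bias is of the form B / T, the full-sample and
    half-sample biases (B / T and 2B / T) cancel in this combination.
    """
    T = len(series)
    if T < 4 or T % 2 != 0:
        raise ValueError("need an even number of periods, at least four")
    half = T // 2
    full = estimator(series)
    first = estimator(series[:half])
    second = estimator(series[half:])
    return 2.0 * full - 0.5 * (first + second)


def mle_variance(x):
    """Variance estimator dividing by T, which is biased downward by sigma^2 / T."""
    m = sum(x) / len(x)
    return sum((v - m) ** 2 for v in x) / len(x)


if __name__ == "__main__":
    print(half_panel_jackknife(mle_variance, [1.0, 2.0, 3.0, 4.0]))
```

For the divide-by-T variance estimator with independent draws, the expectation is sigma^2 (1 - 1/T) in the full sample and sigma^2 (1 - 2/T) in each half, so the jackknife combination is exactly unbiased in that special case.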
Inference on Estimators defined by Mathematical Programming
http://d.repec.org/n?u=RePEc:arx:papers:1709.09115&r=ecm
We propose an inference procedure for estimators defined by mathematical programming problems, focusing on the important special cases of linear programming (LP) and quadratic programming (QP). In these settings, the coefficients in both the objective function and the constraints of the mathematical programming problem may be estimated from data and hence involve sampling error. Our inference approach exploits the characterization of the solutions to these programming problems by complementarity conditions; by doing so, we can transform the problem of doing inference on the solution of a constrained optimization problem (a non-standard inference problem) into one involving inference based on a set of inequalities with pre-estimated coefficients, which is much better understood. We evaluate the performance of our procedure in several Monte Carlo simulations and an empirical application to the classic portfolio selection problem in finance.
Yu-Wei Hsieh
Xiaoxia Shi
Matthew Shum
2017-09
This time it is different! Or not?
http://d.repec.org/n?u=RePEc:ems:eureir:101764&r=ecm
We employ a simple method based on logistic weighted least squares to diagnose which past data are less or more useful for predicting the future course of a variable. A simulation experiment shows its merits. An illustration for monthly industrial production series for 17 countries suggests that earlier data are useful for prediction both in a crisis period (2006-2011) and in the period after the crisis (2011-2016). Hence, this time, apparently it was not that different after all.
Franses, Ph.H.B.F.
Janssens, E.
Forecasting, Weighted Least Squares, Discounting, Logistic function, Industrial Production
2017-09-01