
on Econometrics 
By:  Alexandre Belloni (Institute for Fiscal Studies); Victor Chernozhukov (Institute for Fiscal Studies and MIT); Kengo Kato (Institute for Fiscal Studies) 
Abstract:  This work proposes new inference methods for the estimation of a regression coefficient of interest in quantile regression models. We consider high-dimensional models where the number of regressors potentially exceeds the sample size but a subset of them suffices to construct a reasonable approximation of the unknown quantile regression function in the model. The proposed methods are protected against moderate model selection mistakes, which are often inevitable in the approximately sparse model considered here. The methods construct (implicitly or explicitly) an optimal instrument as a residual from a density-weighted projection of the regressor of interest on the other regressors. Under regularity conditions, the proposed estimators of the quantile regression coefficient are asymptotically root-n normal, with variance equal to the semiparametric efficiency bound of the partially linear quantile regression model. In addition, the performance of the technique is illustrated through Monte Carlo experiments and an empirical example dealing with risk factors in childhood malnutrition. The numerical results confirm the theoretical findings that the proposed methods should outperform naive post-model-selection methods in nonparametric settings. Moreover, the empirical results demonstrate the soundness of the proposed methods. 
Date:  2014–12 
URL:  http://d.repec.org/n?u=RePEc:ifs:cemmap:53/14&r=ecm 
By:  Hyungsik Roger Moon (Institute for Fiscal Studies); Martin Weidner (Institute for Fiscal Studies and cemmap and UCL) 
Abstract:  We analyze linear panel regression models with interactive fixed effects and predetermined regressors, e.g. lagged dependent variables. The first-order asymptotic theory of the least squares (LS) estimator of the regression coefficients is worked out in the limit where both the cross-sectional dimension and the number of time periods become large. We find that there are two sources of asymptotic bias of the LS estimator: bias due to correlation or heteroscedasticity of the idiosyncratic error term, and bias due to predetermined (as opposed to strictly exogenous) regressors. A bias-corrected least squares estimator is provided. We also present bias-corrected versions of the three classical test statistics (Wald, LR and LM tests) and show that their asymptotic distribution is a chi-square distribution. Monte Carlo simulations show that the bias correction of the LS estimator and of the test statistics also works well for finite sample sizes. Supplementary material is available for this paper. 
Date:  2014–12 
URL:  http://d.repec.org/n?u=RePEc:ifs:cemmap:47/14&r=ecm 
By:  Matias D. Cattaneo (University of Michigan); Michael Jansson (UC Berkeley and CREATES); Whitney K. Newey (MIT) 
Abstract:  The linear regression model is widely used in empirical work in economics. Researchers often include many covariates in their linear model specification in an attempt to control for confounders. We give inference methods that allow for many covariates and heteroskedasticity. Our results are obtained using high-dimensional approximations, where the number of covariates is allowed to grow as fast as the sample size. We find that all of the usual versions of Eicker-White heteroskedasticity-consistent standard error estimators for linear models are inconsistent under these asymptotics. We then propose a new heteroskedasticity-consistent standard error formula that is fully automatic and robust to both (conditional) heteroskedasticity of unknown form and the inclusion of possibly many covariates. We apply our findings to three settings: (i) parametric linear models with many covariates, (ii) semiparametric semi-linear models with many technical regressors, and (iii) linear panel models with many fixed effects. 
Keywords:  high-dimensional models, linear regression, many regressors, heteroskedasticity, standard errors. 
JEL:  C12 C14 C21 
Date:  2015–07–09 
URL:  http://d.repec.org/n?u=RePEc:aah:create:201531&r=ecm 
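As a point of reference for the abstract above, the classical Eicker-White (HC0) sandwich estimator — the one the paper shows to be inconsistent when the number of covariates grows with the sample size — can be sketched as follows. This is a baseline illustration only; the paper's corrected formula is not reproduced here, and the simulated data and variable names are hypothetical.

```python
import numpy as np

def hc0_standard_errors(X, y):
    """Classical Eicker-White (HC0) sandwich standard errors for OLS.

    This is the baseline estimator critiqued in the paper; its corrected
    many-covariate-robust version is not reproduced in the abstract.
    """
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    resid = y - X @ beta
    meat = X.T @ (X * resid[:, None] ** 2)   # sum_i e_i^2 x_i x_i'
    V = XtX_inv @ meat @ XtX_inv             # the "sandwich"
    return beta, np.sqrt(np.diag(V))

# Illustrative heteroskedastic design (few covariates, so HC0 behaves well here)
rng = np.random.default_rng(0)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(size=n) * (1 + X[:, 1] ** 2)
beta, se = hc0_standard_errors(X, y)
```

The sandwich form makes explicit where the many-covariate problem enters: the squared residuals in the "meat" are systematically too small when the regression uses up a non-negligible fraction of the degrees of freedom.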
By:  Karun Adusumilli; Taisuke Otsu 
Abstract:  This paper considers nonparametric instrumental variable regression when the endogenous variable is contaminated with classical measurement error. Existing methods are inconsistent in the presence of measurement error. We propose a wavelet deconvolution estimator for the structural function that modifies the generalized Fourier coefficients of the orthogonal series estimator to take into account the measurement error. We establish the convergence rates of our estimator for the cases of mildly/severely ill-posed models and ordinary/super-smooth measurement errors. We characterize how the presence of measurement error slows down the convergence rates of the estimator. We also study the case where the measurement error density is unknown and needs to be estimated, and show that the estimation error of the measurement error density is negligible under mild conditions as long as the measurement error density is symmetric. 
Keywords:  Nonparametric instrumental variable regression, measurement error, inverse problem, deconvolution 
JEL:  C26 
Date:  2015–07 
URL:  http://d.repec.org/n?u=RePEc:cep:stiecm:/2015/585&r=ecm 
By:  Katarzyna Bech (Institute for Fiscal Studies); Grant Hillier (Institute for Fiscal Studies and University of Southampton) 
Abstract:  This paper presents new approaches to testing for exogeneity in nonparametric models with discrete regressors and instruments. Our interest is in learning about an unknown structural (conditional mean) function. An interesting feature of these models is that under endogeneity the identifying power of a discrete instrument depends on the number of support points of the instruments relative to that of the regressors, a result driven by the discreteness of the variables. Observing that the simple nonparametric additive error model can be interpreted as a linear regression, we present two test statistics. For the point-identifying model, the test is an adapted version of the standard Wu-Hausman approach. This extends the work of Blundell and Horowitz (2007) to the case of discrete regressors and instruments. For the set-identifying model, the Wu-Hausman approach is not available. In this case the test statistic is derived from a constrained minimization problem. The asymptotic distributions of the test statistics are derived under the null and under fixed and local alternatives. The tests are shown to be consistent, and a simulation study reveals that the proposed tests have satisfactory finite-sample properties. 
Date:  2015–03 
URL:  http://d.repec.org/n?u=RePEc:ifs:cemmap:11/15&r=ecm 
By:  Silia Vitoratou; Ioannis Ntzoufras; Irini Moustaki 
Abstract:  In latent variable models parameter estimation can be implemented by using the joint or the marginal likelihood, based on independence or conditional independence assumptions. The same dilemma occurs within the Bayesian framework with respect to the estimation of the Bayesian marginal (or integrated) likelihood, which is the main tool for model comparison and averaging. In most cases, the Bayesian marginal likelihood is a high dimensional integral that cannot be computed analytically and a plethora of methods based on Monte Carlo integration (MCI) are used for its estimation. In this work, it is shown that the joint MCI approach makes subtle use of the properties of the adopted model, leading to increased error and bias in finite settings. The sources and the components of the error associated with estimators under the two approaches are identified here and provided in exact forms. Additionally, the effect of the sample covariation on the Monte Carlo estimators is examined. In particular, even under independence assumptions the sample covariance will be close to (but not exactly) zero which surprisingly has a severe effect on the estimated values and their variability. To address this problem, an index of the sample's divergence from independence is introduced as a multivariate extension of covariance. The implications addressed here are important in the majority of practical problems appearing in Bayesian inference of multiparameter models with analogous structures. 
Keywords:  Bayes factor; marginal likelihood; Monte Carlo integration 
JEL:  C1 
Date:  2014–07–24 
URL:  http://d.repec.org/n?u=RePEc:ehl:lserod:57685&r=ecm 
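A minimal sketch of Monte Carlo integration (MCI) for a marginal likelihood, in a toy conjugate model where the integral is available in closed form for comparison. This illustrates only the basic estimator discussed in the abstract above, not the paper's joint-versus-marginal error decomposition; all numbers are illustrative.

```python
import numpy as np

def npdf(x, mu, sd):
    """Normal density, written out so the example is self-contained."""
    return np.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * np.sqrt(2.0 * np.pi))

rng = np.random.default_rng(1)

# Toy conjugate model: y | theta ~ N(theta, 1), theta ~ N(0, 1),
# so the true marginal likelihood is N(y; 0, 2) in closed form.
y = 0.7
S = 200_000
theta = rng.normal(0.0, 1.0, size=S)      # draws from the prior
mci = npdf(y, theta, 1.0).mean()          # MCI estimate of p(y)
exact = npdf(y, 0.0, np.sqrt(2.0))        # analytic benchmark
```

In realistic latent variable models no such closed form exists, which is precisely why the finite-sample error and bias properties of MCI estimators, analyzed in the paper, matter.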
By:  Alexandre Belloni (Institute for Fiscal Studies); Victor Chernozhukov (Institute for Fiscal Studies and MIT); Christian Hansen (Institute for Fiscal Studies and Chicago GSB); Damian Kozbur (Institute for Fiscal Studies) 
Abstract:  We consider estimation and inference in panel data models with additive unobserved individual-specific heterogeneity in a high-dimensional setting. The setting allows the number of time-varying regressors to be larger than the sample size. To make informative estimation and inference feasible, we require that the overall contribution of the time-varying variables after eliminating the individual-specific heterogeneity can be captured by a relatively small number of the available variables whose identities are unknown. This restriction allows the problem of estimation to proceed as a variable selection problem. Importantly, we treat the individual-specific heterogeneity as fixed effects, which allows this heterogeneity to be related to the observed time-varying variables in an unspecified way and allows it to be nonzero for all individuals. Within this framework, we provide procedures that give uniformly valid inference over a fixed subset of parameters in the canonical linear fixed effects model and over coefficients on a fixed vector of endogenous variables in panel data instrumental variables models with fixed effects and many instruments. An input to developing the properties of our proposed procedures is a variant of the Lasso estimator that allows for a grouped data structure where data across groups are independent and dependence within groups is unrestricted. We provide formal conditions within this structure under which the proposed Lasso variant selects a sparse model with good approximation properties. We present simulation results in support of the theoretical developments and illustrate the use of the methods in an application aimed at estimating the effect of gun prevalence on crime rates. 
Date:  2014–12 
URL:  http://d.repec.org/n?u=RePEc:ifs:cemmap:50/14&r=ecm 
By:  Chau, Tak Wai 
Abstract:  Klein and Vella (2010) and Lewbel (2012) respectively propose estimators that utilize the heteroscedasticity of the error terms to identify the coefficient of the endogenous regressor in a standard linear model, even when there are no exogenous excluded instruments. The assumptions on the form of heteroscedasticity differ between the two estimators, and whether they are robust to misspecification is an important issue, because it is not straightforward to justify which form of heteroscedasticity is true. This paper presents simulation results on the finite-sample performance of the two estimators under various forms of heteroscedasticity. The results reveal that both estimators can be substantially biased when the form of heteroscedasticity is of the wrong type, meaning that they lack robustness to misspecification of the form of heteroscedasticity. Moreover, the J statistic of the overidentification test for the Lewbel (2012) estimator has low power under the wrong form of heteroscedasticity in the cases considered. The results suggest that it is not enough for researchers to justify only the existence of heteroscedasticity when using the proposed estimators. 
Keywords:  Instrumental Variable Estimation, Endogeneity, Heteroscedasticity 
JEL:  C13 C31 C36 
Date:  2015–07 
URL:  http://d.repec.org/n?u=RePEc:pra:mprapa:65888&r=ecm 
By:  Victor Chernozhukov (Institute for Fiscal Studies and MIT); Christian Hansen (Institute for Fiscal Studies and Chicago GSB); Yuan Liao (Institute for Fiscal Studies) 
Abstract:  Common high-dimensional methods for prediction rely on having either a sparse signal model, a model in which most parameters are zero and there are a small number of nonzero parameters that are large in magnitude, or a dense signal model, a model with no large parameters and very many small nonzero parameters. We consider a generalization of these two basic models, termed here a “sparse+dense” model, in which the signal is given by the sum of a sparse signal and a dense signal. Such a structure poses problems for traditional sparse estimators, such as the lasso, and for traditional dense estimation methods, such as ridge estimation. We propose a new penalization-based method, called lava, which is computationally efficient. With suitable choices of penalty parameters, the proposed method strictly dominates both lasso and ridge. We derive analytic expressions for the finite-sample risk function of the lava estimator in the Gaussian sequence model. We also provide a deviation bound for the prediction risk in the Gaussian regression model with fixed design. In both cases, we provide Stein’s unbiased estimator for lava’s prediction risk. A simulation example compares the performance of lava to lasso, ridge, and elastic net in a regression example using feasible, data-dependent penalty parameters and illustrates lava’s improved performance relative to these benchmarks. 
Keywords:  High-dimensional models, penalization, shrinkage, non-sparse signal recovery 
Date:  2015–02 
URL:  http://d.repec.org/n?u=RePEc:ifs:cemmap:05/15&r=ecm 
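In the Gaussian sequence model mentioned in the abstract above, the lava estimator has a simple coordinate-wise closed form: profiling out the dense (ridge) part leaves a soft-thresholding problem for the sparse (lasso) part. A minimal sketch under that interpretation; the penalty values below are arbitrary, not the feasible data-dependent choices discussed in the paper.

```python
import numpy as np

def lava_sequence(y, lam1, lam2):
    """Lava in the Gaussian sequence model y_i = theta_i + eps_i.

    Solves, coordinate-wise,
        min_{s,d} (y - s - d)^2 + lam1*|s| + lam2*d^2,
    and returns theta_hat = s + d. Minimizing over d first gives
    d = (y - s)/(1 + lam2), which reduces the problem to soft-thresholding
    with curvature lam2/(1 + lam2) on the sparse part s.
    """
    a = lam2 / (1.0 + lam2)                               # curvature after profiling out d
    s = np.sign(y) * np.maximum(np.abs(y) - lam1 / (2.0 * a), 0.0)  # sparse (lasso-like) part
    d = (y - s) / (1.0 + lam2)                            # dense (ridge-like) part
    return s + d

y = np.array([5.0, 0.3, -4.0, 0.1])
theta_hat = lava_sequence(y, lam1=2.0, lam2=1.0)
```

Note how the output reflects the sparse-plus-dense structure: large observations are shrunk less than ridge alone would shrink them, while small observations are shrunk toward zero but not set exactly to zero as lasso would.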
By:  Daniel Wilhelm (Institute for Fiscal Studies and cemmap and UCL) 
Abstract:  This paper provides a constructive argument for identification of nonparametric panel data models with measurement error in a continuous explanatory variable. The approach point-identifies all structural elements of the model using only observations of the outcome and the mismeasured explanatory variable; no further external variables such as instruments are required. In the case of two time periods, restricting either the structural or the measurement error to be independent over time allows past explanatory variables or outcomes to serve as instruments. Time periods have to be linked through serial dependence in the latent explanatory variable, but the transition process is left nonparametric. The paper discusses the general identification result in the context of a nonlinear panel data regression model with additively separable fixed effects. It provides a nonparametric plug-in estimator, derives its uniform rate of convergence, and presents simulation evidence for good performance in finite samples. 
Date:  2015–07 
URL:  http://d.repec.org/n?u=RePEc:ifs:cemmap:34/15&r=ecm 
By:  Denis Chetverikov (Institute for Fiscal Studies and UCLA); Daniel Wilhelm (Institute for Fiscal Studies and cemmap and UCL) 
Abstract:  The ill-posedness of the inverse problem of recovering a regression function in a nonparametric instrumental variable model leads to estimators that may suffer from a very slow, logarithmic rate of convergence. In this paper, we show that restricting the problem to models with monotone regression functions and monotone instruments significantly weakens the ill-posedness of the problem. In stark contrast to the existing literature, the presence of a monotone instrument implies boundedness of our measure of ill-posedness when restricted to the space of monotone functions. Based on this result we derive a novel non-asymptotic error bound for the constrained estimator that imposes monotonicity of the regression function. For a given sample size, the bound is independent of the degree of ill-posedness as long as the regression function is not too steep. As an implication, the bound allows us to show that the constrained estimator converges at a fast, polynomial rate, independently of the degree of ill-posedness, in a large but slowly shrinking neighborhood of constant functions. Our simulation study demonstrates significant finite-sample performance gains from imposing monotonicity even when the regression function is rather far from being a constant. We apply the constrained estimator to the problem of estimating gasoline demand functions from U.S. data. 
Date:  2015–07 
URL:  http://d.repec.org/n?u=RePEc:ifs:cemmap:39/15&r=ecm 
By:  Manuel Arellano (Institute for Fiscal Studies and CEMFI); Stéphane Bonhomme (Institute for Fiscal Studies and University of Chicago) 
Abstract:  We introduce a class of quantile regression estimators for short panels. Our framework covers static and dynamic autoregressive models, models with general predetermined regressors, and models with multiple individual effects. We use quantile regression as a flexible tool to model the relationships between outcomes, covariates, and heterogeneity. We develop an iterative simulation-based approach for estimation, which exploits the computational simplicity of ordinary quantile regression in each iteration step. Finally, an application to measure the effect of smoking during pregnancy on children’s birth weights completes the paper. 
Keywords:  Panel data; dynamic models; nonseparable heterogeneity; quantile regression; expectation-maximization 
JEL:  C23 
Date:  2015–07 
URL:  http://d.repec.org/n?u=RePEc:ifs:cemmap:40/15&r=ecm 
By:  Shujie Ma; Jeffrey S. Racine; Aman Ullah 
Abstract:  We consider a B-spline regression approach towards efficient nonparametric modelling of a random effects (error component) model. Theoretical underpinnings are provided, finite-sample performance is evaluated via Monte Carlo simulation, and an application that examines the contribution of different types of public infrastructure to private production is investigated using panel data comprising the 48 contiguous US states over the period 1970–1986. 
JEL:  C14 C23 
Date:  2015–08 
URL:  http://d.repec.org/n?u=RePEc:mcm:deptwp:201510&r=ecm 
By:  Paresh K Narayan (Deakin University); Ruipeng Liu (Deakin University) 
Abstract:  In this paper, we propose a GARCH-based unit root test that is flexible enough to account for: (a) trending variables, (b) two endogenous structural breaks, and (c) heteroskedastic data series. Our proposed model is applied to a range of time-series, trending, and heteroskedastic energy variables. Our two main findings are: first, the proposed trend-based GARCH unit root model outperforms a GARCH model without trend; and, second, allowing for a time trend and two endogenous structural breaks is important in practice, for doing so allows us to reject the unit root null hypothesis. 
Keywords:  Time-series; Energy; Unit Root; Trending Variables. 
URL:  http://d.repec.org/n?u=RePEc:dkn:ecomet:fe_2015_05&r=ecm 
By:  Joakim Westerlund (Deakin University); Hande Karabiyik (Lund University); Paresh K Narayan (Deakin University) 
Abstract:  The difficulty of predicting returns has recently motivated researchers to start looking for tests that are either robust or more powerful. Unfortunately, the way that these tests work typically involves trading robustness for power or vice versa. The current paper takes this as its starting point to develop a new panel-based approach to predictability that is both robust and powerful. Specifically, while the panel route to increased power is not new, the way in which the cross-section variation is exploited to also achieve robustness with respect to the predictor is. The result is two new tests that enable asymptotically standard normal and chi-squared inference across a wide range of empirically relevant scenarios in which the predictor may be stationary, unit root nonstationary, or anything in between. The cross-section dependence of the predictor is also not restricted, and can be weak, strong, or indeed anything in between. What is more, this generality comes at no cost in terms of test construction. The new tests are therefore very user-friendly. 
Keywords:  Panel data; Predictive regression; Predictor persistency; Crosssection dependence. 
JEL:  C22 C23 G1 G12 
URL:  http://d.repec.org/n?u=RePEc:dkn:ecomet:fe_2015_10&r=ecm 
By:  Victor Chernozhukov (Institute for Fiscal Studies and MIT); Denis Chetverikov (Institute for Fiscal Studies and UCLA); Kengo Kato (Institute for Fiscal Studies) 
Abstract:  This paper considers the problem of testing many moment inequalities where the number of moment inequalities, denoted by p, is possibly much larger than the sample size n. There are a variety of economic applications where the problem of testing many moment inequalities appears; a notable example is a market structure model of Ciliberto and Tamer (2009) where p = 2^{m+1} with m being the number of firms. We consider the test statistic given by the maximum of p Studentized (or t-type) statistics, and analyze various ways to compute critical values for the test statistic. Specifically, we consider critical values based upon (i) the union bound combined with a moderate deviation inequality for self-normalized sums, (ii) the multiplier and empirical bootstraps, and (iii) two-step and three-step variants of (i) and (ii) obtained by incorporating selection of uninformative inequalities that are far from being binding and a novel selection of weakly informative inequalities that are potentially binding but do not provide first-order information. We prove validity of these methods, showing that under mild conditions, they lead to tests with error in size decreasing polynomially in n while allowing for p to be much larger than n; indeed p can be of order exp(n^c) for some c > 0. Importantly, all these results hold without any restriction on the correlation structure among the p Studentized statistics, and also hold uniformly with respect to suitably large classes of underlying distributions. Moreover, when p grows with n, we show that all of our tests are (minimax) optimal in the sense that they are uniformly consistent against alternatives whose "distance" from the null is larger than the threshold (2(log p)/n)^{1/2}, while any test can only have trivial power in the worst case when the distance is smaller than the threshold. Finally, we show validity of a test based on a block multiplier bootstrap in the case of dependent data under some general mixing conditions. 
Date:  2014–12 
URL:  http://d.repec.org/n?u=RePEc:ifs:cemmap:52/14&r=ecm 
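A stripped-down version of the multiplier-bootstrap approach (item (ii) in the abstract above) for testing H0: E[X_j] <= 0 for all j can be sketched as follows. The inequality-selection refinements are omitted, and all simulation settings are illustrative only.

```python
import numpy as np

def many_moments_test(X, alpha=0.05, B=1000, seed=0):
    """Test H0: E[X_j] <= 0 for all j via the maximum of p t-type
    statistics, with a Gaussian-multiplier bootstrap critical value.
    A minimal sketch without the two-step/three-step selection steps."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    mu = X.mean(axis=0)
    sd = X.std(axis=0, ddof=1)
    T = np.sqrt(n) * np.max(mu / sd)        # max Studentized statistic
    Z = (X - mu) / sd                       # centered, Studentized data
    e = rng.standard_normal((B, n))         # Gaussian multipliers
    W = np.max(e @ Z / np.sqrt(n), axis=1)  # bootstrap max statistics
    crit = np.quantile(W, 1 - alpha)
    return T, crit, T > crit

rng = np.random.default_rng(42)
X0 = rng.normal(-0.2, 1.0, size=(500, 50))  # all 50 inequalities satisfied
T0, c0, rej0 = many_moments_test(X0)

X1 = X0.copy()
X1[:, 0] += 0.5                             # violate one inequality
T1, c1, rej1 = many_moments_test(X1)
```

The bootstrap critical value adapts automatically to the correlation among the p statistics, which is one reason the paper can avoid restrictions on that correlation structure.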
By:  Jiaying Gu (Institute for Fiscal Studies); Roger Koenker (Institute for Fiscal Studies and University of Illinois) 
Abstract:  Empirical Bayes methods for Gaussian compound decision problems involving longitudinal data are considered. The new convex optimization formulation of the nonparametric (Kiefer-Wolfowitz) maximum likelihood estimator for mixture models is employed to construct nonparametric Bayes rules for compound decisions. The methods are first illustrated with some simulation examples and then with an application to models of income dynamics. Using PSID data we estimate a simple dynamic model of earnings that incorporates bivariate heterogeneity in intercept and variance of the innovation process. Profile likelihood is employed to estimate an AR(1) parameter controlling the persistence of the innovations. We find that persistence is relatively modest, ≈ 0.48, when we permit heterogeneity in variances. Evidence of negative dependence between individual intercepts and variances is revealed by the nonparametric estimation of the mixing distribution, and has important consequences for forecasting future income trajectories. 
Date:  2014–11 
URL:  http://d.repec.org/n?u=RePEc:ifs:cemmap:43/14&r=ecm 
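The Kiefer-Wolfowitz NPMLE in the abstract above is computed via a convex optimization formulation; as a hedged illustration of the same object, the sketch below fits the mixing distribution on a fixed grid by EM instead — a standard but slower alternative — for a toy Gaussian location mixture.

```python
import numpy as np

def npmle_grid_em(y, grid, iters=500):
    """Grid approximation to the Kiefer-Wolfowitz NPMLE, fitted by EM.

    Model: y_i | theta ~ N(theta, 1), theta ~ G, with G supported on
    `grid`. The paper uses a convex-optimization formulation instead;
    this EM version is a simple substitute for illustration only.
    """
    L = np.exp(-0.5 * (y[:, None] - grid[None, :]) ** 2)  # likelihoods up to a constant
    w = np.full(len(grid), 1.0 / len(grid))               # uniform starting weights
    for _ in range(iters):
        P = L * w                           # E-step: unnormalized posteriors
        P /= P.sum(axis=1, keepdims=True)   # posterior responsibilities
        w = P.mean(axis=0)                  # M-step: update mixing weights
    return w

rng = np.random.default_rng(3)
# True mixing distribution: two equal atoms at -2 and +2
theta = rng.choice([-2.0, 2.0], size=2000)
y = theta + rng.standard_normal(2000)
grid = np.linspace(-4.0, 4.0, 81)
w = npmle_grid_em(y, grid)
```

The estimated weights concentrate near the two true atoms, illustrating how the NPMLE recovers a discrete mixing distribution without any parametric assumption on G.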
By:  Stefan Hoderlein (Institute for Fiscal Studies and Boston College); Hajo Holzmann (Institute for Fiscal Studies); Alexander Meister (Institute for Fiscal Studies) 
Abstract:  The triangular model is a very popular way to capture endogeneity. In this model, an outcome is determined by an endogenous regressor, which in turn is caused by an instrument in a first stage. In this paper, we study the triangular model with random coefficients and exogenous regressors in both equations. We establish a profound non-identification result: the joint distribution of the random coefficients is not identified, implying that counterfactual outcomes are also not identified in general. This result continues to hold if we confine ourselves to the joint distribution of coefficients in the outcome equation or any marginal, except the one on the endogenous regressor. Identification continues to fail even if we focus on means of random coefficients (implying that IV is generally biased), or let the instrument enter the first stage in a monotonic fashion. Based on this insight, we derive bounds on the joint distribution of random parameters, and suggest an additional restriction that allows us to point-identify the distribution of random coefficients in the outcome equation. We extend this framework to cover the case where the regressors and instruments have limited support, and analyze semi- and nonparametric sample counterpart estimators in finite and large samples. Finally, we give an application of the framework to consumer demand. 
Keywords:  Random Coefficients, Endogeneity, Nonparametric, Identification, Radon Transform, Demand. 
Date:  2015–06 
URL:  http://d.repec.org/n?u=RePEc:ifs:cemmap:33/15&r=ecm 
By:  LeYu Chen (Institute for Fiscal Studies and Academia Sinica); Sokbae Lee (Institute for Fiscal Studies and cemmap and SNU) 
Abstract:  This paper studies inference of preference parameters in semiparametric discrete choice models when these parameters are not point-identified and the identified set is characterized by a class of conditional moment inequalities. Exploiting the semiparametric modeling restrictions, we show that the identified set can be equivalently formulated by moment inequalities conditional on only two continuous indexing variables. Such a formulation holds regardless of the covariate dimension, thereby breaking the curse of dimensionality for nonparametric inference based on the underlying conditional moment inequalities. We also extend this dimension-reducing characterization result to a variety of semiparametric models under which the sign of the conditional expectation of a certain transformation of the outcome is the same as that of the indexing variable. 
Date:  2015–06 
URL:  http://d.repec.org/n?u=RePEc:ifs:cemmap:26/15&r=ecm 
By:  Larry G. Epstein (Institute for Fiscal Studies and Boston University); Hiroaki Kaido (Institute for Fiscal Studies and Boston University); Kyoungwon Seo (Institute for Fiscal Studies and Korea Advanced Institute of Science and Technology (KAIST)) 
Abstract:  Call an economic model incomplete if it does not generate a probabilistic prediction even given knowledge of all parameter values. We propose a method of inference about unknown parameters for such models that is robust to heterogeneity and dependence of unknown form. The key is a Central Limit Theorem for belief functions; robust confidence regions are then constructed in a fashion paralleling the classical approach. Monte Carlo simulations support tractability of the method and demonstrate its enhanced robustness relative to existing methods. 
Date:  2015–04 
URL:  http://d.repec.org/n?u=RePEc:ifs:cemmap:20/15&r=ecm 
By:  Oliver Linton (Institute for Fiscal Studies and cemmap and Cambridge); Katja Smetanina (Institute for Fiscal Studies) 
Abstract:  We propose an alternative ratio statistic for measuring predictability of stock prices. Our statistic is based on actual returns rather than logarithmic returns and is therefore better suited to capturing price predictability. It captures not only linear dependence, in the same way as the variance ratio statistics of Lo and MacKinlay (1988), but also some nonlinear dependencies. We derive the asymptotic distribution of the statistic under the null hypothesis that simple gross returns are unpredictable after a constant mean adjustment. This represents a test of the weak form of the Efficient Market Hypothesis. We also consider the multivariate extension; in particular, we derive the restrictions implied by the EMH on multi-period portfolio gross returns. We apply our methodology to test the gross return predictability of various financial series. 
Keywords:  Variance Ratio Tests, Martingale, Predictability 
JEL:  C10 C22 G10 G14 
Date:  2015–02 
URL:  http://d.repec.org/n?u=RePEc:ifs:cemmap:08/15&r=ecm 
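For orientation, the classical Lo and MacKinlay variance ratio that the abstract above builds on can be sketched as follows; the paper's modified ratio statistic for gross returns is not reproduced here, and the simulated i.i.d. "returns" are illustrative only.

```python
import numpy as np

def variance_ratio(r, q):
    """Classical Lo-MacKinlay variance ratio VR(q): the variance of
    q-period returns divided by q times the variance of one-period
    returns. Under no linear predictability, VR(q) is close to 1."""
    r = np.asarray(r, dtype=float)
    mu = r.mean()
    var1 = np.mean((r - mu) ** 2)
    rq = np.convolve(r, np.ones(q), mode="valid")  # overlapping q-period sums
    varq = np.mean((rq - q * mu) ** 2)
    return varq / (q * var1)

rng = np.random.default_rng(7)
r = rng.standard_normal(5000) * 0.01   # i.i.d. returns: VR(2) should be near 1
vr2 = variance_ratio(r, 2)
```

VR(2) equals approximately 1 plus the first-order autocorrelation of returns, which is why values far from 1 signal linear predictability.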
By:  Arun Advani (Institute for Fiscal Studies); Bansi Malde (Institute for Fiscal Studies) 
Abstract:  In many contexts we may be interested in understanding whether direct connections between agents, such as declared friendships in a classroom or family links in a rural village, affect their outcomes. In this paper we review the literature studying econometric methods for the analysis of social networks. We begin by providing a common framework for models of social effects, a class that includes the `linear-in-means' local average model, the local aggregate model, and models where network statistics affect outcomes. We discuss identification of these models using both observational and experimental/quasi-experimental data. We then discuss models of network formation, drawing on a range of literatures to cover purely predictive models, reduced form models, and structural models, including those with a strategic element. Finally we discuss how one might collect data on networks, and the measurement error issues caused by sampling of networks, as well as measurement error more broadly. 
Keywords:  Networks, Social Effects, Peer Effects, Econometrics, Endogeneity, Measurement Error, Sampling Design 
Date:  2014–12 
URL:  http://d.repec.org/n?u=RePEc:ifs:ifsewp:14/34&r=ecm 
By:  Victor Chernozhukov (Institute for Fiscal Studies and MIT); Denis Chetverikov (Institute for Fiscal Studies and UCLA); Kengo Kato (Institute for Fiscal Studies) 
Abstract:  In this paper, we derive central limit and bootstrap theorems for probabilities that centered high-dimensional vector sums hit rectangles and sparsely convex sets. Specifically, we derive Gaussian and bootstrap approximations for the probabilities that a root-n rescaled sample average of Xi is in A, where X1, ..., Xn are independent random vectors in R^p and A is a rectangle, or, more generally, a sparsely convex set, and show that the approximation error converges to zero even if p = p_n → ∞ and p >> n; in particular, p can be as large as O(exp(Cn^c)) for some constants c, C > 0. The result holds uniformly over all rectangles, or, more generally, sparsely convex sets, and does not require any restrictions on the correlation among components of Xi. Sparsely convex sets are sets that can be represented as intersections of many convex sets whose indicator functions depend nontrivially only on a small subset of their arguments, with rectangles being a special case. 
Date:  2014–12 
URL:  http://d.repec.org/n?u=RePEc:ifs:cemmap:49/14&r=ecm 
By:  Tsagris, Michail 
Abstract:  In compositional data, an observation is a vector with nonnegative components which sum to a constant, typically 1. Data of this type arise in many areas, such as geology, archaeology, biology, economics and political science, among others. The goal of this paper is to extend the taxicab metric and a newly suggested metric for compositional data by employing a power transformation. Both metrics are to be used in the k-nearest neighbours algorithm regardless of the presence of zeros. Examples with real data are exhibited. 
Keywords:  compositional data, entropy, kNN algorithm, metric, supervised classification 
JEL:  C18 
Date:  2014–07 
URL:  http://d.repec.org/n?u=RePEc:pra:mprapa:65866&r=ecm 
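A minimal sketch of the approach in the abstract above: a power transformation that re-closes compositions to the simplex, followed by k-nearest-neighbours classification under the taxicab (L1) metric. The data, parameter values, and function names are illustrative assumptions, not the paper's exact specification.

```python
import numpy as np
from collections import Counter

def alpha_power(X, a):
    """Power transformation for compositional data: raise each component
    to the power a and re-close so rows again sum to 1 (well defined in
    the presence of zeros for a > 0)."""
    P = X ** a
    return P / P.sum(axis=1, keepdims=True)

def knn_taxicab(X_train, y_train, X_test, k=3, a=0.5):
    """k-nearest-neighbours classifier on the simplex using the taxicab
    (L1) metric after a power transformation."""
    U, V = alpha_power(X_train, a), alpha_power(X_test, a)
    preds = []
    for v in V:
        d = np.abs(U - v).sum(axis=1)              # taxicab distances
        nn = y_train[np.argsort(d)[:k]]            # labels of k nearest points
        preds.append(Counter(nn).most_common(1)[0][0])
    return np.array(preds)

rng = np.random.default_rng(5)
X_train = np.vstack([rng.dirichlet([5, 1, 1], size=100),   # class 0: mass on component 1
                     rng.dirichlet([1, 1, 5], size=100)])  # class 1: mass on component 3
y_train = np.array([0] * 100 + [1] * 100)
X_test = np.array([[0.8, 0.1, 0.1], [0.1, 0.1, 0.8]])
pred = knn_taxicab(X_train, y_train, X_test, k=5, a=0.5)
```

The power parameter a controls how strongly small components are inflated before distances are computed; a = 1 recovers the plain taxicab metric on the raw compositions.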
By:  Alexandre Belloni (Institute for Fiscal Studies); Victor Chernozhukov (Institute for Fiscal Studies and MIT); Christian Hansen (Institute for Fiscal Studies and Chicago GSB) 
Abstract:  The goal of many empirical papers in economics is to provide an estimate of the causal or structural effect of a change in a treatment or policy variable, such as a government intervention or a price, on another economically interesting variable, such as unemployment or the amount of a product purchased. Applied economists attempting to estimate such structural effects face the problems that economically interesting quantities like government policies are rarely randomly assigned and that the available data are often high-dimensional. Failure to address either of these issues generally leads to incorrect inference about structural effects. Methodology that is appropriate for estimating and performing inference about these effects when treatment is not randomly assigned and there are many potential control variables therefore provides a useful addition to the tools available to applied economists. 
Date:  2013–11 
URL:  http://d.repec.org/n?u=RePEc:ifs:cemmap:59/13&r=ecm 
By:  Seok Young Hong (Institute for Fiscal Studies); Oliver Linton (Institute for Fiscal Studies and cemmap and Cambridge); Hui Jun Zhang (Institute for Fiscal Studies) 
Abstract:  We propose several multivariate variance ratio statistics. We derive the asymptotic distribution of the statistics and scalar functions thereof under the null hypothesis that returns are unpredictable after a constant mean adjustment (i.e., under the weak form Efficient Market Hypothesis). We do not impose the no-leverage assumption of Lo and MacKinlay (1988), but our asymptotic standard errors are relatively simple and in particular do not require the selection of a bandwidth parameter. We extend the framework to allow for a time-varying risk premium through common systematic factors. We show the limiting behaviour of the statistic under a multivariate fads model and under a moderately explosive bubble process: these alternative hypotheses give opposite predictions with regard to the long-run value of the statistics. We apply the methodology to five weekly size-sorted CRSP portfolio returns from 1962 to 2013 in three sub-periods. We find evidence of a reduction of linear predictability in the most recent period for small and medium cap stocks. The main findings are not substantially affected by allowing for a common-factor time-varying risk premium. 
Keywords:  Bubbles; Fads; Martingale; Momentum; Predictability 
JEL:  C10 C32 G10 G12 
Date:  2015–03 
URL:  http://d.repec.org/n?u=RePEc:ifs:cemmap:13/15&r=ecm 
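As background to the entry above, the classical univariate Lo-MacKinlay variance ratio that such statistics generalize can be sketched in a few lines. This is a generic illustration of the univariate ratio under a constant-mean adjustment, not the multivariate statistics proposed in the paper:

```python
import numpy as np

def variance_ratio(returns, q):
    """Lo-MacKinlay variance ratio: variance of overlapping q-period
    return sums divided by q times the one-period variance. Under the
    random walk null the ratio is close to 1; positive autocorrelation
    pushes it above 1, mean reversion below 1."""
    r = np.asarray(returns, dtype=float)
    r = r - r.mean()                               # constant-mean adjustment
    var1 = (r ** 2).sum() / len(r)                 # one-period variance
    rq = np.convolve(r, np.ones(q), mode="valid")  # overlapping q-sums
    varq = (rq ** 2).sum() / len(rq)
    return varq / (q * var1)
```

For i.i.d. returns the ratio hovers near 1 at any horizon q, while a persistent (e.g. AR(1)) return series drives it well above 1.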
By:  Joakim Westerlund (Deakin University); Paresh K Narayan (Deakin University); Xinwei Zheng (Deakin University) 
Abstract:  This paper proposes a simple panel data test for stock return predictability that is flexible enough to accommodate three key salient features of the data, namely predictor persistence, predictor endogeneity, and cross-sectional dependence. Using a large panel of Chinese stock market data comprising more than one million observations, we show that most financial and macroeconomic predictors are in fact able to predict returns. We also show how the extent of the predictability varies across industries and firm sizes. 
Keywords:  Panel data; Bias; Cross-section dependence; Predictive regression; Stock return predictability; China 
JEL:  C22 C23 G1 G12 
URL:  http://d.repec.org/n?u=RePEc:dkn:ecomet:fe_2015_11&r=ecm 
By:  Andrew Chesher (Institute for Fiscal Studies and cemmap and UCL); Adam Rosen (Institute for Fiscal Studies and cemmap and UCL) 
Abstract:  We study a generalization of the treatment effect model in which an observed discrete classifier indicates in which one of a set of counterfactual processes a decision maker is observed. The other observed outcomes are delivered by the particular counterfactual process in which the decision maker is found. Models of the counterfactual processes can be incomplete in the sense that even with knowledge of the values of observed exogenous and unobserved variables they may not deliver a unique value of the endogenous outcomes. We study the identifying power of models of this sort that incorporate (i) conditional independence restrictions under which unobserved variables and the classifier variable are stochastically independent conditional on some of the observed exogenous variables and (ii) marginal independence restrictions under which unobservable variables and a subset of the exogenous variables are independently distributed. Building on results in Chesher and Rosen (2014a), we characterize the identifying power of these models for fundamental structural relationships and probability distributions and for interesting functionals of these objects, some of which may be point identified. In one example of an application, we observe the entry decisions of firms that can choose which of a number of markets to enter and we observe various endogenous outcomes delivered in the markets they choose to enter. 
Date:  2015–06 
URL:  http://d.repec.org/n?u=RePEc:ifs:cemmap:22/15&r=ecm 
By:  Victor Chernozhukov (Institute for Fiscal Studies and MIT); Alfred Galichon (Institute for Fiscal Studies and Sciences Po, Paris); Marc Hallin (Institute for Fiscal Studies and Université Libre de Bruxelles); Marc Henry (Institute for Fiscal Studies) 
Abstract:  We propose new concepts of statistical depth, multivariate quantiles, ranks and signs, based on canonical transportation maps between a distribution of interest on R^d and a reference distribution on the d-dimensional unit ball. The new depth concept, called Monge-Kantorovich depth, specializes to halfspace depth in the case of elliptical distributions but, for more general distributions, differs from the latter in the ability of its contours to account for non-convex features of the distribution of interest. We propose empirical counterparts to the population versions of those Monge-Kantorovich depth contours, quantiles, ranks and signs, and show their consistency by establishing a uniform convergence property for empirical transport maps, which is of independent interest. 
Keywords:  Statistical depth, vector quantiles, vector ranks, multivariate signs, empirical transport maps, uniform convergence of empirical transport 
Date:  2015–01 
URL:  http://d.repec.org/n?u=RePEc:ifs:cemmap:04/15&r=ecm 
By:  Borgini, Riccardo; Bianco, Paola Del; Salvati, Nicola; Schmid, Timo; Tzavidis, Nikos 
Abstract:  Health-related quality of life assessment is important in the clinical evaluation of patients with metastatic disease and may offer useful information for understanding the clinical effectiveness of a treatment. To assess whether a set of explanatory variables affects health-related quality of life, regression models are routinely adopted. However, the interest of researchers may be focussed on modelling other parts (e.g. quantiles) of this conditional distribution. In this paper we present an approach based on M-quantile regression to achieve this goal. We applied the proposed methodology to a prospective, randomized, multicentre clinical trial. In order to take into account the hierarchical nature of the data, we extended the M-quantile regression model to a three-level random effects specification and estimated it by maximum likelihood. 
Keywords:  hierarchical data, influence function, robust estimation, quantile regression, multilevel modelling 
Date:  2015 
URL:  http://d.repec.org/n?u=RePEc:zbw:fubsbe:201519&r=ecm 
By:  Ostap Okhrin; Alexander Ristig; Jeffrey Sheen; Stefan Trück 
Abstract:  Financial contagion and systemic risk measures are commonly derived from conditional quantiles by using imposed model assumptions such as a linear parametrization. In this paper, we provide model-free measures for contagion and systemic risk which are independent of the specification of conditional quantiles and simple to interpret. The proposed systemic risk measure relies on the contagion measure, whose tail behavior is theoretically studied. To emphasize contagion from extreme events, conditional quantiles are specified via a hierarchical Archimedean copula. The parameters and structure of this copula are simultaneously estimated by imposing a nonconcave penalty on the structure. Asymptotic properties of this sparse estimator are derived and small sample properties illustrated using simulations. We apply the proposed framework to investigate the interconnectedness between American, European and Australasian stock market indices, providing new and interesting insights into the relationship between systemic risk and contagion. In particular, our findings suggest that the systemic risk contribution from contagion in tail areas is typically lower during times of financial turmoil, while it can be significantly higher during periods of low volatility. 
Keywords:  Conditional quantile, Copula, Financial contagion, Spillover effect, Stepwise penalized ML estimation, Systemic risk, Tail dependence 
JEL:  C40 C46 C51 G1 G2 
Date:  2015–08 
URL:  http://d.repec.org/n?u=RePEc:hum:wpaper:sfb649dp2015038&r=ecm 
By:  Paresh K Narayan (Deakin University); Ruipeng Liu (Deakin University) 
Abstract:  In this paper we propose a generalised autoregressive conditional heteroskedasticity (GARCH) model-based test for a unit root. The model allows for two endogenous structural breaks. We test for unit roots in 156 US stocks listed on the NYSE over the period 1980 to 2007. We find that the unit root null hypothesis is rejected in 40% of the stocks, and that the null is rejected for over 50% of stocks in only four of the nine sectors. We conclude with an economic significance analysis, showing that stocks with mean-reverting prices mostly tend to outperform stocks with nonstationary prices. 
Keywords:  Efficient Market Hypothesis; GARCH; Unit Root; Structural Break; Stock Price. 
URL:  http://d.repec.org/n?u=RePEc:dkn:ecomet:fe_2015_01&r=ecm 
By:  Áureo de Paula (Institute for Fiscal Studies and University College London); Seth Richards-Shubik (Institute for Fiscal Studies); Elie Tamer (Institute for Fiscal Studies and Northwestern University) 
Abstract:  This paper provides a framework for identifying preferences in a large network under the assumption of pairwise stability of network links. Network data present difficulties for identification, especially when links between nodes in a network can be interdependent: e.g., where indirect connections matter. Given a preference specification, we use the observed proportions of various possible payoff-relevant local network structures to learn about the underlying parameters. We show how one can map the observed proportions of these local structures to sets of parameters that are consistent with the model and the data. Our main result provides necessary conditions for parameters to belong to the identified set, and this result holds for a wide class of models. We also provide sufficient conditions, and hence a characterization of the identified set, for two empirically relevant classes of specifications. An interesting feature of our approach is the use of the economic model under pairwise stability as a vehicle for effective dimension reduction. The paper then provides a quadratic programming algorithm that can be used to construct the identified sets. This algorithm is illustrated with a pair of simulation exercises. 
Date:  2015–06 
URL:  http://d.repec.org/n?u=RePEc:ifs:cemmap:29/15&r=ecm 
By:  Juraj Hucek (National Bank of Slovakia, Economic and Monetary Analyses Department); Alexander Karsay (National Bank of Slovakia, Economic and Monetary Analyses Department); Marian Vavra (National Bank of Slovakia, Research Department) 
Abstract:  This occasional paper considers the problem of forecasting, nowcasting, and backcasting the Slovak real GDP growth rate using approximate factor models. Three different versions of approximate factor models are proposed. A forecast comparison with other models, such as bridge equation models and ARMA models, is also provided. Our results reveal that factor models clearly outperform an ARMA model and can compete with the bridge models currently used at the Bank. We therefore intend to incorporate factor models into the regular forecasting process at the Bank. Finally, we hold the view that future research should be devoted to further improvements of bridge models, since these models are simple to construct, easy to understand, and widely used in central banks. 
Keywords:  factor models, principal components, bridge equations, short-term forecasting, GDP 
JEL:  C22 C38 C52 C53 E27 
Date:  2015–07 
URL:  http://d.repec.org/n?u=RePEc:svk:wpaper:1035&r=ecm 
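A minimal sketch of the two generic building blocks behind entries like the one above — principal-components factor extraction from a panel of indicators, followed by a bridge-style regression of GDP growth on the estimated factor. The standardization choice, the one-factor setting, and the plain OLS step are illustrative assumptions, not the Bank's actual models:

```python
import numpy as np

def extract_factors(X, r=1):
    """Approximate factor model: estimate r static factors by principal
    components from a T x N panel of indicators, standardized column-wise."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    # eigh returns eigenvalues in ascending order; take the leading r
    vals, vecs = np.linalg.eigh(Z.T @ Z / len(Z))
    return Z @ vecs[:, ::-1][:, :r]

def bridge_regression(F, y):
    """Bridge-style OLS of the target (e.g. GDP growth) on the estimated
    factors; returns coefficients including an intercept."""
    A = np.column_stack([np.ones(len(F)), F])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return beta
```

Note the estimated factor is only identified up to sign and scale, which the regression step absorbs into its coefficient.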
By:  William Larson (Federal Housing Finance Agency) 
Abstract:  There is a debate in the literature on the best method to forecast an aggregate: (1) forecast the aggregate directly, (2) forecast the disaggregates and then aggregate, or (3) forecast the aggregate using disaggregate information. This paper contributes to this debate by suggesting that in the presence of moderate-sized structural breaks in the disaggregates, approach (2) is preferred because of the low power to detect mean shifts in the disaggregates using models of aggregates. Two exercises support this approach. First, a simple Monte Carlo study demonstrates theoretical forecasting improvements. Second, empirical evidence is given using pseudo-ex-ante forecasts of aggregate proven oil reserves in the United States. 
Keywords:  Model selection; Intercept correction; Forecast robustification 
JEL:  C52 C53 Q3 
Date:  2015–07 
URL:  http://d.repec.org/n?u=RePEc:gwc:wpaper:2015002&r=ecm 
By:  Felix Pretis 
Abstract:  Climate policy target variables, including emissions and concentrations of greenhouse gases as well as global mean temperatures, are nonstationary time series, invalidating the use of standard statistical inference procedures. Econometric cointegration analysis can be used to overcome some of these inferential difficulties; however, cointegration has been criticised in climate research for lacking a physical justification for its use. Here I show that a physical two-component energy balance model of global mean climate is equivalent to a cointegrated system that can be mapped to a cointegrated vector autoregression, making it directly testable and providing a physical justification for econometric methods in climate research. Doing so opens the door to investigating the empirical impacts of shifts from both natural and human sources, and enables a close linking of data-based macroeconomic models with climate systems. My approach finds statistical support for the model using global mean surface temperatures, 0-700m ocean heat content and radiative forcing (e.g. from greenhouse gases). The model results show that previous empirical estimates of the temperature response to the doubling of CO2 may be misleadingly low due to model misspecification. 
Keywords:  Cointegration; VAR; Climate; Energy Balance 
JEL:  C32 Q54 
Date:  2015–06–25 
URL:  http://d.repec.org/n?u=RePEc:oxf:wpaper:750&r=ecm 
By:  Kevin Fergusson; Eckhard Platen (Finance Discipline Group, UTS Business School, University of Technology, Sydney) 
Abstract:  The application of maximum likelihood estimation to stochastic short rate models is not well studied because of the cumbersome detail of this approach; we investigate its applicability here. We restrict our consideration to three important short rate models, namely the Vasicek, Cox-Ingersoll-Ross and 3/2 short rate models, each having a closed-form formula for the transition density function. The parameters of the three interest rate models are fitted to US cash rates and are found to be consistent with market assessments. 
Keywords:  Stochastic short rate; maximum likelihood estimation; Vasicek model; Cox-Ingersoll-Ross model; 3/2 model 
Date:  2015–07–01 
URL:  http://d.repec.org/n?u=RePEc:uts:rpaper:361&r=ecm 
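For the Vasicek case mentioned above, the closed-form Gaussian transition density makes exact maximum likelihood straightforward to sketch. The parametrization dr = kappa*(theta - r)dt + sigma*dW, the starting values, and the Nelder-Mead optimizer below are illustrative choices, not the authors' implementation:

```python
import numpy as np
from scipy.optimize import minimize

def vasicek_nll(params, rates, dt):
    """Exact negative log-likelihood of the Vasicek model
    dr = kappa*(theta - r)dt + sigma*dW, using the closed-form Gaussian
    transition density of the discretely observed process."""
    kappa, theta, sigma = params
    if kappa <= 0 or sigma <= 0:
        return np.inf  # keep the optimizer inside the valid region
    e = np.exp(-kappa * dt)
    mean = rates[:-1] * e + theta * (1 - e)
    var = sigma ** 2 * (1 - e ** 2) / (2 * kappa)
    resid = rates[1:] - mean
    return 0.5 * np.sum(np.log(2 * np.pi * var) + resid ** 2 / var)

def fit_vasicek(rates, dt, start=(1.0, 0.05, 0.02)):
    """Maximize the exact likelihood over (kappa, theta, sigma)."""
    res = minimize(vasicek_nll, start, args=(rates, dt), method="Nelder-Mead")
    return res.x
```

Because the transition density is exactly Gaussian, no discretization bias enters; the same template would need the non-central chi-square density for Cox-Ingersoll-Ross.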
By:  Grundl, Serafin J. (Board of Governors of the Federal Reserve System (U.S.)); Zhu, Yu (University of Leicester) 
Abstract:  This paper exploits variation in the number of bidders to separately identify the valuation distribution and the bidders' belief about the valuation distribution in first-price auctions with independent private values. Exploiting variation in auction volume, the result is extended to environments with risk-averse bidders. In an illustrative application we fail to reject the null hypothesis of correct beliefs. 
Keywords:  Biased beliefs; first-price auction; nonparametric identification; risk aversion 
Date:  2015–07–23 
URL:  http://d.repec.org/n?u=RePEc:fip:fedgfe:201556&r=ecm 
By:  Benjamin Wong; Varang Wiriyawit (Reserve Bank of New Zealand) 
Abstract:  We highlight how detrending within Structural Vector Autoregressions (SVARs) is directly linked to shock identification. The consequences of trend misspecification are investigated using a prototypical Real Business Cycle model as the Data Generating Process. Decomposing the different sources of bias in the estimated impulse response functions, we find that the biases arising directly from trend misspecification are not trivial when compared to other widely studied misspecifications. Misspecifying the trend can also distort the impulse response functions of even the correctly detrended variables within the SVAR system. A possible solution hinted at by our analysis is that increasing the lag order when estimating the SVAR may mitigate some of the biases associated with trend misspecification. 
Date:  2015–04 
URL:  http://d.repec.org/n?u=RePEc:nzb:nzbdps:2015/02&r=ecm 
By:  Mario Cerrato; John Crosby; Minjoo Kim; Yang Zhao 
Abstract:  We investigate the dynamic and asymmetric dependence structure between equity portfolios from the US and UK. We demonstrate the statistical significance of dynamic asymmetric copula models in modelling and forecasting market risk. First, we construct “high-minus-low” equity portfolios sorted on beta, coskewness, and cokurtosis. We find substantial evidence of dynamic and asymmetric dependence between characteristic-sorted portfolios. Second, we consider a dynamic asymmetric copula model by combining the generalized hyperbolic skewed t copula with the generalized autoregressive score (GAS) model to capture both the multivariate non-normality and the dynamic and asymmetric dependence between equity portfolios. We demonstrate its usefulness by evaluating the forecasting performance of Value-at-Risk and Expected Shortfall for the high-minus-low portfolios. From backtesting, we find consistent and robust evidence that our dynamic asymmetric copula model provides the most accurate forecasts, indicating the importance of incorporating the dynamic and asymmetric dependence structure in risk management. 
Keywords:  asymmetry, tail dependence, dependence dynamics, dynamic skewed t copulas, VaR and ES forecasting 
JEL:  C32 C53 G17 G32 
Date:  2015–02 
URL:  http://d.repec.org/n?u=RePEc:gla:glaewp:2015_15&r=ecm 
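VaR backtesting of the kind described above is often summarized with the Kupiec proportion-of-failures test. The sketch below is a generic illustration of that standard unconditional-coverage test, not the authors' backtesting procedure:

```python
import numpy as np
from scipy.stats import chi2
from scipy.special import xlogy

def kupiec_pof(violations, n, alpha):
    """Kupiec proportion-of-failures likelihood-ratio test for
    unconditional VaR coverage: H0 says the probability that the loss
    exceeds the VaR forecast equals alpha. Returns (LR statistic,
    p-value); under H0 the LR statistic is asymptotically chi2(1)."""
    x = violations
    phat = x / n
    # binomial log-likelihood; xlogy handles the x = 0 and x = n edge cases
    logl = lambda q: xlogy(x, q) + xlogy(n - x, 1 - q)
    lr = -2 * (logl(alpha) - logl(phat))
    return lr, chi2.sf(lr, df=1)
```

With 250 trading days and a 1% VaR, roughly 2 or 3 violations are expected; counts far above that reject correct coverage.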