
on Econometrics 
By:  Joel L. Horowitz (Institute for Fiscal Studies and Northwestern University); Anand Krishnamurthy (Institute for Fiscal Studies and Cornerstone Research) 
Abstract:  This paper is concerned with inference about the conditional quantile function in a nonparametric quantile regression model. Any method for constructing a confidence interval or band for this function must deal with the asymptotic bias of nonparametric estimators of the function. In estimation methods such as local polynomial estimation, this is usually done through undersmoothing or explicit bias correction. The latter usually requires oversmoothing. However, there are no satisfactory empirical methods for selecting bandwidths that under- or oversmooth. This paper extends the bootstrap method of Hall and Horowitz (2013) for conditional mean functions to conditional quantile functions. The paper also shows how the bootstrap method can be used to obtain uniform confidence bands. The bootstrap method uses only bandwidths that are selected by standard methods such as cross-validation and plug-in. It does not use under- or oversmoothing. The results of Monte Carlo experiments illustrate the numerical performance of the bootstrap method. 
Keywords:  Quantile regression; smoothing; confidence band; bootstrap 
Date:  2017–01–12 
URL:  http://d.repec.org/n?u=RePEc:ifs:cemmap:01/17&r=ecm 
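The basic ingredients (a kernel-based conditional quantile estimator plus a bootstrap distribution of its values) can be sketched as follows. This is a generic pairs bootstrap around a local-constant estimator, not the Hall and Horowitz (2013) procedure itself; the Gaussian kernel, the fixed bandwidth h, and the simulated data are illustrative assumptions.

```python
import numpy as np

def cond_quantile(x, y, x0, tau=0.5, h=0.3):
    """Local-constant (kernel-weighted) conditional tau-quantile at x0."""
    w = np.exp(-0.5 * ((x - x0) / h) ** 2)      # Gaussian kernel weights
    order = np.argsort(y)
    cdf = np.cumsum(w[order]) / w.sum()         # weighted CDF over sorted y
    return y[order][np.searchsorted(cdf, tau)]

def bootstrap_ci(x, y, x0, tau=0.5, h=0.3, B=500, alpha=0.05, seed=0):
    """Percentile interval from a pairs bootstrap of the quantile estimator."""
    rng = np.random.default_rng(seed)
    n = len(x)
    est = np.empty(B)
    for b in range(B):
        idx = rng.integers(0, n, n)             # resample (x, y) pairs
        est[b] = cond_quantile(x[idx], y[idx], x0, tau, h)
    return np.quantile(est, [alpha / 2, 1 - alpha / 2])

rng = np.random.default_rng(1)
x = rng.uniform(-1, 1, 2000)
y = np.sin(np.pi * x) + rng.normal(0, 0.3, 2000)
lo, hi = bootstrap_ci(x, y, x0=0.0)             # true median at x0 = 0 is 0
print(lo, hi)
```

A fixed h stands in for the data-driven bandwidth selection the paper allows; the point of the method is precisely that no under- or oversmoothing of such a bandwidth is needed.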
By:  Jose Diogo Barbosa (Institute for Fiscal Studies); Marcelo Moreira (Institute for Fiscal Studies and Fundação Getúlio Vargas) 
Abstract:  Lancaster (2002) proposes an estimator for the dynamic panel data model with homoskedastic errors and zero initial conditions. In this paper, we show this estimator is invariant to orthogonal transformations, but is inefficient because it ignores additional information available in the data. The zero initial condition is trivially satisfied by subtracting initial observations from the data. We show that differencing out the data further erodes efficiency compared to drawing inference conditional on the first observations. Finally, we compare the conditional method with standard random effects approaches for unobserved data. Standard approaches implicitly rely on normal approximations, which may not be reliable when unobserved data is very skewed with some mass at zero values. For example, panel data on firms naturally depend on the first period in which the firm enters a new state. It seems unreasonable then to assume that the process determining unobserved data is known or stationary. We can instead make inference on structural parameters by conditioning on the initial observations. 
Keywords:  Autoregressive, Panel Data, Invariance, Efficiency. 
JEL:  C12 C30 
Date:  2017–01–20 
URL:  http://d.repec.org/n?u=RePEc:ifs:cemmap:04/17&r=ecm 
By:  Nogales Martin, Fco. Javier; Lafit, Ginette 
Abstract:  Robust estimation of Gaussian graphical models in the high-dimensional setting is becoming increasingly important since large real data sets may contain outlying observations. These outliers can lead to drastically wrong inference on the intrinsic graph structure. Several procedures apply univariate transformations to make the data Gaussian distributed. However, these transformations do not work well in the presence of structural bivariate outliers. We propose a robust precision matrix estimator under the cellwise contamination mechanism that is robust against structural bivariate outliers. This estimator exploits robust pairwise weighted correlation coefficient estimates, where the weights are computed by the Mahalanobis distance with respect to an affine equivariant robust correlation coefficient estimator. We show that the convergence rate of the proposed estimator is the same as that of the correlation coefficient used to compute the Mahalanobis distance. We conduct numerical simulations under different contamination settings to compare the graph recovery performance of different robust estimators. Finally, the proposed method is applied to the classification of tumors using gene expression data. We show that our procedure can effectively recover the true graph under cellwise data contamination. 
Keywords:  Winsorization; Robust correlation estimation; Cellwise contamination; Gaussian graphical models 
Date:  2017–05–01 
URL:  http://d.repec.org/n?u=RePEc:cte:wsrepe:24534&r=ecm 
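A minimal sketch of the Mahalanobis-weighting idea: an initial rank-based correlation fixes a 2x2 shape matrix, points with a large Mahalanobis distance under that shape are dropped, and the correlation is recomputed on the remaining points. The hard chi-square(2) 0.99 cutoff and the Spearman-based initial estimate are simplifying assumptions for illustration, not the authors' exact estimator.

```python
import numpy as np

def robust_pairwise_corr(x, y, chi2_cut=9.21):
    """Pairwise correlation that rejects bivariate outliers by their
    Mahalanobis distance under an initial robust correlation fit."""
    def ranks(v):
        return np.argsort(np.argsort(v))
    # initial robust correlation: sine transform of Spearman's rho
    rho_s = np.corrcoef(ranks(x), ranks(y))[0, 1]
    r0 = 2 * np.sin(np.pi * rho_s / 6)
    # robust standardization by median and (scaled) MAD
    zx = (x - np.median(x)) / (1.4826 * np.median(np.abs(x - np.median(x))))
    zy = (y - np.median(y)) / (1.4826 * np.median(np.abs(y - np.median(y))))
    # squared Mahalanobis distance under the 2x2 shape with correlation r0
    d2 = (zx**2 - 2 * r0 * zx * zy + zy**2) / (1 - r0**2)
    keep = d2 <= chi2_cut            # chi2(2) 0.99 cutoff ~ 9.21
    return np.corrcoef(x[keep], y[keep])[0, 1]

rng = np.random.default_rng(2)
n = 1000
x = rng.standard_normal(n)
y = 0.8 * x + 0.6 * rng.standard_normal(n)      # true correlation 0.8
x[:50], y[:50] = 8.0, -8.0                      # 5% structural bivariate outliers
r_robust = robust_pairwise_corr(x, y)
r_plain = np.corrcoef(x, y)[0, 1]
print(r_robust, r_plain)
```

The contrast between r_robust and r_plain illustrates how a small fraction of structural bivariate outliers destroys the ordinary correlation while the weighted estimate survives.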
By:  Alexandre Belloni (Institute for Fiscal Studies); Victor Chernozhukov (Institute for Fiscal Studies and MIT); Denis Chetverikov (Institute for Fiscal Studies and UCLA); Ivan Fernandez-Val (Institute for Fiscal Studies and Boston University) 
Abstract:  Quantile regression (QR) is a principal regression method for analyzing the impact of covariates on outcomes. The impact is described by the conditional quantile function and its functionals. In this paper we develop the nonparametric QR-series framework, covering many regressors as a special case, for performing inference on the entire conditional quantile function and its linear functionals. In this framework, we approximate the entire conditional quantile function by a linear combination of series terms with quantile-specific coefficients and estimate the function-valued coefficients from the data. We develop large sample theory for the QR-series coefficient process, namely we obtain uniform strong approximations to the QR-series coefficient process by conditionally pivotal and Gaussian processes. Based on these two strong approximations, or couplings, we develop four resampling methods (pivotal, gradient bootstrap, Gaussian, and weighted bootstrap) that can be used for inference on the entire QR-series coefficient function. We apply these results to obtain estimation and inference methods for linear functionals of the conditional quantile function, such as the conditional quantile function itself, its partial derivatives, average partial derivatives, and conditional average partial derivatives. Specifically, we obtain uniform rates of convergence and show how to use the four resampling methods mentioned above for inference on the functionals. All of the above results are for function-valued parameters, holding uniformly in both the quantile index and the covariate value, and covering the pointwise case as a by-product. We demonstrate the practical utility of these results with an empirical example, where we estimate the price elasticity function and test the Slutsky condition of the individual demand for gasoline, as indexed by the individual unobserved propensity for gasoline consumption. 
Date:  2016–08–30 
URL:  http://d.repec.org/n?u=RePEc:ifs:cemmap:46/16&r=ecm 
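At a single quantile, the QR-series point estimator reduces to linear quantile regression on the series terms, which can be solved exactly as a linear program because the check-function objective is piecewise linear. The sketch below covers only this point estimation step, with a cubic polynomial basis standing in for a general series; it does not implement the paper's resampling inference.

```python
import numpy as np
from scipy.optimize import linprog

def quantile_regression(X, y, tau):
    """Solve min_b sum_i rho_tau(y_i - x_i'b) as a linear program:
    variables (b, u+, u-), constraint X b + u+ - u- = y with u+, u- >= 0,
    objective tau * sum(u+) + (1 - tau) * sum(u-)."""
    n, p = X.shape
    c = np.concatenate([np.zeros(p), tau * np.ones(n), (1 - tau) * np.ones(n)])
    A_eq = np.hstack([X, np.eye(n), -np.eye(n)])
    bounds = [(None, None)] * p + [(0, None)] * (2 * n)
    res = linprog(c, A_eq=A_eq, b_eq=y, bounds=bounds, method="highs")
    return res.x[:p]

# series approximation: cubic polynomial basis in a scalar regressor
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 500)
y = np.exp(x) + rng.normal(0, 0.2, 500)   # symmetric noise: median is exp(x)
Z = np.column_stack([np.ones_like(x), x, x**2, x**3])
b = quantile_regression(Z, y, tau=0.5)
fitted = Z @ b                            # estimated conditional median curve
```

Repeating the fit over a grid of tau values would trace out the quantile-specific coefficient process that the paper's couplings approximate.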
By:  Friedrich, Marina (QE / Econometrics); Smeekes, Stephan (QE / Econometrics); Urbain, Jean-Pierre 
Abstract:  In this paper a modified wild bootstrap method is presented to construct pointwise confidence intervals around a nonparametric deterministic trend model. We derive the asymptotic distribution of a nonparametric kernel estimator of the trend function under general conditions, which allow for serial correlation and heteroskedasticity. Asymptotic validity of the bootstrap method is established, and it is shown to work well in finite samples in an extensive simulation study. The bootstrap method has the potential to provide simultaneous confidence bands for the same models along the lines of Bühlmann (1998) and can be applied without further adjustments to missing data. We illustrate this by applying the proposed method to a time series of atmospheric ethane, which can be used as an indicator of atmospheric pollution and transport. 
Keywords:  autoregressive wild bootstrap, nonparametric estimation, time series, simultaneous confidence bands, trend estimation 
JEL:  C14 C22 
Date:  2017–05–01 
URL:  http://d.repec.org/n?u=RePEc:unm:umagsb:2017010&r=ecm 
By:  Victor Chernozhukov (Institute for Fiscal Studies and MIT); Denis Chetverikov (Institute for Fiscal Studies and UCLA); Kengo Kato (Institute for Fiscal Studies) 
Abstract:  This paper considers the problem of testing many moment inequalities where the number of moment inequalities, denoted by p, is possibly much larger than the sample size n. There is a variety of economic applications where the problem of testing many moment inequalities appears; a notable example is the market structure model of Ciliberto and Tamer (2009), where p = 2^(m+1) with m being the number of firms. We consider the test statistic given by the maximum of p Studentized (or t-type) statistics, and analyze various ways to compute critical values for the test statistic. Specifically, we consider critical values based upon (i) the union bound combined with a moderate deviation inequality for self-normalized sums, (ii) the multiplier and empirical bootstraps, and (iii) two-step and three-step variants of (i) and (ii) that incorporate selection of uninformative inequalities that are far from being binding and novel selection of weakly informative inequalities that are potentially binding but do not provide first-order information. We prove validity of these methods, showing that under mild conditions, they lead to tests with error in size decreasing polynomially in n while allowing for p being much larger than n; indeed p can be of order exp(n^c) for some c > 0. Importantly, all these results hold without any restriction on the correlation structure between the p Studentized statistics, and also hold uniformly with respect to suitably large classes of underlying distributions. Moreover, when p grows with n, we show that all of our tests are (minimax) optimal in the sense that they are uniformly consistent against alternatives whose "distance" from the null is larger than the threshold (2(log p)/n)^(1/2), while any test can only have trivial power in the worst case when the distance is smaller than the threshold. Finally, we show validity of a test based on a block multiplier bootstrap in the case of dependent data under some general mixing conditions. 
Keywords:  Many moment inequalities, moderate deviation, multiplier and empirical bootstrap, non-asymptotic bound, self-normalized sum. 
Date:  2016–08–26 
URL:  http://d.repec.org/n?u=RePEc:ifs:cemmap:42/16&r=ecm 
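The multiplier bootstrap in (ii) is easy to sketch for the max-of-Studentized-statistics test: recompute the maximum on Gaussian-reweighted, centered data and take the (1 - alpha) quantile as the critical value. This is a simplified one-step version without the paper's selection steps or refinements; the simulated design is illustrative.

```python
import numpy as np

def max_test(X, alpha=0.05, B=1000, seed=0):
    """Test statistic and multiplier-bootstrap critical value for
    H0: E[X_j] <= 0 for all j, using T = max_j sqrt(n) * mean_j / sd_j.
    The bootstrap reweights the centered, studentized data with i.i.d.
    N(0,1) multipliers; no assumption on cross-column correlation."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    mu = X.mean(axis=0)
    sd = X.std(axis=0, ddof=1)
    T = np.sqrt(n) * (mu / sd).max()
    Z = (X - mu) / sd                       # centered, studentized data
    Wmult = rng.standard_normal((B, n))
    T_boot = (Wmult @ Z / np.sqrt(n)).max(axis=1)
    return T, np.quantile(T_boot, 1 - alpha)

rng = np.random.default_rng(4)
n, p = 200, 50
X = rng.standard_normal((n, p))
X[:, 0] += 1.0                              # one clearly violated inequality
T, crit = max_test(X)
print(T, crit)
```

Because the critical value is computed from the data's own correlation structure, it adapts automatically to dependent columns, which is the point stressed in the abstract.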
By:  Stavros Stavroyiannis 
Abstract:  The majority of stylized facts of financial time series and several Value-at-Risk measures are modeled via univariate or multivariate GARCH processes. It is not rare that advanced GARCH models fail to converge for computational reasons, and a usual parsimonious approach is the GJR-GARCH model. There is disagreement in the literature and in the specialized econometric software about which constraints should be used for the parameters, indirectly introducing the distinction between asymmetry and leverage. We show that the approach used by various software packages is not consistent with the Nelson-Cao inequality constraints. Monte Carlo simulations show that, although the results are empirically correct, the estimated parameters are not theoretically coherent with the Nelson-Cao constraints for ensuring positivity of the conditional variances. On the other hand, ruling out the leverage hypothesis, the asymmetry term in the GJR model can take negative values when typical constraints, such as the conditions for the existence of the second and fourth moments, are imposed. 
Date:  2017–05 
URL:  http://d.repec.org/n?u=RePEc:arx:papers:1705.00535&r=ecm 
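For the GJR-GARCH(1,1) case the point at issue can be made concrete. The checker below implements the common textbook sufficient conditions (under which the asymmetry term gamma may be negative as long as alpha + gamma >= 0) and the symmetric-innovation stationarity condition; it is a sketch of those standard constraints, not of the full Nelson-Cao inequalities for higher-order models.

```python
def gjr_garch11_constraints(omega, alpha, gamma, beta):
    """Constraint checks for the GJR-GARCH(1,1) model
        h_t = omega + (alpha + gamma * 1{e_{t-1} < 0}) * e_{t-1}**2 + beta * h_{t-1}.
    Positivity needs the news-impact coefficient to be nonnegative in BOTH
    regimes: alpha (positive shocks) and alpha + gamma (negative shocks),
    so gamma itself may be negative, unlike the gamma >= 0 constraint some
    packages impose. The stationarity condition assumes symmetric
    innovations, i.e. P(e < 0) = 1/2."""
    positivity = omega > 0 and alpha >= 0 and beta >= 0 and alpha + gamma >= 0
    stationarity = alpha + gamma / 2 + beta < 1
    return {"positivity": positivity, "stationarity": stationarity}

ok = gjr_garch11_constraints(0.05, 0.10, -0.04, 0.85)   # negative asymmetry, still valid
bad = gjr_garch11_constraints(0.05, 0.02, -0.05, 0.90)  # alpha + gamma < 0
print(ok, bad)
```

The first parameter set is exactly the configuration the abstract describes: a negative asymmetry term that nonetheless keeps every conditional variance positive.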
By:  Joana Estevens; Paulo Rocha; Joao Boto; Pedro Lind 
Abstract:  We model non-stationary volume-price distributions with a lognormal distribution and collect the time series of its two parameters. The time series of the two parameters are shown to be stationary and Markov-like and consequently can be modelled with Langevin equations, which are derived directly from their series of values. Having the evolution equations of the lognormal parameters, we reconstruct the statistics of the first moments of the volume-price distributions, which fit the empirical data well. Finally, the proposed framework is general enough to study other non-stationary stochastic variables in other research fields, namely biology, medicine and geology. 
Date:  2017–04 
URL:  http://d.repec.org/n?u=RePEc:arx:papers:1705.01145&r=ecm 
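Extracting the two lognormal parameter series is straightforward, since the MLE of (mu, sigma) is just the mean and standard deviation of the logged data in each window; the first moment of each fitted distribution then follows as exp(mu + sigma^2/2). The rolling-window scheme and synthetic data below are assumptions for illustration; the paper works with volume-price data.

```python
import numpy as np

def rolling_lognormal_params(v, window=250):
    """MLE of lognormal (mu, sigma) on each rolling window of positive
    data v: the mean and standard deviation of log(v). Returns the two
    parameter time series described in the abstract."""
    logv = np.log(v)
    n = len(v)
    mu = np.array([logv[s:s + window].mean() for s in range(n - window + 1)])
    sig = np.array([logv[s:s + window].std(ddof=1) for s in range(n - window + 1)])
    return mu, sig

# synthetic check: lognormal data whose location parameter drifts upward
rng = np.random.default_rng(0)
true_mu = np.linspace(0.0, 1.0, 2000)
v = np.exp(true_mu + 0.5 * rng.standard_normal(2000))
mu, sig = rolling_lognormal_params(v)
first_moment = np.exp(mu + sig**2 / 2)      # reconstructed E[V] per window
```

The recovered mu series tracks the drifting location parameter, which is the kind of parameter time series the paper then models with Langevin equations.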
By:  Federico A. Bugni (Institute for Fiscal Studies and Duke University); Joel L. Horowitz (Institute for Fiscal Studies and Northwestern University) 
Abstract:  Economic data are often generated by stochastic processes that take place in continuous time, though observations may occur only at discrete times. For example, electricity and gas consumption take place in continuous time. Data generated by a continuous time stochastic process are called functional data. This paper is concerned with comparing two or more stochastic processes that generate functional data. The data may be produced by a randomized experiment in which there are multiple treatments. The paper presents a test of the hypothesis that the same stochastic process generates all the functional data. In contrast to existing methods, the test described here applies to both functional data and multiple treatments. The test is presented as a permutation test, which ensures that in a finite sample, the true and nominal probabilities of rejecting a correct null hypothesis are equal. The paper also presents the asymptotic distribution of the test statistic under alternative hypotheses. The results of Monte Carlo experiments and an application to an experiment on billing and pricing of natural gas illustrate the usefulness of the test. 
Keywords:  Functional data, permutation test, randomized experiment, hypothesis test 
JEL:  C12 C14 
Date:  2017–04–18 
URL:  http://d.repec.org/n?u=RePEc:ifs:cemmap:17/17&r=ecm 
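The exactness argument is easy to see in code: under the null the treatment labels are exchangeable, so recomputing the statistic over random relabelings gives a finite-sample-valid p-value. The sup-norm mean-difference statistic and two-group setting below are a simplified stand-in for the paper's multi-treatment test.

```python
import numpy as np

def permutation_test(curves_a, curves_b, B=999, seed=0):
    """Permutation test that two samples of curves (rows = functions on a
    common grid) were generated by the same process. Labels are
    exchangeable under H0, so the p-value is finite-sample exact up to
    Monte Carlo error from sampling B permutations."""
    rng = np.random.default_rng(seed)
    pooled = np.vstack([curves_a, curves_b])
    na, ntot = len(curves_a), len(pooled)

    def stat(idx_a):
        mask = np.zeros(ntot, dtype=bool)
        mask[idx_a] = True
        # sup-norm distance between the two group mean curves
        return np.max(np.abs(pooled[mask].mean(0) - pooled[~mask].mean(0)))

    t_obs = stat(np.arange(na))
    count = sum(stat(rng.permutation(ntot)[:na]) >= t_obs for _ in range(B))
    return (1 + count) / (1 + B)

rng = np.random.default_rng(5)
grid = np.linspace(0, 1, 50)
base = np.sin(2 * np.pi * grid)
sample_a = base + 0.3 * rng.standard_normal((20, 50))
sample_same = base + 0.3 * rng.standard_normal((20, 50))
sample_diff = base + 1.0 + 0.3 * rng.standard_normal((20, 50))
p_null = permutation_test(sample_a, sample_same)
p_alt = permutation_test(sample_a, sample_diff)
print(p_null, p_alt)
```

Each row plays the role of one functional observation (e.g. a household's consumption path); with more than two treatments the relabeling would run over all group assignments.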
By:  Massimo Franchi (Sapienza University of Rome); Søren Johansen (Department of Economics, University of Copenhagen) 
Abstract:  It is well known that inference on the cointegrating relations in a vector autoregression (CVAR) is difficult in the presence of a near unit root. The test for a given cointegration vector can have rejection probabilities under the null that vary from the nominal size to more than 90%. This paper formulates a CVAR model allowing for many near unit roots and analyses the asymptotic properties of the Gaussian maximum likelihood estimator. Then a critical value adjustment suggested by McCloskey for the test on the cointegrating relations is implemented, and it is found by simulation that it eliminates size distortions and has reasonable power for moderate values of the near unit root parameter. The findings are illustrated with an analysis of a number of different bivariate DGPs. 
Keywords:  Long-run inference, test on cointegrating relations, likelihood inference, vector autoregressive model, near unit roots, Bonferroni-type adjusted quantiles. 
JEL:  C32 
Date:  2017–04–24 
URL:  http://d.repec.org/n?u=RePEc:kud:kuiedp:1709&r=ecm 
By:  Raffaella Giacomini (Institute for Fiscal Studies and cemmap and UCL); Toru Kitagawa (Institute for Fiscal Studies and cemmap and University College London); Alessio Volpicella (Institute for Fiscal Studies and Queen Mary University of London) 
Abstract:  Uncertainty about the choice of identifying assumptions is common in causal studies, but is often ignored in empirical practice. This paper considers uncertainty over models that impose different identifying assumptions, which, in general, leads to a mix of point- and set-identified models. We propose performing inference in the presence of such uncertainty by generalizing Bayesian model averaging. The method considers multiple posteriors for the set-identified models and combines them with a single posterior for models that are either point-identified or that impose non-dogmatic assumptions. The output is a set of posteriors (post-averaging ambiguous belief) that are mixtures of the single posterior and any element of the class of multiple posteriors, with weights equal to the posterior model probabilities. We suggest reporting the range of posterior means and the associated credible region in practice, and provide a simple algorithm to compute them. We establish that the prior model probabilities are updated when the models are "distinguishable" and/or they specify different priors for reduced-form parameters, and characterize the asymptotic behavior of the posterior model probabilities. The method provides a formal framework for conducting sensitivity analysis of empirical findings to the choice of identifying assumptions. In a standard monetary model, for example, we show that, in order to support a negative response of output to a contractionary monetary policy shock, one would need to attach a prior probability greater than 0.32 to the validity of the assumption that prices do not react contemporaneously to such a shock. The method is general and allows for dogmatic and non-dogmatic identifying assumptions, multiple point-identified models, multiple set-identified models, and nested or non-nested models. 
Keywords:  Partial Identification, Sensitivity Analysis, Model Averaging, Bayesian Robustness, Ambiguity. 
Date:  2017–04–18 
URL:  http://d.repec.org/n?u=RePEc:ifs:cemmap:18/17&r=ecm 
By:  Bryan S. Graham (Institute for Fiscal Studies and University of California, Berkeley) 
Abstract:  I introduce a model of undirected dyadic link formation which allows for assortative matching on observed agent characteristics (homophily) as well as unrestricted agent-level heterogeneity in link surplus (degree heterogeneity). As in fixed effects panel data analyses, the joint distribution of observed and unobserved agent-level characteristics is left unrestricted. Two estimators for the (common) homophily parameter, β0, are developed and their properties studied under an asymptotic sequence involving a single network growing large. The first, tetrad logit (TL), estimator conditions on a sufficient statistic for the degree heterogeneity. The second, joint maximum likelihood (JML), estimator treats the degree heterogeneity {A_i0 : i = 1, ..., N} as additional (incidental) parameters to be estimated. The TL estimate is consistent under both sparse and dense graph sequences, whereas consistency of the JML estimate is shown only under dense graph sequences. Supplement for CWP 08/17 
Keywords:  Network formation, homophily, degree heterogeneity, scale-free networks, incidental parameters, asymptotic bias, fixed effects, conditional likelihood, dependent U-process 
JEL:  C31 C33 C35 
Date:  2017–02–10 
URL:  http://d.repec.org/n?u=RePEc:ifs:cemmap:08/17&r=ecm 
By:  Christophe Ley; Yves Dominicy 
Abstract:  We propose a new performance measure for table tennis players: the mutual point-winning probabilities (MPW) as server and receiver. The MPWs quantify a player's chances to win a point against a given opponent, and hence nicely complement the classical match statistics history between two players. We describe the MPWs, explain the statistics underpinning their calculation, and show via a Monte Carlo simulation study that our estimation procedure works well. As an illustration of the MPWs' versatile use, we employ them as an alternative ranking method in two round-robin tournaments of ten and eleven table tennis players, respectively, that we organized ourselves. 
Keywords:  Bradley-Terry model; maximum likelihood estimation; round-robin tournament; sport performance analysis; strength model 
Date:  2017–05 
URL:  http://d.repec.org/n?u=RePEc:eca:wpaper:2013/250695&r=ecm 
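The estimation idea reduces to estimating two Bernoulli success probabilities per ordered pair of players, one in each service role, which a small Monte Carlo check makes concrete. The probabilities and sample sizes below are invented for illustration, not taken from the paper's tournaments.

```python
import numpy as np

# Monte Carlo check of the MPW estimation idea: for a fixed pair of
# players, each role is a Bernoulli experiment, and the MLE of the
# point-winning probability is the empirical win fraction.
rng = np.random.default_rng(0)
p_serve, p_receive = 0.62, 0.45      # player A's true MPWs against player B (hypothetical)
n_points = 5000                      # simulated points in each role
mpw_serve = rng.binomial(n_points, p_serve) / n_points
mpw_receive = rng.binomial(n_points, p_receive) / n_points
print(mpw_serve, mpw_receive)        # estimates close to the true MPWs
```

In the paper the points are not observed directly but inferred from game scores, which is where the strength-model machinery comes in; this sketch only illustrates the consistency of the win-fraction estimate.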
By:  Bryan S. Graham (Institute for Fiscal Studies and University of California, Berkeley); Jinyong Hahn (Institute for Fiscal Studies); Alexandre Poirier (Institute for Fiscal Studies); James L. Powell (Institute for Fiscal Studies and University of California, Berkeley) 
Abstract:  We propose a generalization of the linear quantile regression model to accommodate possibilities afforded by panel data. Specifically, we extend the correlated random coefficients representation of linear quantile regression (e.g., Koenker, 2005; Section 2.6). We show that panel data allows the econometrician to (i) introduce dependence between the regressors and the random coefficients and (ii) weaken the assumption of comonotonicity across them (i.e., to enrich the structure of allowable dependence between different coefficients). We adopt a “fixed effects” approach, leaving any dependence between the regressors and the random coefficients unmodelled. We motivate different notions of quantile partial effects in our model and study their identification. For the case of discretely valued covariates we present analog estimators and characterize their large sample properties. When the number of time periods (T) exceeds the number of random coefficients (P), identification is regular, and our estimates are √N-consistent. When T = P, our identification results make special use of the subpopulation of stayers (units whose regressor values change little over time) in a way that builds on the approach of Graham and Powell (2012). In this just-identified case we study asymptotic sequences which allow the frequency of stayers in the population to shrink with the sample size. One purpose of these “discrete bandwidth asymptotics” is to approximate settings where covariates are continuously valued and, as such, there is only an infinitesimal fraction of exact stayers, while keeping the convenience of an analysis based on discrete covariates. When the mass of stayers shrinks with N, identification is irregular and our estimates converge at a slower-than-√N rate, but continue to have limiting normal distributions. We apply our methods to study the effects of collective bargaining coverage on earnings using the National Longitudinal Survey of Youth 1979 (NLSY79). 
Consistent with prior work (e.g., Chamberlain, 1982; Vella and Verbeek, 1998), we find that using panel data to control for unobserved worker heterogeneity results in sharply lower estimates of union wage premia. We estimate a median union wage premium of about 9 percent but, in a more novel finding, with substantial heterogeneity across workers. The 0.1 quantile of union effects is insignificantly different from zero, whereas the 0.9 quantile effect is over 30 percent. Our empirical analysis further suggests that, on net, unions have an equalizing effect on the distribution of wages. Supplement for CWP34/16 
Keywords:  Panel Data, Quantile Regression, Fixed Effects, Difference-in-Differences, Union Wage Premium, Discrete Bandwidth Asymptotics, Decomposition Analysis 
JEL:  C14 C21 C23 J31 J51 
Date:  2016–08–25 
URL:  http://d.repec.org/n?u=RePEc:ifs:cemmap:34/16&r=ecm 
By:  Victor Chernozhukov (Institute for Fiscal Studies and MIT); Denis Chetverikov (Institute for Fiscal Studies and UCLA); Mert Demirer (Institute for Fiscal Studies); Esther Duflo (Institute for Fiscal Studies); Christian Hansen (Institute for Fiscal Studies and Chicago GSB); Whitney K. Newey (Institute for Fiscal Studies and MIT) 
Abstract:  Most modern supervised statistical/machine learning (ML) methods are explicitly designed to solve prediction problems very well. Achieving this goal does not imply that these methods automatically deliver good estimators of causal parameters. Examples of such parameters include individual regression coefficients, average treatment effects, average lifts, and demand or supply elasticities. In fact, estimators of such causal parameters obtained via naively plugging ML estimators into estimating equations for such parameters can behave very poorly. For example, the resulting estimators may formally have inferior rates of convergence with respect to the sample size n caused by regularization bias. Fortunately, this regularization bias can be removed by solving auxiliary prediction problems via ML tools. Specifically, we can form an efficient score for the target low-dimensional parameter by combining auxiliary and main ML predictions. The efficient score may then be used to build an efficient estimator of the target parameter which typically will converge at the fastest possible 1/√n rate and be approximately unbiased and normal, allowing simple construction of valid confidence intervals for parameters of interest. The resulting method thus could be called a "double ML" method because it relies on estimating primary and auxiliary predictive models. Such double ML estimators achieve the fastest rates of convergence and exhibit robust good behavior with respect to a broader class of probability distributions than naive "single" ML estimators. In order to avoid overfitting, following [3], our construction also makes use of K-fold sample splitting, which we call cross-fitting. The use of sample splitting allows us to use a very broad set of ML predictive methods in solving the auxiliary and main prediction problems, such as random forests, lasso, ridge, deep neural nets, boosted trees, as well as various hybrids and aggregates of these methods (e.g. a hybrid of a random forest and lasso). We illustrate the application of the general theory through application to the leading cases of estimation and inference on the main parameter in a partially linear regression model and estimation and inference on average treatment effects and average treatment effects on the treated under conditional random assignment of the treatment. These applications cover randomized control trials as a special case. We then use the methods in an empirical application which estimates the effect of 401(k) eligibility on accumulated financial assets. 
Keywords:  Neyman, orthogonalization, cross-fit, double machine learning, debiased machine learning, orthogonal score, efficient score, post-machine-learning and post-regularization inference, random forest, lasso, deep learning, neural nets, boosted trees, efficiency, optimality. 
Date:  2016–09–27 
URL:  http://d.repec.org/n?u=RePEc:ifs:cemmap:49/16&r=ecm 
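The cross-fitting recipe for the partially linear model is short enough to sketch end to end: fit the two nuisance regressions on K-1 folds, residualize on the held-out fold, and solve the orthogonal (residual-on-residual) moment. Ridge regression stands in here for the ML learner, and the linear simulated design is chosen so that ridge is adequate; both are assumptions of the sketch, not the paper's choices.

```python
import numpy as np

def ridge_predict(X_tr, y_tr, X_te, lam=1.0):
    """Ridge regression as a stand-in for the ML learner (the paper allows
    random forests, lasso, boosted trees, neural nets, and hybrids)."""
    p = X_tr.shape[1]
    beta = np.linalg.solve(X_tr.T @ X_tr + lam * np.eye(p), X_tr.T @ y_tr)
    return X_te @ beta

def double_ml_plr(y, d, X, K=5, seed=0):
    """Cross-fitted double ML for theta in the partially linear model
    y = d*theta + g(X) + u, d = m(X) + v: nuisances are fit out-of-fold,
    and theta solves the orthogonal residual-on-residual moment."""
    rng = np.random.default_rng(seed)
    folds = rng.permutation(len(y)) % K
    u_res, v_res = np.empty(len(y)), np.empty(len(y))
    for k in range(K):
        te, tr = folds == k, folds != k
        u_res[te] = y[te] - ridge_predict(X[tr], y[tr], X[te])
        v_res[te] = d[te] - ridge_predict(X[tr], d[tr], X[te])
    return (v_res @ u_res) / (v_res @ v_res)

# simulated check with linear g and m, where ridge is an adequate learner
rng = np.random.default_rng(1)
n, p = 2000, 10
X = rng.standard_normal((n, p))
d = X @ rng.normal(0, 0.5, p) + rng.standard_normal(n)
y = 0.5 * d + X @ rng.normal(0, 0.5, p) + rng.standard_normal(n)
theta_hat = double_ml_plr(y, d, X)
print(theta_hat)                     # close to the true theta = 0.5
```

Because the score is orthogonal and each observation's nuisance prediction never uses that observation, the regularization bias of the learner enters theta_hat only at second order.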
By:  Victor Chernozhukov (Institute for Fiscal Studies and MIT); Christian Hansen (Institute for Fiscal Studies and Chicago GSB); Martin Spindler (Institute for Fiscal Studies) 
Abstract:  Here we present an expository, general analysis of valid post-selection or post-regularization inference about a low-dimensional target parameter in the presence of a very high-dimensional nuisance parameter which is estimated using selection or regularization methods. Our analysis provides a set of high-level conditions under which inference for the low-dimensional parameter based on testing or point estimation methods will be regular despite selection or regularization biases occurring in estimation of the high-dimensional nuisance parameter. The results may be applied to establish uniform validity of post-selection or post-regularization inference procedures for low-dimensional target parameters over large classes of models. The high-level conditions allow one to clearly see the types of structure needed for achieving valid post-regularization inference and encompass many existing results. A key element of the structure we employ and discuss in detail is the use of orthogonal or "immunized" estimating equations that are locally insensitive to small mistakes in estimation of the high-dimensional nuisance parameter. As an illustration, we use the high-level conditions to provide readily verifiable sufficient conditions for a class of affine-quadratic models that include the usual linear model and linear instrumental variables model as special cases. As a further application and illustration, we use these results to provide an analysis of post-selection inference in a linear instrumental variables model with many regressors and many instruments. We conclude with a review of other developments in post-selection inference and note that many of the developments can be viewed as special cases of the general encompassing framework of orthogonal estimating equations provided in this paper. 
Keywords:  Neyman, orthogonalization, C(α) statistics, optimal instrument, optimal score, optimal moment, post-selection and post-regularization inference, efficiency, optimality 
Date:  2016–08–25 
URL:  http://d.repec.org/n?u=RePEc:ifs:cemmap:36/16&r=ecm 
By:  Koen Jochmans (Institute for Fiscal Studies and Sciences Po); Martin Weidner (Institute for Fiscal Studies and cemmap and UCL) 
Abstract:  This paper studies inference on fixed effects in a linear regression model estimated from network data. We derive bounds on the variance of the fixed-effect estimator that uncover the importance of the smallest nonzero eigenvalue of the (normalized) Laplacian of the network and of the degree structure of the network. The eigenvalue is a measure of connectivity, with smaller values indicating less-connected networks. These bounds yield conditions for consistent estimation and convergence rates, and allow one to evaluate the accuracy of first-order approximations to the variance of the fixed-effect estimator. Supplement for CWP32/16 
Keywords:  fixed effects, graph, Laplacian, network data, variance bound 
JEL:  C23 C55 
Date:  2016–08–08 
URL:  http://d.repec.org/n?u=RePEc:ifs:cemmap:32/16&r=ecm 
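The key quantity, the smallest nonzero eigenvalue of the normalized Laplacian, is directly computable for any given network, as the following sketch shows on two stylized graphs (a path and a complete graph); the example graphs are illustrative and not from the paper.

```python
import numpy as np

def laplacian_connectivity(A):
    """Smallest nonzero eigenvalue of the normalized Laplacian
    L = I - D^(-1/2) A D^(-1/2) of a connected undirected graph with
    adjacency matrix A: the connectivity measure in the variance bounds."""
    d = A.sum(axis=1)
    Dinv = np.diag(1.0 / np.sqrt(d))
    L = np.eye(len(A)) - Dinv @ A @ Dinv
    return np.sort(np.linalg.eigvalsh(L))[1]   # eigenvalue 0 is dropped

# two stylized 20-node networks: a path versus the complete graph
n = 20
path = np.diag(np.ones(n - 1), 1) + np.diag(np.ones(n - 1), -1)
complete = np.ones((n, n)) - np.eye(n)
lam_path = laplacian_connectivity(path)
lam_complete = laplacian_connectivity(complete)
print(lam_path, lam_complete)   # the path is far less connected
```

Per the paper's bounds, the tiny eigenvalue of the path graph signals that fixed effects are estimated much less accurately there than on the densely connected graph.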
By:  Denis Chetverikov (Institute for Fiscal Studies and UCLA); Zhipeng Liao (Institute for Fiscal Studies) 
Abstract:  In this paper, we derive a rate of convergence of the Lasso estimator when the penalty parameter λ for the estimator is chosen using K-fold cross-validation; in particular, we show that in the model with Gaussian noise and under fairly general assumptions on the candidate set of values of λ, the prediction norm of the estimation error of the cross-validated Lasso estimator is with high probability bounded from above up to a constant by (s log p / n)^(1/2) log^(7/8) n, as long as p log n / n = o(1) and some other mild regularity conditions are satisfied, where n is the sample size of available data, p is the number of covariates, and s is the number of nonzero coefficients in the model. Thus, the cross-validated Lasso estimator achieves the fastest possible rate of convergence up to the logarithmic factor log^(7/8) n. In addition, we derive a sparsity bound for the cross-validated Lasso estimator; in particular, we show that under the same conditions as above, the number of nonzero coefficients of the estimator is with high probability bounded from above up to a constant by s log^5 n. Finally, we show that our proof technique generates nontrivial bounds on the prediction norm of the estimation error of the cross-validated Lasso estimator even if p is much larger than n and the assumption of Gaussian noise fails; in particular, the prediction norm of the estimation error is with high probability bounded from above up to a constant by (s log^2(pn) / n)^(1/4) under mild regularity conditions. 
Keywords:  Cross-Validated Lasso 
Date:  2016–09–27 
URL:  http://d.repec.org/n?u=RePEc:ifs:cemmap:47/16&r=ecm 
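The estimator being analyzed can be sketched in a few lines: Lasso via cyclic coordinate descent, with the penalty chosen by K-fold cross-validation over a candidate grid. The candidate grid, fixed iteration count, and simulated sparse design below are illustrative simplifications; the paper's contribution is the theory for this procedure, not its implementation.

```python
import numpy as np

def lasso_cd(X, y, lam, n_iter=100):
    """Lasso by cyclic coordinate descent for the objective
    (1/2n)||y - Xb||^2 + lam * ||b||_1 (soft-thresholding updates)."""
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X**2).sum(axis=0) / n
    r = y.copy()                              # running residual y - Xb
    for _ in range(n_iter):
        for j in range(p):
            r += X[:, j] * b[j]               # remove coordinate j
            rho = X[:, j] @ r / n
            b[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]
            r -= X[:, j] * b[j]
    return b

def cv_lasso(X, y, lams, K=5, seed=0):
    """K-fold cross-validated choice of the lasso penalty over a grid."""
    rng = np.random.default_rng(seed)
    folds = rng.permutation(len(y)) % K
    cv_err = []
    for lam in lams:
        err = 0.0
        for k in range(K):
            te, tr = folds == k, folds != k
            b = lasso_cd(X[tr], y[tr], lam)
            err += ((y[te] - X[te] @ b) ** 2).sum()
        cv_err.append(err)
    best = lams[int(np.argmin(cv_err))]
    return lasso_cd(X, y, best), best

rng = np.random.default_rng(6)
n, p, s = 200, 50, 3
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[:s] = [2.0, -2.0, 2.0]
y = X @ beta + 0.5 * rng.standard_normal(n)
b_hat, lam_star = cv_lasso(X, y, lams=[0.01, 0.05, 0.1, 0.2, 0.5])
```

The paper's rate and sparsity bounds describe exactly this cross-validated output b_hat, without requiring the oracle choice of the penalty.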
By:  Seok Young Hong (Institute for Fiscal Studies); Oliver Linton (Institute for Fiscal Studies and University of Cambridge) 
Abstract:  We consider a class of nonparametric time series regression models in which the regressor takes values in a sequence space and the data are stationary and weakly dependent. We propose an infinite-dimensional Nadaraya-Watson-type estimator with a bandwidth sequence that shrinks the effects of long lags. We investigate its asymptotic properties in detail in both static and dynamic regression contexts. First we show pointwise consistency of the estimator under a set of mild regularity conditions. We establish a CLT for the estimator at a point under stronger conditions, as well as for a feasibly studentized version of the estimator, thereby allowing pointwise inference to be conducted. We establish uniform consistency over a compact set of logarithmically increasing dimension. We specify the explicit rates of convergence in terms of the Lambert W function, and show that the optimal rate that balances the squared bias and variance is of logarithmic order, the precise rate depending on the smoothness of the regression function and the dependence of the data in a nontrivial way. 
Keywords:  Functional Regression; Nadaraya-Watson estimator; Curse of infinite dimensionality; Near Epoch Dependence. 
Date:  2016–11–23 
URL:  http://d.repec.org/n?u=RePEc:ifs:cemmap:53/16&r=ecm 
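A finite-dimensional sketch of the Nadaraya-Watson estimator may help fix ideas: a kernel-weighted local average of the responses. The paper's infinite-dimensional version additionally down-weights long lags through the bandwidth sequence, which this one-regressor toy (with an assumed Gaussian kernel and illustrative data) does not attempt.

```python
import math
import random

def nw_estimate(xs, ys, x0, h):
    # Nadaraya-Watson estimator: kernel-weighted local average of ys near x0.
    weights = [math.exp(-0.5 * ((x - x0) / h) ** 2) for x in xs]
    return sum(w * y for w, y in zip(weights, ys)) / sum(weights)

random.seed(1)
n = 500
xs = [random.uniform(-2, 2) for _ in range(n)]
ys = [math.sin(x) + random.gauss(0, 0.2) for x in xs]  # m(x) = sin(x)
m_hat = nw_estimate(xs, ys, 1.0, 0.3)  # estimate of m(1.0) = sin(1.0)
```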
By:  Florian Gunsilius (Institute for Fiscal Studies); Susanne M. Schennach (Institute for Fiscal Studies and Brown University) 
Abstract:  The idea of summarizing the information contained in a large number of variables by a small number of "factors" or "principal components" has been widely adopted in economics and statistics. This paper introduces a generalization of the widely used principal component analysis (PCA) to nonlinear settings, thus providing a new tool for dimension reduction and exploratory data analysis or representation. The distinguishing features of the method include (i) the ability to always deliver truly independent factors (as opposed to the merely uncorrelated factors of PCA); (ii) the reliance on the theory of optimal transport and Brenier maps to obtain a robust and efficient computational algorithm; and (iii) the use of a new multivariate additive entropy decomposition to determine the principal nonlinear components that capture most of the information content of the data. 
Keywords:  Principal Component Analysis, Nonlinear Principal Components, Factor Models 
Date:  2017–03–28 
URL:  http://d.repec.org/n?u=RePEc:ifs:cemmap:16/17&r=ecm 
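Feature (i) above, truly independent rather than merely uncorrelated factors, can be motivated with a small numerical check: two variables can have (near-)zero linear correlation yet be fully dependent, so classical PCA, which only decorrelates, would treat them as unrelated. The construction below is a standard textbook example, not taken from the paper.

```python
import random

def corr(a, b):
    # Sample Pearson correlation of two equal-length lists.
    m = len(a)
    ma, mb = sum(a) / m, sum(b) / m
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b)) / m
    va = sum((x - ma) ** 2 for x in a) / m
    vb = sum((y - mb) ** 2 for y in b) / m
    return cov / (va * vb) ** 0.5

random.seed(6)
n = 5000
z1 = [random.gauss(0, 1) for _ in range(n)]
z2 = [z * z - 1 for z in z1]  # uncorrelated with z1, yet a deterministic function of it

lin = corr(z1, z2)                      # near 0: PCA sees "unrelated" factors
nonlin = corr([z * z for z in z1], z2)  # near 1: full nonlinear dependence
```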
By:  Victor Chernozhukov (Institute for Fiscal Studies and MIT); Denis Chetverikov (Institute for Fiscal Studies and UCLA); Kengo Kato (Institute for Fiscal Studies) 
Abstract:  Modern construction of uniform confidence bands for nonparametric densities (and other functions) often relies on the classical Smirnov-Bickel-Rosenblatt (SBR) condition; see, for example, Giné and Nickl (2010). This condition requires the existence of a limit distribution of an extreme value type for the supremum of a studentized empirical process (equivalently, for the supremum of a Gaussian process with the same covariance function as that of the studentized empirical process). The principal contribution of this paper is to remove the need for this classical condition. We show that a considerably weaker sufficient condition is derived from an anti-concentration property of the supremum of the approximating Gaussian process, and we derive an inequality leading to such a property for separable Gaussian processes. We refer to the new condition as a generalized SBR condition. Our new result shows that the supremum does not concentrate too fast around any value. We then apply this result to derive a Gaussian multiplier bootstrap procedure for constructing honest confidence bands for nonparametric density estimators (this result can be applied in other nonparametric problems as well). An essential advantage of our approach is that it applies generically even in those cases where the limit distribution of the supremum of the studentized empirical process does not exist (or is unknown). This is of particular importance in problems where resolution levels or other tuning parameters have been chosen in a data-driven fashion, which is needed for adaptive constructions of the confidence bands. Finally, of independent interest is our introduction of a new, practical version of Lepski's method, which computes the optimal, non-conservative resolution levels via a Gaussian multiplier bootstrap method. 
Date:  2016–08–26 
URL:  http://d.repec.org/n?u=RePEc:ifs:cemmap:43/16&r=ecm 
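A minimal sketch of the Gaussian multiplier bootstrap idea, applied to the simplest case of a uniform band for an empirical distribution function rather than the paper's studentized density setting (the grid, sample size, and number of bootstrap draws below are illustrative assumptions):

```python
import math
import random

def multiplier_bootstrap_cv(data, grid, n_boot=200, alpha=0.05):
    # Critical value for sup_x |G_n(x)| via the Gaussian multiplier bootstrap:
    # redraw sup_x |n^{-1/2} sum_i xi_i (1{X_i <= x} - F_n(x))| with xi_i ~ N(0,1)
    # and take the (1 - alpha) empirical quantile of the redrawn suprema.
    n = len(data)
    Fn = {x: sum(1 for d in data if d <= x) / n for x in grid}
    sups = []
    for _ in range(n_boot):
        xi = [random.gauss(0, 1) for _ in range(n)]
        sup = max(
            abs(sum(xi[i] * ((1 if data[i] <= x else 0) - Fn[x]) for i in range(n)))
            / math.sqrt(n)
            for x in grid
        )
        sups.append(sup)
    sups.sort()
    return sups[int(math.ceil((1 - alpha) * n_boot)) - 1]

random.seed(2)
data = [random.gauss(0, 1) for _ in range(200)]
grid = [-2 + 0.1 * j for j in range(41)]
cv = multiplier_bootstrap_cv(data, grid)
# A (1 - alpha) uniform band for F is then F_n(x) +/- cv / sqrt(n).
```

The point of the abstract is that the validity of this quantile only needs an anti-concentration property of the approximating Gaussian supremum, not an extreme-value limit distribution.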
By:  Victor Chernozhukov (Institute for Fiscal Studies and MIT); Denis Chetverikov (Institute for Fiscal Studies and UCLA); Kengo Kato (Institute for Fiscal Studies) 
Abstract:  We derive strong approximations to the supremum of the non-centered empirical process indexed by a possibly unbounded VC-type class of functions by the suprema of the Gaussian and bootstrap processes. The bounds of these approximations are non-asymptotic, which allows us to work with classes of functions whose complexity increases with the sample size. The construction of couplings is not of the Hungarian type and is instead based on the Slepian-Stein methods and Gaussian comparison inequalities. The increasing complexity of classes of functions and non-centrality of the processes make the results useful for applications in modern nonparametric statistics (Giné and Nickl [14]), in particular allowing us to study the power properties of nonparametric tests using Gaussian and bootstrap approximations. 
Keywords:  coupling, empirical process, multiplier bootstrap process, empirical bootstrap process, Gaussian approximation, supremum 
Date:  2016–08–25 
URL:  http://d.repec.org/n?u=RePEc:ifs:cemmap:38/16&r=ecm 
By:  Luigi Grossi (Department of Economics (University of Verona)); Fany Nan (Joint Research Centre of EU (Ispra)) 
Abstract:  In this paper a robust approach to modelling electricity spot prices is introduced. Unlike most of the recent literature on electricity price forecasting, where attention has mainly been drawn to the prediction of spikes, the focus of this contribution is on the robust estimation of nonlinear SETARX models. In this way, parameter estimates are little or not at all influenced by the presence of extreme observations, and the large majority of prices, which are not spikes, can be forecast more accurately. A Monte Carlo study is carried out in order to select the best weighting function for GM-estimators of SETAR processes. A robust procedure to select and estimate nonlinear processes for electricity prices is introduced, including robust tests for stationarity and nonlinearity and robust information criteria. The application of the procedure to the Italian electricity market reveals the forecasting superiority of the robust GM-estimator based on the polynomial weighting function over the non-robust least squares estimator. Finally, the introduction of external regressors in the robust estimation of SETARX processes further improves the forecasting ability of the model. 
Keywords:  Electricity price, Nonlinear time series, Price forecasting, Robust GM-estimator, Spikes, Threshold models 
Date:  2017–05 
URL:  http://d.repec.org/n?u=RePEc:ver:wpaper:06/2017&r=ecm 
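For readers unfamiliar with SETAR models, the sketch below simulates a two-regime SETAR(1) process and fits it by plain within-regime least squares with the threshold taken as known. This is the non-robust baseline the paper improves upon; the GM weighting functions and exogenous regressors of the SETARX specification are beyond this illustration, and all parameter values are assumptions.

```python
import random

def simulate_setar(n, phi_low, phi_high, r, seed=3):
    # Two-regime SETAR(1): the AR coefficient switches at threshold r.
    # y_t = phi_low * y_{t-1} + e_t  if y_{t-1} <= r, else phi_high * y_{t-1} + e_t.
    random.seed(seed)
    y = [0.0]
    for _ in range(n):
        prev = y[-1]
        phi = phi_low if prev <= r else phi_high
        y.append(phi * prev + random.gauss(0, 1))
    return y[1:]

def estimate_setar(y, r):
    # Least-squares AR(1) slope within each regime, threshold r taken as known.
    est = []
    for low in (True, False):
        pairs = [(y[t - 1], y[t]) for t in range(1, len(y))
                 if (y[t - 1] <= r) == low]
        sxx = sum(x * x for x, _ in pairs)
        sxy = sum(x * z for x, z in pairs)
        est.append(sxy / sxx)
    return est  # [phi_low_hat, phi_high_hat]

y = simulate_setar(3000, 0.3, -0.5, 0.0)
phi_low_hat, phi_high_hat = estimate_setar(y, 0.0)
```

A robust GM-estimator would replace the plain sums in `estimate_setar` with weighted sums that down-weight observations with large residuals, so spikes barely move the estimates.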
By:  Xiaohong Chen (Institute for Fiscal Studies and Yale University); Timothy M. Christensen (Institute for Fiscal Studies) 
Abstract:  This paper makes several important contributions to the literature on nonparametric instrumental variables (NPIV) estimation and inference on a structural function h0 and its functionals. First, we derive sup-norm convergence rates for computationally simple sieve NPIV (series 2SLS) estimators of h0 and its derivatives. Second, we derive a lower bound that describes the best possible (minimax) sup-norm rates of estimating h0 and its derivatives, and show that the sieve NPIV estimator can attain the minimax rates when h0 is approximated via a spline or wavelet sieve. Our optimal sup-norm rates surprisingly coincide with the optimal root-mean-squared rates for severely ill-posed problems, and are only a logarithmic factor slower than the optimal root-mean-squared rates for mildly ill-posed problems. Third, we use our sup-norm rates to establish uniform Gaussian process strong approximations and score bootstrap uniform confidence bands (UCBs) for collections of nonlinear functionals of h0 under primitive conditions, allowing for mildly and severely ill-posed problems. Fourth, as applications, we obtain the first asymptotic pointwise and uniform inference results for plug-in sieve t-statistics of exact consumer surplus (CS) and deadweight loss (DL) welfare functionals under low-level conditions when demand is estimated via sieve NPIV. Our real-data application of UCBs for exact CS and DL functionals of gasoline demand reveals interesting patterns and is applicable to other markets. 
Keywords:  Series 2SLS; Optimal sup-norm convergence rates; Uniform Gaussian process strong approximation; Score bootstrap uniform confidence bands; Nonlinear welfare functionals; Nonparametric demand with endogeneity. 
JEL:  C13 C14 C36 
Date:  2017–02–13 
URL:  http://d.repec.org/n?u=RePEc:ifs:cemmap:09/17&r=ecm 
By:  Hidehiko Ichimura (Institute for Fiscal Studies and University of Tokyo); Whitney K. Newey (Institute for Fiscal Studies and MIT) 
Abstract:  There are many economic parameters that depend on nonparametric first steps. Examples include games, dynamic discrete choice, average consumer surplus, and treatment effects. Often estimators of these parameters are asymptotically equivalent to a sample average of an object referred to as the influence function. The influence function is useful in formulating regularity conditions for asymptotic normality, for bias reduction, in efficiency comparisons, and for analyzing robustness. We show that the influence function of a semiparametric estimator is the limit of a Gateaux derivative with respect to a smooth deviation as the deviation approaches a point mass. This result generalizes the classic Von Mises (1947) and Hampel (1974) calculation to apply to estimators that depend on smooth nonparametric first steps. We characterize the influence function of M- and GMM-estimators. We apply the Gateaux derivative to derive the influence function with a first-step nonparametric two-stage least squares estimator based on orthogonality conditions. We also use the influence function to analyze high-level and primitive regularity conditions for asymptotic normality. We give primitive regularity conditions for linear functionals of series regression that are the weakest known, except for a log term, when the regression function is smooth enough. 
Keywords:  Influence function, semiparametric estimation, NPIV 
JEL:  C13 C14 C20 C26 C36 
Date:  2017–01–26 
URL:  http://d.repec.org/n?u=RePEc:ifs:cemmap:06/17&r=ecm 
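The Gateaux-derivative characterization can be checked numerically in the simplest case, the sample mean, whose influence function is the textbook IF(x) = x - mu. This is only an illustration of the Von Mises / Hampel calculation that the paper generalizes, not the paper's semiparametric setting.

```python
import random

def mean(xs):
    return sum(xs) / len(xs)

def weighted_mean(wxs):
    # A functional T(F) evaluated at a discrete distribution,
    # represented as a list of (value, probability-weight) pairs.
    return sum(w * xi for xi, w in wxs)

def gateaux_if(T, xs, x, eps=1e-6):
    # Numerical Gateaux derivative of T at the empirical distribution F_n,
    # in the direction of a point mass at x:
    #   IF(x) ~= [T((1 - eps) F_n + eps delta_x) - T(F_n)] / eps.
    n = len(xs)
    base = T([(xi, 1.0 / n) for xi in xs])
    mixed = T([(xi, (1 - eps) / n) for xi in xs] + [(x, eps)])
    return (mixed - base) / eps

random.seed(4)
xs = [random.gauss(2.0, 1.0) for _ in range(1000)]
if_at_5 = gateaux_if(weighted_mean, xs, 5.0)
# For the mean, IF(x) = x - mu, so if_at_5 should be close to 5 - mean(xs).
```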
By:  Toda, Alexis Akira; Walsh, Kieran James 
Keywords:  Social and Behavioral Sciences 
Date:  2017–01–01 
URL:  http://d.repec.org/n?u=RePEc:cdl:ucsdec:qt8df3x7gw&r=ecm 
By:  Whitney K. Newey (Institute for Fiscal Studies and MIT); James L. Powell (Institute for Fiscal Studies and University of California, Berkeley) 
Abstract:  Whitney Newey and James Powell, founding CeMMAP Fellows, wrote an influential paper on instrumental variable estimation of an additive error nonparametrically specified structural equation, presented at the December 1988 North American Winter Meetings of the Econometric Society. A version circulated in 1989 but the results were not published until 14 years later: "Instrumental Variable Estimation of Nonparametric Models", Econometrica, 71(5), (September 2003), 1565–1578. The 1989 working paper is often referred to, but hard to find. For the sake of completeness CeMMAP is pleased to publish the 1989 paper. 
Keywords:  Instrumental variables estimation for nonparametric models 
Date:  2017–02–03 
URL:  http://d.repec.org/n?u=RePEc:ifs:cemmap:07/17&r=ecm 