
on Econometrics 
By:  Aknouche, Abdelhakim; Francq, Christian 
Abstract:  General parametric forms are assumed for the conditional mean λ_{t}(θ₀) and variance υ_{t}(ξ₀) of a time series. These conditional moments can for instance be derived from count time series, Autoregressive Conditional Duration (ACD) or Generalized Autoregressive Score (GAS) models. In this paper, our aim is to estimate the conditional mean parameter θ₀ while being as agnostic as possible about the conditional distribution of the observations. Quasi-Maximum Likelihood Estimators (QMLEs) based on the linear exponential family fulfill this goal, but they may be inefficient and have complicated asymptotic distributions when θ₀ contains zero coefficients. We thus study alternative weighted least squares estimators (WLSEs), which enjoy the same consistency property as the QMLEs when the conditional distribution is misspecified, but have simpler asymptotic distributions when components of θ₀ are null and gain in efficiency when υ_{t} is well specified. We compare the asymptotic properties of the QMLEs and WLSEs, and determine a data-driven strategy for finding an asymptotically optimal WLSE. Simulation experiments and illustrations on realized volatility forecasting are presented.
Keywords:  Autoregressive Conditional Duration model; Exponential, Poisson, Negative Binomial QMLE; INteger-valued AR; INteger-valued GARCH; Weighted LSE.
JEL:  C13 C14 C18 C25 C52 C53 C58 
Date:  2019–12–01 
URL:  http://d.repec.org/n?u=RePEc:pra:mprapa:97382&r=all 
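The two-stage logic behind such weighted least squares estimators can be illustrated on a toy count model. The sketch below is not the authors' code: it assumes an illustrative conditional mean λ_t(θ) = ω + a·y_{t−1}, uses a pilot unweighted LSE to obtain fitted means, and then reweights by the implied Poisson conditional variance (variance = mean when υ_t is well specified).

```python
import numpy as np

# Simulate a Poisson series with conditional mean
# lambda_t = omega + a * y_{t-1} (illustrative values, not from the paper)
rng = np.random.default_rng(0)
omega, a, n = 1.0, 0.4, 20000
y = np.empty(n)
y[0] = 2.0
for t in range(1, n):
    y[t] = rng.poisson(omega + a * y[t - 1])

Y = y[1:]
X = np.column_stack([np.ones(n - 1), y[:-1]])

# Stage 1: unweighted LSE (a WLSE with unit weights) as pilot estimate;
# the mean is linear in (omega, a), so ordinary least squares applies
theta_pilot, *_ = np.linalg.lstsq(X, Y, rcond=None)

# Stage 2: reweight by the pilot conditional variance; under a
# well-specified Poisson assumption the conditional variance equals the mean
v = np.clip(X @ theta_pilot, 1e-6, None)
w = np.sqrt(1.0 / v)
theta_wlse, *_ = np.linalg.lstsq(X * w[:, None], Y * w, rcond=None)
print(theta_wlse)  # should be close to (omega, a)
```

Both stages are consistent here; the reweighted stage is the one that gains efficiency when the variance specification is correct.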
By:  Alexandre Belloni (Institute for Fiscal Studies); Federico A. Bugni (Institute for Fiscal Studies and Duke University); Victor Chernozhukov (Institute for Fiscal Studies and MIT) 
Abstract:  This paper considers inference for a function of a parameter vector in a partially identified model with many moment inequalities. This framework allows the number of moment conditions to grow with the sample size, possibly at exponential rates. Our main motivating application is subvector inference, i.e., inference on a single component of the partially identified parameter vector associated with a treatment effect or a policy variable of interest. Our inference method compares a MinMax test statistic (minimum over parameters satisfying H0 and maximum over moment inequalities) against critical values that are based on bootstrap approximations or analytical bounds. We show that this method controls asymptotic size uniformly over a large class of data generating processes despite the partially identified many moment inequality setting. The finite sample analysis allows us to obtain explicit rates of convergence on the size control. Our results are based on combining non-asymptotic approximations and new high-dimensional central limit theorems for the MinMax of the components of random matrices, which may be of independent interest. Unlike the previous literature on functional inference in partially identified models, our results do not rely on weak convergence results based on Donsker’s class assumptions and, in fact, our test statistic may not even converge in distribution. Our bootstrap approximation requires the choice of a tuning parameter sequence that can avoid the excessive concentration of our test statistic. To this end, we propose an asymptotically valid data-driven method to select this tuning parameter sequence. This method generalizes the selection of tuning parameter sequences to problems outside Donsker’s class assumptions and may also be of independent interest. Our procedures based on self-normalized moderate deviation bounds are relatively more conservative but easier to implement.
Date:  2019–06–12 
URL:  http://d.repec.org/n?u=RePEc:ifs:cemmap:28/19&r=all 
By:  Zijian Zeng; Meng Li 
Abstract:  We develop a Bayesian median autoregressive (BayesMAR) model for time series forecasting. The proposed method utilizes time-varying quantile regression at the median, favorably inheriting the robustness of median regression in contrast to the widely used mean-based methods. Motivated by a working Laplace likelihood approach in Bayesian quantile regression, BayesMAR adopts a parametric model bearing the same structure as autoregressive (AR) models, with the Gaussian error replaced by a Laplace error, leading to a simple, robust, and interpretable modeling strategy for time series forecasting. We estimate model parameters by Markov chain Monte Carlo. Bayesian model averaging (BMA) is used to account for model uncertainty, including the uncertainty in the autoregressive order, in addition to a Bayesian model selection approach. The proposed methods are illustrated using simulation and real data applications. An application to U.S. macroeconomic data forecasting shows that BayesMAR delivers favorable and often superior predictive performance relative to the selected mean-based alternatives under various loss functions. The proposed methods are generic and can be used to complement a rich class of methods that build on AR models.
Date:  2020–01 
URL:  http://d.repec.org/n?u=RePEc:arx:papers:2001.01116&r=all 
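The working Laplace likelihood behind BayesMAR rests on a familiar fact: maximizing a Laplace likelihood for an AR model is equivalent to minimizing the sum of absolute residuals (median regression). A rough frequentist sketch of that connection, with illustrative values and a crude grid search standing in for the paper's MCMC:

```python
import numpy as np

# Simulate an AR(1) with Laplace errors: y_t = c + phi*y_{t-1} + e_t
rng = np.random.default_rng(1)
n, c, phi = 3000, 0.5, 0.6
y = np.empty(n)
y[0] = 0.0
for t in range(1, n):
    y[t] = c + phi * y[t - 1] + rng.laplace(scale=1.0)

# The Laplace MLE of (c, phi) is the least-absolute-deviations (median) fit;
# a coarse grid search stands in for proper optimization or posterior sampling
def lad_loss(c_, phi_):
    return np.abs(y[1:] - c_ - phi_ * y[:-1]).sum()

grid_c = np.linspace(0.0, 1.0, 101)
grid_phi = np.linspace(0.3, 0.9, 121)
_, c_hat, phi_hat = min(
    (lad_loss(ci, pj), ci, pj) for ci in grid_c for pj in grid_phi
)
print(c_hat, phi_hat)
```

The same absolute-error criterion, exponentiated, is the working likelihood that BayesMAR places a prior on.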
By:  Victor Chernozhukov (Institute for Fiscal Studies and MIT); Kaspar Wüthrich (Institute for Fiscal Studies and UCSD); Yinchu Zhu (Institute for Fiscal Studies) 
Abstract:  This paper studies inference on treatment effects in aggregate panel data settings with a single treated unit and many control units. We propose new methods for making inference on average treatment effects in settings where both the number of pre-treatment and the number of post-treatment periods are large. We use linear models to approximate the counterfactual mean outcomes in the absence of the treatment. The counterfactuals are estimated using constrained Lasso, an essentially tuning-free regression approach that nests difference-in-differences and synthetic control as special cases. We propose a K-fold cross-fitting procedure to remove the bias induced by regularization. To avoid the estimation of the long run variance, we construct a self-normalized t-statistic. The test statistic has an asymptotically pivotal distribution (a Student t-distribution with K − 1 degrees of freedom), which makes our procedure very easy to implement. Our approach has several theoretical advantages. First, it does not rely on any sparsity assumptions. Second, it is fully robust against misspecification of the linear model. Third, it is more efficient than difference-in-means and difference-in-differences estimators. The proposed method demonstrates excellent performance in simulation experiments, and is taken to a data application, where we re-evaluate the economic consequences of terrorism.
Date:  2019–06–12 
URL:  http://d.repec.org/n?u=RePEc:ifs:cemmap:32/19&r=all 
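The self-normalized t-statistic described above is simple to compute once per-fold effect estimates are available. A minimal numerical sketch with hypothetical fold-level estimates (the constrained-Lasso counterfactual estimation step is omitted; all numbers are made up):

```python
from math import sqrt
import numpy as np

# Hypothetical treatment-effect estimates from K cross-fitting folds
fold_effects = np.array([0.9, 1.2, 0.8, 1.1, 1.0, 1.3, 0.7, 1.0])
K = len(fold_effects)

tau_hat = fold_effects.mean()          # pooled effect estimate
s = fold_effects.std(ddof=1)           # across-fold standard deviation
t_stat = sqrt(K) * tau_hat / s

# Under the null, t_stat is compared with a Student t distribution with
# K - 1 = 7 degrees of freedom (two-sided 5% critical value is about 2.365)
print(t_stat)
```

Self-normalizing by the across-fold spread is what lets the procedure avoid estimating the long-run variance.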
By:  Raffaella Giacomini (Institute for Fiscal Studies and cemmap and UCL); Toru Kitagawa (Institute for Fiscal Studies and cemmap and University College London); Matthew Read (Institute for Fiscal Studies) 
Abstract:  We develop methods for robust Bayesian inference in structural vector autoregressions (SVARs) where the impulse responses or forecast error variance decompositions of interest are set-identified using external instruments (or ‘proxy SVARs’). Existing Bayesian approaches to inference in proxy SVARs require researchers to specify a single prior over the model’s parameters. When parameters are set-identified, a component of the prior is never updated by the data. Giacomini and Kitagawa (2018) propose a method for robust Bayesian inference in set-identified models that delivers inference about the identified set for the parameter of interest. We extend this approach to proxy SVARs, which allows researchers to relax potentially controversial point-identifying restrictions without having to specify an unrevisable prior. We also explore the effect of instrument strength on posterior inference. We illustrate our approach by revisiting Mertens and Ravn (2013) and relaxing the assumption that they impose to obtain point identification.
Date:  2019–07–23 
URL:  http://d.repec.org/n?u=RePEc:ifs:cemmap:38/19&r=all 
By:  Raffaella Giacomini (Institute for Fiscal Studies and cemmap and UCL); Toru Kitagawa (Institute for Fiscal Studies and cemmap and University College London); Harald Uhlig (Institute for Fiscal Studies and University of Chicago) 
Abstract:  To perform Bayesian analysis of a partially identified structural model, two distinct approaches exist: standard Bayesian inference, which assumes a single prior for the structural parameters, including the non-identified ones; and multiple-prior Bayesian inference, which assumes full ambiguity for the non-identified parameters. The prior inputs considered by these two extreme approaches can often be a poor representation of the researcher’s prior knowledge in practice. This paper fills the large gap between the two approaches by proposing a multiple-prior Bayesian analysis that can simultaneously incorporate a probabilistic belief for the non-identified parameters and a concern about misspecification of this belief. Our proposal introduces a benchmark prior representing the researcher’s partially credible probabilistic belief for non-identified parameters, and a set of priors formed in its Kullback-Leibler (KL) neighborhood, whose radius controls the “degree of ambiguity.” We obtain point estimators and optimal decisions involving non-identified parameters by solving a conditional gamma-minimax problem, which we show is analytically tractable and easy to solve numerically. We derive the remarkably simple analytical properties of the proposed procedure in the limiting situations where the radius of the KL neighborhood and/or the sample size are large. Our procedure can also be used to perform global sensitivity analysis.
Date:  2019–05–28 
URL:  http://d.repec.org/n?u=RePEc:ifs:cemmap:24/19&r=all 
By:  Andrii Babii; Eric Ghysels; Jonas Striaukas 
Abstract:  Time series regression analysis in econometrics typically relies on a set of mixing conditions to establish consistency and asymptotic normality of parameter estimates, and on HAC-type estimators of the residual long-run variances to conduct proper inference. This article introduces structured machine learning regressions for high-dimensional time series data within this commonly used setting. To recognize the time series data structures we rely on the sparse-group LASSO estimator. We derive a new Fuk-Nagaev inequality for a class of $\tau$-dependent processes with heavier than Gaussian tails, nesting $\alpha$-mixing processes as a special case, and establish estimation, prediction, and inferential properties, including convergence rates of the HAC estimator for the long-run variance based on LASSO residuals. An empirical application to nowcasting US GDP growth indicates that the estimator performs favorably compared to other alternatives and that text data can be a useful addition to more traditional numerical data.
Date:  2019–12 
URL:  http://d.repec.org/n?u=RePEc:arx:papers:1912.06307&r=all 
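The sparse-group LASSO penalty combines an ℓ₁ penalty on individual coefficients with an ℓ₂ penalty on groups of coefficients. Its proximal operator, the building block of standard solvers, has a closed form: elementwise soft-thresholding followed by group-level shrinkage. A small sketch for a single group, with illustrative penalty levels (this illustrates the penalty only, not the paper's time-series estimator):

```python
import numpy as np

def prox_sparse_group(beta, lam1, lam2):
    """Proximal operator of lam1*||b||_1 + lam2*||b||_2 for one group:
    elementwise soft-thresholding, then group soft-thresholding."""
    z = np.sign(beta) * np.maximum(np.abs(beta) - lam1, 0.0)
    nz = np.linalg.norm(z)
    if nz <= lam2:
        return np.zeros_like(z)    # whole group set to zero
    return (1.0 - lam2 / nz) * z   # group-level shrinkage

beta = np.array([3.0, -0.5, 1.0])
print(prox_sparse_group(beta, lam1=1.0, lam2=1.0))
```

The two thresholding layers are what deliver sparsity both within and across groups, matching the "structured" regressions described above.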
By:  Xi Chen; Victor Chernozhukov; Ye Luo; Martin Spindler 
Abstract:  In this paper we develop a data-driven smoothing technique for high-dimensional and nonlinear panel data models. We allow for individual-specific (nonlinear) functions and estimation with econometric or machine learning methods by using weighted observations from other individuals. The weights are determined in a data-driven way and depend on the similarity between the corresponding functions, measured from initial estimates. The key feature of such a procedure is that it clusters individuals based on the distance/similarity between them, estimated in a first stage. Our estimation method can be combined with various statistical estimation procedures, in particular modern machine learning methods, which are particularly fruitful in the high-dimensional case and with complex, heterogeneous data. The approach can be interpreted as a “soft clustering”, in contrast to traditional “hard clustering” that assigns each individual to exactly one group. We conduct a simulation study which shows that prediction can be greatly improved by using our estimator. Finally, we analyze a big data set from didichuxing.com, a leading company in the transportation industry, to analyze and predict the gap between supply and demand based on a large set of covariates. Our estimator clearly performs much better in out-of-sample prediction compared to existing linear panel data estimators.
Date:  2019–12 
URL:  http://d.repec.org/n?u=RePEc:arx:papers:1912.12867&r=all 
By:  Joris Pinkse; Karl Schurter 
Abstract:  We introduce several new estimation methods that leverage shape constraints in auction models to estimate various objects of interest, including the distribution of a bidder's valuations, the bidder's ex ante expected surplus, and the seller's counterfactual revenue. The basic approach applies broadly in that (unlike most of the literature) it works for a wide range of auction formats and allows for asymmetric bidders. Though our approach is not restrictive, we focus our analysis on first-price, sealed-bid auctions with independent private valuations. We highlight two nonparametric estimation strategies, one based on a least squares criterion and the other on a maximum likelihood criterion. We also provide the first direct estimator of the strategy function. We establish several theoretical properties of our methods to guide empirical analysis and inference. In addition to providing the asymptotic distributions of our estimators, we identify ways in which methodological choices should be tailored to the objects of interest. For objects like the bidders' ex ante surplus and the seller's counterfactual expected revenue with an additional symmetric bidder, we show that our input-parameter-free estimators achieve the semiparametric efficiency bound. For objects like the bidders' inverse strategy function, we provide an easily implementable boundary-corrected kernel smoothing and transformation method in order to ensure the squared error is integrable over the entire support of the valuations. An extensive simulation study illustrates our analytical results and demonstrates the respective advantages of our least-squares and maximum likelihood estimators in finite samples. Compared to estimation strategies based on kernel density estimation, the simulations indicate that the smoothed versions of our estimators enjoy a large degree of robustness to the choice of an input parameter.
Date:  2019–12 
URL:  http://d.repec.org/n?u=RePEc:arx:papers:1912.07466&r=all 
By:  Jiahe Lin; George Michailidis 
Abstract:  A factor-augmented vector autoregressive (FAVAR) model is defined by a VAR equation that captures lead-lag correlations amongst a set of observed variables $X$ and latent factors $F$, and a calibration equation that relates another set of observed variables $Y$ with $F$ and $X$. The latter equation is used to estimate the factors, which are subsequently used in estimating the parameters of the VAR system. The FAVAR model has become popular in applied economic research, since it can summarize a large number of variables of interest as a few factors through the calibration equation and subsequently examine their influence on core variables of primary interest through the VAR equation. However, there is an increasing need for examining lead-lag relationships between a large number of time series, while incorporating information from another high-dimensional set of variables. Hence, in this paper we investigate the FAVAR model under high-dimensional scaling. We introduce an appropriate identification constraint for the model parameters, which, when incorporated into the formulated optimization problem, yields estimates with good statistical properties. Further, we address a number of technical challenges introduced by the fact that estimates of the VAR system model parameters are based on estimated rather than directly observed quantities. The performance of the proposed estimators is evaluated on synthetic data. Further, the model is applied to commodity prices and reveals interesting and interpretable relationships between the prices and the factors extracted from a set of global macroeconomic indicators.
Date:  2019–12 
URL:  http://d.repec.org/n?u=RePEc:arx:papers:1912.04146&r=all 
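The two-equation structure of a FAVAR can be mimicked in a few lines: estimate factors from the calibration equation by principal components, then fit the VAR on the estimated factors stacked with the observed variables. The low-dimensional sketch below uses simulated white-noise data purely to show the mechanics (the paper's high-dimensional, identification-constrained estimator is considerably more involved, and here the true transition matrix is zero):

```python
import numpy as np

rng = np.random.default_rng(5)
T, nY, nX, k = 300, 20, 2, 2
F = rng.normal(size=(T, k))                      # latent factors (simulated)
X = rng.normal(size=(T, nX))                     # observed VAR variables
Lam = rng.normal(size=(nY, k))                   # factor loadings
Y = F @ Lam.T + 0.1 * rng.normal(size=(T, nY))   # calibration equation

# Step 1: estimate the factors by principal components of Y
Yc = Y - Y.mean(axis=0)
u, s, vt = np.linalg.svd(Yc, full_matrices=False)
F_hat = u[:, :k] * s[:k]

# Step 2: VAR(1) for (F_hat, X), estimated equation-by-equation via OLS
Z = np.hstack([F_hat, X])
A, *_ = np.linalg.lstsq(Z[:-1], Z[1:], rcond=None)
print(A.shape)   # (k + nX) x (k + nX) transition matrix
```

The statistical subtlety the paper addresses is visible even here: the second-stage OLS runs on F_hat, an estimate, rather than on the true F.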
By:  Mike Tsionas; Marwan Izzeldin; Lorenzo Trapani 
Abstract:  This paper provides a simple, yet reliable, alternative to the (Bayesian) estimation of large multivariate VARs with time variation in the conditional mean equations and/or in the covariance structure. With our new methodology, the original multivariate, n-dimensional model is treated as a set of n univariate estimation problems, and cross-dependence is handled through the use of a copula. Thus, only univariate distribution functions are needed when estimating the individual equations, which are often available in closed form and easy to handle with MCMC (or other techniques). Estimation is carried out in parallel for the individual equations. Thereafter, the individual posteriors are combined with the copula, thus obtaining a joint posterior which can be easily resampled. We illustrate our approach by applying it to a large time-varying parameter VAR with 25 macroeconomic variables.
Date:  2019–12 
URL:  http://d.repec.org/n?u=RePEc:arx:papers:1912.12527&r=all 
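The copula step can be sketched as follows: given posterior draws from each univariate equation, dependence is reintroduced by mapping correlated Gaussian draws through the normal CDF and then through the marginal posterior quantiles. A Gaussian-copula toy version with two equations; the marginals, correlation, and sample sizes are all illustrative, and the paper's copula choice may differ:

```python
import numpy as np
from math import erf, sqrt

def norm_cdf(x):
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

rng = np.random.default_rng(7)

# Stand-ins for posterior draws from two univariate estimation problems
post1 = rng.normal(0.0, 1.0, size=50000)
post2 = rng.normal(2.0, 0.5, size=50000)

# Gaussian copula with correlation rho
rho, N = 0.7, 20000
z1 = rng.normal(size=N)
z2 = rho * z1 + sqrt(1 - rho**2) * rng.normal(size=N)
u1 = np.vectorize(norm_cdf)(z1)   # copula uniforms
u2 = np.vectorize(norm_cdf)(z2)

# Joint resample: evaluate each marginal's quantiles at the copula uniforms
joint1 = np.quantile(post1, u1)
joint2 = np.quantile(post2, u2)
print(np.corrcoef(joint1, joint2)[0, 1])
```

Each marginal posterior is preserved exactly (up to resampling error) while the copula supplies the cross-equation dependence.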
By:  Florian Gunsilius (Institute for Fiscal Studies and MIT); Susanne M. Schennach (Institute for Fiscal Studies and Brown University) 
Abstract:  The idea of summarizing the information contained in a large number of variables by a small number of “factors” or “principal components” has been broadly adopted in economics and statistics. This paper introduces a generalization of the widely used principal component analysis (PCA) to nonlinear settings, thus providing a new tool for dimension reduction and exploratory data analysis or representation. The distinguishing features of the method include (i) the ability to always deliver truly independent factors (as opposed to the merely uncorrelated factors of PCA); (ii) the reliance on the theory of optimal transport and Brenier maps to obtain a robust and efficient computational algorithm; (iii) the use of a new multivariate additive entropy decomposition to determine the principal nonlinear components that capture most of the information content of the data; and (iv) formally nesting PCA as a special case, for linear Gaussian factor models. We illustrate the method’s effectiveness in an application to the prediction of excess bond returns from a large number of macro factors.
Date:  2019–09–23 
URL:  http://d.repec.org/n?u=RePEc:ifs:cemmap:46/19&r=all 
By:  Manuel Denzer (Johannes Gutenberg University Mainz) 
Abstract:  This paper reviews and compares different estimators used in the past to estimate a binary response model (BRM) with a binary endogenous explanatory variable (EEV), to give practical insights to applied econometricians. It also gives guidance on how the average structural function (ASF) can be used in such a setting to estimate average partial effects (APEs). In total, the (relative) performance of six different linear parametric, nonlinear parametric and nonlinear semiparametric estimators is compared in specific scenarios, such as the prevalence of weak instruments. A simulation study shows that the nonlinear parametric estimator dominates in a majority of scenarios, even when the corresponding parametric assumptions are not fulfilled. Moreover, while the semiparametric nonlinear estimator might be seen as a suitable alternative for estimating coefficients, it suffers from weaknesses in estimating partial effects. These insights are confirmed by an empirical illustration of the individual decision to supply labor.
Keywords:  Binary choice, Binomial response, Binary Endogenous Explanatory Variable, Average Structural Function 
JEL:  C25 C26 
Date:  2019–12–10 
URL:  http://d.repec.org/n?u=RePEc:jgu:wpaper:1916&r=all 
By:  Victor Chernozhukov (Institute for Fiscal Studies and MIT); Christian Hansen (Institute for Fiscal Studies and Chicago GSB); Yuan Liao (Institute for Fiscal Studies); Yinchu Zhu (Institute for Fiscal Studies) 
Abstract:  We study a panel data model with general heterogeneous effects, where slopes are allowed to vary across both individuals and times. The key assumption for dimension reduction is that the heterogeneous slopes can be expressed as a factor structure, so that the high-dimensional slope matrix is of low rank and can be estimated using low-rank regularized regression. Our paper makes an important theoretical contribution on “post-SVT (singular value thresholding) inference”. Formally, we show that post-SVT inference can be conducted via three steps: (1) apply the nuclear-norm penalized estimation; (2) extract eigenvectors from the estimated low-rank matrices; and (3) run least squares to iteratively estimate the individual and time effect components in the slope matrix. To properly control for the effect of the penalized low-rank estimation, we argue that this procedure should be embedded with “partial out the mean structure” and “sample splitting”. The resulting estimators are asymptotically normal and admit valid inferences. Empirically, we apply the proposed methods to estimate the county-level minimum wage effects on employment.
Date:  2019–06–12 
URL:  http://d.repec.org/n?u=RePEc:ifs:cemmap:31/19&r=all 
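Step (1) of the procedure, nuclear-norm penalized estimation, has singular value thresholding at its core: soft-threshold the singular values of a noisy matrix. A self-contained sketch on a simulated rank-2 matrix; the dimensions, noise level, and threshold are illustrative, not from the paper:

```python
import numpy as np

rng = np.random.default_rng(3)

# Rank-2 signal plus noise, standing in for a heterogeneous slope matrix
U = rng.normal(size=(50, 2))
V = rng.normal(size=(40, 2))
M = U @ V.T
noisy = M + 0.1 * rng.normal(size=M.shape)

# Singular value thresholding: the proximal operator of the nuclear norm
u, s, vt = np.linalg.svd(noisy, full_matrices=False)
lam = 2.0                       # threshold (a tuning parameter)
s_thr = np.maximum(s - lam, 0.0)
M_hat = (u * s_thr) @ vt
rank_hat = int((s_thr > 0).sum())
print(rank_hat)                 # the low rank should be recovered
```

With the threshold above the noise level but below the signal's singular values, the estimate is low-rank, which is what makes the subsequent eigenvector extraction in step (2) meaningful.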
By:  Holger Dette; Weichi Wu 
Abstract:  We develop an estimator for the high-dimensional covariance matrix of a locally stationary process with a smoothly varying trend, and use this statistic to derive consistent predictors in nonstationary time series. In contrast to currently available methods for this problem, the predictor developed here does not rely on fitting an autoregressive model and does not require a vanishing trend. The finite sample properties of the new methodology are illustrated by means of a simulation study and a study of financial indices.
Date:  2020–01 
URL:  http://d.repec.org/n?u=RePEc:arx:papers:2001.00419&r=all 
By:  Antonios Antypas; Guglielmo Maria Caporale; Nikolaos Kourogenis; Nikitas Pittis 
Abstract:  We introduce a methodology which deals with possibly integrated variables in the specification of the betas of conditional asset pricing models. In such a case, any model which is directly derived from a polynomial approximation of the functional form of the conditional beta will inherit a nonstationary right-hand side. Our approach uses the cointegrating relationships between the integrated variables in order to maintain the stationarity of the right-hand side of the estimated model, thus avoiding the issues that arise in the case of an unbalanced regression. We present an example where our methodology is applied to the returns of funds-of-funds which are based on the Morningstar mutual fund ranking system. The results provide evidence that the residuals of possible cointegrating relationships between integrated variables in the specification of the conditional betas may reveal significant information concerning the dynamics of the betas.
Keywords:  conditional CAPM, time-varying beta, cointegration, Morningstar star-rating system
JEL:  G10 G23 C10 
Date:  2019 
URL:  http://d.repec.org/n?u=RePEc:ces:ceswps:_7969&r=all 
By:  Anand Deo; Sandeep Juneja 
Abstract:  We consider discrete default intensity based and logit-type reduced form models for conditional default probabilities for corporate loans, where we develop simple closed-form approximations to the maximum likelihood estimator (MLE) when the underlying covariates follow a stationary Gaussian process. In a practically reasonable asymptotic regime where the default probabilities are small, say 1–3% annually, and the number of firms and the time period of data available are reasonably large, we rigorously show that the proposed estimator behaves similarly to or slightly worse than the MLE when the underlying model is correctly specified. For the more realistic case of model misspecification, both estimators are seen to be equally good, or equally bad. Further, beyond a point, both are more or less insensitive to increases in data. These conclusions are validated on empirical and simulated data. The proposed approximations should also have applications outside finance, where logit-type models are used and probabilities of interest are small.
Date:  2019–12 
URL:  http://d.repec.org/n?u=RePEc:arx:papers:1912.12611&r=all 
By:  Negera Deresa; Ingrid Van Keilegom 
Abstract:  When modeling survival data, it is common to assume that the (log-transformed) survival time (T) is conditionally independent of the (log-transformed) censoring time (C) given a set of covariates. There are numerous situations in which this assumption is not realistic, and a number of correction procedures have been developed for different models. However, in most cases, either some prior knowledge about the association between T and C is required, or some auxiliary information or data are supposed to be available. When this is not the case, the application of many existing methods turns out to be limited. The goal of this paper is to overcome this problem by developing a flexible parametric model, namely a type of transformed linear model. We show that the association between T and C is identifiable in this model. The performance of the proposed method is investigated both asymptotically and through finite sample simulations. We also develop a formal goodness-of-fit test approach to assess the quality of the fitted model. Finally, the approach is applied to data coming from a study on liver transplants.
Date:  2019–01 
URL:  http://d.repec.org/n?u=RePEc:ete:kbiper:632067&r=all 
By:  JeanPierre Florens; Léopold Simar; Ingrid Van Keilegom 
Abstract:  Consider the model Y = X + ε with X = τ + Z, where τ is an unknown constant (the boundary of X), Z is a random variable defined on R₊, ε is a symmetric error, and ε and Z are independent. Based on an iid sample of Y, we aim at identifying and estimating the boundary τ when the law of ε is unknown (apart from symmetry) and, in particular, its variance is unknown. We propose an estimation procedure based on a minimal distance approach and by making use of Laguerre polynomials. Asymptotic results as well as finite sample simulations are shown. The paper also proposes an extension to stochastic frontier analysis, where the model is conditional on observed variables. The model becomes Y = τ(w₁, w₂) + Z + ε, where Y is a cost, w₁ are the observed outputs and w₂ represents the observed values of other conditioning variables, so Z is the cost inefficiency. Some simulations illustrate again how the approach works in finite samples, and the proposed procedure is illustrated with data coming from post offices in France.
Date:  2018–12 
URL:  http://d.repec.org/n?u=RePEc:ete:kbiper:630770&r=all 
By:  Azeem M. Shaikh (Institute for Fiscal Studies and University of Chicago); (Institute for Fiscal Studies); Joseph P. Romano (Institute for Fiscal Studies and Stanford University) 
Abstract:  This paper studies inference for the average treatment effect in randomized controlled trials where treatment status is determined according to a "matched pairs" design. By a "matched pairs" design, we mean that units are sampled i.i.d. from the population of interest, paired according to observed, baseline covariates and finally, within each pair, one unit is selected at random for treatment. This type of design is used routinely throughout the sciences, but results about its implications for inference about the average treatment effect are not available. The main requirement underlying our analysis is that pairs are formed so that units within pairs are suitably "close" in terms of the baseline covariates, and we develop novel results to ensure that pairs are formed in a way that satisfies this condition. Under this assumption, we show that, for the problem of testing the null hypothesis that the average treatment effect equals a prespecified value in such settings, the commonly used two-sample t-test and "matched pairs" t-test are conservative in the sense that these tests have limiting rejection probability under the null hypothesis no greater than, and typically strictly less than, the nominal level. We show, however, that a simple adjustment to the standard errors of these tests leads to a test that is asymptotically exact in the sense that its limiting rejection probability under the null hypothesis equals the nominal level. We also study the behavior of randomization tests that arise naturally in these types of settings. When implemented appropriately, we show that this approach also leads to a test that is asymptotically exact in the sense described previously, but additionally has finite-sample rejection probability no greater than the nominal level for certain distributions satisfying the null hypothesis. A simulation study confirms the practical relevance of our theoretical results.
Keywords:  Experiment, matched pairs, matched pairs t-test, permutation test, randomized controlled trial, treatment assignment, two-sample t-test
Date:  2019–04–25 
URL:  http://d.repec.org/n?u=RePEc:ifs:cemmap:19/19&r=all 
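For reference, the "matched pairs" t-test analyzed in the paper is computed from within-pair differences. A minimal sketch with made-up pair data; this is the standard version that the paper shows to be conservative, and its proposed standard-error adjustment is not reproduced here:

```python
from math import sqrt
import numpy as np

# Hypothetical outcomes: one treated and one control unit per pair
treated = np.array([5.1, 4.8, 6.0, 5.5, 4.9, 5.7, 5.2, 5.4])
control = np.array([4.6, 4.9, 5.1, 5.0, 4.4, 5.3, 4.8, 5.1])

d = treated - control                          # within-pair differences
n = len(d)
t_stat = d.mean() / (d.std(ddof=1) / sqrt(n))  # "matched pairs" t-statistic
# Compare with a Student t distribution with n - 1 = 7 degrees of freedom
print(t_stat)
```

Pairing on covariates shrinks the within-pair differences, which is also why the naive standard error can overstate the variance under this design.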
By:  Dennis Kristensen (Institute for Fiscal Studies and University College London); Patrick K. Mogensen (Institute for Fiscal Studies); JongMyun Moon (Institute for Fiscal Studies and University College London); Bertel Schjerning (Institute for Fiscal Studies and University of Copenhagen) 
Abstract:  We propose to combine smoothing, simulations and sieve approximations to solve for either the integrated or expected value function in a general class of dynamic discrete choice (DDC) models. We use importance sampling to approximate the Bellman operators defining the two functions. The random Bellman operators, and therefore also the corresponding solutions, are generally non-smooth, which is undesirable. To circumvent this issue, we introduce a smoothed version of the random Bellman operator and solve for the corresponding smoothed value function using sieve methods. We show that one can avoid using sieves by generalizing and adapting the “self-approximating” method of Rust (1997b) to our setting. We provide an asymptotic theory for the approximate solutions and show that they converge at a √N-rate, where N is the number of Monte Carlo draws, towards Gaussian processes. We examine their performance in practice through a set of numerical experiments and find that both methods perform well, with the sieve method being particularly attractive in terms of computational speed and accuracy.
Keywords:  Dynamic discrete choice; numerical solution; Monte Carlo; sieves 
Date:  2019–04–03 
URL:  http://d.repec.org/n?u=RePEc:ifs:cemmap:15/19&r=all 
By:  Buil-Gil, David (University of Manchester); Solymosi, Reka; Moretti, Angelo
Abstract:  Open and crowdsourced data are becoming prominent in social sciences research. Crowdsourcing projects harness information from large crowds of citizens who voluntarily participate in one collaborative project, and allow new insights into people’s attitudes and perceptions. However, these data are usually affected by a series of biases that limit their representativeness (i.e. self-selection bias, unequal participation, underrepresentation of certain areas and times). In this chapter we present a two-step method aimed at producing reliable small area estimates from crowdsourced data when no auxiliary information is available at the individual level. A nonparametric bootstrap, aimed at computing pseudo-sampling weights and bootstrap weighted estimates, is followed by an area-level model-based small area estimation approach, which borrows strength from related areas based on a set of covariates, to improve the small area estimates. In order to assess the method, a simulation study and an application to safety perceptions in Greater London are conducted. The simulation study shows that the area-level model-based small area estimator under the nonparametric bootstrap improves (in terms of bias and variability) the small area estimates in the majority of areas. The application produces estimates of safety perceptions at a small geographical level in Greater London from Place Pulse 2.0 data. In the application, estimates are validated externally by comparing them to reliable survey estimates. Further simulation experiments and applications are needed to examine whether this method also improves the small area estimates when the sample biases are larger, smaller or show different distributions. A measure of reliability also needs to be developed to estimate the error of the small area estimates under the nonparametric bootstrap.
Date:  2019–10–02 
URL:  http://d.repec.org/n?u=RePEc:osf:socarx:8hgjt&r=all 
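The area-level model-based step can be illustrated with a minimal Fay-Herriot-type shrinkage estimator on simulated data. This is a generic sketch of the area-level model family the abstract refers to, not the authors' two-step bootstrap procedure; all parameter values are illustrative and the variance components are taken as known for simplicity.

```python
import numpy as np

rng = np.random.default_rng(6)
D = 50
X = np.column_stack([np.ones(D), rng.normal(size=D)])    # area-level covariates
beta = np.array([1.0, 0.5])
sigma_u2 = 0.25                                          # area-effect variance (known here)
psi = rng.uniform(0.1, 1.0, size=D)                      # known sampling variances
theta = X @ beta + rng.normal(0, np.sqrt(sigma_u2), D)   # true area means
y = theta + rng.normal(0, np.sqrt(psi))                  # direct (survey-style) estimates

# GLS fit for beta, then shrinkage with gamma_d = sigma_u2 / (sigma_u2 + psi_d):
# areas with noisy direct estimates are pulled toward the regression fit.
W = 1.0 / (sigma_u2 + psi)
beta_hat = np.linalg.solve(X.T @ (W[:, None] * X), X.T @ (W * y))
gamma = sigma_u2 / (sigma_u2 + psi)
eblup = gamma * y + (1 - gamma) * (X @ beta_hat)

mse_direct = np.mean((y - theta) ** 2)
mse_eblup = np.mean((eblup - theta) ** 2)
print(mse_eblup < mse_direct)  # shrinkage improves on the direct estimates
```

Borrowing strength across areas through the covariates is exactly what makes the estimator outperform the direct estimates in areas with large sampling variance.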
By:  Jiahe Lin; George Michailidis 
Abstract:  We consider the estimation of approximate factor models for time series data, where strong serial and cross-sectional correlations amongst the idiosyncratic component are present. This setting comes up naturally in many applications, but existing approaches in the literature rely on the assumption that such correlations are weak, leading to misspecification of the number of factors selected and consequently inaccurate inference. In this paper, we explicitly incorporate the dependence structure present in the idiosyncratic component through lagged values of the observed multivariate time series. We formulate a constrained optimization problem to estimate the factor space and the transition matrices of the lagged values simultaneously, wherein the constraints reflect the low-rank nature of the common factors and the sparsity of the transition matrices. We establish theoretical properties of the obtained estimates, and introduce an easy-to-implement computational procedure for empirical work. The performance of the model and the implementation procedure is evaluated on synthetic data and compared with competing approaches, and further illustrated on a data set involving weekly log-returns of 75 US large financial institutions for the 2001–2016 period. 
Date:  2019–12 
URL:  http://d.repec.org/n?u=RePEc:arx:papers:1912.04123&r=all 
By:  Mathur, Maya B; VanderWeele, Tyler 
Abstract:  We propose new metrics comparing the observed number of hypothesis test rejections ($\widehat{\theta}$) at an unpenalized $\alpha$-level to the distribution of rejections that would be expected if all tested null hypotheses held (the "global null"). Specifically, we propose reporting a "null interval" for the number of $\alpha$-level rejections expected to occur in 95% of samples under the global null, the difference between $\widehat{\theta}$ and the upper limit of the null interval (the "excess hits"), and a one-sided joint test of the global null based on $\widehat{\theta}$. For estimation, we describe resampling algorithms that asymptotically recover the sampling distribution under the global null. These methods accommodate arbitrarily correlated test statistics and do not require high-dimensional analyses. In a simulation study, we assess properties of the proposed metrics under varying correlation structures as well as their power for outcome-wide inference relative to existing FWER methods. We provide an R package, NRejections. Ultimately, existing procedures for multiple hypothesis testing typically penalize inference in each test, which is useful to temper interpretation of individual findings; yet on their own, these procedures do not fully characterize global evidence strength across the multiple tests. Our new metrics help remedy this limitation. 
Date:  2018–09–01 
URL:  http://d.repec.org/n?u=RePEc:osf:osfxxx:k9g3b&r=all 
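The resampling idea can be sketched with a permutation-based global null that preserves the correlation among outcomes while breaking all associations with the exposure. This is a generic illustration on simulated data, not the authors' exact algorithm or their NRejections implementation; the effect size and dimensions are made up.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n, p = 200, 40
x = rng.normal(size=n)                                   # exposure
Y = rng.normal(size=(n, p))
Y[:, :5] += 0.4 * x[:, None]                             # 5 truly associated outcomes

def n_rejections(xv, alpha=0.05):
    # number of outcome-wise correlation tests rejecting at level alpha
    return sum(stats.pearsonr(xv, Y[:, j])[1] < alpha for j in range(p))

observed = n_rejections(x)
# Permuting the exposure enforces the global null while preserving the
# correlation structure among the outcomes.
null_counts = [n_rejections(rng.permutation(x)) for _ in range(300)]
lo, hi = np.percentile(null_counts, [2.5, 97.5])         # 95% "null interval"
excess = max(0, observed - hi)                           # "excess hits"
print(observed, (lo, hi), excess)
```

Because the permutation distribution keeps the outcomes' dependence intact, the null interval is wider than the naive Binomial(p, α) interval whenever the tests are positively correlated.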
By:  Fries, Sébastien 
Abstract:  Noncausal, or anticipative, alpha-stable processes generate trajectories featuring locally explosive episodes akin to speculative bubbles in financial time series data. For (X_t) a two-sided infinite alpha-stable moving average (MA), conditional moments up to integer order four are shown to exist provided (X_t) is anticipative enough. The functional forms of these moments at any forecast horizon under any admissible parameterisation are obtained by extending the literature on arbitrary bivariate alpha-stable random vectors. The dynamics of noncausal processes simplifies during explosive episodes and makes it possible to express ex ante crash odds at any horizon in terms of the MA coefficients and of the tail index alpha. The results are illustrated in a synthetic portfolio allocation framework and an application to the Nasdaq and S&P500 series is provided. 
Keywords:  Noncausal processes, Multivariate stable distributions, Conditional dependence, Extremal dependence, Explosive bubbles, Prediction, Crash odds, Portfolio allocation 
JEL:  C22 C53 C58 
Date:  2018–05 
URL:  http://d.repec.org/n?u=RePEc:pra:mprapa:97353&r=all 
By:  Semenova, Daria; Temirkaeva, Maria 
Abstract:  Today, treatment effect estimation at the individual level is a vital problem in many areas of science and business. For example, in marketing, estimates of the treatment effect are used to select the most efficient promo mechanics; in medicine, individual treatment effects are used to determine the optimal dose of medication for each patient, and so on. At the same time, the question of choosing the best method, i.e., the method that ensures the smallest predictive error (for instance, RMSE) or the highest total (average) value of the effect, remains open. Accordingly, in this paper we compare the effectiveness of machine learning methods for estimation of individual treatment effects. The comparison is performed on the Criteo Uplift Modeling Dataset. We show that the combination of the Logistic Regression method with the Difference Score method, as well as the Uplift Random Forest method, provides the most accurate Individual Treatment Effect predictions on the top 30% of observations in the test dataset. 
Keywords:  Individual Treatment Effect; ITE; Machine Learning; Random Forest; XGBoost; SVM; Random Experiments; A/B testing; Uplift Random Forest 
JEL:  C10 M30 
Date:  2019–09–23 
URL:  http://d.repec.org/n?u=RePEc:pra:mprapa:97309&r=all 
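The "Difference Score" (two-model) approach named in the abstract — fit separate response models on treated and control units, then predict the ITE as the difference of the two probabilities — can be sketched on simulated data. This is a generic illustration with scikit-learn, not the authors' tuned pipeline, and the data-generating process is invented for the example.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
n = 4000
X = rng.normal(size=(n, 3))
t = rng.integers(0, 2, size=n)                 # randomized treatment assignment
# outcome: treatment lifts response probability for units with X[:, 0] > 0
p = 1 / (1 + np.exp(-(X[:, 0] + 0.8 * t * (X[:, 0] > 0))))
y = rng.binomial(1, p)

# Two-model ("difference score") uplift: one response model per arm,
# ITE predicted as the difference in response probabilities.
m_t = LogisticRegression().fit(X[t == 1], y[t == 1])
m_c = LogisticRegression().fit(X[t == 0], y[t == 0])
ite = m_t.predict_proba(X)[:, 1] - m_c.predict_proba(X)[:, 1]

top = np.argsort(-ite)[: n // 10]              # units ranked by predicted uplift
print(ite.shape, ite[top].mean() > ite.mean())
```

Ranking units by predicted ITE and targeting the top fraction is the standard use of such scores, mirroring the top-30% evaluation mentioned in the abstract.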
By:  Mathur, Maya B; VanderWeele, Tyler 
Abstract:  We provide two simple metrics that could be reported routinely in random-effects meta-analyses to convey evidence strength for scientifically meaningful effects under effect heterogeneity (i.e., a nonzero estimated variance of the true effect distribution). First, given a chosen threshold of meaningful effect size, meta-analyses could report the estimated proportion of true effect sizes above this threshold. Second, meta-analyses could estimate the proportion of effect sizes below a second, possibly symmetric, threshold in the opposite direction from the estimated mean. These metrics could help identify if: (1) there are few effects of scientifically meaningful size despite a "statistically significant" pooled point estimate; (2) there are some large effects despite an apparently null point estimate; or (3) strong effects in the direction opposite the pooled estimate regularly also occur (and thus, potential effect modifiers should be examined). These metrics should be presented with confidence intervals, which can be obtained analytically or, under weaker assumptions, using bias-corrected and accelerated (BCa) bootstrapping. Additionally, these metrics inform relative comparison of evidence strength across related meta-analyses. We illustrate with applied examples and provide an R package to compute the metrics and confidence intervals. 
Date:  2018–11–14 
URL:  http://d.repec.org/n?u=RePEc:osf:osfxxx:v37j6&r=all 
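Under the common normal model for the true effect distribution, the first metric reduces to a tail probability of a normal with the estimated pooled mean and heterogeneity SD. A minimal sketch of that parametric version (the paper also covers confidence intervals and weaker-assumption bootstrap estimation, which are omitted here; the numbers are illustrative):

```python
import math

def prop_meaningful(mu_hat, tau_hat, q):
    """Estimated proportion of true effects above threshold q, assuming the
    true effects are N(mu_hat, tau_hat^2); tau_hat is the heterogeneity SD."""
    if tau_hat <= 0:
        return float(mu_hat > q)          # no heterogeneity: all effects at mu_hat
    z = (q - mu_hat) / tau_hat
    return 1 - 0.5 * (1 + math.erf(z / math.sqrt(2)))   # 1 - Phi(z)

# e.g. pooled estimate 0.3, heterogeneity SD 0.2, meaningful threshold 0.1:
print(round(prop_meaningful(0.3, 0.2, 0.1), 3))  # 0.841
```

The second metric is the mirror image: the same formula applied to a threshold in the opposite direction, with the tail taken on the other side.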
By:  Bo E. Honoré (Institute for Fiscal Studies and Princeton); Thomas Jorgensen (Institute for Fiscal Studies and University of Copenhagen); Áureo de Paula (Institute for Fiscal Studies and University College London) 
Abstract:  This paper introduces measures for how each moment contributes to the precision of the parameter estimates in GMM settings. For example, one of the measures asks what would happen to the variance of the parameter estimates if a particular moment was dropped from the estimation. The measures are all easy to compute. We illustrate the usefulness of the measures through two simple examples as well as an application to a model of joint retirement planning of couples. We estimate the model using the UK BHPS, and we find evidence of complementarities in leisure. Our sensitivity measures illustrate that the precision of the estimate of the complementarity is primarily driven by the distribution of the differences in planned retirement dates. The estimated econometric model can be interpreted as a bivariate ordered choice model that allows for simultaneity. This makes the model potentially useful in other applications. 
Date:  2019–07–05 
URL:  http://d.repec.org/n?u=RePEc:ifs:cemmap:36/19&r=all 
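The "drop one moment" idea can be sketched for efficient GMM, where the asymptotic variance is (G'Ω⁻¹G)⁻¹ for Jacobian G and moment covariance Ω: recompute the variance with moment j removed and compare. The matrices below are hypothetical placeholders, not quantities from the paper's retirement model.

```python
import numpy as np

def gmm_avar(G, Omega):
    # efficient-GMM asymptotic variance: (G' Omega^{-1} G)^{-1}
    return np.linalg.inv(G.T @ np.linalg.solve(Omega, G))

rng = np.random.default_rng(5)
G = rng.normal(size=(4, 2))           # hypothetical Jacobian: 4 moments, 2 params
A = rng.normal(size=(4, 4))
Omega = A @ A.T + 4 * np.eye(4)       # hypothetical moment covariance (pos. def.)

V_full = gmm_avar(G, Omega)
ratios = []
for j in range(4):
    keep = [i for i in range(4) if i != j]
    V_drop = gmm_avar(G[keep], Omega[np.ix_(keep, keep)])
    # relative increase in each parameter's variance when moment j is dropped
    ratios.append(np.diag(V_drop) / np.diag(V_full))
    print(j, np.round(ratios[-1], 2))
```

For efficient GMM, dropping a moment can never reduce the asymptotic variance, so every ratio is at least one; moments whose removal barely moves the ratio contribute little precision.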
By:  van Erp, Sara (Tilburg University); Oberski, Daniel L.; Mulder, Joris 
Abstract:  In linear regression problems with many predictors, penalized regression techniques are often used to guard against overfitting and to select variables relevant for predicting the outcome. Classical regression techniques find coefficients that minimize a squared residual; penalized regression adds a penalty term to this residual to limit the coefficients’ sizes, thereby preventing overfitting. Many classical penalization techniques have a Bayesian counterpart, which results in the same solutions when a specific prior distribution is used in combination with posterior mode estimates. Compared to classical penalization techniques, Bayesian penalization techniques perform similarly or even better, and they offer additional advantages such as readily available uncertainty estimates, automatic estimation of the penalty parameter, and more flexibility in terms of the penalties that can be considered. As a result, Bayesian penalization is becoming increasingly popular. The aim of this paper is to provide a comprehensive overview of the literature on Bayesian penalization. We compare different priors for penalization that have been proposed in the literature in terms of their characteristics, shrinkage behavior, and performance in prediction and variable selection, in order to help researchers navigate the many prior options. 
Date:  2018–01–31 
URL:  http://d.repec.org/n?u=RePEc:osf:osfxxx:cg8fq&r=all 
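The classical-Bayesian correspondence described above can be checked numerically in its simplest case: ridge regression coincides with the posterior mode (here also the posterior mean) under an i.i.d. normal prior on the coefficients. A minimal sketch on simulated data, not an example from the paper:

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(3)
n, p = 100, 5
X = rng.normal(size=(n, p))
y = X @ np.array([1.5, 0.0, -2.0, 0.0, 0.5]) + rng.normal(size=n)

lam = 2.0
# With y | b ~ N(Xb, sigma^2 I) and prior b ~ N(0, (sigma^2 / lam) I),
# maximizing the log posterior is minimizing ||y - Xb||^2 + lam * ||b||^2,
# whose closed form is exactly the classical ridge estimator:
posterior_mode = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)
ridge = Ridge(alpha=lam, fit_intercept=False).fit(X, y).coef_

print(np.allclose(posterior_mode, ridge))  # True
```

The same pattern holds for other penalties: the lasso corresponds to a Laplace prior, for instance, though there only the posterior mode (not the mean) reproduces the classical solution.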
By:  Nguyen, Loc PhD, MD, MBA 
Abstract:  Multivariate hypothesis testing is becoming increasingly necessary as data move from scalar, univariate formats to multivariate ones; financial and biological data, in particular, often consist of n-dimensional vectors. The likelihood ratio test is the best method for testing the mean of a multivariate sample with known or unknown covariance matrix, but it cannot be applied directly to incomplete data, and data incompletion is common in practice for many reasons. This research therefore proposes a new approach that extends the likelihood ratio test to incomplete data. Instead of replacing missing values in an incomplete sample with estimated values, the approach classifies the incomplete sample into groups, each represented by a potential or partial distribution. All partial distributions are unified into a mixture model, which is optimized via the expectation maximization (EM) algorithm. Finally, the likelihood ratio test is performed on the mixture model instead of the incomplete sample. This research provides a thorough description of the proposed approach together with the mathematical proofs it requires. A comparison of the mixture model approach and the missing-value imputation approach is also discussed. 
Date:  2018–01–17 
URL:  http://d.repec.org/n?u=RePEc:osf:osfxxx:kmg3r&r=all 
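The EM step at the heart of the approach can be sketched in a minimal univariate form — two Gaussian components with unit variances fitted to complete, simulated data. This is a generic EM illustration, not the author's grouped-incomplete-data construction; the cluster locations and mixing weight are invented.

```python
import numpy as np

rng = np.random.default_rng(9)
x = np.concatenate([rng.normal(0, 1, 300), rng.normal(4, 1, 200)])

pi, mu = 0.5, np.array([-1.0, 1.0])  # initial guesses; variances fixed at 1
for _ in range(200):
    # E-step: posterior responsibility of component 2 for each point
    # (the common 1/sqrt(2*pi) normal constant cancels in the ratio)
    d1 = np.exp(-0.5 * (x - mu[0]) ** 2) * (1 - pi)
    d2 = np.exp(-0.5 * (x - mu[1]) ** 2) * pi
    r = d2 / (d1 + d2)
    # M-step: update mixing weight and component means
    pi = r.mean()
    mu = np.array([np.sum((1 - r) * x) / np.sum(1 - r),
                   np.sum(r * x) / np.sum(r)])
print(round(pi, 2), np.round(mu, 1))  # ≈ 0.4 and means near [0, 4]
```

In the paper's setting each "component" instead represents a group of records sharing a missingness pattern, but the E/M alternation is the same.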
By:  Manuel Arellano; Stephane Bonhomme 
Abstract:  We propose an optimaltransportbased matching method to nonparametrically estimate linear models with independent latent variables. The method consists in generating pseudoobservations from the latent variables, so that the Euclidean distance between the model's predictions and their matched counterparts in the data is minimized. We show that our nonparametric estimator is consistent, and we document that it performs well in simulated data. We apply this method to study the cyclicality of permanent and transitory income shocks in the Panel Study of Income Dynamics. We find that the dispersion of income shocks is approximately acyclical, whereas the skewness of permanent shocks is procyclical. By comparison, we find that the dispersion and skewness of shocks to hourly wages vary little with the business cycle. 
Date:  2019–12 
URL:  http://d.repec.org/n?u=RePEc:arx:papers:1912.13081&r=all 
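In one dimension, the optimal-transport matching step reduces to pairing sorted samples, which makes the method easy to sketch for a toy scale model: y = s·z with latent z ~ N(0, 1) and unknown s. This hypothetical one-parameter example is far simpler than the paper's linear latent-variable models, but shows the matching logic.

```python
import numpy as np

rng = np.random.default_rng(7)
data = np.sort(2.0 * rng.normal(size=1000))   # observed data, true scale s = 2
z = np.sort(rng.normal(size=1000))            # pseudo-observations, fixed across s

def match_cost(s):
    # In 1-D, the optimal transport match simply pairs sorted samples,
    # so the matching cost is the mean squared gap of the sorted pairing.
    return np.mean((data - s * z) ** 2)

grid = np.linspace(0.5, 4.0, 351)
s_hat = grid[np.argmin([match_cost(s) for s in grid])]
print(round(s_hat, 2))  # close to 2
```

In higher dimensions the sorted pairing is replaced by a genuine optimal-transport assignment between model predictions and data, but the estimation principle — minimize the matched Euclidean distance over the model parameters — is unchanged.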
By:  Yingyao Hu (Institute for Fiscal Studies and Johns Hopkins University); Yi Xin (Institute for Fiscal Studies) 
Abstract:  This paper develops identification and estimation methods for dynamic structural models when agents’ actions are unobserved by econometricians. We provide conditions under which choice probabilities and latent state transition rules are nonparametrically identified with a continuous state variable in a single-agent dynamic discrete choice model. Our identification results extend to (1) models with serially correlated unobserved heterogeneity and continuous choices, (2) cases in which only discrete state variables are available, and (3) dynamic discrete games. We apply our method to study moral hazard problems in US gubernatorial elections. We find that the probabilities of shirking increase as the governors approach the end of their terms. 
Date:  2019–06–18 
URL:  http://d.repec.org/n?u=RePEc:ifs:cemmap:35/19&r=all 
By:  Gergely Ganics (Banco de España); Florens Odendahl (Banque de France) 
Abstract:  We incorporate external information extracted from the European Central Bank’s Survey of Professional Forecasters into the predictions of a Bayesian VAR, using entropic tilting and soft conditioning. The resulting conditional forecasts significantly improve the plain BVAR point and density forecasts. Importantly, we do not restrict the forecasts at a specific quarterly horizon, but their possible paths over several horizons jointly, as the survey information comes in the form of one- and two-year-ahead expectations. Besides improving the accuracy of the variable that we target, the spillover effects on “other-than-targeted” variables are relevant in size and statistically significant. We document that the baseline BVAR exhibits an upward bias for GDP growth after the financial crisis, and our results provide evidence that survey forecasts can help mitigate the effects of structural breaks on the forecasting performance of a popular macroeconometric model. Furthermore, we provide evidence of unstable VAR dynamics, especially during and after the recent Great Recession. 
Keywords:  Survey of Professional Forecasters, density forecasts, entropic tilting, soft conditioning 
JEL:  C32 C53 E37 
Date:  2019–12 
URL:  http://d.repec.org/n?u=RePEc:bde:wpaper:1948&r=all 
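Entropic tilting itself has a compact implementation: reweight the model's predictive draws with exponential weights chosen so a targeted moment matches the survey value, which minimizes Kullback-Leibler divergence to the baseline. A one-moment sketch on simulated draws (the baseline distribution and target value are illustrative, not from the paper):

```python
import numpy as np
from scipy.optimize import brentq

rng = np.random.default_rng(0)
draws = rng.normal(2.0, 1.0, size=5000)  # simulated baseline predictive draws
target_mean = 2.5                        # survey-implied moment to impose

# Entropic tilting: weights w_i ∝ exp(gamma * x_i) solve the minimum-KL
# reweighting subject to the weighted mean hitting the target.
def tilted_mean(gamma):
    w = np.exp(gamma * (draws - draws.mean()))  # centered for numerical stability
    w /= w.sum()
    return w @ draws

gamma = brentq(lambda g: tilted_mean(g) - target_mean, -10, 10)
w = np.exp(gamma * (draws - draws.mean()))
w /= w.sum()
print(round(w @ draws, 3))  # 2.5
```

With several targeted moments, the scalar root-finding step becomes a small convex optimization over a vector of tilting coefficients, but the exponential-weight form is the same.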
By:  Antonia Reinecke; Hans-Jörg Schmerer 
Abstract:  This paper develops an approach that allows constructing regional proxies of government effectiveness at a highly disaggregated level. Our idea builds on the well documented interdependence between institutions and exports, which allows estimating the latent government effectiveness using methods of structural equation modeling. Unobserved institutional quality for each individual region is predicted using the regression outcomes. The impact of this novel identification strategy is tested using various panel applications. Results show that the magnitude of the effect of institutional quality can be biased in estimates that neglect regional differences in the effectiveness of institutions. 
Keywords:  trade, institutions, Chinese plant data, latent variable, structural equation modelling 
JEL:  D02 F10 P16 
Date:  2019 
URL:  http://d.repec.org/n?u=RePEc:ces:ceswps:_7979&r=all 
By:  Arthur Lewbel (Institute for Fiscal Studies and Boston College); Lars Nesheim (Institute for Fiscal Studies and cemmap and UCL) 
Abstract:  We propose a demand model where consumers simultaneously choose a few different goods from a large menu of available goods, and choose how much to consume of each good. The model nests multinomial discrete choice and continuous demand systems as special cases. Goods can be substitutes or complements. Random coefficients are employed to capture the wide variation in the composition of consumption baskets. Nonnegativity constraints produce corners that account for different consumers purchasing different numbers of types of goods. We show semiparametric identification of the model. We apply the model to the demand for fruit in the United Kingdom. We estimate the model’s parameters using UK scanner data for 2008 from the Kantar World Panel. Using our parameter estimates, we estimate a matrix of demand elasticities for 27 categories of fruit and analyze a range of tax and policy change scenarios. 
Date:  2019–09–23 
URL:  http://d.repec.org/n?u=RePEc:ifs:cemmap:45/19&r=all 
By:  Wayne Yuan Gao; Ming Li; Sheng Xu 
Abstract:  This paper considers a semiparametric model of dyadic network formation under nontransferable utilities (NTU). Such dyadic links arise frequently in real-world social interactions that require bilateral consent, but by their nature they induce additive non-separability. In our model we show how unobserved individual heterogeneity in the network formation model can be canceled out without requiring additive separability. The approach uses a new method we call logical differencing. The key idea is to construct an observable event involving the intersection of two mutually exclusive restrictions on the fixed effects, where these restrictions arise as necessary conditions of weak multivariate monotonicity. Based on this identification strategy we provide consistent estimators of the network formation model under NTU. The finite-sample performance of our method is analyzed in a simulation study, and an empirical illustration using the risk-sharing network data from Nyakatoke demonstrates that our proposed method obtains economically intuitive estimates. 
Date:  2020–01 
URL:  http://d.repec.org/n?u=RePEc:arx:papers:2001.00691&r=all 
By:  Mert Demirer (Institute for Fiscal Studies); Vasilis Syrgkanis (Institute for Fiscal Studies); Greg Lewis (Institute for Fiscal Studies); Victor Chernozhukov (Institute for Fiscal Studies and MIT) 
Abstract:  We consider off-policy evaluation and optimization with continuous action spaces. We focus on observational data where the data collection policy is unknown and needs to be estimated. We take a semiparametric approach where the value function takes a known parametric form in the treatment, but we are agnostic on how it depends on the observed contexts. We propose a doubly robust off-policy estimate for this setting and show that off-policy optimization based on this estimate is robust to estimation errors of the policy function or the regression model. Our results also apply if the model does not satisfy our semiparametric form, but rather we measure regret in terms of the best projection of the true value function to this functional space. Our work extends prior approaches of policy optimization from observational data that only considered discrete actions. We provide an experimental evaluation of our method in a synthetic data example motivated by optimal personalized pricing and costly resource allocation. 
Date:  2019–06–12 
URL:  http://d.repec.org/n?u=RePEc:ifs:cemmap:34/19&r=all 
By:  Benjamin Colling; Ingrid Van Keilegom 
Date:  2018–12 
URL:  http://d.repec.org/n?u=RePEc:ete:kbiper:630751&r=all 
By:  Gergely Ganics (Banco de España); Barbara Rossi (ICREA – Univ. Pompeu Fabra, Barcelona GSE, and CREI); Tatevik Sekhposyan (Texas A&M University) 
Abstract:  Surveys of Professional Forecasters produce precise and timely point forecasts for key macroeconomic variables. However, the accompanying density forecasts are not as widely utilized, and there is no consensus about their quality. This is partly because such surveys are often conducted for “fixed events”. For example, in each quarter panelists are asked to forecast output growth and inflation for the current calendar year and the next, implying that the forecast horizon changes with each survey round. The fixed-event nature limits the usefulness of survey density predictions for policymakers and market participants, who often wish to characterize uncertainty a fixed number of periods ahead (“fixed-horizon”). Is it possible to obtain fixed-horizon density forecasts using the available fixed-event ones? We propose a density combination approach that weights fixed-event density forecasts according to a uniformity of the probability integral transform criterion, aiming to obtain a correctly calibrated fixed-horizon density forecast. Using data from the US Survey of Professional Forecasters, we show that our combination method produces competitive density forecasts relative to widely used alternatives based on historical forecast errors or Bayesian VARs. Thus, our proposed fixed-horizon predictive densities are a new and useful tool for researchers and policy makers. 
Keywords:  Survey of Professional Forecasters, density forecasts, forecast combination, predictive density, probability integral transform, uncertainty, real-time 
JEL:  C13 C32 C53 
Date:  2019–12 
URL:  http://d.repec.org/n?u=RePEc:bde:wpaper:1947&r=all 
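The probability integral transform criterion behind the combination scheme is easy to illustrate: a well-calibrated density forecast yields PITs that are uniform on [0, 1], and departures from uniformity flag miscalibration. A minimal sketch on simulated data using a Kolmogorov-Smirnov check (a generic illustration, not the authors' combination algorithm):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(10)
y = rng.normal(size=500)                 # realizations
# PIT: evaluate each candidate forecast CDF at the realization
pit_good = stats.norm.cdf(y)             # correctly specified N(0, 1) forecast
pit_bad = stats.norm.cdf(y, scale=2.0)   # overdispersed forecast: PITs pile up near 0.5

p_good = stats.kstest(pit_good, "uniform").pvalue
p_bad = stats.kstest(pit_bad, "uniform").pvalue
print(round(p_good, 3), p_bad < 1e-6)    # near-uniform vs. clearly non-uniform
```

A combination scheme in this spirit would assign larger weights to fixed-event densities whose implied fixed-horizon PITs are closer to uniform.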
By:  Anneleen Verhasselt; Alvaro J Flórez; Ingrid Van Keilegom; Geert Molenberghs 
Date:  2019–01 
URL:  http://d.repec.org/n?u=RePEc:ete:kbiper:632541&r=all 
By:  Mkael Escobar; Ingrid Van Keilegom 
Date:  2018–12 
URL:  http://d.repec.org/n?u=RePEc:ete:kbiper:630759&r=all 
By:  Hohsuk Noh; Ingrid Van Keilegom 
Date:  2019–01 
URL:  http://d.repec.org/n?u=RePEc:ete:kbiper:632085&r=all 
By:  Justin Chown; Cédric Heuchenne; Ingrid Van Keilegom 
Date:  2018–12 
URL:  http://d.repec.org/n?u=RePEc:ete:kbiper:630665&r=all 
By:  Alisa Yusupova; Nicos G. Pavlidis; Efthymios G. Pavlidis 
Abstract:  Dynamic model averaging (DMA) combines the forecasts of a large number of dynamic linear models (DLMs) to predict the future value of a time series. The performance of DMA critically depends on the appropriate choice of two forgetting factors. The first of these controls the speed of adaptation of the coefficient vector of each DLM, while the second enables time variation in the model averaging stage. In this paper we develop a novel, adaptive dynamic model averaging (ADMA) methodology. The proposed methodology employs a stochastic optimisation algorithm that sequentially updates the forgetting factor of each DLM, and uses a state-of-the-art nonparametric model combination algorithm from the prediction-with-expert-advice literature, which offers finite-time performance guarantees. An empirical application to quarterly UK house price data suggests that ADMA produces more accurate forecasts than the benchmark autoregressive model, as well as competing DMA specifications. 
Date:  2019–12 
URL:  http://d.repec.org/n?u=RePEc:arx:papers:1912.04661&r=all 
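The role of the first forgetting factor can be sketched with a single univariate DLM: discounting the state covariance by 1/λ at each step of a Kalman-style recursion makes the coefficient adapt faster as λ decreases. This is a generic illustration of discount filtering on simulated data, not the authors' ADMA algorithm.

```python
import numpy as np

def dlm_filter(y, x, lam=0.98, v=1.0):
    # One-step-ahead predictions from a univariate DLM y_t = x_t * theta_t + e_t,
    # with forgetting factor lam inflating the state variance each period.
    theta, P = 0.0, 1.0          # state mean and variance
    preds = []
    for yt, xt in zip(y, x):
        P /= lam                 # forgetting: discount old information
        preds.append(xt * theta)
        S = xt * P * xt + v      # predictive variance
        K = P * xt / S           # Kalman gain
        theta += K * (yt - xt * theta)
        P -= K * xt * P
    return np.array(preds)

rng = np.random.default_rng(4)
x = rng.normal(size=300)
coef = np.linspace(0.0, 2.0, 300)                 # slowly drifting coefficient
y = coef * x + 0.1 * rng.normal(size=300)
err_fast = np.mean((y - dlm_filter(y, x, lam=0.90)) ** 2)
err_slow = np.mean((y - dlm_filter(y, x, lam=0.999)) ** 2)
print(err_fast < err_slow)  # faster forgetting tracks the drift better
```

ADMA's contribution is to tune λ sequentially per DLM rather than fixing it, and to handle the second, model-averaging forgetting factor with an expert-advice combination rule.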
By:  Irène Gijbels; Ingrid Van Keilegom; Yue Zhao 
Date:  2018–12 
URL:  http://d.repec.org/n?u=RePEc:ete:kbiper:630763&r=all 
By:  Mickaël De Backer; Anouar El Ghouch; Ingrid Van Keilegom 
Date:  2018–12 
URL:  http://d.repec.org/n?u=RePEc:ete:kbiper:630769&r=all 
By:  Jianfei Cao; Shirley Lu 
Abstract:  We introduce a synthetic control methodology to study policies with staggered adoption. Many policies, such as board gender quotas, are replicated by other policy setters at different times. Our method estimates the dynamic average treatment effects on the treated using variation introduced by the staggered adoption of policies. It gives asymptotically unbiased estimators of many quantities of interest and delivers asymptotically valid inference. Applying the proposed method to national labor data in Europe, we find evidence that quota regulation on board diversity leads to a decrease in part-time employment and an increase in full-time employment for female professionals. 
Date:  2019–12 
URL:  http://d.repec.org/n?u=RePEc:arx:papers:1912.06320&r=all 
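The basic synthetic control building block — nonnegative weights on control units chosen so their pre-treatment outcomes reproduce the treated unit's path — can be sketched with nonnegative least squares, imposing the sum-to-one constraint softly through an extra penalty row. This is the classical single-treated-unit case on simulated data, not the paper's staggered-adoption estimator.

```python
import numpy as np
from scipy.optimize import nnls

rng = np.random.default_rng(8)
T0, J = 30, 10                                      # pre-treatment periods, controls
Y0 = rng.normal(size=(T0, J)).cumsum(axis=0)        # control-unit outcome paths
w_true = np.array([0.6, 0.4] + [0.0] * (J - 2))
y1 = Y0 @ w_true + 0.05 * rng.normal(size=T0)       # treated unit ≈ convex combination

pen = 10.0                                          # weight on the sum-to-one row
A = np.vstack([Y0, pen * np.ones((1, J))])          # augmented system
b = np.concatenate([y1, [pen]])
w, _ = nnls(A, b)                                   # nonnegative least squares
print(np.round(w[:3], 2), round(w.sum(), 2))        # ≈ [0.6, 0.4, 0], sum ≈ 1
```

With staggered adoption, a construction in this spirit is applied around each unit's own adoption date and the resulting unit-level effects are aggregated into dynamic average effects on the treated.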
By:  Francesca Molinari (Institute for Fiscal Studies and Cornell University) 
Abstract:  Econometrics has traditionally revolved around point identification. Much effort has been devoted to finding the weakest set of assumptions that, together with the available data, deliver point identification of population parameters, finite- or infinite-dimensional as these might be. And point identification has been viewed as a necessary prerequisite for meaningful statistical inference. The research program on partial identification began to slowly shift this focus in the early 1990s, gaining momentum over time and developing into a widely researched area of econometrics. Partial identification has forcefully established that much can be learned from the available data and assumptions imposed because of their credibility rather than their ability to yield point identification. Within this paradigm, one obtains a set of values for the parameters of interest which are observationally equivalent given the available data and maintained assumptions. I refer to this set as the parameters' sharp identification region. Econometrics with partial identification is concerned with: (1) obtaining a tractable characterization of the parameters' sharp identification region; (2) providing methods to estimate it; (3) conducting tests of hypotheses and making confidence statements about the partially identified parameters. Each of these goals poses challenges that differ from those faced in econometrics with point identification. This chapter discusses these challenges and some of their solutions. It reviews advances in partial identification analysis both as applied to learning (functionals of) probability distributions that are well-defined in the absence of models, as well as to learning parameters that are well-defined only in the context of particular models. 
The chapter highlights a simple organizing principle: the source of the identification problem can often be traced to a collection of random variables that are consistent with the available data and maintained assumptions. This collection may be part of the observed data or be a model implication. In either case, it can be formalized as a random set. Random set theory is then used as a mathematical framework to unify a number of special results and produce a general methodology to conduct econometrics with partial identification. 
Date:  2019–05–31 
URL:  http://d.repec.org/n?u=RePEc:ifs:cemmap:25/19&r=all 
By:  Valentin Patilea; Ingrid Van Keilegom 
Date:  2019–01 
URL:  http://d.repec.org/n?u=RePEc:ete:kbiper:632076&r=all 
By:  Candida Geerdens; Paul Janssen; Ingrid Van Keilegom 
Date:  2018–12 
URL:  http://d.repec.org/n?u=RePEc:ete:kbiper:630753&r=all 