Econometrics
http://lists.repec.org/mailman/listinfo/nep-ecm
2020-01-13
Two-stage weighted least squares estimator of the conditional mean of observation-driven time series models
http://d.repec.org/n?u=RePEc:pra:mprapa:97382&r=ecm
General parametric forms are assumed for the conditional mean λ_{t}(θ₀) and variance υ_{t}(ξ₀) of a time series. These conditional moments can, for instance, be derived from count time series, Autoregressive Conditional Duration (ACD) or Generalized Autoregressive Score (GAS) models. In this paper, our aim is to estimate the conditional mean parameter θ₀, trying to be as agnostic as possible about the conditional distribution of the observations. Quasi-Maximum Likelihood Estimators (QMLEs) based on the linear exponential family fulfill this goal, but they may be inefficient and have complicated asymptotic distributions when θ₀ contains zero coefficients. We thus study alternative weighted least squares estimators (WLSEs), which enjoy the same consistency property as the QMLEs when the conditional distribution is misspecified, but have simpler asymptotic distributions when components of θ₀ are null and gain in efficiency when υ_{t} is well specified. We compare the asymptotic properties of the QMLEs and WLSEs, and determine a data-driven strategy for finding an asymptotically optimal WLSE. Simulation experiments and illustrations on realized volatility forecasting are presented.
Aknouche, Abdelhakim
Francq, Christian
Autoregressive Conditional Duration model; Exponential, Poisson, Negative Binomial QMLE; INteger-valued AR; INteger-valued GARCH; Weighted LSE.
2019-12-01
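The two-stage idea in the abstract above (Aknouche and Francq) can be illustrated with a minimal sketch. It is not the authors' estimator in its general form: it assumes a hypothetical Poisson INARCH(1) model with conditional mean λ_t = ω + α·y_{t-1}, and it assumes a working variance proportional to the fitted mean in the second stage. The function names `simulate_inarch1`, `wls_fit` and `two_stage_wls` are illustrative only.

```python
import math
import random


def poisson(rng, lam):
    # Knuth's multiplication method; adequate for the small means used here.
    threshold = math.exp(-lam)
    k, p = 0, rng.random()
    while p > threshold:
        k += 1
        p *= rng.random()
    return k


def simulate_inarch1(n, omega, alpha, seed=0):
    # Assumed toy model: y_t | past ~ Poisson(omega + alpha * y_{t-1}).
    rng = random.Random(seed)
    y = [round(omega / (1.0 - alpha))]
    for _ in range(n - 1):
        y.append(poisson(rng, omega + alpha * y[-1]))
    return y


def wls_fit(y, weights):
    # Weighted regression of y_t on (1, y_{t-1}) via the 2x2 normal equations.
    s_w = s_x = s_y = s_xx = s_xy = 0.0
    for t in range(1, len(y)):
        w, x, target = weights[t], y[t - 1], y[t]
        s_w += w
        s_x += w * x
        s_y += w * target
        s_xx += w * x * x
        s_xy += w * x * target
    det = s_w * s_xx - s_x ** 2
    alpha = (s_w * s_xy - s_x * s_y) / det
    omega = (s_y - alpha * s_x) / s_w
    return omega, alpha


def two_stage_wls(y):
    # Stage 1: unweighted least squares gives a preliminary estimate.
    omega1, alpha1 = wls_fit(y, [1.0] * len(y))
    # Stage 2: reweight by the inverse of the assumed conditional variance,
    # here taken proportional to the stage-1 fitted mean (an assumption).
    weights = [1.0] * len(y)
    for t in range(1, len(y)):
        weights[t] = 1.0 / max(omega1 + alpha1 * y[t - 1], 1e-6)
    return wls_fit(y, weights)


series = simulate_inarch1(5000, omega=1.0, alpha=0.5, seed=1)
print(two_stage_wls(series))
```

Because the conditional mean is linear in (ω, α) here, each stage has a closed form; for nonlinear mean specifications a numerical optimizer would be needed instead.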
Subvector inference in PI models with many moment inequalities
http://d.repec.org/n?u=RePEc:ifs:cemmap:28/19&r=ecm
This paper considers inference for a function of a parameter vector in a partially identified model with many moment inequalities. This framework allows the number of moment conditions to grow with the sample size, possibly at exponential rates. Our main motivating application is subvector inference, i.e., inference on a single component of the partially identified parameter vector associated with a treatment effect or a policy variable of interest. Our inference method compares a MinMax test statistic (minimum over parameters satisfying H0 and maximum over moment inequalities) against critical values that are based on bootstrap approximations or analytical bounds. We show that this method controls asymptotic size uniformly over a large class of data generating processes despite the partially identified many moment inequality setting. The finite sample analysis allows us to obtain explicit rates of convergence on the size control. Our results are based on combining non-asymptotic approximations and new high-dimensional central limit theorems for the MinMax of the components of random matrices, which may be of independent interest. Unlike the previous literature on functional inference in partially identified models, our results do not rely on weak convergence results based on Donsker’s class assumptions and, in fact, our test statistic may not even converge in distribution. Our bootstrap approximation requires the choice of a tuning parameter sequence that can avoid the excessive concentration of our test statistic. To this end, we propose an asymptotically valid data-driven method to select this tuning parameter sequence. This method generalizes the selection of tuning parameter sequences to problems outside the Donsker’s class assumptions and may also be of independent interest. Our procedures based on self-normalized moderate deviation bounds are relatively more conservative but easier to implement.
Alexandre Belloni
Federico A. Bugni
Victor Chernozhukov
2019-06-12
Bayesian Median Autoregression for Robust Time Series Forecasting
http://d.repec.org/n?u=RePEc:arx:papers:2001.01116&r=ecm
We develop a Bayesian median autoregressive (BayesMAR) model for time series forecasting. The proposed method utilizes time-varying quantile regression at the median, favorably inheriting the robustness of median regression in contrast to the widely used mean-based methods. Motivated by a working Laplace likelihood approach in Bayesian quantile regression, BayesMAR adopts a parametric model bearing the same structure as autoregressive (AR) models by altering the Gaussian error to Laplace, leading to a simple, robust, and interpretable modeling strategy for time series forecasting. We estimate model parameters by Markov chain Monte Carlo. Bayesian model averaging (BMA) is used to account for model uncertainty, including the uncertainty in the autoregressive order, in addition to a Bayesian model selection approach. The proposed methods are illustrated using simulation and real data applications. An application to U.S. macroeconomic data forecasting shows that BayesMAR leads to favorable and often superior predictive performance relative to the selected mean-based alternatives under various loss functions. The proposed methods are generic and can be used to complement a rich class of methods that build on the AR models.
Zijian Zeng
Meng Li
2020-01
Inference on average treatment effects in aggregate panel data settings
http://d.repec.org/n?u=RePEc:ifs:cemmap:32/19&r=ecm
This paper studies inference on treatment effects in aggregate panel data settings with a single treated unit and many control units. We propose new methods for making inference on average treatment effects in settings where both the number of pre-treatment and the number of post-treatment periods are large. We use linear models to approximate the counterfactual mean outcomes in the absence of the treatment. The counterfactuals are estimated using constrained Lasso, an essentially tuning-free regression approach that nests difference-in-differences and synthetic control as special cases. We propose a K-fold cross-fitting procedure to remove the bias induced by regularization. To avoid the estimation of the long run variance, we construct a self-normalized t-statistic. The test statistic has an asymptotically pivotal distribution (a Student t-distribution with K - 1 degrees of freedom), which makes our procedure very easy to implement. Our approach has several theoretical advantages. First, it does not rely on any sparsity assumptions. Second, it is fully robust against misspecification of the linear model. Third, it is more efficient than difference-in-means and difference-in-differences estimators. The proposed method performs very well in simulation experiments and is taken to a data application, where we re-evaluate the economic consequences of terrorism.
Victor Chernozhukov
Kaspar Wüthrich
Yinchu Zhu
2019-06-12
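The self-normalized t-statistic described in the abstract above (Chernozhukov, Wüthrich and Zhu) can be sketched as follows. This is a minimal illustration under assumptions: it takes K fold-specific treatment-effect estimates as given (how they are produced by constrained Lasso and cross-fitting is not shown), and the form sqrt(K)·(mean − null)/sd with a K − 1 divisor is an assumed, textbook-style normalization that may differ in detail from the paper's. The fold estimates below are made-up numbers.

```python
import math
import statistics


def self_normalized_t(fold_estimates, null_value=0.0):
    """t-type statistic built from K fold-specific estimates.

    The spread across folds replaces an explicit long-run variance
    estimate; the result is compared with a Student t distribution
    with K - 1 degrees of freedom.
    """
    k = len(fold_estimates)
    if k < 2:
        raise ValueError("at least two folds are needed")
    mean = statistics.fmean(fold_estimates)
    spread = statistics.stdev(fold_estimates)  # sample sd, divisor K - 1
    return math.sqrt(k) * (mean - null_value) / spread


# Hypothetical fold estimates for K = 3.
hypothetical_estimates = [1.0, 2.0, 3.0]
print(self_normalized_t(hypothetical_estimates))
```

With K = 3 the reference distribution has 2 degrees of freedom, whose two-sided 5% critical value is about 4.303, so a small K trades power for not having to estimate the long-run variance.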
Robust Bayesian Inference in Proxy SVARs
http://d.repec.org/n?u=RePEc:ifs:cemmap:38/19&r=ecm
We develop methods for robust Bayesian inference in structural vector autoregressions (SVARs) where the impulse responses or forecast error variance decompositions of interest are set-identified using external instruments (or ‘proxy SVARs’). Existing Bayesian approaches to inference in proxy SVARs require researchers to specify a single prior over the model’s parameters. When parameters are set-identified, a component of the prior is never updated by the data. Giacomini and Kitagawa (2018) propose a method for robust Bayesian inference in set-identified models that delivers inference about the identified set for the parameter of interest. We extend this approach to proxy SVARs, which allows researchers to relax potentially controversial point-identifying restrictions without having to specify an unrevisable prior. We also explore the effect of instrument strength on posterior inference. We illustrate our approach by revisiting Mertens and Ravn (2013) and relaxing the assumption that they impose to obtain point identification.
Raffaella Giacomini
Toru Kitagawa
Matthew Read
2019-07-23
Estimation Under Ambiguity
http://d.repec.org/n?u=RePEc:ifs:cemmap:24/19&r=ecm
To perform Bayesian analysis of a partially identified structural model, two distinct approaches exist: standard Bayesian inference, which assumes a single prior for the structural parameters, including the non-identified ones; and multiple-prior Bayesian inference, which assumes full ambiguity for the non-identified parameters. The prior inputs considered by these two extreme approaches can often be a poor representation of the researcher’s prior knowledge in practice. This paper fills the large gap between the two approaches by proposing a multiple-prior Bayesian analysis that can simultaneously incorporate a probabilistic belief for the non-identified parameters and a concern about misspecification of this belief. Our proposal introduces a benchmark prior representing the researcher’s partially credible probabilistic belief for non-identified parameters, and a set of priors formed in its Kullback-Leibler (KL) neighborhood, whose radius controls the “degree of ambiguity.” We obtain point estimators and optimal decisions involving non-identified parameters by solving a conditional gamma-minimax problem, which we show is analytically tractable and easy to solve numerically. We derive the remarkably simple analytical properties of the proposed procedure in the limiting situations where the radius of the KL neighborhood and/or the sample size are large. Our procedure can also be used to perform global sensitivity analysis.
Raffaella Giacomini
Toru Kitagawa
Harald Uhlig
2019-05-28
Estimation and HAC-based Inference for Machine Learning Time Series Regressions
http://d.repec.org/n?u=RePEc:arx:papers:1912.06307&r=ecm
Time series regression analysis in econometrics typically involves a framework relying on a set of mixing conditions to establish consistency and asymptotic normality of parameter estimates and HAC-type estimators of the residual long-run variances to conduct proper inference. This article introduces structured machine learning regressions for high-dimensional time series data using the aforementioned commonly used setting. To recognize the time series data structures we rely on the sparse-group LASSO estimator. We derive a new Fuk-Nagaev inequality for a class of $\tau$-dependent processes with heavier than Gaussian tails, nesting $\alpha$-mixing processes as a special case, and establish estimation, prediction, and inferential properties, including convergence rates of the HAC estimator for the long-run variance based on LASSO residuals. An empirical application to nowcasting US GDP growth indicates that the estimator performs favorably compared to other alternatives and that the text data can be a useful addition to more traditional numerical data.
Andrii Babii
Eric Ghysels
Jonas Striaukas
2019-12
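As background for the HAC-type long-run variance estimators mentioned in the abstract above (Babii, Ghysels and Striaukas), here is a minimal sketch of the textbook Bartlett-kernel (Newey-West) estimator for a scalar residual series. It is an assumption that this kernel is a useful illustration; the paper's estimator is built on LASSO residuals and its kernel and bandwidth conditions are not reproduced here. The function name and inputs are hypothetical.

```python
def bartlett_long_run_variance(residuals, bandwidth):
    """Bartlett-kernel estimate of the long-run variance of a scalar series.

    LRV = gamma_0 + 2 * sum_{j=1}^{L} (1 - j / (L + 1)) * gamma_j,
    where gamma_j is the lag-j sample autocovariance and L is the bandwidth.
    """
    n = len(residuals)
    if not 0 <= bandwidth < n:
        raise ValueError("bandwidth must satisfy 0 <= bandwidth < len(residuals)")
    mean = sum(residuals) / n
    centered = [r - mean for r in residuals]

    def autocovariance(lag):
        return sum(centered[t] * centered[t - lag] for t in range(lag, n)) / n

    lrv = autocovariance(0)
    for lag in range(1, bandwidth + 1):
        weight = 1.0 - lag / (bandwidth + 1)
        lrv += 2.0 * weight * autocovariance(lag)
    return lrv


# A perfectly alternating series has strong negative autocorrelation,
# so its long-run variance is far below its ordinary variance.
print(bartlett_long_run_variance([1.0, -1.0, 1.0, -1.0], bandwidth=1))
```

The linearly declining Bartlett weights keep the estimate non-negative, which is one reason this kernel is a common default.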
Adaptive Discrete Smoothing for High-Dimensional and Nonlinear Panel Data
http://d.repec.org/n?u=RePEc:arx:papers:1912.12867&r=ecm
In this paper we develop a data-driven smoothing technique for high-dimensional and non-linear panel data models. We allow for individual specific (non-linear) functions and estimation with econometric or machine learning methods by using weighted observations from other individuals. The weights are determined in a data-driven way, depend on the similarity between the corresponding functions, and are measured based on initial estimates. The key feature of such a procedure is that it clusters individuals based on the distance / similarity between them, estimated in a first stage. Our estimation method can be combined with various statistical estimation procedures, in particular modern machine learning methods, which are particularly fruitful in the high-dimensional case and with complex, heterogeneous data. The approach can be interpreted as a “soft-clustering” in comparison to traditional “hard clustering” that assigns each individual to exactly one group. We conduct a simulation study which shows that the prediction can be greatly improved by using our estimator. Finally, we analyze a big data set from didichuxing.com, a leading company in the transportation industry, to analyze and predict the gap between supply and demand based on a large set of covariates. Our estimator clearly performs much better in out-of-sample prediction compared to existing linear panel data estimators.
Xi Chen
Victor Chernozhukov
Ye Luo
Martin Spindler
2019-12
Estimation of Auction Models with Shape Restrictions
http://d.repec.org/n?u=RePEc:arx:papers:1912.07466&r=ecm
We introduce several new estimation methods that leverage shape constraints in auction models to estimate various objects of interest, including the distribution of a bidder's valuations, the bidder's ex ante expected surplus, and the seller's counterfactual revenue. The basic approach applies broadly in that (unlike most of the literature) it works for a wide range of auction formats and allows for asymmetric bidders. Though our approach is not restrictive, we focus our analysis on first-price, sealed-bid auctions with independent private valuations. We highlight two nonparametric estimation strategies, one based on a least squares criterion and the other on a maximum likelihood criterion. We also provide the first direct estimator of the strategy function. We establish several theoretical properties of our methods to guide empirical analysis and inference. In addition to providing the asymptotic distributions of our estimators, we identify ways in which methodological choices should be tailored to the objects of their interest. For objects like the bidders' ex ante surplus and the seller's counterfactual expected revenue with an additional symmetric bidder, we show that our input-parameter-free estimators achieve the semiparametric efficiency bound. For objects like the bidders' inverse strategy function, we provide an easily implementable boundary-corrected kernel smoothing and transformation method in order to ensure the squared error is integrable over the entire support of the valuations. An extensive simulation study illustrates our analytical results and demonstrates the respective advantages of our least-squares and maximum likelihood estimators in finite samples. Compared to estimation strategies based on kernel density estimation, the simulations indicate that the smoothed versions of our estimators enjoy a large degree of robustness to the choice of an input parameter.
Joris Pinkse
Karl Schurter
2019-12
Regularized Estimation of High-dimensional Factor-Augmented Vector Autoregressive (FAVAR) Models
http://d.repec.org/n?u=RePEc:arx:papers:1912.04146&r=ecm
A factor-augmented vector autoregressive (FAVAR) model is defined by a VAR equation that captures lead-lag correlations amongst a set of observed variables $X$ and latent factors $F$, and a calibration equation that relates another set of observed variables $Y$ with $F$ and $X$. The latter equation is used to estimate the factors that are subsequently used in estimating the parameters of the VAR system. The FAVAR model has become popular in applied economic research, since it can summarize a large number of variables of interest as a few factors through the calibration equation and subsequently examine their influence on core variables of primary interest through the VAR equation. However, there is increasing need for examining lead-lag relationships between a large number of time series, while incorporating information from another high-dimensional set of variables. Hence, in this paper we investigate the FAVAR model under high-dimensional scaling. We introduce an appropriate identification constraint for the model parameters, which when incorporated into the formulated optimization problem yields estimates with good statistical properties. Further, we address a number of technical challenges introduced by the fact that estimates of the VAR system model parameters are based on estimated rather than directly observed quantities. The performance of the proposed estimators is evaluated on synthetic data. Further, the model is applied to commodity prices and reveals interesting and interpretable relationships between the prices and the factors extracted from a set of global macroeconomic indicators.
Jiahe Lin
George Michailidis
2019-12
Bayesian estimation of large dimensional time varying VARs using copulas
http://d.repec.org/n?u=RePEc:arx:papers:1912.12527&r=ecm
This paper provides a simple, yet reliable, alternative to the (Bayesian) estimation of large multivariate VARs with time variation in the conditional mean equations and/or in the covariance structure. With our new methodology, the original multivariate, n-dimensional model is treated as a set of n univariate estimation problems, and cross-dependence is handled through the use of a copula. Thus, only univariate distribution functions are needed when estimating the individual equations, which are often available in closed form, and easy to handle with MCMC (or other techniques). Estimation is carried out in parallel for the individual equations. Thereafter, the individual posteriors are combined with the copula, thereby obtaining a joint posterior which can be easily resampled. We illustrate our approach by applying it to a large time-varying parameter VAR with 25 macroeconomic variables.
Mike Tsionas
Marwan Izzeldin
Lorenzo Trapani
2019-12
Independent nonlinear component analysis
http://d.repec.org/n?u=RePEc:ifs:cemmap:46/19&r=ecm
The idea of summarizing the information contained in a large number of variables by a small number of “factors” or “principal components” has been broadly adopted in economics and statistics. This paper introduces a generalization of the widely used principal component analysis (PCA) to nonlinear settings, thus providing a new tool for dimension reduction and exploratory data analysis or representation. The distinguishing features of the method include (i) the ability to always deliver truly independent factors (as opposed to the merely uncorrelated factors of PCA); (ii) the reliance on the theory of optimal transport and Brenier maps to obtain a robust and efficient computational algorithm; (iii) the use of a new multivariate additive entropy decomposition to determine the principal nonlinear components that capture most of the information content of the data and (iv) formally nesting PCA as a special case, for linear Gaussian factor models. We illustrate the method’s effectiveness in an application to the prediction of excess bond returns from a large number of macro factors.
Florian Gunsilius
Susanne M. Schennach
2019-09-23
Estimating Causal Effects in Binary Response Models with Binary Endogenous Explanatory Variables - A Comparison of Possible Estimators
http://d.repec.org/n?u=RePEc:jgu:wpaper:1916&r=ecm
This paper reviews and compares different estimators used in the past to estimate a binary response model (BRM) with a binary endogenous explanatory variable (EEV) to give practical insights to applied econometricians. It also gives guidance on how the average structural function (ASF) can be used in such a setting to estimate average partial effects (APEs). In total, the (relative) performance of six different linear parametric, non-linear parametric as well as non-linear semi-parametric estimators is compared in specific scenarios like the prevalence of weak instruments. A simulation study shows that the non-linear parametric estimator dominates in a majority of scenarios even when the corresponding parametric assumptions are not fulfilled. Moreover, while the semi-parametric non-linear estimator might be seen as a suitable alternative for estimating coefficients, it suffers from weaknesses in estimating partial effects. These insights are confirmed by an empirical illustration of the individual decision to supply labor.
Manuel Denzer
Binary choice, Binomial response, Binary Endogenous Explanatory Variable, Average Structural Function
2019-12-10
Inference for heterogeneous effects using low-rank estimations
http://d.repec.org/n?u=RePEc:ifs:cemmap:31/19&r=ecm
We study a panel data model with general heterogeneous effects, where slopes are allowed to vary across both individuals and times. The key assumption for dimension reduction is that the heterogeneous slopes can be expressed as a factor structure so that the high-dimensional slope matrix is of low rank, and so can be estimated using low-rank regularized regression. Our paper makes an important theoretical contribution on the “post-SVT (singular value thresholding) inference”. Formally, we show that the post-SVT inference can be conducted via three steps: (1) apply the nuclear-norm penalized estimation; (2) extract eigenvectors from the estimated low-rank matrices; and (3) run least squares to iteratively estimate the individual and time effect components in the slope matrix. To properly control for the effect of the penalized low-rank estimation, we argue that this procedure should be embedded with “partial out the mean structure” and “sample splitting”. The resulting estimators are asymptotically normal and admit valid inferences. Empirically, we apply the proposed methods to estimate the county-level minimum wage effects on employment.
Victor Chernozhukov
Christian Hansen
Yuan Liao
Yinchu Zhu
2019-06-12
Prediction in locally stationary time series
http://d.repec.org/n?u=RePEc:arx:papers:2001.00419&r=ecm
We develop an estimator for the high-dimensional covariance matrix of a locally stationary process with a smoothly varying trend and use this statistic to derive consistent predictors in non-stationary time series. In contrast to the currently available methods for this problem the predictor developed here does not rely on fitting an autoregressive model and does not require a vanishing trend. The finite sample properties of the new methodology are illustrated by means of a simulation study and a financial indices study.
Holger Dette
Weichi Wu
2020-01
Estimation of Conditional Asset Pricing Models with Integrated Variables in the Beta Specification
http://d.repec.org/n?u=RePEc:ces:ceswps:_7969&r=ecm
We introduce a methodology which deals with possibly integrated variables in the specification of the betas of conditional asset pricing models. In such a case, any model which is directly derived by a polynomial approximation of the functional form of the conditional beta will inherit a nonstationary right hand side. Our approach uses the cointegrating relationships between the integrated variables in order to maintain the stationarity of the right hand side of the estimated model, thus avoiding the issues that arise in the case of an unbalanced regression. We present an example where our methodology is applied to the returns of funds-of-funds which are based on the Morningstar mutual fund ranking system. The results provide evidence that the residuals of possible cointegrating relationships between integrated variables in the specification of the conditional betas may reveal significant information concerning the dynamics of the betas.
Antonios Antypas
Guglielmo Maria Caporale
Nikolaos Kourogenis
Nikitas Pittis
conditional CAPM, time-varying beta, cointegration, Morningstar star-rating system
2019
Credit Risk: Simple Closed Form Approximate Maximum Likelihood Estimator
http://d.repec.org/n?u=RePEc:arx:papers:1912.12611&r=ecm
We consider discrete default intensity based and logit type reduced form models for conditional default probabilities for corporate loans where we develop simple closed form approximations to the maximum likelihood estimator (MLE) when the underlying covariates follow a stationary Gaussian process. In a practically reasonable asymptotic regime where the default probabilities are small, say 1-3% annually, and the number of firms and the time period of data available are reasonably large, we rigorously show that the proposed estimator behaves similarly or slightly worse than the MLE when the underlying model is correctly specified. For the more realistic case of model misspecification, both estimators are seen to be equally good, or equally bad. Further, beyond a point, both are more-or-less insensitive to increase in data. These conclusions are validated on empirical and simulated data. The proposed approximations should also have applications outside finance, where logit-type models are used and probabilities of interest are small.
Anand Deo
Sandeep Juneja
2019-12
Flexible parametric model for survival data subject to dependent censoring
http://d.repec.org/n?u=RePEc:ete:kbiper:632067&r=ecm
When modeling survival data, it is common to assume that the (log-transformed) survival time (T) is conditionally independent of the (log-transformed) censoring time (C) given a set of covariates. There are numerous situations in which this assumption is not realistic, and a number of correction procedures have been developed for different models. However, in most cases, either some prior knowledge about the association between T and C is required, or some auxiliary information or data is/are supposed to be available. When this is not the case, the application of many existing methods turns out to be limited. The goal of this paper is to overcome this problem by developing a flexible parametric model, that is a type of transformed linear model. We show that the association between T and C is identifiable in this model. The performance of the proposed method is investigated both in an asymptotic way and through finite sample simulations. We also develop a formal goodness-of-fit test approach to assess the quality of the fitted model. Finally, the approach is applied to data coming from a study on liver transplants.
Negera Deresa
Ingrid Van Keilegom
2019-01
Estimation of the boundary of a variable observed with symmetric error
http://d.repec.org/n?u=RePEc:ete:kbiper:630770&r=ecm
© 2019 American Statistical Association. Consider the model Y = X + ε with X = τ + Z, where τ is an unknown constant (the boundary of X), Z is a random variable defined on ℝ₊, ε is a symmetric error, and ε and Z are independent. Based on an iid sample of Y, we aim at identifying and estimating the boundary τ when the law of ε is unknown (apart from symmetry) and in particular its variance is unknown. We propose an estimation procedure based on a minimal distance approach and by making use of Laguerre polynomials. Asymptotic results as well as finite sample simulations are shown. The paper also proposes an extension to stochastic frontier analysis, where the model is conditional on observed variables. The model becomes Y = τ(w1,w2) + Z + ε, where Y is a cost, w1 are the observed outputs and w2 represents the observed values of other conditioning variables, so Z is the cost inefficiency. Some simulations illustrate again how the approach works in finite samples, and the proposed procedure is illustrated with data coming from post offices in France.
Jean-Pierre Florens
Léopold Simar
Ingrid Van Keilegom
2018-12
Inference in Experiments with Matched Pairs
http://d.repec.org/n?u=RePEc:ifs:cemmap:19/19&r=ecm
This paper studies inference for the average treatment effect in randomized controlled trials where treatment status is determined according to a "matched pairs" design. By a "matched pairs" design, we mean that units are sampled i.i.d. from the population of interest, paired according to observed, baseline covariates and finally, within each pair, one unit is selected at random for treatment. This type of design is used routinely throughout the sciences, but results about its implications for inference about the average treatment effect are not available. The main requirement underlying our analysis is that pairs are formed so that units within pairs are suitably "close" in terms of the baseline covariates, and we develop novel results to ensure that pairs are formed in a way that satisfies this condition. Under this assumption, we show that, for the problem of testing the null hypothesis that the average treatment effect equals a pre-specified value in such settings, the commonly used two-sample t-test and "matched pairs" t-test are conservative in the sense that these tests have limiting rejection probability under the null hypothesis no greater than and typically strictly less than the nominal level. We show, however, that a simple adjustment to the standard errors of these tests leads to a test that is asymptotically exact in the sense that its limiting rejection probability under the null hypothesis equals the nominal level. We also study the behavior of randomization tests that arise naturally in these types of settings. When implemented appropriately, we show that this approach also leads to a test that is asymptotically exact in the sense described previously, but additionally has finite-sample rejection probability no greater than the nominal level for certain distributions satisfying the null hypothesis. A simulation study confirms the practical relevance of our theoretical results.
Azeem M. Shaikh
Joseph P. Romano
Experiment, matched pairs, matched pairs t-test, permutation test, randomized controlled trial, treatment assignment, two-sample t-test
2019-04-25
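The "matched pairs" design and the randomization tests mentioned in the abstract above (Shaikh and Romano) can be sketched in a few lines. This is only an illustration under assumptions: pairs are formed by sorting on a single scalar covariate, and the randomization test uses the unstudentized absolute mean of within-pair differences with random sign flips. The abstract says the test must be "implemented appropriately" to be asymptotically exact; that implementation and the adjusted standard errors are not reproduced here. All function names are hypothetical.

```python
import random


def form_pairs(covariate):
    # Pair units that are adjacent after sorting on the covariate.
    # With an odd number of units the last one is left unpaired.
    order = sorted(range(len(covariate)), key=lambda i: covariate[i])
    return [(order[j], order[j + 1]) for j in range(0, len(order) - 1, 2)]


def assign_treatment(pairs, rng):
    # Within each pair, one unit is selected at random for treatment.
    return {a if rng.random() < 0.5 else b for a, b in pairs}


def pair_differences(pairs, treated, outcome):
    # Treated-minus-control outcome difference within each pair.
    diffs = []
    for a, b in pairs:
        t, c = (a, b) if a in treated else (b, a)
        diffs.append(outcome[t] - outcome[c])
    return diffs


def sign_flip_p_value(diffs, draws, rng):
    # Randomization p-value: re-randomizing treatment within a pair
    # flips the sign of that pair's difference.
    observed = abs(sum(diffs) / len(diffs))
    hits = 0
    for _ in range(draws):
        flipped = sum(d if rng.random() < 0.5 else -d for d in diffs) / len(diffs)
        if abs(flipped) >= observed - 1e-12:
            hits += 1
    return (hits + 1) / (draws + 1)


rng = random.Random(0)
x = [rng.random() for _ in range(40)]        # hypothetical baseline covariate
pairs = form_pairs(x)
treated = assign_treatment(pairs, rng)
# Hypothetical outcomes with a treatment effect of 1.
y = [xi + (1.0 if i in treated else 0.0) + rng.gauss(0.0, 0.5) for i, xi in enumerate(x)]
print(sign_flip_p_value(pair_differences(pairs, treated, y), 999, rng))
```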
Solving dynamic discrete choice models using smoothing and sieve methods
http://d.repec.org/n?u=RePEc:ifs:cemmap:15/19&r=ecm
We propose to combine smoothing, simulations and sieve approximations to solve for either the integrated or expected value function in a general class of dynamic discrete choice (DDC) models. We use importance sampling to approximate the Bellman operators defining the two functions. The random Bellman operators, and therefore also the corresponding solutions, are generally non-smooth which is undesirable. To circumvent this issue, we introduce a smoothed version of the random Bellman operator and solve for the corresponding smoothed value function using sieve methods. We show that one can avoid using sieves by generalizing and adapting the “self-approximating” method of Rust (1997b) to our setting. We provide an asymptotic theory for the approximate solutions and show that they converge with √N-rate, where N is the number of Monte Carlo draws, towards Gaussian processes. We examine their performance in practice through a set of numerical experiments and find that both methods perform well with the sieve method being particularly attractive in terms of computational speed and accuracy.
Dennis Kristensen
Patrick K. Mogensen
Jong-Myun Moon
Bertel Schjerning
Dynamic discrete choice; numerical solution; Monte Carlo; sieves
2019-04-03
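To illustrate what smoothing a Bellman operator can look like in the setting of the abstract above (Kristensen, Mogensen, Moon and Schjerning), here is a minimal sketch. It assumes log-sum-exp smoothing of the maximum over choices, which is one common device and not necessarily the authors' choice, and it uses exact expectations on a tiny discrete state space in place of the importance-sampling and sieve steps of the paper. All names and the toy model are hypothetical.

```python
import math


def smooth_max(values, sigma):
    """Log-sum-exp approximation to max(values); sigma > 0 sets the smoothing."""
    peak = max(values)
    # Subtracting the peak keeps the exponentials numerically stable.
    total = sum(math.exp((v - peak) / sigma) for v in values)
    return peak + sigma * math.log(total)


def smoothed_bellman_step(value, flow_utility, transition, beta, sigma):
    """One application of a smoothed Bellman operator.

    value[s]            current value guess in state s
    flow_utility[s][a]  per-period payoff of choice a in state s
    transition[s][a]    list of next-state probabilities
    """
    updated = []
    for s in range(len(value)):
        choice_values = [
            flow_utility[s][a]
            + beta * sum(p * v for p, v in zip(transition[s][a], value))
            for a in range(len(flow_utility[s]))
        ]
        updated.append(smooth_max(choice_values, sigma))
    return updated


# Toy problem: one state, two equally attractive choices.
value = [0.0]
for _ in range(200):
    value = smoothed_bellman_step(value, [[0.0, 0.0]], [[[1.0], [1.0]]], beta=0.5, sigma=1.0)
print(value)
```

As sigma shrinks toward zero, `smooth_max` approaches the ordinary maximum, so the smoothed operator approaches the unsmoothed one while remaining differentiable in the choice-specific values.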
Non-parametric bootstrap and small area estimation to mitigate bias in crowdsourced data. Simulation study and application to perceived safety
http://d.repec.org/n?u=RePEc:osf:socarx:8hgjt&r=ecm
Open and crowdsourced data are becoming prominent in social sciences research. Crowdsourcing projects harness information from large crowds of citizens who voluntarily participate in one collaborative project, and allow new insights into people’s attitudes and perceptions. However, these are usually affected by a series of biases that limit their representativeness (i.e. self-selection bias, unequal participation, underrepresentation of certain areas and times). In this chapter we present a two-step method aimed at producing reliable small area estimates from crowdsourced data when no auxiliary information is available at the individual level. A non-parametric bootstrap, aimed at computing pseudosampling weights and bootstrap weighted estimates, is followed by an area-level model-based small area estimation approach, which borrows strength from related areas based on a set of covariates, to improve the small area estimates. In order to assess the method, a simulation study and an application to safety perceptions in Greater London are conducted. The simulation study shows that the area-level model-based small area estimator under the non-parametric bootstrap improves (in terms of bias and variability) the small area estimates in the majority of areas. The application produces estimates of safety perceptions at a small geographical level in Greater London from Place Pulse 2.0 data. In the application, estimates are validated externally by comparing these to reliable survey estimates. Further simulation experiments and applications are needed to examine whether this method also improves the small area estimates when the sample biases are larger, smaller or show different distributions. A measure of reliability also needs to be developed to estimate the error of the small area estimates under the non-parametric bootstrap.
Buil-Gil, David
Solymosi, Reka
Moretti, Angelo
2019-10-02
Approximate Factor Models with Strongly Correlated Idiosyncratic Errors
http://d.repec.org/n?u=RePEc:arx:papers:1912.04123&r=ecm
We consider the estimation of approximate factor models for time series data, where strong serial and cross-sectional correlations within the idiosyncratic component are present. This setting comes up naturally in many applications, but existing approaches in the literature rely on the assumption that such correlations are weak, leading to mis-specification of the number of factors selected and consequently inaccurate inference. In this paper, we explicitly incorporate the dependent structure present in the idiosyncratic component through lagged values of the observed multivariate time series. We formulate a constrained optimization problem to estimate the factor space and the transition matrices of the lagged values simultaneously, wherein the constraints reflect the low rank nature of the common factors and the sparsity of the transition matrices. We establish theoretical properties of the obtained estimates, and introduce an easy-to-implement computational procedure for empirical work. The performance of the model and the implementation procedure is evaluated on synthetic data and compared with competing approaches, and further illustrated on a data set involving weekly log-returns of 75 large US financial institutions for the 2001-2016 period.
Jiahe Lin
George Michailidis
2019-12
New metrics for multiple testing with correlated outcomes
http://d.repec.org/n?u=RePEc:osf:osfxxx:k9g3b&r=ecm
We propose new metrics comparing the observed number of hypothesis test rejections ($\widehat{\theta}$) at an unpenalized $\alpha$-level to the distribution of rejections that would be expected if all tested null hypotheses held (the "global null"). Specifically, we propose reporting a "null interval" for the number of $\alpha$-level rejections expected to occur in 95% of samples under the global null, the difference between $\widehat{\theta}$ and the upper limit of the null interval (the "excess hits"), and a one-sided joint test based on $\widehat{\theta}$ of the global null. For estimation, we describe resampling algorithms that asymptotically recover the sampling distribution under the global null. These methods accommodate arbitrarily correlated test statistics and do not require high-dimensional analyses. In a simulation study, we assess properties of the proposed metrics under varying correlation structures as well as their power for outcome-wide inference relative to existing FWER methods. We provide an R package, NRejections. Ultimately, existing procedures for multiple hypothesis testing typically penalize inference in each test, which is useful to temper interpretation of individual findings; yet on their own, these procedures do not fully characterize global evidence strength across the multiple tests. Our new metrics help remedy this limitation.
Mathur, Maya B
VanderWeele, Tyler
2018-09-01
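The null interval, excess hits and one-sided joint test described in the entry above can be illustrated with a small sketch. It assumes that the rejection counts from resamples generated under the global null are already available as a list; the function name, the nearest-rank percentile rule and the floor at zero for excess hits are choices made for this illustration, not the API of the NRejections package.

```python
import math


def null_interval_summary(observed_rejections, null_rejection_counts, level=0.95):
    """Summarise an observed rejection count against resampled counts under the global null.

    null_rejection_counts is assumed to hold one rejection count per resample.
    """
    counts = sorted(null_rejection_counts)
    n = len(counts)
    if n == 0:
        raise ValueError("need at least one resampled count")
    tail = (1.0 - level) / 2.0
    # Nearest-rank percentiles, rounded outward so the interval is not too narrow.
    lower = counts[math.floor(tail * (n - 1))]
    upper = counts[math.ceil((1.0 - tail) * (n - 1))]
    # Excess hits: observed rejections beyond the upper limit (floored at zero here).
    excess_hits = max(0, observed_rejections - upper)
    # One-sided joint test: share of null resamples with at least as many rejections.
    p_value = sum(1 for c in counts if c >= observed_rejections) / n
    return {"null_interval": (lower, upper), "excess_hits": excess_hits, "p_value": p_value}


if __name__ == "__main__":
    # Hypothetical resampled counts, for illustration only.
    resampled = [0, 0, 1, 1, 1, 2, 2, 3, 3, 5]
    print(null_interval_summary(observed_rejections=7, null_rejection_counts=resampled))
```

Because the summary only needs the resampled counts, any correlation between test statistics is handled by the resampling step that produces them, not by this function.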
Conditional moments of noncausal alpha-stable processes and the prediction of bubble crash odds
http://d.repec.org/n?u=RePEc:pra:mprapa:97353&r=ecm
Noncausal, or anticipative, alpha-stable processes generate trajectories featuring locally explosive episodes akin to speculative bubbles in financial time series data. For (X_t) a two-sided infinite alpha-stable moving average (MA), conditional moments up to integer order four are shown to exist provided (X_t) is anticipative enough. The functional forms of these moments at any forecast horizon under any admissible parameterisation are obtained by extending the literature on arbitrary bivariate alpha-stable random vectors. The dynamics of noncausal processes simplify during explosive episodes, which allows ex ante crash odds at any horizon to be expressed in terms of the MA coefficients and of the tail index alpha. The results are illustrated in a synthetic portfolio allocation framework and an application to the Nasdaq and S&P500 series is provided.
Fries, Sébastien
Noncausal processes, Multivariate stable distributions, Conditional dependence, Extremal dependence, Explosive bubbles, Prediction, Crash odds, Portfolio allocation
2018-05
The Comparison of Methods for Individual Treatment Effect Detection
http://d.repec.org/n?u=RePEc:pra:mprapa:97309&r=ecm
Today, treatment effect estimation at the individual level is a vital problem in many areas of science and business. For example, in marketing, estimates of the treatment effect are used to select the most efficient promo-mechanics; in medicine, individual treatment effects are used to determine the optimal dose of medication for each patient, and so on. At the same time, the question of choosing the best method, i.e., the method that ensures the smallest predictive error (for instance, RMSE) or the highest total (average) value of the effect, remains open. Accordingly, in this paper we compare the effectiveness of machine learning methods for estimation of individual treatment effects. The comparison is performed on the Criteo Uplift Modeling Dataset. In this paper we show that the combination of the Logistic Regression method and the Difference Score method as well as Uplift Random Forest method provide the best correctness of Individual Treatment Effect prediction on the top 30% observations of the test dataset.
Semenova, Daria
Temirkaeva, Maria
Individual Treatment Effect; ITE; Machine Learning; Random Forest; XGBoost; SVM; Random Experiments; A/B testing; Uplift Random Forest
2019-09-23
New metrics for meta-analyses of heterogeneous effects
http://d.repec.org/n?u=RePEc:osf:osfxxx:v37j6&r=ecm
We provide two simple metrics that could be reported routinely in random-effects meta-analyses to convey evidence strength for scientifically meaningful effects under effect heterogeneity (i.e., a nonzero estimated variance of the true effect distribution). First, given a chosen threshold of meaningful effect size, meta-analyses could report the estimated proportion of true effect sizes above this threshold. Second, meta-analyses could estimate the proportion of effect sizes below a second, possibly symmetric, threshold in the opposite direction from the estimated mean. These metrics could help identify if: (1) there are few effects of scientifically meaningful size despite a "statistically significant" pooled point estimate; (2) there are some large effects despite an apparently null point estimate; or (3) strong effects in the direction opposite the pooled estimate regularly also occur (and thus, potential effect modifiers should be examined). These metrics should be presented with confidence intervals, which can be obtained analytically or, under weaker assumptions, using bias-corrected and accelerated (BCa) bootstrapping. Additionally, these metrics inform relative comparison of evidence strength across related meta-analyses. We illustrate with applied examples and provide an R package to compute the metrics and confidence intervals.
Mathur, Maya B
VanderWeele, Tyler
2018-11-14
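For the meta-analysis entry above, the two proportions can be written down in closed form if one assumes, purely for illustration, that the true effects are normally distributed with pooled mean mu and heterogeneity standard deviation tau. That normality assumption, the function names and the example numbers are not taken from the abstract; confidence intervals (analytic or BCa bootstrap) are not covered by this sketch.

```python
import math


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def proportion_above(threshold, mu, tau):
    """Estimated share of true effects above `threshold`, assuming effects ~ N(mu, tau^2)."""
    if tau <= 0:
        raise ValueError("tau must be positive under effect heterogeneity")
    return 1.0 - normal_cdf((threshold - mu) / tau)


def proportion_below(threshold, mu, tau):
    """Estimated share of true effects below `threshold`, assuming effects ~ N(mu, tau^2)."""
    if tau <= 0:
        raise ValueError("tau must be positive under effect heterogeneity")
    return normal_cdf((threshold - mu) / tau)


if __name__ == "__main__":
    # Hypothetical pooled estimate 0.10 with heterogeneity SD 0.20.
    print(proportion_above(0.20, mu=0.10, tau=0.20))   # share of meaningfully large effects
    print(proportion_below(-0.20, mu=0.10, tau=0.20))  # share of strong opposite-sign effects
```

With a small pooled mean and sizeable heterogeneity, both proportions can be non-negligible at once, which is the situation the abstract's cases (2) and (3) describe.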
Sensitivity of Estimation Precision to Moments with an Application to a Model of Joint Retirement Planning of Couples
http://d.repec.org/n?u=RePEc:ifs:cemmap:36/19&r=ecm
This paper introduces measures for how each moment contributes to the precision of the parameter estimates in GMM settings. For example, one of the measures asks what would happen to the variance of the parameter estimates if a particular moment was dropped from the estimation. The measures are all easy to compute. We illustrate the usefulness of the measures through two simple examples as well as an application to a model of joint retirement planning of couples. We estimate the model using the UK-BHPS, and we find evidence of complementarities in leisure. Our sensitivity measures illustrate that the precision of the estimate of the complementarity is primarily driven by the distribution of the differences in planned retirement dates. The estimated econometric model can be interpreted as a bivariate ordered choice model that allows for simultaneity. This makes the model potentially useful in other applications.
Bo E. Honoré
Thomas Jorgensen
Áureo de Paula
2019-07-05
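The "drop one moment" question raised in the entry above can be made concrete in a deliberately simplified setting. The sketch assumes a scalar parameter, efficient GMM weighting and uncorrelated moments, so that the asymptotic variance is 1 / sum_k(g_k^2 / s_k), where g_k is the derivative of moment k with respect to the parameter and s_k is the variance of moment k. These assumptions and all names are illustrative and are not the paper's definitions of its measures.

```python
def efficient_gmm_variance(jacobian, moment_variances):
    """Asymptotic variance of efficient GMM for a scalar parameter with uncorrelated moments."""
    information = sum(g * g / s for g, s in zip(jacobian, moment_variances))
    if information == 0:
        raise ValueError("parameter is not identified by the supplied moments")
    return 1.0 / information


def variance_ratio_when_dropped(jacobian, moment_variances, k):
    """Ratio of the variance without moment k to the variance with all moments (>= 1)."""
    kept_g = [g for i, g in enumerate(jacobian) if i != k]
    kept_s = [s for i, s in enumerate(moment_variances) if i != k]
    return efficient_gmm_variance(kept_g, kept_s) / efficient_gmm_variance(
        jacobian, moment_variances
    )


if __name__ == "__main__":
    # Hypothetical derivatives and moment variances.
    g = [1.0, 2.0, 0.1]
    s = [1.0, 4.0, 1.0]
    for k in range(len(g)):
        print(k, variance_ratio_when_dropped(g, s, k))
```

A ratio close to one indicates a moment that contributes little precision, while a large ratio flags a moment the estimate depends on heavily.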
Shrinkage priors for Bayesian penalized regression.
http://d.repec.org/n?u=RePEc:osf:osfxxx:cg8fq&r=ecm
In linear regression problems with many predictors, penalized regression techniques are often used to guard against overfitting and to select variables relevant for predicting the outcome. Classical regression techniques find coefficients that minimize a squared residual; penalized regression adds a penalty term to this residual to limit the coefficients’ sizes, thereby preventing overfitting. Many classical penalization techniques have a Bayesian counterpart, which results in the same solutions when a specific prior distribution is used in combination with posterior mode estimates. Compared to classical penalization techniques, the Bayesian penalization techniques perform similarly or even better, and they offer additional advantages such as readily available uncertainty estimates, automatic estimation of the penalty parameter, and more flexibility in terms of penalties that can be considered. As a result, Bayesian penalization is becoming increasingly popular. The aim of this paper is to provide a comprehensive overview of the literature on Bayesian penalization. We will compare different priors for penalization that have been proposed in the literature in terms of their characteristics, shrinkage behavior, and performance in terms of prediction and variable selection in order to help researchers navigate the many prior options.
van Erp, Sara
Oberski, Daniel L.
Mulder, Joris
2018-01-31
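The link between penalties and priors mentioned in the entry above can be seen in the simplest possible case: one predictor and no intercept. Minimizing sum((y - b*x)^2) + lam*b^2 gives the ridge solution, which matches the posterior mode under a Gaussian prior on b; replacing the penalty with lam*|b| gives the lasso solution, which matches the posterior mode under a Laplace prior. The sketch below is an illustration under those assumptions, with hypothetical function names, and says nothing about the full posterior distributions the paper compares.

```python
import math


def ols_coef(x, y):
    """Unpenalized least-squares slope for a single predictor without intercept."""
    return sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)


def ridge_coef(x, y, lam):
    """Minimizer of sum((y - b*x)^2) + lam * b^2."""
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    return sxy / (sxx + lam)


def lasso_coef(x, y, lam):
    """Minimizer of sum((y - b*x)^2) + lam * |b| (soft-thresholding)."""
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    shrunk = max(abs(sxy) - lam / 2.0, 0.0)
    return math.copysign(shrunk, sxy) / sxx


if __name__ == "__main__":
    x = [1.0, 2.0, 3.0]
    y = [2.0, 4.0, 6.0]
    print(ols_coef(x, y), ridge_coef(x, y, 14.0), lasso_coef(x, y, 28.0))
```

Ridge shrinks the coefficient proportionally and never sets it exactly to zero, whereas the lasso subtracts a fixed amount and can return exactly zero, which is why the choice of prior matters for variable selection.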
A Maximum Likelihood Mixture Approach for Multivariate Hypothesis Testing in case of Incomplete Data
http://d.repec.org/n?u=RePEc:osf:osfxxx:kmg3r&r=ecm
Multivariate hypothesis testing is becoming increasingly necessary as data shift from scalar, univariate formats to multivariate formats; financial and biological data in particular often consist of n-dimensional vectors. The likelihood ratio test is the best method for testing the mean of a multivariate sample with known or unknown covariance matrix, but it cannot be applied to incomplete data, and incomplete data are common in practice for many reasons. Therefore, this research proposes a new approach that makes it possible to apply the likelihood ratio test to incomplete data. Instead of replacing missing values in the incomplete sample with estimated values, this approach classifies the incomplete sample into groups, each represented by a potential or partial distribution. All partial distributions are unified into a mixture model, which is optimized via the expectation maximization (EM) algorithm. Finally, the likelihood ratio test is performed on the mixture model instead of the incomplete sample. This research provides a thorough description of the proposed approach and the mathematical proofs it requires. The mixture model approach is also compared with the approach of filling in missing values.
Nguyen, Loc PhD, MD, MBA
2018-01-17
Recovering Latent Variables by Matching
http://d.repec.org/n?u=RePEc:arx:papers:1912.13081&r=ecm
We propose an optimal-transport-based matching method to nonparametrically estimate linear models with independent latent variables. The method consists in generating pseudo-observations from the latent variables, so that the Euclidean distance between the model's predictions and their matched counterparts in the data is minimized. We show that our nonparametric estimator is consistent, and we document that it performs well in simulated data. We apply this method to study the cyclicality of permanent and transitory income shocks in the Panel Study of Income Dynamics. We find that the dispersion of income shocks is approximately acyclical, whereas the skewness of permanent shocks is procyclical. By comparison, we find that the dispersion and skewness of shocks to hourly wages vary little with the business cycle.
Manuel Arellano
Stephane Bonhomme
2019-12
Identification and estimation of dynamic structural models with unobserved choices
http://d.repec.org/n?u=RePEc:ifs:cemmap:35/19&r=ecm
This paper develops identification and estimation methods for dynamic structural models when agents’ actions are unobserved by econometricians. We provide conditions under which choice probabilities and latent state transition rules are nonparametrically identified with a continuous state variable in a single-agent dynamic discrete choice model. Our identification results extend to (1) models with serially correlated unobserved heterogeneity and continuous choices, (2) cases in which only discrete state variables are available, and (3) dynamic discrete games. We apply our method to study moral hazard problems in US gubernatorial elections. We find that the probabilities of shirking increase as the governors approach the end of their terms.
Yingyao Hu
Yi Xin
2019-06-18
Bayesian VAR forecasts, survey information and structural change in the euro area
http://d.repec.org/n?u=RePEc:bde:wpaper:1948&r=ecm
We incorporate external information extracted from the European Central Bank’s Survey of Professional Forecasters into the predictions of a Bayesian VAR, using entropic tilting and soft conditioning. The resulting conditional forecasts significantly improve the plain BVAR point and density forecasts. Importantly, we do not restrict the forecasts at a specific quarterly horizon, but their possible paths over several horizons jointly, as the survey information comes in the form of one- and two-year-ahead expectations. Besides improving the accuracy of the variable that we target, the spillover effects on “other-than-targeted” variables are relevant in size and statistically significant. We document that the baseline BVAR exhibits an upward bias for GDP growth after the financial crisis, and our results provide evidence that survey forecasts can help mitigate the effects of structural breaks on the forecasting performance of a popular macroeconometric model. Furthermore, we provide evidence of unstable VAR dynamics, especially during and after the recent Great Recession.
Gergely Ganics
Florens Odendahl
Survey of Professional Forecasters, density forecasts, entropic tilting, soft conditioning
2019-12
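Entropic tilting, as used in the Bayesian VAR entry above, reweights forecast draws so that they satisfy an external moment condition while staying as close as possible (in Kullback-Leibler terms) to the original equal weights. The sketch below handles only the simplest case: a scalar set of draws and a single target mean, for which the tilted weights are proportional to exp(gamma * x) and gamma can be found by bisection. The function names, the bracket for gamma and the example numbers are assumptions of this illustration; the paper tilts forecast paths over several horizons jointly.

```python
import math


def tilted_weights(draws, gamma):
    """Weights proportional to exp(gamma * x), computed in a numerically stable way."""
    shift = max(gamma * x for x in draws)
    raw = [math.exp(gamma * x - shift) for x in draws]
    total = sum(raw)
    return [r / total for r in raw]


def tilt_to_mean(draws, target, lo=-50.0, hi=50.0, tol=1e-10):
    """Find weights whose weighted mean equals `target` by bisection on gamma.

    The bracket [lo, hi] is an assumption suited to draws of moderate scale.
    """
    if not min(draws) < target < max(draws):
        raise ValueError("target mean must lie strictly inside the range of the draws")
    for _ in range(200):
        mid = (lo + hi) / 2.0
        weights = tilted_weights(draws, mid)
        mean = sum(w * x for w, x in zip(weights, draws))
        # The tilted mean is increasing in gamma, so bisection applies.
        if mean < target:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return tilted_weights(draws, (lo + hi) / 2.0)


if __name__ == "__main__":
    # Hypothetical forecast draws and a survey-based target mean.
    draws = [-1.0, 0.0, 1.0, 2.0]
    weights = tilt_to_mean(draws, target=1.0)
    print(weights, sum(w * x for w, x in zip(weights, draws)))
```

When the target equals the unweighted mean of the draws, gamma is zero and the weights stay equal, so the tilting leaves the original forecast distribution unchanged.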
Estimating the Local Effectiveness of Institutions: A Latent-Variable Approach
http://d.repec.org/n?u=RePEc:ces:ceswps:_7979&r=ecm
This paper develops an approach that allows constructing regional proxies of government effectiveness at a highly disaggregated level. Our idea builds on the well-documented interdependence between institutions and exports, which allows estimating the latent government effectiveness using methods of structural equation modeling. Unobserved institutional quality for each individual region is predicted using the regression outcomes. The impact of this novel identification strategy is tested using various panel applications. Results show that the magnitude of the effect of institutional quality can be biased in estimates that neglect regional differences in the effectiveness of institutions.
Antonia Reinecke
Hans-Jörg Schmerer
trade, institutions, Chinese plant data, latent variable, structural equation modelling
2019
Sparse demand systems: corners and complements
http://d.repec.org/n?u=RePEc:ifs:cemmap:45/19&r=ecm
We propose a demand model where consumers simultaneously choose a few different goods from a large menu of available goods, and choose how much to consume of each good. The model nests multinomial discrete choice and continuous demand systems as special cases. Goods can be substitutes or complements. Random coefficients are employed to capture the wide variation in the composition of consumption baskets. Non-negativity constraints produce corners that account for different consumers purchasing different numbers of types of goods. We show semiparametric identification of the model. We apply the model to the demand for fruit in the United Kingdom. We estimate the model’s parameters using UK scanner data for 2008 from the Kantar World Panel. Using our parameter estimates, we estimate a matrix of demand elasticities for 27 categories of fruit and analyze a range of tax and policy change scenarios.
Arthur Lewbel
Lars Nesheim
2019-09-23
Logical Differencing in Dyadic Network Formation Models with Nontransferable Utilities
http://d.repec.org/n?u=RePEc:arx:papers:2001.00691&r=ecm
This paper considers a semiparametric model of dyadic network formation under nontransferable utilities (NTU). Such dyadic links arise frequently in real-world social interactions that require bilateral consent but by their nature induce additive non-separability. In our model we show how unobserved individual heterogeneity in the network formation model can be canceled out without requiring additive separability. The approach uses a new method we call logical differencing. The key idea is to construct an observable event involving the intersection of two mutually exclusive restrictions on the fixed effects, where these restrictions arise as necessary conditions of weak multivariate monotonicity. Based on this identification strategy we provide consistent estimators of the network formation model under NTU. Finite-sample performance of our method is analyzed in a simulation study, and an empirical illustration using the risk-sharing network data from Nyakatoke demonstrates that our proposed method is able to obtain economically intuitive estimates.
Wayne Yuan Gao
Ming Li
Sheng Xu
2020-01
Semi-Parametric Efficient Policy Learning with Continuous Actions
http://d.repec.org/n?u=RePEc:ifs:cemmap:34/19&r=ecm
We consider off-policy evaluation and optimization with continuous action spaces. We focus on observational data where the data collection policy is unknown and needs to be estimated. We take a semi-parametric approach where the value function takes a known parametric form in the treatment, but we are agnostic on how it depends on the observed contexts. We propose a doubly robust off-policy estimate for this setting and show that off-policy optimization based on this estimate is robust to estimation errors of the policy function or the regression model. Our results also apply if the model does not satisfy our semi-parametric form, but rather we measure regret in terms of the best projection of the true value function to this functional space. Our work extends prior approaches of policy optimization from observational data that only considered discrete actions. We provide an experimental evaluation of our method in a synthetic data example motivated by optimal personalized pricing and costly resource allocation.
Mert Demirer
Vasilis Syrgkanis
Greg Lewis
Victor Chernozhukov
2019-06-12
Estimation of a semiparametric transformation model: a novel approach based on least squares minimization
http://d.repec.org/n?u=RePEc:ete:kbiper:630751&r=ecm
Benjamin Colling
Ingrid Van Keilegom
2018-12
From fixed-event to fixed-horizon density forecasts: obtaining measures of multi-horizon uncertainty from survey density forecasts
http://d.repec.org/n?u=RePEc:bde:wpaper:1947&r=ecm
Surveys of Professional Forecasters produce precise and timely point forecasts for key macroeconomic variables. However, the accompanying density forecasts are not as widely utilized, and there is no consensus about their quality. This is partly because such surveys are often conducted for “fixed events”. For example, in each quarter panelists are asked to forecast output growth and inflation for the current calendar year and the next, implying that the forecast horizon changes with each survey round. The fixed-event nature limits the usefulness of survey density predictions for policymakers and market participants, who often wish to characterize uncertainty a fixed number of periods ahead (“fixed-horizon”). Is it possible to obtain fixed-horizon density forecasts using the available fixed-event ones? We propose a density combination approach that weights fixed-event density forecasts according to a uniformity of the probability integral transform criterion, aiming at obtaining a correctly calibrated fixed-horizon density forecast. Using data from the US Survey of Professional Forecasters, we show that our combination method produces competitive density forecasts relative to widely used alternatives based on historical forecast errors or Bayesian VARs. Thus, our proposed fixed-horizon predictive densities are a new and useful tool for researchers and policy makers.
Gergely Ganics
Barbara Rossi
Tatevik Sekhposyan
Survey of Professional Forecasters, density forecasts, forecast combination, predictive density, probability integral transform, uncertainty, real-time
2019-12
The impact of incomplete data on quantile regression for longitudinal data
http://d.repec.org/n?u=RePEc:ete:kbiper:632541&r=ecm
Anneleen Verhasselt
Alvaro J Flórez
Ingrid Van Keilegom
Geert Molenberghs
2019-01
Non-parametric cure rate estimation under insufficient follow-up using extremes
http://d.repec.org/n?u=RePEc:ete:kbiper:630759&r=ecm
Mikael Escobar
Ingrid Van Keilegom
2018-12
On relaxing the distributional assumption of stochastic frontier models
http://d.repec.org/n?u=RePEc:ete:kbiper:632085&r=ecm
Hohsuk Noh
Ingrid Van Keilegom
2019-01
The nonparametric location-scale mixture cure model
http://d.repec.org/n?u=RePEc:ete:kbiper:630665&r=ecm
Justin Chown
Cédric Heuchenne
Ingrid Van Keilegom
2018-12
Adaptive Dynamic Model Averaging with an Application to House Price Forecasting
http://d.repec.org/n?u=RePEc:arx:papers:1912.04661&r=ecm
Dynamic model averaging (DMA) combines the forecasts of a large number of dynamic linear models (DLMs) to predict the future value of a time series. The performance of DMA critically depends on the appropriate choice of two forgetting factors. The first of these controls the speed of adaptation of the coefficient vector of each DLM, while the second enables time variation in the model averaging stage. In this paper we develop a novel, adaptive dynamic model averaging (ADMA) methodology. The proposed methodology employs a stochastic optimisation algorithm that sequentially updates the forgetting factor of each DLM, and uses a state-of-the-art non-parametric model combination algorithm from the prediction with expert advice literature, which offers finite-time performance guarantees. An empirical application to quarterly UK house price data suggests that ADMA produces more accurate forecasts than the benchmark autoregressive model, as well as competing DMA specifications.
Alisa Yusupova
Nicos G. Pavlidis
Efthymios G. Pavlidis
2019-12
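The role of the second forgetting factor in the dynamic model averaging entry above can be sketched with the standard DMA weight recursion with a fixed forgetting factor alpha: the previous model probabilities are first flattened by raising them to the power alpha, then updated with each model's predictive likelihood for the new observation. This is the conventional fixed-alpha recursion, shown as a point of reference; it is not the adaptive ADMA scheme proposed in the paper, and the function name and example numbers are hypothetical.

```python
def dma_weight_update(prev_weights, predictive_likelihoods, alpha):
    """One step of the standard DMA model-probability recursion.

    prev_weights: model probabilities after the previous observation.
    predictive_likelihoods: each model's predictive density at the new observation.
    alpha: forgetting factor in [0, 1]; values below 1 discount past performance.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    # Prediction step: flatten the weights toward equal weights.
    powered = [w ** alpha for w in prev_weights]
    total = sum(powered)
    predicted = [p / total for p in powered]
    # Update step: reweight by how well each model predicted the new observation.
    unnormalised = [p * lik for p, lik in zip(predicted, predictive_likelihoods)]
    total = sum(unnormalised)
    updated = [u / total for u in unnormalised]
    return predicted, updated


if __name__ == "__main__":
    predicted, updated = dma_weight_update([0.9, 0.1], [0.2, 0.6], alpha=0.95)
    print(predicted, updated)
```

With alpha equal to one the recursion reduces to ordinary Bayesian model averaging, while smaller values let a model that has recently forecast well regain weight quickly, which is why the abstract stresses that performance depends on how this factor is chosen.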
Inference for covariate-adjusted semiparametric Gaussian copula model using residual ranks
http://d.repec.org/n?u=RePEc:ete:kbiper:630763&r=ecm
Irène Gijbels
Ingrid Van Keilegom
Yue Zhao
2018-12
Linear censored quantile regression: a novel minimum-distance approach
http://d.repec.org/n?u=RePEc:ete:kbiper:630769&r=ecm
Mickaël De Backer
Anouar El Ghouch
Ingrid Van Keilegom
2018-12
Synthetic Control Inference for Staggered Adoption: Estimating the Dynamic Effects of Board Gender Diversity Policies
http://d.repec.org/n?u=RePEc:arx:papers:1912.06320&r=ecm
We introduce a synthetic control methodology to study policies with staggered adoption. Many policies, such as the board gender quota, are replicated by other policy setters at different time frames. Our method estimates the dynamic average treatment effects on the treated using variation introduced by the staggered adoption of policies. Our method gives asymptotically unbiased estimators of many interesting quantities and delivers asymptotically valid inference. By using the proposed method and national labor data in Europe, we find evidence that quota regulation on board diversity leads to a decrease in part-time employment, and an increase in full-time employment for female professionals.
Jianfei Cao
Shirley Lu
2019-12
Econometrics with Partial Identification
http://d.repec.org/n?u=RePEc:ifs:cemmap:25/19&r=ecm
Econometrics has traditionally revolved around point identification. Much effort has been devoted to finding the weakest set of assumptions that, together with the available data, deliver point identification of population parameters, finite or infinite dimensional as these might be. And point identification has been viewed as a necessary prerequisite for meaningful statistical inference. The research program on partial identification began to slowly shift this focus in the early 1990s, gaining momentum over time and developing into a widely researched area of econometrics. Partial identification has forcefully established that much can be learned from the available data and assumptions imposed because of their credibility rather than their ability to yield point identification. Within this paradigm, one obtains a set of values for the parameters of interest which are observationally equivalent given the available data and maintained assumptions. I refer to this set as the parameters' sharp identification region. Econometrics with partial identification is concerned with: (1) obtaining a tractable characterization of the parameters' sharp identification region; (2) providing methods to estimate it; (3) conducting tests of hypotheses and making confidence statements about the partially identified parameters. Each of these goals poses challenges that differ from those faced in econometrics with point identification. This chapter discusses these challenges and some of their solutions. It reviews advances in partial identification analysis both as applied to learning (functionals of) probability distributions that are well-defined in the absence of models, as well as to learning parameters that are well-defined only in the context of particular models.
The chapter highlights a simple organizing principle: the source of the identification problem can often be traced to a collection of random variables that are consistent with the available data and maintained assumptions. This collection may be part of the observed data or be a model implication. In either case, it can be formalized as a random set. Random set theory is then used as a mathematical framework to unify a number of special results and produce a general methodology to conduct econometrics with partial identification.
Francesca Molinari
2019-05-31
A general approach for cure models in survival analysis
http://d.repec.org/n?u=RePEc:ete:kbiper:632076&r=ecm
Valentin Patilea
Ingrid Van Keilegom
2019-01
Goodness-of-fit test for a parametric survival function with cure fraction
http://d.repec.org/n?u=RePEc:ete:kbiper:630753&r=ecm
Candida Geerdens
Paul Janssen
Ingrid Van Keilegom
2018-12