Econometrics
http://lists.repec.org/mailman/listinfo/nep-ecm
2018-02-12
Bayesian Inference in Spatial Sample Selection Models
http://d.repec.org/n?u=RePEc:pra:mprapa:82829&r=ecm
In this study, we consider Bayesian methods for the estimation of a sample selection model with spatially correlated disturbance terms. We design a set of Markov chain Monte Carlo (MCMC) algorithms based on the method of data augmentation. The natural parameterization for the covariance structure of our model involves an unidentified parameter that complicates posterior analysis. The unidentified parameter -- the variance of the disturbance term in the selection equation -- is handled in different ways in these algorithms to achieve identification for other parameters. The Bayesian estimator based on these algorithms can account for the selection bias and the full covariance structure implied by the spatial correlation. We illustrate the implementation of these algorithms through a simulation study.
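For orientation, a minimal sketch of the kind of two-equation model the abstract describes, in notation that is our assumption rather than the paper's own: a selection equation and an outcome equation whose disturbances follow spatial autoregressions with weight matrix $W$,

    % selection and outcome equations (assumed generic notation)
    z_i^* = x_{1i}'\gamma + u_{1i}, \qquad z_i = \mathbf{1}\{z_i^* > 0\},
    y_i^* = x_{2i}'\beta  + u_{2i}, \qquad y_i = y_i^* \text{ observed only if } z_i = 1,
    u_k   = \rho_k W u_k + \varepsilon_k, \qquad k = 1, 2.

As in the non-spatial case, the scale of the selection equation is not identified from the binary $z_i$, which is why the variance of its disturbance must be handled (fixed or marginalized) inside the MCMC algorithms.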
Dogan, Osman
Taspinar, Suleyman
Spatial dependence, Spatial sample selection model, Bayesian analysis, Data augmentation
2016-12-16
Inference with difference-in-differences with a small number of groups: a review, simulation study and empirical application using SHARE data
http://d.repec.org/n?u=RePEc:qub:charms:1801&r=ecm
Difference-in-differences (DID) estimation has become increasingly popular as an approach to evaluate the effect of a group-level policy on individual-level outcomes. Several statistical methodologies have been proposed to correct for the within-group correlation of model errors resulting from the clustering of data. Little is known about how well these corrections perform with the often small number of groups observed in health research using longitudinal data. First, we review the most commonly used modelling solutions in DID estimation for panel data, including generalized estimating equations (GEE), permutation tests, clustered standard errors (CSE), wild cluster bootstrapping, and aggregation. Second, we compare the empirical coverage rates and power of these methods in a Monte Carlo simulation study in which we vary the degree of error correlation, the group size balance, and the proportion of treated groups. Third, we provide an empirical example using the Survey of Health, Ageing and Retirement in Europe (SHARE). When the number of groups is small, CSE are systematically biased downwards when data are unbalanced or when the proportion of treated groups is low. This can result in over-rejection of the null even when data are composed of up to 50 groups. Aggregation, permutation tests, bias-adjusted GEE, and the wild cluster bootstrap produce coverage rates close to the nominal rate for almost all scenarios, though GEE may suffer from low power. In DID estimation with a small number of groups, analysis using aggregation, permutation tests, the wild cluster bootstrap, or bias-adjusted GEE is recommended.
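To make one of the recommended corrections concrete, here is a minimal sketch of a group-level permutation test for a DID effect; the two-period setup and all names are our illustration, not the authors' code:

    import numpy as np

    def did_permutation_test(y, group, post, treated_groups, n_perm=999, seed=0):
        """Group-level permutation test for a two-period DID effect.
        y: outcome vector; group: group id per observation; post: 0/1
        period indicator; treated_groups: ids of the treated groups."""
        rng = np.random.default_rng(seed)
        y, group, post = map(np.asarray, (y, group, post))
        groups = np.unique(group)

        def did_estimate(treat_set):
            treated = np.isin(group, treat_set)
            # DID = (treated post - treated pre) - (control post - control pre)
            m = lambda t, p: y[(treated == t) & (post == p)].mean()
            return (m(True, 1) - m(True, 0)) - (m(False, 1) - m(False, 0))

        obs = did_estimate(np.asarray(treated_groups))
        k = len(np.unique(treated_groups))
        perm = np.array([did_estimate(rng.choice(groups, size=k, replace=False))
                         for _ in range(n_perm)])
        # two-sided p-value: share of placebo estimates at least as extreme
        return obs, (1 + np.count_nonzero(np.abs(perm) >= abs(obs))) / (n_perm + 1)

Because the randomization is done at the group level, the test respects the within-group error correlation that plain clustered standard errors understate when groups are few.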
Slawa Rokicki
Jessica Cohen
Gunther Fink
Joshua Salomon
Mary Beth Landrum
Difference-in-differences; Clustered standard errors; Inference; Monte Carlo simulation; GEE
2018-01
GMM Gradient Tests for Spatial Dynamic Panel Data Models
http://d.repec.org/n?u=RePEc:pra:mprapa:82830&r=ecm
In this study, we formulate adjusted gradient tests for the case in which the alternative model used to construct the tests deviates from the true data generating process for a spatial dynamic panel data (SDPD) model. Following Bera et al. (2010), we introduce these adjusted gradient tests along with the standard ones within a GMM framework. These tests can be used to detect the presence of (i) contemporaneous spatial lag terms, (ii) a time lag term, and (iii) spatial time lag terms in a higher-order SDPD model. The adjusted tests have two advantages: (i) their null asymptotic distribution is a central chi-squared distribution irrespective of the misspecified alternative model, and (ii) their test statistics are computationally simple and require only the ordinary least-squares (OLS) estimates from a non-spatial two-way panel data model. We investigate the finite sample size and power properties of these tests through Monte Carlo studies. Our results indicate that the adjusted gradient tests have good finite sample properties.
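For reference, Bera-Yoon-type adjusted gradient statistics have the following generic form (our notation; in the paper's GMM framework the scores and information are replaced by GMM gradients and their variance). Partition the parameter as $\theta = (\psi', \varphi')'$, test $H_0: \psi = 0$ while the nuisance direction $\varphi$ may be locally misspecified, and let $d$ and $J$ denote the score and information evaluated under the joint null:

    LM_\psi^* = \left(d_\psi - J_{\psi\varphi} J_{\varphi\varphi}^{-1} d_\varphi\right)'
                \left(J_{\psi\psi} - J_{\psi\varphi} J_{\varphi\varphi}^{-1} J_{\varphi\psi}\right)^{-1}
                \left(d_\psi - J_{\psi\varphi} J_{\varphi\varphi}^{-1} d_\varphi\right).

The adjustment subtracts the part of the score for $\psi$ that is predictable from the score for $\varphi$, which is what keeps the null distribution central chi-squared under local deviations in $\varphi$.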
Taspinar, Suleyman
Dogan, Osman
Bera, Anil K.
Spatial Dynamic Panel Data Model, SDPD, GMM, Robust LM Tests, GMM Gradient Tests, Inference
2017
Forbidden zones for the expectation of a random variable. New version 1
http://d.repec.org/n?u=RePEc:pra:mprapa:84248&r=ecm
A forbidden zones theorem is deduced in the present article, and its consequences and applications receive a preliminary treatment. The following statement is proven: if a non-zero lower bound exists for the variance of a random variable that takes values in a finite interval, then non-zero bounds, or forbidden zones, exist for its expectation near the boundaries of the interval. The article is motivated by the need for rigorous theoretical support of the practical analysis of the influence of scattering and noise in behavioral economics, decision sciences, and utility and prospect theories. If noise is one of the possible causes of the above lower bound on the variance, then it can create or widen such forbidden zones, so the theorem provides new possibilities for the mathematical description of the influence of such noise. The considered forbidden zones can evidently lead to biases in measurements.
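The mechanics behind such bounds can be seen in a standard two-line argument via the Bhatia-Davis inequality, supplied here for orientation (the article's own proof may proceed differently). For $X \in [a, b]$ with $\mu = E[X]$,

    \operatorname{Var}(X) \le (b - \mu)(\mu - a) \le (b - a)\,\min(b - \mu,\; \mu - a),

so a lower bound $\operatorname{Var}(X) \ge \sigma_{\min}^2 > 0$ forces

    a + \frac{\sigma_{\min}^2}{b - a} \;\le\; \mu \;\le\; b - \frac{\sigma_{\min}^2}{b - a},

i.e., zones of width $\sigma_{\min}^2/(b - a)$ adjacent to each boundary are forbidden for the expectation.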
Harin, Alexander
probability; variance; noise; utility theory; prospect theory; behavioral economics; decision sciences; measurement
2018-01-29
Nonparametric imputation by data depth
http://d.repec.org/n?u=RePEc:crs:wpaper:2017-72&r=ecm
The presented methodology for single imputation of missing values borrows its idea from data depth, a measure of centrality defined for an arbitrary point of the space with respect to a probability distribution or a data cloud. It consists in iteratively maximizing the depth of each observation with missing values and can be employed with any properly defined statistical depth function. On each iteration, imputation reduces to optimizing a quadratic, linear, or quasiconcave function, solved analytically, by linear programming, or by the Nelder-Mead method, respectively. Being able to grasp the underlying data topology, the procedure is distribution free, imputes close to the data, and, unlike local imputation methods (k-nearest neighbors, random forest), preserves prediction possibilities; it also has attractive robustness and asymptotic properties under elliptical symmetry. It is shown that its particular case, using the Mahalanobis depth, connects directly to well-known treatments for the multivariate normal model, such as iterated regression or regularized PCA. The methodology is extended to multiple imputation for data stemming from an elliptically symmetric distribution. Simulation and real data studies compare the procedure favorably with existing popular alternatives. The method has been implemented as an R package.
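For the Mahalanobis-depth special case mentioned in the abstract, maximizing the depth of an incomplete row over its missing coordinates amounts to conditional-mean imputation under the current mean and covariance, which is where the connection to iterated regression arises. A minimal numpy sketch of that iteration (our illustration; the authors' R package is the reference implementation):

    import numpy as np

    def impute_mahalanobis_depth(X, n_iter=50):
        """Iterative imputation maximizing the Mahalanobis depth of each
        incomplete row, i.e. conditional-mean imputation under the current
        estimates of mean and covariance. X: array with np.nan for missing."""
        X = X.copy()
        miss = np.isnan(X)
        col_means = np.nanmean(X, axis=0)
        X[miss] = np.take(col_means, np.where(miss)[1])  # crude start
        for _ in range(n_iter):
            mu, S = X.mean(axis=0), np.cov(X, rowvar=False)
            for i in np.where(miss.any(axis=1))[0]:
                m, o = miss[i], ~miss[i]
                # conditional Gaussian mean of missing given observed entries
                X[i, m] = mu[m] + S[np.ix_(m, o)] @ np.linalg.solve(
                    S[np.ix_(o, o)], X[i, o] - mu[o])
        return X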
Pavlo Mozharovskyi
Julie Josse
François Husson
Elliptical symmetry, Outliers, Tukey depth, Zonoid depth, Nonparametric imputation, Convex optimization
2017-12-14
A New Baseline Model for Estimating Willingness to Pay from Discrete Choice Models
http://d.repec.org/n?u=RePEc:war:wpaper:2018-04&r=ecm
We show that a substantive problem exists with the widely used ratio-of-coefficients approach to calculating willingness to pay (WTP) from choice models: the correctly calculated standard error for WTP under this approach is always infinite. A variant of this problem has long been recognized for mixed logit models; we show it occurs even in simple models, such as the conditional logit used as a baseline reference specification. It occurs because the standard error for the cost parameter implies some possibility that the true parameter value is arbitrarily close to zero. We propose a simple way to overcome this problem: reparameterize the coefficient of the (negative) cost variable to enforce the theoretically correct (and empirically almost always found) positive coefficient using an exponential transformation of the original parameter. This reparameterization enforces the restriction that no part of the confidence region for the original cost parameter spans zero, so the confidence interval for WTP is finite and well behaved. Our proposed model is straightforward to implement using readily available software; its log-likelihood value is the same as that of the usual baseline discrete choice model, and we recommend its use as the new standard baseline reference model.
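In symbols, the reparameterization described above (our paraphrase): write the cost coefficient as $\beta_c = -\exp(\theta)$ and estimate $\theta$ instead, so that for attribute $k$

    WTP_k = -\frac{\beta_k}{\beta_c} = \beta_k \, e^{-\theta}.

Since $\exp(\theta) > 0$ for every finite $\theta$, no point of the confidence region for the cost coefficient touches zero, and delta-method or Krinsky-Robb intervals for $WTP_k$ are finite.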
Richard T. Carson
Mikołaj Czajkowski
conditional logit, confidence intervals, contingent valuation, delta method, discrete choice experiment, Krinsky-Robb, multinomial logit, probit, welfare measures
2018
Errors-in-Variables Models with Many Proxies
http://d.repec.org/n?u=RePEc:usi:wpaper:774&r=ecm
This paper introduces a novel method to estimate linear models when explanatory variables are observed with error and many proxies are available. The empirical Euclidean likelihood principle is used to combine the information that comes from the various mismeasured variables. We show that the proposed estimator is consistent and asymptotically normal. In a Monte Carlo study we show that our method is able to efficiently use the information in the available proxies, both in terms of precision of the estimator and in terms of statistical power. An application to the effect of police on crime suggests that measurement errors in the police variable induce substantial attenuation bias. Our approach, on the other hand, yields large estimates in absolute value with high precision, in accordance with the results put forward by the recent literature.
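For orientation, a generic many-proxies setup of the kind the abstract describes, in notation that is our assumption: a latent regressor $x_i^*$ drives the outcome, and each of $m$ proxies measures it with error,

    y_i = x_i^{*\prime} \beta + \varepsilon_i, \qquad z_{ij} = x_i^* + u_{ij}, \quad j = 1, \dots, m.

With mutually independent measurement errors, each proxy supplies instrument-type moment conditions for the others, and the empirical Euclidean likelihood weights these possibly many conditions in a single criterion rather than discarding all but one proxy.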
Federico Crudu
data combination, empirical Euclidean likelihood, errors-in-variables, instrumental variables
2017-01
A Second Order Cumulant Spectrum Based Test for Strict Stationarity
http://d.repec.org/n?u=RePEc:arx:papers:1801.06727&r=ecm
This article develops a statistical test for the null hypothesis of strict stationarity of a discrete time stochastic process. When the null hypothesis is true, the second order cumulant spectrum is zero at all the discrete Fourier frequency pairs in the principal domain of the cumulant spectrum. The test uses a frame (window) averaged sample estimate of the second order cumulant spectrum to build a test statistic with an asymptotic complex standard normal distribution. We derive the test statistic, study the size and power properties of the test, and demonstrate its implementation with intraday stock market return data. The test has conservative size properties and good power to detect varying variance and a unit root in the presence of varying variance.
Douglas Patterson
Melvin Hinich
Denisa Roberts
2018-01
Structural analysis with mixed-frequency data: A MIDAS-SVAR model of US capital flows
http://d.repec.org/n?u=RePEc:arx:papers:1802.00793&r=ecm
We develop a new VAR model for structural analysis with mixed-frequency data. The MIDAS-SVAR model makes it possible to identify structural dynamic links by exploiting the information contained in variables sampled at different frequencies. It also provides a general framework to test homogeneous frequency-based representations against mixed-frequency data models. A set of Monte Carlo experiments suggests that the test performs well in terms of both size and power. The MIDAS-SVAR is then used to study how monetary policy and financial market volatility affect the dynamics of gross capital inflows to the US. While no relation is found when using standard quarterly data, exploiting the within-quarter variability of the series shows that the effect of an interest rate shock is greater the longer the time lag between the month of the shock and the end of the quarter.
Emanuele Bacchiocchi
Andrea Bastianin
Alessandro Missale
Eduardo Rossi
2018-02
Testing the Number of Regimes in Markov Regime Switching Models
http://d.repec.org/n?u=RePEc:arx:papers:1801.06862&r=ecm
Markov regime switching models have been used in numerous empirical studies in economics and finance. However, the asymptotic distribution of the likelihood ratio test statistic for testing the number of regimes in Markov regime switching models has been an unresolved problem. This paper derives the asymptotic distribution of the likelihood ratio test statistic for testing the null hypothesis of $M_0$ regimes against the alternative hypothesis of $M_0 + 1$ regimes for any $M_0 \geq 1$ both under the null hypothesis and under local alternatives. We show that the contiguous alternatives converge to the null hypothesis at a rate of $n^{-1/8}$ in regime switching models with normal density. The asymptotic validity of the parametric bootstrap is also established.
Hiroyuki Kasahara
Katsumi Shimotsu
2018-01
Estimating linear functionals of a sparse family of Poisson means
http://d.repec.org/n?u=RePEc:crs:wpaper:2017-19&r=ecm
Assume that we observe a sample of size n composed of p-dimensional signals, each signal having independent entries drawn from a scaled Poisson distribution with an unknown intensity. We are interested in estimating the sum of the n unknown intensity vectors, under the assumption that most of them coincide with a given "background" signal. The number s of p-dimensional signals different from the background signal plays the role of sparsity, and the goal is to leverage this sparsity assumption in order to improve the quality of estimation relative to the naive estimator that computes the sum of the observed signals. We first introduce the group hard thresholding estimator and analyze its mean squared error measured by the squared Euclidean norm. We establish a nonasymptotic upper bound showing that the risk is at most of the order of $\theta^2 (sp + s^2 \sqrt{p}) \log^{3/2}(np)$. We then establish lower bounds on the minimax risk over a properly defined class of collections of s-sparse signals. These lower bounds match the upper bound, up to logarithmic terms, when the dimension p is fixed or of larger order than $s^2$. In the case where the dimension p increases but remains of smaller order than $s^2$, our results show a gap between the lower and the upper bounds, which can be up to order $\sqrt{p}$.
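A minimal sketch of a group hard thresholding estimator of this type (the threshold lam below is a placeholder; the paper's calibrated choice depends on the scale parameter and on n, p, s):

    import numpy as np

    def group_hard_threshold_sum(Y, background, lam):
        """Estimate the sum of n intensity vectors from Y (n x p) when most
        rows share a known background intensity. Rows whose squared distance
        from the background falls below lam are snapped to the background."""
        n, p = Y.shape
        diff = Y - background                 # per-signal deviation
        keep = (diff ** 2).sum(axis=1) > lam  # signals surviving the test
        # sum = n * background + deviations of the surviving signals
        return n * background + diff[keep].sum(axis=0)

With lam = 0 this reduces to the naive sum of the observed signals; a very large lam snaps every signal to the background.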
Olivier Collier
Arnak Dalalyan
Nonasymptotic minimax estimation, linear functional, group-sparsity, thresholding, Poisson processes
2017-12-05
Constructing confidence sets for the matrix completion problem
http://d.repec.org/n?u=RePEc:crs:wpaper:2017-41&r=ecm
In the present note we consider the problem of constructing honest and adaptive confidence sets for the matrix completion problem. For the Bernoulli model with known variance of the noise, we provide a realizable method for constructing confidence sets that adapt to the unknown rank of the true matrix.
Alexandra Carpentier
Olga Klopp
Matthias Löffler
low rank recovery, confidence sets, adaptivity, matrix completion
2017-12-08
Machine learning for time series forecasting - a simulation study
http://d.repec.org/n?u=RePEc:zbw:iwqwdp:022018&r=ecm
We present a comprehensive simulation study to assess and compare the performance of popular machine learning algorithms for time series prediction tasks. Specifically, we consider the following algorithms: multilayer perceptron (MLP), logistic regression, naïve Bayes, k-nearest neighbors, decision trees, random forests, and gradient-boosting trees. These models are applied to time series from eight data generating processes (DGPs) - reflecting different linear and nonlinear dependencies (base case). Additional complexity is introduced by adding discontinuities and varying degrees of noise. Our findings reveal that advanced machine learning models are capable of approximating the optimal forecast very closely in the base case, with nonlinear models in the lead across all DGPs - particularly the MLP. By contrast, logistic regression is remarkably robust in the presence of noise, thus yielding the most favorable accuracy metrics on raw data, prior to preprocessing. When introducing adequate preprocessing techniques, such as first differencing and local outlier factor, the picture is reversed, and the MLP as well as other nonlinear techniques once again become the modeling techniques of choice.
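To make the setup concrete, here is a stripped-down version of such an experiment, assuming (as the inclusion of classifiers suggests) a directional prediction task; the DGP, features, and tuning below are our illustration, not the paper's design:

    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.neural_network import MLPClassifier

    rng = np.random.default_rng(1)
    # nonlinear DGP: SETAR-style AR(1) with a regime-dependent slope
    T = 3000
    y = np.zeros(T)
    for t in range(1, T):
        phi = 0.8 if y[t - 1] > 0 else -0.5
        y[t] = phi * y[t - 1] + rng.normal()
    # task: predict the direction of the next move from p lagged values
    p = 3
    X = np.column_stack([y[i:T - p + i] for i in range(p)])
    d = (y[p:] > y[p - 1:-1]).astype(int)
    cut = int(0.8 * len(d))
    for model in (LogisticRegression(max_iter=1000),
                  MLPClassifier(hidden_layer_sizes=(32,), max_iter=1000)):
        model.fit(X[:cut], d[:cut])
        print(type(model).__name__, round(model.score(X[cut:], d[cut:]), 3))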
Fischer, Thomas
Krauss, Christopher
Treichel, Alex
2018
Nonseparable Sample Selection Models with Censored Selection Rules
http://d.repec.org/n?u=RePEc:arx:papers:1801.08961&r=ecm
We consider identification and estimation of nonseparable sample selection models with censored selection rules. We employ a control function approach and discuss different objects of interest based on (1) local effects conditional on the control function, and (2) global effects obtained from integration over ranges of values of the control function. We provide conditions under which these objects are appropriate for the total population. We also present results regarding the estimation of counterfactual distributions. We derive conditions for identification for these different objects and suggest strategies for estimation. We also provide the associated asymptotic theory. These strategies are illustrated in an empirical investigation of the determinants of female wages and wage growth in the United Kingdom.
Iván Fernández-Val
Aico van Vuuren
Francis Vella
2018-01
Nonfractional Memory: Filtering, Antipersistence, and Forecasting
http://d.repec.org/n?u=RePEc:arx:papers:1801.06677&r=ecm
The fractional difference operator remains the most popular mechanism for generating long memory, owing to the existence of efficient algorithms for its simulation and forecasting. Nonetheless, there is no theoretical argument linking the fractional difference operator to the presence of long memory in real data. One of the most prominent theoretical explanations for the presence of long memory is instead the cross-sectional aggregation of persistent micro units. Yet the type of process obtained by cross-sectional aggregation differs from the one produced by fractional differencing. This paper therefore develops fast algorithms to generate and forecast long memory by cross-sectional aggregation. Moreover, it is shown that the antipersistent behavior that arises for negative degrees of memory in the fractional difference literature is not present in cross-sectionally aggregated processes. In particular, while the autocorrelations under the fractional difference operator are negative for negative degrees of memory by construction, this restriction does not apply to the cross-sectional aggregation scheme. We show that this has implications for long memory tests in the frequency domain, which will be misspecified for cross-sectionally aggregated processes with negative degrees of memory. Finally, we assess the forecast performance of high-order $AR$ and $ARFIMA$ models when the long memory series are generated by cross-sectional aggregation. Our results are of interest to practitioners developing forecasts of long memory variables such as inflation, volatility, and climate data, where aggregation may be the source of long memory.
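The generating mechanism referenced above, cross-sectional aggregation of persistent AR(1) micro units in the spirit of Granger (1980), fits in a few lines; the parameter choices below are assumptions for illustration:

    import numpy as np

    def cross_sectional_long_memory(T, N=5000, a=1.0, b=1.4, burn=200, seed=0):
        """Generate a long memory series as the cross-sectional average of N
        AR(1) processes whose squared coefficients are draws from a
        Beta(a, b) distribution, following Granger's classic setup."""
        rng = np.random.default_rng(seed)
        phi = np.sqrt(rng.beta(a, b, size=N))  # persistence of each micro unit
        x = np.zeros(N)
        out = np.empty(T)
        for t in range(T + burn):
            x = phi * x + rng.normal(size=N)   # update all micro units at once
            if t >= burn:
                out[t - burn] = x.mean()       # aggregate across units
        return out

The Beta draw governs how much mass the micro units place near unit persistence, which in turn controls the memory of the aggregate.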
J. Eduardo Vera-Valdés
2018-01
Sophisticated and small versus simple and sizeable: When does it pay off to introduce drifting coefficients in Bayesian VARs?
http://d.repec.org/n?u=RePEc:wiw:wiwwuw:wuwp260&r=ecm
We assess the relationship between model size and complexity in the time-varying parameter VAR framework via thorough predictive exercises for the Euro Area, the United Kingdom, and the United States. It turns out that sophisticated dynamics through drifting coefficients are important in small data sets, while simpler models tend to perform better in sizeable data sets. To combine the best of both worlds, novel shrinkage priors help to mitigate the curse of dimensionality, resulting in competitive forecasts for all scenarios considered. Furthermore, we discuss dynamic model selection to improve upon the best performing individual model for each point in time.
Martin Feldkircher
Florian Huber
Gregor Kastner
Global-local shrinkage priors, density predictions, hierarchical modeling, stochastic volatility, dynamic model selection
2018-01