New Economics Papers on Econometrics |
By: | James G. MacKinnon (Queen's University); Morten Ørregaard Nielsen (Aarhus University); Matthew D. Webb (Carleton University) |
Abstract: | We provide new and computationally attractive methods, based on jackknifing by cluster, to obtain cluster-robust variance matrix estimators (CRVEs) for linear regression models estimated by least squares. These estimators have previously been computationally infeasible except for small samples. We also propose several new variants of the wild cluster bootstrap, which involve the new CRVEs, jackknife-based bootstrap data-generating processes, or both. Extensive simulation experiments suggest that the new methods can provide much more reliable inferences than existing ones in cases where the latter are not trustworthy, such as when the number of clusters is small and/or cluster sizes vary substantially. |
Keywords: | bootstrap, clustered data, grouped data, cluster-robust variance estimator, CRVE, cluster sizes, wild cluster bootstrap |
JEL: | C10 C12 C21 C23 |
Date: | 2022–04 |
URL: | http://d.repec.org/n?u=RePEc:qed:wpaper:1485&r= |
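The MacKinnon–Nielsen–Webb entry above centers on jackknifing by cluster. As a rough illustration only, the following is a minimal sketch of a delete-one-cluster jackknife CRVE (often labelled CV3) for OLS, assuming a plain numpy setting; the function name and the centering at the mean of the delete-one estimates are illustrative choices, and none of the paper's computational shortcuts are used.

```python
# Minimal sketch of a delete-one-cluster jackknife cluster-robust variance matrix (CV3-type).
import numpy as np

def cluster_jackknife_crve(X, y, clusters):
    """OLS coefficients plus a jackknife-by-cluster variance matrix."""
    X, y = np.asarray(X, float), np.asarray(y, float)
    clusters = np.asarray(clusters)
    beta_full = np.linalg.solve(X.T @ X, X.T @ y)
    labels = np.unique(clusters)
    G = len(labels)
    betas = []
    for g in labels:
        keep = clusters != g                       # drop cluster g entirely
        Xg, yg = X[keep], y[keep]
        betas.append(np.linalg.solve(Xg.T @ Xg, Xg.T @ yg))
    betas = np.array(betas)
    center = betas.mean(axis=0)                    # a common variant centers at beta_full instead
    dev = betas - center
    V = (G - 1) / G * dev.T @ dev                  # jackknife variance of the coefficients
    return beta_full, V
```

Computed naively, this requires G separate regressions; making such estimators cheap even with many clusters is precisely the computational contribution the abstract describes.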
By: | Shuowen Chen |
Abstract: | Fixed effect estimators of nonlinear panel data models suffer from the incidental parameter problem. This leads to two undesirable consequences in applied research: (1) point estimates are subject to large biases, and (2) confidence intervals have incorrect coverage. This paper proposes a simulation-based method for bias reduction. The method simulates data using the model with estimated individual effects and finds parameter values by equating the fixed effect estimates obtained from the observed and simulated data. The asymptotic framework provides consistency, bias correction, and asymptotic normality results. An application to female labor force participation, together with simulations, illustrates the finite-sample performance of the method. |
Date: | 2022–03 |
URL: | http://d.repec.org/n?u=RePEc:arx:papers:2203.10683&r= |
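The calibration idea in the entry above (match the fixed-effect estimate from observed data with the average estimate from data simulated at candidate parameter values) can be sketched generically. In the sketch below, `estimate(data)` and `simulate(theta, rng)` are user-supplied placeholders, and the damped fixed-point update is an illustrative device rather than the paper's exact algorithm.

```python
# Minimal sketch of simulation-based bias correction via an estimate-matching fixed point.
import numpy as np

def indirect_bias_correction(estimate, simulate, data, n_sim=50, n_iter=20, step=1.0, seed=0):
    rng = np.random.default_rng(seed)
    theta_hat = np.atleast_1d(estimate(data))          # biased fixed-effect estimate on observed data
    theta = theta_hat.copy()
    for _ in range(n_iter):
        sims = [np.atleast_1d(estimate(simulate(theta, rng))) for _ in range(n_sim)]
        mean_sim = np.mean(sims, axis=0)               # what the estimator delivers when the truth is theta
        theta = theta + step * (theta_hat - mean_sim)  # adjust until simulated estimates match the observed one
    return theta
```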
By: | Krebs, Johannes; Rademacher, Daniel; von Sachs, Rainer |
Abstract: | In this paper we treat statistical inference for an intrinsic wavelet estimator of curves of symmetric positive definite (SPD) matrices in a log-Euclidean manifold. This estimator preserves positive-definiteness and enjoys permutation-equivariance, which is particularly relevant for covariance matrices. Our second-generation wavelet estimator is based on average interpolation and offers the same powerful properties, including fast algorithms, known from nonparametric curve estimation with wavelets in standard Euclidean set-ups. The core of our work is the construction of confidence sets for our high-level wavelet estimator in a non-Euclidean geometry. We derive asymptotic normality of this estimator, including explicit expressions for its asymptotic variance. This opens the door to constructing asymptotic confidence regions, which we compare with our proposed bootstrap scheme for inference. Detailed numerical simulations confirm the appropriateness of our suggested inference schemes. |
Keywords: | Asymptotic normality ; Average interpolation ; Covariance matrices ; Intrinsic polynomials ; log-Euclidean manifold ; SPD matrices ; Matrix-valued curves ; Nonparametric inference ; Second generation wavelets |
Date: | 2022–02–14 |
URL: | http://d.repec.org/n?u=RePEc:aiz:louvad:2022004&r= |
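To fix ideas on the log-Euclidean geometry used above: map each SPD matrix through the matrix logarithm, smooth the resulting Euclidean curve, and map back with the matrix exponential so positive-definiteness is preserved. The sketch below uses an ordinary discrete wavelet transform from PyWavelets rather than the paper's average-interpolation second-generation construction, and the threshold value is an arbitrary illustrative choice.

```python
# Rough sketch: wavelet smoothing of an SPD-matrix curve in the log-Euclidean manifold.
import numpy as np
import pywt
from scipy.linalg import logm, expm

def log_euclidean_wavelet_smooth(spd_curve, wavelet="db4", level=3, thresh=0.1):
    """spd_curve: array of shape (T, d, d) of symmetric positive definite matrices."""
    T, d, _ = spd_curve.shape
    logs = np.array([logm(S).real for S in spd_curve]).reshape(T, d * d)
    smoothed = np.empty_like(logs)
    for j in range(d * d):                                   # smooth each coordinate of the log curve
        coeffs = pywt.wavedec(logs[:, j], wavelet, level=level)
        coeffs = [coeffs[0]] + [pywt.threshold(c, thresh, mode="soft") for c in coeffs[1:]]
        smoothed[:, j] = pywt.waverec(coeffs, wavelet)[:T]
    return np.array([expm(M.reshape(d, d)) for M in smoothed])   # back to the SPD manifold
```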
By: | Pesaran, M. H.; Pick, A.; Timmermann, A. |
Abstract: | We develop novel forecasting methods for panel data with heterogeneous parameters and examine them together with existing approaches. We conduct a systematic comparison of their predictive accuracy in settings with different cross-sectional (N) and time (T) dimensions and varying degrees of parameter heterogeneity. We investigate conditions under which panel forecasting methods can perform better than forecasts based on individual estimates and demonstrate how gains in predictive accuracy depend on the degree of parameter heterogeneity, whether heterogeneity is correlated with the regressors, the goodness of fit of the model, and, particularly, the time dimension of the data set. We propose optimal combination weights for forecasts based on pooled and individual estimates and develop a novel forecast poolability test that can be used as a pretesting tool. Through a set of Monte Carlo simulations and three empirical applications to house prices, CPI inflation, and stock returns, we show that no single forecasting approach dominates uniformly. However, forecast combination and shrinkage methods provide better overall forecasting performance and offer more attractive risk profiles compared to individual, pooled, and random effects methods. |
Keywords: | Forecasting, Panel data, Heterogeneity, Forecast evaluation, Forecast combination, Shrinkage, Pooling |
JEL: | C33 C53 |
Date: | 2022–03–21 |
URL: | http://d.repec.org/n?u=RePEc:cam:camdae:2219&r= |
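As a toy illustration of the combination idea in the Pesaran–Pick–Timmermann entry, the sketch below blends individual-level and pooled forecasts with a single weight tuned on a validation window. The paper derives optimal combination weights analytically; here the weight is simply chosen by grid search to minimise validation mean squared error, so treat this only as a baseline.

```python
# Minimal sketch: combining individual and pooled forecasts with a validation-tuned weight.
import numpy as np

def combine_forecasts(f_individual, f_pooled, y_validation):
    """Return combined forecasts and the weight placed on the individual forecast."""
    f_individual, f_pooled = np.asarray(f_individual, float), np.asarray(f_pooled, float)
    grid = np.linspace(0.0, 1.0, 101)
    mse = [np.mean((w * f_individual + (1 - w) * f_pooled - y_validation) ** 2) for w in grid]
    w_star = grid[int(np.argmin(mse))]
    return w_star * f_individual + (1 - w_star) * f_pooled, w_star
```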
By: | Nicolás Ronderos Pulido |
Abstract: | This article proposes an iterative process to reduce the bias of the instrumental variable estimator in large samples. Bias reduction is achieved by modifying the data of an instrument. Success in reducing bias depends on the magnitude of exogeneity in the instrument: the common exogeneity between the instrument and the problem variable must be greater than the common endogeneity between the same variables. If the instrument is only weakly exogenous, the iterative process will not affect the estimates. Empirically, the iterative process requires searching for parameters that minimize bias, and this paper presents a convergence algorithm for that search. The search algorithm and statistical inference are based on bootstrap techniques. The results are presented in a simulation context. |
Keywords: | instrumental variables, bias correction |
JEL: | C26 C40 |
Date: | 2022–04–27 |
URL: | http://d.repec.org/n?u=RePEc:col:000108:020049&r= |
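For orientation, the sketch below shows a standard bootstrap bias correction of the 2SLS estimator. This is not the instrument-modifying iterative procedure proposed in the paper; it is only a familiar bootstrap-based baseline against which such proposals are usually compared, with all function names chosen for illustration.

```python
# Bootstrap bias-corrected two-stage least squares (a standard baseline, not the paper's method).
import numpy as np

def tsls(y, X, Z):
    """Two-stage least squares for a linear model with instruments Z."""
    Pz = Z @ np.linalg.solve(Z.T @ Z, Z.T)         # projection onto the instrument space
    return np.linalg.solve(X.T @ Pz @ X, X.T @ Pz @ y)

def bootstrap_bias_corrected_tsls(y, X, Z, n_boot=499, seed=0):
    y, X, Z = (np.asarray(a, float) for a in (y, X, Z))
    beta_hat = tsls(y, X, Z)
    n = len(y)
    rng = np.random.default_rng(seed)
    boot = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, n)                # pairs (case) bootstrap resample
        boot.append(tsls(y[idx], X[idx], Z[idx]))
    bias = np.mean(boot, axis=0) - beta_hat
    return beta_hat - bias                         # subtract the estimated bootstrap bias
```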
By: | Karoline Bax; Emanuele Taufer; Sandra Paterlini |
Abstract: | The Markowitz model is still the cornerstone of modern portfolio theory. In particular, when focusing on the minimum-variance portfolio, the covariance matrix, or rather its inverse, the so-called precision matrix, is the only input required. So far, most scholars have worked on improving the estimation of this input, but little attention has been given to the limitations of the inverse covariance matrix in capturing the dependence structure in a non-Gaussian setting. While the precision matrix correctly characterizes the conditional dependence structure of random vectors in a Gaussian setting, the inverse of the covariance matrix may not be a reliable source of information when Gaussianity fails. In this paper, exploiting the local dependence function, we provide different definitions of the generalized precision matrix (GPM), which hold for a general class of distributions. In particular, we focus on the multivariate Student-t distribution and point out that the interaction in random vectors depends not only on the inverse of the covariance matrix but also on additional elements. We test the performance of the proposed GPM in a minimum-variance portfolio set-up using S&P 100 and Fama and French industry data. We show that portfolios relying on the GPM often generate statistically significantly lower out-of-sample variances than state-of-the-art methods. |
Date: | 2022–03 |
URL: | http://d.repec.org/n?u=RePEc:arx:papers:2203.13740&r= |
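The minimum-variance portfolio referred to above depends on the data only through a precision matrix, so any estimate of that matrix, including a GPM estimate, can be plugged into the same closed-form weights. The short sketch below shows those weights; the GPM construction itself is not reproduced here.

```python
# Minimal sketch of minimum-variance weights taking an arbitrary precision matrix as input.
import numpy as np

def min_variance_weights(precision):
    """w = P 1 / (1' P 1): the unconstrained minimum-variance portfolio."""
    ones = np.ones(precision.shape[0])
    w = precision @ ones
    return w / (ones @ w)

# Example with the ordinary inverse sample covariance in place of a GPM estimate:
# returns = np.random.default_rng(0).normal(size=(500, 10))
# weights = min_variance_weights(np.linalg.inv(np.cov(returns, rowvar=False)))
```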
By: | Taoufik Bouezmarni; Mohamed Doukali; Abderrahim Taamouti |
Abstract: | COVID-19 has created an unprecedented global health crisis that has caused millions of infections and deaths worldwide. Many, however, argue that pre-existing social inequalities have led to inequalities in infection and death rates across social classes, with the most-deprived classes worst hit. In this paper, we derive semi/non-parametric estimators of the Health Concentration Curve (HC) that can quantify inequalities in COVID-19 infections and deaths and help identify the social classes that are most at risk of infection and of dying from the virus. We express HC in terms of a copula function that we use to build our estimators of HC. For the semi-parametric estimator, a parametric copula is used to model the dependence between health and socio-economic variables; the copula function is estimated by maximum pseudo-likelihood after replacing the cumulative distribution of the health variable by its empirical analogue. For the non-parametric estimator, we replace the copula function by a Bernstein copula estimator. Furthermore, we use the above estimators of HC to derive copula-based estimators of the health Gini coefficient. We establish the consistency and asymptotic normality of the HC estimators. Using different data-generating processes and sample sizes, a Monte Carlo simulation exercise shows that the semi-parametric estimator outperforms the smoothed non-parametric estimator, and that the latter does better than the empirical estimator in terms of integrated mean squared error. Finally, we run an extensive empirical study to illustrate the importance of the HC estimators for investigating inequality in COVID-19 infections and deaths in the U.S. The empirical results show that inequalities in states' socio-economic variables, such as poverty, race/ethnicity, and economic prosperity, are behind the observed inequalities in U.S. COVID-19 infections and deaths. To cite this document: Bouezmarni T., Doukali M. and Taamouti A. (2022). Copula-based estimation of health concentration curves with an application to COVID-19 (2022s-07, CIRANO). https://doi.org/10.54932/MTKJ3339 |
Keywords: | Health concentration curve, Gini coefficient, inequality, copula, semi/non-parametric estimators, COVID-19 infections and deaths |
JEL: | C13 C14 I14 |
Date: | 2022–04–04 |
URL: | http://d.repec.org/n?u=RePEc:cir:cirwor:2022s-07&r= |
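For readers unfamiliar with health concentration curves, the sketch below computes the plain empirical curve and the corresponding concentration index (a Gini-type coefficient): rank individuals by socio-economic status and track the cumulative share of the health variable. This is only the empirical benchmark that the abstract compares against; the semi/non-parametric copula estimators themselves are far more involved.

```python
# Minimal empirical sketch of a health concentration curve and concentration index.
import numpy as np

def concentration_curve(health, ses):
    """Cumulative health share against the cumulative population share ranked by SES."""
    order = np.argsort(ses)                        # rank individuals by socio-economic status
    h = np.asarray(health, float)[order]
    pop_share = np.arange(1, len(h) + 1) / len(h)
    health_share = np.cumsum(h) / h.sum()
    return pop_share, health_share

def concentration_index(health, ses):
    p, L = concentration_curve(health, ses)
    return 1.0 - 2.0 * np.trapz(L, p)              # twice the area between the diagonal and the curve
```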
By: | Marko Mlikota; Frank Schorfheide |
Abstract: | Modern macroeconometrics often relies on time series models for which it is time-consuming to evaluate the likelihood function. We demonstrate how Bayesian computations for such models can be drastically accelerated by reweighting and mutating posterior draws from an approximating model that allows for fast likelihood evaluations, into posterior draws from the model of interest, using a sequential Monte Carlo (SMC) algorithm. We apply the technique to the estimation of a vector autoregression with stochastic volatility and a nonlinear dynamic stochastic general equilibrium model. The runtime reductions we obtain range from 27% to 88%. |
Date: | 2022–02 |
URL: | http://d.repec.org/n?u=RePEc:arx:papers:2202.07070&r= |
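The core step described in the Mlikota–Schorfheide entry above is a reweight–resample–mutate move that pushes draws from an approximating posterior towards the posterior of interest. The compact sketch below shows a single such step; `log_post_target` and `log_post_approx` are user-supplied log posterior kernels (up to constants), and a full SMC sampler would temper between the two models over several stages rather than in one jump.

```python
# Minimal sketch of one reweight-resample-mutate step between two posteriors.
import numpy as np

def reweight_and_mutate(draws, log_post_target, log_post_approx, scale=0.1, seed=0):
    rng = np.random.default_rng(seed)
    draws = np.asarray(draws, float)
    if draws.ndim == 1:
        draws = draws[:, None]
    logw = np.array([log_post_target(t) - log_post_approx(t) for t in draws])
    w = np.exp(logw - logw.max()); w /= w.sum()                  # normalized importance weights
    idx = rng.choice(len(draws), size=len(draws), p=w)           # multinomial resampling
    particles = draws[idx].copy()
    for i, theta in enumerate(particles):                        # one random-walk MH mutation per particle
        prop = theta + scale * rng.standard_normal(theta.shape)
        if np.log(rng.uniform()) < log_post_target(prop) - log_post_target(theta):
            particles[i] = prop
    return particles
```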
By: | Tomás Caravello; Zacharias Psaradakis; Martín Sola |
Abstract: | Issues that arise in the practical implementation of the Phillips, Wu, and Yu (2011) and Phillips, Shi, and Yu (2015a) recursive procedures for identifying and dating explosive bubbles in time-series data are considered. It is argued that the standard practice of using conventional levels of significance for the critical values involved in the algorithms that locate the origination and termination dates of explosive episodes leads to false discoveries of explosiveness with large probability. In addition, the use of critical values for right-tailed unit-root tests obtained under the assumption of a drift whose magnitude depends on the sample size and becomes negligible in large samples results in over-rejection of the unit-root hypothesis when, as in many financial time series, the drift effect is non-negligible relative to the stochastic trend. The magnitude of these difficulties is quantified via simulations, using artificial data whose stochastic properties closely reflect those of real-world time series such as asset prices and dividends. The findings offer a potential explanation for the relatively large number of apparent explosive episodes that are often reported in applied work. An empirical illustration involving monthly data on U.S. real stock prices and real dividends is also discussed. |
Keywords: | Bubbles, Date-stamping, Explosive behaviour, Recursive, Unit-root test. |
JEL: | C12 C15 C22 |
Date: | 2021–08 |
URL: | http://d.repec.org/n?u=RePEc:udt:wpecon:2021_06&r= |
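To make the recursive procedures above concrete, the sketch below computes a forward-recursive SADF-type statistic: right-tailed ADF statistics on expanding windows, with the supremum taken at the end. The minimum window length and lag order are arbitrary illustrative defaults, and, as the abstract stresses, the critical values against which such a statistic is compared must come from simulation and are exactly where the drift and significance-level issues arise.

```python
# Minimal sketch of a forward-recursive (SADF-type) explosiveness statistic.
import numpy as np
from statsmodels.tsa.stattools import adfuller

def sadf_statistic(y, min_window=30, lags=1):
    y = np.asarray(y, float)
    stats = []
    for end in range(min_window, len(y) + 1):
        adf_stat = adfuller(y[:end], maxlag=lags, regression="c", autolag=None)[0]
        stats.append(adf_stat)                     # right-tailed: large values signal explosiveness
    return max(stats), stats
```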
By: | Christophe Bellégo; David Benatia; Louis Pape |
Abstract: | Log-linear models are prevalent in empirical research. Yet, how to handle zeros in the dependent variable remains an unsettled issue. This article clarifies it and addresses the log of zero by developing a new family of estimators called iterated Ordinary Least Squares (iOLS). This family nests standard approaches such as log-linear and Poisson regressions, offers several computational advantages, and corresponds to the correct way to perform the popular $\log(Y+1)$ transformation. We extend it to the endogenous regressor setting (i2SLS) and overcome other common issues with Poisson models, such as controlling for many fixed-effects. We also develop specification tests to help researchers select between alternative estimators. Finally, our methods are illustrated through numerical simulations and replications of landmark publications. |
Date: | 2022–03 |
URL: | http://d.repec.org/n?u=RePEc:arx:papers:2203.11820&r= |
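One of the standard approaches the iOLS family is said to nest is Poisson pseudo-maximum-likelihood (PPML), which handles zeros in the dependent variable without any ad hoc transformation. The sketch below shows PPML via statsmodels purely as a point of comparison; it is not the iOLS estimator itself.

```python
# Poisson pseudo-maximum-likelihood regression: a standard way to handle zeros in log-linear settings.
import numpy as np
import statsmodels.api as sm

def ppml(y, X):
    """PPML regression of a non-negative outcome (zeros allowed) on X."""
    X = sm.add_constant(np.asarray(X, float))
    return sm.GLM(np.asarray(y, float), X, family=sm.families.Poisson()).fit()

# res = ppml(y, X); res.params are interpretable like log-linear coefficients, without dropping zeros.
```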
By: | Jérôme Trinh (Université de Cergy-Pontoise, THEMA) |
Abstract: | In this paper, we propose a method to disaggregate very short time series by fitting them with higher-frequency related series, using a cointegration regression with multiple partial endogenous structural breaks. We allow any coefficient to change at up to two structural break dates, with up to three related series, and we provide critical values for the cointegration test corrected for the very small sample size. We find that increasing the number of related series drastically improves the power of the test by allowing for increased flexibility in the cointegration model. The simulated power of the test is shown to be very high even in very small samples, such as fifteen observations. This flexibility also mildly improves the accuracy of the disaggregation method when the sample size is as small as thirty-five observations. An application to Chinese national accounts data is provided and allows the study of Chinese business cycle stylized facts. We find that household consumption, public spending, and trade surpluses are the main drivers of the business cycle. |
Keywords: | Time series, macroeconomic forecasting, disaggregation, structural change, business cycles, emerging economies |
JEL: | C32 E17 E37 |
Date: | 2022 |
URL: | http://d.repec.org/n?u=RePEc:ema:worpap:2022-10&r= |
By: | Marion, Rebecca (Université catholique de Louvain, LIDAM/ISBA, Belgium); Lederer, Johannes; Govaerts, Bernadette (Université catholique de Louvain, LIDAM/ISBA, Belgium); von Sachs, Rainer (Université catholique de Louvain, LIDAM/ISBA, Belgium) |
Abstract: | Sparse linear prediction methods suffer from decreased prediction accuracy when the predictor variables have cluster structure (e.g. there are highly correlated groups of variables). To improve prediction accuracy, various methods have been proposed to identify variable clusters from the data and integrate cluster information into a sparse modeling process. But none of these methods achieve satisfactory performance for prediction, variable selection and variable clustering simultaneously. This paper presents Variable Cluster Principal Component Regression (VC-PCR), a prediction method that supervises variable selection and variable clustering in order to solve this problem. Experiments with real and simulated data demonstrate that, compared to competitor methods, VC-PCR achieves better prediction, variable selection and clustering performance when cluster structure is present. |
Keywords: | Variable clustering ; dimensionality reduction ; nonnegative matrix factorization ; latent variables ; sparsity ; prediction |
Date: | 2021–12–17 |
URL: | http://d.repec.org/n?u=RePEc:aiz:louvad:2021040&r= |
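A natural benchmark for the VC-PCR method above is the unsupervised cluster-then-PCR pipeline: group correlated predictors, summarise each group by its first principal component, and regress on those components. The sketch below implements that baseline with standard scipy/scikit-learn tools; VC-PCR instead supervises both the clustering and the variable selection, so this is only a reference point, and the number of clusters here is an arbitrary choice.

```python
# Minimal unsupervised cluster-then-principal-component-regression baseline.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression

def cluster_pcr(X, y, n_clusters=5):
    X = np.asarray(X, float)
    dist = 1.0 - np.abs(np.corrcoef(X, rowvar=False))      # dissimilarity between variables
    Z = linkage(squareform(dist, checks=False), method="average")
    labels = fcluster(Z, t=n_clusters, criterion="maxclust")
    components = np.column_stack([
        PCA(n_components=1).fit_transform(X[:, labels == k]).ravel()
        for k in np.unique(labels)
    ])                                                      # one summary component per variable cluster
    return LinearRegression().fit(components, y), labels
```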
By: | Emmanuel Flachaire; Gilles Hacheme; Sullivan Hué; Sébastien Laurent |
Abstract: | Despite their high predictive performance, random forest and gradient boosting are often considered black-box or uninterpretable models, which has raised concerns among practitioners and regulators. As an alternative, we propose in this paper to use partially linear models that are inherently interpretable. Specifically, this article introduces GAM-lasso (GAMLA) and GAM-autometrics (GAMA), denoted GAM(L)A in short. GAM(L)A combines parametric and non-parametric functions to accurately capture the linearities and non-linearities prevailing between the dependent and explanatory variables, together with a variable selection procedure to control for overfitting. Estimation relies on a two-step procedure building upon the double residual method. We illustrate the predictive performance and interpretability of GAM(L)A on a regression and a classification problem. The results show that GAM(L)A outperforms parametric models augmented by quadratic, cubic and interaction effects. Moreover, the results also suggest that the performance of GAM(L)A is not significantly different from that of random forest and gradient boosting. |
Date: | 2022–03 |
URL: | http://d.repec.org/n?u=RePEc:arx:papers:2203.11691&r= |
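The double residual method mentioned above (in the Robinson tradition) residualises the outcome and each linear-part regressor on the nonparametric covariates, then runs OLS on the residuals. The sketch below illustrates only that second step, with a k-nearest-neighbour smoother standing in for the GAM/lasso machinery of GAM(L)A; the smoother, its tuning parameter, and the function names are illustrative assumptions.

```python
# Minimal sketch of the double residual step of a partially linear model.
import numpy as np
from sklearn.neighbors import KNeighborsRegressor
from sklearn.linear_model import LinearRegression

def double_residual(y, X_linear, Z_nonparam, k=15):
    """y: outcome; X_linear: (n, p) linear-part regressors; Z_nonparam: (n, q) smooth-part covariates."""
    smoother = lambda t: KNeighborsRegressor(n_neighbors=k).fit(Z_nonparam, t).predict(Z_nonparam)
    ey = y - smoother(y)                                            # residualise the outcome on Z
    eX = np.column_stack([X_linear[:, j] - smoother(X_linear[:, j])
                          for j in range(X_linear.shape[1])])       # residualise each linear regressor on Z
    return LinearRegression(fit_intercept=False).fit(eX, ey)        # OLS on residuals gives the linear part
```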
By: | Caio Almeida (Princeton University); Gustavo Freire (Erasmus School of Economics); Rafael Azevedo (Getulio Vargas Foundation (FGV)); Kym Ardison (Getulio Vargas Foundation (FGV)) |
Abstract: | We propose a family of nonparametric estimators for an option price that require only the use of underlying return data, but can also easily incorporate available option prices. Each estimator comes from a risk-neutral measure minimizing generalized entropy according to a different Cressie-Read discrepancy function. In a large-scale empirical application with S&P 500 options, we investigate their out-of-sample pricing accuracy using different amounts of option data in the estimation. Relying only on underlying returns, our estimators significantly outperform the Black-Scholes and GARCH option pricing models. Using up to three options, our method delivers performance comparable to (and often better than) the ad-hoc Black-Scholes that exploits information from the whole cross-section of options. Overall, we provide a powerful option pricing technique suitable for limited option data availability. |
Keywords: | Risk-Neutral Measure, Option Pricing, Nonparametric Estimation, Generalized Entropy, Cressie-Read Discrepancies |
JEL: | C14 C58 G13 |
Date: | 2021–07 |
URL: | http://d.repec.org/n?u=RePEc:pri:econom:2021-92&r= |
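As a concrete special case of the entropy-minimising risk-neutral measures described above, the Kullback-Leibler member of the Cressie-Read family corresponds to exponential tilting of the empirical return distribution (as in canonical valuation): tilt the weights until the discounted underlying is a martingale, then price the option under the tilted weights. The sketch below shows only this one member using underlying returns; the general Cressie-Read estimators and the use of observed option prices are not reproduced.

```python
# Minimal sketch of entropy-based (exponential-tilting) option pricing from underlying returns.
import numpy as np
from scipy.optimize import brentq

def entropy_option_price(gross_returns, payoff, rf_gross):
    """gross_returns: historical gross returns over the option horizon;
    payoff: function of the terminal gross return; rf_gross: gross risk-free rate."""
    R = np.asarray(gross_returns, float)

    def martingale_gap(lam):                       # tilted mean return minus the risk-free rate
        w = np.exp(lam * R); w /= w.sum()
        return w @ R - rf_gross

    lam = brentq(martingale_gap, -50.0, 50.0)      # assumes a sign change on this bracket
    w = np.exp(lam * R); w /= w.sum()              # risk-neutral weights on historical returns
    return (w @ payoff(R)) / rf_gross              # discounted risk-neutral expected payoff

# Example: an at-the-money call on a unit-price underlying
# price = entropy_option_price(R, lambda r: np.maximum(r - 1.0, 0.0), 1.0005)
```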
By: | Elias Tsakas |
Abstract: | It is well known that individual beliefs cannot be identified using traditional choice data, unless we impose the practically restrictive and conceptually awkward assumption that utilities are state-independent. In this paper, we propose a novel methodology that solves this long-standing identification problem in a simple way. Our method relies on the concept of influential actions. These are actions that are controlled by the analyst and lead the agent to change her beliefs. Notably, the analyst does not need to have any idea on how the agent's beliefs will change in response to an influential action. Then, instead of eliciting directly the agent's beliefs about the state space, we elicit her subjective probabilities about the influential action having been undertaken conditional on each state realization. The latter can be easily done with existing elicitation tools. It turns out that this is enough to uniquely identify her beliefs about the state space irrespective of her utility function, thus solving the identification problem. We discuss that this method can be used in most applications of interest. As an example, we show how it can provide a new useful tool for identifying motivated beliefs on an individual level. |
Date: | 2022–03 |
URL: | http://d.repec.org/n?u=RePEc:arx:papers:2203.10505&r= |
By: | Dalia Ghanem; Pedro H. C. Sant'Anna; Kaspar Wüthrich |
Abstract: | One of the advantages of difference-in-differences (DiD) methods is that they do not explicitly restrict how units select into treatment. However, when justifying DiD, researchers often argue that the treatment is "quasi-randomly" assigned. We investigate what selection mechanisms are compatible with the parallel trends assumptions underlying DiD. We derive necessary conditions for parallel trends that clarify whether and how selection can depend on time-invariant and time-varying unobservables. Motivated by these necessary conditions, we suggest a menu of interpretable sufficient conditions for parallel trends, thereby providing the formal underpinnings for justifying DiD based on contextual information about selection into treatment. We provide results for both separable and nonseparable outcome models and show that this distinction has implications for the use of covariates in DiD analyses. |
Date: | 2022–03 |
URL: | http://d.repec.org/n?u=RePEc:arx:papers:2203.09001&r= |
By: | Cizek, Pavel (Tilburg University, Center For Economic Research); Sadikoglu, Serhan (Tilburg University, Center For Economic Research) |
Keywords: | correlated random effects; local polynomial smoothing; multiple-index model; nonlinear panel data; nonseparable models; outer product of gradients |
Date: | 2022 |
URL: | http://d.repec.org/n?u=RePEc:tiu:tiucen:7899deb9-0eda-47e6-a3b8-20132fb0caea&r= |