on Econometrics
By: | Fan Yang; Yi Qian; Hui Xie |
Abstract: | A prominent challenge when drawing causal inference from observational data is the ubiquitous presence of endogenous regressors. The classical econometric method for handling regressor endogeneity requires instrumental variables that must satisfy the stringent condition of exclusion restriction, making it infeasible in many settings. We propose new instrument-free methods using copulas to address the endogeneity problem. The existing copula correction method focuses only on the endogenous regressors and may yield biased estimates when exogenous and endogenous regressors are correlated. Furthermore, (nearly) normally distributed endogenous regressors cause model non-identification or poor finite-sample performance. Our proposed two-stage copula endogeneity correction (2sCOPE) method simultaneously overcomes these two key limitations and yields consistent causal-effect estimates with correlated endogenous and exogenous regressors as well as normally distributed endogenous regressors. 2sCOPE employs generated regressors derived from existing regressors to control for endogeneity, and is straightforward to use and broadly applicable. Moreover, we prove that exploiting correlated exogenous regressors can address the problem of insufficient regressor non-normality, relax identification requirements and improve estimation precision (by as much as ∼50% in empirical evaluation). Overall, 2sCOPE can greatly increase the ease of use, and broaden the applicability, of instrument-free methods for dealing with regressor endogeneity. We demonstrate the performance of 2sCOPE via simulation studies and an empirical application. |
JEL: | C01 C1 C13 C18 C4 |
Date: | 2022–01 |
URL: | http://d.repec.org/n?u=RePEc:nbr:nberwo:29708&r= |
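The entry above builds on the idea of a copula-generated control regressor. The sketch below illustrates the single-stage copula correction that 2sCOPE extends, not the 2sCOPE estimator itself: the endogenous regressor's empirical CDF is mapped to a normal score and added as an extra regressor in OLS. All variable names and the simulated data are illustrative assumptions.

# Minimal sketch of the copula control-function idea (Park-Gupta style single-stage
# correction); 2sCOPE's additional generated regressors built from the exogenous
# regressors are not reproduced here.
import numpy as np
from scipy.stats import norm, rankdata

def copula_correction_ols(y, p_endog, X_exog):
    """OLS of y on (p_endog, X_exog) plus a copula-generated regressor."""
    n = len(y)
    u = rankdata(p_endog) / (n + 1.0)          # empirical CDF, kept off the 0/1 boundary
    p_star = norm.ppf(u)                        # generated regressor (normal score)
    Z = np.column_stack([np.ones(n), p_endog, X_exog, p_star])
    beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
    return beta                                 # last element is the correction term

# Example usage with illustrative simulated data
rng = np.random.default_rng(0)
n = 2000
e = rng.normal(size=n)
p = np.exp(rng.normal(size=n)) + 0.8 * e        # non-normal, endogenous regressor
x = rng.normal(size=n)                          # exogenous regressor
y = 1.0 + 2.0 * p + 0.5 * x + e
print(copula_correction_ols(y, p, x[:, None]))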
By: | Nezakati, Ensiyeh (Université catholique de Louvain, LIDAM/ISBA, Belgium); Pircalabelu, Eugen (Université catholique de Louvain, LIDAM/ISBA, Belgium) |
Abstract: | This paper studies the estimation of Gaussian graphical models in an unbalanced distributed framework. Unbalanced distribution is an effective approach when the available machines have different computing power, or when the dataset comes from different sources of different sizes and cannot be aggregated on a single computer. In this paper, we propose a new aggregated estimator of the precision matrix and justify the approach with both theoretical and practical arguments. The limit distribution and consistency of this estimator are investigated. Furthermore, a procedure for performing statistical inference is proposed. On the practical side, the method is illustrated with a simulation study and real-data examples. We show that the performance of the distributed estimator is similar to that of the non-distributed estimator using the full data. |
Keywords: | Gaussian graphical models ; Precision matrix ; Lasso penalization ; Unbalanced distributed setting ; De-biased estimator ; Confidence distribution |
Date: | 2021–01–01 |
URL: | http://d.repec.org/n?u=RePEc:aiz:louvad:2021031&r= |
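As an illustration of the unbalanced distributed setup described above, the sketch below fits a graphical lasso on each shard and combines the local precision-matrix estimates by a sample-size-weighted average. This is a naive aggregation rather than the paper's de-biased aggregated estimator; the shard sizes and penalty level are arbitrary choices.

# Unbalanced shards, local graphical lasso fits, weighted aggregation (illustrative only).
import numpy as np
from sklearn.covariance import GraphicalLasso

rng = np.random.default_rng(1)
p = 10
true_prec = np.eye(p) + 0.3 * np.diag(np.ones(p - 1), 1) + 0.3 * np.diag(np.ones(p - 1), -1)
cov = np.linalg.inv(true_prec)
shard_sizes = [500, 200, 80]                     # unbalanced "machines"

local_precisions, weights = [], []
for n_k in shard_sizes:
    X_k = rng.multivariate_normal(np.zeros(p), cov, size=n_k)
    model = GraphicalLasso(alpha=0.05).fit(X_k)  # local sparse precision estimate
    local_precisions.append(model.precision_)
    weights.append(n_k)

weights = np.array(weights) / sum(shard_sizes)
aggregated = sum(w * P for w, P in zip(weights, local_precisions))
print(np.round(aggregated, 2))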
By: | Masahiro Kato; Masaaki Imaizumi |
Abstract: | We study benign overfitting theory in the prediction of the conditional average treatment effect (CATE) with linear regression models. With the development of machine learning for causal inference, a wide range of large-scale models for causality are gaining attention. One concern is that such large-scale models are prone to overfitting to observations with sample selection, and hence may be unsuitable for causal prediction. To address this concern, we investigate the validity of causal inference methods for overparameterized models by applying the recent theory of benign overfitting (Bartlett et al., 2020). Specifically, we consider samples whose distribution switches depending on an assignment rule, and study the prediction of CATE with linear models whose dimension diverges to infinity. We focus on two methods: the T-learner, which is based on the difference between estimators constructed separately for each treatment group, and the inverse probability weight (IPW)-learner, which solves another regression problem approximated via the propensity score. In both methods, the estimator consists of interpolators that fit the samples perfectly. We show that the T-learner fails to achieve consistency except under random assignment, while the IPW-learner drives the risk to zero if the propensity score is known. This difference stems from the fact that the T-learner cannot preserve the eigenspaces of the covariances, which is necessary for benign overfitting in the overparameterized setting. Our result provides new insights into the use of causal inference methods in the overparameterized setting, in particular for doubly robust estimators. |
Date: | 2022–02 |
URL: | http://d.repec.org/n?u=RePEc:arx:papers:2202.05245&r= |
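A small simulation in the spirit of the setting described above: both learners are implemented with minimum-norm (ridgeless) interpolators in a linear model whose dimension exceeds the sample size. The propensity score is taken as known for the IPW-learner; the data-generating process is illustrative only.

# T-learner vs. IPW-learner with minimum-norm interpolators (d > n).
import numpy as np

rng = np.random.default_rng(2)
n, d = 100, 400                                  # overparameterized regime
X = rng.normal(size=(n, d))
tau = X @ (np.concatenate([np.ones(5), np.zeros(d - 5)]) / 5)   # true CATE
e = 0.5                                          # known propensity (randomized here)
T = rng.binomial(1, e, size=n)
y = X @ rng.normal(scale=0.1, size=d) + T * tau + rng.normal(scale=0.1, size=n)

# T-learner: separate minimum-norm fits for each treatment arm
b1 = np.linalg.pinv(X[T == 1]) @ y[T == 1]
b0 = np.linalg.pinv(X[T == 0]) @ y[T == 0]
cate_t = X @ (b1 - b0)

# IPW-learner: minimum-norm fit of the IPW-transformed outcome on X
z = (T / e - (1 - T) / (1 - e)) * y
b_ipw = np.linalg.pinv(X) @ z
cate_ipw = X @ b_ipw

print("T-learner MSE:  ", np.mean((cate_t - tau) ** 2))
print("IPW-learner MSE:", np.mean((cate_ipw - tau) ** 2))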
By: | Karsten Schweikert |
Abstract: | In this paper, we propose a two-step procedure based on the group LASSO estimator in combination with a backward elimination algorithm to efficiently detect multiple structural breaks in linear regressions with multivariate responses. Applying the two-step estimator, we jointly detect the number and location of change points, and provide consistent estimates of the coefficients. Our framework is flexible enough to allow for a mix of integrated and stationary regressors, as well as deterministic terms. Using simulation experiments, we show that the proposed two-step estimator performs competitively against the likelihood-based approach (Qu and Perron, 2007; Li and Perron, 2017; Oka and Perron, 2018) when trying to detect common breaks in finite samples. However, the two-step estimator is computationally much more efficient. An economic application to the identification of structural breaks in the term structure of interest rates illustrates this methodology. |
Date: | 2022–01 |
URL: | http://d.repec.org/n?u=RePEc:arx:papers:2201.05430&r= |
By: | Christine Blandhol; John Bonney; Magne Mogstad; Alexander Torgovitsky |
Abstract: | Linear instrumental variable estimators, such as two-stage least squares (TSLS), are commonly interpreted as estimating positively weighted averages of causal effects, referred to as local average treatment effects (LATEs). We examine whether the LATE interpretation actually applies to the types of TSLS specifications used in practice. We show that if the specification includes covariates, as most empirical work does, then the LATE interpretation does not apply in general. Instead, the TSLS estimator will in general reflect treatment effects for both compliers and always/never-takers, and some of the treatment effects for the always/never-takers will necessarily be negatively weighted. We show that the only specifications with a LATE interpretation are "saturated" specifications that control for covariates nonparametrically, implying that such specifications are both sufficient and necessary for TSLS to have a LATE interpretation, at least without additional parametric assumptions. This result is concerning because, as we document, empirical researchers almost never control for covariates nonparametrically, and rarely discuss or justify parametric specifications of covariates. We develop a decomposition that quantifies the extent to which the usual LATE interpretation fails. We apply the decomposition to four empirical analyses and find strong evidence that the LATE interpretation of TSLS is far from accurate for the types of specifications actually used in practice. |
JEL: | C26 |
Date: | 2022–01 |
URL: | http://d.repec.org/n?u=RePEc:nbr:nberwo:29709&r= |
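For concreteness, the sketch below implements the kind of TSLS specification with additively (linearly) included covariates that the paper examines, using plain least-squares algebra. The simulated design and variable names are illustrative assumptions, not the paper's applications.

# Bare-bones TSLS with a linearly included covariate (non-saturated specification).
import numpy as np

def tsls(y, D, Z, X):
    """TSLS of y on endogenous D, instrument Z, covariates X (with intercept)."""
    n = len(y)
    W = np.column_stack([np.ones(n), X])
    # First stage: project D on (Z, covariates)
    F = np.column_stack([Z, W])
    D_hat = F @ np.linalg.lstsq(F, D, rcond=None)[0]
    # Second stage: regress y on fitted D and covariates
    S = np.column_stack([D_hat, W])
    coefs = np.linalg.lstsq(S, y, rcond=None)[0]
    return coefs[0]                              # coefficient on the treatment

rng = np.random.default_rng(3)
n = 5000
X = rng.normal(size=n)                           # covariate entered linearly, not saturated
Z = rng.binomial(1, 0.5, size=n)                 # instrument
U = rng.normal(size=n)
D = (0.5 * Z + 0.3 * X + U > 0).astype(float)    # endogenous treatment
y = 1.0 * D + 0.5 * X + U + rng.normal(size=n)
print("TSLS estimate:", tsls(y, D, Z, X[:, None]))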
By: | Pircalabelu, Eugen (Université catholique de Louvain, LIDAM/ISBA, Belgium); Claeskens, Gerda (KU Leuven) |
Abstract: | We develop a high-dimensional graphical modeling approach for functional data where the number of functions exceeds the available sample size. This is accomplished by proposing a sparse estimator for a concentration matrix when identifying linear manifolds. As such, the procedure extends the ideas of the manifold representation for functional data to high-dimensional settings where the number of functions is larger than the sample size. By working in a penalized framework it enriches the functional data framework by estimating sparse undirected graphs that show how functional nodes connect to other functional nodes. The procedure allows multiple coarseness scales to be present in the data and proposes a simultaneous estimation of several related graphs. |
Keywords: | Multivariate functional data; Multiscale data; Graphical lasso; Joint estimation; Group penalty |
Date: | 2021–04–16 |
URL: | http://d.repec.org/n?u=RePEc:aiz:louvad:2021032&r= |
By: | Aman Ullah (Department of Economics, University of California Riverside); Tao Wang (UC Riverside); Weixin Yao (UC Riverside) |
Abstract: | In this paper, under stationary alpha-mixing dependent samples, we develop a novel nonlinear modal regression for time series and establish the consistency and asymptotic properties of the proposed nonlinear modal estimator with a shrinking bandwidth under certain regularity conditions. The asymptotic distribution is shown to be identical to the one derived from independent observations, whereas the convergence rate is slower than in nonlinear mean regression. We numerically estimate the proposed nonlinear modal regression model using a modified modal-expectation-maximization (MEM) algorithm in conjunction with a Taylor expansion. Monte Carlo simulations demonstrate the good finite-sample (prediction) performance of the newly proposed model. We also construct a specified nonlinear modal regression to match the available daily new cases and new deaths data of the COVID-19 outbreak at the state/region level in the United States, and provide forward predictions up to 130 days ahead (from August 24, 2020 to December 31, 2020). In comparison to traditional nonlinear regressions, the suggested model fits the COVID-19 data better and produces more precise predictions. The prediction results indicate that there are systematic differences in spreading distributions among states/regions; most western and eastern states bear heavier COVID-19 burdens than the Midwest. We hope that the proposed nonlinear modal regression can help policymakers implement fast actions to curb the spread of the infection, avoid overburdening the health system, and better understand the development of COVID-19. |
Keywords: | COVID-19, Dependent data, MEM algorithm, Modal regression, Nonlinear, Prediction |
JEL: | C01 C14 C22 C53 |
Date: | 2022–02 |
URL: | http://d.repec.org/n?u=RePEc:ucr:wpaper:202207&r= |
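The sketch below illustrates the modal-expectation-maximization (MEM) iteration for a linear modal regression with a Gaussian kernel: the E-step computes kernel weights from current residuals and the M-step solves a weighted least-squares problem. The paper's estimator is for a nonlinear specification combined with a Taylor expansion, which is not reproduced; the bandwidth and simulated data here are illustrative.

# MEM iteration for linear modal regression (illustrative sketch).
import numpy as np
from scipy.stats import norm

def mem_modal_regression(X, y, h, n_iter=200):
    """Estimate beta maximizing the kernel-smoothed modal objective."""
    n = len(y)
    Z = np.column_stack([np.ones(n), X])
    beta, _, _, _ = np.linalg.lstsq(Z, y, rcond=None)    # start from OLS
    for _ in range(n_iter):
        resid = y - Z @ beta
        w = norm.pdf(resid / h)                           # E-step: kernel weights
        w = w / w.sum()
        WZ = Z * w[:, None]                               # M-step: weighted least squares
        beta = np.linalg.solve(Z.T @ WZ, WZ.T @ y)
    return beta

# Skewed errors: the conditional mode differs from the conditional mean
rng = np.random.default_rng(4)
n = 2000
x = rng.uniform(-2, 2, size=n)
y = 1.0 + 2.0 * x + rng.exponential(scale=1.0, size=n)    # error mode is 0, error mean is 1
print("modal fit:", mem_modal_regression(x[:, None], y, h=0.5))
print("OLS fit:  ", np.linalg.lstsq(np.column_stack([np.ones(n), x]), y, rcond=None)[0])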
By: | Lambert, Philippe (Université catholique de Louvain, LIDAM/ISBA, Belgium) |
Abstract: | Data on a continuous variable are often summarized by means of histograms or displayed in tabular format: the range of data is partitioned into consecutive interval classes and the number of observations falling within each class is provided to the analyst. Computations can then be carried out in a nonparametric way by assuming a uniform distribution of the variable within each partitioning class, by concentrating all the observed values at the center, or by spreading them to the extremities. Smoothing methods can also be applied to estimate the underlying density, or a parametric model can be fitted to these grouped data. For insurance loss data, some additional information is often provided about the observed values contained in each class, typically class-specific sample moments such as the mean, the variance or even the skewness and the kurtosis. The question is then how to include this additional information in the estimation procedure. The present paper proposes a method for performing density and quantile estimation based on such augmented information, with an illustration on car insurance data. |
Keywords: | Nonparametric density estimation ; grouped data ; sample moments ; risk measures |
Date: | 2021–07–09 |
URL: | http://d.repec.org/n?u=RePEc:aiz:louvad:2021039&r= |
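The sketch below implements the common baseline mentioned in the abstract above: quantile estimation from grouped data under a uniform-within-class assumption. It does not use the class-specific moments that the paper's method exploits; the class boundaries and counts are made up for illustration.

# Quantiles from grouped data, uniform within each class (baseline, not the paper's method).
import numpy as np

def grouped_quantile(breaks, counts, prob):
    """Quantile from class boundaries and class counts, uniform within class."""
    counts = np.asarray(counts, dtype=float)
    cum = np.cumsum(counts) / counts.sum()
    k = np.searchsorted(cum, prob)                   # class containing the quantile
    lower = cum[k - 1] if k > 0 else 0.0
    frac = (prob - lower) / (counts[k] / counts.sum())
    return breaks[k] + frac * (breaks[k + 1] - breaks[k])

# Example: loss data grouped into four classes
breaks = [0, 1000, 2000, 5000, 10000]
counts = [120, 60, 15, 5]
print("median:", grouped_quantile(breaks, counts, 0.5))
print("95th percentile:", grouped_quantile(breaks, counts, 0.95))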
By: | G. Cubadda; S. Grassi; B. Guardabascio |
Abstract: | Many economic variables feature changes in their conditional mean and volatility, and time-varying vector autoregressive models are often used to handle such complexity in the data. Unfortunately, as the number of series grows, these models present increasing estimation and interpretation problems. This paper addresses this issue by proposing a new Multivariate Autoregressive Index model that features time-varying means and volatility. Technically, we develop a new estimation methodology that mixes switching algorithms with the forgetting-factor strategy of Koop and Korobilis (2012). This substantially reduces the computational burden and allows the number of common components and other features of the data to be selected or weighted in real time using Dynamic Model Selection or Dynamic Model Averaging without further computational cost. Using US macroeconomic data, we provide a structural analysis and a forecasting exercise that demonstrate the feasibility and usefulness of the new model. |
Keywords: | Large datasets, Multivariate Autoregressive Index models, Stochastic volatility, Bayesian VARs |
Date: | 2022–01 |
URL: | http://d.repec.org/n?u=RePEc:arx:papers:2201.07069&r= |
By: | James G. MacKinnon (Queen's University) |
Abstract: | As I document using evidence from a journal data repository that I manage, the datasets used in empirical work are getting larger. When we use very large datasets, it can be dangerous to rely on standard methods for statistical inference. In addition, we need to worry about computational issues. We must be careful in our choice of statistical methods and the algorithms used to implement them. |
Keywords: | datasets, clustered data, statistical computation, statistical inference, bootstrap |
JEL: | C10 C12 C13 C55 |
Date: | 2022–02 |
URL: | http://d.repec.org/n?u=RePEc:qed:wpaper:1482&r= |
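As a concrete reference point for the inference issues discussed above, the sketch below computes the textbook cluster-robust (CV1) covariance matrix for OLS with plain numpy. It is a standard construction rather than code from the paper, and the simulated clustered design is illustrative.

# Cluster-robust (CV1) standard errors for OLS.
import numpy as np

def ols_cluster_se(y, X, cluster_ids):
    n, k = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    resid = y - X @ beta
    groups = np.unique(cluster_ids)
    G = len(groups)
    meat = np.zeros((k, k))
    for g in groups:
        idx = cluster_ids == g
        s_g = X[idx].T @ resid[idx]                  # cluster score
        meat += np.outer(s_g, s_g)
    adj = (G / (G - 1)) * ((n - 1) / (n - k))        # CV1 small-sample correction
    V = adj * XtX_inv @ meat @ XtX_inv
    return beta, np.sqrt(np.diag(V))

rng = np.random.default_rng(5)
G, n_g = 20, 500
cluster_ids = np.repeat(np.arange(G), n_g)
u = np.repeat(rng.normal(size=G), n_g) + rng.normal(size=G * n_g)   # within-cluster dependence
X = np.column_stack([np.ones(G * n_g), rng.normal(size=G * n_g)])
y = X @ np.array([1.0, 0.5]) + u
print(ols_cluster_se(y, X, cluster_ids))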
By: | Clément de Chaisemartin; Xavier D'Haultfoeuille; Félix Pasquier; Gonzalo Vazquez-Bare
Abstract: | We propose new difference-in-differences (DID) estimators for treatments that are continuously distributed at every time period, as is often the case with trade tariffs or temperatures. We start by assuming that the data have only two time periods. We also assume that from period one to two, the treatment of some units, the movers, changes, while the treatment of other units, the stayers, does not change. Our estimators then compare the outcome evolution of movers and stayers with the same value of the treatment at period one. They rely only on parallel trends assumptions, unlike commonly used two-way fixed effects regressions that also rely on homogeneous treatment effect assumptions. With a continuous treatment, comparisons of movers and stayers with the same period-one treatment can be achieved either by nonparametric regression or by propensity-score reweighting. We extend our results to applications with more than two time periods, no stayers, and treatments that may have dynamic effects. |
Date: | 2022–01 |
URL: | http://d.repec.org/n?u=RePEc:arx:papers:2201.06898&r= |
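The sketch below is a deliberately simplified rendering of the comparison described above: each mover's outcome change is compared with a kernel-weighted average change of stayers with (approximately) the same period-one treatment, and the difference is scaled by the size of the treatment change. The paper's estimators, based on nonparametric regression or propensity-score reweighting, are more refined; the bandwidth and data-generating process here are illustrative assumptions.

# Simplified mover/stayer comparison for a continuous treatment (illustrative only).
import numpy as np

def did_continuous(d1, d2, dy, bandwidth=0.2):
    """d1, d2: treatment in periods 1 and 2; dy: outcome change y2 - y1."""
    movers = d2 != d1
    stayers = ~movers
    effects = []
    for i in np.where(movers)[0]:
        w = np.exp(-0.5 * ((d1[stayers] - d1[i]) / bandwidth) ** 2)
        counterfactual = np.sum(w * dy[stayers]) / np.sum(w)   # stayers with similar period-1 dose
        scale = d2[i] - d1[i]                                   # size of the treatment change
        effects.append((dy[i] - counterfactual) / scale)
    return np.mean(effects)                                     # average per-unit effect among movers

rng = np.random.default_rng(6)
n = 4000
d1 = rng.uniform(0, 1, size=n)
move = rng.binomial(1, 0.5, size=n).astype(bool)
d2 = np.where(move, d1 + rng.uniform(0.1, 0.5, size=n), d1)
trend = 1.0 + 0.5 * d1                                          # parallel trends given d1
dy = trend + 2.0 * (d2 - d1) + rng.normal(scale=0.2, size=n)
print("estimated effect per unit of treatment:", did_continuous(d1, d2, dy))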
By: | Chen, Zezhun Chen; Dassios, Angelos; Tzougas, George |
Abstract: | Motivated by the extended Poisson INAR(1), which allows innovations to be serially dependent, we develop a new family of binomial-mixed Poisson INAR(1) (BMP INAR(1)) processes by adding a mixed Poisson component to the innovations of the classical Poisson INAR(1) process. Due to the flexibility of the mixed Poisson component, the model includes a large class of INAR(1) processes with different transition probabilities. Moreover, it can capture some overdispersion features coming from the data while keeping the innovations serially dependent. We discuss its statistical properties, stationarity conditions and transition probabilities for different mixing densities (Exponential, Lindley). Then, we derive the maximum likelihood estimation method and its asymptotic properties for this model. Finally, we demonstrate our approach using a real data example of iceberg count data from a financial system. |
Keywords: | Count data time series; Binomial-mixed Poisson INAR(1) models; mixed Poisson distribution; overdispersion; maximum likelihood estimation
JEL: | C1 |
Date: | 2021–11–01 |
URL: | http://d.repec.org/n?u=RePEc:ehl:lserod:112222&r= |
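For background, the sketch below simulates the classical Poisson INAR(1) process that the BMP INAR(1) family extends, using binomial thinning plus Poisson innovations. The binomial-mixed Poisson innovation structure proposed in the paper is not implemented; parameter values are illustrative.

# Classical Poisson INAR(1): X_t = alpha o X_{t-1} + eps_t (o = binomial thinning).
import numpy as np

def simulate_poisson_inar1(n, alpha, lam, rng):
    x = np.zeros(n, dtype=int)
    x[0] = rng.poisson(lam / (1 - alpha))              # start near the stationary mean
    for t in range(1, n):
        survivors = rng.binomial(x[t - 1], alpha)      # binomial thinning of the last count
        x[t] = survivors + rng.poisson(lam)            # add a Poisson innovation
    return x

rng = np.random.default_rng(7)
x = simulate_poisson_inar1(2000, alpha=0.6, lam=2.0, rng=rng)
print("sample mean:", x.mean(), " theoretical:", 2.0 / (1 - 0.6))
print("sample lag-1 autocorr:", np.corrcoef(x[:-1], x[1:])[0, 1], " theoretical:", 0.6)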
By: | Ding, Y. |
Abstract: | We propose a new class of conditional heteroskedasticity in volatility (CH-V) models, which allow for time-varying volatility of volatility in the volatility of asset returns. This class nests a variety of GARCH-type models and the SHARV model of Ding (2021). CH-V models can be seen as a special case of the stochastic volatility of volatility model. We then introduce two examples of CH-V in which we specify a GJR-GARCH and an E-GARCH process, respectively, for the volatility of volatility. We also show a novel way of introducing the leverage effect of negative returns on the volatility through the volatility-of-volatility process. An empirical study confirms that CH-V models deliver better goodness-of-fit and out-of-sample volatility and Value-at-Risk forecasts than common GARCH-type models. |
Keywords: | forecasting, GARCH, SHARV, volatility, volatility of volatility |
JEL: | C22 C32 C53 C58 G17 |
Date: | 2021–11–09 |
URL: | http://d.repec.org/n?u=RePEc:cam:camdae:2179&r= |
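As a point of reference for the models nested by the CH-V class, the sketch below simulates a plain GARCH(1,1) recursion. In a CH-V model the volatility of volatility would itself be time-varying (e.g., governed by a GJR-GARCH or E-GARCH process), which is not specified here; parameter values are illustrative.

# GARCH(1,1) simulation (a nested baseline, not the CH-V model itself).
import numpy as np

def simulate_garch11(n, omega, alpha, beta, rng):
    r = np.zeros(n)
    sigma2 = np.zeros(n)
    sigma2[0] = omega / (1 - alpha - beta)             # unconditional variance
    for t in range(1, n):
        sigma2[t] = omega + alpha * r[t - 1] ** 2 + beta * sigma2[t - 1]
        r[t] = np.sqrt(sigma2[t]) * rng.normal()
    return r, sigma2

rng = np.random.default_rng(8)
r, sigma2 = simulate_garch11(5000, omega=0.05, alpha=0.08, beta=0.9, rng=rng)
print("unconditional variance (sample vs implied):", r.var(), 0.05 / (1 - 0.98))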
By: | Jérôme Trinh (Université de Cergy-Pontoise, THEMA) |
Abstract: | This article adapts existing tests of cointegration with endogenous structural changes to very small sample sizes. Size-corrected critical values are computed both for testing cointegration with endogenous structural breaks and for testing structural breaks in the parameters of a cointegration model. We show that the power of such a testing procedure is satisfactory in samples of fewer than fifty observations. This is of interest for macroeconometric studies of emerging economies, for which the data history is usually not long enough to apply conventional methods. When serial correlation is low, we find the tests to be powerful with even fewer than thirty observations. A combined procedure of testing for cointegration and structural change improves the power of cointegration tests in very small samples while staying agnostic about the underlying data-generating processes. An example using Chinese data finds a cointegration relationship with two structural breaks between national household consumption expenditures, retail sales of consumer goods, and investment in fixed assets over the last four decades. |
Keywords: | Time series, cointegration, structural change, very small sample, emerging economies |
JEL: | C32 E17 |
Date: | 2022 |
URL: | http://d.repec.org/n?u=RePEc:ema:worpap:2022-01&r= |
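For orientation, the sketch below runs a conventional residual-based (Engle-Granger) cointegration test on a simulated sample of forty observations using statsmodels. It relies on asymptotic critical values, which is precisely what the paper replaces with size-corrected small-sample critical values allowing for endogenous breaks; the simulated series are illustrative.

# Conventional Engle-Granger cointegration test on a very small sample.
import numpy as np
from statsmodels.tsa.stattools import coint

rng = np.random.default_rng(9)
n = 40                                               # "very small" sample
x = np.cumsum(rng.normal(size=n))                    # I(1) driver
y = 0.5 + 1.2 * x + rng.normal(scale=0.5, size=n)    # cointegrated with x
t_stat, p_value, crit = coint(y, x)
print("EG t-stat:", round(t_stat, 2), " p-value:", round(p_value, 3))
print("asymptotic critical values (1%, 5%, 10%):", crit)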
By: | Tzougas, George; Hong, Natalia; Ho, Ryan |
Abstract: | In this article we present a class of mixed Poisson regression models with varying dispersion, arising from mixing distributions that are non-conjugate to the Poisson, for modelling overdispersed claim counts in non-life insurance. The proposed family of models, combined with the adopted modelling framework, provides sufficient flexibility for dealing with different levels of overdispersion. For illustrative purposes, the Poisson-lognormal regression model with regression structures on both its mean and dispersion parameters is employed to model claim count data from a motor insurance portfolio. Maximum likelihood estimation is carried out via an expectation-maximization type algorithm, which is developed for the proposed family of models and is demonstrated to perform satisfactorily. |
Keywords: | claim frequency; EM algorithm; non-life insurance; regression structures on the mean and dispersion parameters |
JEL: | C1 |
Date: | 2022–01–01 |
URL: | http://d.repec.org/n?u=RePEc:ehl:lserod:113616&r= |
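For contrast with the Poisson-lognormal specification fitted by EM in the paper, the sketch below fits a negative binomial regression, i.e., the Poisson-gamma member of the mixed Poisson family that admits a closed-form likelihood. The simulated claim counts and parameter values are illustrative assumptions.

# Negative binomial (Poisson-gamma) regression as a closed-form mixed Poisson benchmark.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(13)
n = 5000
x = rng.normal(size=n)
mu = np.exp(0.2 + 0.5 * x)                           # mean of claim counts
het = rng.gamma(shape=2.0, scale=0.5, size=n)        # gamma mixing with mean one
y = rng.poisson(mu * het)                            # marginally negative binomial

X = sm.add_constant(x)
res = sm.NegativeBinomial(y, X).fit(disp=0)
print(res.params)                                     # intercept, slope, dispersion alpha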
By: | Tapsoba, Augustin |
Abstract: | Being able to assess conflict risk at the local level is crucial for preventing political violence or mitigating its consequences. This paper develops a new approach for predicting the timing and location of conflict events from violence history data. It adapts the methodology developed in Tapsoba (2018) for measuring violence risk across space and time to conflict prediction. Violence is modeled as a stochastic process with an unknown underlying distribution. Each conflict event observed on the ground is interpreted as a random realization of this process, and its underlying distribution is estimated using kernel density estimation methods in a three-dimensional space. The optimal smoothing parameters are estimated to maximize the likelihood of future conflict events. The practical gains of this new methodology (in terms of out-of-sample forecasting performance) relative to standard space-time autoregressive models are illustrated using data from Côte d’Ivoire. |
Keywords: | Conflict; Insecurity; Kernel Density Estimation |
JEL: | C1 O12 O13 |
Date: | 2022–01–26 |
URL: | http://d.repec.org/n?u=RePEc:tse:wpaper:126538&r= |
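The basic ingredient of the approach described above can be sketched as a kernel density estimate over event coordinates in (longitude, latitude, time), evaluated at candidate future locations and dates. The sketch below uses scipy's default bandwidth rule, whereas the paper selects the smoothing parameters to maximize the likelihood of future events; the coordinates and data are made up.

# 3-D space-time kernel density estimate of conflict events (illustrative only).
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(10)
n_events = 500
events = np.vstack([
    rng.normal(loc=-5.5, scale=1.0, size=n_events),   # longitude
    rng.normal(loc=7.5, scale=1.0, size=n_events),    # latitude
    rng.uniform(0, 365, size=n_events),                # day within the observation window
])
kde = gaussian_kde(events)                             # space-time density of past events

# Relative intensity on a future date (day 400) at two candidate locations
candidates = np.array([[-5.5, -2.0],
                       [7.5, 9.0],
                       [400.0, 400.0]])
print("predicted relative intensity:", kde(candidates))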
By: | Calonaci, Fabio; Kapetanios, George; Price, Simon |
Abstract: | We re-examine the predictability of US stock returns. Theoretically well-founded models predict that stationary combinations of I(1) variables, such as the dividend or earnings to price ratios or the consumption/asset/income relationship often known as CAY, may predict returns. However, there is evidence that these relationships are unstable, and that allowing for discrete shifts in the unconditional mean (location shifts) can lead to greater predictability. It is unclear why there should be a small number of discrete shifts, and we allow for more general instability in the predictors, characterised by smooth variation, using a method introduced by Giraitis, Kapetanios and Yates. This can remove persistent components from observed time series that may otherwise account for the presence of near unit root type behaviour. Our methodology may therefore be seen as an alternative to the widely used IVX methods when there is strong persistence in the predictor. We apply this to the three predictors mentioned above in a sample from 1952 to 2019 (including the financial crisis but excluding the Covid pandemic) and find that modelling smooth instability improves predictability and forecasting performance and tends to outperform discrete location shifts, whether identified by in-sample Bai-Perron tests or Markov-switching models. |
Keywords: | returns predictability; long horizons; instability |
Date: | 2022–02–18 |
URL: | http://d.repec.org/n?u=RePEc:esy:uefcwp:32331&r= |
By: | Pircalabelu, Eugen (Université catholique de Louvain, LIDAM/ISBA, Belgium) |
Abstract: | We develop in this manuscript a method for performing estimation and inference for the reproduction number of an epidemiological outbreak. The estimator is time-dependent and uses spline modeling to adapt to changes in the outbreak. This is accomplished by directly modeling the series of new infections as a function of time and subsequently using the derivative of the function to define a time-varying reproduction number. |
Keywords: | reproduction number, time dependency, spline modeling, COVID-19 |
Date: | 2021–04–10 |
URL: | http://d.repec.org/n?u=RePEc:aiz:louvad:2021030&r= |
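The sketch below illustrates the two ingredients described above: a spline fit to the series of new infections and the use of its derivative as a time-varying growth rate. The final mapping from growth rate to a reproduction number (R_t ≈ exp(r_t g), with a fixed mean generation interval g) is a common approximation assumed here for illustration, not necessarily the paper's definition.

# Spline-smoothed incidence, derivative-based growth rate, and an approximate R_t.
import numpy as np
from scipy.interpolate import UnivariateSpline

rng = np.random.default_rng(11)
t = np.arange(120.0)
true_log_inc = 3 + 2 * np.sin(t / 40.0)                 # smooth toy epidemic curve
new_cases = rng.poisson(np.exp(true_log_inc))

spline = UnivariateSpline(t, np.log(new_cases + 1.0), s=len(t))   # smoothed log-incidence
growth_rate = spline.derivative()(t)                     # r_t = d log I(t) / dt
g = 5.0                                                  # assumed mean generation interval (days)
R_t = np.exp(growth_rate * g)
print("R_t on days 10, 60, 110:", np.round(R_t[[10, 60, 110]], 2))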
By: | Collin Philipps (Department of Economics and Geosciences, US Air Force Academy) |
Abstract: | This article establishes how expectiles should be understood. An expectile is the minimizer of an asymmetric least squares criterion, making it a weighted average. This also means that an expectile is the conditional mean of the distribution under special circumstances. Specifically, an expectile of a distribution is a value that would be the mean if values above it were more likely to occur than they are. Expectiles summarize distributions in a manner comparable to quantiles, but quantiles are expectiles in location models. The reverse is true in special cases. Expectiles are m-estimators, m-quantiles, and Lp-quantiles, families which connect them to the majority of statistics commonly in use. |
Keywords: | Expectile regression, Generalized Quantile Regression |
JEL: | C0 C21 C46 |
Date: | 2022–01 |
URL: | http://d.repec.org/n?u=RePEc:ats:wpaper:wp2022-1&r= |
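The definition above can be made concrete with a short fixed-point computation: an expectile minimizes an asymmetric least squares criterion and is therefore a self-consistent weighted mean. The sketch below computes a single unconditional expectile; the data are illustrative.

# Expectile as the fixed point of an asymmetrically weighted mean.
import numpy as np

def expectile(y, tau, n_iter=100):
    m = y.mean()                                        # tau = 0.5 gives the mean
    for _ in range(n_iter):
        w = np.where(y > m, tau, 1.0 - tau)             # asymmetric weights
        m = np.sum(w * y) / np.sum(w)                   # weighted-average fixed point
    return m

rng = np.random.default_rng(12)
y = rng.exponential(scale=1.0, size=100_000)
print("0.5-expectile (= mean):", round(expectile(y, 0.5), 3))
print("0.9-expectile:         ", round(expectile(y, 0.9), 3))
print("0.9-quantile:          ", round(np.quantile(y, 0.9), 3))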
By: | Isaiah Andrews; Drew Fudenberg; Annie Liang; Chaofeng Wu |
Abstract: | Whether a model's performance on a given domain can be extrapolated to other settings depends on whether it has learned generalizable structure. We formulate this as the problem of theory transfer, and provide a tractable way to measure a theory's transferability. We derive confidence intervals for transferability that ensure coverage in finite samples, and apply our approach to evaluate the transferability of predictions of certainty equivalents across different subject pools. We find that models motivated by economic theory perform more reliably than black-box machine learning methods at this transfer prediction task. |
Date: | 2022–02 |
URL: | http://d.repec.org/n?u=RePEc:arx:papers:2202.04796&r= |