on Econometrics |
By: | Irmer, Julien Patrick; Klein, Andreas; Schermelleh-Engel, Karin |
Abstract: | Closed-form power estimation is available only for limited classes of models and requires correct model specification in most applications. Simulation is used in other scenarios, but a general framework for computing required sample sizes at given power rates is still missing. We propose a new model-implied simulation-based power estimation (MSPE) method that exploits the asymptotic normality of a wide class of estimators, the $M$-estimators, and we give theoretical justification for the approach. $M$-estimators include maximum likelihood and least squares estimators, but also limited-information estimators and estimators used for misspecified models; hence, the new power modeling method is widely applicable. We highlight its performance for linear and nonlinear structural equation models (SEM) and a moderated logistic regression model, both for correctly specified models and for models under distributional misspecification. Simulation results suggest that the new power modeling method is unbiased and performs well with regard to root mean squared error and Type I error rates for the predicted required sample sizes and predicted power rates. Alternative approaches, such as the na\"ive approach of selecting arbitrary sample sizes with linear interpolation of power or simple logistic regression approaches, showed poor performance. The MSPE appears to be a valuable tool for estimating power for models without (asymptotic) analytical power estimation. |
Date: | 2024–03–29 |
URL: | http://d.repec.org/n?u=RePEc:osf:osfxxx:pe5bj&r=ecm |
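A minimal sketch of the general idea behind simulation-based power modelling, not the authors' MSPE implementation: estimate power by Monte Carlo at a few pilot sample sizes for a simple M-estimator (here OLS), model the power curve as Phi(a + b*sqrt(n)) in line with asymptotic normality, and invert it to predict the required sample size. All function names, the pilot grid, and the probit-scale fit are illustrative assumptions.

```python
# Simulation-based power curve for an M-estimator (OLS slope), fitted and inverted.
import numpy as np
from scipy.stats import norm
from scipy.optimize import brentq

rng = np.random.default_rng(0)
beta, alpha, target = 0.25, 0.05, 0.80            # true effect, test level, target power

def reject_once(n):
    x = rng.normal(size=n)
    y = beta * x + rng.normal(size=n)
    b = np.sum(x * y) / np.sum(x ** 2)             # OLS slope (an M-estimator)
    se = np.sqrt(np.sum((y - b * x) ** 2) / (n - 1) / np.sum(x ** 2))
    return abs(b / se) > norm.ppf(1 - alpha / 2)

def mc_power(n, reps=2000):
    return np.mean([reject_once(n) for _ in range(reps)])

pilot_n = np.array([50, 100, 200, 400])
pow_hat = np.array([mc_power(n) for n in pilot_n])

# Model power as Phi(a + b*sqrt(n)) via least squares on the probit scale.
z = norm.ppf(np.clip(pow_hat, 1e-4, 1 - 1e-4))
X = np.column_stack([np.ones(len(pilot_n)), np.sqrt(pilot_n)])
a_hat, b_hat = np.linalg.lstsq(X, z, rcond=None)[0]

# Invert the fitted curve to predict the required sample size.
n_req = brentq(lambda n: norm.cdf(a_hat + b_hat * np.sqrt(n)) - target, 2, 1e6)
print(f"predicted n for {target:.0%} power: {np.ceil(n_req):.0f}")
```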
By: | Victor Chernozhukov; Iv\'an Fern\'andez-Val; Sukjin Han; Kaspar W\"uthrich |
Abstract: | We propose an instrumental variable framework for identifying and estimating average and quantile effects of discrete and continuous treatments with binary instruments. The basis of our approach is a local copula representation of the joint distribution of the potential outcomes and unobservables determining treatment assignment. This representation allows us to introduce an identifying assumption, so-called copula invariance, that restricts the local dependence of the copula with respect to the treatment propensity. We show that copula invariance identifies treatment effects for the entire population and other subpopulations such as the treated. The identification results are constructive and lead to straightforward semiparametric estimation procedures based on distribution regression. An application to the effect of sleep on well-being uncovers interesting patterns of heterogeneity. |
Date: | 2024–03 |
URL: | http://d.repec.org/n?u=RePEc:arx:papers:2403.05850&r=ecm |
By: | Anders Bredahl Kock; Rasmus S{\o}ndergaard Pedersen; Jesper Riis-Vestergaard S{\o}rensen |
Abstract: | Lasso-type estimators are routinely used to estimate high-dimensional time series models. The theoretical guarantees established for Lasso typically require the penalty level to be chosen in a suitable fashion often depending on unknown population quantities. Furthermore, the resulting estimates and the number of variables retained in the model depend crucially on the chosen penalty level. However, there is currently no theoretically founded guidance for this choice in the context of high-dimensional time series. Instead one resorts to selecting the penalty level in an ad hoc manner using, e.g., information criteria or cross-validation. We resolve this problem by considering estimation of the perhaps most commonly employed multivariate time series model, the linear vector autoregressive (VAR) model, and propose a weighted Lasso estimator with penalization chosen in a fully data-driven way. The theoretical guarantees that we establish for the resulting estimation and prediction error match those currently available for methods based on infeasible choices of penalization. We thus provide a first solution for choosing the penalization in high-dimensional time series models. |
Date: | 2024–03 |
URL: | http://d.repec.org/n?u=RePEc:arx:papers:2403.06657&r=ecm |
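For orientation, a sketch of equation-by-equation weighted Lasso estimation of a VAR(p). The paper's contribution — the fully data-driven penalty rule — is not reproduced; the penalty `lam` and the weights below are placeholders the user must supply, and `lasso_var` is an illustrative function name.

```python
# Equation-by-equation (weighted) Lasso for a VAR(p) with a user-supplied penalty.
import numpy as np
from sklearn.linear_model import Lasso

def lasso_var(Y, p, lam, weights=None):
    """Y: (T, k) array of observations; returns a list of k coefficient vectors."""
    T, k = Y.shape
    # Lagged design matrix: row t contains (Y[t-1], ..., Y[t-p]).
    X = np.hstack([Y[p - j - 1:T - j - 1] for j in range(p)])
    Z = Y[p:]
    if weights is None:
        weights = np.ones(X.shape[1])
    Xw = X / weights                       # weighted Lasso via rescaled regressors
    coefs = []
    for i in range(k):
        fit = Lasso(alpha=lam, fit_intercept=True, max_iter=10000).fit(Xw, Z[:, i])
        coefs.append(fit.coef_ / weights)  # undo the rescaling
    return coefs

rng = np.random.default_rng(1)
Y = rng.normal(size=(200, 5))              # placeholder data
print(lasso_var(Y, p=2, lam=0.1)[0])
```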
By: | Emanuele Bacchiocchi; Andrea Bastianin; Toru Kitagawa; Elisabetta Mirto |
Abstract: | This paper studies the identification of Structural Vector Autoregressions (SVARs) exploiting a break in the variances of the structural shocks. Point-identification for this class of models relies on an eigen-decomposition involving the covariance matrices of reduced-form errors and requires that all the eigenvalues are distinct. This point-identification, however, fails in the presence of multiplicity of eigenvalues. This occurs in empirically relevant scenarios where, for instance, only a subset of structural shocks have a break in their variances, or where a group of variables shows a variance shift of the same amount. Combining the variance break with zero or sign restrictions on the structural parameters and impulse responses, we derive the identified sets for impulse responses and show how to compute them. We perform inference on the impulse response functions, building on the robust Bayesian approach developed for set-identified SVARs. To illustrate our proposal, we present an empirical example based on the literature on the global crude oil market, where identification is expected to fail due to multiplicity of eigenvalues. |
Date: | 2024–03 |
URL: | http://d.repec.org/n?u=RePEc:arx:papers:2403.06879&r=ecm |
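A sketch of the point-identification step the abstract refers to, under assumptions of my own (names and the toy example are illustrative, and the paper's set-identification machinery for the multiplicity case is not reproduced): with reduced-form covariances Sigma1 = B B' and Sigma2 = B Lambda B' from the two volatility regimes, the columns of B solve the generalized eigenvalue problem Sigma2 v = lambda Sigma1 v, and repeated eigenvalues flag the failure of point-identification.

```python
# Identification through a variance break via a generalized eigendecomposition.
import numpy as np
from scipy.linalg import eigh

def impact_matrix_from_break(Sigma1, Sigma2, tol=1e-8):
    lam, V = eigh(Sigma2, Sigma1)          # solves Sigma2 v = lam Sigma1 v, V' Sigma1 V = I
    B = np.linalg.inv(V.T)                 # hence B B' = Sigma1 and B diag(lam) B' = Sigma2
    distinct = np.all(np.diff(np.sort(lam)) > tol)
    return B, lam, distinct

# Toy example: variance shift in the last shock only -> two equal eigenvalues.
B0 = np.array([[1.0, 0.0, 0.0], [0.5, 1.0, 0.0], [0.2, 0.3, 1.0]])
Lam = np.diag([1.0, 1.0, 4.0])
S1, S2 = B0 @ B0.T, B0 @ Lam @ B0.T
B, lam, distinct = impact_matrix_from_break(S1, S2)
print("eigenvalues:", np.round(lam, 3), "| all distinct:", distinct)
```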
By: | Masahiro Kato |
Abstract: | This study investigates estimation of and statistical inference about Conditional Average Treatment Effects (CATEs), which have garnered attention as a metric for individualized causal effects. In our data-generating process, we assume linear models for the outcomes associated with binary treatments and define the CATE as the difference between the expected outcomes of these linear models. The linear models may be high-dimensional, and our interest lies in consistent estimation and statistical inference for the CATE. In high-dimensional linear regression, one typical approach is to assume sparsity. In our study, however, we do not assume sparsity directly; instead, we assume sparsity only in the difference of the linear models. We first use a doubly robust estimator to approximate this difference and then regress the difference on covariates with Lasso regularization. Although this regression estimator is consistent for the CATE, we further reduce the bias using techniques from double/debiased machine learning (DML) and the debiased Lasso, leading to $\sqrt{n}$-consistency and confidence intervals. We refer to the resulting estimator, which applies both DML and debiased Lasso techniques, as the triple/debiased Lasso (TDL). We confirm the soundness of our proposed method through simulation studies. |
Date: | 2024–03 |
URL: | http://d.repec.org/n?u=RePEc:arx:papers:2403.03240&r=ecm |
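A sketch of only the first two steps described in the abstract: form doubly robust (AIPW) pseudo-outcomes for the treatment effect and regress them on covariates with the Lasso. The additional debiasing (DML cross-fitting plus the debiased Lasso) that delivers sqrt(n)-inference is omitted, and the data-generating process below is an assumption for illustration.

```python
# Doubly robust pseudo-outcomes followed by a Lasso regression on covariates (CATE sketch).
import numpy as np
from sklearn.linear_model import LassoCV, LogisticRegressionCV, RidgeCV

rng = np.random.default_rng(3)
n, p = 1000, 50
X = rng.normal(size=(n, p))
e = 1 / (1 + np.exp(-0.5 * X[:, 0]))            # true propensity score
D = rng.binomial(1, e)
tau = 1.0 + X[:, 1]                              # sparse CATE: depends on X[:, 1] only
Y = X @ rng.normal(scale=0.3, size=p) + D * tau + rng.normal(size=n)

# Nuisance estimates (no cross-fitting in this sketch).
e_hat = LogisticRegressionCV(max_iter=2000).fit(X, D).predict_proba(X)[:, 1]
mu1 = RidgeCV().fit(X[D == 1], Y[D == 1]).predict(X)
mu0 = RidgeCV().fit(X[D == 0], Y[D == 0]).predict(X)

# AIPW pseudo-outcome for the individual treatment effect.
psi = (mu1 - mu0
       + D * (Y - mu1) / e_hat
       - (1 - D) * (Y - mu0) / (1 - e_hat))

cate_fit = LassoCV(cv=5).fit(X, psi)             # Lasso regression of psi on covariates
print("largest CATE coefficients:", np.argsort(-np.abs(cate_fit.coef_))[:3])
```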
By: | Degui Li; Oliver Linton; Haoxuan Zhang |
Abstract: | We propose a new estimator of high-dimensional spot volatility matrices satisfying a low-rank plus sparse structure from noisy and asynchronous high-frequency data collected for an ultra-large number of assets. The noise processes are allowed to be temporally correlated, heteroskedastic, asymptotically vanishing and dependent on the efficient prices. We define a kernel-weighted pre-averaging method to jointly tackle the microstructure noise and asynchronicity issues, and we obtain uniformly consistent estimates for latent prices. We impose a continuous-time factor model with time-varying factor loadings on the price processes, and estimate the common factors and loadings via a local principal component analysis. Assuming a uniform sparsity condition on the idiosyncratic volatility structure, we combine the POET and kernel-smoothing techniques to estimate the spot volatility matrices for both the latent prices and idiosyncratic errors. Under some mild restrictions, the estimated spot volatility matrices are shown to be uniformly consistent under various matrix norms. We provide Monte-Carlo simulation and empirical studies to examine the numerical performance of the developed estimation methodology. |
Date: | 2024–03 |
URL: | http://d.repec.org/n?u=RePEc:arx:papers:2403.06246&r=ecm |
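As a pointer to one ingredient only, here is a minimal sketch of pre-averaging a single noisy price series with a triangular weight, so that microstructure noise is dampened before any volatility estimation. The paper's kernel-weighted joint treatment of asynchronicity, the factor structure, and the POET step are not reproduced; the window choice and simulated prices are assumptions.

```python
# Weighted pre-averaging of noisy high-frequency log-prices.
import numpy as np

def preaverage(prices, k):
    """Weighted local averages of returns from a noisy log-price series."""
    r = np.diff(prices)
    g = np.minimum(np.arange(1, k) / k, 1 - np.arange(1, k) / k)   # triangular weights
    return np.array([g @ r[i:i + k - 1] for i in range(len(r) - k + 2)])

rng = np.random.default_rng(4)
n = 23400                                           # one trading day at 1-second frequency
efficient = np.cumsum(0.01 * rng.normal(size=n) / np.sqrt(n))
observed = efficient + 5e-4 * rng.normal(size=n)    # additive microstructure noise
pre = preaverage(observed, k=int(np.sqrt(n)))       # window of order sqrt(n), as is standard
print(len(pre), "pre-averaged increments")
```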
By: | Wanbo Lu; Guanglin Huang; Kris Boudt |
Abstract: | We estimate the latent factors in high-dimensional non-Gaussian panel data using the eigenvalue decomposition of the product between the higher-order multi-cumulant and its transpose. The proposed Higher-order multi-cumulant Factor Analysis (HFA) approach comprises an eigenvalue ratio test to select the number of non-Gaussian factors and uses the eigenvectors to estimate the factor loadings. Unlike covariance-based approaches, HFA remains reliable for estimating the non-Gaussian factors in weak factor models with Gaussian error terms. Simulation results confirm that HFA estimators improve the accuracy of factor selection and estimation compared to covariance-based approaches. We illustrate the use of HFA to detect and estimate the factors for the FRED-MD data set and use them to forecast the monthly S&P 500 equity premium. |
Keywords: | Higher-order multi-cumulants, High-dimensional factor models, Weak factors, Consistency, Eigenvalues |
JEL: | G11 G12 G15 |
Date: | 2024–03 |
URL: | http://d.repec.org/n?u=RePEc:rug:rugwps:24/1085&r=ecm |
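An illustrative sketch only: the paper's higher-order multi-cumulant construction is more involved than what follows. Here a simple third-order co-moment matrix of demeaned data stands in for it, and an eigenvalue-ratio rule selects the number of non-Gaussian factors; because the errors are Gaussian, their third-order cumulants vanish, which is the intuition the abstract exploits. The data-generating process and tuning choices are assumptions.

```python
# Eigen-decomposition of a third-order co-moment matrix times its transpose (HFA-style sketch).
import numpy as np

rng = np.random.default_rng(5)
T, N, r = 500, 20, 2
F = rng.chisquare(df=3, size=(T, r)) - 3           # skewed (non-Gaussian) factors
L = rng.normal(size=(N, r))
X = F @ L.T + rng.normal(size=(T, N))              # panel with Gaussian errors

Xc = X - X.mean(axis=0)
# Third-order co-moment matrix M (N x N^2): M[i, (j,k)] = E[x_i x_j x_k].
M = np.einsum('ti,tj,tk->ijk', Xc, Xc, Xc).reshape(N, N * N) / T
S = M @ M.T                                        # product with its transpose

eigval, eigvec = np.linalg.eigh(S)
eigval = eigval[::-1]                              # descending order
ratios = eigval[:-1] / eigval[1:]
r_hat = int(np.argmax(ratios[: N // 2])) + 1       # eigenvalue-ratio estimate
loadings = eigvec[:, ::-1][:, :r_hat]              # eigenvectors as loading estimates
print("selected number of factors:", r_hat)
```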
By: | Sarah Moon |
Abstract: | It is well known that the relationship between variables at the individual level can be different from the relationship between those same variables aggregated over individuals. This problem of aggregation becomes relevant when the researcher wants to learn individual-level relationships, but only has access to data that has been aggregated. In this paper, I develop a methodology to partially identify linear combinations of conditional average outcomes from aggregate data when the outcome of interest is binary, while imposing as few restrictions on the underlying data generating process as possible. I construct identified sets using an optimization program that allows for researchers to impose additional shape restrictions. I also provide consistency results and construct an inference procedure that is valid with aggregate data, which only provides marginal information about each variable. I apply the methodology to simulated and real-world data sets and find that the estimated identified sets are too wide to be useful. This suggests that to obtain useful information from aggregate data sets about individual-level relationships, researchers must impose further assumptions that are carefully justified. |
Date: | 2024–03 |
URL: | http://d.repec.org/n?u=RePEc:arx:papers:2403.07236&r=ecm |
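A toy linear program in the spirit of the paper, under assumptions of my own and far simpler than its optimization program: bound P(Y=1 | X=1) for binary Y and X when only the aggregate shares P(Y=1) and P(X=1) are observed. The decision variables are the joint cell probabilities, and the conditional mean is linear in them because P(X=1) is known from the aggregate data.

```python
# Bounding a conditional mean from marginal (aggregate) information via linear programming.
import numpy as np
from scipy.optimize import linprog

p_y, p_x = 0.30, 0.40                      # observed aggregate (marginal) shares
# Cells ordered as (Y=0,X=0), (Y=0,X=1), (Y=1,X=0), (Y=1,X=1).
A_eq = np.array([
    [1, 1, 1, 1],                          # probabilities sum to one
    [0, 0, 1, 1],                          # P(Y=1)
    [0, 1, 0, 1],                          # P(X=1)
])
b_eq = np.array([1.0, p_y, p_x])
c = np.array([0, 0, 0, 1.0]) / p_x         # objective: P(Y=1, X=1) / P(X=1)

lo = linprog(c, A_eq=A_eq, b_eq=b_eq, bounds=[(0, 1)] * 4)
hi = linprog(-c, A_eq=A_eq, b_eq=b_eq, bounds=[(0, 1)] * 4)
print(f"identified set for P(Y=1 | X=1): [{lo.fun:.3f}, {-hi.fun:.3f}]")
```

Additional shape restrictions of the kind the paper allows would enter as extra inequality rows in this program.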
By: | Jiawei Fu |
Abstract: | Understanding causal mechanisms is essential for explaining and generalizing empirical phenomena. Causal mediation analysis offers statistical techniques to quantify mediation effects. However, existing methods typically require strong identification assumptions or sophisticated research designs. We develop a new identification strategy that simplifies these assumptions, enabling the simultaneous estimation of causal and mediation effects. The strategy is based on a novel decomposition of total treatment effects, which transforms the challenging mediation problem into a simple linear regression problem. The new method establishes a new link between causal mediation and causal moderation. We discuss several research designs and estimators to increase the usability of our identification strategy for a variety of empirical studies. We demonstrate the application of our method by estimating the causal mediation effect in experiments concerning common pool resource governance and voting information. Additionally, we have created statistical software to facilitate the implementation of our method. |
Date: | 2024–03 |
URL: | http://d.repec.org/n?u=RePEc:arx:papers:2403.04131&r=ecm |
By: | Yi Qian; Anthony Koschmann; Hui Xie |
Abstract: | Causal inference is of central interest in many empirical applications yet is often challenging because of the presence of endogenous regressors. The classical approach to the problem requires instrumental variables that must satisfy the stringent exclusion restriction. At the forefront of recent research, instrument-free copula methods have been increasingly used to handle endogenous regressors. This article aims to provide a practical guide to handling endogeneity using copulas. The authors give an overview of copula endogeneity correction and its usage in marketing research, discuss recent advances that broaden the understanding, applicability, and robustness of copula correction, and examine implementation challenges of copula correction such as the construction of copula control functions and the handling of higher-order terms of endogenous regressors. To facilitate appropriate usage, the authors detail a process of checking data requirements and identification assumptions to determine when and how to use copula correction methods, and illustrate its usage with empirical examples. |
JEL: | C01 C10 C5 |
Date: | 2024–03 |
URL: | http://d.repec.org/n?u=RePEc:nbr:nberwo:32231&r=ecm |
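A minimal sketch of the Gaussian-copula control-function idea the article surveys (in the spirit of the correction commonly attributed to Park and Gupta, 2012), not necessarily the authors' recommended procedure: transform the endogenous regressor through its empirical CDF and the inverse normal CDF, and add the result as a generated control. Inference must account for the generated regressor (typically via bootstrap), which is omitted here; the simulated data are an assumption.

```python
# Gaussian copula control function for an endogenous regressor.
import numpy as np
from scipy.stats import norm, rankdata
import statsmodels.api as sm

rng = np.random.default_rng(6)
n = 2000
u = rng.normal(size=n)                                   # unobserved confounder
p = np.exp(0.8 * u + rng.gumbel(size=n))                 # non-normal endogenous regressor
y = 1.0 + 0.5 * p + u + rng.normal(size=n)               # true coefficient on p is 0.5

# Copula control function: p* = Phi^{-1}(F_hat(p)), with F_hat the empirical CDF.
ecdf = rankdata(p) / (n + 1)
p_star = norm.ppf(ecdf)

naive = sm.OLS(y, sm.add_constant(p)).fit()
corrected = sm.OLS(y, sm.add_constant(np.column_stack([p, p_star]))).fit()
print("naive estimate:    ", round(naive.params[1], 3))
print("copula-corrected:  ", round(corrected.params[1], 3))
```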
By: | Mathur, Maya B |
Abstract: | In small meta-analyses (e.g., up to 20 studies), the best-performing frequentist approaches for estimation and inference can yield very wide confidence intervals for the meta-analytic mean, as well as biased and imprecise estimates of the heterogeneity. We investigate the frequentist performance of alternative Bayesian methods that use the invariant Jeffreys prior. This prior can be motivated from the usual Bayesian perspective, but can alternatively be motivated from a purely frequentist perspective: the resulting posterior modes correspond to the established Firth bias correction of the maximum likelihood estimator. We consider two forms of the Jeffreys prior for random-effects meta-analysis: the previously established “Jeffreys1” prior treats the heterogeneity as a nuisance parameter, whereas the “Jeffreys2” prior treats both the mean and the heterogeneity as estimands of interest. In a large simulation study, we assess the performance of both Jeffreys priors, considering different types of Bayesian point estimates and intervals. We assess the performance of estimation and inference for both the mean and the heterogeneity parameters, comparing to the best-performing frequentist methods. We conclude that for small meta-analyses of binary outcomes, the Jeffreys2 prior may offer advantages over standard frequentist methods for estimation and inference of the mean parameter. In these cases, Jeffreys2 can substantially improve efficiency while more often showing nominal frequentist coverage. However, for small meta-analyses of continuous outcomes, standard frequentist methods seem to remain the best choices. The best-performing method for estimating the heterogeneity varied according to the heterogeneity itself. |
Date: | 2024–03–11 |
URL: | http://d.repec.org/n?u=RePEc:osf:osfxxx:7jvrw&r=ecm |
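A grid-approximation sketch of the "Jeffreys2" posterior for the usual normal-normal random-effects model y_i ~ N(mu, v_i + tau^2) with known within-study variances v_i. This reproduces the model class and the form of the Jeffreys prior, not the paper's code; the toy data and grid ranges are assumptions.

```python
# Random-effects meta-analysis under the Jeffreys2 prior via a 2-D grid posterior.
import numpy as np

y = np.array([0.12, 0.35, -0.10, 0.28, 0.05])        # study effect estimates (toy data)
v = np.array([0.04, 0.09, 0.06, 0.05, 0.08])         # known within-study variances

mu_grid = np.linspace(-1, 1, 401)
tau_grid = np.linspace(1e-3, 1.5, 300)
M, T = np.meshgrid(mu_grid, tau_grid, indexing='ij')

def log_post(mu, tau):
    w = 1.0 / (v + tau ** 2)                          # precision of each study
    loglik = -0.5 * np.sum(np.log(v + tau ** 2) + w * (y - mu) ** 2)
    # Jeffreys2 prior: sqrt(det Fisher information) with both mu and tau as estimands,
    # proportional to tau * sqrt(sum(w) * sum(w**2)).
    logprior = np.log(tau) + 0.5 * (np.log(np.sum(w)) + np.log(np.sum(w ** 2)))
    return loglik + logprior

lp = np.vectorize(log_post)(M, T)
post = np.exp(lp - lp.max())
post /= post.sum()

mu_post = post.sum(axis=1)                            # marginal posterior of mu on the grid
mu_mean = np.sum(mu_grid * mu_post)
lo, hi = mu_grid[np.searchsorted(np.cumsum(mu_post), [0.025, 0.975])]
print(f"posterior mean of mu: {mu_mean:.3f}, 95% credible interval: [{lo:.3f}, {hi:.3f}]")
```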
By: | Bailly, Gabriel (Université catholique de Louvain, LIDAM/ISBA, Belgium); von Sachs, Rainer (Université catholique de Louvain, LIDAM/ISBA, Belgium) |
Abstract: | We tackle the problem of estimating time-varying covariance matrices (TVCM; i.e. covariance matrices with entries being time-dependent curves) whose elements show inhomogeneous smoothness over time (e.g. pronounced local peaks). To address this challenge, wavelet denoising estimators are particularly appropriate. Specifically, we model TVCM using a signal-noise model within the Riemannian manifold of symmetric positive definite matrices (endowed with the log-Euclidean metric) and use the intrinsic wavelet transform, designed for curves in Riemannian manifolds. Within this non-Euclidean framework, the proposed estimators preserve positive definiteness. Although linear wavelet estimators for smooth TVCM achieve good results in various scenarios, they are less suitable if the underlying curve features singularities. Consequently, our estimator is designed around a nonlinear thresholding scheme, tailored to the characteristics of the noise in covariance matrix regression models. The effectiveness of this novel nonlinear scheme is assessed by deriving mean-squared error consistency and by numerical simulations, and its practical application is demonstrated on TVCM of electroencephalography (EEG) data showing abrupt transients over time. |
Keywords: | Nonlinear wavelet thresholding ; non-Euclidean geometry ; sample covariance matrices ; time-varying second-order structure ; log-Wishart distribution |
Date: | 2024–02–12 |
URL: | http://d.repec.org/n?u=RePEc:aiz:louvad:2024004&r=ecm |
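A sketch of the log-Euclidean idea only: map each SPD matrix to the tangent space with the matrix logarithm, apply a one-level Haar transform with entrywise soft thresholding over time, and map back with the matrix exponential so the denoised matrices stay positive definite. The intrinsic wavelet transform and the noise-tailored threshold of the paper are not reproduced; the threshold value and the simulated series are assumptions.

```python
# Log-Euclidean wavelet-style denoising of a time series of SPD covariance matrices.
import numpy as np
from scipy.linalg import logm, expm

def denoise_spd_series(S, thr):
    """S: array of shape (T, d, d) of SPD matrices, T even; thr: soft threshold."""
    L = np.array([logm(s).real for s in S])             # log-Euclidean coordinates
    approx = (L[0::2] + L[1::2]) / np.sqrt(2)            # one-level Haar transform in time
    detail = (L[0::2] - L[1::2]) / np.sqrt(2)
    detail = np.sign(detail) * np.maximum(np.abs(detail) - thr, 0.0)   # soft thresholding
    rec = np.empty_like(L)
    rec[0::2] = (approx + detail) / np.sqrt(2)
    rec[1::2] = (approx - detail) / np.sqrt(2)
    return np.array([expm(m) for m in rec])              # back to the SPD manifold

rng = np.random.default_rng(7)
T, d = 64, 3
true = np.array([np.eye(d) * (1 + 0.5 * np.sin(2 * np.pi * t / T)) for t in range(T)])
noise = rng.normal(scale=0.2, size=(T, d, d))
noisy = np.array([expm(logm(c) + (e + e.T) / 2) for c, e in zip(true, noise)])
denoised = denoise_spd_series(noisy, thr=0.15)
print(denoised.shape, "all positive definite:", bool(np.all(np.linalg.eigvalsh(denoised) > 0)))
```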
By: | Kiriliouk, Anna (Université de Namur); Lee, Jeongjin (Université de Namur); Segers, Johan (Université catholique de Louvain, LIDAM/ISBA, Belgium) |
Abstract: | Regular vine sequences permit the organisation of variables in a random vector along a sequence of trees. Regular vine models have become greatly popular in dependence modelling as a way to combine arbitrary bivariate copulas into higher-dimensional ones, offering flexibility, parsimony, and tractability. In this project, we use regular vine structures to decompose and construct the exponent measure density of a multivariate extreme value distribution, or, equivalently, the tail copula density. Although these densities pose theoretical challenges due to their infinite mass, their homogeneity property offers simplifications. The theory sheds new light on existing parametric families and facilitates the construction of new ones, called X-vines. Computations proceed via recursive formulas in terms of bivariate model components. We develop simulation algorithms for X-vine multivariate Pareto distributions as well as methods for parameter estimation and model selection on the basis of threshold exceedances. The methods are illustrated by Monte Carlo experiments and a case study on US flight delay data. |
Keywords: | Exponent measure ; graphical model ; multivariate Pareto distribution ; pair copula construction ; regular vine ; tail copula |
Date: | 2023–12–22 |
URL: | http://d.repec.org/n?u=RePEc:aiz:louvad:2023038&r=ecm |
By: | Michail Tsagris |
Abstract: | Simplicial-simplicial regression refers to the regression setting where both the responses and the predictor variables lie within the simplex space, i.e. they are compositional. For this setting, constrained least squares, where the regression coefficients themselves lie within the simplex, is proposed. The model is transformation-free, but the adoption of a power transformation is straightforward; it can treat more than one compositional dataset as predictors and offers the possibility of weights among the simplicial predictors. Among the model’s advantages are its ability to treat zeros in a natural way and a highly computationally efficient algorithm for estimating its coefficients. Resampling-based hypothesis testing procedures are employed for inference, such as tests of linear independence and of equality of the regression coefficients to some pre-specified values. The performance of the proposed technique and its comparison to an existing methodology that is of the same spirit takes place u |
Keywords: | compositional data, regression, quadratic programming |
JEL: | C10 C13 |
Date: | 2024–03–31 |
URL: | http://d.repec.org/n?u=RePEc:crt:wpaper:2402&r=ecm |
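A sketch of simplicial-simplicial constrained least squares under an assumed parameterisation: each row of the coefficient matrix B is restricted to the simplex (non-negative, summing to one) so that the fitted values XB remain compositions. A generic SLSQP solver stands in for the paper's dedicated quadratic-programming algorithm, and the simulated compositions are illustrative.

```python
# Constrained least squares with simplex-valued coefficient rows.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(8)
n, Dx, Dy = 200, 4, 3
X = rng.dirichlet(np.ones(Dx), size=n)                 # compositional predictors
B_true = rng.dirichlet(np.ones(Dy), size=Dx)           # rows of B lie in the simplex
Y = np.clip(X @ B_true + rng.normal(scale=0.02, size=(n, Dy)), 1e-6, None)
Y /= Y.sum(axis=1, keepdims=True)                      # renormalise noisy compositions

def objective(b_flat):
    B = b_flat.reshape(Dx, Dy)
    return np.sum((Y - X @ B) ** 2)

cons = [{'type': 'eq', 'fun': lambda b, i=i: b.reshape(Dx, Dy)[i].sum() - 1}
        for i in range(Dx)]
res = minimize(objective, np.full(Dx * Dy, 1 / Dy), method='SLSQP',
               bounds=[(0, 1)] * (Dx * Dy), constraints=cons)
B_hat = res.x.reshape(Dx, Dy)
print(np.round(B_hat, 2))
```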
By: | Antoine Poulin-Moore; Kerem Tuzcuoglu |
Abstract: | We forecast recessions in Canada using an autoregressive (AR) probit model. In this model, the presence of the lagged latent variable, which captures the autocorrelation in the recession binary variable, results in an intractable likelihood with a high dimensional integral. Therefore, we employ composite likelihood methods to facilitate the estimation of this complex model, and we provide their asymptotic results. We perform a variable selection procedure on a large variety of Canadian and foreign macro-financial variables by using the area under the receiver operating characteristic curve (AUROC) as the performance criterion. Our findings suggest that the AR model meaningfully improves the ability to forecast Canadian recessions, relative to a variety of probit models proposed in the Canadian literature. These results are robust to changes in the performance criteria or the sample considered. Our findings also highlight the short-term predictive power of US economic activity and suggest that financial indicators are reliable predictors of Canadian recessions. |
Keywords: | Business fluctuations and cycles; Econometric and statistical methods |
JEL: | E32 C53 C51 |
Date: | 2024–03 |
URL: | http://d.repec.org/n?u=RePEc:bca:bocawp:24-10&r=ecm |
By: | Serrano-Serrat, Josep |
Abstract: | This paper examines critical aspects of analyzing interactions with continuous treatment variables. A theoretical estimand is defined, the Average Interactive Effect, which is the mean of the difference in the slopes of the treatment variable at different levels of the moderator. Crucially, this theoretical estimand is distinct from the difference in Conditional Average Marginal Effects (CAME) at different levels of the moderator. The paper proposes a flexible parametric model that can be used to estimate three important pieces of information: i) predicted values at different levels of the moderator; ii) the difference in the slopes of the relationship between the treatment and the dependent variable at different values of the moderator; and iii) the mean of these slope differences. Simulations show that this approach is better than models that only estimate the differences in the CAMEs. However, since the unbiasedness of the proposed method depends on the correct specification of the functional form, it is proposed to complement the parametric model with results from Generalized Additive Models. Finally, the effectiveness of the proposed approach is illustrated with two examples. |
Date: | 2024–03–14 |
URL: | http://d.repec.org/n?u=RePEc:osf:socarx:4e8h2&r=ecm |
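A sketch of the quantities described in the abstract under an assumed flexible parametric specification (the variable names and the quadratic-in-treatment model are illustrative, not the author's exact model): fit an outcome model that lets the slope in the treatment d vary with the moderator m, compute the slope in d at two moderator values, and average the slope difference over the sample.

```python
# Average difference in treatment slopes across moderator levels from a flexible OLS fit.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(9)
n = 2000
d = rng.normal(size=n)                                 # continuous treatment
m = rng.normal(size=n)                                 # continuous moderator
y = 1 + d + 0.5 * d * m + 0.3 * d ** 2 * m + rng.normal(size=n)

X = np.column_stack([d, m, d * m, d ** 2, d ** 2 * m])
fit = sm.OLS(y, sm.add_constant(X)).fit()
b = fit.params                                         # [const, d, m, d*m, d^2, d^2*m]

def slope_in_d(d_val, m_val):
    # Analytic derivative of the fitted surface with respect to d.
    return b[1] + b[3] * m_val + 2 * b[4] * d_val + 2 * b[5] * d_val * m_val

m_lo, m_hi = np.quantile(m, [0.25, 0.75])
# Mean over the sample of the slope difference between the two moderator values,
# evaluated at each unit's own treatment level.
aie = np.mean(slope_in_d(d, m_hi) - slope_in_d(d, m_lo))
print("estimated average difference in slopes:", round(aie, 3))
```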
By: | Isaac M. Opper; Umut Özek |
Abstract: | We use a marginal treatment effect (MTE) representation of a fuzzy regression discontinuity setting to propose a novel estimator. The estimator can be thought of as extrapolating the traditional fuzzy regression discontinuity estimate, or as an observational study that adjusts for endogenous selection into treatment using information at the discontinuity. We show in a frequentist framework that it is consistent under weaker assumptions than existing approaches, and then discuss conditions in a Bayesian framework under which it can be considered the posterior mean given the observed conditional moments. We then use this approach to examine the effects of early grade retention. We show that the benefits of early grade retention policies are larger for students with lower baseline achievement and smaller for low-performing students who are exempt from retention. These findings imply that (1) the benefits of early grade retention policies are larger than those estimated using traditional fuzzy regression discontinuity designs, but that (2) retaining additional students would have a limited effect on student outcomes. |
Keywords: | regression discontinuity designs, local average treatment effects, early grade retention policies |
JEL: | C01 I20 I28 |
Date: | 2024 |
URL: | http://d.repec.org/n?u=RePEc:ces:ceswps:_10972&r=ecm |
By: | Daraio, Cinzia (Sapienza University of Rome); Di Leo, Simone (Sapienza University of Rome); Simar, Léopold (Université catholique de Louvain, LIDAM/ISBA, Belgium) |
Abstract: | In productivity and efficiency analysis, directional distances are very popular, due to their flexibility in choosing the direction in which to evaluate the distance of Decision Making Units (DMUs) to the efficient frontier of the production set. The theory and the statistical properties of these measures are today well known in various situations. But so far, the way to measure directional distances to the cone spanned by the attainable set has not been analyzed. In this paper we fill this gap and describe how to define and estimate directional distances to this cone for general technologies, i.e. without imposing convexity. Their statistical properties are also developed. This allows us to measure distances to non-convex attainable sets under Constant Returns to Scale (CRS), and also to measure and estimate Luenberger productivity indices and their decompositions for general technologies. The way to make inference on these indices is also described in detail. We propose illustrations with some simulated data, as well as a practical example of inference on Luenberger productivity indices and their decompositions with a well-known real data set. |
Keywords: | Nonparametric production frontiers ; Cone ; DEA ; FDH ; Directional Distances ; Luenberger productivity indices |
JEL: | C1 C14 C13 |
Date: | 2024–02–01 |
URL: | http://d.repec.org/n?u=RePEc:aiz:louvad:2024009&r=ecm |
By: | Buczak, Philip; Horn, Daniel; Pauly, Markus |
Abstract: | There is a long tradition of modeling ordinal response data with parametric models such as the proportional odds model. With the advent of machine learning (ML), however, the classical stream of parametric models has been increasingly challenged by a more recent stream of tree ensemble (TE) methods extending popular ML algorithms such as random forest to ordinal response data. Despite selective efforts, the current literature lacks an encompassing comparison between the two methodological streams. In this work, we fill this gap by investigating under which circumstances a proportional odds model is competitive with TE methods regarding its predictive performance, and when TE methods should be preferred. Additionally, we study whether the optimization of the numeric scores assigned to ordinal response categories, as in Ordinal Forest (OF; Hornung, 2019), is worth the associated computational burden. To this end, we further contribute to the literature by proposing the Ordinal Score Optimization Algorithm (OSOA). Similar to OF, OSOA optimizes the numeric scores assigned to the ordinal response categories, but aims to enhance the optimization procedure used in OF by employing a non-linear optimization algorithm. Our comparison results show that while TE approaches outperformed the proportional odds model in the presence of strong non-linear effects, the latter was competitive for small sample sizes even under medium non-linear effects. Among the TE methods, only subtle differences emerged, showing that the benefit of score optimization was situational. We analyze potential reasons for the mixed benefits of score optimization to motivate further methodological research. Based on our results, we derive practical recommendations for researchers and practitioners. |
Date: | 2024–03–29 |
URL: | http://d.repec.org/n?u=RePEc:osf:osfxxx:v7bcf&r=ecm |
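A sketch of the kind of comparison the paper runs, under assumptions of my own: a proportional odds model (statsmodels' OrderedModel) against a plain random forest that treats the ordinal response as nominal. The specialised ordinal tree ensembles and the proposed OSOA score optimisation are not reproduced, and the simulated data and error metric are illustrative.

```python
# Proportional odds model versus a random forest on simulated ordinal data.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from statsmodels.miscmodels.ordinal_model import OrderedModel

rng = np.random.default_rng(10)
n, p = 1000, 5
X = rng.normal(size=(n, p))
latent = X[:, 0] + np.sin(2 * X[:, 1]) + rng.logistic(size=n)     # mild non-linearity
y = np.digitize(latent, np.quantile(latent, [0.25, 0.5, 0.75]))   # 4 ordered categories

train = rng.random(n) < 0.7
po = OrderedModel(y[train], X[train], distr='logit').fit(method='bfgs', disp=False)
po_pred = np.argmax(po.predict(X[~train]), axis=1)                # most probable category
rf = RandomForestClassifier(random_state=0).fit(X[train], y[train])
rf_pred = rf.predict(X[~train])

mae = lambda a, b: np.mean(np.abs(a - b))                         # simple ordinal error proxy
print("mean absolute rank error  PO:", mae(po_pred, y[~train]), " RF:", mae(rf_pred, y[~train]))
```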