on Econometrics
By: | Zhewen Pan; Yifan Zhang |
Abstract: | Existing identification and estimation methods for semiparametric sample selection models rely heavily on exclusion restrictions. However, it is difficult in practice to find a credible excluded variable that is correlated with selection but not with the outcome. In this paper, we establish a new identification result for a semiparametric sample selection model without the exclusion restriction. The key identifying assumptions are nonlinearity in the selection equation and linearity in the outcome equation. The difference in functional form plays the role of an excluded variable and provides identification power. Based on the identification result, we propose to estimate the model by a partially linear regression with a nonparametrically generated regressor. To accommodate modern machine learning methods in generating the regressor, we construct an orthogonalized moment by adding the first-step influence function and develop a locally robust estimator by solving the cross-fitted orthogonalized moment condition. We prove root-n consistency and asymptotic normality of the proposed estimator under mild regularity conditions. A Monte Carlo simulation shows the satisfactory performance of the estimator in finite samples, and an application to wage regression illustrates its usefulness in the absence of exclusion restrictions. |
Date: | 2024–12 |
URL: | https://d.repec.org/n?u=RePEc:arx:papers:2412.01208 |
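A stylized way to see the identification idea in the abstract above (the notation here is ours, not necessarily the paper's): write the selection equation as S = 1{p(X) >= U} with p(X) = P(S = 1 | X) nonlinear in X, and the outcome equation as Y = X'β + ε, observed only when S = 1. Under standard conditions this yields the partially linear representation

\[ E[Y \mid X, S=1] = X'\beta + g\big(p(X)\big), \]

where p(X) is a nonparametrically generated regressor obtained from the selection equation. Because p(X) is a nonlinear function of X while the outcome index X'β is linear, β can be separated from g without an excluded variable; this is the sense in which the difference in functional form substitutes for an exclusion restriction.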
By: | Javier Alejo; Antonio F. Galvao; Julian Martinez-Iriarte; Gabriel Montes-Rojas |
Abstract: | Linear regressions with endogeneity are widely used to estimate causal effects. This paper studies a statistical framework featuring two common issues: endogeneity of the regressors, and heteroskedasticity that is allowed to depend on the endogenous regressors, i.e., endogenous heteroskedasticity. We show that the presence of such conditional heteroskedasticity in the structural regression renders the two-stage least squares estimator inconsistent. To solve this issue, we propose sufficient conditions together with a control function approach to identify and estimate the causal parameters of interest. We establish statistical properties of the estimator, namely consistency and asymptotic normality, and propose valid inference procedures. Monte Carlo simulations provide evidence on the finite-sample performance of the proposed methods and evaluate different implementation procedures. We revisit an empirical application about job training to illustrate the methods. |
Date: | 2024–12 |
URL: | https://d.repec.org/n?u=RePEc:arx:papers:2412.02767 |
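For readers less familiar with control functions, here is a minimal textbook two-step sketch in Python (a generic illustration only, not the paper's estimator or its endogenous-heteroskedasticity conditions; all variable names are hypothetical, and the standard errors are not adjusted for the generated regressor):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 2000

# Toy data: x is endogenous, z is an instrument.
z = rng.normal(size=n)
u = rng.normal(size=n)                # structural error
v = 0.6 * u + rng.normal(size=n)      # first-stage error, correlated with u
x = 1.0 + 0.8 * z + v                 # endogenous regressor
y = 2.0 + 1.5 * x + u                 # outcome; true slope is 1.5

# Step 1: first-stage regression of x on z; keep residuals (the control function).
vhat = sm.OLS(x, sm.add_constant(z)).fit().resid

# Step 2: include the control function as an extra regressor in the outcome equation.
cf = sm.OLS(y, sm.add_constant(np.column_stack([x, vhat]))).fit()
print(cf.params)                      # coefficient on x should be close to 1.5
```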
By: | Robin Braun; George Kapetanios; Massimiliano Marcellino |
Abstract: | This paper studies the estimation and inference of time-varying impulse response functions in structural vector autoregressions (SVARs) identified with external instruments. Building on kernel estimators that allow for nonparametric time variation, we derive the asymptotic distributions of the relevant quantities. Our estimators are simple and computationally trivial and allow for potentially weak instruments. Simulations suggest satisfactory empirical coverage even in relatively small samples as long as the underlying parameter instabilities are sufficiently smooth. We illustrate the methods by studying the time-varying effects of global oil supply news shocks on US industrial production. |
Keywords: | Time-varying parameters; Nonparametric estimation; Structural VAR; External instruments; Weak instruments; Oil supply news shocks; Impulse response analysis |
JEL: | C14 C32 C53 C55 |
Date: | 2025–01–06 |
URL: | https://d.repec.org/n?u=RePEc:fip:fedgfe:2025-04 |
By: | Christoph Breunig; Ruixuan Liu; Zhengfei Yu |
Abstract: | This paper studies semiparametric Bayesian inference for the average treatment effect on the treated (ATT) within the difference-in-differences research design. We propose two new Bayesian methods with frequentist validity. The first one places a standard Gaussian process prior on the conditional mean function of the control group. We obtain asymptotic equivalence of our Bayesian estimator and an efficient frequentist estimator by establishing a semiparametric Bernstein-von Mises (BvM) theorem. The second method is a double robust Bayesian procedure that adjusts the prior distribution of the conditional mean function and subsequently corrects the posterior distribution of the resulting ATT. We establish a semiparametric BvM result under double robust smoothness conditions; i.e., the lack of smoothness of conditional mean functions can be compensated by high regularity of the propensity score, and vice versa. Monte Carlo simulations and an empirical application demonstrate that the proposed Bayesian DiD methods exhibit strong finite-sample performance compared to existing frequentist methods. Finally, we outline an extension to difference-in-differences with multiple periods and staggered entry. |
Date: | 2024–12 |
URL: | https://d.repec.org/n?u=RePEc:arx:papers:2412.04605 |
By: | Sunny Karim; Matthew D. Webb |
Abstract: | This paper introduces the two-way common causal covariates (CCC) assumption, which is necessary to obtain an unbiased estimate of the ATT when using time-varying covariates in existing difference-in-differences methods. The two-way CCC assumption implies that the effect of the covariates remains the same between groups and across time periods. This assumption has been implicit in previous literature, but has not been explicitly addressed. Through theoretical proofs and a Monte Carlo simulation study, we show that the standard TWFE and CS-DID estimators are biased when the two-way CCC assumption is violated. We propose a new estimator, the Intersection Difference-in-Differences (DID-INT), which can provide an unbiased estimate of the ATT under two-way CCC violations. DID-INT can also identify the ATT under heterogeneous treatment effects and with staggered treatment rollout. The estimator relies on parallel trends of the residuals of the outcome variable, after appropriately adjusting for covariates. This covariate residualization can recover parallel trends that are hidden with conventional estimators. |
Date: | 2024–12 |
URL: | https://d.repec.org/n?u=RePEc:arx:papers:2412.14447 |
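A rough illustration of the residualize-then-DiD idea mentioned at the end of the abstract, in Python (this is our deliberately naive reading; the actual DID-INT estimator adjusts for covariates more carefully, and the simulated design and variable names are hypothetical):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Toy two-period, two-group panel with one time-varying covariate x.
rng = np.random.default_rng(1)
n_units, n_periods = 200, 2
unit = np.repeat(np.arange(n_units), n_periods)
period = np.tile(np.arange(n_periods), n_units)
group = np.repeat(rng.integers(0, 2, n_units), n_periods)   # 1 = eventually treated
d = group * (period == 1)                                    # treated in period 1
x = rng.normal(size=n_units * n_periods) + 0.5 * group
y = 1 + 2 * x + 0.3 * group + 0.2 * period + 1.0 * d + rng.normal(size=n_units * n_periods)
df = pd.DataFrame(dict(y=y, x=x, unit=unit, period=period, d=d))

# Step 1: residualize the outcome with respect to the covariate.
df["y_resid"] = smf.ols("y ~ x", data=df).fit().resid

# Step 2: run a standard two-way fixed effects DiD on the residualized outcome.
twfe = smf.ols("y_resid ~ d + C(unit) + C(period)", data=df).fit()
print(twfe.params["d"])               # should be near the true ATT of 1.0
```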
By: | Mengsi Gao |
Abstract: | This paper investigates the identification and inference of treatment effects in randomized controlled trials with social interactions. Two key network features characterize the setting and introduce endogeneity: (1) latent variables may affect both network formation and outcomes, and (2) the intervention may alter network structure, mediating treatment effects. I make three contributions. First, I define parameters within a post-treatment network framework, distinguishing direct effects of treatment from indirect effects mediated through changes in network structure. I provide a causal interpretation of the coefficients in a linear outcome model. For estimation and inference, I focus on a specific form of peer effects, represented by the fraction of treated friends. Second, in the absence of endogeneity, I establish the consistency and asymptotic normality of ordinary least squares estimators. Third, if endogeneity is present, I propose addressing it through shift-share instrumental variables, demonstrating the consistency and asymptotic normality of instrumental variable estimators in relatively sparse networks. For denser networks, I propose a denoised estimator based on eigendecomposition to restore consistency. Finally, I revisit Prina (2015) as an empirical illustration, demonstrating that treatment can influence outcomes both directly and through network structure changes. |
Date: | 2024–12 |
URL: | https://d.repec.org/n?u=RePEc:arx:papers:2412.02183 |
By: | Yanqin Fan; Hyeonseok Park |
Abstract: | This paper proposes minimum sliced distance estimation in structural econometric models with possibly parameter-dependent supports. In contrast to likelihood-based estimation, we show that under mild regularity conditions, the minimum sliced distance estimator is asymptotically normally distributed, leading to simple inference regardless of the presence or absence of parameter-dependent supports. We illustrate the performance of our estimator on an auction model. |
Date: | 2024–12 |
URL: | https://d.repec.org/n?u=RePEc:arx:papers:2412.05621 |
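To fix ideas, here is a toy Python sketch of minimum sliced distance estimation (our illustration: a sliced Wasserstein-1 distance, common random numbers, and a uniform model whose support depends on the parameter; the paper covers general structural models and formal asymptotics that this snippet does not attempt to reproduce):

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(2)

# Data from a model with parameter-dependent support: U[0, theta0]^2.
theta0 = 2.5
data = rng.uniform(0, theta0, size=(500, 2))

# Fixed projection directions and base draws (common random numbers).
dirs = rng.normal(size=(100, 2))
dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)
base = rng.uniform(0, 1, size=(500, 2))        # model sample is theta * base

def sliced_w1(a, b):
    """Monte Carlo sliced 1-Wasserstein distance between two equal-size samples."""
    dists = [np.mean(np.abs(np.sort(a @ u) - np.sort(b @ u))) for u in dirs]
    return float(np.mean(dists))

def objective(theta):
    return sliced_w1(data, theta * base)

res = minimize_scalar(objective, bounds=(0.5, 5.0), method="bounded")
print(res.x)                                   # should be close to theta0 = 2.5
```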
By: | Zongwu Cai (Department of Economics, The University of Kansas, Lawrence, KS 66045, USA); Ying Fang (The Wang Yanan Institute for Studies in Economics, Xiamen University, Xiamen, Fujian 361005, China and Department of Statistics & Data Science, School of Economics, Xiamen University, Xiamen, Fujian 361005, China); Ming Lin (The Wang Yanan Institute for Studies in Economics, Xiamen University, Xiamen, Fujian 361005, China and Department of Statistics and Data Science, School of Economics, Xiamen University, Xiamen, Fujian 361005, China); Yaqian Wu (School of Economics, Huazhong University of Science and Technology, Wuhan, Hubei 430074, China) |
Abstract: | In this paper, we propose a new method to estimate counterfactual distribution functions via optimal distribution balancing weights, thereby avoiding estimation of inverse propensity weights, which is sensitive to model specification, easily produces unstable estimates, and often fails to adequately balance covariates. First, we demonstrate that the estimated weights exactly balance the estimated conditional distributions among the treated, untreated, and combined groups via a well-defined convex optimization problem. Second, we show that the resulting estimator of the counterfactual distribution function converges weakly to a mean-zero Gaussian process at the parametric root-n rate. Additionally, we show that a properly designed bootstrap method can be used to obtain confidence intervals for conducting statistical inference, together with its theoretical justification. Furthermore, with the estimates of counterfactual distribution functions, we provide methods to estimate quantile treatment effects and to test stochastic dominance relationships between the potential outcome distributions. Moreover, Monte Carlo simulations illustrate that the finite-sample performance of the proposed estimator is better than that of inverse propensity score weighted estimators in many scenarios. Finally, our empirical study revisits the effect of maternal smoking on infant birth weight. |
Keywords: | Counterfactual distribution function; Covariate balance; Quantile treatment effect; Stochastic dominance; Weighting scheme. |
JEL: | C01 C14 C54 |
Date: | 2024–10 |
URL: | https://d.repec.org/n?u=RePEc:kan:wpaper:202415 |
By: | Yanqin Fan; Yigit Okar; Xuetao Shi |
Abstract: | This article introduces an iterative distributed computing estimator for the multinomial logistic regression model with large choice sets. Compared to the maximum likelihood estimator, the proposed iterative distributed estimator achieves significantly faster computation and, when initialized with a consistent estimator, attains asymptotic efficiency under a weak dominance condition. Additionally, we propose a parametric bootstrap inference procedure based on the iterative distributed estimator and establish its consistency. Extensive simulation studies validate the effectiveness of the proposed methods and highlight the computational efficiency of the iterative distributed estimator. |
Date: | 2024–12 |
URL: | https://d.repec.org/n?u=RePEc:arx:papers:2412.01030 |
By: | Zhaomeng Chen; Junting Duan; Victor Chernozhukov; Vasilis Syrgkanis |
Abstract: | This paper proposes the automatic Doubly Robust Random Forest (DRRF) algorithm for estimating the conditional expectation of a moment functional in the presence of high-dimensional nuisance functions. DRRF combines the automatic debiasing framework based on the Riesz representer (Chernozhukov et al., 2022) with non-parametric, forest-based estimation methods for the conditional moment (Athey et al., 2019; Oprescu et al., 2019). In contrast to existing methods, DRRF does not require prior knowledge of the form of the debiasing term, nor does it impose restrictive parametric or semi-parametric assumptions on the target quantity. Additionally, it is computationally efficient for making predictions at multiple query points and significantly reduces runtime compared to methods such as Orthogonal Random Forest (Oprescu et al., 2019). We establish consistency and asymptotic normality of the DRRF estimator under general assumptions, allowing for the construction of valid confidence intervals. Through extensive simulations in heterogeneous treatment effect (HTE) estimation, we demonstrate the superior performance of DRRF over benchmark approaches in terms of estimation accuracy, robustness, and computational efficiency. |
Date: | 2024–12 |
URL: | https://d.repec.org/n?u=RePEc:arx:papers:2412.07184 |
By: | Pierluigi Vallarino (Erasmus University Rotterdam and Tinbergen Institute) |
Abstract: | This paper introduces the family of Dynamic Kernel models. These models study the predictive density function of a time series through a weighted average of kernel densities with a dynamic bandwidth. A general specification is presented and several particular models are studied in detail. We propose an M-estimator for the model parameters and derive its asymptotic properties under a misspecified setting. A consistent density estimator is also introduced. Monte Carlo results show that the new models effectively track the time-varying distribution of several data generating processes. Dynamic Kernel models outperform extant kernel-based approaches in tracking the predictive distribution of GDP growth. |
JEL: | C14 C51 C53 |
Date: | 2024–12–31 |
URL: | https://d.repec.org/n?u=RePEc:tin:wpaper:20240082 |
By: | Victor Chernozhukov; Whitney K. Newey; Vasilis Syrgkanis |
Abstract: | There are many nonparametric objects of interest that are a function of a conditional distribution. One important example is an average treatment effect conditional on a subset of covariates. Many of these objects have a conditional influence function that generalizes the classical influence function of a functional of an (unconditional) distribution. Conditional influence functions have important uses analogous to those of the classical influence function. They can be used to construct Neyman orthogonal estimating equations for conditional objects of interest that depend on high-dimensional regressions. They can be used to formulate local policy effects and describe the effect of local misspecification on conditional objects of interest. We derive conditional influence functions for functionals of conditional means and other features of the conditional distribution of an outcome variable. We show how these can be used for locally linear estimation of conditional objects of interest. We give rate conditions for first-step machine learners to have no effect on asymptotic distributions of locally linear estimators. We also give a general construction of Neyman orthogonal estimating equations for conditional objects of interest. |
Date: | 2024–12 |
URL: | https://d.repec.org/n?u=RePEc:arx:papers:2412.18080 |
By: | Stéphane Bonhomme; Koen Jochmans; Martin Weidner |
Abstract: | A popular approach to perform inference on a target parameter in the presence of nuisance parameters is to construct estimating equations that are orthogonal to the nuisance parameters, in the sense that their expected first derivative is zero. Such first-order orthogonalization may, however, not suffice when the nuisance parameters are very imprecisely estimated. Leading examples where this is the case are models for panel and network data that feature fixed effects. In this paper, we show how, in the conditional-likelihood setting, estimating equations can be constructed that are orthogonal to any chosen order. Combining these equations with sample splitting yields higher-order bias-corrected estimators of target parameters. In an empirical application we apply our method to a fixed-effect model of team production and obtain estimates of complementarity in production and impacts of counterfactual re-allocations. |
Date: | 2024–12 |
URL: | https://d.repec.org/n?u=RePEc:arx:papers:2412.10304 |
By: | Zhaoyang Shi; Chinmoy Bhattacharjee; Krishnakumar Balasubramanian; Wolfgang Polonik |
Abstract: | We establish Gaussian approximation bounds for covariate and rank-matching-based Average Treatment Effect (ATE) estimators. By analyzing these estimators through the lens of stabilization theory, we employ the Malliavin-Stein method to derive our results. Our bounds precisely quantify the impact of key problem parameters, including the number of matches and treatment balance, on the accuracy of the Gaussian approximation. Additionally, we develop multiplier bootstrap procedures to estimate the limiting distribution in a fully data-driven manner, and we leverage the derived Gaussian approximation results to further obtain bootstrap approximation bounds. Our work not only introduces a novel theoretical framework for commonly used ATE estimators, but also provides data-driven methods for constructing non-asymptotically valid confidence intervals. |
Date: | 2024–12 |
URL: | https://d.repec.org/n?u=RePEc:arx:papers:2412.17181 |
By: | Sung Hoon Choi; Donggyu Kim |
Abstract: | In this paper, we develop a novel method for predicting future large volatility matrices based on high-dimensional factor-based Itô processes. Several studies have proposed volatility matrix prediction methods using parametric models to account for volatility dynamics. However, these methods often impose restrictions, such as constant eigenvectors over time. To generalize the factor structure, we construct a cubic (order-3 tensor) form of an integrated volatility matrix process, which can be decomposed into low-rank tensor and idiosyncratic tensor components. To predict conditional expected large volatility matrices, we introduce the Projected Tensor Principal Orthogonal componEnt Thresholding (PT-POET) procedure and establish its asymptotic properties. Finally, the advantages of PT-POET are also verified by a simulation study and illustrated by applying minimum variance portfolio allocation using high-frequency trading data. |
Date: | 2024–12 |
URL: | https://d.repec.org/n?u=RePEc:arx:papers:2412.04293 |
By: | Dennis Lim; Wenjie Wang; Yichong Zhang |
Abstract: | Weak-identification-robust Anderson-Rubin (AR) tests for instrumental variable (IV) regressions are typically developed separately depending on whether the number of IVs is treated as fixed or increasing with the sample size. These tests rely on distinct test statistics and critical values. To apply them, researchers are forced to take a stance on the asymptotic behavior of the number of IVs, which can be ambiguous when the number is moderate. In this paper, we propose a bootstrap-based, dimension-agnostic AR test. By deriving strong approximations for the test statistic and its bootstrap counterpart, we show that our new test has a correct asymptotic size regardless of whether the number of IVs is fixed or increasing -- allowing, but not requiring, the number of IVs to exceed the sample size. We also analyze the power properties of the proposed uniformly valid test under both fixed and increasing numbers of IVs. |
Date: | 2024–12 |
URL: | https://d.repec.org/n?u=RePEc:arx:papers:2412.01603 |
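As background, the snippet below computes the classical Anderson-Rubin statistic for a fixed number of instruments and bootstraps its null distribution in a generic way (a stylized illustration of the AR idea only; it is not the paper's dimension-agnostic bootstrap, and the simulated design is hypothetical):

```python
import numpy as np

rng = np.random.default_rng(3)
n, k = 400, 10                              # sample size, number of instruments

# Simulated design with weak-ish instruments and an endogenous regressor.
Z = rng.normal(size=(n, k))
u = rng.normal(size=n)
v = 0.5 * u + rng.normal(size=n)
x = Z @ np.full(k, 0.1) + v
y = 1.0 * x + u                             # true beta = 1.0

def ar_stat(e):
    """AR F-statistic computed from the null-imposed residuals e = y - x * beta0."""
    Pe = Z @ np.linalg.lstsq(Z, e, rcond=None)[0]     # projection of e onto Z
    return (e @ Pe / k) / (np.sum((e - Pe) ** 2) / (n - k))

beta0 = 1.0
e0 = y - x * beta0
stat = ar_stat(e0)

# Simple i.i.d. residual bootstrap of the null distribution.
boot = np.array([ar_stat(rng.choice(e0 - e0.mean(), size=n, replace=True))
                 for _ in range(499)])
print(stat, np.mean(boot >= stat))          # statistic and bootstrap p-value
```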
By: | Phillip Heiler; Asbjørn Kaufmann; Bezirgen Veliyev |
Abstract: | This paper provides a solution to the evaluation of treatment effects in selective samples when neither instruments nor parametric assumptions are available. We provide sharp bounds for average treatment effects under a conditional monotonicity assumption for all principal strata, i.e. units characterizing the complete intensive and extensive margins. Most importantly, we allow for a large share of units whose selection is indifferent to treatment, e.g. due to non-compliance. The existence of such a population is crucially tied to the regularity of sharp population bounds and thus conventional asymptotic inference for methods such as Lee bounds can be misleading. It can be solved using smoothed outer identification regions for inference. We provide semiparametrically efficient debiased machine learning estimators for both regular and smooth bounds that can accommodate high-dimensional covariates and flexible functional forms. Our study of active labor market policy reveals the empirical prevalence of the aforementioned indifference population and supports results from previous impact analysis under much weaker assumptions. |
Date: | 2024–12 |
URL: | https://d.repec.org/n?u=RePEc:arx:papers:2412.11179 |
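As a baseline for the bounds discussed above, here is a Python sketch of the classical Lee (2009) trimming bounds under monotone selection (the paper relaxes exactly this kind of assumption and adds smoothed, debiased machine learning bounds, none of which is reproduced here; the simulated data are hypothetical):

```python
import numpy as np

def lee_bounds(d, s, y):
    """Classical Lee-type trimming bounds for the treatment effect on the
    always-selected, assuming treatment weakly increases selection.
    d: treatment (0/1), s: selection (0/1), y: outcome (used only where s == 1)."""
    p1, p0 = s[d == 1].mean(), s[d == 0].mean()
    q = (p1 - p0) / p1                       # share of treated-selected units to trim
    y1 = y[(d == 1) & (s == 1)]
    y0 = y[(d == 0) & (s == 1)]
    lo_cut, hi_cut = np.quantile(y1, [q, 1 - q])
    upper = y1[y1 >= lo_cut].mean() - y0.mean()   # trim lowest q share -> upper bound
    lower = y1[y1 <= hi_cut].mean() - y0.mean()   # trim highest q share -> lower bound
    return lower, upper

rng = np.random.default_rng(4)
n = 5000
d = rng.integers(0, 2, n)
s = (rng.uniform(size=n) < 0.5 + 0.2 * d).astype(int)    # monotone selection
y = 1.0 + 0.5 * d + rng.normal(size=n)
print(lee_bounds(d, s, y))
```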
By: | Eduardo Schirmer Finn; Eduardo Horta |
Abstract: | For highly skewed or fat-tailed distributions, mean- or median-based methods often fail to capture the central tendencies in the data. Despite being a viable alternative, estimating the conditional mode given certain covariates (or mode regression) presents significant challenges. Nonparametric approaches suffer from the "curse of dimensionality", while semiparametric strategies often lead to non-convex optimization problems. In order to avoid these issues, we propose a novel mode regression estimator that relies on an intermediate step of inverting the conditional quantile density. In contrast to existing approaches, we employ a convolution-type smoothed variant of quantile regression. Our estimator converges uniformly over the design points of the covariates and, unlike previous quantile-based mode regressions, is uniform with respect to the smoothing bandwidth. Additionally, the Convolution Mode Regression is dimension-free, avoids optimization issues, and preliminary simulations suggest that the estimator is normally distributed in finite samples. |
Date: | 2024–12 |
URL: | https://d.repec.org/n?u=RePEc:arx:papers:2412.05736 |
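The inversion idea can be illustrated with standard (not convolution-smoothed) quantile regression: the conditional density equals the reciprocal of the quantile density dQ(τ|x)/dτ, so the conditional mode sits at the τ that minimizes the quantile density. A rough Python sketch (illustrative only; the paper's estimator uses a convolution-smoothed quantile regression and comes with uniformity guarantees this snippet does not have):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(5)
n = 2000
x = rng.uniform(-1, 1, n)
# Skewed conditional distribution: mode differs from mean and median.
y = 1.0 + 2.0 * x + rng.gamma(shape=2.0, scale=1.0, size=n)
df = pd.DataFrame(dict(y=y, x=x))

taus = np.arange(0.05, 0.96, 0.02)
x0 = 0.5                                   # design point of interest
qhat = np.array([
    smf.quantreg("y ~ x", df).fit(q=t).predict(pd.DataFrame(dict(x=[x0])))[0]
    for t in taus
])

qdens = np.gradient(qhat, taus)            # quantile density dQ/dtau
w = 7
qdens_s = np.convolve(qdens, np.ones(w) / w, mode="valid")   # light smoothing
idx = np.argmin(qdens_s) + w // 2          # align back to the tau grid
mode_hat = qhat[idx]
print(mode_hat)   # true conditional mode at x0 is 1 + 2 * 0.5 + 1 = 3.0
```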
By: | Bora Kim |
Abstract: | In estimating spillover effects under network interference, practitioners often use linear regression with either the number or the fraction of treated neighbors as regressors. An often overlooked fact is that the latter is undefined for units without neighbors ("isolated nodes"). The common practice is to impute this fraction as zero for isolated nodes. Through theoretical derivations and simulations, this paper shows that such practice introduces bias. Causal interpretations of the commonly used spillover regression coefficients are also provided. |
Date: | 2024–12 |
URL: | https://d.repec.org/n?u=RePEc:arx:papers:2412.05919 |
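A compact Python illustration of the regressor in question and the common imputation the abstract criticizes (hypothetical simulated network; this does not reproduce the paper's bias derivations):

```python
import numpy as np

rng = np.random.default_rng(6)
n = 1000
# Sparse random network; some units end up with no neighbors ("isolated nodes").
A = (rng.uniform(size=(n, n)) < 3 / n).astype(int)
A = np.triu(A, 1)
A = A + A.T
deg = A.sum(axis=1)
d = rng.integers(0, 2, n)                     # randomized treatment

frac_treated_nbrs = np.full(n, np.nan)        # undefined where deg == 0
has_nbrs = deg > 0
frac_treated_nbrs[has_nbrs] = (A @ d)[has_nbrs] / deg[has_nbrs]

# Common practice: impute zero for isolated nodes before running the regression.
frac_imputed = np.where(has_nbrs, frac_treated_nbrs, 0.0)
print(f"{(~has_nbrs).mean():.1%} of units are isolated")
```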
By: | Jan Prüser |
Abstract: | We propose a large structural VAR which is identified by higher moments without the need to impose economically motivated restrictions. The model scales well to higher dimensions, allowing the inclusion of a larger number of variables. We develop an efficient Gibbs sampler to estimate the model. We also present an estimator of the deviance information criterion to facilitate model comparison. Finally, we discuss how economically motivated restrictions can be added to the model. Experiments with artificial data show that the model possesses good estimation properties. Using real data we highlight the benefits of including more variables in the structural analysis. Specifically, we identify a monetary policy shock and provide empirical evidence that prices and economic output respond with a large delay to the monetary policy shock. |
Date: | 2024–12 |
URL: | https://d.repec.org/n?u=RePEc:arx:papers:2412.17598 |
By: | Tae-Hwy Lee (Department of Economics, University of California Riverside); Daanish Padha (University of California, Riverside) |
Abstract: | We extend the Three-Pass Regression Filter (3PRF) in two key dimensions: (1) accommodating weak factors, and (2) allowing for a correlation between the target variable and the predictors, even after adjusting for common factors, driven by correlations in the idiosyncratic components of the covariates and the prediction target. Our theoretical contribution is to establish the consistency of 3PRF under these flexible assumptions, showing that relevant factors can be consistently estimated even when they are weak, albeit at slower rates. Stronger relevant factors improve 3PRF convergence to the infeasible best forecast, while weaker relevant factors dampen it. Conversely, stronger irrelevant factors hinder the rate of convergence, whereas weaker irrelevant factors enhance it. We compare 3PRF with Principal Component Regression (PCR), highlighting scenarios where 3PRF performs better. Methodologically, we extend 3PRF by integrating a LASSO step to develop the 3PRF-LASSO estimator, which effectively captures the target's dependence on the predictors' idiosyncratic components. We derive the rate at which the average prediction error from this step converges to zero, accounting for generated-regressor effects. Simulation results confirm that 3PRF performs well under these broad assumptions, with the LASSO step delivering a substantial gain. In an empirical application using the FRED-QD dataset, 3PRF-LASSO delivers reliable forecasts of key macroeconomic variables across multiple horizons. |
Keywords: | Weak factors; Forecasting; High dimension; Supervision; Three-pass regression filter; LASSO. |
JEL: | C18 C22 C53 C55 E27 |
Date: | 2025–01 |
URL: | https://d.repec.org/n?u=RePEc:ucr:wpaper:202502 |
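For orientation, a stylized Python sketch of the original three passes of the 3PRF (Kelly-Pruitt style, with the target itself as an "auto-proxy"); the paper's extensions, including the LASSO step on the idiosyncratic components, are not reproduced here, and the simulated design is hypothetical:

```python
import numpy as np

def three_prf(X, y, Z):
    """Three-pass regression filter. X: T x N predictors, y: length-T target,
    Z: T x L proxies. Returns a one-step-ahead forecast of y."""
    T, N = X.shape
    Zc = np.column_stack([np.ones(T), Z])
    # Pass 1: time-series regressions of each predictor on the proxies.
    Phi = np.linalg.lstsq(Zc, X, rcond=None)[0][1:, :].T        # N x L loadings
    Phic = np.column_stack([np.ones(N), Phi])
    # Pass 2: period-by-period cross-section regressions of X_t on the loadings.
    F = np.linalg.lstsq(Phic, X.T, rcond=None)[0][1:, :].T      # T x L factors
    # Pass 3: predictive regression of y_{t+1} on F_t.
    beta = np.linalg.lstsq(np.column_stack([np.ones(T - 1), F[:-1]]), y[1:], rcond=None)[0]
    return np.r_[1.0, F[-1]] @ beta

# Toy example: one persistent relevant factor plus noise; auto-proxy Z = y.
rng = np.random.default_rng(7)
T, N = 200, 100
f = np.zeros(T)
for t in range(1, T):
    f[t] = 0.7 * f[t - 1] + rng.normal()
X = np.outer(f, rng.normal(size=N)) + rng.normal(size=(T, N))
y = np.r_[0.0, 0.8 * f[:-1]] + 0.3 * rng.normal(size=T)
print(three_prf(X, y, y.reshape(-1, 1)))
```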
By: | Jinyuan Chang; Cheng Yong Tang; Yuanzheng Zhu |
Abstract: | In this study, we introduce a novel methodological framework called Bayesian Penalized Empirical Likelihood (BPEL), designed to address the computational challenges inherent in empirical likelihood (EL) approaches. Our approach has two primary objectives: (i) to enhance the inherent flexibility of EL in accommodating diverse model conditions, and (ii) to facilitate the use of well-established Markov Chain Monte Carlo (MCMC) sampling schemes as a convenient alternative to the complex optimization typically required for statistical inference using EL. To achieve the first objective, we propose a penalized approach that regularizes the Lagrange multipliers, significantly reducing the dimensionality of the problem while accommodating a comprehensive set of model conditions. For the second objective, our study designs and thoroughly investigates two popular sampling schemes within the BPEL context. We demonstrate that the BPEL framework is highly flexible and efficient, enhancing the adaptability and practicality of EL methods. Our study highlights the practical advantages of using sampling techniques over traditional optimization methods for EL problems, showing rapid convergence to the global optima of posterior distributions and ensuring the effective resolution of complex statistical inference challenges. |
Date: | 2024–12 |
URL: | https://d.repec.org/n?u=RePEc:arx:papers:2412.17354 |
By: | Abhimanyu Gupta; Jungyoon Lee; Francesca Rossi |
Abstract: | We propose a computationally straightforward test for the linearity of a spatial interaction function. Such functions arise commonly, either as practitioner-imposed specifications or due to optimizing behaviour by agents. Our test is nonparametric, but based on the Lagrange Multiplier principle and reminiscent of the Ramsey RESET approach. This entails estimation only under the null hypothesis, which yields an easy-to-estimate linear spatial autoregressive model. Monte Carlo simulations show excellent size control and power. An empirical study with Finnish data illustrates the test's practical usefulness, shedding light on debates about the presence of tax competition among neighbouring municipalities. |
Date: | 2024–12 |
URL: | https://d.repec.org/n?u=RePEc:arx:papers:2412.14778 |
By: | Yuta Okamoto; Yuuki Ozaki |
Abstract: | Regression discontinuity (RD) designs typically identify the treatment effect at a single cutoff point. But when and how can we learn about treatment effects away from the cutoff? This paper addresses this question within a multiple-cutoff RD framework. We begin by examining the plausibility of the constant bias assumption proposed by Cattaneo, Keele, Titiunik, and Vazquez-Bare (2021) through the lens of rational decision-making behavior, which suggests that a kind of similarity between groups and whether individuals can influence the running variable are important factors. We then introduce an alternative set of assumptions and propose a broadly applicable partial identification strategy. The potential applicability and usefulness of the proposed bounds are illustrated through two empirical examples. |
Date: | 2024–12 |
URL: | https://d.repec.org/n?u=RePEc:arx:papers:2412.04265 |
By: | Giuseppe Buccheri; Fulvio Corsi; Emilija Dzuverovic |
Abstract: | We show that, for a certain class of scaling matrices including the commonly used inverse square-root of the conditional Fisher information, score-driven factor models are identifiable up to a multiplicative scalar constant under very mild restrictions. This result has no analogue in parameter-driven models, as it exploits the different structure of the score-driven factor dynamics. Consequently, score-driven models offer a clear advantage in terms of economic interpretability compared to parameter-driven factor models, which are identifiable only up to orthogonal transformations. Our restrictions are order-invariant and can be generalized to score-driven factor models with dynamic loadings and nonlinear factor models. We test the identification strategy extensively using simulated and real data. The empirical analysis on financial and macroeconomic data reveals a substantial increase in log-likelihood ratios and significantly improved out-of-sample forecast performance when switching from the classical restrictions adopted in the literature to our more flexible specifications. |
Date: | 2024–12 |
URL: | https://d.repec.org/n?u=RePEc:arx:papers:2412.01367 |
By: | Andrea Bucci; Michele Palma; Chao Zhang |
Abstract: | Traditional methods employed in matrix volatility forecasting often overlook the inherent Riemannian manifold structure of symmetric positive definite matrices, treating them as elements of Euclidean space, which can lead to suboptimal predictive performance. Moreover, they often struggle to handle high-dimensional matrices. In this paper, we propose a novel approach for forecasting realized covariance matrices of asset returns using a Riemannian-geometry-aware deep learning framework. In this way, we account for the geometric properties of the covariance matrices, including possible non-linear dynamics and efficient handling of high dimensionality. Moreover, building upon a Fréchet sample mean of realized covariance matrices, we are able to extend the HAR model to the matrix-variate setting. We demonstrate the efficacy of our approach using daily realized covariance matrices for the 50 most capitalized companies in the S&P 500 index, showing that our method outperforms traditional approaches in terms of predictive accuracy. |
Date: | 2024–12 |
URL: | https://d.repec.org/n?u=RePEc:arx:papers:2412.09517 |
By: | Beatrice Foroni; Luca Merlo; Lea Petrella |
Abstract: | In this paper we develop a novel hidden Markov graphical model to investigate time-varying interconnectedness between different financial markets. To identify conditional correlation structures under varying market conditions and accommodate stylized facts embedded in financial time series, we rely upon the generalized hyperbolic family of distributions with time-dependent parameters evolving according to a latent Markov chain. We exploit its location-scale mixture representation to build a penalized EM algorithm for estimating the state-specific sparse precision matrices by means of an $L_1$ penalty. The proposed approach leads to regime-specific conditional correlation graphs that allow us to identify different degrees of network connectivity of returns over time. The methodology's effectiveness is validated through simulation exercises under different scenarios. In the empirical analysis we apply our model to daily returns of a large set of market indexes, cryptocurrencies and commodity futures over the period 2017-2023. |
Date: | 2024–12 |
URL: | https://d.repec.org/n?u=RePEc:arx:papers:2412.03668 |
By: | Peder Isager (Oslo New University College); Jack Fitzgerald (Vrije Universiteit Amsterdam and Tinbergen Institute) |
Abstract: | Researchers may want to know whether an observed statistical relationship is either meaningfully negative, meaningfully positive, or small enough to be considered practically equivalent to zero. Such a question cannot be addressed with standard null hypothesis significance testing, nor with standard equivalence testing. Three-sided testing (TST) is a procedure to address such questions by simultaneously testing whether an estimated relationship is significantly below, within, or above predetermined smallest effect sizes of interest. TST is a natural extension of the standard two one-sided tests (TOST) procedure for equivalence testing. TST offers a more comprehensive decision framework than TOST with no penalty to error rates or statistical power. In this paper, we give a non-technical introduction to TST, provide commands for conducting TST in R, Jamovi, and Stata, and provide a Shiny app for easy implementation. Whenever a meaningful smallest effect size of interest can be specified, TST should be combined with null hypothesis significance testing as a standard frequentist testing procedure. |
Keywords: | Three-sided testing, equivalence testing, interval testing, hypothesis testing, R, Stata, NHST, effect size |
JEL: | C12 C18 C87 |
Date: | 2024–12–20 |
URL: | https://d.repec.org/n?u=RePEc:tin:wpaper:20240077 |
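A minimal t-based sketch of the three-sided logic in Python (our own stylized version, not the authors' R, Jamovi, or Stata commands or Shiny app; the numerical example is hypothetical):

```python
from scipy import stats

def three_sided_test(est, se, df, lo, hi, alpha=0.05):
    """Test whether an effect is meaningfully negative (below lo), practically
    equivalent to zero (inside [lo, hi]), or meaningfully positive (above hi)."""
    t_lo = (est - lo) / se
    t_hi = (est - hi) / se
    p_below = stats.t.cdf(t_lo, df)            # H0: theta >= lo  vs  theta < lo
    p_above = stats.t.sf(t_hi, df)             # H0: theta <= hi  vs  theta > hi
    p_equiv = max(stats.t.sf(t_lo, df),        # TOST: reject both one-sided nulls
                  stats.t.cdf(t_hi, df))
    decision = "inconclusive"
    if p_below < alpha:
        decision = "meaningfully negative"
    elif p_above < alpha:
        decision = "meaningfully positive"
    elif p_equiv < alpha:
        decision = "practically equivalent to zero"
    return dict(p_below=p_below, p_equiv=p_equiv, p_above=p_above, decision=decision)

# Example: estimate 0.03 (SE 0.02) against a smallest effect size of interest of 0.1.
print(three_sided_test(est=0.03, se=0.02, df=98, lo=-0.1, hi=0.1))
```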
By: | Kexin Zhang (City University of Hong Kong); Simon Trimborn (University of Amsterdam and Tinbergen Institute) |
Abstract: | When a company releases earnings results or makes announcements, a dominant sector-wide lead-lag effect from that stock on the entire system may occur. To improve the estimation of a system experiencing dominant system-wide lead-lag effects from one or a few assets when only short time series are available, we introduce a model for Large-scale Influencer Structures in Vector AutoRegressions (LISAR). To investigate its performance when few observations are available, we compare the LISAR model against competing models on synthetic data, showing that LISAR outperforms in forecasting accuracy and structural detection even for different strengths of system persistence and when the model is misspecified. On high-frequency data for the constituents of the S&P100, separated by sectors, we find the LISAR model to significantly outperform, or perform equally well as, the competing models for up to 91% of the time series under consideration in terms of forecasting accuracy. We show in this study that, in the presence of influencer structures within a sector, the LISAR model, compared to alternative models, provides higher accuracy, better forecasting results, and an improved understanding of market movements and sectoral structures. |
Date: | 2024–12–20 |
URL: | https://d.repec.org/n?u=RePEc:tin:wpaper:20240080 |
By: | Dobrislav Dobrev; Pawel J. Szerszen |
Abstract: | Replacing faulty measurements with missing values can suppress outlier-induced distortions in state-space inference. We therefore put forward two complementary methods for enhanced outlier-robust filtering and forecasting: supervised missing data substitution (MD) upon exceeding a Huber threshold, and unsupervised missing data substitution via exogenous randomization (RMDX). Our supervised method, MD, is designed to improve performance of existing Huber-based linear filters known to lose optimality when outliers of the same sign are clustered in time rather than arriving independently. The unsupervised method, RMDX, further aims to suppress smaller outliers whose size may fall below the Huber detection threshold. To this end, RMDX averages filtered or forecasted targets based on measurement series with randomly induced subsets of missing data at an exogenously set randomization rate. This gives rise to regularization and a bias-variance trade-off as a function of the missing data randomization rate, which can be set optimally using standard cross-validation techniques. We validate through Monte Carlo simulations that both methods for missing data substitution can significantly improve robust filtering, especially when combined together. As further empirical validation, we document consistently attractive performance in linear models for forecasting inflation trends prone to clustering of measurement outliers. |
Keywords: | Kalman filter; Outliers; Huberization; Missing data; Randomization |
JEL: | C15 C22 C53 E37 |
Date: | 2025–01–03 |
URL: | https://d.repec.org/n?u=RePEc:fip:fedgfe:2025-01 |
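To make the supervised MD idea concrete, here is a stylized local-level Kalman filter in Python that treats an observation as missing whenever its standardized innovation exceeds a Huber-type threshold (our simplification; the unsupervised RMDX randomization and the forecasting exercises are not reproduced, and the simulated series is hypothetical):

```python
import numpy as np

def local_level_filter_md(y, q, r, huber_c=2.5):
    """Local-level Kalman filter with missing-data substitution for outliers.
    y: observations (np.nan allowed), q: state noise variance, r: measurement
    noise variance, huber_c: threshold for the standardized innovation."""
    n = len(y)
    x_filt = np.zeros(n)
    P_filt = np.zeros(n)
    x_prev, P_prev = 0.0, 1e6                      # diffuse-ish initialization
    for t in range(n):
        x_pred, P_pred = x_prev, P_prev + q        # prediction (random-walk state)
        S = P_pred + r
        innov = y[t] - x_pred
        if np.isnan(y[t]) or abs(innov) / np.sqrt(S) > huber_c:
            x_filt[t], P_filt[t] = x_pred, P_pred  # treat as missing: skip update
        else:
            K = P_pred / S
            x_filt[t], P_filt[t] = x_pred + K * innov, (1 - K) * P_pred
        x_prev, P_prev = x_filt[t], P_filt[t]
    return x_filt, P_filt

# Toy series: smooth trend, moderate noise, a handful of large outliers.
rng = np.random.default_rng(8)
n = 300
trend = np.cumsum(0.1 * rng.normal(size=n))
y = trend + rng.normal(scale=0.5, size=n)
y[rng.choice(n, 10, replace=False)] += 8.0
xf, _ = local_level_filter_md(y, q=0.01, r=0.25)
print(np.mean(np.abs(xf - trend)))                 # mean absolute filtering error
```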
By: | Kazuki Tomioka; Thomas T. Yang; Xibin Zhang |
Abstract: | Stochastic frontier models have attracted significant interest over the years due to their unique feature of including a distinct inefficiency term alongside the usual error term. To effectively separate these two components, strong distributional assumptions are often necessary. To overcome this limitation, numerous studies have sought to relax or generalize these models for more robust estimation. In line with these efforts, we introduce a latent group structure that accommodates heterogeneity across firms, addressing not only the stochastic frontiers but also the distribution of the inefficiency term. This framework accounts for the distinctive features of stochastic frontier models, and we propose a practical estimation procedure to implement it. Simulation studies demonstrate the strong performance of our proposed method, which is further illustrated through an application to study the cost efficiency of the U.S. commercial banking sector. |
Date: | 2024–12 |
URL: | https://d.repec.org/n?u=RePEc:arx:papers:2412.08831 |
By: | Gianluca Cubadda |
Abstract: | The main aim of this paper is to review recent advances in the multivariate autoregressive index model [MAI], originally proposed by Reinsel (1983), and their applications to economic and financial time series. The MAI has recently gained momentum because it can be seen as a link between two popular but distinct multivariate time series approaches: vector autoregressive modeling [VAR] and the dynamic factor model [DFM]. Indeed, on the one hand, the MAI is a VAR model with a peculiar reduced-rank structure; on the other hand, it allows for identification of common components and common shocks in a similar way as the DFM. The focus is on recent developments of the MAI, which include extending the original model with individual autoregressive structures, stochastic volatility, time-varying parameters, high-dimensionality, and cointegration. In addition, new insights on previous contributions and a novel model are also provided. |
Date: | 2024–12 |
URL: | https://d.repec.org/n?u=RePEc:arx:papers:2412.11278 |
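For readers unfamiliar with the MAI, its reduced-rank structure can be written compactly as (our notation)

\[ y_t = \sum_{j=1}^{p} B_j\,\big(\omega' y_{t-j}\big) + \varepsilon_t, \qquad f_t = \omega' y_t, \]

where y_t is n x 1, ω is an n x q matrix of index weights with q much smaller than n, and each B_j is n x q, so that every lag coefficient matrix has the reduced-rank form A_j = B_j ω'. The q indexes f_t play a role analogous to common factors in a DFM, which is the link the abstract refers to.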
By: | Miguel C. Herculano; Santiago Montoya-Blandón |
Abstract: | We develop a probabilistic variant of Partial Least Squares (PLS) that we call Probabilistic Targeted Factor Analysis (PTFA), which can be used to extract common factors in predictors that are useful for predicting a set of predetermined target variables. Along with the technique, we provide an efficient expectation-maximization (EM) algorithm to learn the parameters and forecast the targets of interest. We develop a number of extensions to missing-at-random data, stochastic volatility, and mixed-frequency data for real-time forecasting. In a simulation exercise, we show that PTFA outperforms PLS at recovering the common underlying factors affecting both features and target variables, delivering better in-sample fit and providing valid forecasts under contamination such as measurement error or outliers. Finally, we provide two applications in economics and finance where PTFA performs competitively compared with PLS and Principal Component Analysis (PCA) at out-of-sample forecasting. |
Date: | 2024–12 |
URL: | https://d.repec.org/n?u=RePEc:arx:papers:2412.06688 |
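A stylized version of the kind of generative model that underlies a probabilistic, targeted PLS-type factor method (notation and simplifications are ours; the paper adds extensions for missing data, stochastic volatility, and mixed frequencies):

\[ x_t = P f_t + e_t, \qquad y_t = Q f_t + u_t, \qquad f_t \sim \mathcal{N}(0, I_k), \]

with Gaussian idiosyncratic errors e_t and u_t. Treating the factors f_t as latent, the loadings and error variances can be estimated by an EM algorithm, and forecasts of the targets are formed from the estimated Q times E[f_t | x_t]. Because both the features x_t and the targets y_t load on the same factors, the extraction is "targeted", in contrast to PCA on x_t alone.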
By: | Kees Jan van Garderen; Noud van Giersbergen |
Abstract: | Mediation analysis is a form of causal inference that investigates indirect effects and causal mechanisms. Confidence intervals for indirect effects play a central role in conducting inference. The problem is non-standard, leading to coverage rates that deviate considerably from their nominal level. The default inference method in the mediation model is the paired bootstrap, which resamples directly from the observed data. However, a residual bootstrap that explicitly exploits the assumed causal structure (X->M->Y) could also be applied. There is also a debate whether the bias-corrected (BC) bootstrap method is superior to the percentile method, with the former showing liberal behavior (actual coverage too low) in certain circumstances. Moreover, bootstrap methods tend to be very conservative (coverage higher than required) when mediation effects are small. Finally, iterated bootstrap methods like the double bootstrap have not been considered due to their high computational demands. We investigate these issues in the simple mediation model by a large-scale simulation. Results are explained using graphical methods and the newly derived finite-sample distribution. The main findings are: (i) the conservative behavior of the bootstrap is caused by the extreme dependence of the bootstrap distribution's shape on the estimated coefficients; (ii) this dependence makes the correction applied by the double bootstrap counterproductive. The added randomness of the BC method inflates the coverage in the absence of mediation, but still leads to (invalid) liberal inference when the mediation effect is small. |
Date: | 2024–12 |
URL: | https://d.repec.org/n?u=RePEc:arx:papers:2412.11285 |
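For concreteness, a minimal Python sketch of the default paired percentile bootstrap for the indirect effect a*b in the simple mediation model X -> M -> Y (an illustration only; the paper also studies residual, bias-corrected, and double bootstrap variants, which are not implemented here, and the simulated data are hypothetical):

```python
import numpy as np
import statsmodels.api as sm

def indirect_effect(x, m, y):
    """a*b indirect effect: a from M ~ X, b from Y ~ M + X."""
    a = sm.OLS(m, sm.add_constant(x)).fit().params[1]
    b = sm.OLS(y, sm.add_constant(np.column_stack([m, x]))).fit().params[1]
    return a * b

def paired_percentile_ci(x, m, y, n_boot=2000, alpha=0.05, seed=0):
    rng = np.random.default_rng(seed)
    n = len(x)
    draws = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, n)            # resample rows (paired bootstrap)
        draws[b] = indirect_effect(x[idx], m[idx], y[idx])
    return np.quantile(draws, [alpha / 2, 1 - alpha / 2])

rng = np.random.default_rng(9)
n = 200
x = rng.normal(size=n)
m = 0.4 * x + rng.normal(size=n)
y = 0.5 * m + 0.2 * x + rng.normal(size=n)
print(indirect_effect(x, m, y), paired_percentile_ci(x, m, y))
```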
By: | James Bland; Yaroslav Rosokha |
Abstract: | Estimation of belief learning models relies on several important assumptions regarding measurement errors. Whereas existing work has focused on classical measurement errors, the current paper is the first to investigate the impact of a non-classical, behavioral measurement error—rounding bias. In particular, we design and carry out a novel economics experiment in conjunction with simulations and a meta-study of existing papers to show a strong impact of rounding bias on belief updating. In addition, we propose an econometric technique to aid researchers in overcoming challenges posed by the rounded responses in belief elicitation questions. |
Keywords: | Rounding Bias, Measurement Errors, Bayesian Updating, Belief Updating, Learning, Conservatism, Base-Rate Neglect, Econometrics, Hierarchical Bayesian Models |
Date: | 2024–10 |
URL: | https://d.repec.org/n?u=RePEc:pur:prukra:1353 |
By: | Tao Sun |
Abstract: | This paper proposes a Bayesian factor-augmented bundle choice model to estimate joint consumption as well as the substitutability and complementarity of multiple goods in the presence of endogenous regressors. The model extends the two primary treatments of endogeneity in existing bundle choice models: (1) endogenous market-level prices and (2) time-invariant unobserved individual heterogeneity. A Bayesian sparse factor approach is employed to capture high-dimensional error correlations that induce taste correlation and endogeneity. Time-varying factor loadings allow for more general individual-level and time-varying heterogeneity and endogeneity, while the sparsity induced by the shrinkage prior on loadings balances flexibility with parsimony. Applied to a soda tax in the context of complementarities, the new approach captures broader effects of the tax that were previously overlooked. Results suggest that a soda tax could yield additional health benefits by marginally decreasing the consumption of salty snacks along with sugary drinks, extending the health benefits beyond the reduction in sugar consumption alone. |
Date: | 2024–12 |
URL: | https://d.repec.org/n?u=RePEc:arx:papers:2412.05794 |
By: | Sourav Majumdar; Arnab Kumar Laha |
Abstract: | We propose analytically tractable SDE models for correlation in financial markets. We study diffusions on the circle, namely the Brownian motion on the circle and the von Mises process, and consider these as models for correlation. The von Mises process was proposed in Kent (1975) as a probabilistic justification for the von Mises distribution, which is widely used in circular statistics. The transition density of the von Mises process has been unknown; we identify an approximate analytic transition density for the von Mises process. We discuss the estimation of these diffusion models and a stochastic correlation model in finance. We illustrate the application of the proposed model on real data for equity-currency pairs. |
Date: | 2024–12 |
URL: | https://d.repec.org/n?u=RePEc:arx:papers:2412.06343 |
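As orientation only (the paper's exact parametrization may differ), a Langevin-type diffusion on the circle whose stationary law is the von Mises distribution can be written as

\[ d\theta_t = -\tfrac{\sigma^2 \kappa}{2}\,\sin(\theta_t - \mu)\,dt + \sigma\, dW_t \pmod{2\pi}, \]

with stationary density proportional to exp{κ cos(θ - μ)}; setting κ = 0 gives Brownian motion on the circle. A correlation process can then be obtained, for example, by the mapping ρ_t = cos(θ_t), which keeps ρ_t in [-1, 1] by construction (one natural choice; the paper may use a different mapping).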
By: | Benedikt M. Pötscher; David Preinerstorfer |
Abstract: | We revisit size controllability results in Pötscher and Preinerstorfer (2021) concerning heteroskedasticity-robust test statistics in regression models. For the special, but important, case of testing a single restriction (e.g., a zero restriction on a single coefficient), we provide a necessary and sufficient condition for size controllability, whereas the condition in Pötscher and Preinerstorfer (2021) is, in general, only sufficient (even in the case of testing a single restriction). |
Date: | 2024–12 |
URL: | https://d.repec.org/n?u=RePEc:arx:papers:2412.17470 |
By: | Neil Christy; Amanda Ellen Kowalski |
Abstract: | We present a design-based model of a randomized experiment in which the observed outcomes are informative about the joint distribution of potential outcomes within the experimental sample. We derive a likelihood function that maintains curvature with respect to the joint distribution of potential outcomes, even when holding the marginal distributions of potential outcomes constant -- curvature that is not maintained in a sampling-based likelihood that imposes a large sample assumption. Our proposed decision rule guesses the joint distribution of potential outcomes in the sample as the distribution that maximizes the likelihood. We show that this decision rule is Bayes optimal under a uniform prior. Our optimal decision rule differs from and significantly outperforms a "monotonicity" decision rule that assumes no defiers or no compliers. In sample sizes ranging from 2 to 40, we show that the Bayes expected utility of the optimal rule increases relative to the monotonicity rule as the sample size increases. In two experiments in health care, we show that the joint distribution of potential outcomes that maximizes the likelihood need not include compliers even when the average outcome in the intervention group exceeds the average outcome in the control group, and that the maximizer of the likelihood may include both compliers and defiers, even when the average intervention effect is large and statistically significant. |
Date: | 2024–12 |
URL: | https://d.repec.org/n?u=RePEc:arx:papers:2412.16352 |
By: | Markus Trunschke; Kenneth L. Judd |
Abstract: | This paper develops a novel method to estimate production functions. Earlier papers rely on special assumptions about the functional form of production functions. Our approach efficiently estimates all parameters of any production function with Hicks-neutral productivity without additional exogenous variables or sources of variation in flexible input demand. We provide Monte Carlo simulation evidence of our method’s performance and test our approach on empirical data from Chilean and Colombian manufacturing industries. |
JEL: | C02 D24 L0 |
Date: | 2024–11 |
URL: | https://d.repec.org/n?u=RePEc:nbr:nberwo:33205 |
By: | Dou, Liyu (School of Economics, Singapore Management University); Ho, Paul (FRB); Lubik, Thomas A. (FRB) |
Abstract: | Max-share identification relies on a decomposition of the forecast error variance (FEV) over a target horizon. Consequently, it often conflates multiple shocks because the contribution to the FEV depends on the impulse responses at untargeted horizons and the shapes of the responses to untargeted shocks. We alleviate these issues using a so-called “single horizon” alternative that focuses narrowly on the actual target horizon. We characterize the identified shock in terms of true structural shocks in the single-horizon problem and show that it typically bounds results obtained under the literature’s usual implementation. Using a numerical demand and supply example and an empirical news shock application, we show that the traditional max-share approach inadvertently places weight on untargeted transitory shocks, a problem that the single-horizon approach avoids. |
Date: | 2024–09–01 |
URL: | https://d.repec.org/n?u=RePEc:ris:smuesw:2024_013 |
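In stylized notation (ours, meant only to fix ideas about the abstract above), with reduced-form MA representation y_t = Σ_h C_h u_{t-h} and u_t = B ε_t, the two objectives for target variable i and target horizon H can be written roughly as

\[ \max_{\|q\|=1} \sum_{h=0}^{H} \big(e_i' C_h B q\big)^2 \qquad \text{versus} \qquad \max_{\|q\|=1} \big(e_i' C_H B q\big)^2 . \]

The first, traditional max-share objective sums responses over all horizons up to H, which is how untargeted horizons and untargeted (possibly transitory) shocks can shape the identified direction q; the single-horizon version on the right depends only on the response at the actual target horizon.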
By: | Lin William Cong; Tengyuan Liang; Xiao Zhang; Wu Zhu |
Abstract: | We introduce a general approach for analyzing large-scale text-based data, combining the strengths of neural network language processing and generative statistical modeling to create a factor structure of unstructured data for downstream regressions typically used in social sciences. We generate textual factors by (i) representing texts using vector word embedding, (ii) clustering the vectors using Locality-Sensitive Hashing to generate supports of topics, and (iii) identifying relatively interpretable spanning clusters (i.e., textual factors) through topic modeling. Our data-driven approach captures complex linguistic structures while ensuring computational scalability and economic interpretability, plausibly attaining certain advantages over and complementing other unstructured data analytics used by researchers, including emergent large language models. We conduct initial validation tests of the framework and discuss three types of its applications: (i) enhancing prediction and inference with texts, (ii) interpreting (non-text-based) models, and (iii) constructing new text-based metrics and explanatory variables. We illustrate each of these applications using examples in finance and economics such as macroeconomic forecasting from news articles, interpreting multi-factor asset pricing models from corporate filings, and measuring theme-based technology breakthroughs from patents. Finally, we provide a flexible statistical package of textual factors for online distribution to facilitate future research and applications. |
JEL: | C13 |
Date: | 2024–11 |
URL: | https://d.repec.org/n?u=RePEc:nbr:nberwo:33168 |
By: | Jens Ludwig; Sendhil Mullainathan; Ashesh Rambachan |
Abstract: | Large language models (LLMs) are being used in economics research to form predictions, label text, simulate human responses, generate hypotheses, and even produce data for times and places where such data don't exist. While these uses are creative, are they valid? When can we abstract away from the inner workings of an LLM and simply rely on their outputs? We develop an econometric framework to answer this question. Our framework distinguishes between two types of empirical tasks. Using LLM outputs for prediction problems (including hypothesis generation) is valid under one condition: no "leakage" between the LLM's training dataset and the researcher's sample. Using LLM outputs for estimation problems to automate the measurement of some economic concept (expressed by some text or from human subjects) requires an additional assumption: LLM outputs must be as good as the gold standard measurements they replace. Otherwise estimates can be biased, even if LLM outputs are highly accurate but not perfectly so. We document the extent to which these conditions are violated and the implications for research findings in illustrative applications to finance and political economy. We also provide guidance to empirical researchers. The only way to ensure no training leakage is to use open-source LLMs with documented training data and published weights. The only way to deal with LLM measurement error is to collect validation data and model the error structure. A corollary is that if such conditions can't be met for a candidate LLM application, our strong advice is: don't. |
Date: | 2024–12 |
URL: | https://d.repec.org/n?u=RePEc:arx:papers:2412.07031 |
By: | Niko Hauzenberger; Florian Huber; Karin Klieber; Massimiliano Marcellino |
Abstract: | We propose a method to learn the nonlinear impulse responses to structural shocks using neural networks, and apply it to uncover the effects of US financial shocks. The results reveal substantial asymmetries with respect to the sign of the shock. Adverse financial shocks have powerful effects on the US economy, while benign shocks trigger much smaller reactions. Instead, with respect to the size of the shocks, we find no discernible asymmetries. |
Date: | 2024–12 |
URL: | https://d.repec.org/n?u=RePEc:arx:papers:2412.07649 |