on Econometrics |
By: | Yihong Xu; Li Zheng |
Abstract: | We introduce a novel estimator for quantile causal effects with high-dimensional panel data (large $N$ and $T$), where only one or a few units are affected by the intervention or policy. Our method extends the generalized synthetic control method (Xu 2017) from the average treatment effect on the treated to the quantile treatment effect on the treated, allowing the underlying factor structure to change across quantiles of the outcome distribution of interest. Our method involves estimating the quantile-dependent factors using the control group, followed by a quantile regression to estimate the quantile treatment effect using the treated units. We establish the asymptotic properties of our estimator and propose a bootstrap procedure for statistical inference, supported by simulation studies. An empirical application to the 2008 China Stimulus Program is provided. |
Date: | 2025–04 |
URL: | https://d.repec.org/n?u=RePEc:arx:papers:2504.00785 |
By: | Alberto Abadie; Anish Agarwal; Devavrat Shah |
Abstract: | We propose a formal model for counterfactual estimation with unobserved confounding in "data-rich" settings, i.e., where there are a large number of units and a large number of measurements per unit. Our model provides a bridge between the structural causal model view of causal inference common in the graphical models literature with that of the latent factor model view common in the potential outcomes literature. We show how classic models for potential outcomes and treatment assignments fit within our framework. We provide an identification argument for the average treatment effect, the average treatment effect on the treated, and the average treatment effect on the untreated. For any estimator that has a fast enough estimation error rate for a certain nuisance parameter, we establish it is consistent for these various causal parameters. We then show principal component regression is one such estimator that leads to consistent estimation, and we analyze the minimal smoothness required of the potential outcomes function for consistency. |
Date: | 2025–04 |
URL: | https://d.repec.org/n?u=RePEc:arx:papers:2504.01702 |
By: | Monika Avila Marquez |
Abstract: | A triangular structural panel data model with additive separable individual-specific effects is used to model the causal effect of a covariate on an outcome variable when there are unobservable confounders, some of them time-invariant. In this setup, a linear reduced-form equation might be problematic when the conditional mean of the endogenous covariate given the instrumental variables is nonlinear, because ignoring the nonlinearity could lead to weak instruments. As a solution, we propose a triangular simultaneous equation model for panel data with additive separable individual-specific fixed effects, composed of a linear structural equation and a nonlinear reduced-form equation. The parameter of interest is the structural parameter of the endogenous variable. Identification of this parameter is obtained under the assumption of available exclusion restrictions and using a control function approach. The parameter is estimated with an estimator that we call the Super Learner Control Function estimator (SLCFE). The estimation procedure is composed of two main steps and relies on sample splitting across the individual dimension: first, we estimate the control function with a super learner; we then use the estimated control function to control for endogeneity in the structural equation. We perform a Monte Carlo simulation to test the performance of the proposed estimators and conclude that the Super Learner Control Function estimators significantly outperform Within 2SLS estimators. |
Date: | 2025–04 |
URL: | https://d.repec.org/n?u=RePEc:arx:papers:2504.03228 |
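The two-step procedure described above can be sketched in a few lines. The simulation below is an illustrative stand-in, not the paper's SLCFE: a polynomial fit replaces the super learner, the data are a simple cross-section rather than a panel, and the cross-fitting split is over observations.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4000
z = rng.uniform(-2, 2, n)                       # instrument
u = rng.normal(size=n)                          # unobserved confounder
x = np.sin(2 * z) + z**2 / 4 + u + 0.3 * rng.normal(size=n)  # nonlinear first stage
y = 1.5 * x + 2 * u + rng.normal(size=n)        # structural equation, true beta = 1.5

# Step 1 (cross-fitted): flexible first stage; the residual is the control function
half = n // 2
v = np.empty(n)
for train, test in [(slice(0, half), slice(half, n)), (slice(half, n), slice(0, half))]:
    coefs = np.polyfit(z[train], x[train], 7)   # stand-in for the super learner
    v[test] = x[test] - np.polyval(coefs, z[test])

# Step 2: regress y on x and the estimated control function v
X = np.column_stack([np.ones(n), x, v])
beta_cf = np.linalg.lstsq(X, y, rcond=None)[0][1]

# Naive OLS for comparison (biased by the confounder)
beta_ols = np.linalg.lstsq(np.column_stack([np.ones(n), x]), y, rcond=None)[0][1]
```

With this design, the naive OLS slope is pushed well above the true value of 1.5, while controlling for the first-stage residual removes the endogeneity bias.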
By: | Qin Fang; Shaojun Guo; Yang Hong; Xinghao Qiao |
Abstract: | Empirical likelihood serves as a powerful tool for constructing confidence intervals in nonparametric regression and regression discontinuity designs (RDD). The original empirical likelihood framework can be naturally extended to these settings using local linear smoothers, with Wilks' theorem holding only when an undersmoothed bandwidth is selected. However, the generalization of bias-corrected versions of empirical likelihood under more realistic conditions is non-trivial and has remained an open challenge in the literature. This paper provides a satisfactory solution by proposing a novel approach, referred to as robust empirical likelihood, designed for nonparametric regression and RDD. The core idea is to construct robust weights which simultaneously achieve bias correction and account for the additional variability introduced by the estimated bias, thereby enabling valid confidence interval construction without extra estimation steps involved. We demonstrate that the Wilks' phenomenon still holds under weaker conditions in nonparametric regression, sharp and fuzzy RDD settings. Extensive simulation studies confirm the effectiveness of our proposed approach, showing superior performance over existing methods in terms of coverage probabilities and interval lengths. Moreover, the proposed procedure exhibits robustness to bandwidth selection, making it a flexible and reliable tool for empirical analyses. The practical usefulness is further illustrated through applications to two real datasets. |
Date: | 2025–04 |
URL: | https://d.repec.org/n?u=RePEc:arx:papers:2504.01535 |
By: | Paulo M.M. Rodrigues; Vivien Less; Philipp Sibbertsen |
Abstract: | This paper focuses on the estimation and testing of multiple breaks that occur at unknown dates in multivariate long memory time series regression models, allowing for fractional cointegration. A likelihood-ratio based approach for estimating the breaks in the parameters and in the covariance of a system of long memory time series regressions is proposed. The limiting distributions as well as the consistency of the estimators are derived. Furthermore, a testing procedure to determine the unknown number of breaks is introduced which is based on iterative testing on the regression residuals. A Monte Carlo exercise shows the good finite sample properties of our novel approach, and empirical applications on inflation series of France and Germany and on benchmark government bonds of eight euro area countries illustrate the usefulness of the proposed procedures. |
JEL: | C12 C22 C58 G15 |
Date: | 2025 |
URL: | https://d.repec.org/n?u=RePEc:ptu:wpaper:w202503 |
By: | Haoze Hou; Wei Huang; Zheng Zhang |
Abstract: | This paper studies the non-parametric estimation and uniform inference for the conditional quantile regression function (CQRF) with covariates exposed to measurement errors. We consider the case that the distribution of the measurement error is unknown and allowed to be either ordinary or super smooth. We estimate the density of the measurement error by the repeated measurements and propose the deconvolution kernel estimator for the CQRF. We derive the uniform Bahadur representation of the proposed estimator and construct uniform confidence bands for the CQRF, valid uniformly over all covariates and a set of quantile indices, and establish the theoretical validity of the proposed inference. A data-driven approach for selecting the tuning parameter is also included. Monte Carlo simulations and a real data application demonstrate the usefulness of the proposed method. |
Date: | 2025–04 |
URL: | https://d.repec.org/n?u=RePEc:arx:papers:2504.01761 |
By: | Paul Haimerl; Stephan Smeekes; Ines Wilms |
Abstract: | We introduce a panel data model where coefficients vary both over time and the cross-section. Slope coefficients change smoothly over time and follow a latent group structure, being homogeneous within but heterogeneous across groups. The group structure is identified using a pairwise adaptive group fused-Lasso penalty. The trajectories of time-varying coefficients are estimated via polynomial spline functions. We derive the asymptotic distributions of the penalized and post-selection estimators and show their oracle efficiency. A simulation study demonstrates excellent finite sample properties. An application to the emission intensity of GDP highlights the relevance of addressing cross-sectional heterogeneity and time-variance in empirical settings. |
Date: | 2025–03 |
URL: | https://d.repec.org/n?u=RePEc:arx:papers:2503.23165 |
By: | Gilles Crommen; Jad Beyhum; Ingrid Van Keilegom |
Abstract: | In this work, we are interested in studying the causal effect of an endogenous binary treatment on a dependently censored duration outcome. By dependent censoring, it is meant that the duration time ($T$) and right censoring time ($C$) are not statistically independent of each other, even after conditioning on the measured covariates. The endogeneity issue is handled by making use of a binary instrumental variable for the treatment. To deal with the dependent censoring problem, it is assumed that on the stratum of compliers: (i) $T$ follows a semiparametric proportional hazards model; (ii) $C$ follows a fully parametric model; and (iii) the relation between $T$ and $C$ is modeled by a parametric copula, such that the association parameter can be left unspecified. In this framework, the treatment effect of interest is the complier causal hazard ratio (CCHR). We devise an estimation procedure that is based on a weighted maximum likelihood approach, where the weights are the probabilities of an observation coming from a complier. The weights are estimated non-parametrically in a first stage, followed by the estimation of the CCHR. Novel conditions under which the model is identifiable are given, a two-step estimation procedure is proposed and some important asymptotic properties are established. Simulations are used to assess the validity and finite-sample performance of the estimation procedure. Finally, we apply the approach to estimate the CCHR of job training programs on unemployment duration and of periodic screening examinations on time until death from breast cancer. The data come from the National Job Training Partnership Act study and the Health Insurance Plan of Greater New York experiment, respectively. |
Date: | 2025–04 |
URL: | https://d.repec.org/n?u=RePEc:arx:papers:2504.02096 |
By: | David Van Dijcke |
Abstract: | This article introduces Regression Discontinuity Design (RDD) with Distribution-Valued Outcomes (R3D), extending the standard RDD framework to settings where the outcome is a distribution rather than a scalar. Such settings arise when treatment is assigned at a higher level of aggregation than the outcome: for example, when a subsidy is allocated based on a firm-level revenue cutoff while the outcome of interest is the distribution of employee wages within the firm. Since standard RDD methods cannot accommodate such two-level randomness, I propose a novel approach based on random distributions. The target estimand is a "local average quantile treatment effect", which averages across random quantiles. To estimate this target, I introduce two related approaches: one that extends local polynomial regression to random quantiles and another based on local Fréchet regression, a form of functional regression. For both estimators, I establish asymptotic normality and develop uniform, debiased confidence bands together with a data-driven bandwidth selection procedure. Simulations validate these theoretical properties and show existing methods to be biased and inconsistent in this setting. I then apply the proposed methods to study the effects of gubernatorial party control on within-state income distributions in the US, using a close-election design. The results suggest a classic equality-efficiency tradeoff under Democratic governorship, driven by reductions in income at the top of the distribution. |
Date: | 2025–04 |
URL: | https://d.repec.org/n?u=RePEc:arx:papers:2504.03992 |
By: | Jiehan Liu; Ziyi Liu; Yiqing Xu |
Abstract: | This Element offers a practical guide to estimating conditional marginal effects (how treatment effects vary with a moderating variable) using modern statistical methods. Commonly used approaches, such as linear interaction models, often suffer from unclarified estimands, limited overlap, and restrictive functional forms. This guide begins by clearly defining the estimand and presenting the main identification results. It then reviews and improves upon existing solutions, such as the semiparametric kernel estimator, and introduces robust estimation strategies, including augmented inverse propensity score weighting with Lasso selection (AIPW-Lasso) and double machine learning (DML) with modern algorithms. Each method is evaluated through simulations and empirical examples, with practical recommendations tailored to sample size and research context. All tools are implemented in the accompanying interflex package for R. |
Date: | 2025–04 |
URL: | https://d.repec.org/n?u=RePEc:arx:papers:2504.01355 |
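As a rough illustration of the AIPW building block behind the estimators above, the toy below computes an AIPW estimate of an average effect in a simulated randomized design. It is a simplified sketch: the Element targets conditional marginal effects with flexible (Lasso/DML) nuisance models, while this example uses linear outcome models and a constant propensity.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 20_000
xm = rng.normal(size=n)                         # moderating variable
d = rng.binomial(1, 0.5, size=n)                # randomized binary treatment
y = (1.0 + 0.5 * xm) * d + xm + rng.normal(size=n)  # effect varies with xm; ATE = 1

def fit_mu(mask):
    """Linear outcome model fit on one treatment arm, predicted for everyone."""
    X = np.column_stack([np.ones(mask.sum()), xm[mask]])
    coef = np.linalg.lstsq(X, y[mask], rcond=None)[0]
    return np.column_stack([np.ones(n), xm]) @ coef

mu1, mu0 = fit_mu(d == 1), fit_mu(d == 0)
e = d.mean()                                    # propensity score (constant by design)

# AIPW score: outcome-model contrast plus inverse-propensity-weighted residuals
aipw = mu1 - mu0 + d * (y - mu1) / e - (1 - d) * (y - mu0) / (1 - e)
ate = aipw.mean()
```

Averaging the AIPW score within bins of `xm` (or smoothing it against `xm`) recovers the conditional effects that are the Element's actual focus.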
By: | Peter C. B. Phillips (Yale University); Liang Jiang (Fudan University) |
Abstract: | This paper develops and applies new asymptotic theory for estimation and inference in parametric autoregression with function valued cross section curve time series. The study provides a new approach to dynamic panel regression with high dimensional dependent cross section data. Here we deal with the stationary case and provide a full set of results extending those of standard Euclidean space autoregression, showing how function space curve cross section data raises efficiency and reduces bias in estimation and shortens confidence intervals in inference. Methods are developed for high-dimensional covariance kernel estimation that are useful for inference. The findings reveal that function space models with wide-domain and narrow-domain cross section dependence provide insights on the effects of various forms of cross section dependence in discrete dynamic panel models with fixed and interactive fixed effects. The methodology is applicable to panels of high dimensional wide datasets that are now available in many longitudinal studies. An empirical illustration is provided that sheds light on household Engel curves among ageing seniors in Singapore using the Singapore life panel longitudinal dataset. |
Date: | 2025–04–19 |
URL: | https://d.repec.org/n?u=RePEc:cwl:cwldpp:2439 |
By: | Klebel, Thomas; Traag, Vincent |
Abstract: | Sound causal inference is crucial for advancing the study of science. Incorrectly interpreting predictive effects as causal might lead to ineffective or even detrimental policy recommendations. Many publications in science studies lack appropriate methods to substantiate causal claims. We here provide an introduction to structural causal models for science studies. Structural causal models, usually represented in a graphical form, allow researchers to make their causal assumptions transparent and provide a foundation for causal inference. We illustrate how to use structural causal models to conduct causal inference using regression models based on simulated data of a hypothetical structural causal model of Open Science. The graphical representation of structural causal models allows researchers to clearly communicate their assumptions and findings, thereby fostering further discussion. We hope our introduction helps more researchers in science studies to consider causality explicitly. |
Date: | 2025–03–12 |
URL: | https://d.repec.org/n?u=RePEc:osf:socarx:4bw9e_v3 |
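The workflow advocated above, writing down a structural causal model, simulating from it, and checking which regressions recover the causal effect, can be made concrete with an invented toy SCM. The variable names and coefficients below are illustrative, not taken from the paper's Open Science example.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000
resources = rng.normal(size=n)                      # confounder
open_data = 0.8 * resources + rng.normal(size=n)    # "treatment"
citations = 0.5 * open_data + 0.7 * resources + rng.normal(size=n)  # true effect 0.5

def slope(X_cols, y):
    """OLS slope on the first regressor, with intercept."""
    X = np.column_stack([np.ones(len(y))] + X_cols)
    return np.linalg.lstsq(X, y, rcond=None)[0][1]

naive = slope([open_data], citations)                # omits the confounder: biased
adjusted = slope([open_data, resources], citations)  # blocks the back-door path
```

The naive regression conflates the causal effect with the open back-door path through `resources`; adjusting for the confounder recovers the structural coefficient of 0.5, which is the kind of check the graphical representation makes transparent.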
By: | Mittag, Nikolas (CERGE-EI) |
Abstract: | Regressors often have heterogeneous effects in the social sciences, implying unit-specific slopes. OLS is frequently applied to these correlated coefficient models. I first show that without restrictions on the relation between slopes and regressors, OLS estimates can take any value, including being negative even though all individual slopes are positive. I derive a simple formula for the bias in the OLS estimates, which depends on the covariance of the slopes with the squared regressor. While instrumental variable methods still allow estimation of (local) average effects under the additional assumptions that the instrument is independent of the coefficients in the first stage and reduced form equations, the results here imply complicated biases when these assumptions fail. Taken together, these results imply that heterogeneous effects systematically affect estimates beyond the well-known case of local average effects, and they provide researchers with a simple approach to assess how heterogeneity alters their estimates and conclusions. |
Keywords: | correlated coefficient model, heterogeneous effects |
JEL: | C21 C26 |
Date: | 2025–04 |
URL: | https://d.repec.org/n?u=RePEc:iza:izadps:dp17856 |
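The paper's headline result is easy to reproduce by simulation: when positive unit-specific slopes covary negatively with the squared regressor, the OLS slope can be negative even though every individual slope is positive. The numbers below are purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 10_000
x = rng.choice([1.0, 3.0], size=n)       # regressor takes two values
b = np.where(x == 1.0, 10.0, 0.1)        # all unit-specific slopes are positive,
                                         # but b covaries negatively with x**2
y = b * x + rng.normal(scale=0.5, size=n)

X = np.column_stack([np.ones(n), x])
ols_slope = np.linalg.lstsq(X, y, rcond=None)[0][1]  # comes out negative
mean_slope = b.mean()                                 # average causal slope, > 0
```

Here large outcomes at small x and small outcomes at large x tilt the fitted line downward, so OLS reverses the sign of every individual effect.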
By: | Markus Bibinger (Faculty of Mathematics and Computer Science, Institute of Mathematics, University of Würzburg); Jun Yu (Faculty of Business Administration, University of Macau); Chen Zhang (Faculty of Business Administration, University of Macau) |
Abstract: | A multivariate fractional Brownian motion (mfBm) with component-wise Hurst exponents is used to model and forecast realized volatility. We investigate the interplay between correlation coefficients and Hurst exponents and propose a novel estimation method for all model parameters, establishing consistency and asymptotic normality of the estimators. Additionally, we develop a time-reversibility test, which is typically not rejected by real volatility data. When the data-generating process is a time-reversible mfBm, we derive optimal forecasting formulae and analyze their properties. A key insight is that an mfBm with different Hurst exponents and non-zero correlations can reduce forecasting errors compared to a one-dimensional model. Consistent with optimal forecasting theory, out-of-sample forecasts using the time-reversible mfBm show improvements over univariate fBm, particularly when the estimated Hurst exponents differ significantly. Empirical results demonstrate that mfBm-based forecasts outperform the (vector) HAR model. |
Keywords: | Forecasting, Hurst exponent, multivariate fractional Brownian motion, realized volatility, rough volatility |
JEL: | C12 C58 |
Date: | 2025–04 |
URL: | https://d.repec.org/n?u=RePEc:boa:wpaper:202528 |
By: | Rüttenauer, Tobias; Kapelle, Nicole (Trinity College Dublin) |
Abstract: | Panel data offer a valuable lens through which social science phenomena can be examined over time. With panel data, we can overcome some of the fundamental problems with conventional cross-sectional analyses by focusing on the within-unit changes rather than the differences between units. This chapter delves into the foundations, recent advancements, and critical issues associated with panel data analysis. The chapter illustrates the basic concepts of random effects (RE) and fixed effects (FE) estimators. Moving beyond the fundamentals, we provide an intuition for various recent developments and advances in the field of panel data methods, paying particular attention to the identification of time-varying treatment effects or impact functions. To illustrate practical application, we investigate how marriage influences sexual satisfaction. While married individuals report a higher sexual satisfaction than unmarried respondents (between-comparison), individuals experience a decline in satisfaction after marriage compared to their pre-marital levels (within-comparison). |
Date: | 2024–05–01 |
URL: | https://d.repec.org/n?u=RePEc:osf:socarx:3mfzq_v2 |
By: | Scott Kostyshak |
Abstract: | Critical bandwidth (CB) is used to test the multimodality of densities and regression functions, as well as for clustering methods. CB tests are known to be inconsistent if the function of interest is constant ("flat") over even a small interval, and to suffer from low power and incorrect size in finite samples if the function has a relatively small derivative over an interval. This paper proposes a solution, flatness-robust CB (FRCB), that exploits the novel observation that the inconsistency manifests only from regions consistent with the null hypothesis, and thus identifying and excluding them does not alter the null or alternative sets. I provide sufficient conditions for consistency of FRCB, and simulations of a test of regression monotonicity demonstrate the finite-sample properties of FRCB compared with CB for various regression functions. Surprisingly, FRCB performs better than CB in some cases where there are no flat regions, which can be explained by FRCB essentially giving more importance to parts of the function where there are larger violations of the null hypothesis. I illustrate the usefulness of FRCB with an empirical analysis of the monotonicity of the conditional mean function of radiocarbon age with respect to calendar age. |
Date: | 2025–04 |
URL: | https://d.repec.org/n?u=RePEc:arx:papers:2504.03594 |
By: | Simon Hirsch |
Abstract: | Probabilistic electricity price forecasting (PEPF) is a key task for market participants in short-term electricity markets. The increasing availability of high-frequency data and the need for real-time decision-making in energy markets require online estimation methods for efficient model updating. We present an online, multivariate, regularized distributional regression model, allowing for the modeling of all distribution parameters conditional on explanatory variables. Our approach combines multivariate distributional regression with an efficient online learning algorithm based on online coordinate descent for LASSO-type regularization. Additionally, we propose to regularize the estimation along a path of increasingly complex dependence structures of the multivariate distribution, allowing for parsimonious estimation and early stopping. We validate our approach through one of the first forecasting studies focusing on multivariate probabilistic forecasting in the German day-ahead electricity market while using only online estimation methods. We compare our approach to online LASSO-ARX models with adaptive marginal distributions and to online univariate distributional models combined with an adaptive copula. We show that multivariate distributional regression, which allows modeling all distribution parameters (including the mean and the dependence structure) conditional on explanatory variables such as renewable in-feed or past prices, provides superior forecasting performance compared to modeling the marginals only with a static/unconditional dependence structure. Additionally, online estimation yields a speed-up by a factor of 80 to over 400 compared to batch fitting. |
Date: | 2025–04 |
URL: | https://d.repec.org/n?u=RePEc:arx:papers:2504.02518 |
By: | Jiafeng Chen |
Abstract: | This paper connects the literature on demand estimation to the literature on causal inference by interpreting nonparametric structural assumptions as restrictions on counterfactual outcomes. It offers nontrivial and equivalent restatements of key demand estimation assumptions in the Neyman-Rubin potential outcomes model, for both settings with market-level data (Berry and Haile, 2014) and settings with demographic-specific market shares (Berry and Haile, 2024). This exercise helps bridge the literatures on structural estimation and on causal inference by separating notational and linguistic differences from substantive ones. |
Date: | 2025–03 |
URL: | https://d.repec.org/n?u=RePEc:arx:papers:2503.23524 |
By: | Dzemski, Andreas (Department of Economics, School of Business, Economics and Law, Göteborg University); Farago, Adam (Department of Economics, School of Business, Economics and Law, Göteborg University); Hjalmarsson, Erik (Department of Economics, School of Business, Economics and Law, Göteborg University); Kiss, Tamas (The School of Business, Örebro University, Sweden) |
Abstract: | We analyze empirical estimation of the distribution of total payoffs for stock investments over very long horizons, such as 30 years. Formal results for recently proposed bootstrap estimators are derived and alternative parametric methods are proposed. All estimators should be viewed as inconsistent for longer investment horizons. Valid confidence bands are derived and should be the focus when performing inference. Empirically, confidence bands around long-run distributions are very wide and point estimates must be interpreted with great caution. Consequently, it is difficult to distinguish long-run aggregate return distributions across countries; long-run U.S. returns are not significantly different from global returns. |
Keywords: | Estimation uncertainty; Long-run stock returns; Quantile estimation |
JEL: | C58 G10 |
Date: | 2025–04–28 |
URL: | https://d.repec.org/n?u=RePEc:hhs:gunwpe:0853 |
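A minimal version of the bootstrap estimators discussed above: resample monthly returns with replacement, compound them to a 30-year total payoff, and read off a percentile band. The returns below are synthetic; in applications one would resample a historical return series, and the paper's point is that such point estimates carry large uncertainty at long horizons.

```python
import numpy as np

rng = np.random.default_rng(3)
monthly = rng.normal(0.006, 0.045, size=1200)   # 100 years of synthetic monthly returns
horizon = 360                                   # 30-year horizon in months

# i.i.d. bootstrap: draw 5000 hypothetical 30-year paths from the observed months
draws = rng.choice(monthly, size=(5000, horizon), replace=True)
payoffs = np.prod(1.0 + draws, axis=1)          # total payoff of 1 unit invested

point = np.median(payoffs)
band = np.percentile(payoffs, [5, 95])          # the band is typically very wide
```

The spread between the 5th and 95th percentiles dwarfs the median, echoing the paper's conclusion that confidence bands, not point estimates, should be the focus of inference.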
By: | Davide Luparello |
Abstract: | This technical note provides comprehensive derivations of fundamental equations in two-level nested and sequential logit models for analyzing hierarchical choice structures. We present derivations of the Berry (1994) inversion formula, nested inclusive values computation, and multi-level market share equations, complementing existing literature. While conceptually distinct, nested and sequential logit models share mathematical similarities and, under specific distributional assumptions, yield identical inversion formulas, offering valuable analytical insights. These notes serve as a practical reference for researchers implementing multi-level discrete choice models in empirical applications, particularly in industrial organization and demand estimation contexts. |
Date: | 2025–03 |
URL: | https://d.repec.org/n?u=RePEc:arx:papers:2503.21808 |
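The Berry (1994) inversion for the two-level nested logit discussed above can be checked numerically with a round trip: compute shares from mean utilities $\delta$, then recover them via $\delta_j = \ln s_j - \ln s_0 - \sigma \ln s_{j|g}$. The parameter values below are arbitrary.

```python
import numpy as np

sigma = 0.6                                  # nesting parameter
delta = np.array([1.0, 0.5, -0.2, 0.8])      # mean utilities of 4 inside goods
nests = np.array([0, 0, 1, 1])               # nest membership; outside good alone

# Forward map: inclusive values, within-nest shares, and market shares
ev = np.exp(delta / (1.0 - sigma))
D = np.array([ev[nests == g].sum() for g in (0, 1)])     # nest inclusive values
within = ev / D[nests]                                   # s_{j|g}
s0 = 1.0 / (1.0 + (D ** (1.0 - sigma)).sum())            # outside good share
shares = within * (D[nests] ** (1.0 - sigma)) * s0       # s_j

# Berry inversion: recover delta from the shares
delta_rec = np.log(shares) - np.log(s0) - sigma * np.log(within)
print(np.allclose(delta_rec, delta))         # → True
```

The round trip confirms the inversion formula term by term: the nest structure enters only through the within-nest share, scaled by sigma.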
By: | Claudia Klüppelberg; Mario Krali |
Abstract: | We present a methodology for causal risk analysis in a network. Causal dependence is formulated by a max-linear structural equation model, which expresses each node variable as a max-linear function of its parental node variables in a directed acyclic graph and some exogenous innovation. We determine directed paths responsible for extreme risk propagation in the network. We give algorithms for structure learning and parameter estimation and apply them to a network of financial data. |
Date: | 2025–04 |
URL: | https://d.repec.org/n?u=RePEc:arx:papers:2504.00523 |
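A max-linear structural equation model of the kind described above can be simulated directly: each node is the maximum of weighted parent values and its own innovation. The graph (1 -> 2 -> 3 plus 1 -> 3), coefficients, and Pareto innovations below are illustrative, not from the paper's financial application.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 5
Z = rng.pareto(3.0, size=(n, 3)) + 1.0   # heavy-tailed exogenous innovations

# Recursive max-linear structural equations over the DAG 1 -> 2 -> 3, 1 -> 3
X1 = Z[:, 0]
X2 = np.maximum(0.7 * X1, Z[:, 1])
X3 = np.maximum(np.maximum(0.5 * X1, 0.8 * X2), Z[:, 2])
```

A large shock at node 1 propagates to node 3 either directly (weight 0.5) or through node 2 (weight 0.7 * 0.8 = 0.56), so the indirect path dominates extreme risk propagation here, the kind of directed path the paper's algorithms identify.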