nep-ecm New Economics Papers
on Econometrics
Issue of 2025–10–20
thirty-six papers chosen by
Sune Karlsson, Örebro universitet


  1. Efficient Difference-in-Differences Estimation when Outcomes are Missing at Random By Lorenzo Testa; Edward H. Kennedy; Matthew Reimherr
  2. Identification and Semiparametric Estimation of Conditional Means from Aggregate Data By Cory McCartan; Shiro Kuriwaki
  3. Beyond the Average: Distributional Causal Inference under Imperfect Compliance By Undral Byambadalai; Tomu Hirata; Tatsushi Oka; Shota Yasui
  4. Optimal estimation for regression discontinuity design with binary outcomes By Takuya Ishihara; Masayuki Sawada; Kohei Yata
  5. Time-Varying Heterogeneous Treatment Effects in Event Studies By Irene Botosaru; Laura Liu
  6. Decomposing Co-Movements in Matrix-Valued Time Series: A Pseudo-Structural Reduced-Rank Approach By Alain Hecq; Ivan Ricardo; Ines Wilms
  7. Sensitivity Analysis for Treatment Effects in Difference-in-Differences Models using Riesz Representation By Philipp Bach; Sven Klaassen; Jannis Kueck; Mara Mattes; Martin Spindler
  8. Identifying treatment effects on categorical outcomes in IV models By Onil Boussim
  9. Denoised IPW-Lasso for Heterogeneous Treatment Effect Estimation in Randomized Experiments By Mingqian Guan; Komei Fujita; Naoya Sueishi; Shota Yasui
  10. Joint Inference for the Regression Discontinuity Effect and Its External Validity By Yuta Okamoto
  11. Generalized Covariance Estimator under Misspecification and Constraints By Aryan Manafi Neyazi
  12. Nonparametric and Semiparametric Estimation of Upward Rank Mobility Curves By Tsung-Chih Lai; Jia-Han Shih; Yi-Hau Chen
  13. Box Confidence Depth: Simulation-Based Inference with Hyper-Rectangles By Laura Ventura; Elena Bortolato
  14. A new Combined Bootstrap Method for Long-Memory Time Series By Luisa Bisaglia; Margherita Gerolimetto; Margherita Palomba
  15. Triadic Network Formation By Chris Muris; Cavit Pakel
  16. Risk of Predictive Distributions and Bayesian Model Comparison of Misspecified Models By Yong Li; Zhou Wu; Jun Yu; Tao Zeng
  17. Direct Bias-Correction Term Estimation for Propensity Scores and Average Treatment Effect Estimation By Masahiro Kato
  18. Boundary estimation in the regression-discontinuity design: Evidence for a merit- and need-based financial aid program By Eugenio Felipe Merlano
  19. Regression Model Selection Under General Conditions By Amaze Lusompa
  20. Lags, Leave-Outs and Fixed Effects By Alexander Chudik; Cameron M. Ellis; Johannes G. Jaspersen
  21. Overidentification testing with weak instruments and heteroskedasticity By Stuart Lane; Frank Windmeijer
  22. Inference on the Distribution of Individual Treatment Effects in Nonseparable Triangular Models By Jun Ma; Vadim Marmer; Zhengfei Yu
  23. Roughness Analysis of Realized Volatility and VIX through Randomized Kolmogorov-Smirnov Distribution By Sergio Bianchi; Daniele Angelini
  24. Evaluating Policy Effects under Network Interference without Network Information: A Transfer Learning Approach By Tadao Hoshino
  25. Differentially Private Two-Stage Gradient Descent for Instrumental Variable Regression By Haodong Liang; Yanhao Jin; Krishnakumar Balasubramanian; Lifeng Lai
  26. The Pitfalls of Continuous Heavy-Tailed Distributions in High-Frequency Data Analysis By Vladimír Holý
  27. An Information-Theoretic Approach to Partially Identified Problems By Amos Golan; Jeffrey Perloff
  28. Sensitivity Analysis for Causal ML: A Use Case at Booking.com By Philipp Bach; Victor Chernozhukov; Carlos Cinelli; Lin Jia; Sven Klaassen; Nils Skotara; Martin Spindler
  29. Evaluating efficiency gains in the Linear Probability Model By Tomás Pacheco
  30. Identification and Estimation of Seller Risk Aversion in Ascending Auctions By Nathalie Gimenes; Tonghui Qi; Sorawoot Srisuma
  31. Compositional difference-in-differences for categorical outcomes By Onil Boussim
  32. Debiased Kernel Estimation of Spot Volatility in the Presence of Infinite Variation Jumps By B. Cooper Boniece; José E. Figueroa-López; Tianwei Zhou
  33. Noise estimation of SDE from a single data trajectory By Munawar Ali; Purba Das; Qi Feng; Liyao Gao; Guang Lin
  34. Recidivism and Peer Influence with LLM Text Embeddings in Low Security Correctional Facilities By Shanjukta Nath; Jiwon Hong; Jae Ho Chang; Keith Warren; Subhadeep Paul
  35. Identifying and Quantifying Financial Bubbles with the Hyped Log-Periodic Power Law Model By Zheng Cao; Xingran Shao; Yuheng Yan; Helyette Geman
  36. General CoVaR Based on Entropy Pooling By Yuhong Xu; Xinyao Zhao

  1. By: Lorenzo Testa; Edward H. Kennedy; Matthew Reimherr
    Abstract: The Difference-in-Differences (DiD) method is a fundamental tool for causal inference, yet its application is often complicated by missing data. Although recent work has developed robust DiD estimators for complex settings like staggered treatment adoption, these methods typically assume complete data and fail to address the critical challenge of outcomes that are missing at random (MAR) -- a common problem that invalidates standard estimators. We develop a rigorous framework, rooted in semiparametric theory, for identifying and efficiently estimating the Average Treatment Effect on the Treated (ATT) when either pre- or post-treatment (or both) outcomes are missing at random. We first establish nonparametric identification of the ATT under two minimal sets of sufficient conditions. For each, we derive the semiparametric efficiency bound, which provides a formal benchmark for asymptotic optimality. We then propose novel estimators that are asymptotically efficient, achieving this theoretical bound. A key feature of our estimators is their multiple robustness, which ensures consistency even if some nuisance function models are misspecified. We validate the properties of our estimators and showcase their broad applicability through an extensive simulation study.
    Date: 2025–09
    URL: https://d.repec.org/n?u=RePEc:arx:papers:2509.25009
  2. By: Cory McCartan; Shiro Kuriwaki
    Abstract: We introduce a new method for estimating the mean of an outcome variable within groups when researchers only observe the average of the outcome and group indicators across a set of aggregation units, such as geographical areas. Existing methods for this problem, also known as ecological inference, implicitly make strong assumptions about the aggregation process. We first formalize weaker conditions for identification, which motivates estimators that can efficiently control for many covariates. We propose a debiased machine learning estimator that is based on nuisance functions restricted to a partially linear form. Our estimator also admits a semiparametric sensitivity analysis for violations of the key identifying assumption, as well as asymptotically valid confidence intervals for local, unit-level estimates under additional assumptions. Simulations and validation on real-world data where ground truth is available demonstrate the advantages of our approach over existing methods. Open-source software is available which implements the proposed methods.
    Date: 2025–09
    URL: https://d.repec.org/n?u=RePEc:arx:papers:2509.20194
  3. By: Undral Byambadalai; Tomu Hirata; Tatsushi Oka; Shota Yasui
    Abstract: We study the estimation of distributional treatment effects in randomized experiments with imperfect compliance. When participants do not adhere to their assigned treatments, we leverage treatment assignment as an instrumental variable to identify the local distributional treatment effect -- the difference in outcome distributions between treatment and control groups for the subpopulation of compliers. We propose a regression-adjusted estimator based on a distribution regression framework with Neyman-orthogonal moment conditions, enabling robustness and flexibility with high-dimensional covariates. Our approach accommodates continuous, discrete, and mixed discrete-continuous outcomes, and applies under a broad class of covariate-adaptive randomization schemes, including stratified block designs and simple random sampling. We derive the estimator's asymptotic distribution and show that it achieves the semiparametric efficiency bound. Simulation results demonstrate favorable finite-sample performance, and an application to the Oregon Health Insurance Experiment illustrates the method's practical relevance. (A toy sketch of the distribution-regression building block follows this entry.)
    Date: 2025–09
    URL: https://d.repec.org/n?u=RePEc:arx:papers:2509.15594
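A minimal sketch of the distribution-regression building block used in this entry: estimate the difference in outcome CDFs, F_1(y) - F_0(y), on a grid of thresholds. This toy assumes full compliance and simulated Gaussian data, so the paper's instrumental-variable step and Neyman-orthogonal adjustment are omitted; all numbers are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5_000
d = rng.binomial(1, 0.5, n)             # randomized treatment; full compliance assumed
y = rng.normal(0.3 * d, 1.0 + 0.5 * d)  # treatment shifts both location and spread

# Distributional effect at each threshold: difference of empirical CDFs.
for y0 in np.linspace(-2.0, 3.0, 6):
    dte = (y[d == 1] <= y0).mean() - (y[d == 0] <= y0).mean()
    print(f"threshold {y0:+.1f}: distributional effect {dte:+.3f}")
```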
  4. By: Takuya Ishihara; Masayuki Sawada; Kohei Yata
    Abstract: We develop a finite-sample optimal estimator for regression discontinuity designs when the outcomes are bounded, including binary outcomes as the leading case. Our finite-sample optimal estimator achieves the exact minimax mean squared error among linear shrinkage estimators with nonnegative weights when the regression function of a bounded outcome lies in a Lipschitz class. Although the original minimax problem is an (n+1)-dimensional non-convex optimization problem, where n is the sample size, we show that our estimator can be obtained by solving a convex optimization problem. A key advantage of our estimator is that the Lipschitz constant is the only tuning parameter. We also propose a uniformly valid inference procedure without a large-sample approximation. In a simulation exercise for small samples, our estimator exhibits smaller mean squared errors and shorter confidence intervals than conventional large-sample techniques, which may be unreliable when the effective sample size is small. We apply our method to an empirical multi-cutoff design where the sample size for each cutoff is small. In the application, our method yields informative confidence intervals, in contrast to the leading large-sample approach.
    Date: 2025–09
    URL: https://d.repec.org/n?u=RePEc:arx:papers:2509.18857
  5. By: Irene Botosaru; Laura Liu
    Abstract: This paper examines the identification and estimation of heterogeneous treatment effects in event studies, emphasizing the importance of both lagged dependent variables and treatment effect heterogeneity. We show that omitting lagged dependent variables can induce omitted variable bias in the estimated time-varying treatment effects. We develop a novel semiparametric approach based on a short-T dynamic linear panel model with correlated random coefficients, where the time-varying heterogeneous treatment effects can be modeled by a time-series process to reduce dimensionality. We construct a two-step estimator employing quasi-maximum likelihood for common parameters and empirical Bayes for the heterogeneous treatment effects. The procedure is flexible, easy to implement, and achieves ratio optimality asymptotically. Our results also provide insights into common assumptions in the event study literature, such as no anticipation, homogeneous treatment effects across treatment timing cohorts, and state dependence structure.
    Date: 2025–09
    URL: https://d.repec.org/n?u=RePEc:arx:papers:2509.13698
  6. By: Alain Hecq; Ivan Ricardo; Ines Wilms
    Abstract: We propose a pseudo-structural framework for analyzing contemporaneous co-movements in reduced-rank matrix autoregressive (RRMAR) models. Unlike conventional vector-autoregressive (VAR) models that would discard the matrix structure, our formulation preserves it, enabling a decomposition of co-movements into three interpretable components: row-specific, column-specific, and joint (row-column) interactions across the matrix-valued time series. Our estimator admits standard asymptotic inference and we propose a BIC-type criterion for the joint selection of the reduced ranks and the autoregressive lag order. We validate the method's finite-sample performance in terms of estimation accuracy, coverage and rank selection in simulation experiments, including cases of rank misspecification. We illustrate the method's practical usefulness in identifying co-movement structures in two empirical applications: U.S. state-level coincident and leading indicators, and cross-country macroeconomic indicators.
    Date: 2025–09
    URL: https://d.repec.org/n?u=RePEc:arx:papers:2509.19911
  7. By: Philipp Bach; Sven Klaassen; Jannis Kueck; Mara Mattes; Martin Spindler
    Abstract: Difference-in-differences (DiD) is one of the most popular approaches for empirical research in economics, political science, and beyond. Identification in these models is based on the conditional parallel trends assumption: In the absence of treatment, the average outcomes of the treated and untreated groups are assumed to evolve in parallel over time, conditional on pre-treatment covariates. We introduce a novel approach to sensitivity analysis for DiD models that assesses the robustness of DiD estimates to violations of this assumption due to unobservable confounders, allowing researchers to transparently assess and communicate the credibility of their causal estimation results. Our method focuses on estimation by Double Machine Learning and extends previous work on sensitivity analysis based on Riesz Representation in cross-sectional settings. We establish asymptotic bounds for point estimates and confidence intervals in the canonical $2\times2$ setting and group-time causal parameters in settings with staggered treatment adoption. Our approach makes it possible to relate the formulation of parallel trends violation to empirical evidence from (1) pre-testing, (2) covariate benchmarking and (3) standard reporting statistics and visualizations. We provide extensive simulation experiments demonstrating the validity of our sensitivity approach and diagnostics and apply our approach to two empirical applications.
    Date: 2025–10
    URL: https://d.repec.org/n?u=RePEc:arx:papers:2510.09064
  8. By: Onil Boussim
    Abstract: This paper provides a nonparametric framework for causal inference with categorical outcomes under binary treatment and binary instrument settings. We decompose the observed joint probability of outcomes and treatment into marginal probabilities of potential outcomes and treatment, and association parameters that capture selection bias due to unobserved heterogeneity. Under a novel identifying assumption, association similarity, which requires the dependence between unobserved factors and potential outcomes to be invariant across treatment states, we achieve point identification of the full distribution of potential outcomes. Recognizing that this assumption may be strong in some contexts, we propose two weaker alternatives: monotonic association, which restricts the direction of selection heterogeneity, and bounded association, which constrains its magnitude. These relaxed assumptions deliver sharp partial identification bounds that nest point identification as a special case and facilitate transparent sensitivity analysis. We illustrate the framework in an empirical application, estimating the causal effect of private health insurance on health outcomes.
    Date: 2025–10
    URL: https://d.repec.org/n?u=RePEc:arx:papers:2510.10946
  9. By: Mingqian Guan; Komei Fujita; Naoya Sueishi; Shota Yasui
    Abstract: This paper proposes a new method for estimating conditional average treatment effects (CATE) in randomized experiments. We adopt inverse probability weighting (IPW) for identification; however, IPW-transformed outcomes are known to be noisy, even when true propensity scores are used. To address this issue, we introduce a noise reduction procedure and estimate a linear CATE model using Lasso, achieving both accuracy and interpretability. We theoretically show that denoising reduces the prediction error of the Lasso. The method is particularly effective when treatment effects are small relative to the variability of outcomes, which is often the case in empirical applications. Applications to the Get-Out-the-Vote dataset and Criteo Uplift Modeling dataset demonstrate that our method outperforms fully nonparametric machine learning methods in identifying individuals with higher treatment effects. Moreover, our method uncovers informative heterogeneity patterns that are consistent with previous empirical findings. (A minimal IPW-Lasso sketch follows this entry.)
    Date: 2025–10
    URL: https://d.repec.org/n?u=RePEc:arx:papers:2510.10527
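To make the IPW transformation concrete, here is a hedged Python sketch of the plain IPW-Lasso baseline on simulated data. The paper's denoising step is omitted, and the design and tuning parameter are illustrative assumptions, not the authors' choices.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(1)
n, p = 2_000, 10
X = rng.normal(size=(n, p))
e = 0.5                                   # known propensity score in the experiment
D = rng.binomial(1, e, n)
tau = 0.5 * X[:, 0]                       # true CATE is linear in the first covariate
Y = 0.3 * (X @ rng.normal(size=p)) + tau * D + rng.normal(size=n)

# IPW-transformed outcome: E[Y_ipw | X] = CATE(X), but the transform is noisy.
Y_ipw = Y * (D / e - (1 - D) / (1 - e))

fit = Lasso(alpha=0.05).fit(X, Y_ipw)     # sparse, interpretable linear CATE model
print("coefficient on X1 (true 0.5):", fit.coef_[0].round(3))
```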
  10. By: Yuta Okamoto
    Abstract: The external validity of regression discontinuity (RD) designs is essential for informing policy and remains an active research area in econometrics and statistics. However, we document that only a limited number of empirical studies explicitly address the external validity of standard RD effects. To advance empirical practice, we propose a simple joint inference procedure for the RD effect and its local external validity, building on Calonico, Cattaneo, and Titiunik (2014, Econometrica) and Dong and Lewbel (2015, Review of Economics and Statistics). We further introduce a locally linear treatment effects assumption, which enhances the interpretability of the treatment effect derivative proposed by Dong and Lewbel. Under this assumption, we establish identification and derive a uniform confidence band for the extrapolated treatment effects. Our approaches require no additional covariates or design features, making them applicable to virtually all RD settings and thereby enhancing the policy relevance of many empirical RD studies. The usefulness of the method is demonstrated through an empirical application, highlighting its complementarity to existing approaches.
    Date: 2025–09
    URL: https://d.repec.org/n?u=RePEc:arx:papers:2509.26380
  11. By: Aryan Manafi Neyazi
    Abstract: This paper investigates the properties of the Generalized Covariance (GCov) estimator under misspecification and constraints with application to processes with local explosive patterns, such as causal-noncausal and double autoregressive (DAR) processes. We show that GCov is consistent and has an asymptotically Normal distribution under misspecification. Then, we construct GCov-based Wald-type and score-type tests to test one specification against the other, all of which follow a $\chi^2$ distribution. Furthermore, we propose the constrained GCov (CGCov) estimator, which extends the use of the GCov estimator to a broader range of models with constraints on their parameters. We investigate the asymptotic distribution of the CGCov estimator when the true parameters are far from the boundary and on the boundary of the parameter space. We validate the finite sample performance of the proposed estimators and tests in the context of causal-noncausal and DAR models. Finally, we provide two empirical applications, fitting the noncausal model to the final energy demand commodity index and the DAR model to the US 3-month Treasury bill.
    Date: 2025–09
    URL: https://d.repec.org/n?u=RePEc:arx:papers:2509.13492
  12. By: Tsung-Chih Lai; Jia-Han Shih; Yi-Hau Chen
    Abstract: We introduce the upward rank mobility curve as a new measure of intergenerational mobility that captures upward movements across the entire parental income distribution. Our approach extends Bhattacharya and Mazumder (2011) by conditioning on a single parental income rank, thereby eliminating aggregation bias. We show that the measure can be characterized solely by the copula of parent and child income, and we propose a nonparametric copula-based estimator with better properties than kernel-based alternatives. For a conditional version of the measure without such a representation, we develop a two-step semiparametric estimator based on distribution regression and establish its asymptotic properties. An application to U.S. data reveals that whites exhibit significant upward mobility dominance over blacks among lower-middle-income families.
    Date: 2025–09
    URL: https://d.repec.org/n?u=RePEc:arx:papers:2509.23174
  13. By: Laura Ventura; Elena Bortolato
    Abstract: This work presents a novel simulation-based approach for constructing confidence regions in parametric models, which is particularly suited for generative models and situations where limited data and conventional asymptotic approximations fail to provide accurate results. The method leverages the concept of data depth and relies on creating random hyper-rectangles, i.e. boxes, in the sample space, generated through simulations from the model while varying the input parameters. A probabilistic acceptance rule makes it possible to retrieve a Depth-Confidence Distribution for the model parameters, from which point estimators as well as calibrated confidence sets can be read off. The method is designed to address cases where both the parameters and test statistics are multivariate.
    Keywords: Confidence regions, depth functions, Monte Carlo methods, order statistics, simulation-based methods
    JEL: C12 C13 C15
    Date: 2025–10
    URL: https://d.repec.org/n?u=RePEc:bge:wpaper:1518
  14. By: Luisa Bisaglia (University of Padua); Margherita Gerolimetto (Ca’ Foscari University of Venice); Margherita Palomba (University of Padua)
    Abstract: This paper introduces a novel combined bootstrap methodology for the analysis of stationary long-memory time series, addressing the challenges posed by their persistent dependence structures. Unlike existing hybrid approaches that merge algorithms at the procedural level, our method combines independently generated bootstrap samples from a variety of established techniques, including parametric, semi-parametric, and block-based methods, into a unified composite sample. This integration is performed using both simple (mean, median, trimmed mean) and performance-based (correlation, MSE, MAE, regression-based) combination schemes. Through extensive Monte Carlo simulations and empirical applications to the Nile River minima and Microsoft stock returns, we show that the combined bootstrap approach yields improved estimation accuracy for the long-memory parameter d, particularly in terms of root mean squared deviation and confidence interval coverage. The proposed method is shown to mitigate model misspecification risk and improve inference robustness. While our focus is on estimating the long-memory parameter, the approach is general and can be extended to other statistics and dependence structures. This work offers a new perspective on bootstrap methodology and opens avenues for future theoretical and practical advancements. (A toy combination sketch follows this entry.)
    Keywords: Bootstrap, Long-memory time series, Pre-filtering, Combinations
    JEL: C22 C15 C13
    Date: 2025
    URL: https://d.repec.org/n?u=RePEc:ven:wpaper:2025:19
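A toy illustration of the combination idea, under stated simplifications: the series is AR(1) and the target statistic is the lag-1 autocorrelation rather than the long-memory parameter d, and only two bootstrap schemes are pooled. Replicates from a moving-block and an AR-sieve bootstrap are merged into one composite sample and summarized with the simple combination schemes named in the abstract.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
# Toy persistent series (AR(1) stand-in; the paper targets the long-memory parameter d).
x = np.zeros(500)
for t in range(1, 500):
    x[t] = 0.7 * x[t - 1] + rng.normal()

def acf1(z):
    z = z - z.mean()
    return (z[1:] * z[:-1]).sum() / (z * z).sum()

def block_boot(z, b=25):
    # Moving-block bootstrap: resample contiguous blocks to keep short-run dependence.
    starts = rng.integers(0, len(z) - b, size=len(z) // b)
    return np.concatenate([z[s:s + b] for s in starts])

def sieve_boot(z):
    # AR(1)-sieve bootstrap: rebuild the series from resampled residuals.
    phi = acf1(z)
    resid = z[1:] - phi * z[:-1]
    shocks = rng.choice(resid - resid.mean(), size=len(z))
    out = np.zeros(len(z))
    for t in range(1, len(z)):
        out[t] = phi * out[t - 1] + shocks[t]
    return out

B = 500
pooled = np.array([acf1(block_boot(x)) for _ in range(B)] +
                  [acf1(sieve_boot(x)) for _ in range(B)])  # one composite sample

print("mean    :", pooled.mean().round(3))
print("median  :", np.median(pooled).round(3))
print("trimmed :", stats.trim_mean(pooled, 0.1).round(3))
```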
  15. By: Chris Muris; Cavit Pakel
    Abstract: We study estimation and inference for triadic link formation with dyad-level fixed effects in a nonlinear binary choice logit framework. Dyad-level effects provide a richer and more realistic representation of heterogeneity across pairs of dimensions (e.g. importer-exporter, importer-product, exporter-product), yet their sheer number creates a severe incidental parameter problem. We propose a novel "hexad logit" estimator and establish its consistency and asymptotic normality. Identification is achieved through a conditional likelihood approach that eliminates the fixed effects by conditioning on sufficient statistics, in the form of hexads -- wirings that involve two nodes from each part of the network. Our central finding is that dyad-level heterogeneity fundamentally changes how information accumulates. Unlike under node-level heterogeneity, where informative wirings automatically grow with link formation, under dyad-level heterogeneity the network may generate infinitely many links yet asymptotically zero informative wirings. We derive explicit sparsity thresholds that determine when consistency holds and when asymptotic normality is attainable. These results have important practical implications, as they reveal that there is a limit to how granular or disaggregate a dataset one can employ under dyad-level heterogeneity.
    Date: 2025–09
    URL: https://d.repec.org/n?u=RePEc:arx:papers:2509.26420
  16. By: Yong Li (School of Economics, Renmin University of China); Zhou Wu (School of Economics, Zhejiang University); Jun Yu (Faculty of Business Administration, University of Macau); Tao Zeng (School of Economics, Zhejiang University)
    Abstract: Müller (2013, Econometrica, 81(5), 1805-1849) shows that Bayesian inference of parameters of interest in a misspecified model can reduce the asymptotic frequentist risk when the standard posterior is replaced with the sandwich posterior. In this paper, we extend the results in Müller (2013) to Bayesian model comparison. Bayesian model comparison of potentially misspecified models can be conducted in a predictive framework with three alternative predictive distributions, namely, the plug-in predictive distribution, the standard posterior predictive distribution, and the sandwich posterior predictive distribution of Müller (2013). Via the Kullback-Leibler (KL) loss function, it is shown that the sandwich posterior predictive distribution yields a lower asymptotic risk than the standard posterior predictive distribution. Moreover, we provide sufficient conditions under which the sandwich posterior predictive distribution yields a lower asymptotic risk than the plug-in predictive distribution. We then propose two new Bayesian penalized information criteria based on the last two predictive distributions to compare misspecified models and establish their relationship with some existing information criteria. The proposed new information criteria are illustrated in several empirical studies.
    Keywords: AIC, DIC, Information criterion, Model misspecification, Sandwich posterior.
    Date: 2025–10
    URL: https://d.repec.org/n?u=RePEc:boa:wpaper:202536
  17. By: Masahiro Kato
    Abstract: This study considers the estimation of the average treatment effect (ATE). For ATE estimation, we estimate the propensity score through direct bias-correction term estimation. Let $\{(X_i, D_i, Y_i)\}_{i=1}^{n}$ be the observations, where $X_i \in \mathbb{R}^p$ denotes $p$-dimensional covariates, $D_i \in \{0, 1\}$ denotes a binary treatment assignment indicator, and $Y_i \in \mathbb{R}$ is an outcome. In ATE estimation, the bias-correction term $h_0(X_i, D_i) = \frac{1[D_i = 1]}{e_0(X_i)} - \frac{1[D_i = 0]}{1 - e_0(X_i)}$ plays an important role, where $e_0(X_i)$ is the propensity score, the probability of being assigned treatment $1$. In this study, we propose estimating $h_0$ (or equivalently the propensity score $e_0$) by directly minimizing the prediction error of $h_0$. Since the bias-correction term $h_0$ is essential for ATE estimation, this direct approach is expected to improve estimation accuracy for the ATE. For example, existing studies often employ maximum likelihood or covariate balancing to estimate $e_0$, but these approaches may not be optimal for accurately estimating $h_0$ or the ATE. We present a general framework for this direct bias-correction term estimation approach from the perspective of Bregman divergence minimization and conduct simulation studies to evaluate the effectiveness of the proposed method. (A numerical sketch of the bias-correction term follows this entry.)
    Date: 2025–09
    URL: https://d.repec.org/n?u=RePEc:arx:papers:2509.22122
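The bias-correction term in this abstract is directly computable. Below is a minimal sketch using the true propensity score on simulated data; the paper's contribution, estimating $h_0$ by Bregman-divergence minimization, is not implemented here, and the data-generating process is an illustrative assumption.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 10_000
X = rng.normal(size=n)
e0 = 1 / (1 + np.exp(-X))                # true propensity score
D = rng.binomial(1, e0)
Y = 1.0 * D + X + rng.normal(size=n)     # true ATE = 1

# Bias-correction term from the abstract:
# h0(X, D) = 1[D = 1]/e0(X) - 1[D = 0]/(1 - e0(X)).
h0 = D / e0 - (1 - D) / (1 - e0)

ate_ipw = np.mean(h0 * Y)                # IPW (Horvitz-Thompson) form of the ATE
print("IPW ATE estimate:", round(ate_ipw, 3))
```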
  18. By: Eugenio Felipe Merlano
    Abstract: In the conventional regression-discontinuity (RD) design, the probability that units receive a treatment changes discontinuously as a function of one covariate exceeding a threshold or cutoff point. This paper studies an extended RD design where assignment rules simultaneously involve two or more continuous covariates. We show that assignment rules with more than one variable allow the estimation of a more comprehensive set of treatment effects, relaxing in a research-driven style the local and sometimes limiting nature of univariate RD designs. We then propose a flexible nonparametric approach to estimate the multidimensional discontinuity by univariate local linear regression and compare its performance to existing methods. We present an empirical application to a large-scale and countrywide financial aid program for low-income students in Colombia. The program uses a merit-based (academic achievement) and need-based (wealth index) assignment rule to select students for the program. We show that our estimation strategy fully exploits the multidimensional assignment rule and reveals heterogeneous effects along the treatment boundaries.
    Date: 2025–10
    URL: https://d.repec.org/n?u=RePEc:arx:papers:2510.09257
  19. By: Amaze Lusompa
    Abstract: Model selection criteria are one of the most important tools in statistics. Proofs showing a model selection criterion is asymptotically optimal are tailored to the type of model (linear regression, quantile regression, penalized regression, etc.), the estimation method (linear smoothers, maximum likelihood, generalized method of moments, etc.), the type of data (i.i.d., dependent, high dimensional, etc.), and the type of model selection criterion. Moreover, assumptions are often restrictive and unrealistic making it a slow and winding process for researchers to determine if a model selection criterion is selecting an optimal model. This paper provides general proofs showing asymptotic optimality for a wide range of model selection criteria under general conditions. This paper not only asymptotically justifies model selection criteria for most situations, but it also unifies and extends a range of previously disparate results.
    Date: 2025–10
    URL: https://d.repec.org/n?u=RePEc:arx:papers:2510.14822
  20. By: Alexander Chudik; Cameron M. Ellis; Johannes G. Jaspersen
    Abstract: To avoid endogeneity, financial economists often construct regressors and/or instruments using values from other observations, with lagged and leave-out variables being common examples. We examine the use of such variables in common settings with fixed effects and show that it can induce bias and distort inference. We illustrate the severity of this problem via simulations and with patent examiner data. Even when scrambling the patent examiners, thus removing any instrument validity, the bias leads to a first-stage F-statistic over 1,000. General and case-specific solutions are provided.
    Keywords: lagged regressors; leave-out instruments; fixed effects; weak exogeneity bias; patents
    JEL: C13 C36 D22 K0
    Date: 2025–09–23
    URL: https://d.repec.org/n?u=RePEc:fip:feddwp:101895
  21. By: Stuart Lane; Frank Windmeijer
    Abstract: Exogeneity is key for IV estimators and can be assessed via overidentification (OID) tests. We discuss the Kleibergen-Paap (KP) rank test as a heteroskedasticity-robust OID test and compare it to the typical J-test. We derive the heteroskedastic weak-instrument limiting distributions of J and KP as special cases of the robust score test estimated via 2SLS and LIML respectively. Monte Carlo simulations show that KP usually performs better than J, which is prone to severe size distortions. Test size depends on model parameters that are not consistently estimable with weak instruments, so a conservative approach is recommended. This generalises recommendations to use LIML-based OID tests under homoskedasticity. We then revisit the classic problem of estimating the elasticity of intertemporal substitution (EIS) in lifecycle consumption models, where lagged macroeconomic indicators should provide naturally valid but frequently weak instruments. The literature provides a wide range of estimates for this parameter; J frequently rejects the null of valid instruments whereas KP does not, suggesting that J over-rejects, sometimes severely. We therefore argue that the KP test should be used over the J-test, and that instrument invalidity/misspecification is unlikely to be the cause of the wide range of EIS estimates in the literature. (A minimal J-statistic sketch follows this entry.)
    Date: 2025–09
    URL: https://d.repec.org/n?u=RePEc:arx:papers:2509.21096
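For reference, a compact simulation of the heteroskedasticity-robust Hansen J-test based on 2SLS residuals. The KP test discussed in this entry replaces 2SLS with LIML in the score-test form, which is not reproduced here; the data-generating process is an illustrative assumption with valid instruments, so J should rarely reject.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
n = 2_000
z = rng.normal(size=(n, 3))                     # three valid instruments
u = rng.normal(size=n)                          # structural error
x = z @ np.array([0.4, 0.3, 0.2]) + 0.5 * u + rng.normal(size=n)  # endogenous regressor
y = 1.0 * x + u

X = np.column_stack([np.ones(n), x])
Z = np.column_stack([np.ones(n), z])

# Two-stage least squares
Pz = Z @ np.linalg.solve(Z.T @ Z, Z.T)
beta = np.linalg.solve(X.T @ Pz @ X, X.T @ Pz @ y)
uhat = y - X @ beta

# Heteroskedasticity-robust Hansen J statistic
g = Z.T @ uhat / n
S = (Z * (uhat**2)[:, None]).T @ Z / n          # robust weight matrix
J = n * g @ np.linalg.solve(S, g)
df = Z.shape[1] - X.shape[1]                    # number of overidentifying restrictions
print(f"J = {J:.2f}, p-value = {stats.chi2.sf(J, df):.3f}")
```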
  22. By: Jun Ma; Vadim Marmer; Zhengfei Yu
    Abstract: In this paper, we develop inference methods for the distribution of heterogeneous individual treatment effects (ITEs) in the nonseparable triangular model with a binary endogenous treatment and a binary instrument of Vuong and Xu (2017) and Feng, Vuong, and Xu (2019). We focus on the estimation of the cumulative distribution function (CDF) of the ITE, which can be used to address a wide range of practically important questions such as inference on the proportion of individuals with positive ITEs, the quantiles of the distribution of ITEs, and the interquartile range as a measure of the spread of the ITEs, as well as comparison of the ITE distributions across sub-populations. Moreover, our CDF-based approach can deliver more precise results than the density-based approach previously considered in the literature. We establish weak convergence to tight Gaussian processes for the empirical CDF and quantile function computed from nonparametric ITE estimates of Feng, Vuong, and Xu (2019). Using those results, we develop bootstrap-based nonparametric inferential methods, including uniform confidence bands for the CDF and quantile function of the ITE distribution.
    Date: 2025–09
    URL: https://d.repec.org/n?u=RePEc:arx:papers:2509.15401
  23. By: Sergio Bianchi; Daniele Angelini
    Abstract: We introduce a novel distribution-based estimator for the Hurst parameter of log-volatility, leveraging the Kolmogorov-Smirnov statistic to assess the scaling behavior of entire distributions rather than individual moments. To address the temporal dependence of financial volatility, we propose a random permutation procedure that effectively removes serial correlation while preserving marginal distributions, enabling the rigorous application of the KS framework to dependent data. We establish the asymptotic variance of the estimator, useful for inference and confidence interval construction. From a computational standpoint, we show that derivative-free optimization methods, particularly Brent's method and the Nelder-Mead simplex, achieve substantial efficiency gains relative to grid search while maintaining estimation accuracy. Empirical analysis of the CBOE VIX index and the 5-minute realized volatility of the S&P 500 reveals a statistically significant hierarchy of roughness, with implied volatility smoother than realized volatility. Both measures, however, exhibit Hurst exponents well below one-half, reinforcing the rough volatility paradigm and highlighting the open challenge of disentangling local roughness from long-memory effects in fractional modeling. (A toy scaling sketch follows this entry.)
    Date: 2025–09
    URL: https://d.repec.org/n?u=RePEc:arx:papers:2509.20015
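A stripped-down sketch of the distribution-scaling idea behind this entry: after a random permutation, aggregated increments rescaled by m^H should match the base increments in distribution, so H can be chosen to minimize the Kolmogorov-Smirnov statistic. The toy below uses i.i.d. Gaussian increments (true H = 0.5) and a grid search; the paper's estimator, its asymptotic variance, and the derivative-free optimizers are not reproduced.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
inc = rng.normal(size=20_000)          # toy log-volatility increments (true H = 0.5)

# Random permutation: breaks serial dependence, preserves the marginal law.
inc = rng.permutation(inc)

m = 16                                 # aggregation scale
agg = inc[: len(inc) // m * m].reshape(-1, m).sum(axis=1)

# Self-similarity: agg / m**H should match the base increments in distribution.
grid = np.linspace(0.05, 0.95, 91)
ks = [stats.ks_2samp(agg / m**H, inc).statistic for H in grid]
print("estimated H:", grid[int(np.argmin(ks))].round(2))   # close to 0.5
```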
  24. By: Tadao Hoshino
    Abstract: This paper develops a sensitivity analysis framework that transfers the average total treatment effect (ATTE) from source data with a fully observed network to target data whose network is completely unknown. The ATTE represents the average social impact of a policy that assigns the treatment to every individual in the dataset. We postulate a covariate-shift type assumption that both source and target datasets share the same conditional mean outcome. However, because the target network is unobserved, this assumption alone is not sufficient to pin down the ATTE for the target data. To address this issue, we consider a sensitivity analysis based on the uncertainty of the target network's degree distribution, where the extent of uncertainty is measured by the Wasserstein distance from a given reference degree distribution. We then construct bounds on the target ATTE using a linear programming-based estimator. The limiting distribution of the bound estimator is derived via the functional delta method, and we develop a wild bootstrap approach to approximate the distribution. As an empirical illustration, we revisit the social network experiment on farmers' weather insurance adoption in China by Cai et al. (2015).
    Date: 2025–10
    URL: https://d.repec.org/n?u=RePEc:arx:papers:2510.14415
  25. By: Haodong Liang; Yanhao Jin; Krishnakumar Balasubramanian; Lifeng Lai
    Abstract: We study instrumental variable regression (IVaR) under differential privacy constraints. Classical IVaR methods (like two-stage least squares regression) rely on solving moment equations that directly use sensitive covariates and instruments, creating significant risks of privacy leakage and posing challenges in designing algorithms that are both statistically efficient and differentially private. We propose a noisy two-stage gradient descent algorithm that ensures $\rho$-zero-concentrated differential privacy by injecting carefully calibrated noise into the gradient updates. Our analysis establishes finite-sample convergence rates for the proposed method, showing that the algorithm achieves consistency while preserving privacy. In particular, we derive precise bounds quantifying the trade-off among privacy parameters, sample size, and iteration complexity. To the best of our knowledge, this is the first work to provide both privacy guarantees and provable convergence rates for instrumental variable regression in linear models. We further validate our theoretical findings with experiments on both synthetic and real datasets, demonstrating that our method offers practical accuracy-privacy trade-offs.
    Date: 2025–09
    URL: https://d.repec.org/n?u=RePEc:arx:papers:2509.22794
  26. By: Vladimír Holý
    Abstract: We address the challenges of modeling high-frequency integer price changes in financial markets using continuous distributions, particularly the Student's t-distribution. We demonstrate that traditional GARCH models, which rely on continuous distributions, are ill-suited for high-frequency data due to the discreteness of price changes. We propose a modification to the maximum likelihood estimation procedure that accounts for the discrete nature of observations while still using continuous distributions. Our approach involves modeling the log-likelihood in terms of intervals corresponding to the rounding of continuous price changes to the nearest integer. The findings highlight the importance of adjusting for discreteness in volatility analysis and provide a framework for incorporating any continuous distribution for modeling high-frequency prices. (A minimal rounded-likelihood sketch follows this entry.)
    Date: 2025–10
    URL: https://d.repec.org/n?u=RePEc:arx:papers:2510.09785
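A minimal version of the interval likelihood described in this entry, assuming i.i.d. Student's t price changes rounded to integer ticks; the GARCH-type volatility dynamics in which the paper embeds this are omitted, and all parameter values are illustrative.

```python
import numpy as np
from scipy import stats
from scipy.optimize import minimize

# Integer price changes: continuous Student's t moves rounded to the nearest tick.
true_scale, true_df = 2.0, 5.0
ticks = np.rint(stats.t.rvs(true_df, scale=true_scale, size=5_000, random_state=7))

def neg_loglik(params):
    scale, df = np.exp(params)            # optimize on the log scale for positivity
    # Interval likelihood: P(k) = F(k + 0.5) - F(k - 0.5) for the rounded value k.
    p = (stats.t.cdf(ticks + 0.5, df, scale=scale)
         - stats.t.cdf(ticks - 0.5, df, scale=scale))
    return -np.log(np.clip(p, 1e-300, None)).sum()

res = minimize(neg_loglik, x0=[0.0, 1.0], method="Nelder-Mead")
print("estimated (scale, df):", np.exp(res.x).round(2))   # close to (2.0, 5.0)
```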
  27. By: Amos Golan (American University and Santa Fe Institute); Jeffrey Perloff (University of California, Berkeley)
    Abstract: An information-theoretic maximum entropy (ME) model provides an alternative approach to finding solutions to partially identified models. In these models, we can identify only a solution set rather than point-identifying the parameters of interest, given our limited information. Manski (2021) proposed using statistical decision functions in general, and the minimax-regret (MMR) criterion in particular, to choose a unique solution. Using Manski's simulations for a missing-data problem and a treatment problem, including an empirical example, we show that ME performs as well as or better than MMR. In additional simulations, ME dominates various other statistical decision functions. ME has an axiomatic underpinning and is computationally efficient.
    Keywords: information theory, maximum entropy, minimax regret, statistical decision function
    JEL: D81 C15 C44
    Date: 2025–10
    URL: https://d.repec.org/n?u=RePEc:hka:wpaper:20205-009
  28. By: Philipp Bach; Victor Chernozhukov; Carlos Cinelli; Lin Jia; Sven Klaassen; Nils Skotara; Martin Spindler
    Abstract: Causal Machine Learning has emerged as a powerful tool for flexibly estimating causal effects from observational data in both industry and academia. However, causal inference from observational data relies on untestable assumptions about the data-generating process, such as the absence of unobserved confounders. When these assumptions are violated, causal effect estimates may become biased, undermining the validity of research findings. In these contexts, sensitivity analysis plays a crucial role, by enabling data scientists to assess the robustness of their findings to plausible violations of unconfoundedness. This paper introduces sensitivity analysis and demonstrates its practical relevance through a (simulated) data example based on a use case at Booking.com. We focus our presentation on a recently proposed method by Chernozhukov et al. (2023), which derives general non-parametric bounds on biases due to omitted variables, and is fully compatible with (though not limited to) modern inferential tools of Causal Machine Learning. By presenting this use case, we aim to raise awareness of sensitivity analysis and highlight its importance in real-world scenarios.
    Date: 2025–10
    URL: https://d.repec.org/n?u=RePEc:arx:papers:2510.09109
  29. By: Tomás Pacheco (Department of Economics, Universidad de San Andrés)
    Abstract: This paper evaluates the efficiency gains of the Adaptive Least Squares (ALS) estimator proposed by Romano and Wolf (2017) in the context of Linear Probability Models (LPM), where heteroskedasticity is inherent to the model. Using empirical applications and Monte Carlo simulations, we compare ALS to OLS and Probit estimators under three strategies for handling predicted probabilities outside the (0, 1) interval: bounding, sigmoid transformation, and trimming. The results show that efficiency gains from ALS are not systematic and depend on the correction method, with the bounding approach yielding the most substantial improvements. (A small bounding-plus-WLS sketch follows this entry.)
    Keywords: efficiency; linear probability model; weighted least squares
    JEL: C01 C12 C50
    Date: 2025–09
    URL: https://d.repec.org/n?u=RePEc:sad:ypaper:18
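A small sketch of the bounding strategy evaluated in this entry: fit the LPM by OLS, clip fitted probabilities into (0, 1), then reweight by the implied variance. The ALS estimator of Romano and Wolf adaptively chooses between OLS and this WLS step, which is not implemented here; the clip bounds and design are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(8)
n = 5_000
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
p_true = np.clip(0.5 + 0.15 * x, 0.01, 0.99)
y = rng.binomial(1, p_true)

# Step 1: OLS for the linear probability model.
b_ols = np.linalg.lstsq(X, y, rcond=None)[0]

# Step 2: bound fitted probabilities away from 0 and 1, then reweight (WLS),
# since Var(y|x) = p(x)(1 - p(x)) makes the LPM inherently heteroskedastic.
p_hat = np.clip(X @ b_ols, 0.01, 0.99)      # the "bounding" correction
w = 1.0 / (p_hat * (1.0 - p_hat))
W = X * w[:, None]
b_wls = np.linalg.solve(W.T @ X, W.T @ y)
print("OLS:", b_ols.round(3), " WLS:", b_wls.round(3))
```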
  30. By: Nathalie Gimenes; Tonghui Qi; Sorawoot Srisuma
    Abstract: How sellers choose reserve prices is central to auction theory, and the optimal reserve price depends on the seller's risk attitude. Numerous studies have found that observed reserve prices lie below the optimal level implied by risk-neutral sellers, while the theoretical literature suggests that risk-averse sellers can rationalize these empirical findings. In this paper, we develop an econometric model of ascending auctions with a risk-averse seller under independent private values. We provide primitive conditions for the identification of the Arrow-Pratt measures of risk aversion and an estimator for these measures that is consistent and converges in distribution to a normal distribution at the parametric rate under standard regularity conditions. A Monte Carlo study demonstrates good finite-sample performance of the estimator, and we illustrate the approach using data from foreclosure real estate auctions in São Paulo.
    Date: 2025–09
    URL: https://d.repec.org/n?u=RePEc:arx:papers:2509.19945
  31. By: Onil Boussim
    Abstract: In difference-in-differences (DiD) settings with categorical outcomes, treatment effects often operate on both total quantities (e.g., voter turnout) and category shares (e.g., vote distribution across parties). In this context, linear DiD models can be problematic: they suffer from scale dependence, may produce negative counterfactual quantities, and are inconsistent with discrete choice theory. We propose compositional DiD (CoDiD), a new method that identifies counterfactual categorical quantities, and thus total levels and shares, under a parallel growths assumption. The assumption states that, absent treatment, each category's size grows or shrinks at the same proportional rate in treated and control groups. In a random utility framework, we show that this implies parallel evolution of relative preferences between any pair of categories. Analytically, we show that it also means the shares are reallocated in the same way in both groups in the absence of treatment. Finally, geometrically, it corresponds to parallel trajectories (or movements) of probability mass functions of the two groups in the probability simplex under Aitchison geometry. We extend CoDiD to i) derive bounds under relaxed assumptions, ii) handle staggered adoption, and iii) propose a synthetic DiD analog. We illustrate the method's empirical relevance through two applications: first, we examine how early voting reforms affect voter choice in U.S. presidential elections; second, we analyze how the Regional Greenhouse Gas Initiative (RGGI) affected the composition of electricity generation across sources such as coal, natural gas, nuclear, and renewables. (A toy computation follows this entry.)
    Date: 2025–10
    URL: https://d.repec.org/n?u=RePEc:arx:papers:2510.11659
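The parallel growths assumption has an essentially one-line implementation. A toy computation with hypothetical category counts (e.g., votes across three parties); the numbers are invented for illustration only.

```python
import numpy as np

# Category counts for control/treated groups, pre/post treatment (hypothetical).
control_pre  = np.array([400.0, 300.0, 300.0])
control_post = np.array([500.0, 270.0, 330.0])
treated_pre  = np.array([350.0, 400.0, 250.0])

# Parallel growths: absent treatment, each category grows at the same
# proportional rate in the treated group as in the control group.
growth = control_post / control_pre
treated_post_cf = treated_pre * growth

print("counterfactual counts:", treated_post_cf.round(1))
print("counterfactual shares:", (treated_post_cf / treated_post_cf.sum()).round(3))
```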
  32. By: B. Cooper Boniece; José E. Figueroa-López; Tianwei Zhou
    Abstract: Volatility estimation is a central problem in financial econometrics, but becomes particularly challenging when jump activity is high, a phenomenon observed empirically in highly traded financial securities. In this paper, we revisit the problem of spot volatility estimation for an Itô semimartingale with jumps of unbounded variation. We construct truncated kernel-based estimators and debiased variants that extend the efficiency frontier for spot volatility estimation in terms of the jump activity index $Y$, raising the previously attainable bound on $Y$.
    Date: 2025–10
    URL: https://d.repec.org/n?u=RePEc:arx:papers:2510.14285
  33. By: Munawar Ali; Purba Das; Qi Feng; Liyao Gao; Guang Lin
    Abstract: In this paper, we propose a data-driven framework for model discovery of stochastic differential equations (SDEs) from a single trajectory, without requiring ergodicity or stationarity assumptions on the underlying continuous process. By combining (stochastic) Taylor expansions with Girsanov transformations, and using the drift function's initial value as input, we construct drift estimators while simultaneously recovering the model noise. This allows us to recover the underlying $\mathbb P$ Brownian motion increments. Building on these estimators, we introduce the first stochastic Sparse Identification of Stochastic Differential Equation (SSISDE) algorithm, capable of identifying the governing SDE dynamics from a single observed trajectory without requiring ergodicity or stationarity. To validate the proposed approach, we conduct numerical experiments with both linear and quadratic drift-diffusion functions. Among these, the Black-Scholes SDE is included as a representative case of a system that does not satisfy ergodicity or stationarity.
    Date: 2025–09
    URL: https://d.repec.org/n?u=RePEc:arx:papers:2509.25484
  34. By: Shanjukta Nath; Jiwon Hong; Jae Ho Chang; Keith Warren; Subhadeep Paul
    Abstract: We find AI embeddings obtained using a pre-trained transformer-based Large Language Model (LLM) of 80,000-120,000 written affirmations and correction exchanges among residents in low-security correctional facilities to be highly predictive of recidivism. The prediction accuracy is 30% higher with embedding vectors than with only pre-entry covariates. However, since the text embedding vectors are high-dimensional, we perform Zero-Shot classification of these texts to a low-dimensional vector of user-defined classes to aid interpretation while retaining the predictive power. To shed light on the social dynamics inside the correctional facilities, we estimate peer effects in these LLM-generated numerical representations of language with a multivariate peer effect model, adjusting for network endogeneity. We develop new methodology and theory for peer effect estimation that accommodate sparse networks, multivariate latent variables, and correlated multivariate outcomes. With these new methods, we find significant peer effects in language usage for interaction and feedback.
    Date: 2025–09
    URL: https://d.repec.org/n?u=RePEc:arx:papers:2509.20634
  35. By: Zheng Cao; Xingran Shao; Yuheng Yan; Helyette Geman
    Abstract: We propose a novel model, the Hyped Log-Periodic Power Law Model (HLPPL), for the problem of quantifying and detecting financial bubbles, an ever-fascinating one for academics and practitioners alike. Bubble labels are generated using a Log-Periodic Power Law (LPPL) model, sentiment scores, and a hype index we introduced in previous research on NLP forecasting of stock return volatility. Using these tools, a dual-stream transformer model is trained with market data and machine learning methods, resulting in a time series of confidence scores as a Bubble Score. A distinctive feature of our framework is that it captures phases of extreme overpricing and underpricing within a unified structure. We achieve an average annualized return of 34.13 percent when backtesting U.S. equities during the period 2018 to 2024, while the approach exhibits a remarkable generalization ability across industry sectors. Its conservative bias in predicting bubble periods minimizes false positives, a feature which is especially beneficial for market signaling and decision-making. Overall, this approach utilizes both theoretical and empirical advances for real-time positive and negative bubble identification and measurement with HLPPL signals.
    Date: 2025–10
    URL: https://d.repec.org/n?u=RePEc:arx:papers:2510.10878
  36. By: Yuhong Xu; Xinyao Zhao
    Abstract: We propose a general CoVaR framework that extends the traditional CoVaR by incorporating diverse expert views and information, such as asset moment characteristics, quantile insights, and perspectives on the relative loss distribution between two assets. To integrate these expert views effectively while minimizing deviations from the prior distribution, we employ the entropy pooling method to derive the posterior distribution, which in turn enables us to compute the general CoVaR. Assuming bivariate normal distributions, we derive its analytical expressions under various perspectives. Sensitivity analysis reveals that CoVaR exhibits a linear relationship with both the expectations of the variables in the views and the differences in expectations between them. In contrast, CoVaR shows nonlinear dependencies with respect to the variance, quantiles, and correlation within these views. Empirical analysis of the US banking system during the Federal Reserve's interest rate hikes demonstrates the effectiveness of the general CoVaR when expert views are appropriately specified. Furthermore, we extend this framework to the general $\Delta$CoVaR, which allows for the assessment of risk spillover effects from various perspectives. (A closed-form sketch follows this entry.)
    Date: 2025–09
    URL: https://d.repec.org/n?u=RePEc:arx:papers:2509.21904
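As background for the extension in this entry, classical CoVaR has a closed form under bivariate normality: it is the q-quantile of the system loss Y given that the institution loss X sits at its own VaR. A short sketch with illustrative parameters; the paper's entropy-pooling posterior is not implemented here.

```python
import numpy as np
from scipy import stats

# Bivariate-normal losses: X (institution), Y (system); parameters are illustrative.
mu_x, mu_y, s_x, s_y, rho, q = 0.0, 0.0, 1.0, 1.5, 0.6, 0.95

z = stats.norm.ppf(q)
var_x = mu_x + s_x * z                   # VaR of the institution at level q
# Y | X = VaR_x is normal, so CoVaR is its q-quantile:
covar = mu_y + rho * s_y / s_x * (var_x - mu_x) + s_y * np.sqrt(1 - rho**2) * z
# Delta-CoVaR: stress state (X at its VaR) vs. median state (X at its median).
covar_median = mu_y + s_y * np.sqrt(1 - rho**2) * z
print("CoVaR:", round(covar, 3), " DeltaCoVaR:", round(covar - covar_median, 3))
```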

This nep-ecm issue is ©2025 by Sune Karlsson. It is provided as is without any express or implied warranty. It may be freely redistributed in whole or in part for any purpose. If distributed in part, please include this notice.
General information on the NEP project can be found at https://nep.repec.org. For comments please write to the director of NEP, Marco Novarese at <director@nep.repec.org>. Put “NEP” in the subject, otherwise your mail may be rejected.
NEP’s infrastructure is sponsored by the School of Economics and Finance of Massey University in New Zealand.