nep-ecm New Economics Papers
on Econometrics
Issue of 2025–09–15
thirty-two papers chosen by
Sune Karlsson, Örebro universitet


  1. Efficient Difference-in-Differences and Event Study Estimators By Xiaohong Chen; Pedro H. C. Sant'Anna; Haitian Xie
  2. Estimation in linear models with clustered data By Anna Mikusheva; Mikkel Sølvsten; Baiyun Jing
  3. Bayesian Double Machine Learning for Causal Inference By Francis J. DiTraglia; Laura Liu
  4. Uniform Quasi ML based inference for the panel AR(1) model By Hugo Kruiniger
  5. Partial Identification of Causal Effects for Endogenous Continuous Treatments By Abhinandan Dalal; Eric J. Tchetgen Tchetgen
  6. Estimation of Non-Gaussian SVAR Using Tensor Singular Value Decomposition By Alain Guay; Dalibor Stevanovic
  7. Doubly robust estimation of causal effects for random object outcomes with continuous treatments By Satarupa Bhattacharjee; Bing Li; Xiao Wu; Lingzhou Xue
  8. Inference on Partially Identified Parameters with Separable Nuisance Parameters: a Two-Stage Method By Xunkang Tian
  9. The Identification Power of Combining Experimental and Observational Data for Distributional Treatment Effect Parameters By Shosei Sakaguchi
  10. Anytime-Valid Inference in Adaptive Experiments: Covariate Adjustment and Balanced Power By Daniel Molitor; Samantha Gold
  11. The exact distribution of the conditional likelihood-ratio test in instrumental variables regression By Malte Londschien
  12. Bayesian Shrinkage in High-Dimensional VAR Models: A Comparative Study By Harrison Katz; Robert E. Weiss
  13. A Nonparametric Approach to Augmenting a Bayesian VAR with Nonlinear Factors By Todd Clark; Florian Huber; Gary Koop
  14. Bias Correction in Factor-Augmented Regression Models with Weak Factors By Peiyun Jiang; Yoshimasa Uematsu; Takashi Yamagata
  15. Handling Sparse Non-negative Data in Finance By Agostino Capponi; Zhaonan Qu
  16. The purpose of an estimator is what it does: Misspecification, estimands, and over-identification By Isaiah Andrews; Jiafeng Chen; Otavio Tecchio
  17. Finite-Sample Non-Parametric Bounds with an Application to the Causal Effect of Workforce Gender Diversity on Firm Performance By Grace Lordan; Kaveh Salehzadeh Nobari
  18. An Improved Inference for IV Regressions By Liyu Dou; Pengjin Min; Wenjie Wang; Yichong Zhang
  19. A statistician's guide to weak-instrument-robust inference in instrumental variables regression with illustrations in Python By Malte Londschien
  20. A note on simulation methods for the Dirichlet-Laplace prior By Luis Gruber; Gregor Kastner; Anirban Bhattacharya; Debdeep Pati; Natesh Pillai; David Dunson
  21. Data driven modeling of multiple interest rates with generalized Vasicek-type models By Pauliina Ilmonen; Milla Laurikkala; Kostiantyn Ralchenko; Tommi Sottinen; Lauri Viitasaari
  22. Statistical and Methodological Advances in Spatial Economics: A Comprehensive Review of Models, Empirical Strategies, and Policy Evaluation By Gorjian, Mahshid
  23. Corrigendum to Maximum likelihood estimation of the multivariate normal mixture model By Boldea, Otilia; Magnus, Jan R.
  24. Reasonable uncertainty: Confidence intervals in empirical Bayes discrimination detection By Jiaying Gu; Nikolaos Ignatiadis; Azeem M. Shaikh
  25. Machine Learning with Multitype Protected Attributes: Intersectional Fairness through Regularisation By Ho Ming Lee; Katrien Antonio; Benjamin Avanzi; Lorenzo Marchi; Rui Zhou
  26. Robust Parameter Estimation for Financial Data Simulation By Lee, David
  27. Posterior inference of attitude-behaviour relationships using latent class choice models By Akshay Vij; Stephane Hess
  28. Neural Lévy SDE for State-Dependent Risk and Density Forecasting By Ziyao Wang; Svetlozar T Rachev
  29. On the role of the design phase in a linear regression By Junho Choi
  30. The Statistical Fairness-Accuracy Frontier By Alireza Fallah; Michael I. Jordan; Annie Ulichney
  31. Predicting Stock Market Crash with Bayesian Generalised Pareto Regression By Sourish Das
  32. Efficient two-sample instrumental variable estimators with change points and near-weak identification By Antoine, Bertille; Boldea, Otilia; Zaccaria, Niccolo

  1. By: Xiaohong Chen; Pedro H. C. Sant'Anna; Haitian Xie
    Abstract: This paper investigates efficient Difference-in-Differences (DiD) and Event Study (ES) estimation using short panel data sets within the heterogeneous treatment effect framework, free from parametric functional form assumptions and allowing for variation in treatment timing. We provide an equivalent characterization of the DiD potential outcome model using sequential conditional moment restrictions on observables, which shows that the DiD identification assumptions typically imply nonparametric overidentification restrictions. We derive the semiparametric efficient influence function (EIF) in closed form for DiD and ES causal parameters under commonly imposed parallel trends assumptions. The EIF is automatically Neyman orthogonal and yields the smallest variance among all asymptotically normal, regular estimators of the DiD and ES parameters. Leveraging the EIF, we propose simple-to-compute efficient estimators. Our results highlight how to optimally explore different pre-treatment periods and comparison groups to obtain the tightest (asymptotic) confidence intervals, offering practical tools for improving inference in modern DiD and ES applications even in small samples. Calibrated simulations and an empirical application demonstrate substantial precision gains of our efficient estimators in finite samples.
    Date: 2025–06
    URL: https://d.repec.org/n?u=RePEc:arx:papers:2506.17729
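    As context for the efficiency results above, here is a minimal Python sketch of the familiar doubly robust DiD estimand for a two-period, two-group design (in the spirit of Sant'Anna and Zhao, 2020). This is a baseline that the paper's efficient estimators improve upon, not the authors' own estimator, and the function names are illustrative:

      import numpy as np
      from sklearn.linear_model import LinearRegression, LogisticRegression

      def dr_did_att(y_pre, y_post, d, X):
          # Doubly robust ATT under conditional parallel trends: combine a
          # propensity score with an outcome-evolution regression fitted on
          # the comparison group only.
          dy = y_post - y_pre
          ps = LogisticRegression(max_iter=1000).fit(X, d).predict_proba(X)[:, 1]
          m0 = LinearRegression().fit(X[d == 0], dy[d == 0]).predict(X)
          w1 = d / d.mean()                    # treated units
          w0 = ps * (1 - d) / (1 - ps)
          w0 = w0 / w0.mean()                  # reweighted comparison units
          return np.mean((w1 - w0) * (dy - m0))

    The paper's contribution is to characterize, via the efficient influence function, how to weight across multiple pre-treatment periods and comparison groups so as to minimize the asymptotic variance within this class.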
  2. By: Anna Mikusheva; Mikkel Sølvsten; Baiyun Jing
    Abstract: We study linear regression models with clustered data, high-dimensional controls, and a complicated structure of exclusion restrictions. We propose a correctly centered internal IV estimator that accommodates a variety of exclusion restrictions and permits within-cluster dependence. The estimator has a simple leave-out interpretation and remains computationally tractable. We derive a central limit theorem for its quadratic form and propose a robust variance estimator. We also develop inference methods that remain valid under weak identification. Our framework extends classical dynamic panel methods to more general clustered settings. An empirical application of a large-scale fiscal intervention in rural Kenya with spatial interference illustrates the approach.
    Date: 2025–08
    URL: https://d.repec.org/n?u=RePEc:arx:papers:2508.12860
  3. By: Francis J. DiTraglia; Laura Liu
    Abstract: This paper proposes a simple, novel, and fully-Bayesian approach for causal inference in partially linear models with high-dimensional control variables. Off-the-shelf machine learning methods can introduce biases in the causal parameter known as regularization-induced confounding. To address this, we propose a Bayesian Double Machine Learning (BDML) method, which modifies a standard Bayesian multivariate regression model and recovers the causal effect of interest from the reduced-form covariance matrix. Our BDML is related to the burgeoning frequentist literature on DML while addressing its limitations in finite-sample inference. Moreover, the BDML is based on a fully generative probability model in the DML context, adhering to the likelihood principle. We show that in high dimensional setups the naive estimator implicitly assumes no selection on observables--unlike our BDML. The BDML exhibits lower asymptotic bias and achieves asymptotic normality and semiparametric efficiency as established by a Bernstein-von Mises theorem, thereby ensuring robustness to misspecification. In simulations, our BDML achieves lower RMSE, better frequentist coverage, and shorter confidence interval width than alternatives from the literature, both Bayesian and frequentist.
    Date: 2025–08
    URL: https://d.repec.org/n?u=RePEc:arx:papers:2508.12688
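    For readers unfamiliar with the frequentist DML baseline that BDML is compared against, here is a minimal cross-fitting sketch for the partially linear model y = d*theta + g(X) + e; the random-forest learners are placeholder choices, not the paper's:

      import numpy as np
      from sklearn.ensemble import RandomForestRegressor
      from sklearn.model_selection import KFold

      def dml_plm(y, d, X, n_folds=5, seed=0):
          # Cross-fitted partialling-out: residualize y and d on X using
          # out-of-fold predictions, then regress residual on residual.
          yhat, dhat = np.zeros(len(y)), np.zeros(len(y))
          for tr, te in KFold(n_folds, shuffle=True, random_state=seed).split(X):
              yhat[te] = RandomForestRegressor(random_state=seed).fit(X[tr], y[tr]).predict(X[te])
              dhat[te] = RandomForestRegressor(random_state=seed).fit(X[tr], d[tr]).predict(X[te])
          ry, rd = y - yhat, d - dhat
          theta = (rd @ ry) / (rd @ rd)
          psi = rd * (ry - theta * rd)         # influence-function pieces
          se = np.sqrt(np.mean(psi**2) / len(y)) / np.mean(rd**2)
          return theta, se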
  4. By: Hugo Kruiniger
    Abstract: This paper proposes new inference methods for panel AR models with arbitrary initial conditions and heteroskedasticity and possibly additional regressors that are robust to the strength of identification. Specifically, we consider several Maximum Likelihood based methods of constructing tests and confidence sets (CSs) and show that (Quasi) LM tests and CSs that use the expected Hessian rather than the observed Hessian of the log-likelihood have correct asymptotic size (in a uniform sense). We derive the power envelope of a Fixed Effects version of such a LM test for hypotheses involving the autoregressive parameter when the average information matrix is estimated by a centered OPG estimator and the model is only second-order identified, and show that it coincides with the maximal attainable power curve in the worst case setting. We also study the empirical size and power properties of these (Quasi) LM tests and CSs.
    Date: 2025–08
    URL: https://d.repec.org/n?u=RePEc:arx:papers:2508.20855
  5. By: Abhinandan Dalal; Eric J. Tchetgen Tchetgen
    Abstract: No unmeasured confounding is a common assumption when reasoning about counterfactual outcomes, but such an assumption may not be plausible in observational studies. Sensitivity analysis is often employed to assess the robustness of causal conclusions to unmeasured confounding, but existing methods are predominantly designed for binary treatments. In this paper, we provide natural extensions of two extensively used sensitivity frameworks -- the Rosenbaum and Marginal sensitivity models -- to the setting of continuous exposures. Our generalization replaces scalar sensitivity parameters with sensitivity functions that vary with exposure level, enabling richer modeling and sharper identification bounds. We develop a unified pseudo-outcome regression formulation for bounding the counterfactual dose-response curve under both models, and propose corresponding nonparametric estimators with second-order bias. These estimators accommodate modern machine learning methods for estimating nuisance parameters, which are shown to achieve $L^2$-consistency and minimax rates of convergence under suitable conditions. Our resulting estimators of the bounds on the counterfactual dose-response curve are shown to be consistent and asymptotically normal, allowing for a user-specified bound on the degree of uncontrolled exposure endogeneity. We also offer a geometric interpretation that relates the Rosenbaum and Marginal sensitivity models and guides their practical usage in global versus targeted sensitivity analysis. The methods are validated through simulations and a real-data application on the effect of second-hand smoke exposure on blood lead levels in children.
    Date: 2025–08
    URL: https://d.repec.org/n?u=RePEc:arx:papers:2508.13946
  6. By: Alain Guay; Dalibor Stevanovic
    Abstract: This paper introduces a tensor singular value decomposition (TSVD) approach for estimating non-Gaussian Structural Vector Autoregressive (SVAR) models. The proposed methodology applies to both complete and partial identification of structural shocks. The estimation procedure relies on third- and/or fourth-order cumulants. We establish the asymptotic distribution of the estimator and conduct a simulation study to evaluate its finite-sample performance. The results demonstrate that the estimator is highly competitive in small samples compared to alternative methods under complete identification. In cases of partial identification, the estimator also exhibits very good performance in small samples. To illustrate the practical relevance of the procedure under partial identification, two empirical applications are presented.
    Keywords: Non-Gaussian SVAR, tensor decomposition, cumulants
    JEL: C12 C32 C51
    Date: 2025–09–02
    URL: https://d.repec.org/n?u=RePEc:cir:cirwor:2025s-26
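    Since the procedure rests on higher-order cumulants of the residuals, a short numpy sketch of the third- and fourth-order cumulant arrays for zero-mean data may help fix ideas (the tensor decomposition step itself is not shown):

      import numpy as np

      def cumulants_34(e):
          # e: (n, k) array of residuals. For zero-mean data the third
          # cumulant equals the third moment; the fourth cumulant removes
          # the three Gaussian (pairwise-covariance) terms.
          e = e - e.mean(axis=0)
          n = e.shape[0]
          c3 = np.einsum('ti,tj,tk->ijk', e, e, e) / n
          m4 = np.einsum('ti,tj,tk,tl->ijkl', e, e, e, e) / n
          s = e.T @ e / n
          c4 = (m4 - np.einsum('ij,kl->ijkl', s, s)
                   - np.einsum('ik,jl->ijkl', s, s)
                   - np.einsum('il,jk->ijkl', s, s))
          return c3, c4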
  7. By: Satarupa Bhattacharjee; Bing Li; Xiao Wu; Lingzhou Xue
    Abstract: Causal inference is central to statistics and scientific discovery, enabling researchers to identify cause-and-effect relationships beyond associations. While traditionally studied within Euclidean spaces, contemporary applications increasingly involve complex, non-Euclidean data structures that reside in abstract metric spaces, known as random objects, such as images, shapes, networks, and distributions. This paper introduces a novel framework for causal inference with continuous treatments applied to non-Euclidean data. To address the challenges posed by the lack of linear structures, we leverage Hilbert space embeddings of the metric spaces to facilitate Fréchet mean estimation and causal effect mapping. Motivated by a study on the impact of exposure to fine particulate matter on age-at-death distributions across U.S. counties, we propose a nonparametric, doubly-debiased causal inference approach for outcomes as random objects with continuous treatments. Our framework can accommodate moderately high-dimensional vector-valued confounders and derive efficient influence functions for estimation to ensure both robustness and interpretability. We establish rigorous asymptotic properties of the cross-fitted estimators and employ conformal inference techniques for counterfactual outcome prediction. Validated through numerical experiments and applied to real-world environmental data, our framework extends causal inference methodologies to complex data structures, broadening its applicability across scientific disciplines.
    Date: 2025–06
    URL: https://d.repec.org/n?u=RePEc:arx:papers:2506.22754
  8. By: Xunkang Tian
    Abstract: This paper develops a two-stage method for inference on partially identified parameters in moment inequality models with separable nuisance parameters. In the first stage, the nuisance parameters are estimated separately, and in the second stage, the identified set for the parameters of interest is constructed using a refined chi-squared test with variance correction that accounts for the first-stage estimation error. We establish the asymptotic validity of the proposed method under mild conditions and characterize its finite-sample properties. The method is broadly applicable to models where direct elimination of nuisance parameters is difficult or introduces conservativeness. Its practical performance is illustrated through an application: structural estimation of entry and exit costs in the U.S. vehicle market based on Wollmann (2018).
    Date: 2025–08
    URL: https://d.repec.org/n?u=RePEc:arx:papers:2508.19853
  9. By: Shosei Sakaguchi
    Abstract: This paper investigates the identification power gained by combining experimental data, in which treatment is randomized, with observational data, in which treatment is self-selected, for distributional treatment effect (DTE) parameters. While experimental data identify average treatment effects, many DTE parameters, such as the distribution of individual treatment effects, are only partially identified. We examine whether, and how, combining the two data sources tightens the identified set for these parameters. For broad classes of DTE parameters, we derive sharp bounds under the combined data and clarify the mechanism through which the data combination tightens the identified set relative to using experimental data alone. Our analysis highlights that self-selection in the observational data is a key source of identification power. We also characterize necessary and sufficient conditions under which the combined data shrink the identified set, showing that such shrinkage generally occurs unless selection-on-observables holds in the observational data. We also propose a linear programming approach to compute the sharp bounds, which can accommodate additional structural restrictions such as mutual stochastic monotonicity of potential outcomes and the generalized Roy model. An empirical application using data on negative campaign advertisements in a U.S. presidential election illustrates the practical relevance of the approach.
    Date: 2025–08
    URL: https://d.repec.org/n?u=RePEc:arx:papers:2508.12206
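    For intuition on why DTE parameters are only partially identified from experimental data alone, here is a small numpy sketch of the classical Makarov bounds on P(Y1 - Y0 <= delta) computed from the two marginal outcome distributions; the paper's sharp bounds under data combination tighten these:

      import numpy as np

      def makarov_bounds(y1, y0, delta, grid_size=400):
          # Empirical CDFs of the two marginals, evaluated on a grid.
          grid = np.linspace(min(y1.min(), y0.min() + delta),
                             max(y1.max(), y0.max() + delta), grid_size)
          F1 = np.searchsorted(np.sort(y1), grid, side='right') / len(y1)
          F0 = np.searchsorted(np.sort(y0), grid - delta, side='right') / len(y0)
          lower = max(np.max(F1 - F0), 0.0)        # sup_y max{F1(y) - F0(y - d), 0}
          upper = 1.0 + min(np.min(F1 - F0), 0.0)  # 1 + inf_y min{F1(y) - F0(y - d), 0}
          return lower, upper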
  10. By: Daniel Molitor; Samantha Gold
    Abstract: Adaptive experiments such as multi-armed bandits offer efficiency gains over traditional randomized experiments but pose two major challenges: invalid inference on the Average Treatment Effect (ATE) due to adaptive sampling and low statistical power for sub-optimal treatments. We address both issues by extending the Mixture Adaptive Design framework (arXiv:2311.05794). First, we propose MADCovar, a covariate-adjusted ATE estimator that is unbiased and preserves anytime-valid inference guarantees while substantially improving ATE precision. Second, we introduce MADMod, which dynamically reallocates samples to underpowered arms, enabling more balanced statistical power across treatments without sacrificing valid inference. Both methods retain MAD's core advantage of constructing asymptotic confidence sequences (CSs) that allow researchers to continuously monitor ATE estimates and stop data collection once a desired precision or significance criterion is met. Empirically, we validate both methods using simulations and real-world data. In simulations, MADCovar reduces CS width by up to 60% relative to MAD. In a large-scale political RCT with approximately 32,000 participants, MADCovar achieves similar precision gains. MADMod improves statistical power and inferential precision across all treatment arms, particularly for suboptimal treatments. Simulations show that MADMod sharply reduces Type II error while preserving the efficiency benefits of adaptive allocation. Together, MADCovar and MADMod make adaptive experiments more practical, reliable, and efficient for applied researchers across many domains. Our proposed methods are implemented through an open-source software package.
    Date: 2025–06
    URL: https://d.repec.org/n?u=RePEc:arx:papers:2506.20523
  11. By: Malte Londschien
    Abstract: We derive the exact asymptotic distribution of the conditional likelihood-ratio test in instrumental variables regression under weak instrument asymptotics and for multiple endogenous variables. The distribution is conditional on all eigenvalues of the concentration matrix, rather than only the smallest eigenvalue as in an existing asymptotic upper bound. This exact characterization leads to a substantially more powerful test if there are differently identified endogenous variables. We provide computational methods implementing the test and demonstrate the power gains through numerical analysis.
    Date: 2025–09
    URL: https://d.repec.org/n?u=RePEc:arx:papers:2509.04144
  12. By: Harrison Katz; Robert E. Weiss
    Abstract: High-dimensional vector autoregressive (VAR) models offer a versatile framework for multivariate time series analysis, yet face critical challenges from over-parameterization and uncertain lag order. In this paper, we systematically compare three Bayesian shrinkage priors (horseshoe, lasso, and normal) and two frequentist regularization approaches (ridge and nonparametric shrinkage) under three carefully crafted simulation scenarios. These scenarios encompass (i) overfitting in a low-dimensional setting, (ii) sparse high-dimensional processes, and (iii) a combined scenario where both large dimension and overfitting complicate inference. We evaluate each method in quality of parameter estimation (root mean squared error, coverage, and interval length) and out-of-sample forecasting (one-step-ahead forecast RMSE). Our findings show that local-global Bayesian methods, particularly the horseshoe, dominate in maintaining accurate coverage and minimizing parameter error, even when the model is heavily over-parameterized. Frequentist ridge often yields competitive point forecasts but underestimates uncertainty, leading to sub-nominal coverage. A real-data application using macroeconomic variables from Canada illustrates how these methods perform in practice, reinforcing the advantages of local-global priors in stabilizing inference when dimension or lag order is inflated.
    Date: 2025–04
    URL: https://d.repec.org/n?u=RePEc:arx:papers:2504.05489
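    For reference, the local-global (horseshoe) hierarchy that performs best in this comparison has the standard Carvalho-Polson-Scott form below, applied coefficient by coefficient to the VAR; the paper's exact hyperprior settings may differ:

      \beta_j \mid \lambda_j, \tau \sim \mathcal{N}(0, \lambda_j^2 \tau^2), \qquad
      \lambda_j \sim \mathcal{C}^{+}(0, 1), \qquad \tau \sim \mathcal{C}^{+}(0, 1),

    where $\mathcal{C}^{+}(0, 1)$ denotes a half-Cauchy distribution. The global scale $\tau$ shrinks all coefficients toward zero, while the heavy-tailed local scales $\lambda_j$ allow individual signals to escape the shrinkage.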
  13. By: Todd Clark; Florian Huber; Gary Koop
    Abstract: This paper proposes a Vector Autoregression augmented with nonlinear factors that are modeled nonparametrically using regression trees. There are four main advantages of our model. First, modeling potential nonlinearities nonparametrically lessens the risk of mis-specification. Second, the use of factor methods ensures that departures from linearity are modeled parsimoniously. In particular, they exhibit functional pooling where a small number of nonlinear factors are used to model common nonlinearities across variables. Third, Bayesian computation using MCMC is straightforward even in very high dimensional models, allowing for efficient, equation by equation estimation, thus avoiding computational bottlenecks that arise in popular alternatives such as the time varying parameter VAR. Fourth, existing methods for identifying structural economic shocks in linear factor models can be adapted for the nonlinear case in a straightforward fashion using our model. Exercises involving artificial and macroeconomic data illustrate the properties of our model and its usefulness for forecasting and structural economic analysis.
    Date: 2025–08
    URL: https://d.repec.org/n?u=RePEc:arx:papers:2508.13972
  14. By: Peiyun Jiang; Yoshimasa Uematsu; Takashi Yamagata
    Abstract: In this paper, we study the asymptotic bias of the factor-augmented regression estimator and its reduction, where the regression is augmented by $r$ factors extracted from a large number $N$ of variables observed over $T$ periods. In particular, we consider general weak latent factor models with $r$ signal eigenvalues that may diverge at different rates, $N^{\alpha_k}$, $0 < \alpha_k \leq 1$.
    Date: 2025–09
    URL: https://d.repec.org/n?u=RePEc:arx:papers:2509.02066
  15. By: Agostino Capponi; Zhaonan Qu
    Abstract: We show that Poisson regression, though often recommended over log-linear regression for modeling count and other non-negative variables in finance and economics, can be far from optimal when heteroskedasticity and sparsity -- two common features of such data -- are both present. We propose a general class of moment estimators, encompassing Poisson regression, that balances the bias-variance trade-off under these conditions. A simple cross-validation procedure selects the optimal estimator. Numerical simulations and applications to corporate finance data reveal that the best choice varies substantially across settings and often departs from Poisson regression, underscoring the need for a more flexible estimation framework.
    Date: 2025–09
    URL: https://d.repec.org/n?u=RePEc:arx:papers:2509.01478
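    The abstract does not spell out the estimator class, but a hypothetical one-parameter family of moment estimators nesting Poisson regression illustrates the idea; the weight exponent c below and its cross-validated selection are illustrative assumptions, not the authors' exact construction:

      import numpy as np
      from scipy.optimize import root

      def weighted_exp_moments(y, X, c):
          # Solve sum_i exp(-c * x_i'b) * x_i * (y_i - exp(x_i'b)) = 0.
          # c = 0 recovers the Poisson first-order conditions; c > 0
          # down-weights observations with large fitted means, trading
          # bias against variance under heteroskedasticity and sparsity.
          def g(b):
              eta = X @ b
              return X.T @ (np.exp(-c * eta) * (y - np.exp(eta)))
          return root(g, np.zeros(X.shape[1]), method='hybr').x

    A grid of c values can then be compared on cross-validated out-of-sample prediction error, mirroring the selection procedure described in the abstract.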
  16. By: Isaiah Andrews; Jiafeng Chen; Otavio Tecchio
    Abstract: In over-identified models, misspecification -- the norm rather than exception -- fundamentally changes what estimators estimate. Different estimators imply different estimands rather than different efficiency for the same target. A review of recent applications of generalized method of moments in the American Economic Review suggests widespread acceptance of this fact: There is little formal specification testing and widespread use of estimators that would be inefficient were the model correct, including the use of "hand-selected" moments and weighting matrices. Motivated by these observations, we review and synthesize recent results on estimation under model misspecification, providing guidelines for transparent and robust empirical research. We also provide a new theoretical result, showing that Hansen's J-statistic measures, asymptotically, the range of estimates achievable at a given standard error. Given the widespread use of inefficient estimators and the resulting researcher degrees of freedom, we thus particularly recommend the broader reporting of J-statistics.
    Date: 2025–08
    URL: https://d.repec.org/n?u=RePEc:arx:papers:2508.13076
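    For reference, Hansen's J-statistic for a GMM model with $m$ moments, $k = \dim(\theta)$ parameters, and efficient weighting is

      J = n \, \bar{g}_n(\hat{\theta})' \, \hat{\Omega}^{-1} \, \bar{g}_n(\hat{\theta}) \xrightarrow{d} \chi^2_{m-k}

    under correct specification, where $\bar{g}_n$ is the sample moment vector and $\hat{\Omega}$ estimates its variance. The paper's new result reinterprets the magnitude of $J$ under misspecification as measuring, asymptotically, the spread of estimates attainable at a given standard error.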
  17. By: Grace Lordan; Kaveh Salehzadeh Nobari
    Abstract: Classical Manski bounds identify average treatment effects under minimal assumptions but, in finite samples, assume that latent conditional expectations are bounded by the sample's own extrema or that the population extrema are known a priori -- often untrue in firm-level data with heavy tails. We develop a finite-sample, concentration-driven band (concATE) that replaces that assumption with a Dvoretzky-Kiefer-Wolfowitz tail bound, combines it with delta-method variance, and allocates size via Bonferroni. The band extends to a group-sequential design that controls the family-wise error when the first "significant" diversity threshold is data-chosen. Applied to 945 listed firms (2015 Q2 to 2022 Q1), concATE shows that senior-level gender diversity raises Tobin's Q once representation exceeds approximately 30% in growth sectors and approximately 65% in cyclical sectors.
    Date: 2025–09
    URL: https://d.repec.org/n?u=RePEc:arx:papers:2509.01622
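    The DKW ingredient of the concATE band is simple enough to state as code; a minimal helper (names illustrative) for the half-width that bounds the empirical CDF uniformly:

      import numpy as np

      def dkw_epsilon(n, alpha):
          # With probability at least 1 - alpha, an iid sample of size n
          # satisfies sup_x |F_hat(x) - F(x)| <= eps (Massart's constant).
          return np.sqrt(np.log(2.0 / alpha) / (2.0 * n))

    The band then combines this tail control with delta-method variances, splitting the overall size alpha across components via Bonferroni.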
  18. By: Liyu Dou; Pengjin Min; Wenjie Wang; Yichong Zhang
    Abstract: Researchers often report empirical results that are based on low-dimensional IVs, such as the shift-share IV, together with many IVs. Could we combine these results in an efficient way and take advantage of the information from both sides? In this paper, we propose a combination inference procedure to solve the problem. Specifically, we consider a linear combination of three test statistics: a standard cluster-robust Wald statistic based on the low-dimensional IVs, a leave-one-cluster-out Lagrangian Multiplier (LM) statistic, and a leave-one-cluster-out Anderson-Rubin (AR) statistic. We first establish the joint asymptotic normality of the Wald, LM, and AR statistics and derive the corresponding limit experiment under local alternatives. Then, under the assumption that at least the low-dimensional IVs can strongly identify the parameter of interest, we derive the optimal combination test based on the three statistics and establish that our procedure leads to the uniformly most powerful (UMP) unbiased test among the class of tests considered. In particular, the efficiency gain from the combined test is a "free lunch" in the sense that it is always at least as powerful as the test based only on the low-dimensional IVs or only on the many IVs.
    Date: 2025–06
    URL: https://d.repec.org/n?u=RePEc:arx:papers:2506.23816
  19. By: Malte Londschien
    Abstract: We provide an overview of results relating to estimation and weak-instrument-robust inference in instrumental variables regression. Methods are implemented in the ivmodels software package for Python, which we use to illustrate results.
    Date: 2025–08
    URL: https://d.repec.org/n?u=RePEc:arx:papers:2508.12474
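    As a flavor of the weak-instrument-robust tests the guide covers, here is a self-contained Anderson-Rubin test for the homoskedastic case. The ivmodels package referenced in the paper provides production implementations; this sketch assumes any exogenous controls have already been partialled out:

      import numpy as np
      from scipy import stats

      def anderson_rubin_test(y, X, Z, beta0):
          # Test H0: beta = beta0 in y = X beta + u with instruments Z.
          # The test remains valid regardless of instrument strength.
          n, k = Z.shape
          r = y - X @ beta0
          Pr = Z @ np.linalg.lstsq(Z, r, rcond=None)[0]   # project r onto Z
          ar = (n - k) / k * (Pr @ Pr) / ((r - Pr) @ (r - Pr))
          return ar, stats.f.sf(ar, k, n - k)             # F(k, n-k) p-value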
  20. By: Luis Gruber; Gregor Kastner; Anirban Bhattacharya; Debdeep Pati; Natesh Pillai; David Dunson
    Abstract: Bhattacharya et al. (2015, Journal of the American Statistical Association 110(512): 1479-1490) introduce a novel prior, the Dirichlet-Laplace (DL) prior, and propose a Markov chain Monte Carlo (MCMC) method to simulate posterior draws under this prior in a conditionally Gaussian setting. The original algorithm samples from conditional distributions in the wrong order, i.e., it does not correctly sample from the joint posterior distribution of all latent variables. This note details the issue and provides two simple solutions: A correction to the original algorithm and a new algorithm based on an alternative, yet equivalent, formulation of the prior. This corrigendum does not affect the theoretical results in Bhattacharya et al. (2015).
    Date: 2025–08
    URL: https://d.repec.org/n?u=RePEc:arx:papers:2508.11982
  21. By: Pauliina Ilmonen; Milla Laurikkala; Kostiantyn Ralchenko; Tommi Sottinen; Lauri Viitasaari
    Abstract: The Vasicek model is a commonly used interest rate model, and there exist many extensions and generalizations of it. However, most generalizations of the model are either univariate or assume the noise process to be Gaussian, or both. In this article, we study a generalized multivariate Vasicek model that allows simultaneous modeling of multiple interest rates while making minimal assumptions. In the model, we only assume that the noise process has stationary increments with a suitably decaying autocovariance structure. We provide estimators for the unknown parameters and prove their consistencies. We also derive limiting distributions for each estimator and provide theoretical examples. Furthermore, the model is tested empirically with both simulated data and real data.
    Date: 2025–09
    URL: https://d.repec.org/n?u=RePEc:arx:papers:2509.03208
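    For readers less familiar with the baseline model, the classical univariate Vasicek short-rate dynamics are

      dX_t = \theta (\mu - X_t) \, dt + \sigma \, dW_t,

    and a multivariate generalization in the spirit of the abstract replaces the Brownian motion with a stationary-increment noise process with suitably decaying autocovariance, e.g. $dX_t = A(b - X_t) \, dt + dG_t$. This display is a hedged reading of the setup rather than a reproduction of the paper's exact parametrization.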
  22. By: Gorjian, Mahshid
    Abstract: This study brings together current advances in the statistical and methodological foundations of spatial economics, focusing on the use of quantitative models and empirical approaches to investigate the distribution of economic activity over geographic space. We combine classical principles with modern approaches that emphasize causal identification, structural estimation, and the use of statistical and computational tools such as spatial econometrics, machine learning, and big data analytics. The study focuses on methodological challenges in spatial data analysis, such as spatial autocorrelation, high dimensionality, and the use of Geographic Information Systems (GIS), while also discussing advances in the design and estimation of quantitative spatial models. The focus is on contemporary empirical applications that use natural experiments, quasi-experimental approaches, and advanced econometric tools to examine the effects of agglomeration, market access, and infrastructure policy. Despite significant advances, challenges remain in robust model identification, dynamic analysis, and the integration of statistical approaches with new types of geographic data. This paper focuses on statistical methodologies and serves as a resource for economists and the broader statistics community interested in spatial modeling, causal inference, and policy evaluation.
    Keywords: statistical methodology, causal inference, spatial econometrics, machine learning, quantitative models, spatial statistics, GIS.
    JEL: C01 C1
    Date: 2025
    URL: https://d.repec.org/n?u=RePEc:pra:mprapa:125636
  23. By: Boldea, Otilia (Tilburg University, School of Economics and Management); Magnus, Jan R. (Tilburg University, School of Economics and Management)
    Date: 2024
    URL: https://d.repec.org/n?u=RePEc:tiu:tiutis:bbee6f79-03a9-46fa-b365-4b42973cccf8
  24. By: Jiaying Gu; Nikolaos Ignatiadis; Azeem M. Shaikh
    Abstract: We revisit empirical Bayes discrimination detection, focusing on uncertainty arising from both partial identification and sampling variability. While prior work has mostly focused on partial identification, we find that some empirical findings are not robust to sampling uncertainty. To better connect statistical evidence to the magnitude of real-world discriminatory behavior, we propose a counterfactual odds-ratio estimand with attractive properties and interpretation. Our analysis reveals the importance of careful attention to uncertainty quantification and downstream goals in empirical Bayes analyses.
    Date: 2025–08
    URL: https://d.repec.org/n?u=RePEc:arx:papers:2508.13110
  25. By: Ho Ming Lee; Katrien Antonio; Benjamin Avanzi; Lorenzo Marchi; Rui Zhou
    Abstract: Ensuring equitable treatment (fairness) across protected attributes (such as gender or ethnicity) is a critical issue in machine learning. Most existing literature focuses on binary classification, but achieving fairness in regression tasks, such as insurance pricing or hiring score assessments, is equally important. Moreover, anti-discrimination laws also apply to continuous attributes, such as age, for which many existing methods are not applicable. In practice, multiple protected attributes can exist simultaneously; however, methods targeting fairness across several attributes often overlook so-called "fairness gerrymandering", thereby ignoring disparities among intersectional subgroups (e.g., African-American women or Hispanic men). In this paper, we propose a distance covariance regularisation framework that mitigates the association between model predictions and protected attributes, in line with the fairness definition of demographic parity, and that captures both linear and nonlinear dependencies. To enhance applicability in the presence of multiple protected attributes, we extend our framework by incorporating two multivariate dependence measures based on distance covariance: the previously proposed joint distance covariance (JdCov) and our novel concatenated distance covariance (CCdCov), which effectively address fairness gerrymandering in both regression and classification tasks involving protected attributes of various types. We discuss and illustrate how to calibrate regularisation strength, including a method based on Jensen-Shannon divergence, which quantifies dissimilarities in prediction distributions across groups. We apply our framework to the COMPAS recidivism dataset and a large motor insurance claims dataset.
    Date: 2025–09
    URL: https://d.repec.org/n?u=RePEc:arx:papers:2509.08163
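    At the core of the regularisation framework is the sample distance covariance between model predictions and protected attributes; a minimal implementation of the (biased, V-statistic) version as a sketch:

      import numpy as np

      def dcov2(x, y):
          # x: (n, p), y: (n, q); reshape 1-D inputs to (n, 1) first.
          # Double-center the pairwise distance matrices, then average their
          # elementwise product. Zero in population iff x and y independent.
          def centered(a):
              d = np.linalg.norm(a[:, None, :] - a[None, :, :], axis=-1)
              return d - d.mean(axis=0) - d.mean(axis=1)[:, None] + d.mean()
          return (centered(x) * centered(y)).mean()

    A fairness penalty of the form loss + lambda * dcov2(predictions, protected) then discourages both linear and nonlinear dependence; the JdCov and CCdCov extensions aggregate such terms across multiple attributes.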
  26. By: Lee, David
    Abstract: Financial market data are known to be far from normal and replete with outliers, i.e., “dirty” data that contain errors. Data errors introduce extreme or aberrant data points that can significantly distort parameter estimation results. This paper proposes a robust estimation approach to achieve stable and accurate results. The robust estimation approach is particularly applicable for financial data that often features the three situations we are protecting against: occasional rogue values (outliers), small errors and underlying non-normality.
    Keywords: robust parameter estimation, financial market data, market data simulation, risk factor.
    JEL: C13 C15 C53 C63 G17
    Date: 2025–08
    URL: https://d.repec.org/n?u=RePEc:pra:mprapa:125703
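    The abstract does not commit to a specific estimator, but a standard robust M-estimation baseline in this spirit, using statsmodels' Huber regression on deliberately contaminated data, shows the kind of stability being sought:

      import numpy as np
      import statsmodels.api as sm

      # Simulated "dirty" data: a linear signal plus heavy-tailed noise.
      rng = np.random.default_rng(0)
      x = rng.normal(size=(500, 2))
      y = x @ np.array([1.0, -0.5]) + rng.standard_t(df=2, size=500)
      y[:5] += 50                              # a few rogue values

      X = sm.add_constant(x)
      ols = sm.OLS(y, X).fit()                 # distorted by the outliers
      huber = sm.RLM(y, X, M=sm.robust.norms.HuberT()).fit()
      print(ols.params, huber.params)          # robust fit is far more stable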
  27. By: Akshay Vij; Stephane Hess
    Abstract: The link between attitudes and behaviour has been a key topic in choice modelling for two decades, with the widespread application of ever more complex hybrid choice models. This paper proposes a flexible and transparent alternative framework for empirically examining the relationship between attitudes and behaviours using latent class choice models (LCCMs). Rather than embedding attitudinal constructs within the structural model, as in hybrid choice frameworks, we recover class-specific attitudinal profiles through posterior inference. This approach enables analysts to explore attitude-behaviour associations without the complexity and convergence issues often associated with integrated estimation. Two case studies are used to demonstrate the framework: one on employee preferences for working from home, and another on public acceptance of COVID-19 vaccines. Across both studies, we compare posterior profiling of indicator means, fractional multinomial logit (FMNL) models, factor-based representations, and hybrid specifications. We find that posterior inference methods provide behaviourally rich insights with minimal additional complexity, while factor-based models risk discarding key attitudinal information, and full-information hybrid models offer little gain in explanatory power and incur substantially greater estimation burden. Our findings suggest that when the goal is to explain preference heterogeneity, posterior inference offers a practical alternative to hybrid models, one that retains interpretability and robustness without sacrificing behavioural depth.
    Date: 2025–09
    URL: https://d.repec.org/n?u=RePEc:arx:papers:2509.08373
  28. By: Ziyao Wang; Svetlozar T Rachev
    Abstract: Financial returns are known to exhibit heavy tails, volatility clustering and abrupt jumps that are poorly captured by classical diffusion models. Advances in machine learning have enabled highly flexible functional forms for conditional means and volatilities, yet few models deliver interpretable state-dependent tail risk, capture multiple forecast horizons and yield distributions amenable to backtesting and execution. This paper proposes a neural Lévy jump-diffusion framework that jointly learns, as functions of observable state variables, the conditional drift, diffusion, jump intensity and jump size distribution. We show how a single shared encoder yields multiple forecasting heads corresponding to distinct horizons (daily, weekly, etc.), facilitating multi-horizon density forecasts and risk measures. The state vector includes conventional price and volume features as well as novel complexity measures such as permutation entropy and recurrence quantification analysis determinism, which quantify predictability in the underlying process. Estimation is based on a quasi-maximum likelihood approach that separates diffusion and jump contributions via bipower variation weights and incorporates monotonicity and smoothness regularisation to ensure identifiability. A cost-aware portfolio optimiser translates the model's conditional densities into implementable trading strategies under leverage, turnover and no-trade-band constraints. Extensive empirical analyses on cross-sectional equity data demonstrate improved calibration, sharper tail control and economically significant risk reduction relative to baseline diffusive and GARCH benchmarks. The proposed framework is therefore an interpretable, testable and practically deployable method for state-dependent risk and density forecasting.
    Date: 2025–08
    URL: https://d.repec.org/n?u=RePEc:arx:papers:2509.01041
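    Among the state variables, permutation entropy is easy to compute; a compact sketch of the normalized version (embedding order m, delay tau) used as a predictability measure:

      import numpy as np
      from math import factorial

      def permutation_entropy(x, m=3, tau=1):
          # Count ordinal patterns of m points spaced tau apart, then return
          # Shannon entropy normalized by log(m!): near 0 for a highly
          # predictable series, near 1 for white noise.
          n = len(x) - (m - 1) * tau
          counts = {}
          for i in range(n):
              key = tuple(np.argsort(x[i:i + m * tau:tau]))
              counts[key] = counts.get(key, 0) + 1
          p = np.array(list(counts.values()), dtype=float) / n
          return float(-(p * np.log(p)).sum() / np.log(factorial(m)))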
  29. By: Junho Choi
    Abstract: The "design phase" refers to a stage in observational studies, during which a researcher constructs a subsample that achieves a better balance in covariate distributions between the treated and untreated units. In this paper, we study the role of this preliminary phase in the context of linear regression, offering a justification for its utility. To that end, we first formalize the design phase as a process of estimand adjustment via selecting a subsample. Then, we show that covariate balance of a subsample is indeed a justifiable criterion for guiding the selection: it informs on the maximum degree of model misspecification that can be allowed for a subsample, when a researcher wishes to restrict the bias of the estimand for the parameter of interest within a target level of precision. In this sense, the pursuit of a balanced subsample in the design phase is interpreted as identifying an estimand that is less susceptible to bias in the presence of model misspecification. Also, we demonstrate that covariate imbalance can serve as a sensitivity measure in regression analysis, and illustrate how it can structure a communication between a researcher and the readers of her report.
    Date: 2025–09
    URL: https://d.repec.org/n?u=RePEc:arx:papers:2509.01861
  30. By: Alireza Fallah; Michael I. Jordan; Annie Ulichney
    Abstract: Machine learning models must balance accuracy and fairness, but these goals often conflict, particularly when data come from multiple demographic groups. A useful tool for understanding this trade-off is the fairness-accuracy (FA) frontier, which characterizes the set of models that cannot be simultaneously improved in both fairness and accuracy. Prior analyses of the FA frontier provide a full characterization under the assumption of complete knowledge of population distributions -- an unrealistic ideal. We study the FA frontier in the finite-sample regime, showing how it deviates from its population counterpart and quantifying the worst-case gap between them. In particular, we derive minimax-optimal estimators that depend on the designer's knowledge of the covariate distribution. For each estimator, we characterize how finite-sample effects asymmetrically impact each group's risk, and identify optimal sample allocation strategies. Our results transform the FA frontier from a theoretical construct into a practical tool for policymakers and practitioners who must often design algorithms with limited data.
    Date: 2025–08
    URL: https://d.repec.org/n?u=RePEc:arx:papers:2508.17622
  31. By: Sourish Das
    Abstract: This paper develops a Bayesian Generalised Pareto Regression (GPR) model to forecast extreme losses in Indian equity markets, with a focus on the Nifty 50 index. Extreme negative returns, though rare, can cause significant financial disruption, and accurate modelling of such events is essential for effective risk management. Traditional Generalised Pareto Distribution (GPD) models often ignore market conditions; in contrast, our framework links the scale parameter to covariates using a log-linear function, allowing tail risk to respond dynamically to market volatility. We examine four prior choices for Bayesian regularisation of regression coefficients: Cauchy, Lasso (Laplace), Ridge (Gaussian), and Zellner's g-prior. Simulation results suggest that the Cauchy prior delivers the best trade-off between predictive accuracy and model simplicity, achieving the lowest RMSE, AIC, and BIC values. Empirically, we apply the model to large negative returns (exceeding 5%) in the Nifty 50 index. Volatility measures from the Nifty 50, S&P 500, and gold are used as covariates to capture both domestic and global risk drivers. Our findings show that tail risk increases significantly with higher market volatility. In particular, both S&P 500 and gold volatilities contribute meaningfully to crash prediction, highlighting global spillover and flight-to-safety effects. The proposed GPR model offers a robust and interpretable approach for tail risk forecasting in emerging markets. It improves upon traditional EVT-based models by incorporating real-time financial indicators, making it useful for practitioners, policymakers, and financial regulators concerned with systemic risk and stress testing.
    Date: 2025–06
    URL: https://d.repec.org/n?u=RePEc:arx:papers:2506.17549
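    The key modelling device is a covariate-dependent GPD scale. Concretely, for exceedances $z = y - u$ over a threshold $u$, a standard GPD-regression form consistent with the abstract (assuming $\xi \neq 0$; not a verbatim reproduction of the paper's equations) is

      P(Y - u > z \mid Y > u, x) = \Big(1 + \frac{\xi z}{\sigma(x)}\Big)^{-1/\xi}, \qquad
      \log \sigma(x) = x'\beta,

    with $\xi$ the tail index and the four priors compared in the paper (Cauchy, Laplace, Gaussian, Zellner's g) placed on the regression coefficients $\beta$.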
  32. By: Antoine, Bertille; Boldea, Otilia (Tilburg University, School of Economics and Management); Zaccaria, Niccolo (Tilburg University, School of Economics and Management)
    Date: 2024
    URL: https://d.repec.org/n?u=RePEc:tiu:tiutis:a546c23b-272e-4ba9-8b94-a7aeb888847f

This nep-ecm issue is ©2025 by Sune Karlsson. It is provided as is without any express or implied warranty. It may be freely redistributed in whole or in part for any purpose. If distributed in part, please include this notice.
General information on the NEP project can be found at https://nep.repec.org. For comments please write to the director of NEP, Marco Novarese at <director@nep.repec.org>. Put “NEP” in the subject, otherwise your mail may be rejected.
NEP’s infrastructure is sponsored by the School of Economics and Finance of Massey University in New Zealand.