on Econometrics
By: | Junjie Li; Yukitoshi Matsushita |
Abstract: | This article develops a covariate balancing approach for the estimation of treatment effects on the treated (ATT) in a difference-in-differences (DID) research design when panel data are available. We show that the proposed covariate balancing propensity score (CBPS) DID estimator possesses several desirable properties: (i) local efficiency, (ii) double robustness in terms of consistency, (iii) double robustness in terms of inference, and (iv) faster convergence to the ATT compared to the augmented inverse probability weighting (AIPW) DID estimators when both working models are locally misspecified. These latter two characteristics set the CBPS DID estimator apart from the AIPW DID estimator theoretically. Simulation studies and an empirical study demonstrate the desirable finite sample performance of the proposed estimator. |
Date: | 2025–08 |
URL: | https://d.repec.org/n?u=RePEc:arx:papers:2508.02097 |
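A minimal Python sketch of the covariate-balancing weighting idea behind estimators like the one in the entry above: control weights are chosen so that weighted control covariate means exactly match treated means, and the ATT is then a weighted difference-in-differences of first-differenced outcomes. The simulated two-period data, just-identified logistic balancing moments, and variable names are illustrative assumptions, not the authors' implementation.

```python
import numpy as np
from scipy.optimize import root

rng = np.random.default_rng(0)
n = 2000
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])          # intercept + 2 covariates
d = rng.binomial(1, 1 / (1 + np.exp(-(0.5 * X[:, 1] - 0.5 * X[:, 2]))))   # treatment
y0 = X[:, 1] + rng.normal(size=n)                                   # pre-period outcome
y1 = y0 + 0.5 * X[:, 2] + 1.0 * d + rng.normal(size=n)              # post-period outcome; true ATT = 1

# Balancing condition for ATT weights with logistic odds exp(X'b):
#   (1/n) * sum_i [d_i - (1 - d_i) * exp(X_i'b)] * X_i = 0
def balance_moments(b):
    odds = np.exp(X @ b)
    return X.T @ (d - (1 - d) * odds) / n

b_hat = root(balance_moments, x0=np.zeros(X.shape[1])).x
w = np.exp(X @ b_hat)                         # control weights proportional to the estimated odds

dy = y1 - y0                                  # first differences
att = dy[d == 1].mean() - np.average(dy[d == 0], weights=w[d == 0])
print(f"weighted DID ATT estimate: {att:.3f}")   # should be near 1.0
```

Because the balancing moments include the intercept, the weighted control mean of each covariate equals the treated mean exactly in sample, which is what removes the covariate-driven trend from the DID contrast.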
By: | Antonia Antweiler; Joachim Freyberger |
Abstract: | This paper examines estimation of skill formation models, a critical component in understanding human capital development and its effects on individual outcomes. Existing estimators are either based on moment conditions and only applicable in specific settings or rely on distributional approximations that often do not align with the model. Our method employs an iterative likelihood-based procedure, which flexibly estimates latent variable distributions and recursively incorporates model restrictions across time periods. This approach reduces computational complexity while accommodating nonlinear production functions and measurement systems. Inference can be based on a bootstrap procedure that does not require re-estimating the model for bootstrap samples. Monte Carlo simulations and an empirical application demonstrate that our estimator outperforms existing methods, whose estimators can be substantially biased or noisy. |
Date: | 2025–07 |
URL: | https://d.repec.org/n?u=RePEc:arx:papers:2507.18995 |
By: | Chris Muris; Cavit Pakel; Qichen Zhang |
Abstract: | We consider ordered logit models for directed network data that allow for flexible sender and receiver fixed effects that can vary arbitrarily across outcome categories. This structure poses a significant incidental parameter problem, particularly challenging under network sparsity or when some outcome categories are rare. We develop the first estimation method for this setting by extending tetrad-differencing conditional maximum likelihood (CML) techniques from binary choice network models. This approach yields conditional probabilities free of the fixed effects, enabling consistent estimation even under sparsity. Applying the CML principle to ordered data yields multiple likelihood contributions corresponding to different outcome thresholds. We propose and analyze two distinct estimators based on aggregating these contributions: an Equally-Weighted Tetrad Logit Estimator (ETLE) and a Pooled Tetrad Logit Estimator (PTLE). We prove PTLE is consistent under weaker identification conditions, requiring only sufficient information when pooling across categories, rather than sufficient information in each category. Monte Carlo simulations confirm the theoretical preference for PTLE, and an empirical application to friendship networks among Dutch university students demonstrates the method's value. Our approach reveals significant positive homophily effects for gender, smoking behavior, and academic program similarities, while standard methods without fixed effects produce counterintuitive results. |
Date: | 2025–07 |
URL: | https://d.repec.org/n?u=RePEc:arx:papers:2507.16689 |
By: | Oguzhan Akgun; Alain Pirotte; Giovanni Urga; Zhenlin Yang |
Abstract: | This paper proposes a selective inference procedure for testing equal predictive ability in panel data settings with unknown heterogeneity. The framework allows predictive performance to vary across unobserved clusters and accounts for the data-driven selection of these clusters using the Panel Kmeans Algorithm. A post-selection Wald-type statistic is constructed, and valid $p$-values are derived under general forms of autocorrelation and cross-sectional dependence in forecast loss differentials. The method accommodates conditioning on covariates or common factors and permits both strong and weak dependence across units. Simulations demonstrate the finite-sample validity of the procedure and show that it has very high power. An empirical application to exchange rate forecasting using machine learning methods illustrates the practical relevance of accounting for unknown clusters in forecast evaluation. |
Date: | 2025–07 |
URL: | https://d.repec.org/n?u=RePEc:arx:papers:2507.14621 |
By: | Facundo Argañaraz; Juan Carlos Escanciano
Abstract: | Developing robust inference for models with nonparametric Unobserved Heterogeneity (UH) is both important and challenging. We propose novel Debiased Machine Learning (DML) procedures for valid inference on functionals of UH, allowing for partial identification of multivariate target and high-dimensional nuisance parameters. Our main contribution is a full characterization of all relevant Neyman-orthogonal moments in models with nonparametric UH, where relevance means informativeness about the parameter of interest. Under additional support conditions, orthogonal moments are globally robust to the distribution of the UH. They may still involve other high-dimensional nuisance parameters, but their local robustness reduces regularization bias and enables valid DML inference. We apply these results to: (i) common parameters, average marginal effects, and variances of UH in panel data models with high-dimensional controls; (ii) moments of the common factor in the Kotlarski model with a factor loading; and (iii) smooth functionals of teacher value-added. Monte Carlo simulations show substantial efficiency gains from using efficient orthogonal moments relative to ad-hoc choices. We illustrate the practical value of our approach by showing that existing estimates of the average and variance effects of maternal smoking on child birth weight are robust. |
Date: | 2025–07 |
URL: | https://d.repec.org/n?u=RePEc:arx:papers:2507.13788 |
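As background for the Neyman-orthogonality idea the abstract above relies on, here is a hedged sketch of generic debiased machine learning with cross-fitting in a partially linear model; it illustrates why orthogonal (partialled-out) moments reduce regularization bias, but it is not the paper's construction for functionals of unobserved heterogeneity. The data-generating process and learner choices are assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import KFold

rng = np.random.default_rng(0)
n, p = 2000, 20
X = rng.normal(size=(n, p))
d = np.sin(X[:, 0]) + 0.5 * X[:, 1] + rng.normal(size=n)            # treatment
y = 0.7 * d + np.cos(X[:, 0]) + 0.5 * X[:, 2] + rng.normal(size=n)  # outcome, true effect 0.7

# cross-fitting: nuisance functions E[y|X] and E[d|X] are fit on one fold and predicted on the other
res_y, res_d = np.zeros(n), np.zeros(n)
for train, test in KFold(n_splits=5, shuffle=True, random_state=0).split(X):
    my = RandomForestRegressor(n_estimators=100, random_state=0).fit(X[train], y[train])
    md = RandomForestRegressor(n_estimators=100, random_state=0).fit(X[train], d[train])
    res_y[test] = y[test] - my.predict(X[test])
    res_d[test] = d[test] - md.predict(X[test])

# Neyman-orthogonal (partialling-out) moment: E[(res_y - theta * res_d) * res_d] = 0
theta = (res_d @ res_y) / (res_d @ res_d)
eps = res_y - theta * res_d
se = np.sqrt(np.mean(res_d ** 2 * eps ** 2)) / np.mean(res_d ** 2) / np.sqrt(n)
print(f"theta_hat = {theta:.3f} (true 0.7), se = {se:.3f}")
```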
By: | Clara Augustin; Daniel Gutknecht; Cenchen Liu |
Abstract: | This paper examines the identification and estimation of treatment effects in staggered adoption designs -- a common extension of the canonical Difference-in-Differences (DiD) model to multiple groups and time-periods -- in the presence of (time-varying) misclassification of the treatment status as well as anticipation. We demonstrate that standard estimators are biased with respect to commonly used causal parameters of interest under such forms of misspecification. To address this issue, we provide modified estimators that recover the Average Treatment Effect of observed and true switching units, respectively. Additionally, we suggest a testing procedure aimed at detecting the timing and extent of misclassification and anticipation effects. We illustrate the proposed methods with an application to the effects of an anti-cheating policy on school mean test scores in high-stakes national exams in Indonesia.
Date: | 2025–07 |
URL: | https://d.repec.org/n?u=RePEc:arx:papers:2507.20415 |
By: | Zhiren Ma; Qian Zhao; Riquan Zhang; Zhaoxing Gao |
Abstract: | This paper proposes a novel diffusion-index model for forecasting when predictors are high-dimensional matrix-valued time series. We apply an $\alpha$-PCA method to extract low-dimensional matrix factors and build a bilinear regression linking future outcomes to these factors, estimated via iterative least squares. To handle weak factor structures, we introduce a supervised screening step to select informative rows and columns. Theoretical properties, including consistency and asymptotic normality, are established. Simulations and real data show that our method significantly improves forecast accuracy, with the screening procedure providing additional gains over standard benchmarks in out-of-sample mean squared forecast error. |
Date: | 2025–08 |
URL: | https://d.repec.org/n?u=RePEc:arx:papers:2508.04259 |
By: | Jan Ditzen; Yiannis Karavias |
Abstract: | The past 20 years have brought fundamental advances in modeling unobserved heterogeneity in panel data. Interactive Fixed Effects (IFE) proved to be a foundational framework, generalizing the standard one-way and two-way fixed effects models by allowing the unit-specific unobserved heterogeneity to be interacted with unobserved time-varying common factors, thereby accommodating more general forms of omitted variables. The IFE framework laid the theoretical foundations for other forms of heterogeneity, such as grouped fixed effects (GFE) and non-separable two-way fixed effects (NSTW). The existence of IFE, GFE or NSTW has significant implications for identification, estimation, and inference, leading to the development of many new estimators for panel data models. This paper provides an accessible review of the new estimation methods and their associated diagnostic tests, and offers a guide to empirical practice. In two separate empirical investigations we demonstrate that there is empirical support for the new forms of fixed effects and that the results can differ significantly from those obtained using traditional fixed effects estimators.
Date: | 2025–07 |
URL: | https://d.repec.org/n?u=RePEc:arx:papers:2507.19099 |
By: | Yambolov, Andrian |
Abstract: | When economic analysis requires simultaneous inference across multiple variables and time horizons, this paper shows that conventional pointwise quantiles in Bayesian structural vector autoregressions significantly understate the uncertainty of impulse responses. The performance of recently proposed joint inference methods, which produce noticeably different error band estimates, is evaluated, and calibration routines are suggested to ensure that they achieve the intended nominal probability coverage. Two practical applications illustrate the implications of these findings: (i) within a structural vector autoregression, the fiscal multiplier exhibits error bands that are 51% to 91% wider than previous estimates, and (ii) a pseudo-out-of-sample projection exercise for inflation and gross domestic product shows that joint inference methods could effectively summarize uncertainty for forecasts as well. These results underscore the importance of using joint inference methods for more robust econometric analysis.
Keywords: | forecasts, impulse responses, joint inference, pointwise inference, vector autoregressions
JEL: | C22 C32 C52
Date: | 2025–08 |
URL: | https://d.repec.org/n?u=RePEc:ecb:ecbwps:20253100 |
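To make concrete the gap between pointwise and joint error bands discussed in the abstract above, the sketch below compares pointwise quantile bands with one standard joint calibration (sup-t bands computed from draws). This is a generic illustrative device, not necessarily the calibration routine proposed in the paper; the toy impulse-response draws are simulated.

```python
import numpy as np

def pointwise_bands(draws, alpha=0.32):
    """Pointwise quantile bands from an (n_draws, H) array of impulse-response draws."""
    return np.quantile(draws, [alpha / 2, 1 - alpha / 2], axis=0)

def sup_t_bands(draws, alpha=0.32):
    """Joint 'sup-t' bands: widen mean +/- c*sd until (1-alpha) of draws lie inside at ALL horizons."""
    m, s = draws.mean(axis=0), draws.std(axis=0)
    max_t = np.max(np.abs(draws - m) / s, axis=1)     # worst standardized deviation per draw
    c = np.quantile(max_t, 1 - alpha)                 # calibration constant
    return np.vstack([m - c * s, m + c * s])

# toy example: AR(1)-style impulse responses with draw-specific persistence
rng = np.random.default_rng(0)
rho = rng.uniform(0.4, 0.9, size=5000)
H = 12
draws = rho[:, None] ** np.arange(1, H + 1)[None, :]

pw, joint = pointwise_bands(draws), sup_t_bands(draws)
print("average band width, pointwise:", np.mean(pw[1] - pw[0]).round(3))
print("average band width, joint    :", np.mean(joint[1] - joint[0]).round(3))  # wider, as expected
```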
By: | Philipp Alexander Schwarz; Oliver Schacht; Sven Klaassen; Johannes Oberpriller; Martin Spindler |
Abstract: | The RDD (regression discontinuity design) is a widely used framework for identification and estimation of causal effects at a cutoff of a single running variable. Practical settings, in particular those encountered in production systems, often involve decision-making defined by multiple thresholds and criteria. Common MRD (multi-score RDD) approaches transform these into a one-dimensional design in order to employ existing identification and estimation results. However, this practice can introduce non-compliant behavior. We develop theoretical tools to identify and reduce some of this "fuzziness" when estimating the cutoff effect on compliers of sub-rules. We provide a sound definition and categorization of unit behavior types for multi-dimensional cutoff-rules, extending existing categorizations. We identify conditions for the existence and identification of the cutoff effect on compliers in multiple dimensions, and specify when identification remains stable after excluding never-takers and always-takers. Further, we investigate how decomposing cutoff-rules into simpler parts alters the unit behavior. This allows identification and removal of non-compliant units, potentially improving estimates. We validate our framework on simulated and real-world data from opto-electronic semiconductor manufacturing. Our empirical results demonstrate its usefulness for refining production policies. In particular, we show that our approach decreases the estimation variance, highlighting the practical value of the MRD framework in manufacturing.
Date: | 2025–08 |
URL: | https://d.repec.org/n?u=RePEc:arx:papers:2508.15692 |
By: | Degui Li; Yayi Yan; Qiwei Yao |
Abstract: | In this paper, we consider nonstationary matrix-valued time series with common stochastic trends. Unlike traditional factor analysis, which flattens matrix observations into vectors, we adopt a matrix factor model in order to fully explore the intrinsic matrix structure in the data, allowing interaction between the row and column stochastic trends, and subsequently improving the estimation convergence. It also reduces the computational complexity of estimation. The main estimation methodology is built on the eigenanalysis of sample row and column covariance matrices when the nonstationary matrix factors are of full rank and the idiosyncratic components are temporally stationary, and is further extended to tackle a more flexible setting when the matrix factors are cointegrated and the idiosyncratic components may be nonstationary. Under some mild conditions which allow the existence of weak factors, we derive the convergence theory for the estimated factor loading matrices and nonstationary factor matrices. In particular, the developed methodology and theory are applicable to the general case of heterogeneous strengths over weak factors. An easy-to-implement ratio criterion is adopted to consistently estimate the size of the latent factor matrix. Both simulation and empirical studies are conducted to examine the numerical performance of the developed model and methodology in finite samples.
Date: | 2025–08 |
URL: | https://d.repec.org/n?u=RePEc:arx:papers:2508.11358 |
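A compact sketch of the eigenanalysis of sample row and column covariance matrices that underlies matrix factor estimation, together with an eigenvalue-ratio criterion for the number of row factors. It uses a stationary toy model with assumed dimensions rather than the nonstationary, cointegrated setting analyzed in the paper above.

```python
import numpy as np

rng = np.random.default_rng(0)
T, p, q, k1, k2 = 500, 20, 15, 3, 2
R = rng.normal(size=(p, k1))           # row loading matrix
C = rng.normal(size=(q, k2))           # column loading matrix
F = rng.normal(size=(T, k1, k2))       # latent factor matrices
X = np.einsum('pi,tij,qj->tpq', R, F, C) + 0.5 * rng.normal(size=(T, p, q))

# sample row and column covariance matrices
M_row = np.einsum('tpq,tsq->ps', X, X) / (T * q)    # p x p
M_col = np.einsum('tpq,tps->qs', X, X) / (T * p)    # q x q

# loading-space estimates: leading eigenvectors
eval_r, evec_r = np.linalg.eigh(M_row)
eval_c, evec_c = np.linalg.eigh(M_col)
R_hat = evec_r[:, -k1:]
C_hat = evec_c[:, -k2:]

# eigenvalue-ratio criterion for the number of row factors
ratios = eval_r[::-1][:-1] / eval_r[::-1][1:]
print("estimated number of row factors:", int(np.argmax(ratios[: p // 2]) + 1))
```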
By: | Angelo Forino; Andrea Mercatanti; Giacomo Morelli |
Abstract: | Factor models for longitudinal data, where policy adoption is unconfounded with respect to a low-dimensional set of latent factor loadings, have become increasingly popular for causal inference. Most existing approaches, however, rely on a causal finite-sample approach or computationally intensive methods, limiting their applicability and external validity. In this paper, we propose a novel causal inference method for panel data based on inverse propensity score weighting where the propensity score is a function of latent factor loadings within a super-population framework of causal inference. The approach relaxes the traditional restrictive assumptions of causal panel methods, while offering advantages in terms of causal interpretability, policy relevance, and computational efficiency. Under standard assumptions, we outline a three-step estimation procedure for the ATT and derive its large-sample properties using M-estimation theory. We apply the method to assess the causal effect of the Paris Agreement, a policy aimed at fostering the transition to a low-carbon economy, on European stock returns. Our empirical results suggest a statistically significant and negative short-run effect on the stock returns of firms that issued green bonds.
Date: | 2025–07 |
URL: | https://d.repec.org/n?u=RePEc:arx:papers:2507.08764 |
By: | Tomu Hirata; Undral Byambadalai; Tatsushi Oka; Shota Yasui; Shingo Uto |
Abstract: | We propose a novel multi-task neural network approach for estimating distributional treatment effects (DTE) in randomized experiments. While the DTE provides more granular insights into experiment outcomes than conventional methods focusing on the Average Treatment Effect (ATE), estimating it with regression adjustment methods presents significant challenges. Specifically, precision in the distribution tails suffers due to data imbalance, and computational inefficiencies arise from the need to solve numerous regression problems, particularly in large-scale datasets commonly encountered in industry. To address these limitations, our method leverages multi-task neural networks to estimate conditional outcome distributions while incorporating monotonic shape constraints and multi-threshold label learning to enhance accuracy. To demonstrate its practical effectiveness, we apply our method to both simulated and real-world datasets, including a randomized field experiment aimed at reducing water consumption in the US and a large-scale A/B test from a leading streaming platform in Japan. The experimental results consistently demonstrate superior performance across various datasets, establishing our method as a robust and practical solution for modern causal inference applications requiring a detailed understanding of treatment effect heterogeneity.
Date: | 2025–07 |
URL: | https://d.repec.org/n?u=RePEc:arx:papers:2507.07738 |
By: | Peter C. B. Phillips (Yale University); Liang Jiang (Fudan University) |
Abstract: | This paper is part of a joint study of parametric autoregression with cross section curve time series, focussing on unit root (UR) nonstationary curve data autoregression. The Hilbert space setting extends scalar UR and local UR models to accommodate high dimensional cross section dependent data under very general conditions. New limit theory is introduced that involves two parameter Gaussian processes that generalize the standard UR and local UR asymptotics. Bias expansions provide extensions of the well-known results in scalar autoregression and fixed effect dynamic panels to functional dynamic regressions. Semiparametric and ADF-type UR tests are developed with corresponding limit theory that enables time series inference with high dimensional curve cross section data, allowing also for functional fixed effects and deterministic trends. The asymptotics reveal the effects of general forms of cross section dependence in wide nonstationary panel data modeling and show dynamic panel regression limit theory as a special limiting case of curve time series asymptotics. Simulations provide evidence of the impact of curve cross section data on estimation and test performance and the adequacy of the asymptotics. An empirical illustration of the methodology is provided to assess the presence of time series nonstationarity in household Engel curves among ageing seniors in Singapore using the Singapore life panel dataset. |
Date: | 2025–08–10 |
URL: | https://d.repec.org/n?u=RePEc:cwl:cwldpp:2454 |
By: | William W. Wang; Ali Jadbabaie |
Abstract: | It is commonly accepted that some phenomena are social: for example, individuals' smoking habits often correlate with those of their peers. Such correlations can have a variety of explanations, such as direct contagion or shared socioeconomic circumstances. The network linear-in-means model is a workhorse statistical model which incorporates these peer effects by including average neighborhood characteristics as regressors. Although the model's parameters are identifiable under mild structural conditions on the network, it remains unclear whether identification ensures reliable estimation in the "infill" asymptotic setting, where a single network grows in size. We show that when covariates are i.i.d. and the average network degree of nodes increases with the population size, standard estimators suffer from bias or slow convergence rates due to asymptotic collinearity induced by network averaging. As an alternative, we demonstrate that linear-in-sums models, which are based on aggregate rather than average neighborhood characteristics, do not exhibit such issues as long as the network degrees have some nontrivial variation, a condition satisfied by most network models. |
Date: | 2025–08 |
URL: | https://d.repec.org/n?u=RePEc:arx:papers:2508.04897 |
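The contrast drawn in the abstract above between averaged and aggregated neighborhood regressors can be seen in a few lines: as the network becomes denser, the linear-in-means regressor collapses toward a constant (hence near-collinearity with the intercept), while the linear-in-sums regressor retains variation. The Erdos-Renyi network and covariate design below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000
x = rng.normal(size=n)                              # i.i.d. covariate

for p_link in [0.01, 0.05, 0.20]:                   # denser network = higher average degree
    A = (rng.random((n, n)) < p_link).astype(float) # Erdos-Renyi adjacency (directed, no self-links)
    np.fill_diagonal(A, 0)
    deg = A.sum(axis=1)
    peer_mean = A @ x / np.maximum(deg, 1)          # linear-in-means regressor
    peer_sum = A @ x                                # linear-in-sums regressor
    print(f"avg degree {deg.mean():6.1f}: "
          f"sd(peer mean) = {peer_mean.std():.3f}, sd(peer sum) = {peer_sum.std():.2f}")
# As density grows, the averaged regressor loses variation across units (near-collinearity with
# the intercept), while the summed regressor keeps nontrivial variation driven by degree differences.
```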
By: | Dilip M. Nachane (Indira Gandhi Institute of Development Research) |
Abstract: | The maximum entropy principle is characterized as assuming the least about the unknown parameters in a statistical model. In its applied manifestations, it uses all the available information and makes the fewest possible assumptions regarding the unavailable information. The application of this principle to parametric spectrum estimation leads to an autoregressive transfer function. By appeal to a well-known theorem in stochastic processes, a rational transfer function leads to a factorizable spectrum. This result combined with a classical theorem of analysis (due to Szegő) forms the basis for two important algorithms for estimating the autoregressive spectrum, viz. the Levinson-Durbin and Burg algorithms. The latter leads to estimators which are asymptotically MLEs (maximum likelihood estimators).
Keywords: | Entropy, Jaynes' Principle, autoregressive spectrum, spectral factorization, Levinson, Durbin |
JEL: | C22 C32 |
Date: | 2025–07 |
URL: | https://d.repec.org/n?u=RePEc:ind:igiwpp:2025-020 |
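A self-contained sketch of the Levinson-Durbin recursion mentioned above: it maps sample autocovariances into AR coefficients and an innovation variance, which in turn give the maximum-entropy (autoregressive) spectrum estimate. The simulated AR(2) example and lag order are assumptions for illustration.

```python
import numpy as np

def levinson_durbin(r, p):
    """AR(p) coefficients and innovation variance from autocovariances r[0..p]."""
    phi = np.zeros(p + 1)
    e = r[0]
    for k in range(1, p + 1):
        kappa = (r[k] - phi[1:k] @ r[1:k][::-1]) / e    # reflection coefficient
        phi_new = phi.copy()
        phi_new[k] = kappa
        phi_new[1:k] = phi[1:k] - kappa * phi[1:k][::-1]
        phi = phi_new
        e *= (1 - kappa ** 2)                           # update prediction error variance
    return phi[1:], e

# toy check on a simulated AR(2): x_t = 0.6 x_{t-1} - 0.3 x_{t-2} + eps_t
rng = np.random.default_rng(0)
T = 20000
x = np.zeros(T)
for t in range(2, T):
    x[t] = 0.6 * x[t - 1] - 0.3 * x[t - 2] + rng.normal()
r = np.array([np.mean((x[: T - k] - x.mean()) * (x[k:] - x.mean())) for k in range(3)])
phi, sigma2 = levinson_durbin(r, 2)
print("AR coefficients:", phi.round(3), " innovation variance:", round(sigma2, 3))

# maximum-entropy (autoregressive) spectrum implied by the fitted AR(2)
freqs = np.linspace(0, 0.5, 5)
spec = sigma2 / np.abs(1 - phi @ np.exp(-2j * np.pi * np.outer(np.arange(1, 3), freqs))) ** 2
print("spectrum at a few frequencies:", spec.round(3))
```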
By: | Luigi Garzon; Vitor Possebom |
Abstract: | We analyze heterogeneous, nonlinear treatment effects in shift-share designs with exogenous shares. We employ a triangular model and correct for treatment endogeneity using a control function. Our tools identify four target parameters. Two of them capture the observable heterogeneity of treatment effects, while one summarizes this heterogeneity in a single measure. The last parameter analyzes counterfactual, policy-relevant treatment assignment mechanisms. We propose flexible parametric estimators for these parameters and apply them to reevaluate the impact of Chinese imports on U.S. manufacturing employment. Our results highlight substantial treatment effect heterogeneity, which is not captured by commonly used shift-share tools.
Date: | 2025–07 |
URL: | https://d.repec.org/n?u=RePEc:arx:papers:2507.21915 |
By: | Jonas Esser; Mateus Maia; Judith Bosmans; Johanna van Dongen |
Abstract: | Healthcare decision-making often requires selecting among treatment options under budget constraints, particularly when one option is more effective but also more costly. Cost-effectiveness analysis (CEA) provides a framework for evaluating whether the health benefits of a treatment justify its additional costs. A key component of CEA is the estimation of treatment effects on both health outcomes and costs, which becomes challenging when using observational data, due to potential confounding. While advanced causal inference methods exist for use in such circumstances, their adoption in CEAs remains limited, with many studies relying on overly simplistic methods such as linear regression or propensity score matching. We believe that this is mainly due to health economists being generally unfamiliar with superior methodology. In this paper, we address this gap by introducing cost-effectiveness researchers to modern nonparametric regression models, with a particular focus on Bayesian Additive Regression Trees (BART). We provide practical guidance on how to implement BART in CEAs, including code examples, and discuss its advantages in producing more robust and credible estimates from observational data. |
Date: | 2025–07 |
URL: | https://d.repec.org/n?u=RePEc:arx:papers:2507.03511 |
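The paper above provides its own code examples (built around BART); since no particular BART implementation is assumed here, the sketch below swaps in scikit-learn's gradient boosting as a generic nonparametric outcome model and shows the overall regression-adjustment (g-computation) workflow for incremental costs, effects, and net monetary benefit. All variable names, the simulated data, and the willingness-to-pay value are illustrative.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)
n = 3000
X = rng.normal(size=(n, 4))                                  # confounders
ps = 1 / (1 + np.exp(-X[:, 0]))                              # confounded treatment assignment
t = rng.binomial(1, ps)
cost = 1000 + 500 * t + 300 * X[:, 0] + 100 * rng.normal(size=n)       # treatment adds ~500 in cost
qaly = 0.70 + 0.05 * t + 0.02 * X[:, 1] + 0.05 * rng.normal(size=n)    # and ~0.05 QALYs

def g_computation(y):
    """Regression-adjusted ATE via outcome modelling: fit y ~ (t, X), predict under t=1 and t=0."""
    model = GradientBoostingRegressor(random_state=0).fit(np.column_stack([t, X]), y)
    y1 = model.predict(np.column_stack([np.ones(n), X]))
    y0 = model.predict(np.column_stack([np.zeros(n), X]))
    return (y1 - y0).mean()

d_cost, d_effect = g_computation(cost), g_computation(qaly)
wtp = 20000                                                  # willingness to pay per QALY (assumed)
print(f"incremental cost: {d_cost:.0f}, incremental effect: {d_effect:.3f} QALYs")
print(f"ICER: {d_cost / d_effect:.0f} per QALY, incremental net benefit: {wtp * d_effect - d_cost:.0f}")
```

In practice a Bayesian nonparametric model such as BART would replace the boosting stand-in and would also provide uncertainty quantification for the incremental quantities.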
By: | Haojie Liu; Zihan Lin |
Abstract: | We introduce Galerkin-ARIMA, a novel time-series forecasting framework that integrates Galerkin projection techniques with the classical ARIMA model to capture potentially nonlinear dependencies in lagged observations. By replacing the fixed linear autoregressive component with a spline-based basis expansion, Galerkin-ARIMA flexibly approximates the underlying relationship among past values via ordinary least squares, while retaining the moving-average structure and Gaussian innovation assumptions of ARIMA. We derive closed-form solutions for both the AR and MA components using two-stage Galerkin projections, establish conditions for asymptotic unbiasedness and consistency, and analyze the bias-variance trade-off under basis-size growth. Complexity analysis reveals that, for moderate basis dimensions, our approach can substantially reduce computational cost compared to maximum-likelihood ARIMA estimation. Through extensive simulations on four synthetic processes, including noisy ARMA, seasonal, trend-AR, and nonlinear recursion series, we demonstrate that Galerkin-ARIMA matches or closely approximates ARIMA's forecasting accuracy while achieving orders-of-magnitude speedups in rolling forecasting tasks. These results suggest that Galerkin-ARIMA offers a powerful, efficient alternative for modeling complex time series dynamics in high-volume or real-time applications.
Date: | 2025–07 |
URL: | https://d.repec.org/n?u=RePEc:arx:papers:2507.07469 |
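A minimal sketch of the autoregressive part of the idea described above: the linear AR component is replaced by a basis expansion in lagged values fitted by ordinary least squares. For simplicity the sketch uses a polynomial basis rather than splines and omits the moving-average component, so it only illustrates the Galerkin-style approximation step, not the full Galerkin-ARIMA procedure.

```python
import numpy as np

def basis(v, degree=3):
    """Simple polynomial basis in one lagged value (a stand-in for the paper's spline basis)."""
    return np.column_stack([v ** d for d in range(1, degree + 1)])

def fit_galerkin_ar(x, p=1, degree=3):
    """Approximate x_t = g(x_{t-1}, ..., x_{t-p}) + e_t with an additive basis expansion and OLS."""
    T = len(x)
    y = x[p:]
    Z = np.column_stack([np.ones(T - p)] + [basis(x[p - j:T - j], degree) for j in range(1, p + 1)])
    coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
    return coef, Z

# toy nonlinear AR(1): x_t = 0.8 * tanh(x_{t-1}) + eps_t,  sd(eps) = 0.3
rng = np.random.default_rng(0)
T = 3000
x = np.zeros(T)
for t in range(1, T):
    x[t] = 0.8 * np.tanh(x[t - 1]) + 0.3 * rng.normal()

coef, Z = fit_galerkin_ar(x, p=1, degree=3)
resid = x[1:] - Z @ coef
print("in-sample residual sd:", resid.std().round(3))       # close to the true innovation sd 0.3

# one-step forecast from the last observation
z_new = np.concatenate([[1.0], basis(np.array([x[-1]]), 3).ravel()])
print("one-step forecast:", (z_new @ coef).round(3))
```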
By: | Yong Li; Sushanta K. Mallick; Tao Zeng; Junxing Zhang |
Abstract: | Optimal data detection in massive multiple-input multiple-output (MIMO) systems often requires prohibitively high computational complexity. A variety of detection algorithms have been proposed in the literature, offering different trade-offs between complexity and detection performance. In recent years, Variational Bayes (VB) has emerged as a widely used method for addressing statistical inference in the context of massive data. This study focuses on misspecified models and examines the risk functions associated with predictive distributions derived from variational posterior distributions. These risk functions, defined as the expectation of the Kullback-Leibler (KL) divergence between the true data-generating density and the variational predictive distributions, provide a framework for assessing predictive performance. We propose two novel information criteria for predictive model comparison based on these risk functions. Under certain regularity conditions, we demonstrate that the proposed information criteria are asymptotically unbiased estimators of their respective risk functions. Through comprehensive numerical simulations and empirical applications in economics and finance, we demonstrate the effectiveness of these information criteria in comparing misspecified models in the context of massive data. |
Date: | 2025–07 |
URL: | https://d.repec.org/n?u=RePEc:arx:papers:2507.00763 |
By: | Yu Hao; Hiroyuki Kasahara; Katsumi Shimotsu |
Abstract: | This paper investigates how the discount factor and payoff functions can be identified in stationary infinite-horizon dynamic discrete choice models. In single-agent models, we show that common nonparametric assumptions on per-period payoffs -- such as homogeneity of degree one, monotonicity, concavity, zero cross-differences, and complementarity -- provide identifying restrictions on the discount factor. These restrictions take the form of polynomial equalities and inequalities with degrees bounded by the cardinality of the state space. These restrictions also identify payoff functions under standard normalization at one action. In dynamic game models, we show that firm-specific discount factors can be identified using assumptions such as irrelevance of other firms' lagged actions, exchangeability, and the independence of adjustment costs from other firms' actions. Our results demonstrate that widely used nonparametric assumptions in economic analysis can provide substantial identifying power in dynamic structural models. |
Date: | 2025–07 |
URL: | https://d.repec.org/n?u=RePEc:arx:papers:2507.19814 |
By: | José-Antonio Espín-Sánchez (Yale University); Charles Hodgson (Yale University); Kevin O'Neill (The MITRE Corporation)
Abstract: | We propose a new way to obtain identification results using order statistics as finite mixtures with two key properties: i) the weights are known integers; and ii) the elements of the mixture are the distributions of the maximum over a subset of the original random variables. We leverage Exponentiated Distributions (ED), which extend extreme value theory results. ED are max-stable, and we show that finite mixtures of ED are linearly independent. This enables us to derive non-parametric identification results and extend commonly known results using Gumbel and Fréchet distributions, both examples of ED. The results have broad applications in auctions, discrete-choice, and other settings where maximum or minimum choices play a central role. We illustrate the usefulness of our results by proposing new estimators for auctions with bidder-level heterogeneity.
Date: | 2025–08–15 |
URL: | https://d.repec.org/n?u=RePEc:cwl:cwldpp:2455 |
By: | Jialuo Chen; Zhaoxing Gao; Ruey S. Tsay |
Abstract: | We investigate forward variable selection for ultra-high dimensional linear regression using a Gram-Schmidt orthogonalization procedure. Unlike the commonly used Forward Regression (FR) method, which computes regression residuals using an increasing number of selected features, or the Orthogonal Greedy Algorithm (OGA), which selects variables based on their marginal correlations with the residuals, our proposed Gram-Schmidt Forward Regression (GSFR) simplifies the selection process by evaluating marginal correlations between the residuals and the orthogonalized new variables. Moreover, we introduce a new model size selection criterion that determines the number of selected variables by detecting the most significant change in their unique contributions, effectively filtering out redundant predictors along the selection path. While GSFR is theoretically equivalent to FR except for the stopping rule, our refinement and the newly proposed stopping rule significantly improve computational efficiency. In ultra-high dimensional settings, where the dimensionality far exceeds the sample size and predictors exhibit strong correlations, we establish that GSFR achieves a convergence rate comparable to OGA and ensures variable selection consistency under mild conditions. We demonstrate the proposed method using simulations and real data examples. Extensive numerical studies show that GSFR outperforms commonly used methods in ultra-high dimensional variable selection.
Date: | 2025–07 |
URL: | https://d.repec.org/n?u=RePEc:arx:papers:2507.04668 |
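A small sketch of Gram-Schmidt forward regression as described in the abstract above: at each step the remaining candidate predictors are orthogonalized against the selected set and the one most correlated with the current residual is added. The toy data and the fixed number of steps (rather than the paper's model-size criterion) are simplifying assumptions.

```python
import numpy as np

def gsfr(X, y, max_steps=5):
    """Gram-Schmidt forward regression: orthogonalize remaining candidates against the
    selected set, then pick the one most correlated with the current residual."""
    n, p = X.shape
    Xc = X - X.mean(axis=0)
    r = y - y.mean()
    Q, selected = [], []                      # orthonormal basis of selected columns, selected indices
    for _ in range(max_steps):
        Z = Xc.copy()
        for q in Q:                           # project out already-selected directions
            Z -= np.outer(Z.T @ q, q).T
        norms = np.linalg.norm(Z, axis=0)
        norms[norms < 1e-10] = np.inf         # already-selected columns cannot be picked again
        score = np.abs(Z.T @ r) / norms       # marginal correlation with orthogonalized candidates
        j = int(np.argmax(score))
        selected.append(j)
        qj = Z[:, j] / np.linalg.norm(Z[:, j])
        Q.append(qj)
        r = r - (qj @ r) * qj                 # update residual
    return selected

# toy ultra-high-dimensional example: only the first 3 of 5000 predictors matter
rng = np.random.default_rng(0)
n, p = 200, 5000
X = rng.normal(size=(n, p))
y = 2 * X[:, 0] - 1.5 * X[:, 1] + X[:, 2] + rng.normal(size=n)
print("selection path:", gsfr(X, y))          # the true predictors 0, 1, 2 should appear early
```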
By: | Federico Echenique; Alireza Fallah; Michael I. Jordan |
Abstract: | We propose a general methodology for recovering preference parameters from data on choices and response times. Our methods yield estimates with fast ($1/n$ for $n$ data points) convergence rates when specialized to the popular Drift Diffusion Model (DDM), but are broadly applicable to generalizations of the DDM as well as to alternative models of decision making that make use of response time data. The paper develops an empirical application to an experiment on intertemporal choice, showing that the use of response times delivers predictive accuracy and matters for the estimation of economically relevant parameters. |
Date: | 2025–07 |
URL: | https://d.repec.org/n?u=RePEc:arx:papers:2507.20403 |
By: | Senan Hogan-Hennessy |
Abstract: | Natural experiments are a cornerstone of applied economics, providing settings for estimating causal effects with a compelling argument for treatment randomisation, but give little indication of the mechanisms behind causal effects. Causal Mediation (CM) provides a framework to analyse mechanisms by identifying the average direct and indirect effects (CM effects), yet conventional CM methods require that the relevant mediator is as-good-as-randomly assigned. When people choose the mediator based on costs and benefits (whether to visit a doctor, to attend university, etc.), this assumption fails and conventional CM analyses are at risk of bias. I propose a control function strategy that uses instrumental variation in mediator take-up costs, delivering unbiased direct and indirect effects when selection is driven by unobserved gains. The method identifies CM effects via the marginal effect of the mediator, with parametric or semi-parametric estimation that is simple to implement in two stages. Applying these methods to the Oregon Health Insurance Experiment reveals a substantial portion of the Medicaid lottery's effect on self-reported health and happiness flows through increased healthcare usage -- an effect that a conventional CM analysis would mis-estimate. This approach gives applied researchers an alternative method to estimate CM effects when an initial treatment is quasi-randomly assigned, but the mediator is not, as is common in natural experiments.
Date: | 2025–08 |
URL: | https://d.repec.org/n?u=RePEc:arx:papers:2508.05449 |
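A toy two-stage control-function sketch in the spirit of the approach above: a cost-shifting instrument enters a first-stage mediator regression, and the first-stage residual is included in the outcome equation to absorb selection on unobserved gains. The linear-Gaussian design with a continuous mediator is a simplifying assumption chosen so that the linear control function is exact; the paper's setting is richer.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20000
z = rng.binomial(1, 0.5, n)                    # quasi-random treatment
c = rng.binomial(1, 0.5, n)                    # instrument shifting the cost of mediator take-up
u = rng.normal(size=n)                         # unobserved gain driving mediator selection
m = 0.5 * z - 0.8 * c + u + 0.5 * rng.normal(size=n)          # mediator chosen partly on u
y = 1.0 + 0.4 * z + 0.6 * m + 0.5 * u + rng.normal(size=n)    # direct effect 0.4, mediator effect 0.6

def ols(Xmat, yvec):
    return np.linalg.lstsq(Xmat, yvec, rcond=None)[0]

# naive mediation regression: ignores selection on u, so the mediator coefficient is biased
b_naive = ols(np.column_stack([np.ones(n), z, m]), y)

# control function: first stage m on (z, c); include the first-stage residual in the outcome equation
X1 = np.column_stack([np.ones(n), z, c])
v = m - X1 @ ols(X1, m)
b_cf = ols(np.column_stack([np.ones(n), z, m, v]), y)

print("naive            (z, m):", b_naive[1:3].round(3))
print("control function (z, m):", b_cf[1:3].round(3))   # should recover roughly (0.4, 0.6)
```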
By: | Sourojyoti Barick |
Abstract: | This paper provides insight into the estimation and asymptotic behavior of parameters in interest rate models, focusing primarily on the Cox-Ingersoll-Ross (CIR) process and its extension -- the more general Chan-Karolyi-Longstaff-Sanders (CKLS) framework ($\alpha\in[0.5, 1]$). The CIR process is widely used to model interest rates, which exhibit mean reversion. The CKLS model, an extension of the CIR model, serves as a foundational case for analyzing more complex dynamics. We employ Euler-Maruyama discretization to transform the continuous-time stochastic differential equations (SDEs) of these models into a discretized form that facilitates efficient simulation and estimation of parameters using linear regression techniques. We establish the strong consistency and asymptotic normality of the estimators for the drift and volatility parameters, providing a theoretical underpinning for the parameter estimation process. Additionally, we explore the boundary behavior of these models, particularly in the context of unattainability at zero and infinity, by examining the scale and speed density functions associated with generalized SDEs involving polynomial drift and diffusion terms. Furthermore, we derive sufficient conditions for the existence of a stationary distribution within the CKLS framework and the corresponding stationary density function, and discuss its dependence on model parameters for $\alpha\in[0.5, 1]$.
Date: | 2025–07 |
URL: | https://d.repec.org/n?u=RePEc:arx:papers:2507.10041 |
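A brief sketch of the Euler-Maruyama discretization and regression-based parameter estimation for the CIR case ($\alpha = 0.5$) discussed above: rescaling the discretized equation by $\sqrt{X_t}$ makes the error homoskedastic, so the drift parameters follow from least squares and the volatility from the residual variance. The parameter values below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
a, b, sigma, dt, T = 2.0, 0.05, 0.15, 1 / 252, 50_000   # mean-reversion speed, long-run level, volatility

# Euler-Maruyama simulation of the CIR SDE: dX = a(b - X) dt + sigma * sqrt(X) dW
x = np.empty(T)
x[0] = b
for t in range(1, T):
    x[t] = (x[t - 1] + a * (b - x[t - 1]) * dt
            + sigma * np.sqrt(max(x[t - 1], 0.0) * dt) * rng.normal())

# Dividing the discretized equation by sqrt(X_t) gives a homoskedastic error, so the drift
# parameters follow from least squares on [1/sqrt(X_t), sqrt(X_t)] with no intercept.
y = (x[1:] - x[:-1]) / np.sqrt(x[:-1])
Z = np.column_stack([1 / np.sqrt(x[:-1]), np.sqrt(x[:-1])])
coef = np.linalg.lstsq(Z, y, rcond=None)[0]              # coef = (a*b*dt, -a*dt)
a_hat = -coef[1] / dt
b_hat = coef[0] / (a_hat * dt)
sigma_hat = np.sqrt(np.var(y - Z @ coef) / dt)
print(f"a_hat = {a_hat:.2f} (true 2.0), b_hat = {b_hat:.3f} (true 0.05), sigma_hat = {sigma_hat:.3f} (true 0.15)")
```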
By: | Xuan Liu |
Abstract: | Moment matching is an easy-to-implement and usually effective method to reduce the variance of Monte Carlo simulation estimates. On the other hand, there is no guarantee that moment matching will always reduce simulation variance for general integration problems at least asymptotically, i.e., when the number of samples is large. We study the characterization of conditions on a given underlying distribution $X$ under which asymptotic variance reduction is guaranteed for a general integration problem $\mathbb{E}[f(X)]$ when moment matching techniques are applied. We show that a necessary and sufficient condition for this asymptotic variance reduction property is that $X$ is normally distributed. Moreover, when $X$ is a normal distribution, formulae for efficient estimation of simulation variance for (first and second order) moment matching Monte Carlo are obtained. These formulae allow estimation of the simulation variance as a by-product of the simulation process, in a way similar to variance estimations for plain Monte Carlo. Furthermore, we propose non-linear moment matching schemes for any given continuous distribution such that asymptotic variance reduction is guaranteed.
Date: | 2025–08 |
URL: | https://d.repec.org/n?u=RePEc:arx:papers:2508.03790 |
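A quick numerical illustration of (first- and second-order) moment matching Monte Carlo in the normal case highlighted by the characterization above: samples are standardized to match the first two moments of $N(0, 1)$ exactly, and the variance of the resulting estimator of $\mathbb{E}[f(X)]$ is compared with plain Monte Carlo for a few test functions. The test functions, sample size, and replication count are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)

def mc_variances(f, n=1000, reps=2000):
    """Compare plain Monte Carlo with moment-matched Monte Carlo for E[f(X)], X ~ N(0, 1)."""
    plain, matched = [], []
    for _ in range(reps):
        x = rng.normal(size=n)
        x_mm = (x - x.mean()) / x.std()          # match the first two moments of N(0, 1) exactly
        plain.append(f(x).mean())
        matched.append(f(x_mm).mean())
    return np.var(plain), np.var(matched)

for name, f in [("f(x) = x^2", lambda x: x ** 2),
                ("f(x) = exp(x)", np.exp),
                ("f(x) = cos(5x)", lambda x: np.cos(5 * x))]:
    v_plain, v_mm = mc_variances(f)
    print(f"{name:16s} plain-MC variance {v_plain:.2e}   moment-matched variance {v_mm:.2e}")
```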
By: | Filippo Palomba |
Abstract: | This paper investigates the challenges of optimal online policy learning under missing data. State-of-the-art algorithms implicitly assume that rewards are always observable. I show that when rewards are missing at random, the Upper Confidence Bound (UCB) algorithm maintains optimal regret bounds; however, it selects suboptimal policies with high probability as soon as this assumption is relaxed. To overcome this limitation, I introduce a fully nonparametric algorithm, the Doubly-Robust Upper Confidence Bound (DR-UCB), which explicitly models the form of missingness through observable covariates and achieves a nearly-optimal worst-case regret rate of $\widetilde{O}(\sqrt{T})$. To prove this result, I derive high-probability bounds for a class of doubly-robust estimators that hold under broad dependence structures. Simulation results closely match the theoretical predictions, validating the proposed framework.
Date: | 2025–07 |
URL: | https://d.repec.org/n?u=RePEc:arx:papers:2507.19596 |
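A toy sketch of the doubly-robust pseudo-reward mechanism behind an algorithm like the one above: when a reward is missing at random, the arm's running outcome-model prediction is combined with an inverse-probability-weighted correction, and UCB indices are built on these pseudo-rewards. The known, covariate-free observation probability and the simple running-mean outcome model are strong simplifications relative to the paper's fully nonparametric algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)
K, T = 3, 20000
true_means = np.array([0.3, 0.5, 0.4])   # arm 1 is best
p_obs = 0.5                              # probability a reward is observed (known here for simplicity)

counts = np.zeros(K)
dr_sums = np.zeros(K)                    # running sums of doubly-robust pseudo-rewards
m_hat = np.zeros(K)                      # outcome model: running mean of *observed* rewards per arm
n_obs = np.zeros(K)

for t in range(T):
    if t < K:                            # pull each arm once to initialize
        a = t
    else:                                # UCB index built on DR pseudo-reward means
        a = int(np.argmax(dr_sums / counts + np.sqrt(2 * np.log(t) / counts)))
    reward = rng.normal(true_means[a], 1.0)
    observed = rng.random() < p_obs      # reward missing at random
    if observed:
        n_obs[a] += 1
        m_hat[a] += (reward - m_hat[a]) / n_obs[a]
    # doubly-robust pseudo-reward: model prediction plus inverse-probability-weighted correction
    pseudo = m_hat[a] + observed * (reward - m_hat[a]) / p_obs
    counts[a] += 1
    dr_sums[a] += pseudo

print("share of pulls per arm:", (counts / T).round(3))   # most pulls should go to arm 1
```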
By: | Hanna L. Adam; Mario Larch; Michael Nower |
Abstract: | Performing an analysis of variance (ANOVA) on a large dataset spanning many dimensions becomes computationally challenging or even infeasible. We develop a new, fast procedure, ANOVA-HDFE, which uses sequential linear regressions and builds on recent advances in regression analysis with high-dimensional fixed effects (HDFE). It accommodates both balanced and unbalanced settings with many categorical and continuous covariates, while also allowing for high-dimensional fixed effects. Applying ANOVA-HDFE to bilateral trade flows, we find that 60% of the variation is at the country or country-time level. Moreover, a substantial proportion of the pair-specific variation remains unexplained by standard trade cost proxy variables. |
Keywords: | analysis of variance, high-dimensional fixed effects, large data, variation in high dimensions, variation of bilateral trade flows, asymmetric trade costs, ANOVA-HDFE |
JEL: | F14 C23 C55 F16 |
Date: | 2025 |
URL: | https://d.repec.org/n?u=RePEc:ces:ceswps:_12055 |
By: | André Luiz Ferreira (Universidade Federal do Pará); André Luis Squarize Chagas (Departmento de Economia, FEA-USP); Carlos Roberto Azzoni (Departmento de Economia, FEA-USP) |
Abstract: | This paper develops a spatial stochastic frontier framework for panel data that jointly accounts for spatial dependence and heteroskedastic technical inefficiency. Inefficiency and noise components are parameterized using scaling functions, while spatial dependence is modeled through both a spatial lag (SF-SLM) and a spatial Durbin specification (SF-SDM). Maximum likelihood estimation is implemented by explicitly incorporating the spatial autoregressive process into the log-likelihood function. A key innovation of this study is the use of the spatial multiplier to decompose estimated technical inefficiency into three components: (i) own inefficiency, (ii) spill-in effects (feedback of a unit’s inefficiency on itself through spatial interactions), and (iii) spillover effects (inefficiency transmitted from neighboring regions). This approach extends the stochastic frontier literature by showing that inefficiency is not purely local but can propagate across space. The method is applied to the Brazilian food manufacturing industry (2007–2018). Likelihood ratio tests confirm that spatial models outperform the nonspatial specification, with SF-SDM providing the best fit and more stable inefficiency estimates. Results reveal that, for an average region, approximately 9% of inefficiency is due to spillovers from neighbors, while 0.2% is explained by spill-in effects. Ignoring spatial structure would therefore overestimate region-specific inefficiency and underestimate the role of interregional linkages. The proposed framework offers a flexible tool for analyzing productive efficiency in spatially interconnected settings and provides new insights for regional policy and future research. |
Keywords: | Spatial stochastic frontier; Maximum likelihood estimator; Technical inefficiency; spatial spillover |
JEL: | C23 C51 R12 R15 L66 |
Date: | 2025 |
URL: | https://d.repec.org/n?u=RePEc:ris:nereus:021487 |
By: | Tobias Adrian; Yuya Sasaki; Yulong Wang |
Abstract: | The Growth-at-Risk (GaR) framework has garnered attention in recent econometric literature, yet current approaches implicitly assume a constant Pareto exponent. We introduce novel and robust econometrics to estimate the tails of GaR based on a rigorous theoretical framework and establish validity and effectiveness. Simulations demonstrate consistent outperformance relative to existing alternatives in terms of predictive accuracy. We perform a long-term GaR analysis that provides accurate and insightful predictions, effectively capturing financial anomalies better than current methods. |
Date: | 2025–07 |
URL: | https://d.repec.org/n?u=RePEc:arx:papers:2508.00263 |
By: | Yan Xu; Bo Zhou |
Abstract: | Networks are central to many economic and organizational applications, including workplace team formation, social platform recommendations, and classroom friendship development. In these settings, networks are modeled as graphs, with agents as nodes, agent pairs as edges, and edge weights capturing pairwise production or interaction outcomes. This paper develops an adaptive, or online, policy that learns to form increasingly effective networks as data accumulates over time, progressively improving total network output measured by the sum of edge weights. Our approach builds on the weighted stochastic block model (WSBM), which captures agents' unobservable heterogeneity through discrete latent types and models their complementarities in a flexible, nonparametric manner. We frame the online network formation problem as a non-standard batched multi-armed bandit, where each type pair corresponds to an arm, and pairwise reward depends on type complementarity. This strikes a balance between exploration -- learning latent types and complementarities -- and exploitation -- forming high-weighted networks. We establish two key results: a batched local asymptotic normality result for the WSBM and an asymptotic equivalence between maximum likelihood and variational estimates of the intractable likelihood. Together, they provide a theoretical foundation for treating variational estimates as normal signals, enabling principled Bayesian updating across batches. The resulting posteriors are then incorporated into a tailored maximum-weight matching problem to determine the policy for the next batch. Simulations show that our algorithm substantially improves outcomes within a few batches, yields increasingly accurate parameter estimates, and remains effective even in nonstationary settings with evolving agent pools.
Date: | 2025–07 |
URL: | https://d.repec.org/n?u=RePEc:arx:papers:2507.18961 |
By: | Xinbing Kong; Cheng Liu; Bin Wu |
Abstract: | Asynchronous trading in high-frequency financial markets introduces significant biases into econometric analysis, distorting risk estimates and leading to suboptimal portfolio decisions. Existing synchronization methods, such as the previous-tick approach, suffer from information loss and create artificial price staleness. We introduce a novel framework that recasts the data synchronization challenge as a constrained matrix completion problem. Our approach recovers the potential matrix of high-frequency price increments by minimizing its nuclear norm -- capturing the underlying low-rank factor structure -- subject to a large-scale linear system derived from observed, asynchronous price changes. Theoretically, we prove the existence and uniqueness of our estimator and establish its convergence rate. A key theoretical insight is that our method accurately and robustly leverages information from both frequently and infrequently traded assets, overcoming a critical difficulty of efficiency loss in traditional methods. Empirically, using extensive simulations and a large panel of S&P 500 stocks, we demonstrate that our method substantially outperforms established benchmarks. It not only achieves significantly lower synchronization errors, but also corrects the bias in systematic risk estimates (i.e., eigenvalues) and the estimate of betas caused by stale prices. Crucially, portfolios constructed using our synchronized data yield consistently and economically significant higher out-of-sample Sharpe ratios. Our framework provides a powerful tool for uncovering the true dynamics of asset prices, with direct implications for high-frequency risk management, algorithmic trading, and econometric inference. |
Date: | 2025–07 |
URL: | https://d.repec.org/n?u=RePEc:arx:papers:2507.12220 |