on Econometrics
| By: | Yanli Lin (Economics Programme, University of Western Australia); Yichun Song (Center for Industrial and Business Organization and Institute for Advanced Economic Research, Dongbei University of Finance and Economics) |
| Abstract: | This paper develops a new, instrument-free semi-parametric copula framework for a spatial autoregressive (SAR) model to address endogeneity stemming from an endogenous spatial weights matrix, endogenous regressors, or both. Moving beyond conventional Gaussian copulas, we develop a flexible estimator based on the Student’s t copula with an unknown degrees-of-freedom (df) parameter, which nests the Gaussian case and allows the data to reveal the presence of tail dependence. We propose a sieve maximum likelihood estimator (MLE) that jointly estimates all structural, copula, and non-parametric marginal parameters, and establish that this joint estimator is consistent, asymptotically normal, and – unlike prevailing multi-stage copula-correction methods – semiparametrically efficient. Monte Carlo simulations underscore the flexibility of our approach, showing that copula misspecification inflates bias and variance, whereas joint estimation improves efficiency. In an empirical application to regional productivity spillovers, we find evidence of tail dependence and demonstrate that our method offers a credible alternative to approaches that rely on hard-to-verify excluded instruments. |
| Keywords: | Spatial autoregressive model, Endogenous spatial weights matrix, Endogenous regressors, Copula method, Sieve maximum likelihood estimation |
| JEL: | C31 C51 |
| Date: | 2025 |
| URL: | https://d.repec.org/n?u=RePEc:uwa:wpaper:25-07 |
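A minimal Python sketch of the t-copula ingredient discussed above (not the paper's sieve MLE for the full SAR system): it fits a bivariate Student's t copula with an unknown degrees-of-freedom parameter to pseudo-observations of two simulated residual series. Every number and variable name below is an illustrative assumption; a very large fitted df would indicate that the Gaussian special case is adequate.

```python
import numpy as np
from scipy import stats, optimize

rng = np.random.default_rng(0)
n = 500
# simulated "residuals" with tail dependence (placeholder for the model's error terms)
z = stats.multivariate_t.rvs(loc=[0, 0], shape=[[1, 0.5], [0.5, 1]], df=5,
                             size=n, random_state=rng)
u = stats.rankdata(z, axis=0) / (n + 1)          # pseudo-observations (empirical CDFs)

def neg_loglik(theta):
    rho, log_df = theta
    df = np.exp(log_df)
    if abs(rho) >= 0.99:
        return np.inf
    q = stats.t.ppf(u, df)                        # map pseudo-observations to t quantiles
    joint = stats.multivariate_t.logpdf(q, loc=[0, 0],
                                        shape=[[1, rho], [rho, 1]], df=df)
    marg = stats.t.logpdf(q, df).sum(axis=1)
    return -(joint - marg).sum()                  # negative copula log-likelihood

res = optimize.minimize(neg_loglik, x0=[0.0, np.log(10.0)], method="Nelder-Mead")
rho_hat, df_hat = res.x[0], np.exp(res.x[1])
print(f"estimated copula correlation {rho_hat:.2f}, df {df_hat:.1f}")
```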
| By: | Prosper Dovonon; Nikolay Gospodinov |
| Abstract: | This paper studies the limiting behavior of the test for instrument exogeneity in linear models when there is uncertainty about the strength of the identification signal. We consider the test for conditional moment restrictions with an expanding set of constructed instruments. We establish the uniform validity of the standard normal asymptotic approximation, under the null, of this specification test over all possible degrees of model identification. As a result, the researcher can use standard inference for testing instrument exogeneity without any prior knowledge of whether the instruments are strong, semi-strong, weak, or completely irrelevant. Furthermore, we show that the test is consistent regardless of the instrument strength; i.e., even in cases (weak and completely irrelevant instruments) where the standard tests fail to exhibit asymptotic power. To obtain these results, we characterize the rate of the estimator under a drifting sequence for the identification signal. We illustrate the appealing properties of the test in simulations and an empirical application. |
| Keywords: | linear instrumental variables (IV) model; conditional test for instrument exogeneity; uniform inference; instrument strength; generalized method of moments (GMM) estimator; drifting sequences; expanding set of basis functions |
| JEL: | C12 C14 C26 C52 |
| Date: | 2025–09–25 |
| URL: | https://d.repec.org/n?u=RePEc:fip:fedawp:101963 |
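For intuition, here is a hedged sketch of an overidentification-style exogeneity check with an expanding set of constructed instruments (powers of a scalar instrument), using two-step GMM and Hansen's J statistic. This is a textbook construction on a simulated design of my own, not the paper's uniformly valid conditional test.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n = 2000
z = rng.normal(size=n)                                    # scalar source instrument
u = rng.normal(size=n)                                    # structural error
x = 0.5 * z + 0.8 * u + rng.normal(size=n)                # endogenous regressor
y = 1.0 * x + u                                           # outcome, true beta = 1

Z = np.column_stack([np.ones(n)] + [z ** k for k in range(1, 6)])   # constructed instruments
X = np.column_stack([np.ones(n), x])

# step 1: 2SLS to obtain preliminary residuals and a robust weighting matrix
Xhat = Z @ np.linalg.lstsq(Z, X, rcond=None)[0]
b1 = np.linalg.lstsq(Xhat, y, rcond=None)[0]
g1 = Z * (y - X @ b1)[:, None]
S = g1.T @ g1 / n

# step 2: efficient GMM and Hansen's J overidentification statistic
W = np.linalg.inv(S)
A = X.T @ Z
b2 = np.linalg.solve(A @ W @ A.T, A @ W @ (Z.T @ y))
gbar = (Z * (y - X @ b2)[:, None]).mean(axis=0)
J = n * gbar @ W @ gbar
dof = Z.shape[1] - X.shape[1]
print(f"J = {J:.2f}, p-value = {stats.chi2.sf(J, dof):.3f}")
```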
| By: | Bach, Philipp; Klaaßen, Sven; Kueck, Jannis; Mattes, Mara; Spindler, Martin |
| Abstract: | Difference-in-differences (DiD) is one of the most popular approaches for empirical research in economics, political science, and beyond. Identification in these models is based on the conditional parallel trends assumption: In the absence of treatment, the average outcomes of the treated and untreated groups are assumed to evolve in parallel over time, conditional on pre-treatment covariates. We introduce a novel approach to sensitivity analysis for DiD models that assesses the robustness of DiD estimates to violations of this assumption due to unobservable confounders, allowing researchers to transparently assess and communicate the credibility of their causal estimation results. Our method focuses on estimation by Double Machine Learning and extends previous work on sensitivity analysis based on Riesz Representation in cross-sectional settings. We establish asymptotic bounds for point estimates and confidence intervals in the canonical 2 × 2 setting and group-time causal parameters in settings with staggered treatment adoption. Our approach makes it possible to relate the formulation of a parallel trends violation to empirical evidence from (1) pre-testing, (2) covariate benchmarking and (3) standard reporting statistics and visualizations. We provide extensive simulation experiments demonstrating the validity of our sensitivity approach and diagnostics and apply our approach to two empirical applications. |
| Keywords: | Sensitivity Analysis, Difference-in-differences, Double Machine Learning, Riesz Representation, Causal Inference |
| Date: | 2025 |
| URL: | https://d.repec.org/n?u=RePEc:zbw:fubsbe:330188 |
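A toy illustration of the sensitivity logic in a canonical 2 × 2 DiD, far simpler than the paper's Riesz-representation bounds for Double Machine Learning: if the treated group's counterfactual trend could deviate from the control group's by up to delta, the DiD point estimate is only informative up to that margin. All numbers below are assumptions of mine.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 1000
treated = rng.binomial(1, 0.4, n)
# untreated potential outcomes share a common +1.0 trend between periods
y_pre = 2.0 + 0.5 * treated + rng.normal(size=n)
y0_post = y_pre + 1.0 + rng.normal(scale=0.5, size=n)
tau = 0.8                                     # true ATT
y_post = y0_post + tau * treated

did = (y_post[treated == 1].mean() - y_pre[treated == 1].mean()) \
    - (y_post[treated == 0].mean() - y_pre[treated == 0].mean())

# crude sensitivity: a hypothesized differential trend of size delta shifts the
# identified ATT by the same amount, so report bounds rather than a point
for delta in (0.0, 0.2, 0.5):
    print(f"delta = {delta:.1f}: ATT in [{did - delta:.2f}, {did + delta:.2f}]")
```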
| By: | Abhimanyu Gupta; Myung Hwan Seo |
| Abstract: | We develop a class of optimal tests for a structural break occurring at an unknown date in infinite and growing-order time series regression models, such as AR($\infty$), linear regression with increasingly many covariates, and nonparametric regression. Under an auxiliary i.i.d. Gaussian error assumption, we derive an average power optimal test, establishing a growing-dimensional analog of the exponential tests of Andrews and Ploberger (1994) to handle identification failure under the null hypothesis of no break. Relaxing the i.i.d. Gaussian assumption to a more general dependence structure, we establish a functional central limit theorem for the underlying stochastic processes, which features an extra high-order serial dependence term due to the growing dimension. We robustify our test both against this term and finite sample bias and illustrate its excellent performance and practical relevance in a Monte Carlo study and a real data empirical example. |
| Date: | 2025–10 |
| URL: | https://d.repec.org/n?u=RePEc:arx:papers:2510.12262 |
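A small numerical sketch of the exponential-average idea in a fixed-order AR(1) regression (the paper's setting has growing dimension and robustified critical values, which this toy example does not reproduce): Chow-type F statistics are computed over a trimmed range of candidate break dates and summarized by sup-F and the Andrews-Ploberger exp-F.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 400
y = np.zeros(n)
for t in range(1, n):
    rho = 0.3 if t < 250 else 0.7                        # AR coefficient breaks at t = 250
    y[t] = rho * y[t - 1] + rng.normal()

X = np.column_stack([np.ones(n - 1), y[:-1]])            # AR(1) regression, fixed order
yy = y[1:]
k = X.shape[1]

def ssr(Xm, ym):
    b, *_ = np.linalg.lstsq(Xm, ym, rcond=None)
    r = ym - Xm @ b
    return r @ r

ssr_full = ssr(X, yy)
lo, hi = int(0.15 * len(yy)), int(0.85 * len(yy))        # 15% trimming on each side
f_stats = []
for b in range(lo, hi):
    ssr_split = ssr(X[:b], yy[:b]) + ssr(X[b:], yy[b:])
    f_stats.append(((ssr_full - ssr_split) / k) / (ssr_split / (len(yy) - 2 * k)))
f_stats = np.array(f_stats)
print(f"sup-F = {f_stats.max():.2f}, "
      f"exp-F = {np.log(np.mean(np.exp(f_stats / 2))):.2f}")   # exponential average
```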
| By: | Susan Athey; Guido Imbens; Zhaonan Qu; Davide Viviano |
| Abstract: | This paper studies estimation of causal effects in a panel data setting. We introduce a new estimator, the Triply RObust Panel (TROP) estimator, that combines (i) a flexible model for the potential outcomes based on a low-rank factor structure on top of a two-way fixed effect specification, with (ii) unit weights intended to upweight units similar to the treated units and (iii) time weights intended to upweight time periods close to the treated time periods. We study the performance of the estimator in a set of simulations designed to closely match several commonly studied real data sets. We find that there is substantial variation in the performance of the estimators across the settings considered. The proposed estimator outperforms two-way fixed effect / difference-in-differences, synthetic control, matrix completion and synthetic difference-in-differences estimators. We investigate what features of the data generating process lead to this performance, and assess the relative importance of the three components of the proposed estimator. We have two recommendations. Our preferred strategy is that researchers use simulations closely matched to the data they are interested in, along the lines discussed in this paper, to investigate which estimators work well in their particular setting. A simpler approach is to use more robust estimators, such as synthetic difference-in-differences or the new triply robust panel estimator, which we find to substantially outperform two-way fixed effect estimators in many empirically relevant settings. |
| Date: | 2025–08 |
| URL: | https://d.repec.org/n?u=RePEc:arx:papers:2508.21536 |
| By: | Christoph Breunig; Ruixuan Liu; Zhengfei Yu |
| Abstract: | We develop a semiparametric framework for inference on the mean response in missing-data settings using a corrected posterior distribution. Our approach is tailored to Bayesian Additive Regression Trees (BART), which is a powerful predictive method but whose nonsmoothness complicates asymptotic theory with multi-dimensional covariates. When BART is combined with Bayesian bootstrap weights, we establish a new Bernstein-von Mises theorem and show that the limit distribution generally contains a bias term. To address this, we introduce RoBART, a posterior bias correction that robustifies BART for valid inference on the mean response. Monte Carlo studies support our theory, demonstrating reduced bias and improved coverage relative to existing procedures using BART. |
| Date: | 2025–09 |
| URL: | https://d.repec.org/n?u=RePEc:arx:papers:2509.24634 |
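A rough sketch of the plug-in construction the paper starts from, with a gradient-boosting regressor standing in for BART and Dirichlet (Bayesian bootstrap) weights generating posterior-style draws of the mean response under missingness at random. The paper shows that this kind of uncorrected posterior generally carries a bias term, which their RoBART correction removes; everything below is an illustrative assumption, not their procedure.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(4)
n, p = 1500, 4
X = rng.normal(size=(n, p))
y = np.sin(X[:, 0]) + X[:, 1] ** 2 + rng.normal(scale=0.5, size=n)
prop = 1 / (1 + np.exp(-(0.5 * X[:, 0] - 0.3 * X[:, 2])))   # missingness depends on X
d = rng.binomial(1, prop)                                    # d = 1: outcome observed

draws = []
for _ in range(100):
    w = rng.dirichlet(np.ones(n))                            # Bayesian bootstrap weights
    model = GradientBoostingRegressor(max_depth=2, n_estimators=100)
    model.fit(X[d == 1], y[d == 1], sample_weight=w[d == 1])
    draws.append(np.average(model.predict(X), weights=w))    # imputed mean response
draws = np.array(draws)
print(f"posterior-style mean {draws.mean():.3f}, 95% interval "
      f"({np.quantile(draws, 0.025):.3f}, {np.quantile(draws, 0.975):.3f})")
```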
| By: | Alexander Almeida; Susan Athey; Guido Imbens; Eva Lestant; Alexia Olaizola |
| Abstract: | This paper studies variance estimators in panel data settings. There has been a recent surge in research on panel data models with a number of new estimators proposed. However, there has been less attention paid to the quantification of the precision of these estimators. Of the variance estimators that have been proposed, their relative merits are not well understood. In this paper we develop a common framework for comparing some of the proposed variance estimators for generic point estimators. We reinterpret three commonly used approaches as targeting different conditional variances under an exchangeability assumption. We find that the estimators we consider are all valid on average, but that their performance in terms of power differs substantially depending on the heteroskedasticity structure of the data. Building on these insights, we propose a new variance estimator that flexibly accounts for heteroskedasticity in both the unit and time dimensions, and delivers superior statistical power in realistic panel data settings. |
| Date: | 2025–10 |
| URL: | https://d.repec.org/n?u=RePEc:arx:papers:2510.11841 |
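To make the comparison concrete, a simple sketch that computes unit-clustered and time-clustered sandwich standard errors for the treatment coefficient in a simulated two-way fixed effects panel; these correspond to two of the conditional variances one might target, and the data-generating process and sizes here are purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(5)
N, T = 50, 20
unit_fe = rng.normal(size=(N, 1))
time_fe = rng.normal(size=(1, T))
d = (rng.random((N, 1)) < 0.5) & (np.arange(T)[None, :] >= 10)   # treated units after t = 10
y = 2.0 * d + unit_fe + time_fe + rng.normal(size=(N, T))

# two-way fixed effects regression, observations stacked by unit then time
X = np.column_stack([d.reshape(-1, 1).astype(float),
                     np.kron(np.eye(N), np.ones((T, 1))),          # unit dummies
                     np.kron(np.ones((N, 1)), np.eye(T))[:, 1:]])  # time dummies (one dropped)
yv = y.reshape(-1)
beta, *_ = np.linalg.lstsq(X, yv, rcond=None)
e = yv - X @ beta
bread = np.linalg.pinv(X.T @ X)

def clustered_se(cluster_ids):
    meat = np.zeros((X.shape[1], X.shape[1]))
    for c in np.unique(cluster_ids):
        s = X[cluster_ids == c].T @ e[cluster_ids == c]
        meat += np.outer(s, s)
    return np.sqrt((bread @ meat @ bread)[0, 0])                   # SE of treatment coefficient

unit_id = np.repeat(np.arange(N), T)
time_id = np.tile(np.arange(T), N)
print(f"unit-clustered SE: {clustered_se(unit_id):.3f}, "
      f"time-clustered SE: {clustered_se(time_id):.3f}")
```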
| By: | Tatsuru Kikuchi |
| Abstract: | This paper develops a nonparametric framework for identifying and estimating spatial boundaries of treatment effects in settings with geographic spillovers. While atmospheric dispersion theory predicts exponential decay of pollution under idealized assumptions, these assumptions -- steady winds, homogeneous atmospheres, flat terrain -- are systematically violated in practice. I establish nonparametric identification of spatial boundaries under weak smoothness and monotonicity conditions, propose a kernel-based estimator with data-driven bandwidth selection, and derive asymptotic theory for inference. Using 42 million satellite observations of NO$_2$ concentrations near coal plants (2019-2021), I find that nonparametric kernel regression reduces prediction errors by 1.0 percentage point on average compared to parametric exponential decay assumptions, with largest improvements at policy-relevant distances: 2.8 percentage points at 10 km (near-source impacts) and 3.7 percentage points at 100 km (long-range transport). Parametric methods systematically underestimate near-source concentrations while overestimating long-range decay. The COVID-19 pandemic provides a natural experiment validating the framework's temporal sensitivity: NO$_2$ concentrations dropped 4.6% in 2020, then recovered 5.7% in 2021. These results demonstrate that flexible, data-driven spatial methods substantially outperform restrictive parametric assumptions in environmental policy applications. |
| Date: | 2025–10 |
| URL: | https://d.repec.org/n?u=RePEc:arx:papers:2510.12289 |
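A minimal sketch of the kernel-regression-with-data-driven-bandwidth step on simulated distance-concentration data, using Nadaraya-Watson with leave-one-out cross-validation rather than the paper's full local linear boundary estimator; the decay pattern and noise level below are assumptions.

```python
import numpy as np

rng = np.random.default_rng(6)
n = 800
dist = rng.uniform(1, 120, n)                           # distance from source, km
conc = 30 * np.exp(-dist / 25) + 5 + 2 * np.sin(dist / 15) + rng.normal(scale=1.5, size=n)

def nw_fit(x0, x, y, h):
    w = np.exp(-0.5 * ((x0[:, None] - x[None, :]) / h) ** 2)   # Gaussian kernel weights
    return (w @ y) / w.sum(axis=1)

def loo_cv(h):
    w = np.exp(-0.5 * ((dist[:, None] - dist[None, :]) / h) ** 2)
    np.fill_diagonal(w, 0.0)                            # leave each point out of its own fit
    pred = (w @ conc) / w.sum(axis=1)
    return np.mean((conc - pred) ** 2)

grid = np.linspace(2, 30, 30)
h_star = grid[np.argmin([loo_cv(h) for h in grid])]
x_eval = np.array([10.0, 50.0, 100.0])
print(f"CV bandwidth = {h_star:.1f} km, fitted concentrations at 10/50/100 km:",
      np.round(nw_fit(x_eval, dist, conc, h_star), 2))
```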
| By: | Zhentao Shi; Yishu Wang |
| Abstract: | We leverage an ensemble of many regressors, the number of which can exceed the sample size, for economic prediction. An underlying latent factor structure implies a dense regression model with highly correlated covariates. We propose the L2-relaxation method for estimating the regression coefficients and extrapolating the out-of-sample (OOS) outcomes. This framework can be applied to policy evaluation using the panel data approach (PDA), where we further establish inference for the average treatment effect. In addition, we extend the traditional single unit setting in PDA to allow for many treated units with a short post-treatment period. Monte Carlo simulations demonstrate that our approach exhibits excellent finite sample performance for both OOS prediction and policy evaluation. We illustrate our method with two empirical examples: (i) predicting China's producer price index growth rate and evaluating the effect of real estate regulations, and (ii) estimating the impact of Brexit on the stock returns of British and European companies. |
| Date: | 2025–10 |
| URL: | https://d.repec.org/n?u=RePEc:arx:papers:2510.12183 |
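The following sketch is not the paper's L2-relaxation estimator; it only illustrates the setting that estimator targets, namely combining more correlated, factor-driven regressors than time periods for out-of-sample prediction, using plain cross-validated ridge as a familiar dense-shrinkage baseline.

```python
import numpy as np
from sklearn.linear_model import RidgeCV

rng = np.random.default_rng(7)
T, p, r = 120, 200, 3                                   # more regressors than time periods
F = rng.normal(size=(T, r))                             # latent factors
lam = rng.normal(size=(r, p))
X = F @ lam + 0.3 * rng.normal(size=(T, p))             # many highly correlated predictors
y = F @ np.array([1.0, -0.5, 0.8]) + 0.5 * rng.normal(size=T)

T_in = 100                                              # estimation window
model = RidgeCV(alphas=np.logspace(-3, 3, 25)).fit(X[:T_in], y[:T_in])
oos_rmse = np.sqrt(np.mean((y[T_in:] - model.predict(X[T_in:])) ** 2))
print(f"out-of-sample RMSE: {oos_rmse:.3f}")
```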
| By: | Sung Jae Jun; Sokbae Lee |
| Abstract: | We develop a framework for identifying and estimating persuasion effects in regression discontinuity (RD) designs. The RD persuasion rate measures the probability that individuals at the threshold would take the action if exposed to a persuasive message, given that they would not take the action without exposure. We present identification results for both sharp and fuzzy RD designs, derive sharp bounds under various data scenarios, and extend the analysis to local compliers. Estimation and inference rely on local polynomial regression, enabling straightforward implementation with standard RD tools. Applications to public health and media illustrate its empirical relevance. |
| Date: | 2025–09 |
| URL: | https://d.repec.org/n?u=RePEc:arx:papers:2509.26517 |
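A minimal sketch of the sharp-RD persuasion rate under assumptions of my own (uniform kernel, fixed bandwidth, simulated data): local linear fits on each side of the cutoff estimate the action probabilities with and without exposure at the threshold, and the persuasion rate is their gap rescaled by the share who would not act without exposure.

```python
import numpy as np

rng = np.random.default_rng(8)
n = 5000
x = rng.uniform(-1, 1, n)                  # running variable, cutoff at 0
exposed = (x >= 0).astype(int)             # sharp RD: message received above the threshold
base = 1 / (1 + np.exp(-(0.2 + 0.5 * x)))  # P(action) without exposure
y = rng.binomial(1, np.clip(base + 0.25 * exposed, 0, 1))

def local_linear(side, h=0.2):
    m = (np.abs(x) <= h) & (exposed == side)
    Xm = np.column_stack([np.ones(m.sum()), x[m]])
    b, *_ = np.linalg.lstsq(Xm, y[m], rcond=None)
    return b[0]                            # intercept = limit of P(action) at the cutoff

p1, p0 = local_linear(1), local_linear(0)
persuasion = (p1 - p0) / (1 - p0)
print(f"p1 = {p1:.3f}, p0 = {p0:.3f}, RD persuasion rate = {persuasion:.3f}")
```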
| By: | Zhuoxun Li; Clifford M. Hurvich |
| Abstract: | In this paper, we propose a new heteroskedasticity and autocorrelation consistent covariance matrix estimator based on the prewhitened kernel estimator and a localized leave-one-out frequency domain cross-validation (FDCV). We adapt the cross-validated log likelihood (CVLL) function to simultaneously select the order of the prewhitening vector autoregression (VAR) and the bandwidth. The prewhitening VAR is estimated by the Burg method without eigen adjustment as we find the eigen adjustment rule of Andrews and Monahan (1992) can be triggered unnecessarily and harmfully when regressors have nonzero mean. Through Monte Carlo simulations and three empirical examples, we illustrate the flaws of eigen adjustment and the reliability of our method. |
| Date: | 2025–09 |
| URL: | https://d.repec.org/n?u=RePEc:arx:papers:2509.23256 |
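For orientation, a hedged sketch of a prewhitened HAC variance in a simple time-series regression: the OLS scores are prewhitened with a VAR(1) fitted by OLS (the paper uses the Burg method), a Bartlett kernel with a rule-of-thumb bandwidth stands in for the paper's localized frequency-domain cross-validation, and the long-run variance is recolored afterwards.

```python
import numpy as np

rng = np.random.default_rng(9)
n, k = 500, 2
x = np.column_stack([np.ones(n), 1.0 + rng.normal(size=n)])    # regressor with nonzero mean
u = np.zeros(n)
for t in range(1, n):
    u[t] = 0.6 * u[t - 1] + rng.normal()                       # serially correlated errors
y = x @ np.array([1.0, 2.0]) + u

b, *_ = np.linalg.lstsq(x, y, rcond=None)
g = x * (y - x @ b)[:, None]                                   # OLS score vectors

# prewhiten the scores with a VAR(1) fitted by OLS
B, *_ = np.linalg.lstsq(g[:-1], g[1:], rcond=None)             # g_t ~ g_{t-1} @ B
e = g[1:] - g[:-1] @ B
m = e.shape[0]

# Bartlett-kernel long-run variance of the whitened scores, then recolor
bw = int(np.floor(4 * (m / 100) ** (2 / 9)))                   # rule-of-thumb bandwidth
S = e.T @ e / m
for j in range(1, bw + 1):
    gam = e[j:].T @ e[:-j] / m
    S += (1 - j / (bw + 1)) * (gam + gam.T)
D = np.linalg.inv(np.eye(k) - B.T)
S = D @ S @ D.T                                                # recoloring step

XtX_inv = np.linalg.inv(x.T @ x)
V = XtX_inv @ (n * S) @ XtX_inv
print("HAC standard errors:", np.sqrt(np.diag(V)))
```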
| By: | Bauer, Dietmar; del Barrio Castro, Tomás |
| Abstract: | Economic time series often show strong persistence as well as seasonal variations that are appropriately modelled using seasonal unit root models in addition to deterministic components. In many cases different variables within a vector time series are driven by identical common trends and cycles, leading to cointegration. This paper investigates the consequences for the properties of vector processes when some components are aggregated in time. This may involve moving from a fully observed system that is seasonally cointegrated at a frequency $\omega_k = 2\pi k/S$ with $k = 1, \dots, (S-1)/2$, where S is the number of seasons per year, to a system with time series sampled at a high sampling rate (HSR), observed for S seasons per year, and time series with a low sampling rate (LSR), observed for $S_A$ seasons per year, such that $S_A = S/Q$ and Q is an integer. The (partial) aggregation has implications for the unit root and cointegration properties: aggregation potentially shifts the frequency of the unit roots. This may lead to an aliasing effect wherein common cycles at different unit roots become aligned and can no longer be separated, in turn impacting cointegrating relations. This paper uses triangular system representations in the bivariate case as well as the state space framework (in a general setting) to investigate the effect of aggregation on the unit root properties of multivariate time series. The main results indicate under which assumptions and in which situations the analysis of the integration and cointegration properties of time series with mixed sampling rates relates to the same properties of the underlying data generating process. The results also cover full aggregation of all components. These results lead to the proposal of an effective econometric strategy for detecting cointegration at the various sampling rates, as is demonstrated in a simulation exercise. Finally, an empirical application with monthly data on arrivals and departures at Mallorca Airport also illustrates the findings collected in the present work. |
| Keywords: | Seasonal Cointegration, Polynomial cointegration, Periodic Cointegration, Mixed-Frequency, Aggregation, Demodulator operator |
| JEL: | C22 C32 |
| Date: | 2025–09–05 |
| URL: | https://d.repec.org/n?u=RePEc:pra:mprapa:126066 |
| By: | Tatsuru Kikuchi |
| Abstract: | I develop a nonparametric framework for identifying spatial boundaries of treatment effects without imposing parametric functional form restrictions. The method employs local linear regression with data-driven bandwidth selection to flexibly estimate spatial decay patterns and detect treatment effect boundaries. Monte Carlo simulations demonstrate that the nonparametric approach exhibits lower bias and correctly identifies the absence of boundaries when none exist, unlike parametric methods that may impose spurious spatial patterns. I apply this framework to bank branch openings during 2015–2020, matching 5,743 new branches to 5.9 million mortgage applications across 14,209 census tracts. The analysis reveals that branch proximity significantly affects loan application volume (8.5% decline per 10 miles) but not approval rates, consistent with branches stimulating demand through local presence while credit decisions remain centralized. Examining branch survival during the digital transformation era (2010–2023), I find a non-monotonic relationship with area income: contrary to conventional wisdom, high-income areas experience more closures. This counterintuitive pattern reflects strategic consolidation of redundant branches in over-banked wealthy urban areas rather than discrimination against poor neighborhoods. Controlling for branch density, urbanization, and competition, the direct income effect diminishes substantially, with branch density emerging as the primary determinant of survival. These findings demonstrate the necessity of flexible nonparametric methods for detecting complex spatial patterns that parametric models would miss, and challenge simplistic narratives about banking deserts by revealing the organizational complexity underlying spatial consolidation decisions. |
| Date: | 2025–10 |
| URL: | https://d.repec.org/n?u=RePEc:arx:papers:2510.13148 |
| By: | Yasin Simsek |
| Abstract: | Spot covariance estimation is commonly based on high-frequency open-to-close return data over short time windows, but such approaches face a trade-off between statistical accuracy and localization. In this paper, I introduce a new estimation framework using high-frequency candlestick data, which include open, high, low, and close prices, effectively addressing this trade-off. By exploiting the information contained in candlesticks, the proposed method improves estimation accuracy relative to benchmarks while preserving local structure. I further develop a test for spot covariance inference based on candlesticks that demonstrates reasonable size control and a notable increase in power, particularly in small samples. Motivated by recent work in the finance literature, I empirically test the market neutrality of the iShares Bitcoin Trust ETF (IBIT) using 1-minute candlestick data for the full year of 2024. The results show systematic deviations from market neutrality, especially in periods of market stress. An event study around FOMC announcements further illustrates the new method's ability to detect subtle shifts in response to relatively mild information events. |
| Date: | 2025–10 |
| URL: | https://d.repec.org/n?u=RePEc:arx:papers:2510.12911 |
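As a reminder of why candlesticks carry extra information beyond open-to-close returns, here is the classical Garman-Klass range-based variance estimator applied to simulated one-minute OHLC bars; it is not the paper's spot covariance estimator or test, and all parameters are assumptions.

```python
import numpy as np

rng = np.random.default_rng(10)
sigma = 0.02                                     # assumed daily volatility of 2%
n_sec = 390 * 60                                 # one trading day at one-second resolution
logp = np.log(100) + np.cumsum(rng.normal(0, sigma / np.sqrt(n_sec), n_sec))

bars = logp.reshape(390, 60)                     # one-minute candlesticks (log prices)
o, c = bars[:, 0], bars[:, -1]
h, l = bars.max(axis=1), bars.min(axis=1)

rv_close = np.sum(np.diff(np.concatenate([[logp[0]], c])) ** 2)          # close-based RV
gk = np.sum(0.5 * (h - l) ** 2 - (2 * np.log(2) - 1) * (c - o) ** 2)     # Garman-Klass
print(f"true daily variance {sigma**2:.6f}, close-based RV {rv_close:.6f}, "
      f"Garman-Klass {gk:.6f}")
```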
| By: | Christopher P. Chambers; Yusufcan Masatlioglu; Ruodu Wang |
| Abstract: | A common theme underlying many problems in statistics and economics involves the determination of a systematic method of selecting a joint distribution consistent with a specified list of categorical marginals, some of which have an ordinal structure. We propose guidance in narrowing down the set of possible methods by introducing Invariant Aggregation (IA), a natural property that requires merging adjacent categories in one marginal not to alter the joint distribution over unaffected values. We prove that a model satisfies IA if and only if it is a copula model. This characterization (i) ensures robustness against data manipulation and survey design and (ii) allows seamless incorporation of new variables. Our results provide both theoretical clarity and practical safeguards for inference under marginal constraints. |
| Date: | 2025–09 |
| URL: | https://d.repec.org/n?u=RePEc:arx:papers:2509.15165 |
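A short numerical check of the Invariant Aggregation property for a copula model: a Gaussian copula with assumed marginals is discretized into a contingency table, two adjacent categories of one marginal are merged, and the joint probabilities over the unaffected categories are unchanged. Marginals and the copula parameter are illustrative.

```python
import numpy as np
from scipy.stats import norm, multivariate_normal

pX = np.array([0.2, 0.3, 0.5])          # ordinal marginal of X (assumed)
pY = np.array([0.4, 0.35, 0.25])        # ordinal marginal of Y (assumed)
rho = 0.6                               # Gaussian copula parameter (assumed)
mvn = multivariate_normal(mean=[0, 0], cov=[[1, rho], [rho, 1]])

def copula_cdf(u, v):
    eps = 1e-12
    return mvn.cdf([norm.ppf(np.clip(u, eps, 1 - eps)),
                    norm.ppf(np.clip(v, eps, 1 - eps))])

def joint_table(pX, pY):
    cx = np.concatenate([[0.0], np.cumsum(pX)])
    cy = np.concatenate([[0.0], np.cumsum(pY)])
    J = np.empty((len(pX), len(pY)))
    for i in range(len(pX)):
        for j in range(len(pY)):
            J[i, j] = (copula_cdf(cx[i + 1], cy[j + 1]) - copula_cdf(cx[i], cy[j + 1])
                       - copula_cdf(cx[i + 1], cy[j]) + copula_cdf(cx[i], cy[j]))
    return J

J_full = joint_table(pX, pY)
J_merged = joint_table(np.array([pX[0] + pX[1], pX[2]]), pY)   # merge X's first two categories
print("unaffected X-category, before merging:", np.round(J_full[2], 4))
print("unaffected X-category, after merging: ", np.round(J_merged[1], 4))
```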
| By: | Stephane Hess; Sander van Cranenburgh |
| Abstract: | Models allowing for random heterogeneity, such as mixed logit and latent class, are generally observed to obtain superior model fit and yield detailed insights into unobserved preference heterogeneity. Using theoretical arguments and two case studies on revealed and stated choice data, this paper highlights that these advantages do not translate into any benefits in forecasting, whether looking at prediction performance or the recovery of market shares. The only exception arises when using conditional distributions in making predictions for the same individuals included in the estimation sample, which obviously precludes any out-of-sample forecasting. |
| Date: | 2025–10 |
| URL: | https://d.repec.org/n?u=RePEc:arx:papers:2510.09185 |
| By: | Jose Blanchet; Mark S. Squillante; Mario Szegedy; Guanyang Wang |
| Abstract: | This tutorial paper introduces quantum approaches to Monte Carlo computation with applications in computational finance. We outline the basics of quantum computing using Grover's algorithm for unstructured search to build intuition. We then move slowly to amplitude estimation problems and applications to counting and Monte Carlo integration, again using Grover-type iterations. A hands-on Python/Qiskit implementation illustrates these concepts applied to finance. The paper concludes with a discussion on current challenges in scaling quantum simulation techniques. |
| Date: | 2025–09 |
| URL: | https://d.repec.org/n?u=RePEc:arx:papers:2509.18614 |
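In the spirit of the tutorial's Grover example, a classical statevector simulation in plain numpy (deliberately avoiding any particular Qiskit API): an oracle flips the sign of the marked basis state and the diffusion operator inverts about the mean, boosting the probability of the marked item after roughly (pi/4)*sqrt(N) iterations. The problem size and marked index are assumptions.

```python
import numpy as np

n_qubits = 3
N = 2 ** n_qubits
marked = 5                                           # index of the item to find (assumed)

psi = np.full(N, 1 / np.sqrt(N))                     # uniform superposition
oracle = np.eye(N)
oracle[marked, marked] = -1                          # phase flip on the marked state
diffusion = 2 * np.full((N, N), 1 / N) - np.eye(N)   # inversion about the mean

n_iter = int(np.floor(np.pi / 4 * np.sqrt(N)))       # near-optimal number of iterations
for _ in range(n_iter):
    psi = diffusion @ (oracle @ psi)
print(f"after {n_iter} iterations, P(marked) = {abs(psi[marked]) ** 2:.3f}")
```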
| By: | Daniel Cunha Oliveira; Grover Guzman; Nick Firoozye |
| Abstract: | Robust optimization provides a principled framework for decision-making under uncertainty, with broad applications in finance, engineering, and operations research. In portfolio optimization, uncertainty in expected returns and covariances demands methods that mitigate estimation error, parameter instability, and model misspecification. Traditional approaches, including parametric, bootstrap-based, and Bayesian methods, enhance stability by relying on confidence intervals or probabilistic priors but often impose restrictive assumptions. This study introduces a non-parametric bootstrap framework for robust optimization in financial decision-making. By resampling empirical data, the framework constructs flexible, data-driven confidence intervals without assuming specific distributional forms, thus capturing uncertainty in statistical estimates, model parameters, and utility functions. Treating utility as a random variable enables percentile-based optimization, naturally suited for risk-sensitive and worst-case decision-making. The approach aligns with recent advances in robust optimization, reinforcement learning, and risk-aware control, offering a unified perspective on robustness and generalization. Empirically, the framework mitigates overfitting and selection bias in trading strategy optimization and improves generalization in portfolio allocation. Results across portfolio and time-series momentum experiments demonstrate that the proposed method delivers smoother, more stable out-of-sample performance, offering a practical, distribution-free alternative to traditional robust optimization methods. |
| Date: | 2025–10 |
| URL: | https://d.repec.org/n?u=RePEc:arx:papers:2510.12725 |
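A compact sketch of the percentile-based idea on simulated returns: bootstrap resamples of the return history produce a distribution of utility for each candidate portfolio, and the weights maximizing a lower percentile of that distribution are selected. Asset parameters, the mean-variance utility, and the weight grid are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(11)
T = 750                                          # simulated daily returns for two assets
mu = np.array([0.0005, 0.0002])
cov = np.array([[1.0e-4, 2.1e-5], [2.1e-5, 4.9e-5]])
rets = rng.multivariate_normal(mu, cov, size=T)

def utility(w, r, risk_aversion=4.0):
    port = r @ w
    return port.mean() - 0.5 * risk_aversion * port.var()

weights_grid = [np.array([a, 1 - a]) for a in np.linspace(0, 1, 21)]
B = 500
scores = np.zeros((B, len(weights_grid)))
for b in range(B):
    sample = rets[rng.integers(0, T, T)]         # nonparametric bootstrap resample
    scores[b] = [utility(w, sample) for w in weights_grid]

worst_case = np.quantile(scores, 0.05, axis=0)   # 5th percentile of utility per portfolio
best = weights_grid[int(np.argmax(worst_case))]
print(f"weights maximizing the 5th-percentile utility: {np.round(best, 2)}")
```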
| By: | Jesus Felipe; John McCombie; Aashish Mehta |
| Abstract: | For decades, the literature on the estimation of production functions has focused on the elimination of endogeneity biases through different estimation procedures to obtain the correct factor elasticities and other relevant parameters. Theoretical discussions of the problem correctly assume that production functions are relationships among physical inputs and output. However, in practice, they are most often estimated using deflated monetary values for output (value added or gross output) and capital. This introduces two additional problems--an errors-in-variables problem, and a tendency to recover the factor shares in value added instead of their elasticities. The latter problem derives from the fact that the series used are linked through the accounting identity that links value added to the sum of the wage bill and profits. Using simulated data from a cross-sectional Cobb-Douglas production function in physical terms from which we generate the corresponding series in monetary values, we show that the coefficients of labor and capital derived from the monetary series will be (a) biased relative to the elasticities by simultaneity and by the error that results from proxying physical output and capital with their monetary values; and (b) biased relative to the factor shares in value added as a result of a peculiar form of omitted variables bias. We show what these biases are and conclude that estimates of production functions obtained using monetary values are likely to be closer to the factor shares than to the factor elasticities. An alternative simulation that does not assume the existence of a physical production function confirms that estimates from the value data series will converge to the factor shares when cross-sectional variation in the factor prices is small. This is, again, the result of the fact that the estimated relationship is an approximation to the distributional accounting identity. |
| Keywords: | Endogeneity; Monetary Values; Physical Quantities; Production Functions |
| JEL: | C18 C81 C82 |
| Date: | 2024–01 |
| URL: | https://d.repec.org/n?u=RePEc:lev:wrkpap:wp_1036 |
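A minimal simulation in the spirit of the paper's second exercise, with parameters of my own choosing: no production function is assumed at all, value added is built directly from the accounting identity with common factor prices across units, and the log-linear regression coefficients land close to the average factor shares rather than to any technological elasticities.

```python
import numpy as np

rng = np.random.default_rng(12)
n = 2000
L = np.exp(rng.normal(2.0, 0.6, n))             # labour input
K = np.exp(rng.normal(3.0, 0.6, n))             # capital input
w, r = 1.0, 0.15                                # common factor prices (assumed)
V = w * L + r * K                               # value added via the accounting identity

X = np.column_stack([np.ones(n), np.log(L), np.log(K)])
b, *_ = np.linalg.lstsq(X, np.log(V), rcond=None)
labour_share = np.mean(w * L / V)
capital_share = np.mean(r * K / V)
# with common factor prices, the coefficients track the average shares, not elasticities
print(f"estimated 'elasticities': labour {b[1]:.2f}, capital {b[2]:.2f}")
print(f"average factor shares:    labour {labour_share:.2f}, capital {capital_share:.2f}")
```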
| By: | Alexandra Piller (Study Center Gerzensee and University of Bern); Marc Schranz (University of Bern); Larissa Schwaller (University of Bern) |
| Abstract: | Identifying the causal effects of monetary policy is challenging due to the endogeneity of policy decisions. In recent years, high-frequency monetary policy surprises have become a popular identification strategy. To serve as a valid instrument, monetary policy surprises must be correlated with the true policy shock (relevant) while remaining uncorrelated with other shocks (exogenous). However, market-based monetary policy surprises around Federal Open Market Committee (FOMC) announcements often suffer from weak relevance and endogeneity concerns. This paper explores whether text analysis methods applied to central bank communication can help mitigate these concerns. We adopt two complementary approaches. First, to improve instrument relevance, we extend the dataset of monetary policy surprises from FOMC announcements to policy-relevant speeches by the Federal Reserve Board chair and vice chair. Second, using natural language processing techniques, we predict changes in market expectations from central bank communication, isolating the component of monetary policy surprises driven solely by communication. The resulting language-driven monetary policy surprises exhibit stronger instrument relevance, mitigate endogeneity concerns and produce impulse responses that align with standard macroeconomic theory. |
| Date: | 2025–10 |
| URL: | https://d.repec.org/n?u=RePEc:szg:worpap:2505 |