nep-ecm New Economics Papers
on Econometrics
Issue of 2025–01–06
seventeen papers chosen by
Sune Karlsson, Örebro universitet


  1. High-Dimensional Time-Varying Coefficient Estimation By Donggyu Kim
  2. Weak-Identification-Robust Bootstrap Tests after Pretesting for Exogeneity By Doko Tchatoka, Firmin; Wang, Wenjie
  3. Testing for Endogeneity: A Moment-Based Bayesian Approach By Siddhartha Chib; Minchul Shin; Anna Simoni
  4. Factor and Idiosyncratic VAR-Ito Volatility Models for Heavy-Tailed High-Frequency Financial Data By Jianqing Fan; Donggyu Kim; Minseok Shin; Yazhen Wang
  5. Nonconvex High-Dimensional Time-Varying Coefficient Estimation for Noisy High-Frequency Observations with a Factor Structure By Donggyu Kim; Minseok Shin
  6. Robust High-Dimensional Time-Varying Coefficient Estimation By Donggyu Kim; Minseok Shin
  7. Multivariate Rough Volatility By Ranieri Dugo; Giacomo Giorgio; Paolo Pigato
  8. Boosting GMM with Many Instruments When Some Are Invalid and/or Irrelevant By Hao Hao; Tae-Hwy Lee
  9. Adaptive Robust Large Volatility Matrix Estimation Based on High-Frequency Financial Data By Jianqing Fan; Donggyu Kim; Minseok Shin
  10. Modeling Common Bubbles: A Mixed Causal Non-Causal Dynamic Factor Model By Gabriele Mingoli
  11. Asymmetric AdaBoost for Maximum Score Estimation of High-dimensional Binary Choice Regression Models By Jianghao Chu; Tae-Hwy Lee; Aman Ullah
  12. Estimating the variance of covariate-adjusted estimators of average treatment effects in clinical trials with binary endpoints By Magirr, Dominic; Wang, Craig; Przybylski, Alexander; Baillie, Mark
  13. Robust Realized Integrated Beta Estimator with Application to Dynamic Analysis of Integrated Beta By Donggyu Kim; Minseog Oh; Yazhen Wang
  14. Variable Inputs Allocation among Crops: A Time-Varying Random Parameters Approach By Koutchade, Obafémi Philippe; Carpentier, Alain; Féménia, Fabienne
  15. Property of Inverse Covariance Matrix-based Financial Adjacency Matrix for Detecting Local Groups By Donggyu Kim; Minseog Oh
  16. Quantitative Urban Economics By Stephen J. Redding
  17. Glass Box Machine Learning and Corporate Bond Returns By Sebastian Bell; Ali Kakhbod; Martin Lettau; Abdolreza Nazemi

  1. By: Donggyu Kim (Department of Economics, University of California Riverside)
    Abstract: In this paper, we develop a novel high-dimensional time-varying coefficient estimation method, based on high-dimensional Itô diffusion processes. To account for high-dimensional time-varying coefficients, we first estimate local (or instantaneous) coefficients using a time-localized Dantzig selection scheme under a sparsity condition, which results in biased local coefficient estimators due to the regularization. To handle the bias, we propose a debiasing scheme, which provides well-performing unbiased local coefficient estimators. With the unbiased local coefficient estimators, we estimate the integrated coefficient, and to further account for the sparsity of the coefficient process, we apply thresholding schemes. We call this the Thresholding dEbiased Dantzig (TED) estimator. We establish asymptotic properties of the proposed TED estimator. In the empirical analysis, we apply the TED procedure to the analysis of high-dimensional factor models using high-frequency data.
    Date: 2024–12
    URL: https://d.repec.org/n?u=RePEc:ucr:wpaper:202416
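    Illustrative sketch: the debias-then-threshold recipe described in the abstract, transplanted to a plain sparse-regression toy. The paper works with local coefficients of Itô diffusions and a Dantzig selector; the toy below uses i.i.d. data, a lasso stand-in, and p < n so the sample Gram matrix is invertible (the genuinely high-dimensional case needs a regularized inverse such as nodewise lasso). All tuning values are arbitrary illustrative choices.
```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n, p, s = 400, 50, 5
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[:s] = 1.0
y = X @ beta + 0.5 * rng.standard_normal(n)

# Step 1: regularized (hence biased) estimate.
b_lasso = Lasso(alpha=0.05).fit(X, y).coef_

# Step 2: debiasing -- one-step correction using an (approximate) inverse Gram matrix.
Theta = np.linalg.inv(X.T @ X / n)
b_debiased = b_lasso + Theta @ X.T @ (y - X @ b_lasso) / n

# Step 3: hard-threshold the debiased estimate to restore sparsity.
tau = 0.2
b_ted = np.where(np.abs(b_debiased) > tau, b_debiased, 0.0)
print("nonzeros recovered:", np.flatnonzero(b_ted))
```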
  2. By: Doko Tchatoka, Firmin; Wang, Wenjie
    Abstract: Pretesting for exogeneity has become routine in many empirical applications involving instrumental variables (IVs) to decide whether the ordinary least squares (OLS) or the IV-based method is appropriate. Guggenberger (2010) shows that the second-stage t-test – based on the outcome of a Durbin-Wu-Hausman type pretest for exogeneity in the first stage – has extreme size distortion with asymptotic size equal to 1, even when the IVs are strong. In this paper, we propose a novel two-stage test procedure that switches between the OLS-based statistic and the weak-IV-robust statistic. Furthermore, we develop a size-corrected wild bootstrap approach, which combines certain wild bootstrap critical values with an appropriate size-correction method. We establish uniform validity of this procedure under conditional heteroskedasticity, in the sense that the resulting tests achieve correct asymptotic size whether identification is strong or weak. Monte Carlo simulations confirm our theoretical findings. In particular, our proposed method has remarkable power gains over the standard weak-identification-robust test.
    Keywords: DWH Pretest; Shrinkage; Weak Instruments; Asymptotic Size; Wild Bootstrap; Bonferroni-based Size Correction.
    JEL: C26
    Date: 2024–12–20
    URL: https://d.repec.org/n?u=RePEc:pra:mprapa:123060
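    Illustrative sketch: wild-bootstrap critical values for a heteroskedasticity-robust t-statistic, the basic ingredient the paper combines with a DWH pretest and a Bonferroni-based size correction (none of that extra machinery is reproduced here; the design and tuning below are arbitrary).
```python
import numpy as np

rng = np.random.default_rng(1)
n, B = 200, 999
x = rng.standard_normal(n)
y = np.abs(x) * rng.standard_normal(n)   # H0: slope = 0, heteroskedastic errors

def hc_tstat(x, y):
    b = x @ y / (x @ x)                            # OLS slope (no intercept, for brevity)
    u = y - b * x
    se = np.sqrt(np.sum((x * u) ** 2)) / (x @ x)   # HC0 standard error
    return b / se, u

t_obs, resid = hc_tstat(x, y)

# Wild bootstrap: perturb residuals with Rademacher weights, imposing H0.
t_boot = np.empty(B)
for b in range(B):
    w = rng.choice([-1.0, 1.0], size=n)
    t_boot[b], _ = hc_tstat(x, resid * w)          # null-imposed bootstrap sample

crit = np.quantile(np.abs(t_boot), 0.95)
print(f"|t| = {abs(t_obs):.2f}, 5% wild-bootstrap critical value = {crit:.2f}")
```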
  3. By: Siddhartha Chib; Minchul Shin; Anna Simoni
    Abstract: A standard assumption in the Bayesian estimation of linear regression models is that the regressors are exogenous in the sense that they are uncorrelated with the model error term. In practice, however, this assumption can be invalid. In this paper, under the rubric of the exponentially tilted empirical likelihood, we develop a Bayes factor test for endogeneity that compares a base model that is correctly specified under exogeneity but misspecified under endogeneity against an extended model that is correctly specified in either case. We provide a comprehensive study of the log-marginal exponentially tilted empirical likelihood. We demonstrate that our testing procedure is consistent from a frequentist point of view: as the sample becomes large, it almost surely selects the base model if and only if the regressors are exogenous, and the extended model if and only if the regressors are endogenous. The methods are illustrated with simulated data and with two problems concerning the causal effect of automobile prices on automobile demand and the causal effect of potentially endogenous airplane ticket prices on passenger volume.
    Keywords: Bayesian inference; Causal inference; Exponentially tilted empirical likelihood; Endogeneity; Exogeneity; Instrumental variables; Marginal likelihood; Posterior consistency
    Date: 2024–11–25
    URL: https://d.repec.org/n?u=RePEc:fip:fedpwp:99168
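    Illustrative sketch: the exponentially tilted empirical likelihood (ETEL) objective for a single moment condition E[x - theta] = 0. The paper builds a Bayes factor from log-marginal ETELs of two competing moment models; this toy only shows how the tilted weights and the log-ETEL at a fixed theta are computed.
```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(2)
x = rng.normal(1.0, 1.0, size=300)

def log_etel(theta, x):
    g = x - theta                                    # moment function
    # Dual problem: lambda minimizes the sample mean of exp(lambda * g).
    obj = lambda lam: np.mean(np.exp(lam * g))
    lam = minimize(obj, x0=0.0, method="BFGS").x[0]
    w = np.exp(lam * g)
    w /= w.sum()                                     # tilted probabilities
    return np.sum(np.log(w))                         # log-ETEL = sum_i log w_i

print("log-ETEL at theta = 1:", log_etel(1.0, x))
print("log-ETEL at theta = 0:", log_etel(0.0, x))    # worse fit, lower value
```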
  4. By: Jianqing Fan; Donggyu Kim (Department of Economics, University of California Riverside); Minseok Shin; Yazhen Wang
    Abstract: This paper introduces a novel Ito diffusion process for both factor and idiosyncratic volatilities whose eigenvalues follow the vector auto-regressive (VAR) model. We call it the factor and idiosyncratic VAR-Ito (FIVAR-Ito) model. The FIVAR-Ito model accommodates the dynamics of the factor and idiosyncratic volatilities and involves many parameters. In addition, empirical studies have shown that financial returns often exhibit heavy tails. To address these two issues simultaneously, we propose a penalized optimization procedure with a truncation scheme for parameter estimation. We apply the proposed parameter estimation procedure to predicting large volatility matrices and investigate its asymptotic properties. Using high-frequency trading data, the proposed method is applied to large volatility matrix prediction and minimum variance portfolio allocation.
    Date: 2024–12
    URL: https://d.repec.org/n?u=RePEc:ucr:wpaper:202415
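    Illustrative sketch: the modeling idea of tracking the leading eigenvalues of daily volatility matrices and fitting a VAR(1) to them by least squares. The paper couples the eigenvalue dynamics with an Ito diffusion and a penalized, truncation-based estimator for heavy tails; none of that appears in this toy, and the simulated matrices merely stand in for realized volatility matrices.
```python
import numpy as np

rng = np.random.default_rng(3)
T, p, k = 250, 20, 3                      # days, assets, eigenvalues tracked

# Simulated daily covariance matrices (placeholder for realized vol matrices).
eigs = np.empty((T, k))
for t in range(T):
    R = rng.standard_normal((100, p))     # intraday returns for day t
    S = R.T @ R / 100
    eigs[t] = np.linalg.eigvalsh(S)[::-1][:k]   # top-k eigenvalues

# VAR(1) by multivariate least squares: eigs_t = c + A eigs_{t-1} + e_t.
Y = eigs[1:]
X = np.hstack([np.ones((T - 1, 1)), eigs[:-1]])
coef, *_ = np.linalg.lstsq(X, Y, rcond=None)
print("VAR(1) transition matrix:\n", coef[1:].T.round(3))
```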
  5. By: Donggyu Kim (Department of Economics, University of California Riverside); Minseok Shin
    Abstract: In this paper, we propose a novel high-dimensional time-varying coefficient estimator for noisy high-frequency observations with a factor structure. In high-frequency finance, we often observe that noise dominates the signal of the underlying true processes and that covariates exhibit a factor structure due to their strong dependence. Thus, we cannot apply usual regression procedures to analyze high-frequency observations. To handle the noise, we first employ a smoothing method for the observed dependent and covariate processes. Then, to handle the strong dependence of the covariate processes, we apply Principal Component Analysis (PCA) and transform the highly correlated covariate structure into a weakly correlated structure. However, the variables from PCA still contain non-negligible noise. To manage this non-negligible noise and the high dimensionality, we propose a nonconvex penalized regression method for each local coefficient. This method produces consistent but biased local coefficient estimators. To estimate the integrated coefficients, we propose a debiasing scheme and obtain a debiased integrated coefficient estimator using debiased local coefficient estimators. Then, to further account for the sparsity structure of the coefficients, we apply a thresholding scheme to the debiased integrated coefficient estimator. We call this scheme the Factor Adjusted Thresholded dEbiased Nonconvex LASSO (FATEN-LASSO) estimator. Furthermore, this paper establishes the concentration properties of the FATEN-LASSO estimator and discusses a nonconvex optimization algorithm.
    Date: 2024–12
    URL: https://d.repec.org/n?u=RePEc:ucr:wpaper:202418
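    Illustrative sketch: the factor-adjustment step, i.e. using PCA to split strongly correlated covariates into factors plus weakly correlated idiosyncratic parts before running a penalized regression. The paper does this locally in time on noise-smoothed high-frequency data with a nonconvex penalty; this one-shot i.i.d. toy uses a plain lasso as a stand-in, with illustrative tuning values.
```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import Lasso

rng = np.random.default_rng(4)
n, p, r = 500, 60, 2
F = rng.standard_normal((n, r))                    # latent factors
L = rng.standard_normal((p, r))
X = F @ L.T + rng.standard_normal((n, p))          # strongly dependent covariates
beta = np.zeros(p); beta[:4] = 1.0
y = X @ beta + 0.5 * rng.standard_normal(n)

# Factor adjustment: strip the leading principal components.
pca = PCA(n_components=r).fit(X)
F_hat = pca.transform(X)
U = X - pca.inverse_transform(F_hat)               # weakly correlated idiosyncratic part

# Penalized regression on factors plus idiosyncratic components.
Z = np.hstack([F_hat, U])
fit = Lasso(alpha=0.05).fit(Z, y)
print("nonzero idiosyncratic coefficients:", np.flatnonzero(fit.coef_[r:]))
```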
  6. By: Donggyu Kim (Department of Economics, University of California Riverside); Minseok Shin
    Abstract: In this paper, we develop a novel high-dimensional coefficient estimation procedure based on high-frequency data. Unlike usual high-dimensional regression procedures such as LASSO, we additionally handle the heavy-tailedness of high-frequency observations as well as the time variation of coefficient processes. Specifically, we employ a Huber loss and a truncation scheme to handle heavy-tailed observations, while ℓ1-regularization is adopted to overcome the curse of dimensionality. To account for the time-varying coefficient, we estimate local coefficients, which are biased due to the ℓ1-regularization. Thus, when estimating integrated coefficients, we propose a debiasing scheme to enjoy the law of large numbers property and employ a thresholding scheme to further accommodate the sparsity of the coefficients. We call this the Robust thrEsholding Debiased LASSO (RED-LASSO) estimator. We show that the RED-LASSO estimator can achieve a near-optimal convergence rate. In the empirical study, we apply the RED-LASSO procedure to high-dimensional integrated coefficient estimation using high-frequency trading data.
    Date: 2024–12
    URL: https://d.repec.org/n?u=RePEc:ucr:wpaper:202417
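    Illustrative sketch: combining a Huber loss (for heavy-tailed observations) with ℓ1-regularization (for high dimensionality), the two local-step ingredients named in the abstract. scikit-learn's SGDRegressor offers exactly this loss/penalty pair; the paper's debiasing and thresholding of integrated coefficients are not reproduced, and the tuning values are illustrative.
```python
import numpy as np
from sklearn.linear_model import SGDRegressor

rng = np.random.default_rng(5)
n, p = 400, 80
X = rng.standard_normal((n, p))
beta = np.zeros(p); beta[:3] = 2.0
y = X @ beta + rng.standard_t(df=2.5, size=n)      # heavy-tailed noise

fit = SGDRegressor(loss="huber", epsilon=1.35,     # Huber loss for robustness
                   penalty="l1", alpha=0.01,       # l1 for sparsity
                   max_iter=5000, tol=1e-6,
                   random_state=0).fit(X, y)
print("largest coefficients at indices:", np.argsort(-np.abs(fit.coef_))[:5])
```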
  7. By: Ranieri Dugo (DEF, University of Rome "Tor Vergata"); Giacomo Giorgio (Dept of Mathematics, University of Rome "Tor Vergata"); Paolo Pigato (DEF, University of Rome "Tor Vergata")
    Abstract: Motivated by empirical evidence from the joint behavior of realized volatility time series, we propose to model the joint dynamics of log-volatilities using a multivariate fractional Ornstein-Uhlenbeck process. This model is a multivariate version of the Rough Fractional Stochastic Volatility model proposed in Gatheral, Jaisson, and Rosenbaum, Quant. Finance, 2018. It allows for different Hurst exponents in the different marginal components and non-trivial interdependencies. We discuss the main features of the model and propose an estimator that jointly identifies its parameters. We derive the asymptotic theory of the estimator and perform a simulation study that confirms the asymptotic theory in finite samples. We carry out an extensive empirical investigation of all realized volatility time series covering the entire span of about two decades in the Oxford-Man realized library. Our analysis shows that these time series are strongly correlated and can exhibit asymmetries in their cross-covariance structure, accurately captured by our model. These asymmetries lead to spillover effects that we analyse first theoretically within the model and then using our empirical estimates. Moreover, in accordance with the existing literature, we observe behaviors close to non-stationarity and rough trajectories.
    Keywords: stochastic volatility, rough volatility, realized volatility, multivariate time series, volatility spillovers, mean reversion.
    JEL: C32 C51 C58 G17
    Date: 2024–12–20
    URL: https://d.repec.org/n?u=RePEc:rtv:ceisrp:589
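    Illustrative sketch: the second-moment scaling regression commonly used in the rough-volatility literature to gauge a Hurst exponent H from a log-volatility series, via E|x_{t+tau} - x_t|^2 ~ c tau^{2H}. The paper estimates a full multivariate fractional Ornstein-Uhlenbeck model jointly; this univariate moment regression is only the familiar first step, run here on Brownian input (for which the fitted slope should be near 2H = 1) because a true fractional Brownian simulator is beyond this sketch.
```python
import numpy as np

rng = np.random.default_rng(6)
# Placeholder "log-volatility" path: a scaled Brownian motion (H = 0.5).
x = np.cumsum(rng.standard_normal(5000)) * 0.02

lags = np.array([1, 2, 4, 8, 16, 32])
m2 = [np.mean((x[l:] - x[:-l]) ** 2) for l in lags]    # second moments by lag
slope = np.polyfit(np.log(lags), np.log(m2), 1)[0]     # log-log regression slope = 2H
print(f"estimated 2H = {slope:.2f} (H = {slope / 2:.2f})")
```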
  8. By: Hao Hao (Global Data Insight & Analytics, Ford Motor Company, Michigan); Tae-Hwy Lee (Department of Economics, University of California Riverside)
    Abstract: When the endogenous variable is an unknown function of observable instruments, its conditional mean can be approximated using sieve functions of the observable instruments. We propose a novel instrument selection method, Double-criteria Boosting (DB), that consistently selects only valid and relevant instruments from a large set of candidate instruments. In Monte Carlo simulations, we compare GMM using DB (DB-GMM) with other estimation methods and demonstrate that DB-GMM gives lower bias and RMSE. In the empirical application to automobile demand, the DB-GMM estimator suggests a more elastic estimate of the price elasticity of demand than the standard 2SLS estimator.
    Keywords: Causal inference with high dimensional instruments; Irrelevant instruments; Invalid instruments; Instrument Selection; Machine Learning; Boosting.
    JEL: C1 C5
    Date: 2024–12
    URL: https://d.repec.org/n?u=RePEc:ucr:wpaper:202411
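    Illustrative sketch: generic L2-boosting of the first stage of IV estimation, greedily adding the instrument most correlated with the current residual. The paper's Double-criteria Boosting additionally screens instruments for validity (orthogonality to the structural error), which this sketch deliberately omits; the learning rate and iteration count are illustrative.
```python
import numpy as np

rng = np.random.default_rng(7)
n, m = 500, 30
Z = rng.standard_normal((n, m))                          # candidate instruments
d = Z[:, 0] + 0.8 * Z[:, 1] + rng.standard_normal(n)     # endogenous regressor

resid, selected, nu = d - d.mean(), [], 0.5
for _ in range(20):                                      # boosting iterations
    corr = Z.T @ resid
    j = int(np.argmax(np.abs(corr)))                     # most relevant instrument
    b = corr[j] / (Z[:, j] @ Z[:, j])
    resid = resid - nu * b * Z[:, j]                     # shrunken residual update
    selected.append(j)

print("instruments picked (with repeats removed):", sorted(set(selected)))
```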
  9. By: Jianqing Fan; Donggyu Kim (Department of Economics, University of California Riverside); Minseok Shin
    Abstract: Several novel statistical methods have been developed to estimate large integrated volatility matrices based on high-frequency financial data. To investigate their asymptotic behaviors, they require a sub-Gaussian or finite high-order moment assumption for observed log-returns, which cannot account for the heavy-tail phenomenon of stock returns. Recently, a robust estimator was developed to handle heavy-tailed distributions under a bounded fourth-moment assumption. However, we often observe that log-returns have tails heavier than a finite fourth moment allows and that the degrees of tail heaviness are heterogeneous across assets and over time. In this paper, to deal with heterogeneous heavy-tailed distributions, we develop an adaptive robust integrated volatility estimator that employs pre-averaging and truncation schemes based on jump-diffusion processes. We call this the adaptive robust pre-averaging realized volatility (ARP) estimator. We show that the ARP estimator has sub-Weibull tail concentration with only finite 2α-th moments for any α > 1. In addition, we establish matching upper and lower bounds to show that the ARP estimation procedure is optimal. To estimate large integrated volatility matrices using the approximate factor model, the ARP estimator is further regularized using the principal orthogonal complement thresholding (POET) method. A numerical study is conducted to check the finite-sample performance of the ARP estimator.
    Date: 2024–12
    URL: https://d.repec.org/n?u=RePEc:ucr:wpaper:202419
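    Illustrative sketch: the two ingredients named in the abstract, pre-averaging of noisy high-frequency returns and truncation of large increments. The window length, weights, and truncation level below are illustrative choices, not the paper's adaptive tail-dependent tuning, and the noise-bias correction and exact pre-averaging constants are omitted.
```python
import numpy as np

rng = np.random.default_rng(8)
N = 23400                                        # one trading day of 1-second prices
sigma = 0.2 / np.sqrt(252)                       # daily vol of the efficient price
logp = np.cumsum(sigma / np.sqrt(N) * rng.standard_normal(N))
obs = logp + 5e-5 * rng.standard_normal(N)       # microstructure noise added

K = int(np.sqrt(N))                              # pre-averaging window ~ N^(1/2)
k = np.arange(1, K) / K
w = np.minimum(k, 1 - k)                         # triangular weights
r = np.diff(obs)
bar = np.convolve(r, w[::-1], mode="valid")      # pre-averaged returns

u = 4 * np.std(bar)                              # truncation level (illustrative)
kept = bar[np.abs(bar) <= u]                     # discard large (jump-like) increments
rv = np.sum(kept ** 2) / np.sum(w ** 2)          # crude integrated-variance proxy
print(f"pre-averaged truncated RV: {rv:.6f}, true IV: {sigma**2:.6f}")
```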
  10. By: Gabriele Mingoli (Vrije Universiteit Amsterdam and Tinbergen Institute)
    Abstract: This paper introduces a novel dynamic factor model designed to capture common locally explosive episodes, also known as common bubbles, within large-dimensional, potentially non-stationary time series. The model leverages a lower-dimensional set of factors exhibiting locally explosive behavior to identify common extreme events. Modeling these explosive behaviors makes it possible to predict systemic risk and test for the emergence of common bubbles. The dynamics of the explosive factors are modeled using mixed causal non-causal models, a class of heavy-tailed autoregressive models that allow processes to depend on their future values through a lead polynomial. The paper establishes the asymptotic properties of the model and provides sufficient conditions for consistency of the estimated factors and parameters. A Monte Carlo simulation confirms the good finite-sample properties of the estimator, while an empirical analysis highlights its practical effectiveness. Specifically, the model accurately identifies the common explosive component in the monthly stock prices of NASDAQ-listed energy companies during the 2008 financial crisis and predicts its evolution, significantly outperforming alternative forecasting methods.
    JEL: C22 C38 C53
    Date: 2024–11–29
    URL: https://d.repec.org/n?u=RePEc:tin:wpaper:20240072
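    Illustrative sketch: a purely noncausal AR(1), x_t = psi * x_{t+1} + e_t, with heavy-tailed (Cauchy) errors, simulated backward in time. Such processes display the locally explosive, bubble-like episodes on which the paper's factors are built; the full MAR(r, s) dynamic factor model and its estimator are not reproduced here.
```python
import numpy as np

rng = np.random.default_rng(9)
T, psi = 500, 0.8
e = rng.standard_cauchy(T)                 # heavy-tailed innovations
x = np.zeros(T)
for t in range(T - 2, -1, -1):             # backward recursion (dependence on the future)
    x[t] = psi * x[t + 1] + e[t]

# Bubble episodes appear as runs of rapid growth followed by a crash.
print("largest |x|:", np.max(np.abs(x)).round(1),
      "at t =", int(np.argmax(np.abs(x))))
```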
  11. By: Jianghao Chu (JPMorgan Chase & Co., Jersey City, NJ 07310); Tae-Hwy Lee (Department of Economics, University of California Riverside); Aman Ullah (Department of Economics, University of California, Riverside)
    Abstract: Carter Hill's numerous contributions (books and articles) in econometrics stand out especially in pedagogy. An important aspect of his pedagogy is to integrate "theory and practice" of econometrics, as coined into the titles of his popular books. The new methodology we propose in this paper is consistent with these contributions of Carter Hill. In particular, we bring the maximum score regression of Manski (1975, 1985) to high dimension in theory and show that the "Asymmetric AdaBoost" provides the algorithmic implementation of high-dimensional maximum score regression in practice. Recent advances in machine learning research have not only expanded the horizon of econometrics by providing new methods but also provided the algorithmic aspects of many traditional econometric methods. For example, Adaptive Boosting (AdaBoost), introduced by Freund and Schapire (1996), has gained enormous success in binary/discrete classification/prediction. In this paper, we introduce the "Asymmetric AdaBoost" and relate it to maximum score regression from the algorithmic perspective. The Asymmetric AdaBoost solves high-dimensional binary classification/prediction problems with state-dependent loss functions. Asymmetric AdaBoost produces a nonparametric classifier by minimizing the "asymmetric exponential risk", which is a convex surrogate of the non-convex 0-1 risk. The convex risk function gives a huge computational advantage over the non-convex risk functions of Manski (1975, 1985), especially when the data is high-dimensional. The resulting nonparametric classifier is more robust than parametric classifiers, whose performance depends on the correct specification of the model. We show that the risk of the classifier that Asymmetric AdaBoost produces approaches the Bayes risk, which is the infimum of the risk achievable by any classifier. Monte Carlo experiments show that the Asymmetric AdaBoost performs better than the commonly used LASSO-regularized logistic regression when the parametric assumption is violated and the sample size is large. We apply the Asymmetric AdaBoost to predict business cycle turning points as in Ng (2014).
    Keywords: AdaBoost, Asymmetric Loss, Maximum Score Estimation, Binary Choice Models, High Dimensional Predictors
    JEL: C1 C5
    Date: 2024–12
    URL: https://d.repec.org/n?u=RePEc:ucr:wpaper:202414
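    Illustrative sketch: a cost-sensitive (asymmetric) AdaBoost with decision stumps, in which the two classes receive unequal initial weights so that the exponential risk is tilted in the way an asymmetric loss would tilt it. The paper's construction and its link to maximum score estimation involve more than this simple reweighting; the costs and iteration count are illustrative.
```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(10)
n, p = 1000, 10
X = rng.standard_normal((n, p))
y = np.where(X[:, 0] + 0.5 * X[:, 1] + rng.standard_normal(n) > 0, 1, -1)

c_pos, c_neg, M = 2.0, 1.0, 50                   # asymmetric misclassification costs
w = np.where(y == 1, c_pos, c_neg)
w = w / w.sum()
stumps, alphas = [], []
for _ in range(M):
    stump = DecisionTreeClassifier(max_depth=1).fit(X, y, sample_weight=w)
    pred = stump.predict(X)
    err = w[pred != y].sum()                     # weighted error of this stump
    alpha = 0.5 * np.log((1 - err) / max(err, 1e-12))
    w = w * np.exp(-alpha * y * pred)            # exponential-risk weight update
    w = w / w.sum()
    stumps.append(stump)
    alphas.append(alpha)

score = sum(a * s.predict(X) for a, s in zip(alphas, stumps))
print("in-sample accuracy:", np.mean(np.sign(score) == y).round(3))
```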
  12. By: Magirr, Dominic; Wang, Craig (Novartis); Przybylski, Alexander; Baillie, Mark
    Abstract: Covariate-adjusted estimators of average treatment effects in clinical trials are typically more efficient than unadjusted estimators. Recent guidance from the FDA is highly detailed regarding the appropriate use of covariate adjustment for point estimation. Less direction is provided, however, on how to estimate the variance of such estimators. In this paper, we demonstrate that a precise description of the estimand is necessary to avoid ambiguity when comparing variance estimators for average treatment effects involving binary endpoints. When considering the suitability of a proposed estimand, together with a corresponding variance estimator, it is important to consider that the patients enrolled in clinical trials are typically a convenience sample. Since there is no unique way to map this process into formal statistical assumptions, it follows that a range of estimands, and therefore a range of variance estimators, may be acceptable. We aim to highlight through simulation results how the properties of proposed variance estimators differ, as well as the underlying reasons.
    Date: 2024–12–23
    URL: https://d.repec.org/n?u=RePEc:osf:osfxxx:k56v8
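    Illustrative sketch: a covariate-adjusted (standardization, or g-computation) estimator of a risk difference in a two-arm trial with a binary endpoint, paired with a nonparametric bootstrap as one candidate variance estimator among the several whose estimands the paper argues must be pinned down precisely. The data-generating values are arbitrary.
```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(11)
n = 600
x = rng.standard_normal(n)                       # baseline covariate
a = rng.integers(0, 2, size=n)                   # 1:1 randomization
pr = 1 / (1 + np.exp(-(-0.5 + 1.0 * a + 0.8 * x)))
y = rng.binomial(1, pr)                          # binary endpoint

def adjusted_rd(x, a, y):
    # Fit a logistic working model, then standardize over the covariate distribution.
    m = LogisticRegression().fit(np.column_stack([a, x]), y)
    p1 = m.predict_proba(np.column_stack([np.ones_like(x), x]))[:, 1]
    p0 = m.predict_proba(np.column_stack([np.zeros_like(x), x]))[:, 1]
    return float(np.mean(p1 - p0))               # standardized risk difference

est = adjusted_rd(x, a, y)
boot = [adjusted_rd(x[i], a[i], y[i])
        for i in (rng.integers(0, n, size=n) for _ in range(200))]
print(f"RD = {est:.3f}, bootstrap SE = {np.std(boot):.3f}")
```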
  13. By: Donggyu Kim (Department of Economics, University of California Riverside); Minseog Oh; Yazhen Wang
    Abstract: In this paper, we develop a robust non-parametric realized integrated beta estimator using high-frequency financial data contaminated by microstructure noise, which is robust to stylized features such as time-varying beta and price-dependent, autocorrelated microstructure noise. With this robust realized integrated beta estimator, we investigate the dynamic structure of integrated betas and find a persistent autoregressive structure. To model this dynamic structure, we utilize an autoregressive moving-average (ARMA) model for daily integrated market betas. We call this the dynamic realized beta (DR Beta). We then propose a quasi-likelihood procedure for estimating the parameters of the ARMA model with the robust realized integrated beta estimator as the proxy. We establish asymptotic theorems for the proposed estimator and conduct a simulation study to check the finite-sample performance of the estimator. The proposed DR Beta model with the robust realized beta estimator is also illustrated using data from E-mini S&P 500 index futures and the 50 stocks with the largest trading volume in the S&P 500, and an application to constructing market-neutral portfolios.
    Date: 2024–12
    URL: https://d.repec.org/n?u=RePEc:ucr:wpaper:202422
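    Illustrative sketch: the two-layer idea of computing a daily realized beta from intraday returns and then fitting an ARMA model to the daily beta series. The paper's estimator is robust to microstructure noise and within-day beta variation; this toy uses clean synthetic returns, a naive regression beta, and an ARMA(1, 1) with illustrative parameters.
```python
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(12)
T, N = 250, 390                                      # days, 1-minute returns per day
beta_true = 1.0 + 0.3 * np.sin(np.arange(T) / 20)    # slowly varying daily beta
betas = np.empty(T)
for t in range(T):
    rm = 1e-3 * rng.standard_normal(N)               # market returns
    ri = beta_true[t] * rm + 5e-4 * rng.standard_normal(N)
    betas[t] = np.sum(ri * rm) / np.sum(rm ** 2)     # realized beta for day t

fit = ARIMA(betas, order=(1, 0, 1)).fit()            # ARMA(1, 1) on daily betas
print(fit.params.round(3))
```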
  14. By: Koutchade, Obafémi Philippe; Carpentier, Alain; Féménia, Fabienne
    Abstract: In this paper, we propose an approach to allocate input uses among crops produced by farmers, based on panel data that include input uses aggregated at the farm level. Our approach simultaneously allows for (i) controlling for observed and unobserved farm heterogeneity, (ii) accounting for the potential dependence of input uses on acreage decisions, and (iii) ensuring consistent values of input use estimates. These are significant issues commonly faced in the estimation of input allocation equations. The approach is based on a model of input allocation derived from accounting identities, in which unobserved input uses per crop are treated as time-varying random parameters. We estimate our model on a sample of French farms' accounting data, relying on an extension of the Stochastic Approximation of Expectation Maximization algorithm. Our results show that our approach performs well in accurately allocating input uses among crops, particularly for the crops most frequently produced in our data sample.
    Keywords: Crop Production/Industries, Production Economics, Research Methods/Statistical Methods
    Date: 2024–12–12
    URL: https://d.repec.org/n?u=RePEc:ags:inrasl:348476
  15. By: Donggyu Kim (Department of Economics, University of California Riverside); Minseog Oh
    Abstract: In financial applications, we often observe both global and local factors that are modeled by a multi-level factor model. When detecting unknown local group memberships under such a model, employing a covariance matrix as an adjacency matrix for local group memberships is inadequate due to the predominant effect of global factors. Thus, to detect a local group structure more effectively, this study introduces an inverse covariance matrix-based financial adjacency matrix (IFAM) that utilizes negative values of the inverse covariance matrix. We show that IFAM ensures that the edge density between different groups vanishes, while that within the same group remains non-vanishing. This reduces falsely detected connections and helps identify local group membership accurately. To estimate IFAM under the multi-level factor model, we introduce a factor-adjusted GLASSO estimator to address the prevalent global factor effect in the inverse covariance matrix. An empirical study using returns from international stocks across 20 financial markets demonstrates that incorporating IFAM effectively detects latent local groups, which helps improve the minimum variance portfolio allocation performance.
    Date: 2024–12
    URL: https://d.repec.org/n?u=RePEc:ucr:wpaper:202420
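    Illustrative sketch: the IFAM construction described in the abstract, i.e. remove the leading (global) factor by PCA, run a graphical lasso on the residuals, and read local-group links off the negative entries of the estimated inverse covariance matrix. Group sizes, the regularization level, and the entry threshold are illustrative choices.
```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.covariance import GraphicalLasso

rng = np.random.default_rng(13)
n, p = 500, 12
g = rng.standard_normal(n)                           # one global factor
block = np.repeat([0, 1, 2], 4)                      # three local groups of 4 assets
f_loc = rng.standard_normal((n, 3))                  # local factors
X = (np.outer(g, np.ones(p)) + 0.8 * f_loc[:, block]
     + 0.6 * rng.standard_normal((n, p)))

# Factor adjustment: strip the leading principal component (global factor).
pca = PCA(n_components=1).fit(X)
U = X - pca.inverse_transform(pca.transform(X))

gl = GraphicalLasso(alpha=0.05).fit(U)
Theta = gl.precision_
A = (Theta < -1e-3).astype(int)                      # adjacency from negative entries
np.fill_diagonal(A, 0)
print(A)                                             # block pattern = local groups
```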
  16. By: Stephen J. Redding (Princeton University, NBER and CEPR)
    Abstract: This paper reviews recent quantitative urban models. These models are sufficiently rich to capture observed features of the data, such as many asymmetric locations and a rich geography of the transport network. Yet they remain sufficiently tractable to permit an analytical characterization of their theoretical properties. With only a small number of structural parameters (elasticities) to be estimated, they lend themselves to transparent identification. As they rationalize the observed spatial distribution of economic activity within cities, they can be used to undertake counterfactuals for the impact of empirically realistic public-policy interventions on this observed distribution. Empirical applications include estimating the strength of agglomeration economies and evaluating the impact of transport infrastructure improvements (e.g., railroads, roads, bus rapid transit systems), zoning and land use regulations, place-based policies, and new technologies such as remote working.
    Keywords: cities, commuting, transportation, urban economics
    JEL: R32 R41 R52
    Date: 2024–11
    URL: https://d.repec.org/n?u=RePEc:pri:cepsud:340
  17. By: Sebastian Bell; Ali Kakhbod; Martin Lettau; Abdolreza Nazemi
    Abstract: Machine learning methods in asset pricing are often criticized for their black box nature. We study this issue by predicting corporate bond returns using interpretable machine learning on a high-dimensional bond characteristics dataset. We achieve state-of-the-art performance while maintaining an interpretable model structure, overcoming the accuracy-interpretability trade-off. The estimation uncovers nonlinear relationships and economically meaningful interactions in bond pricing, notably related to term structure and macroeconomic uncertainty. Subsample analysis reveals stronger sensitivities to these effects for small firms and long-maturity bonds. Finally, we demonstrate how interpretable models enhance transparency in portfolio construction by providing ex ante insights into portfolio composition.
    JEL: C45 C55 G11 G12
    Date: 2024–12
    URL: https://d.repec.org/n?u=RePEc:nbr:nberwo:33320

This nep-ecm issue is ©2025 by Sune Karlsson. It is provided as is without any express or implied warranty. It may be freely redistributed in whole or in part for any purpose. If distributed in part, please include this notice.
General information on the NEP project can be found at https://nep.repec.org. For comments please write to the director of NEP, Marco Novarese at <director@nep.repec.org>. Put “NEP” in the subject, otherwise your mail may be rejected.
NEP’s infrastructure is sponsored by the School of Economics and Finance of Massey University in New Zealand.