nep-ecm New Economics Papers
on Econometrics
Issue of 2025–03–17
twenty papers chosen by
Sune Karlsson, Örebro universitet


  1. Minimum Distance Estimation of Quantile Panel Data Models By Blaise Melly; Martina Pons
  2. Bayesian inference for dynamic spatial quantile models with interactive effects By Tomohiro Ando; Jushan Bai; Kunpeng Li; Yong Song
  3. Difference-in-Differences and Changes-in-Changes with Sample Selection By Javier Viviens
  4. Comment on "Generic machine learning inference on heterogeneous treatment effects in randomized experiments." By Kosuke Imai; Michael Lingzhi Li
  5. Double Robust, Flexible Adjustment Methods for Causal Inference: An Overview and an Evaluation By Hoffmann, Nathan Isaac
  6. Grouped fixed effects regularization for binary choice models By Claudia Pigini; Alessandro Pionati; Francesco Valentini
  7. Using quantile time series and historical simulation to forecast financial risk multiple steps ahead By Richard Gerlach; Antonio Naimoli; Giuseppe Storti
  8. A sliced Wasserstein and diffusion approach to random coefficient models By Keunwoo Lim; Ting Ye; Fang Han
  9. Vector Copula Variational Inference and Dependent Block Posterior Approximations By Yu Fu; Michael Stanley Smith; Anastasios Panagiotelis
  10. The Uncertainty of Machine Learning Predictions in Asset Pricing By Yuan Liao; Xinjie Ma; Andreas Neuhierl; Linda Schilling
  11. Structural breaks detection and variable selection in dynamic linear regression via the Iterative Fused LASSO in high dimension By Angelo Milfont; Alvaro Veiga
  12. Estimating Parameters of Structural Models Using Neural Networks By Yanhao (Max) Wei; Zhenling Jiang
  13. White Noise and Its Misapplications: Impacts on Time Series Model Adequacy and Forecasting By Hossein Hassani; Leila Marvian Mashhad; Manuela Royer-Carenzi; Mohammad Reza Yeganegi; Nadejda Komendantova
  14. Scenario Analysis with Multivariate Bayesian Machine Learning Models By Michael Pfarrhofer; Anna Stelzer
  15. Uniform Limit Theory for Network Data By Yuya Sasaki
  16. Poverty Mapping in the Age of Machine Learning By Corral Rodas, Paul Andres; Henderson, Heath Linn; Segovia Juarez, Sandra Carolina
  17. Forecasting realized volatility in the stock market: a path-dependent perspective By Xiangdong Liu; Sicheng Fu; Shaopeng Hong
  18. Dynamic Factor Correlation Model By Chen Tong; Peter Reinhard Hansen
  19. Event history analysis with two time scales. An application to transitions out of cohabitation By Carollo, Angela; Putter, Hein; Eilers, Paul H. C.; Gampe, Jutta
  20. FactorGCL: A Hypergraph-Based Factor Model with Temporal Residual Contrastive Learning for Stock Returns Prediction By Yitong Duan; Weiran Wang; Jian Li

  1. By: Blaise Melly; Martina Pons
    Abstract: We propose a minimum distance estimation approach for quantile panel data models where unit effects may be correlated with covariates. This computationally efficient method involves two stages: first, computing quantile regression within each unit, then applying GMM to the first-stage fitted values. Our estimators apply to (i) classical panel data, tracking units over time, and (ii) grouped data, where individual-level data are available, but treatment varies at the group level. Depending on the exogeneity assumptions, this approach provides quantile analogs of classic panel data estimators, including fixed effects, random effects, between, and Hausman-Taylor estimators. In addition, our method offers improved precision for grouped (instrumental) quantile regression compared to existing estimators. We establish asymptotic properties as the number of units and observations per unit jointly diverge to infinity. Additionally, we introduce an inference procedure that automatically adapts to the potentially unknown convergence rate of the estimator. Monte Carlo simulations demonstrate that our estimator and inference procedure perform well in finite samples, even when the number of observations per unit is moderate. In an empirical application, we examine the impact of the food stamp program on birth weights. We find that the program's introduction increased birth weights predominantly at the lower end of the distribution, highlighting the ability of our method to capture heterogeneous effects across the outcome distribution.
    Date: 2025–02
    URL: https://d.repec.org/n?u=RePEc:arx:papers:2502.18242
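The two-stage idea in this paper can be sketched in its simplest grouped-data form: first-stage quantiles computed within each group, then an identity-weighted minimum-distance step (here plain OLS of the group quantiles on group-level covariates). This is an illustrative simplification under strong assumptions (intercept-only first stage, exogenous group covariates), not the authors' estimator; all names are hypothetical.

```python
import numpy as np

def grouped_quantile_effect(y, g, x_group, tau=0.5):
    """Two-stage sketch: (1) within-group tau-quantile of y, (2) OLS of those
    quantiles on group-level covariates (identity-weighted minimum distance)."""
    groups = np.unique(g)
    q = np.array([np.quantile(y[g == j], tau) for j in groups])
    X = np.column_stack([np.ones_like(groups, dtype=float), x_group[groups]])
    beta, *_ = np.linalg.lstsq(X, q, rcond=None)
    return beta

# simulated example: the tau-th quantile of y is 1 + 2 * x_g for every tau
rng = np.random.default_rng(0)
G, n = 10, 500
g = np.repeat(np.arange(G), n)
x_group = np.arange(G, dtype=float)
y = 1.0 + 2.0 * x_group[g] + rng.normal(0, 1, size=G * n)
beta = grouped_quantile_effect(y, g, x_group, tau=0.5)
```

With symmetric noise the median-regression slope recovers the coefficient 2 on the group-level covariate.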
  2. By: Tomohiro Ando; Jushan Bai; Kunpeng Li; Yong Song
    Abstract: With the rapid advancement of information technology and data collection systems, large-scale spatial panel data presents new methodological and computational challenges. This paper introduces a dynamic spatial panel quantile model that incorporates unobserved heterogeneity. The proposed model captures the dynamic structure of panel data, high-dimensional cross-sectional dependence, and allows for heterogeneous regression coefficients. To estimate the model, we propose a novel Bayesian Markov Chain Monte Carlo (MCMC) algorithm. Contributions to Bayesian computation include the development of quantile randomization, a new Gibbs sampler for structural parameters, and stabilization of the tail behavior of the inverse Gaussian random generator. We establish Bayesian consistency for the proposed estimation method as both the time and cross-sectional dimensions of the panel approach infinity. Monte Carlo simulations demonstrate the effectiveness of the method. Finally, we illustrate the applicability of the approach through a case study on the quantile co-movement structure of the gasoline market.
    Date: 2025–03
    URL: https://d.repec.org/n?u=RePEc:arx:papers:2503.00772
  3. By: Javier Viviens
    Abstract: Sample selection arises endogenously in causal research when the treatment affects whether certain units are observed. It is a common pitfall in longitudinal studies, particularly in settings where treatment assignment is confounded. In this paper, I highlight the drawbacks of one of the most popular identification strategies in such settings: Difference-in-Differences (DiD). Specifically, I employ principal stratification analysis to show that the conventional ATT estimand may not be well defined, and the DiD estimand cannot be interpreted causally without additional assumptions. To address these issues, I develop an identification strategy to partially identify causal effects on the subset of units with well-defined and observed outcomes under both treatment regimes. I adapt Lee bounds to the Changes-in-Changes (CiC) setting (Athey & Imbens, 2006), leveraging the time dimension of the data to relax the unconfoundedness assumption in the original trimming strategy of Lee (2009). This setting has the DiD identification strategy as a particular case, which I also implement in the paper. Additionally, I explore how to leverage multiple sources of sample selection to relax the monotonicity assumption in Lee (2009), which may be of independent interest. Alongside the identification strategy, I present estimators and inference results. I illustrate the relevance of the proposed methodology by analyzing a job training program in Colombia.
    Date: 2025–02
    URL: https://d.repec.org/n?u=RePEc:arx:papers:2502.08614
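The starting point the paper adapts — the original Lee (2009) trimming strategy — can be sketched as follows: trim the excess share of selected treated units from the top (lower bound) or bottom (upper bound) of the observed treated outcome distribution. This is the classic bound, not the paper's CiC extension, and the simulated data are purely illustrative.

```python
import numpy as np

def lee_bounds(y1_obs, y0_obs, s1, s0):
    """Lee (2009) trimming bounds on the average effect for always-observed
    units, given observed outcomes and selection rates s1 > s0."""
    p = (s1 - s0) / s1                 # excess selection share to trim
    y1 = np.sort(y1_obs)
    k = int(np.floor(p * len(y1)))
    lb = y1[:len(y1) - k].mean() - y0_obs.mean()   # trim largest treated outcomes
    ub = y1[k:].mean() - y0_obs.mean()             # trim smallest treated outcomes
    return lb, ub

rng = np.random.default_rng(1)
y1_obs = rng.normal(1.0, 1.0, 600)     # observed treated outcomes
y0_obs = rng.normal(0.0, 1.0, 400)     # observed control outcomes
lb, ub = lee_bounds(y1_obs, y0_obs, s1=0.6, s0=0.4)
```

The bounds bracket the treated-control contrast; they collapse to a point as the selection rates converge.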
  4. By: Kosuke Imai; Michael Lingzhi Li
    Abstract: We analyze the split-sample robust inference (SSRI) methodology proposed by Chernozhukov, Demirer, Duflo, and Fernandez-Val (CDDF) for quantifying uncertainty in heterogeneous treatment effect estimation. While SSRI effectively accounts for randomness in data splitting, its computational cost can be prohibitive when combined with complex machine learning (ML) models. We present an alternative randomization inference (RI) approach that maintains SSRI's generality without requiring repeated data splitting. By leveraging cross-fitting and design-based inference, RI achieves valid confidence intervals while significantly reducing computational burden. We compare the two methods through simulation, demonstrating that RI retains statistical efficiency while being more practical for large-scale applications.
    Date: 2025–02
    URL: https://d.repec.org/n?u=RePEc:arx:papers:2502.06758
  5. By: Hoffmann, Nathan Isaac
    Abstract: Double robust methods for flexible covariate adjustment in causal inference have proliferated in recent years. Despite their apparent advantages, these methods are rarely used by social scientists. It is also unclear whether these methods actually outperform more traditional methods in finite samples. This paper has two aims: It is a guide to some of the latest methods in double robust, flexible covariate adjustment using machine learning, and it compares these methods to more traditional statistical methods and flexible "single robust" methods. It does so using both simulated data, where the true treatment effect is known, and replications of prominent articles in sociology that use simpler methods. Double robust methods covered include Augmented Inverse Probability Weighting (AIPW), Targeted Maximum Likelihood Estimation (TMLE), and Double/Debiased Machine Learning (DML). Results suggest that some of these methods do outperform traditional methods in a wide range of simulations, but only slightly. In particular, the top performers are TMLE and AIPW in conjunction with flexible machine learning estimators. But G-computation with the same flexible estimators obtains almost identical results, and simple regression methods have only slightly higher bias and are much more computationally efficient. In the article replications, the application of double robust methods substantively changes some, but not all, of the results, highlighting the importance of comparing performance from multiple estimators.
    Date: 2023–08–29
    URL: https://d.repec.org/n?u=RePEc:osf:socarx:dzayg_v1
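The AIPW estimator discussed above has a simple closed form: an outcome-regression contrast plus inverse-probability-weighted residual corrections. A minimal sketch for a randomized design with a known propensity score and linear outcome models (the paper's versions use flexible machine-learning nuisance estimators instead):

```python
import numpy as np

def aipw(y, t, X, e):
    """AIPW point estimate: regression contrast m1 - m0 plus IPW residual
    corrections; linear outcome models, known propensity e."""
    Z = np.column_stack([np.ones(len(y)), X])
    b1, *_ = np.linalg.lstsq(Z[t == 1], y[t == 1], rcond=None)
    b0, *_ = np.linalg.lstsq(Z[t == 0], y[t == 0], rcond=None)
    m1, m0 = Z @ b1, Z @ b0
    return np.mean(m1 - m0 + t * (y - m1) / e - (1 - t) * (y - m0) / (1 - e))

rng = np.random.default_rng(2)
n = 4000
X = rng.normal(size=(n, 2))
t = rng.binomial(1, 0.5, n)                       # randomized treatment, e = 0.5
y = 2.0 * t + X @ np.array([1.0, -1.0]) + rng.normal(0, 1, n)
tau_hat = aipw(y, t, X, e=0.5)
```

Double robustness means the estimate stays consistent if either the outcome models or the propensity model is correct.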
  6. By: Claudia Pigini; Alessandro Pionati; Francesco Valentini
    Abstract: We study the application of the Grouped Fixed Effects (GFE) estimator (Bonhomme et al., ECMTA 90(2):625-643, 2022) to binary choice models for network and panel data. This approach discretizes unobserved heterogeneity via k-means clustering and performs maximum likelihood estimation, reducing the number of fixed effects in finite samples. This regularization helps analyze small/sparse networks and rare events by mitigating complete separation, which can lead to data loss. We focus on dynamic models with few state transitions and network formation models for sparse networks. The effectiveness of this method is demonstrated through simulations and real data applications.
    Date: 2025–02
    URL: https://d.repec.org/n?u=RePEc:arx:papers:2502.06446
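The GFE recipe — discretize unobserved heterogeneity by k-means on a unit-level summary, then maximum likelihood with group dummies — can be sketched in a toy static panel. This is a simplified stand-in for the Bonhomme et al. estimator, not the paper's implementation; the one-dimensional k-means and pooled logit below are deliberately minimal.

```python
import numpy as np

def kmeans_1d(v, k, iters=50, seed=0):
    """Plain Lloyd's algorithm on a 1-d summary of each unit."""
    rng = np.random.default_rng(seed)
    centers = rng.choice(v, k, replace=False)
    for _ in range(iters):
        lab = np.argmin(np.abs(v[:, None] - centers[None, :]), axis=1)
        centers = np.array([v[lab == j].mean() if np.any(lab == j) else centers[j]
                            for j in range(k)])
    return lab

def logit_newton(X, y, steps=25):
    """Logistic MLE via Newton-Raphson."""
    b = np.zeros(X.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-X @ b))
        W = p * (1 - p)
        b += np.linalg.solve(X.T @ (W[:, None] * X), X.T @ (y - p))
    return b

rng = np.random.default_rng(3)
N, T, beta = 200, 20, 1.0
alpha = np.where(np.arange(N) < N // 2, -1.0, 1.0)        # two latent groups
x = rng.normal(size=(N, T))
y = (rng.uniform(size=(N, T)) < 1/(1 + np.exp(-(alpha[:, None] + beta*x)))).astype(float)
lab = kmeans_1d(y.mean(axis=1), k=2)                      # discretize heterogeneity
D = np.eye(2)[np.repeat(lab, T)]                          # group dummies
b_hat = logit_newton(np.column_stack([x.ravel(), D]), y.ravel())
```

Two group dummies replace N unit fixed effects, which is what mitigates complete separation in small or sparse samples.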
  7. By: Richard Gerlach; Antonio Naimoli; Giuseppe Storti
    Abstract: A method for quantile-based, semi-parametric historical simulation estimation of multiple step ahead Value-at-Risk (VaR) and Expected Shortfall (ES) models is developed. It uses the quantile loss function, analogous to how the quasi-likelihood is employed by standard historical simulation methods. The returns data are scaled by the estimated quantile series, then resampling is employed to estimate the forecast distribution one and multiple steps ahead, allowing tail risk forecasting. The proposed method is applicable to any data or model where the relationship between VaR and ES does not change over time and can be extended to allow a measurement equation incorporating realized measures, thus including Realized GARCH and Realized CAViaR type models. Its finite sample properties, and its comparison with existing historical simulation methods, are evaluated via a simulation study. A forecasting study assesses the relative accuracy of the 1% and 2.5% VaR and ES one-day-ahead and ten-day-ahead forecasting results for the proposed class of models compared to several competitors.
    Date: 2025–02
    URL: https://d.repec.org/n?u=RePEc:arx:papers:2502.20978
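The resampling step can be sketched in its filtered-historical-simulation form: standardize returns by a volatility estimate, bootstrap the standardized returns over the horizon, rescale, and read VaR and ES off the simulated multi-step distribution. The constant-volatility filter below is a placeholder assumption, not the paper's quantile-based scaling.

```python
import numpy as np

def hs_var_es(returns, sigma, horizon=10, alpha=0.01, B=20000, seed=0):
    """Filtered historical simulation sketch: bootstrap standardized returns
    over the horizon and compute VaR/ES of the cumulative return."""
    rng = np.random.default_rng(seed)
    z = returns / sigma                              # standardized returns
    draws = rng.choice(z, size=(B, horizon), replace=True)
    paths = (draws * sigma[-1]).sum(axis=1)          # flat-vol multi-step paths
    var = np.quantile(paths, alpha)
    es = paths[paths <= var].mean()                  # mean loss beyond VaR
    return var, es

rng = np.random.default_rng(4)
r = rng.normal(0, 0.01, 1500)                # illustrative daily returns
sig = np.full(1500, r.std())                 # constant-vol filter for brevity
var10, es10 = hs_var_es(r, sig, horizon=10, alpha=0.01)
```

ES is by construction at least as extreme as VaR at the same confidence level.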
  8. By: Keunwoo Lim; Ting Ye; Fang Han
    Abstract: We propose a new minimum-distance estimator for linear random coefficient models. This estimator integrates the recently advanced sliced Wasserstein distance with the nearest neighbor methods, both of which enhance computational efficiency. We demonstrate that the proposed method is consistent in approximating the true distribution. Additionally, our formulation encourages a diffusion process-based algorithm, which holds independent interest and potential for broader applications.
    Date: 2025–02
    URL: https://d.repec.org/n?u=RePEc:arx:papers:2502.04654
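The sliced Wasserstein distance at the core of this estimator is easy to compute for equal-size samples: average the squared one-dimensional Wasserstein distance (sorted-sample differences) over random projection directions. A minimal sketch, separate from the paper's minimum-distance and diffusion machinery:

```python
import numpy as np

def sliced_w2(X, Y, n_proj=500, seed=0):
    """Sliced 2-Wasserstein distance between equal-size samples: average the
    squared 1-d W2 over random unit directions, then take the square root."""
    rng = np.random.default_rng(seed)
    theta = rng.normal(size=(n_proj, X.shape[1]))
    theta /= np.linalg.norm(theta, axis=1, keepdims=True)   # unit directions
    xs = np.sort(X @ theta.T, axis=0)                       # sorted projections
    ys = np.sort(Y @ theta.T, axis=0)
    return np.sqrt(np.mean((xs - ys) ** 2))

rng = np.random.default_rng(8)
X = rng.normal(size=(400, 3))
Y = X + np.array([2.0, 0.0, 0.0])     # pure location shift of size 2
sw = sliced_w2(X, Y)
```

For a pure shift of size c in d dimensions the population value is c/sqrt(d), which the Monte Carlo average over projections approximates.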
  9. By: Yu Fu; Michael Stanley Smith; Anastasios Panagiotelis
    Abstract: Variational inference (VI) is a popular method to estimate statistical and econometric models. The key to VI is the selection of a tractable density to approximate the Bayesian posterior. For large and complex models a common choice is to assume independence between multivariate blocks in a partition of the parameter space. While this simplifies the problem it can reduce accuracy. This paper proposes using vector copulas to capture dependence between the blocks parsimoniously. Tailored multivariate marginals are constructed using learnable cyclically monotone transformations. We call the resulting joint distribution a "dependent block posterior" approximation. Vector copula models are suggested that make tractable and flexible variational approximations. They allow for differing marginals, numbers of blocks, block sizes and forms of between block dependence. They also allow for solution of the variational optimization using fast and efficient stochastic gradient methods. The efficacy and versatility of the approach are demonstrated using four different statistical models and 16 datasets which have posteriors that are challenging to approximate. In all cases, our method produces more accurate posterior approximations than benchmark VI methods that either assume block independence or factor-based dependence, at limited additional computational cost.
    Date: 2025–03
    URL: https://d.repec.org/n?u=RePEc:arx:papers:2503.01072
  10. By: Yuan Liao; Xinjie Ma; Andreas Neuhierl; Linda Schilling
    Abstract: Machine learning in asset pricing typically predicts expected returns as point estimates, ignoring uncertainty. We develop new methods to construct forecast confidence intervals for expected returns obtained from neural networks. We show that neural network forecasts of expected returns share the same asymptotic distribution as classic nonparametric methods, enabling a closed-form expression for their standard errors. We also propose a computationally feasible bootstrap to obtain the asymptotic distribution. We incorporate these forecast confidence intervals into an uncertainty-averse investment framework. This provides an economic rationale for shrinkage implementations of portfolio selection. Empirically, our methods improve out-of-sample performance.
    Date: 2025–03
    URL: https://d.repec.org/n?u=RePEc:arx:papers:2503.00549
  11. By: Angelo Milfont; Alvaro Veiga
    Abstract: We aim to develop a time series modeling methodology tailored to high-dimensional environments, addressing two critical challenges: variable selection from a large pool of candidates, and the detection of structural break points, where the model's parameters shift. This effort centers on formulating a least squares estimation problem with regularization constraints, drawing on techniques such as Fused LASSO and AdaLASSO, which are well-established in machine learning. Our primary achievement is the creation of an efficient algorithm capable of handling high-dimensional cases within practical time limits. By addressing these pivotal challenges, our methodology holds the potential for widespread adoption. To validate its effectiveness, we detail the iterative algorithm and benchmark its performance against the widely recognized Path Algorithm for Generalized Lasso. Comprehensive simulations and performance analyses highlight the algorithm's strengths. Additionally, we demonstrate the methodology's applicability and robustness through simulated case studies and a real-world example involving a stock portfolio dataset. These examples underscore the methodology's practical utility and potential impact across diverse high-dimensional settings.
    Date: 2025–02
    URL: https://d.repec.org/n?u=RePEc:arx:papers:2502.20816
  12. By: Yanhao (Max) Wei; Zhenling Jiang
    Abstract: We study an alternative use of machine learning. We train neural nets to provide the parameter estimate of a given (structural) econometric model, for example, discrete choice or consumer search. Training examples consist of datasets generated by the econometric model under a range of parameter values. The neural net takes the moments of a dataset as input and tries to recognize the parameter value underlying that dataset. Besides the point estimate, the neural net can also output statistical accuracy. This neural net estimator (NNE) tends to the limited-information Bayesian posterior as the number of training datasets increases. We apply NNE to a consumer search model. It gives more accurate estimates at lighter computational costs than the prevailing approach. NNE is also robust to redundant moment inputs. In general, NNE offers the most benefits in applications where other estimation approaches require very heavy simulation costs. We provide code at: https://nnehome.github.io.
    Date: 2025–02
    URL: https://d.repec.org/n?u=RePEc:arx:papers:2502.04945
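The training loop can be sketched end to end: simulate datasets under drawn parameter values, summarize each by its moments, and fit a map from moments back to parameters. For a runnable toy, a linear regression stands in for the neural net and a N(theta, 1) "model" stands in for the structural model — both are placeholder assumptions, not the paper's setup.

```python
import numpy as np

rng = np.random.default_rng(5)

def simulate_moments(theta, n=200):
    """One training example: a dataset from the (stand-in) structural model,
    summarized by its moments."""
    x = rng.normal(theta, 1.0, n)
    return np.array([x.mean(), x.var()])

# training set: parameter draws and the moments of the datasets they generate
thetas = rng.uniform(0.5, 3.0, 2000)
M = np.array([simulate_moments(t) for t in thetas])
Z = np.column_stack([np.ones(len(M)), M])
w, *_ = np.linalg.lstsq(Z, thetas, rcond=None)   # linear stand-in for the net

# estimate the parameter of a new dataset from its moments alone
m_new = simulate_moments(2.0)
theta_hat = np.array([1.0, *m_new]) @ w
```

The fitted map learns to ignore the redundant variance moment, mirroring the robustness-to-redundant-moments point in the abstract.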
  13. By: Hossein Hassani; Leila Marvian Mashhad; Manuela Royer-Carenzi (I2M - Institut de Mathématiques de Marseille - AMU - Aix Marseille Université - ECM - École Centrale de Marseille - CNRS - Centre National de la Recherche Scientifique); Mohammad Reza Yeganegi; Nadejda Komendantova
    Abstract: This paper contributes to time series analysis by discussing the empirical properties of white noise and their implications for model selection. It illustrates the ways in which the standard assumptions about white noise typically fail in practice, with a special emphasis on striking differences in the sample ACF and PACF. Such findings prove particularly important when assessing model adequacy and discerning between residuals of different models, especially ARMA processes. The study also addresses testing procedures, for instance the Ljung–Box test, used to select an adequate time series model. By improving our understanding of the features of white noise, this work enhances the accuracy of model diagnostics in real forecasting practice, giving it applied value in time series analysis and signal processing.
    Keywords: time series analysis, model selection, Hassani -1/2 theorem, white noise, ARMA, Gaussian, Ljung-Box test
    Date: 2025–02–05
    URL: https://d.repec.org/n?u=RePEc:hal:journl:hal-04937317
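The diagnostics discussed above are easy to reproduce: compute the sample ACF of simulated Gaussian white noise, compare it to the conventional +/-1.96/sqrt(n) band, and form the Ljung-Box Q statistic. A minimal sketch:

```python
import numpy as np

def sample_acf(x, nlags):
    """Sample autocorrelation function at lags 1..nlags."""
    x = x - x.mean()
    denom = np.dot(x, x)
    return np.array([np.dot(x[:-k], x[k:]) / denom for k in range(1, nlags + 1)])

def ljung_box(x, nlags):
    """Ljung-Box Q statistic; compare to a chi-squared with nlags df."""
    n = len(x)
    r = sample_acf(x, nlags)
    return n * (n + 2) * np.sum(r ** 2 / (n - np.arange(1, nlags + 1)))

rng = np.random.default_rng(6)
x = rng.normal(size=1000)                 # Gaussian white noise
r = sample_acf(x, 20)
band = 1.96 / np.sqrt(len(x))             # conventional white-noise band
Q = ljung_box(x, 20)
```

Even for true white noise, individual sample autocorrelations routinely stray toward (and occasionally beyond) the band, which is exactly the kind of finite-sample behavior the paper cautions about when judging residual adequacy.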
  14. By: Michael Pfarrhofer; Anna Stelzer
    Abstract: We present an econometric framework that adapts tools for scenario analysis, such as variants of conditional forecasts and impulse response functions, for use with dynamic nonparametric multivariate models. We demonstrate the utility of our approach with simulated data and three real-world applications: (1) scenario-based conditional forecasts aligned with Federal Reserve stress test assumptions, (2) measures of macroeconomic risk under varying financial conditions, and (3) asymmetric effects of US-based financial shocks and their international spillovers. Our results indicate the importance of nonlinearities and asymmetries in dynamic relationships between macroeconomic and financial variables.
    Date: 2025–02
    URL: https://d.repec.org/n?u=RePEc:arx:papers:2502.08440
  15. By: Yuya Sasaki
    Abstract: I present a novel uniform law of large numbers (ULLN) for network-dependent data. While Kojevnikov, Marmer, and Song (KMS, 2021) provide a comprehensive suite of limit theorems and a robust variance estimator for network-dependent processes, their analysis focuses on pointwise convergence. On the other hand, uniform convergence is essential for nonlinear estimators such as M and GMM estimators (e.g., Newey and McFadden, 1994, Section 2). Building on KMS, I establish the ULLN under network dependence and demonstrate its utility by proving the consistency of both M and GMM estimators. A byproduct of this work is a novel maximal inequality for network data, which may prove useful for future research beyond the scope of this paper.
    Date: 2025–02
    URL: https://d.repec.org/n?u=RePEc:arx:papers:2503.00290
  16. By: Corral Rodas, Paul Andres; Henderson, Heath Linn; Segovia Juarez, Sandra Carolina
    Abstract: Recent years have witnessed considerable methodological advances in poverty mapping, much of which has focused on the application of modern machine-learning approaches to remotely sensed data. Poverty maps produced with these methods generally share a common validation procedure, which assesses model performance by comparing subnational machine-learning-based poverty estimates with survey-based, direct estimates. Although unbiased, survey-based estimates at a granular level can be imprecise measures of true poverty rates, meaning that it is unclear whether the validation procedures used in machine-learning approaches are informative of actual model performance. This paper examines the credibility of existing approaches to model validation by constructing a pseudo-census from the Mexican Intercensal Survey of 2015, which is used to conduct several design-based simulation experiments. The findings show that the validation procedure often used for machine-learning approaches can be misleading in terms of model assessment since it yields incorrect information for choosing what may be the best set of estimates across different methods and scenarios. Using alternative validation methods, the paper shows that machine-learning-based estimates can rival traditional, more data-intensive poverty mapping approaches. Further, the closest approximation to existing machine-learning approaches, using publicly available geo-referenced data, performs poorly when evaluated against “true” poverty rates and fails to outperform traditional poverty mapping methods in targeting simulations.
    Date: 2023–05–01
    URL: https://d.repec.org/n?u=RePEc:wbk:wbrwps:10429
  17. By: Xiangdong Liu; Sicheng Fu; Shaopeng Hong
    Abstract: Volatility forecasting in financial markets has received growing attention from scholars. In this paper, we propose a new volatility forecasting model that combines the heterogeneous autoregressive (HAR) model with a family of path-dependent volatility models (HAR-PD). The model utilizes the long- and short-term memory properties of price data to capture volatility features and trend features. By integrating the features of path-dependent volatility into the HAR model family framework, we develop a new set of volatility forecasting models. We also propose a HAR-REQ model based on the empirical quartile as a threshold, which exhibits stronger forecasting ability than the HAR-REX model. Subsequently, the predictive performance of the HAR-PD model family is evaluated by statistical tests using data from the Chinese stock market and compared with the basic HAR model family. The empirical results show that the HAR-PD model family has higher forecasting accuracy than the underlying HAR model family. In addition, robustness tests confirm the significant predictive power of the HAR-PD model family.
    Date: 2025–03
    URL: https://d.repec.org/n?u=RePEc:arx:papers:2503.00851
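The baseline HAR regression the paper extends is a simple OLS of realized volatility on its daily lag and its weekly (5-day) and monthly (22-day) averages. A self-contained sketch, with the DGP chosen purely for illustration:

```python
import numpy as np

def har_design(rv):
    """HAR regressors: constant, daily RV, weekly (5-day) and monthly
    (22-day) backward-looking averages."""
    rows = []
    for t in range(22, len(rv)):
        rows.append([1.0, rv[t - 1], rv[t - 5:t].mean(), rv[t - 22:t].mean()])
    return np.array(rows), rv[22:]

rng = np.random.default_rng(7)
T = 3000
rv = np.ones(T)
for t in range(22, T):                    # simulate from a HAR data-generating process
    rv[t] = (0.1 + 0.4 * rv[t - 1] + 0.3 * rv[t - 5:t].mean()
             + 0.2 * rv[t - 22:t].mean() + rng.normal(0, 0.05))

X, y = har_design(rv)
beta, *_ = np.linalg.lstsq(X, y, rcond=None)   # OLS estimates of HAR coefficients
```

Despite the strong collinearity among the three volatility components, their sum (the overall persistence) is estimated precisely, which is why HAR works well for multi-horizon volatility forecasting.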
  18. By: Chen Tong; Peter Reinhard Hansen
    Abstract: We introduce a new dynamic factor correlation model with a novel variation-free parametrization of factor loadings. The model is applicable to high dimensions and can accommodate time-varying correlations, heterogeneous heavy-tailed distributions, and dependent idiosyncratic shocks, such as those observed in returns on stocks in the same subindustry. We apply the model to a "small universe" with 12 asset returns and to a "large universe" with 323 asset returns. The former facilitates a comprehensive empirical analysis and comparisons and the latter demonstrates the flexibility and scalability of the model.
    Date: 2025–03
    URL: https://d.repec.org/n?u=RePEc:arx:papers:2503.01080
  19. By: Carollo, Angela (Max Planck Institute for Demographic Research); Putter, Hein; Eilers, Paul H. C.; Gampe, Jutta
    Abstract: Event history models are based on transition rates between states and, to define such hazards of experiencing an event, the time scale over which the process evolves needs to be identified. In many applications, however, more than one time scale might be of importance. Here we demonstrate how to model a hazard jointly over two time dimensions. The model assumes a smooth bivariate hazard function, and the function is estimated by two-dimensional P-splines. We provide an R-package TwoTimeScales for the analysis of event history data with two time scales. As an example, we model transitions from cohabitation to marriage or separation simultaneously over the age of the individual and the duration of the cohabitation. We use data from the German Family Panel (pairfam) and demonstrate that considering the two time scales as equally important provides additional insights about the transition from cohabitation to marriage or separation.
    Date: 2023–05–18
    URL: https://d.repec.org/n?u=RePEc:osf:socarx:4ewv3_v1
  20. By: Yitong Duan; Weiran Wang; Jian Li
    Abstract: As a fundamental method in economics and finance, the factor model has been extensively utilized in quantitative investment. In recent years, there has been a paradigm shift from traditional linear models with expert-designed factors to more flexible nonlinear machine learning-based models with data-driven factors, aiming to enhance the effectiveness of these factor models. However, due to the low signal-to-noise ratio in market data, mining effective factors in data-driven models remains challenging. In this work, we propose a hypergraph-based factor model with temporal residual contrastive learning (FactorGCL) that employs a hypergraph structure to better capture high-order nonlinear relationships among stock returns and factors. To mine hidden factors that supplement human-designed prior factors for predicting stock returns, we design a cascading residual hypergraph architecture, in which the hidden factors are extracted from the residual information after removing the influence of prior factors. Additionally, we propose a temporal residual contrastive learning method to guide the extraction of effective and comprehensive hidden factors by contrasting stock-specific residual information over different time periods. Our extensive experiments on real stock market data demonstrate that FactorGCL not only outperforms existing state-of-the-art methods but also mines effective hidden factors for predicting stock returns.
    Date: 2025–02
    URL: https://d.repec.org/n?u=RePEc:arx:papers:2502.05218

This nep-ecm issue is ©2025 by Sune Karlsson. It is provided as is without any express or implied warranty. It may be freely redistributed in whole or in part for any purpose. If distributed in part, please include this notice.
General information on the NEP project can be found at https://nep.repec.org. For comments please write to the director of NEP, Marco Novarese at <director@nep.repec.org>. Put “NEP” in the subject, otherwise your mail may be rejected.
NEP’s infrastructure is sponsored by the School of Economics and Finance of Massey University in New Zealand.