nep-ecm New Economics Papers
on Econometrics
Issue of 2026–03–02
25 papers chosen by
Sune Karlsson, Örebro universitet


  1. Earning While Learning: How to Run Batched Bandit Experiments By Kemper, Jan; Rostam-Afschar, Davud
  2. High-dimensional inference for Model Averaging estimators By Léonard, Lise; Pircalabelu, Eugen; von Sachs, Rainer
  3. Inference under First-Order Degeneracy By Xinyue Bei; Manu Navjeevan
  4. A penalized least squares estimator for extreme-value mixture models By Mourahib, Anas; Kiriliouk, Anna; Segers, Johan
  5. Two-way Clustering Robust Variance Estimator in Quantile Regression Models By Ulrich Hounyo; Jiahao Lin
  6. Partial Identification under Missing Data Using Weak Shadow Variables from Pretrained Models By Hongyu Chen; David Simchi-Levi; Ruoxuan Xiong
  7. Consistency of M-estimators for non-identically distributed data: the case of fixed-design distributional regression By Bücher, Axel; Segers, Johan; Staud, Torben
  8. When composite likelihood meets stochastic approximation By Alfonzetti, Giuseppe; Bellio, Ruggero; Chen, Yunxiao; Moustaki, Irini
  9. Jackknife Inference for Fixed Effects Models By Ayden Higgins
  10. On testing Kronecker product structure in tensor factor models By Cen, Zetai; Lam, Clifford
  11. Fixed-Horizon Self-Normalized Inference for Adaptive Experiments via Martingale AIPW/DML with Logged Propensities By Gabriel Saco
  12. Causal Identification in Multi-Task Demand Learning with Confounding By Varun Gupta; Vijay Kamble
  13. Nonparametric Spatial Frontier Models for Productivity Analysis: Evidence from EU Regions By Mastromarco, Camilla; Simar, Léopold
  14. Generative modeling for the bootstrap By Leon Tran; Ting Ye; Peng Ding; Fang Han
  15. Wasserstein boosting trees algorithm for count data, with application to claim frequencies in motor insurance By Denuit, Michel; Michaelides, Marie; Trufin, Julien; Verelst, Harrison
  16. Sustainable Investment: ESG Impacts on Large Portfolio By Ruike Wu; Yonghe Lu; Yanrong Yang
  17. Insurance risk classification with Generalized Gaussian Process Regression models By Hainaut, Donatien; Denuit, Michel
  18. Model Restrictiveness in Functional and Structural Settings By Drew Fudenberg; Wayne Yuan Gao; Zhiheng You
  19. Using Prior Studies to Design Experiments: An Empirical Bayes Approach By Zhiheng You
  20. NANSDE-Net: A neural SDE framework for generating time series with memory By Hiromu Ozai; Kei Nakagawa
  21. Wasserstein–Aitchison GAN for angular measures of multivariate extremes By Lhaut, Stéphane; Rootzén, Holger; Segers, Johan
  22. Identification of Child Penalties By Dor Leventer
  23. Forecasting the Evolving Composition of Inbound Tourism Demand: A Bayesian Compositional Time Series Approach Using Platform Booking Data By Harrison Katz
  24. Model selection confidence sets for time series models with applications to electricity load data By Piersilvio De Bortoli; Davide Ferrari; Francesco Ravazzolo; Luca Rossini
  25. Winner's Curse Drives False Promises in Data-Driven Decisions: A Case Study in Refugee Matching By Hamsa Bastani; Osbert Bastani; Bryce McLaughlin

  1. By: Kemper, Jan; Rostam-Afschar, Davud
    Abstract: Researchers typically collect experimental data sequentially, allowing early outcome observations and adaptive treatment assignment to reduce exposure to inferior treatments. This article reviews multi-armed-bandit adaptive experimental designs that balance exploration and exploitation. Because experimental data collected adaptively through bandit algorithms violate standard asymptotics, inference is challenging. We implement an estimator that yields valid heteroskedasticity-robust confidence intervals in batched bandit designs and compare coverage in Monte Carlo simulations. We introduce bbandits for Stata, a tool for designing experiments via simulation, running interactive bandit experiments, and analyzing adaptively collected data. bbandits includes three common assignment algorithms (e-first, e-greedy, and Thompson sampling) and supports estimation, inference, and visualization.
    Keywords: Randomized controlled trial, causal inference, multi-armed bandits, experimental design, machine learning
    JEL: C1 C11 C12 C13 C15 C18 C8 C87 C88 C9 D83
    Date: 2026
    URL: https://d.repec.org/n?u=RePEc:zbw:glodps:1717
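    A minimal Python sketch of one of the reviewed assignment rules, batched Thompson sampling with Bernoulli rewards (the bbandits package itself is for Stata; the priors, batch sizes, and reward probabilities below are made-up choices, not package defaults):

      import numpy as np

      rng = np.random.default_rng(0)

      def thompson_batch(successes, failures, batch_size):
          # One batch of Thompson-sampling assignment under Beta(1, 1) priors:
          # draw a posterior sample per unit and arm, assign each unit its argmax arm.
          draws = rng.beta(successes + 1, failures + 1,
                           size=(batch_size, len(successes)))
          return draws.argmax(axis=1)

      # Toy run: two arms with made-up success probabilities 0.3 and 0.5.
      true_means = np.array([0.3, 0.5])
      succ, fail = np.zeros(2), np.zeros(2)
      for _ in range(10):                      # 10 batches of 100 units each
          arms = thompson_batch(succ, fail, 100)
          rewards = rng.random(100) < true_means[arms]
          np.add.at(succ, arms, rewards)
          np.add.at(fail, arms, ~rewards)
      print("final-batch share assigned to arm 1:", (arms == 1).mean())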
  2. By: Léonard, Lise (Université catholique de Louvain, LIDAM/ISBA, Belgium); Pircalabelu, Eugen (Université catholique de Louvain, LIDAM/ISBA, Belgium); von Sachs, Rainer (Université catholique de Louvain, LIDAM/ISBA, Belgium)
    Abstract: Selection methods for high-dimensional models are well developed, but subsequent inference typically does not account for the choice of the model, which leads to an underestimated variance. We propose a procedure for high-dimensional model averaging that allows inference even when the number of predictors is greater than the sample size. The proposed estimator is constructed from the debiased Lasso, and the weights are chosen to reduce the associated prediction risk. We derive the asymptotic distribution of the estimator within a high-dimensional framework and offer guarantees that the chosen weights attain minimal prediction loss. In contrast to existing approaches, our proposed method thus combines the advantages of model averaging with the possibility of inference based on asymptotic normality. In a simulation study and on a real, high-dimensional dataset, the estimator shows a smaller prediction risk than its competitors.
    Keywords: Debiased Lasso ; High-Dimensional Inference ; Model Averaging ; Prediction Risk
    Date: 2025–06–11
    URL: https://d.repec.org/n?u=RePEc:aiz:louvad:2025014
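    For context, the debiased Lasso that underlies the averaging estimator takes the standard form (textbook notation, not necessarily the paper's): $\hat{\beta}^{deb} = \hat{\beta}^{lasso} + \frac{1}{n}\hat{\Theta}X^{\top}(y - X\hat{\beta}^{lasso})$, where $\hat{\Theta}$ is an approximate inverse of $\hat{\Sigma} = X^{\top}X/n$. Under sparsity conditions each coordinate of $\hat{\beta}^{deb}$ is asymptotically normal, which is what makes inference after averaging feasible.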
  3. By: Xinyue Bei; Manu Navjeevan
    Abstract: We study inference in models where a transformation of parameters exhibits first-order degeneracy, that is, its gradient is zero or close to zero, making the standard delta method invalid. A leading example is causal mediation analysis, where the indirect effect is a product of coefficients and the gradient degenerates near the origin. In these local regions of degeneracy the limiting behaviors of plug-in estimators depend on nuisance parameters that are not consistently estimable. We show that this failure is intrinsic: around points of degeneracy, both regular and quantile-unbiased estimation are impossible. Despite these restrictions, we develop minimum-distance methods that deliver uniformly valid confidence intervals. We establish sufficient conditions under which standard chi-square critical values remain valid, and propose a simple bootstrap procedure when they are not. We demonstrate favorable power in simulations and in an empirical application linking teacher gender attitudes to student outcomes.
    Date: 2026–02
    URL: https://d.repec.org/n?u=RePEc:arx:papers:2602.07377
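    A textbook mediation example makes the degeneracy concrete (standard delta-method algebra, not the paper's notation): for the indirect effect $g(\alpha, \beta) = \alpha\beta$ with $\sqrt{n}(\hat{\alpha} - \alpha, \hat{\beta} - \beta) \Rightarrow N(0, \Sigma)$, the delta method gives $\sqrt{n}(\hat{\alpha}\hat{\beta} - \alpha\beta) \Rightarrow N(0, \nabla g^{\top}\Sigma\nabla g)$ with $\nabla g = (\beta, \alpha)^{\top}$. At $\alpha = \beta = 0$ the limiting variance vanishes, the correct rate becomes $n$, and $n\hat{\alpha}\hat{\beta}$ converges to a product of (possibly correlated) Gaussians, so delta-method Wald intervals are invalid near the origin.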
  4. By: Mourahib, Anas (Université catholique de Louvain, LIDAM/ISBA, Belgium); Kiriliouk, Anna (UNamur); Segers, Johan (Université catholique de Louvain, LIDAM/ISBA, Belgium)
    Abstract: Estimating the parameters of max-stable parametric models poses significant challenges, particularly when some parameters lie on the boundary of the parameter space. This situation arises when a subset of variables exhibits extreme values simultaneously, while the remaining variables do not—a phenomenon referred to as an extreme direction in the literature. In this paper, we propose a novel estimator for the parameters of a general parametric mixture model, incorporating a penalization approach based on a pseudo-norm. This penalization plays a crucial role in accurately identifying parameters at the boundary of the parameter space. Additionally, our estimator comes with a data-driven algorithm to detect groups of variables corresponding to extreme directions. We assess the performance of our estimator in terms of both parameter estimation and the identification of extreme directions through extensive simulation studies. Finally, we apply our methods to data on river discharges and financial portfolio losses.
    Date: 2025–06–19
    URL: https://d.repec.org/n?u=RePEc:aiz:louvad:2025015
  5. By: Ulrich Hounyo; Jiahao Lin
    Abstract: We study inference for linear quantile regression with two-way clustered data. Using a separately exchangeable array framework and a projection decomposition of the quantile score, we characterize regime-dependent convergence rates and establish a self-normalized Gaussian approximation. We propose a two-way cluster-robust sandwich variance estimator with a kernel-based density "bread" and a projection-matched "meat", and prove consistency and validity of inference in Gaussian regimes. We also show an impossibility result for uniform inference in a non-Gaussian interaction regime.
    Date: 2026–02
    URL: https://d.repec.org/n?u=RePEc:arx:papers:2602.16376
  6. By: Hongyu Chen; David Simchi-Levi; Ruoxuan Xiong
    Abstract: Estimating population quantities such as mean outcomes from user feedback is fundamental to platform evaluation and social science, yet feedback is often missing not at random (MNAR): users with stronger opinions are more likely to respond, so standard estimators are biased and the estimand is not identified without additional assumptions. Existing approaches typically rely on strong parametric assumptions or bespoke auxiliary variables that may be unavailable in practice. In this paper, we develop a partial identification framework in which sharp bounds on the estimand are obtained by solving a pair of linear programs whose constraints encode the observed data structure. This formulation naturally incorporates outcome predictions from pretrained models, including large language models (LLMs), as additional linear constraints that tighten the feasible set. We call these predictions weak shadow variables: they satisfy a conditional independence assumption with respect to missingness but need not meet the completeness conditions required by classical shadow-variable methods. When predictions are sufficiently informative, the bounds collapse to a point, recovering standard identification as a special case. In finite samples, to provide valid coverage of the identified set, we propose a set-expansion estimator that achieves slower-than-$\sqrt{n}$ convergence rate in the set-identified regime and the standard $\sqrt{n}$ rate under point identification. In simulations and semi-synthetic experiments on customer-service dialogues, we find that LLM predictions are often ill-conditioned for classical shadow-variable methods yet remain highly effective in our framework. They shrink identification intervals by 75–83% while maintaining valid coverage under realistic MNAR mechanisms.
    Date: 2026–02
    URL: https://d.repec.org/n?u=RePEc:arx:papers:2602.16061
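    A minimal sketch of the linear-programming bounds under our own toy discretization (all probabilities below are made up; scipy's linprog stands in for whatever solver the paper uses):

      import numpy as np
      from scipy.optimize import linprog

      # Toy setup. Outcome Y in {0, 1, 2}; R = 1 if the unit responds;
      # S in {0, 1} is a pretrained-model prediction assumed to satisfy
      # S independent of R given Y (the weak-shadow-variable condition).
      y_vals = np.array([0.0, 1.0, 2.0])
      p_resp = 0.6                               # P(R = 1), observed
      p_y_resp = np.array([0.2, 0.5, 0.3])       # P(Y = y | R = 1), observed
      p_s1_given_y = np.array([0.1, 0.7, 0.9])   # P(S = 1 | Y = y), from respondents
      p_s1_nonresp = 0.51                        # P(S = 1 | R = 0), observed

      # Unknowns: q_y = P(Y = y | R = 0). Constraints: q is a distribution,
      # and sum_y P(S = 1 | Y = y) q_y = P(S = 1 | R = 0)  (shadow restriction).
      A_eq = np.vstack([np.ones(3), p_s1_given_y])
      b_eq = np.array([1.0, p_s1_nonresp])
      obs_part = p_resp * (y_vals @ p_y_resp)

      lo = linprog(c=y_vals, A_eq=A_eq, b_eq=b_eq, bounds=[(0, 1)] * 3)
      hi = linprog(c=-y_vals, A_eq=A_eq, b_eq=b_eq, bounds=[(0, 1)] * 3)
      print(f"bounds on E[Y]: [{obs_part + (1 - p_resp) * lo.fun:.3f}, "
            f"{obs_part + (1 - p_resp) * -hi.fun:.3f}]")

    Dropping the shadow constraint widens the interval; a richer prediction S (more categories, sharper dependence on Y) narrows it, collapsing to a point when the constraint matrix pins q down uniquely.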
  7. By: Bücher, Axel; Segers, Johan (Université catholique de Louvain, LIDAM/ISBA, Belgium); Staud, Torben
    Abstract: This paper explores strong and weak consistency of M-estimators for non-identically distributed data, extending prior work. Emphasis is given to scenarios where data is viewed as a triangular array, which encompasses distributional regression models with non-random covariates. Primitive conditions are established for specific applications, such as estimation based on minimizing empirical proper scoring rules or conditional maximum likelihood. A key motivation is addressing challenges in extreme value statistics, where parameter-dependent supports can cause criterion functions to attain the value −∞, hindering the application of existing theorems.
    Keywords: Block-Maxima Method ; Conditional Maximum Likelihood ; Distributional Regression ; Minimum Scoring Rule Estimation ; Non-Random Covariates
    Date: 2025–11–17
    URL: https://d.repec.org/n?u=RePEc:aiz:louvad:2025021
  8. By: Alfonzetti, Giuseppe; Bellio, Ruggero; Chen, Yunxiao; Moustaki, Irini
    Abstract: A composite likelihood is an inference function derived by multiplying a set of likelihood components. This approach provides a flexible framework for drawing inferences when the likelihood function of a statistical model is computationally intractable. While composite likelihood has computational advantages, it can still be demanding when dealing with numerous likelihood components and a large sample size. This article tackles this challenge by employing an approximation of the conventional composite likelihood estimator based on a stochastic optimization procedure. This novel estimator is shown to be asymptotically normally distributed around the true parameter. In particular, depending on the relative divergence rates of the sample size and the number of optimization iterations, the variance of the limiting distribution is shown to compound two sources of uncertainty: the sampling variability of the data and the optimization noise, with the latter depending on the sampling distribution used to construct the stochastic gradients. The advantages of the proposed framework are illustrated through simulation studies on two working examples: an Ising model for binary data and a gamma frailty model for count data. Finally, a real-data application is presented, showing its effectiveness in a large-scale mental health survey. Supplementary materials for this article are available online, including a standardized description of the materials available for reproducing the work.
    Keywords: exchangeable variables central limit theorem; Ising model; gamma frailty model; pairwise likelihood; stochastic gradient; central limit theorem
    JEL: C1
    Date: 2025–09–30
    URL: https://d.repec.org/n?u=RePEc:ehl:lserod:126177
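    A sketch of the general recipe only, with a toy pairwise-likelihood example (step sizes, iteration counts, and the Gaussian pair model are our made-up choices, not the paper's):

      import numpy as np

      rng = np.random.default_rng(1)

      def stochastic_composite_mle(components, grad_loglik, theta0,
                                   n_iter=5000, lr=0.1):
          # Robbins-Monro-style ascent on a composite likelihood: each step
          # uses the gradient of a single randomly drawn likelihood component.
          theta = float(theta0)
          for t in range(1, n_iter + 1):
              i = rng.integers(len(components))
              theta += (lr / np.sqrt(t)) * grad_loglik(theta, components[i])
          return theta

      # Toy usage: common mean of correlated Gaussian pairs, estimated by
      # pairwise likelihood; grad is the pairwise log-likelihood score.
      rho, mu_true = 0.5, 2.0
      pairs = rng.multivariate_normal([mu_true] * 2,
                                      [[1.0, rho], [rho, 1.0]], size=2000)
      grad = lambda th, xy: ((xy[0] - th) + (xy[1] - th)) / (1 + rho)
      print(stochastic_composite_mle(pairs, grad, theta0=0.0))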
  9. By: Ayden Higgins
    Abstract: This paper develops a general method of inference for fixed effects models which is (i) automatic, (ii) computationally inexpensive, and (iii) highly model agnostic. Specifically, we show how to combine a collection of subsample estimators into a self-normalised jackknife $t$-statistic, from which hypothesis tests, confidence intervals, and $p$-values are readily obtained.
    Date: 2026–02
    URL: https://d.repec.org/n?u=RePEc:arx:papers:2602.21903
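    A minimal sketch of the combination step as the abstract describes it (our reading, not the paper's code; the subsample estimates below are made-up numbers):

      import numpy as np
      from scipy import stats

      def jackknife_t_ci(subsample_estimates, alpha=0.05):
          # Combine S subsample estimates into a self-normalised t-statistic:
          # center at their mean, studentize by their empirical standard error,
          # and use Student-t critical values with S - 1 degrees of freedom.
          est = np.asarray(subsample_estimates, dtype=float)
          S = len(est)
          center = est.mean()
          se = est.std(ddof=1) / np.sqrt(S)
          half = stats.t.ppf(1 - alpha / 2, df=S - 1) * se
          return center - half, center + half

      # Illustrative use with invented estimates from 8 data splits.
      print(jackknife_t_ci([0.91, 1.05, 0.97, 1.10, 0.88, 1.02, 0.95, 1.07]))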
  10. By: Cen, Zetai; Lam, Clifford
    Abstract: We propose a test for the Kronecker product structure of a factor loading matrix implied by a tensor factor model with Tucker decomposition in the common component. By defining a Kronecker product structure set, we determine whether a tensor time series has a Kronecker product structure, which is equivalent to the series admitting a tensor factor model decomposition. Our test is built on analysing and comparing the residuals from fitting a full tensor factor model with those from fitting a factor model on a reshaped version of the data. In the most extreme case, the reshaping is the vectorization of the tensor data, and the factor loading matrix in such a case can be general if there is no Kronecker product structure present. Our test is also generalized to the Khatri–Rao product structure in a tensor factor model with canonical polyadic decomposition. Theoretical results are developed through asymptotic normality results on estimated residuals. Numerical experiments suggest that the size of the tests approaches the pre-set nominal value as the sample size or the order of the tensor increases, while the power increases with mode dimensions and the number of combined modes. We demonstrate our tests through extensive real data examples.
    Keywords: factor-structured idiosyncratic error; tensor refold; tensor reshape; weak factor
    JEL: C1
    Date: 2025–12–31
    URL: https://d.repec.org/n?u=RePEc:ehl:lserod:129613
  11. By: Gabriel Saco
    Abstract: Adaptive randomized experiments update treatment probabilities as data accrue, but still require an end-of-study confidence interval for the average treatment effect (ATE) at a prespecified horizon. Under adaptive assignment, propensities can keep changing, so the predictable quadratic variation of the AIPW/DML score increments may remain random. When no deterministic variance limit exists, Wald statistics normalized by a single long-run variance target can be conditionally miscalibrated given the realized variance regime. We assume no interference, sequential randomization, i.i.d. arrivals, and executed overlap on a prespecified scored set, and we require two auditable pipeline conditions: the platform logs the executed randomization probability for each unit, and the nuisance regressions used to score unit $t$ are constructed predictably from past data only. These conditions make the centered AIPW/DML scores an exact martingale difference sequence. Using self-normalized martingale limit theory, we show that the Studentized statistic, with variance estimated by realized quadratic variation, is asymptotically N(0, 1) at the prespecified horizon, even without variance stabilization. Simulations validate the theory and highlight when standard fixed-variance Wald reporting fails.
    Date: 2026–02
    URL: https://d.repec.org/n?u=RePEc:arx:papers:2602.15559
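    A sketch of the two ingredients and the self-normalized interval as we read them (toy arrays; in practice the nuisance fits must be computed predictably from past data only, as the abstract requires):

      import numpy as np

      def aipw_scores(y, w, e_logged, mu1_hat, mu0_hat):
          # Per-unit AIPW/DML scores using the *logged* executed propensity
          # and nuisance fits built from past data only.
          return (mu1_hat - mu0_hat
                  + w * (y - mu1_hat) / e_logged
                  - (1 - w) * (y - mu0_hat) / (1 - e_logged))

      def self_normalized_ci(psi, z=1.96):
          # Studentize by realized quadratic variation of the centered scores
          # instead of a fixed long-run variance target.
          psi = np.asarray(psi, dtype=float)
          tau_hat = psi.mean()
          se = np.sqrt(np.sum((psi - tau_hat) ** 2)) / len(psi)
          return tau_hat - z * se, tau_hat + z * se

      # Toy usage with made-up adaptive data (n = 5 shown for brevity).
      y  = np.array([1.0, 0.2, 0.9, 0.4, 1.1])
      w  = np.array([1, 0, 1, 0, 1])
      e  = np.array([0.5, 0.5, 0.6, 0.7, 0.7])   # logged propensities
      m1 = np.array([0.8, 0.7, 0.9, 0.6, 1.0])   # predictable nuisance fits
      m0 = np.array([0.3, 0.3, 0.4, 0.4, 0.5])
      print(self_normalized_ci(aipw_scores(y, w, e, m1, m0)))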
  12. By: Varun Gupta; Vijay Kamble
    Abstract: We study a canonical multi-task demand learning problem motivated by retail pricing, in which a firm seeks to estimate heterogeneous linear price-response functions across a large collection of decision contexts. Each context is characterized by rich observable covariates yet typically exhibits only limited historical price variation, motivating the use of multi-task learning to borrow strength across tasks. A central challenge in this setting is endogeneity: historical prices are chosen by managers or algorithms and may be arbitrarily correlated with unobserved, task-level demand determinants. Under such confounding by latent fundamentals, commonly used approaches, such as pooled regression and meta-learning, fail to identify causal price effects. We propose a new estimation framework that achieves causal identification despite arbitrary dependence between prices and latent task structure. Our approach, Decision-Conditioned Masked-Outcome Meta-Learning (DCMOML), involves carefully designing the information set of a meta-learner to leverage cross-task heterogeneity while accounting for endogenous decision histories. Under a mild restriction on price adaptivity in each task, we establish that this method identifies the conditional mean of the task-specific causal parameters given the designed information set. Our results provide guarantees for large-scale demand estimation with endogenous prices and small per-task samples, offering a principled foundation for deploying causal, data-driven pricing models in operational environments.
    Date: 2026–02
    URL: https://d.repec.org/n?u=RePEc:arx:papers:2602.09969
  13. By: Mastromarco, Camilla; Simar, Léopold (Université catholique de Louvain, LIDAM/ISBA, Belgium)
    Abstract: This paper proposes a novel nonparametric panel data framework for estimating conditional production frontiers and efficiency measures that explicitly accounts for spatial interdependencies. By integrating recent advances in nonparametric frontier estimation with spatial panel data analysis, the proposed approach offers a flexible and robust framework for assessing productivity and efficiency in the presence of spatial interactions, explicitly accounting for both global and local spatial effects. By extending recently developed tools for estimating Malmquist productivity indices to conditional nonparametric frontier efficiency models, we provide a refined decomposition of productivity growth into technological change, efficiency change, and scale effects within a fully nonparametric framework. Applying this framework to a comprehensive dataset on European regions, we provide new evidence on spatial patterns of productivity growth and efficiency dynamics across the EU. The results reveal marked heterogeneity in regional performance and highlight the crucial role of spatial spillovers in shaping productivity outcomes. Ignoring these interdependencies can lead to mismeasurement of productivity trends, reinforcing the value of our proposed spatial nonparametric frontier approach for policy and performance analysis.
    Keywords: Nonparametric Conditional Frontier ; Panel Data Model ; Spatial Dependence ; Productivity Analysis ; Malmquist Productivity Index ; EU Regional Performance
    JEL: C14 C13 C33 D24 O47
    Date: 2025–11–13
    URL: https://d.repec.org/n?u=RePEc:aiz:louvad:2025020
  14. By: Leon Tran; Ting Ye; Peng Ding; Fang Han
    Abstract: Generative modeling builds on and substantially advances the classical idea of simulating synthetic data from observed samples. This paper shows that this principle is not only natural but also theoretically well-founded for bootstrap inference: it yields statistically valid confidence intervals that apply simultaneously to both regular and irregular estimators, including settings in which Efron's bootstrap fails. In this sense, the generative modeling-based bootstrap can be viewed as a modern version of the smoothed bootstrap: it could mitigate the curse of dimensionality and remain effective in challenging regimes where estimators may lack root-$n$ consistency or a Gaussian limit.
    Date: 2026–02
    URL: https://d.repec.org/n?u=RePEc:arx:papers:2602.17052
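    A minimal sketch of the principle, with a kernel density estimate standing in for the generative model (the KDE case is exactly the classical smoothed bootstrap the abstract mentions; the sample and statistic are illustrative):

      import numpy as np
      from scipy.stats import gaussian_kde

      rng = np.random.default_rng(2)

      def generative_bootstrap_ci(x, stat, n_boot=2000, alpha=0.05):
          # Fit a generative model to the data, then bootstrap by simulating
          # fresh synthetic samples from it rather than resampling with
          # replacement as in Efron's bootstrap.
          gen = gaussian_kde(x)
          reps = np.array([stat(gen.resample(len(x)).ravel())
                           for _ in range(n_boot)])
          return tuple(np.quantile(reps, [alpha / 2, 1 - alpha / 2]))

      x = rng.normal(0.0, 1.0, size=200)
      print(generative_bootstrap_ci(x, np.median))   # percentile interval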
  15. By: Denuit, Michel (Université catholique de Louvain, LIDAM/ISBA, Belgium); Michaelides, Marie (Heriot-Watt University); Trufin, Julien (ULB); Verelst, Harrison (Detralytics)
    Abstract: This paper proposes a variant of the well-known boosting trees algorithm to estimate conditional distributions. Since regression trees partition observations into subgroups, the corresponding empirical distributions can be used to define the splitting criterion. Specifically, the parametric approach using Poisson deviance is replaced with a non-parametric one maximizing probabilistic distances between empirical distributions in child nodes. Proceeding in this way, the actuary obtains an estimated conditional distribution for the response, from which a conditional mean can be derived as well as any other quantity of interest in risk management. The numerical performances of the proposed method are assessed with simulated data while a case study demonstrates its usefulness for insurance applications.
    Keywords: Wasserstein distance ; regression trees ; boosting ; conditional distribution ; count data
    Date: 2025–11–06
    URL: https://d.repec.org/n?u=RePEc:aiz:louvad:2025024
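    A sketch of the splitting criterion as we read it (single feature, exhaustive threshold search; scipy's univariate Wasserstein distance and made-up Poisson claim counts):

      import numpy as np
      from scipy.stats import wasserstein_distance

      def best_split(x, y):
          # Pick the threshold on feature x that maximizes the Wasserstein
          # distance between the empirical response distributions of the
          # two child nodes (replacing a Poisson-deviance split criterion).
          best = (None, -np.inf)
          for thr in np.unique(x)[:-1]:
              left, right = y[x <= thr], y[x > thr]
              d = wasserstein_distance(left, right)
              if d > best[1]:
                  best = (thr, d)
          return best

      # Toy usage: claim counts whose distribution shifts with the feature.
      rng = np.random.default_rng(3)
      x = rng.uniform(0, 1, 500)
      y = rng.poisson(np.where(x > 0.6, 2.0, 0.5))
      print(best_split(x, y))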
  16. By: Ruike Wu; Yonghe Lu; Yanrong Yang
    Abstract: This paper investigates the impact of environmental, social, and governance (ESG) constraints on a regularized mean-variance (MV) portfolio optimization problem in a large-dimensional setting, in which a positive definite regularization matrix is imposed on the sample covariance matrix. We first derive the asymptotic results for the out-of-sample (OOS) Sharpe ratio (SR) of the proposed portfolio, which help quantify the impact of imposing an ESG-level constraint as well as the effect of estimation error arising from the sample mean estimation of the assets' ESG score. Furthermore, to study the influence of the choices of the regularization matrix, we develop an estimator for the OOS Sharpe ratio. The corresponding asymptotic properties of the Sharpe ratio estimator are established based on random matrix theory. Simulation results show that the proposed estimators perform close to the corresponding oracle level. Moreover, we numerically investigate the impact of various forms of regularization matrices on the OOS SR, which provides useful guidance for practical implementation. Finally, based on the OOS SR estimator, we propose an adaptive regularized portfolio which uses the best regularization matrix yielding the highest estimated SR (among a set of candidates) at each decision node. Empirical evidence based on the S&P 500 index demonstrates that the proposed adaptive ESG-constrained portfolio achieves a high OOS SR while satisfying the required ESG level, offering a practically effective approach for sustainable investment.
    Date: 2026–02
    URL: https://d.repec.org/n?u=RePEc:arx:papers:2602.14439
  17. By: Hainaut, Donatien (Université catholique de Louvain, LIDAM/ISBA, Belgium); Denuit, Michel (Université catholique de Louvain, LIDAM/ISBA, Belgium)
    Abstract: This paper proposes a new approach to risk classification based on Generalized Gaussian Process Regression (GGPR). The response under consideration obeys a distribution belonging to the Exponential Dispersion (ED) family. It typically corresponds to a claim count or a claim severity in the context of insurance studies. GGPR is a supervised machine learning method with a Bayesian flavor. Individual random effects obeying a multivariate Normal distribution are connected with the help of their covariance matrix built from a so-called kernel function. The latter enforces smoothness, borrowing information from similar risk profiles. Bayesian Generalized Linear Models (GLMs) and Generalized Additive Models (GAMs) are recovered as special cases, assuming a highly-structured prior covariance matrix. Compared to the existing literature, this paper innovates by accounting for the specificity of data entering insurance studies. First, proper risk exposures are included in model formulation and development. Second, parameters are estimated by minimizing deviance instead of an approximated loglikelihood. Third, categorical features that are often encountered in insurance databases are coded with the help of an embedding method based on Burt matrices. Fourth, K-means clustering is used to reduce the dimension of the problem and create model points within large insurance portfolios. Numerical illustrations performed on publicly available insurance data sets demonstrate the relevance of the GGPR approach to risk classification. Benchmarked against the classical GLM, the performance of GGPR turns out to be excellent given its reduced number of parameters. This suggests that GGPR nicely enriches the actuarial toolkit by providing preliminary predictions that can then be structured with additive scores like those entering GLMs and GAMs.
    Keywords: Exponential Dispersion family ; Mixed models ; Risk classification ; Categorical embedding ; Burt distance ; Model points
    Date: 2025–03–06
    URL: https://d.repec.org/n?u=RePEc:aiz:louvad:2025004
  18. By: Drew Fudenberg; Wayne Yuan Gao; Zhiheng You
    Abstract: We generalize the notion of model restrictiveness in Fudenberg, Gao and Liang (2026) to a wider range of economic models with semi/non-parametric and structural ingredients. We show how restrictiveness can be defined and computed in infinite-dimensional settings using Gaussian process priors (including with shape restrictions) and other alternatives in Bayesian nonparametrics. We also extend the restrictiveness framework to structural models with endogeneity, instrumental variables, multiple equilibria, and nonparametric nuisance components. We discuss the importance of the user-specific choice of discrepancy functions in the context of Rademacher complexity and the GMM criterion function, and relate restrictiveness to the limit of the average-case learning curve in machine learning. We consider applications to: (1) preferences under risk, (2) exogenous multinomial choice, and (3) multinomial choice with endogenous prices: for (1), we obtain results consistent with those in Fudenberg, Gao and Liang (2026); for (2) and (3), our findings show that nested logit and mixed logit exhibit similar restrictiveness under standard parametric specifications, and that IV exogeneity conditions substantially increase overall restrictiveness while altering model rankings.
    Date: 2026–02
    URL: https://d.repec.org/n?u=RePEc:arx:papers:2602.07688
  19. By: Zhiheng You
    Abstract: We develop an empirical Bayes framework for experimental design that leverages information from prior related studies. When a researcher has access to estimates from previous studies on similar parameters, they can use empirical Bayes to estimate an informative prior over the parameter of interest in the new study. We show how this prior can be incorporated into a decision-theoretic experimental design framework to choose an optimal design. The approach is illustrated via propensity score designs in stratified randomized experiments. Our theoretical results show that the empirical Bayes design achieves oracle-optimal performance as the number of prior studies grows, and characterize the rate at which regret vanishes. To illustrate the approach, we present two empirical applications: oncology drug trials and the Tennessee Project STAR experiment. Our framework connects the Bayesian meta-analysis literature to experimental design and provides practical guidance for researchers seeking to design more efficient experiments.
    Date: 2026–02
    URL: https://d.repec.org/n?u=RePEc:arx:papers:2602.20581
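    One standard empirical-Bayes ingredient, sketched for illustration (a normal prior fit by method of moments to past study estimates; not necessarily the paper's estimator, and all numbers are invented):

      import numpy as np

      def moment_prior(estimates, std_errors):
          # Fit a normal prior N(m, tau^2) to effect estimates from prior
          # studies: the prior variance is the excess of the between-study
          # variance over the average sampling variance (floored at zero).
          est = np.asarray(estimates, dtype=float)
          se2 = np.asarray(std_errors, dtype=float) ** 2
          m = est.mean()
          tau2 = max(est.var(ddof=1) - se2.mean(), 0.0)
          return m, tau2

      # Made-up estimates and standard errors from five prior studies.
      print(moment_prior([0.12, 0.30, 0.05, 0.22, 0.18],
                         [0.08, 0.10, 0.07, 0.09, 0.08]))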
  20. By: Hiromu Ozai; Kei Nakagawa
    Abstract: Modeling time series with long- or short-memory characteristics is a fundamental challenge in many scientific and engineering domains. While fractional Brownian motion has been widely used as a noise source to capture such memory effects, its incompatibility with Itô calculus limits its applicability in neural stochastic differential equation (SDE) frameworks. In this paper, we propose a novel class of noise, termed Neural Network-kernel ARMA-type noise (NA-noise), which is an Itô-process-based alternative capable of capturing both long- and short-memory behaviors. The kernel function defining the noise structure is parameterized via neural networks and decomposed into a product form to preserve the Markov property. Based on this noise process, we develop NANSDE-Net, a generative model that extends Neural SDEs by incorporating NA-noise. We prove the theoretical existence and uniqueness of the solution under mild conditions and derive an efficient backpropagation scheme for training. Empirical results on both synthetic and real-world datasets demonstrate that NANSDE-Net matches or outperforms existing models, including fractional SDE-Net, in reproducing long- and short-memory features of the data, while maintaining computational tractability within the Itô calculus framework.
    Date: 2026–02
    URL: https://d.repec.org/n?u=RePEc:arx:papers:2602.08182
  21. By: Lhaut, Stéphane (Université catholique de Louvain, LIDAM/ISBA, Belgium); Rootzén, Holger (Chalmers University of Technology); Segers, Johan (KU Leuven)
    Abstract: Economically responsible mitigation of multivariate extreme risks – extreme rainfall in a large area, huge variations of many stock prices, widespread breakdowns in transportation systems – requires estimates of the probabilities that such risks will materialize in the future. This paper develops a new method, Wasserstein–Aitchison Generative Adversarial Networks (WA-GAN), which provides simulated values of future d-dimensional multivariate extreme events and which hence can be used to give estimates of such probabilities. The main hypothesis is that, after transforming the observations to the unit-Pareto scale, their distribution is regularly varying in the sense that the distributions of their radial and angular components (with respect to the L1-norm) converge and become asymptotically independent as the radius gets large. The method is a combination of standard extreme value analysis modeling of the tails of the marginal distributions with nonparametric GAN modeling of the angular distribution. For the latter, the angular values are transformed to Aitchison coordinates in a full (d−1)-dimensional linear space, and a Wasserstein GAN is trained on these coordinates and used to generate new values. A reverse transformation is then applied to these values and gives simulated values on the original data scale. The method shows good performance compared to other existing methods in the literature, both in terms of capturing the dependence structure of the extremes in the data, as well as in generating accurate new extremes of the data distribution. The comparison is performed on simulated multivariate extremes from a logistic model in dimensions up to 50 and on a 30-dimensional financial data set.
    Keywords: Extreme value theory ; Wasserstein distance ; Generative adversarial networks ; Multivariate analysis ; Aitchison coordinates
    Date: 2025–05–01
    URL: https://d.repec.org/n?u=RePEc:aiz:louvad:2025010
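    For readers unfamiliar with Aitchison geometry, here is one standard choice of the coordinates the abstract refers to (pivot/ilr coordinates mapping the simplex into a full (d-1)-dimensional linear space; the paper may use a different basis, and the input vector is made up):

      import numpy as np

      def ilr(w, eps=1e-10):
          # Isometric log-ratio (pivot) coordinates: d-part composition -> R^{d-1}.
          # z_k = sqrt(k/(k+1)) * log( geometric_mean(w_1..w_k) / w_{k+1} ).
          w = np.clip(np.asarray(w, dtype=float), eps, None)
          w = w / w.sum(axis=-1, keepdims=True)
          d = w.shape[-1]
          logw = np.log(w)
          out = np.empty(w.shape[:-1] + (d - 1,))
          for k in range(1, d):
              gm = logw[..., :k].mean(axis=-1)       # log geometric mean
              out[..., k - 1] = np.sqrt(k / (k + 1)) * (gm - logw[..., k])
          return out

      # Angular part (L1-normalized) of a made-up 4-dimensional extreme event.
      print(ilr([0.5, 0.3, 0.15, 0.05]))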
  22. By: Dor Leventer
    Abstract: A growing body of research estimates child penalties, the gender gap in the effect of parenthood on labor market earnings, using event studies that normalize treatment effects by counterfactual earnings. I formalize the identification framework underlying this approach, which I term Normalized Triple Differences (NTD), and show it does not identify the conventional target estimand when the parallel trends assumption in levels is violated. Insights from human capital theory suggest such violations are likely: higher-ability individuals delay childbirth and have steeper earnings growth, a mechanism that causes conventional estimates to understate child penalties for early-treated parents. Using Israeli administrative data, a bias-bounding exercise suggests substantial understatement for early groups. As a solution, I propose targeting the effect of parenthood on the gender earnings ratio and show this new estimand is identified under NTD.
    Date: 2026–02
    URL: https://d.repec.org/n?u=RePEc:arx:papers:2602.07486
  23. By: Harrison Katz
    Abstract: Understanding how the composition of guest origin markets evolves over time is critical for destination marketing organizations, hospitality businesses, and tourism planners. We develop and apply Bayesian Dirichlet autoregressive moving average (BDARMA) models to forecast the compositional dynamics of guest origin market shares using proprietary Airbnb booking data spanning 2017–2024 across four major destination regions. Our analysis reveals substantial pandemic-induced structural breaks in origin composition, with heterogeneous recovery patterns across markets. The BDARMA framework achieves the lowest average forecast error across all destination regions, outperforming standard benchmarks including naïve forecasts, exponential smoothing, and SARIMA on log-ratio transformed data. For EMEA destinations, BDARMA achieves 23% lower forecast error than naïve methods, with statistically significant improvements. By modeling compositions directly on the simplex with a Dirichlet likelihood and incorporating seasonal variation in both mean and precision parameters, our approach produces coherent forecasts that respect the unit-sum constraint while capturing complex temporal dependencies. The methodology provides destination stakeholders with probabilistic forecasts of source market shares, enabling more informed strategic planning for marketing resource allocation, infrastructure investment, and crisis response.
    Date: 2026–02
    URL: https://d.repec.org/n?u=RePEc:arx:papers:2602.18358
  24. By: Piersilvio De Bortoli; Davide Ferrari; Francesco Ravazzolo; Luca Rossini
    Abstract: This paper studies the Model Selection Confidence Set (MSCS) methodology for univariate time series models involving autoregressive and moving average components, and applies it to study model selection uncertainty in the Italian electricity load data. Rather than relying on a single model selected by an arbitrary criterion, the MSCS identifies a set of models that are statistically indistinguishable from the true data-generating process at a given confidence level. The size and composition of this set reveal crucial information about model selection uncertainty: noisy data scenarios produce larger sets with many candidate models, while more informative cases narrow the set considerably. To study the importance of each model term, we consider numerical statistics measuring the frequency with which each term is included in both the entire MSCS and in Lower Boundary Models (LBM), its most parsimonious specifications. Applied to Italian hourly electricity load data, the MSCS methodology reveals marked intraday variation in model selection uncertainty and isolates a collection of model specifications that deliver competitive short-term forecasts while highlighting key drivers of electricity load like intraday hourly lags, temperature, calendar effects and solar energy generation.
    Date: 2026–02
    URL: https://d.repec.org/n?u=RePEc:arx:papers:2602.16527
  25. By: Hamsa Bastani; Osbert Bastani; Bryce McLaughlin
    Abstract: A major challenge in data-driven decision-making is accurate policy evaluation, i.e., guaranteeing that a learned decision-making policy achieves the promised benefits. A popular strategy is model-based policy evaluation, which estimates a model from data to infer counterfactual outcomes. This strategy is known to produce unwarrantedly optimistic estimates of the true benefit due to the winner's curse. We searched the recent literature on data-driven decision-making, identifying a sample of 55 papers published in Management Science in the past decade; all but two relied on this flawed methodology. Several common justifications are provided: (1) the estimated models are accurate, stable, and well-calibrated, (2) the historical data uses random treatment assignment, (3) the model family is well-specified, and (4) the evaluation methodology uses sample splitting. Unfortunately, we show that no combination of these justifications avoids the winner's curse. First, we provide a theoretical analysis demonstrating that the winner's curse can cause large, spurious reported benefits even when all these justifications hold. Second, we perform a simulation study based on the recent and consequential data-driven refugee matching problem. We construct a synthetic refugee matching environment (calibrated to closely match the real setting) but designed so that no assignment policy can improve expected employment compared to random assignment. Model-based methods report large, stable gains of around 60% even when the true effect is zero; these gains are on par with improvements of 22-75% reported in the literature. Our results provide strong evidence against model-based evaluation.
    Date: 2026–02
    URL: https://d.repec.org/n?u=RePEc:arx:papers:2602.08892
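    The mechanism is easy to reproduce in a few lines (toy normal noise and invented numbers; the paper's simulation is calibrated to refugee matching):

      import numpy as np

      rng = np.random.default_rng(4)

      # Winner's-curse illustration: every candidate policy has true benefit 0,
      # but each is evaluated with a noisy model-based estimate; reporting the
      # best-looking policy's own estimate is systematically optimistic.
      n_policies, n_trials, noise_sd = 50, 10000, 1.0
      est = rng.normal(0.0, noise_sd, size=(n_trials, n_policies))
      reported = est.max(axis=1)        # model-based benefit of the chosen policy
      print(f"mean reported gain: {reported.mean():.2f}  (true gain: 0)")
      # With 50 candidates, the expected maximum of 50 N(0, 1) draws is about
      # 2.25, so the selected policy looks ~2.25 standard errors better than
      # it truly is, even though every policy is worthless.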

This nep-ecm issue is ©2026 by Sune Karlsson. It is provided as is without any express or implied warranty. It may be freely redistributed in whole or in part for any purpose. If distributed in part, please include this notice.
General information on the NEP project can be found at https://nep.repec.org. For comments please write to the director of NEP, Marco Novarese at <director@nep.repec.org>. Put “NEP” in the subject, otherwise your mail may be rejected.
NEP’s infrastructure is sponsored by the School of Economics and Finance of Massey University in New Zealand.