nep-ecm 2025-07-21 papers

on Econometrics

Issue of 2025–07–21
23 papers chosen by
Sune Karlsson, Örebro universitet

Nonparametric causal inference with functional covariates By Kurisu, Daisuke; Otsu, Taisuke; Xu, Mengshan
Matrix-Valued Spatial Autoregressions with Dynamic and Robust Heterogeneous Spillovers By Yicong Lin; André Lucas; Shiqi Ye
Reexamining an old story: uncovering the hidden small sample bias in AR(1) models By Dou, Zhiwei; Ariens, Sigert; Ceulemans, Eva; Lafit, Ginette
Simulation Smoothing for State Space Models: An Extremum Monte Carlo Approach By Karim Moussa
The Power Asymmetry in Fuzzy Regression Discontinuity Designs By Daniel Kaliski; Michael P. Keane; Timothy Neal
Functional Location-Scale Models with Robust Observation-Driven Dynamics By Yicong Lin; André Lucas
Semiparametric Estimation of Probability Weighting Functions Implicit in Option Prices By H. Peter Boswijk; Jeroen Dalderop; Roger J. A. Laeven; Niels Marijnen
Chunk-Based Higher-Order Hierarchical Diagnostic Classification Models: A Maximum Likelihood Estimation Approach By Lee, Minho; Suh, Yon Soo
Unfolding the network of peer grades: a latent variable approach By Mignemi, Giuseppe; Chen, Yunxiao; Moustaki, Irini
Statistical Properties of Two Asymmetric Stochastic Volatility in Power Mean Models By Antonis Demos
Uncertainty in Empirical Economics By Frank Schorfheide; Zhiheng You
On a Definition of Trend By Silva Lopes, Artur
On the Correlations in Linearized Multivariate Stochastic Volatility Models By Karim Moussa
The Stick-Breaking and Ordering Representations of Compositional Data: Copulas and Regression models By Olivier P. Faugeras
Item response theory—a statistical framework for educational and psychological measurement By Chen, Yunxiao; Li, Xiaoou; Liu, Jingchen; Ying, Zhiliang
Improving Score-Driven Density Forecasts with an Application to Implied Volatility Surface Dynamics By Xia Zou; Yicong Lin; André Lucas
Clustering Extreme Value Indices in Large Panels By Chenhui Wang; Juan Juan Cai; Yicong Lin; Julia Schaumburg
Conditional Fat Tails and Scale Dynamics for Intraday Discrete Price Changes By Daan Schoemaker; André Lucas; Anne Opschoor
Limited Self-Knowledge and Survey Response Behavior By Armin Falk; Luca Henkel; Thomas Neuber; Philipp Strack
Estimated Monthly National Accounts for the United States By Mr. Philip Barrett
Clustering Approaches for Mixed‐Type Data: A Comparative Study By Badih Ghattas; Alvaro Sanchez San-Benito
The Retreat from Causality: Local Projections and Econometrics Without Economics By Chung, Sungyup
Measuring and Explaining the CDS-Bond Basis Term-Structure Shape and Dynamics By Yonas Khanna; André Lucas; Norman Seeger

Nonparametric causal inference with functional covariates

By:	Kurisu, Daisuke; Otsu, Taisuke; Xu, Mengshan
Abstract:	Functional data and their analysis have become increasingly popular in various fields of data science. This paper considers estimation and inference of the average treatment effect under unconfoundedness when the covariates involve a functional variable, and proposes the inverse probability weighting estimator, where the propensity score is estimated by utilizing a kernel estimator for functional variables. We establish the $\sqrt{n}$-consistency and asymptotic normality of the proposed estimator. Numerical experiments and an empirical application demonstrate the usefulness of the proposed method.
Keywords:	causal inference; functional data; semiparametric estimation
JEL:	J1
Date:	2025–06–20
URL:	https://d.repec.org/n?u=RePEc:ehl:lserod:127990

Matrix-Valued Spatial Autoregressions with Dynamic and Robust Heterogeneous Spillovers

By:	Yicong Lin (Vrije Universiteit Amsterdam and Tinbergen Institute); André Lucas (Vrije Universiteit Amsterdam and Tinbergen Institute); Shiqi Ye (AMSS Center for Forecasting Science)
Abstract:	We introduce a new time-varying parameter spatial matrix autoregressive model that integrates matrix-valued time series, heterogeneous spillover effects, outlier robustness, and time-varying parameters in one unified framework. The model allows for separate dynamic spatial spillover effects across both the row and column dimensions of the matrix-valued observations. Robustness is introduced through innovations that follow a (conditionally heteroskedastic) matrix Student's $t$ distribution. In addition, the proposed model nests many existing spatial autoregressive models, yet remains easy to estimate using standard maximum likelihood methods. We establish the stationarity and invertibility of the model and the consistency and asymptotic normality of the maximum likelihood estimator. Our simulations reveal that the latent time-varying two-way spatial spillover effects can be successfully recovered, even under severe model misspecification. The model's usefulness is illustrated both in-sample and out-of-sample using two different applications: one in international trade, and the other based on global stock market data.
Keywords:	matrix-valued time series; spatial autoregression; time-varying parameters; score-driven dynamics
JEL:	C31 C32 C58
Date:	2025–07–04
URL:	https://d.repec.org/n?u=RePEc:tin:wpaper:20250042

Reexamining an old story: uncovering the hidden small sample bias in AR(1) models

By:	Dou, Zhiwei; Ariens, Sigert; Ceulemans, Eva; Lafit, Ginette
Abstract:	The first order autoregressive [AR(1)] model is widely used to investigate psychological dynamics. This study focuses on the estimation and inference of the autoregressive (AR) effect in AR(1) models under a limited sample size, a common scenario in psychological research. State-of-the-art estimators of the autoregressive effect are known to be biased when sample sizes are small. We analytically demonstrate the causes and consequences of this small sample bias on the estimation of the AR effect, its variance, and the AR(1) model’s intercept, particularly when using OLS. In addition, we review various bias correction methods proposed in the time series literature. A simulation study compares the OLS estimator with these correction methods in terms of estimation accuracy and inference. The main result indicates that the small sample bias of the OLS estimator of the autoregressive effect is a consequence of limited information and correcting for this bias without more information always induces a bias-variance trade-off. Nevertheless, correction methods discussed in this research may offer improved statistical power under moderate sample sizes when the primary research goal is hypothesis testing.
Date:	2025–06–17
URL:	https://d.repec.org/n?u=RePEc:osf:osfxxx:esfpy_v1
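
As a rough illustration of the small sample bias this abstract describes, the sketch below simulates AR(1) series and averages the OLS estimate of the autoregressive effect over many replications. It is not the authors' simulation design: the function names, the standard normal errors, the zero intercept, the burn-in length and the replication count are all assumptions chosen for brevity.

```python
import random


def simulate_ar1(phi, n, rng, burn_in=50):
    """Simulate n observations of y_t = phi * y_{t-1} + e_t with e_t ~ N(0, 1).

    The zero intercept and unit error variance are illustrative assumptions.
    """
    y = 0.0
    series = []
    for t in range(burn_in + n):
        y = phi * y + rng.gauss(0.0, 1.0)
        if t >= burn_in:  # discard the burn-in so the start value matters less
            series.append(y)
    return series


def ols_ar1(series):
    """OLS slope from regressing y_t on an intercept and y_{t-1}."""
    x = series[:-1]
    y = series[1:]
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    return s_xy / s_xx


def mean_ols_estimate(phi, n, reps=2000, seed=0):
    """Average OLS estimate of phi over `reps` simulated series of length n."""
    rng = random.Random(seed)
    total = sum(ols_ar1(simulate_ar1(phi, n, rng)) for _ in range(reps))
    return total / reps


if __name__ == "__main__":
    for n in (20, 200):
        print(n, round(mean_ols_estimate(0.5, n), 3))
```

With a true effect of 0.5, the average estimate at n = 20 should sit visibly below 0.5, and the gap should shrink at n = 200, in line with the abstract's point that the bias stems from limited information.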

Simulation Smoothing for State Space Models: An Extremum Monte Carlo Approach

By:	Karim Moussa (Vrije Universiteit Amsterdam and Tinbergen Institute)
Abstract:	This paper introduces a novel approach to simulation smoothing for nonlinear and non-Gaussian state space models. It allows for computing smoothed estimates of the states and nonlinear functions of the states, as well as visualizing the joint smoothing distribution. The approach combines extremum estimation with simulated data from the model to estimate the conditional distributions in the backward smoothing decomposition. The method is generally applicable and can be paired with various estimators of conditional distributions. Several applications to nonlinear models are presented for illustration. An empirical application based on a stochastic volatility model with stable errors highlights the flexibility of the approach.
Date:	2025–05–16
URL:	https://d.repec.org/n?u=RePEc:tin:wpaper:20250034

The Power Asymmetry in Fuzzy Regression Discontinuity Designs

By:	Daniel Kaliski; Michael P. Keane; Timothy Neal
Abstract:	In a fuzzy regression discontinuity (RD) design, the probability of treatment jumps when a running variable (R) passes a threshold (R0). Fuzzy RD estimates are obtained via a procedure analogous to two-stage least squares (2SLS), where an indicator I(R > R0) plays the role of the instrument. Recently, Keane and Neal (2023, 2024) showed that 2SLS t-tests suffer from a “power asymmetry”: 2SLS standard errors are spuriously small (large) when the 2SLS estimate is close to (far from) the OLS estimate. Here, we show that a similar problem arises in Fuzzy RD. Hence, if the endogeneity bias is positive, the Fuzzy RD t-test has little power to detect true negative effects, and inflated power to find false positives. The problem persists even if the instrument is very strong. To avoid this problem one should rely exclusively on the intent-to-treat (ITT) regression to assess significance of the treatment effect.
JEL:	C12 C14 C18
Date:	2025–06
URL:	https://d.repec.org/n?u=RePEc:nbr:nberwo:33972
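
To make the "analogous to 2SLS" remark concrete, the sketch below computes the intent-to-treat (ITT) jump, the first-stage jump in treatment probability, and their ratio (the fuzzy RD estimate) on simulated data. It is a simplified illustration, not the authors' procedure: it uses plain local means inside a fixed bandwidth rather than local linear regression, the outcome does not depend on the running variable, there is no endogeneity, and every name and parameter value is hypothetical.

```python
import random


def simulate_fuzzy_rd(n, effect, rng, cutoff=0.0):
    """Hypothetical design: crossing the cutoff raises P(treated) from 0.2 to 0.8."""
    data = []
    for _ in range(n):
        r = rng.uniform(-1.0, 1.0)            # running variable R
        p_treat = 0.8 if r > cutoff else 0.2   # jump in treatment probability at R0
        d = 1 if rng.random() < p_treat else 0
        y = effect * d + rng.gauss(0.0, 1.0)   # outcome with a constant treatment effect
        data.append((r, d, y))
    return data


def mean(values):
    return sum(values) / len(values)


def fuzzy_rd(data, cutoff=0.0, bandwidth=0.5):
    """Return (ITT jump, first-stage jump, ratio) using local means near the cutoff."""
    above = [(d, y) for r, d, y in data if cutoff < r <= cutoff + bandwidth]
    below = [(d, y) for r, d, y in data if cutoff - bandwidth <= r <= cutoff]
    itt = mean([y for _, y in above]) - mean([y for _, y in below])
    first_stage = mean([d for d, _ in above]) - mean([d for d, _ in below])
    return itt, first_stage, itt / first_stage


if __name__ == "__main__":
    sample = simulate_fuzzy_rd(20000, effect=2.0, rng=random.Random(1))
    print(fuzzy_rd(sample))
```

The ITT jump is the numerator of the ratio and involves no division by an estimated first stage, which is why it can serve as a separate significance check as the abstract recommends.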

Functional Location-Scale Models with Robust Observation-Driven Dynamics

By:	Yicong Lin (Vrije Universiteit Amsterdam and Tinbergen Institute); André Lucas (Vrije Universiteit Amsterdam and Tinbergen Institute)
Abstract:	We introduce a new class of location-scale models for dynamic functional data in arbitrary but fixed dimensions, where the location and scale functional parameters can evolve over time. A key feature of the parameter dynamics in these models is its observation-driven nature, where the one-step-ahead evolution is fully determined conditional on past observations, yet remains stochastic unconditionally. We estimate the model using a likelihood-based approach designed for sparsely observed data and establish the consistency and asymptotic normality of the underlying static parameters that govern the location-scale dynamics. The choice of objective function and the construction of the dynamics together shield the time-varying location and scale parameters from the potentially distorting effects of influential observations. Simulations reveal that our method can recover the unobserved location-scale dynamics from sparse data, even in the presence of model mis-specification and substantial outliers. We apply our framework to examine the intraday volatility dynamics of Pfizer stock returns during the COVID-19 pandemic, and PM2.5 concentrations measured by low-cost sensors across Europe. The proposed model exhibits robust performance in capturing dynamics for both datasets despite the presence of many large shocks.
Keywords:	time variation, location-scale, functional score-driven dynamics, sparse data, outlier robustness
JEL:	C22 C58 Q56
Date:	2025–04–17
URL:	https://d.repec.org/n?u=RePEc:tin:wpaper:20250027

Semiparametric Estimation of Probability Weighting Functions Implicit in Option Prices

By:	H. Peter Boswijk (University of Amsterdam and Tinbergen Institute); Jeroen Dalderop (University of Notre Dame); Roger J. A. Laeven (University of Amsterdam and Tinbergen Institute); Niels Marijnen (University of Amsterdam and Tinbergen Institute)
Abstract:	This paper develops a semiparametric estimation method that jointly identifies the probability weighting and utility functions implicit in option prices. Our econometric method avoids direct specification of the objective conditional return distributions, which are instead obtained by transforming the options’ implied risk-neutral distributions according to the posited rank-dependent utility model. We nonparametrically estimate the probability weighting function using the kernel density of suitable utility-adjusted probability integral transforms. The parameters of the utility function are estimated by maximizing the resulting profile likelihood. We establish the asymptotic properties of our estimation procedure, and demonstrate its good finite sample performance in Monte Carlo simulations. Empirical results based on S&P 500 index option prices and returns over the period 1996–2023 reveal the relevance of probability weighting, in particular at the monthly horizon where the weighting function is inverse-S shaped, which is robust to various specifications of the utility function.
Keywords:	Semiparametric inference; Probability weighting function; Profile likelihood; Kernel estimation; Options
JEL:	C14 C58 G13
Date:	2025–03–21
URL:	https://d.repec.org/n?u=RePEc:tin:wpaper:20250022

Chunk-Based Higher-Order Hierarchical Diagnostic Classification Models: A Maximum Likelihood Estimation Approach

By:	Lee, Minho; Suh, Yon Soo
Abstract:	This paper presents a class of higher-order diagnostic classification models (HO–DCMs) capable of capturing complex, nonlinear hierarchical relationships among attributes. Building on and extending prior work, we adopt a nominal response model framework in item response theory and leverage standard maximum likelihood estimation (MLE). In parallel, we demonstrate that sequential HO–DCMs can likewise be implemented within an MLE framework. Furthermore, we introduce a novel chunk-based approach for representing attribute hierarchies, wherein attributes are organized into cognitively coherent subgraphs (chunks) nested within a continuous general ability continuum. The performance of the models is validated through simulation studies evaluating parameter recovery, classification accuracy, and null rejection rates of goodness-of-fit measures. An empirical demonstration showcases how the proposed framework can be applied in practice, highlighting its advantages in model flexibility, interpretability, and the additional diagnostic insights it affords.
Date:	2025–06–17
URL:	https://d.repec.org/n?u=RePEc:osf:socarx:aney6_v1

Unfolding the network of peer grades: a latent variable approach

By:	Mignemi, Giuseppe; Chen, Yunxiao; Moustaki, Irini
Abstract:	Peer grading is an educational system in which students assess each other's work. It is commonly applied under Massive Open Online Course (MOOC) and offline classroom settings. With this system, instructors receive a reduced grading workload, and students enhance their understanding of course materials by grading others' work. Peer grading data have a complex dependence structure, for which all the peer grades may be dependent. This complex dependence structure is due to a network structure of peer grading, where each student can be viewed as a vertex of the network, and each peer grade serves as an edge connecting one student as a grader to another student as an examinee. This article introduces a latent variable model framework for analyzing peer grading data and develops a fully Bayesian procedure for its statistical inference. This framework has several advantages. First, when aggregating multiple peer grades, the average score and other simple summary statistics fail to account for grader effects and, thus, can be biased. The proposed approach produces more accurate model parameter estimates and, therefore, more accurate aggregated grades by modeling the heterogeneous grading behavior with latent variables. Second, the proposed method provides a way to assess each student's performance as a grader, which may be used to identify a pool of reliable graders or generate feedback to help students improve their grading. Third, our model may further provide insights into the peer grading system by answering questions such as whether a student who performs better in coursework also tends to be a more reliable grader. Finally, thanks to the Bayesian approach, uncertainty quantification is straightforward when inferring the student-specific latent variables as well as the structural parameters of the model. The proposed method is applied to two real-world datasets.
Keywords:	peer grading; rating model; cross-classified model; Bayesian modeling
JEL:	C1
Date:	2025–06–16
URL:	https://d.repec.org/n?u=RePEc:ehl:lserod:128146

Statistical Properties of Two Asymmetric Stochastic Volatility in Power Mean Models

By:	Antonis Demos (www.aueb.gr/users/demos)
Abstract:	We investigate the statistical properties of two autoregressive normal asymmetric SV models with possibly time-varying risk premia. Although the two models seem very similar, they turn out to possess quite different statistical properties. The derived properties can be employed to develop tests or to check for up to fourth-order stationarity, which is important for the asymptotic properties of various estimators.
Date:	2025–06–26
URL:	https://d.repec.org/n?u=RePEc:aue:wpaper:2546

Uncertainty in Empirical Economics

By:	Frank Schorfheide; Zhiheng You
Abstract:	Econometricians invest substantial effort in constructing standard errors that yield valid inference under a hypothetical data-generating process. This paper asks a fundamental question: Are the uncertainty statements reported by applied researchers consistent with empirical frequencies? The short answer is no. Drawing on the forecasting literature, we predict estimates from “new” studies using estimates from corresponding baseline studies. By doing this across a large number of study groups and linking parameters through a hierarchical model, we compare stated probabilities to observed empirical frequencies. Alignment occurs only under limited external validity, namely, that the studies estimate different parameters.
JEL:	C11 C18 C21
Date:	2025–06
URL:	https://d.repec.org/n?u=RePEc:nbr:nberwo:33962

On a Definition of Trend

By:	Silva Lopes, Artur
Abstract:	Several reasons explain the absence of a precise, complete and widely accepted definition of trend for economic time series, and the existence of two major disparate models is one of the most important. A recent operational proposal tried to overcome this difficulty resorting to a statistical test with good asymptotic properties against both those alternatives. However, this proposal may be criticized because it rests on a tool for inductive, not deductive, inference. Besides criticizing this recent definition, drawing heavily on previous ones, the paper provides a new proposal, more complete, containing several necessary but no sufficient condition(s).
Keywords:	trend; time series; macroeconomy; statistical testing
JEL:	B41 C12 C18 C22
Date:	2025–04–14
URL:	https://d.repec.org/n?u=RePEc:pra:mprapa:125073

On the Correlations in Linearized Multivariate Stochastic Volatility Models

By:	Karim Moussa (Vrije Universiteit Amsterdam and Tinbergen Institute)
Abstract:	In the analysis of multivariate stochastic volatility models, many estimation procedures begin by transforming the data, taking the logarithm of the squared returns to obtain a linear state space model. A well-known series representation links the correlations between elements of the observation error in the actual and linearized forms of the model. This note derives a closed-form expression for the series and discusses its statistical implications. Additionally, it offers a new interpretation of the correlations in the linearized model.
Date:	2025–03–21
URL:	https://d.repec.org/n?u=RePEc:tin:wpaper:20250021

The Stick-Breaking and Ordering Representations of Compositional Data: Copulas and Regression models

By:	Olivier P. Faugeras (TSE-R - Toulouse School of Economics - UT Capitole - Université Toulouse Capitole - UT - Université de Toulouse - EHESS - École des hautes études en sciences sociales - CNRS - Centre National de la Recherche Scientifique - INRAE - Institut National de Recherche pour l’Agriculture, l’Alimentation et l’Environnement)
Abstract:	Compositional Data (CoDa) is usually viewed as data on the simplex and is studied via a log-ratio analysis, following the classical work of Aitchison (1986). We propose to bring to the fore an alternative view of CoDa as a stick-breaking process, an approach which originates from Bayesian nonparametrics. The first stick-breaking approach gives rise to a view of CoDa as ordered statistics, from which we can derive "stick-ordered" distributions. The second approach is based on a rescaled stick-breaking transformation, and gives rise to a geometric view of CoDa as a free unit cube. The latter allows the introduction of copula and regression models, which are useful for studying the internal or external dependence of CoDa. These stick-breaking representations make it possible to deal effectively and simply with CoDa with zeroes. We establish connections with other topics of probability and statistics like i) spacings and order statistics, ii) Bayesian nonparametrics and Dirichlet distributions, iii) neutrality, iv) hazard rates and the product integral, v) mixability.
Keywords:	Compositional data analysis, Stick-breaking representation, Copula, Regression, Distribution
Date:	2025
URL:	https://d.repec.org/n?u=RePEc:hal:journl:hal-05064979
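
The sketch below shows one common form of a stick-breaking map from the simplex to the unit cube and its inverse, to illustrate why zero components cause no difficulty. It is an assumption that this matches the rescaled transformation in the paper; the function names, the tolerance, and the convention of returning 0 once the stick is used up are choices made here for illustration only.

```python
def stick_breaking(x, tol=1e-12):
    """Map a composition x (non-negative parts summing to 1) to the unit cube.

    Each coordinate is the share of the remaining stick taken by that part:
    v_i = x_i / (1 - x_1 - ... - x_{i-1}). The last part is implied by the rest.
    """
    remaining = 1.0
    v = []
    for part in x[:-1]:
        # Convention (an assumption): report 0 once the stick is exhausted.
        v.append(part / remaining if remaining > tol else 0.0)
        remaining -= part
    return v


def inverse_stick_breaking(v):
    """Rebuild the composition from the cube coordinates."""
    remaining = 1.0
    x = []
    for share in v:
        piece = share * remaining  # break off a share of what is left
        x.append(piece)
        remaining -= piece
    x.append(remaining)            # the final part is the leftover stick
    return x


if __name__ == "__main__":
    composition = [0.2, 0.0, 0.3, 0.5]  # contains an exact zero
    cube_point = stick_breaking(composition)
    print(cube_point)
    print(inverse_stick_breaking(cube_point))
```

Unlike a log-ratio transform, nothing here takes the logarithm of a part, so a zero component simply maps to a zero coordinate.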

Item response theory—a statistical framework for educational and psychological measurement

By:	Chen, Yunxiao; Li, Xiaoou; Liu, Jingchen; Ying, Zhiliang
Abstract:	Item response theory (IRT) has become one of the most popular statistical models for psychometrics, a field of study concerned with the theory and techniques of psychological measurement. The IRT models are latent factor models tailored to the analysis, interpretation and prediction of individuals’ behaviors in answering a set of measurement items that typically involve categorical response data. Many important questions of measurement are directly or indirectly answered through the use of IRT models, including scoring individuals’ test performances, validating a test scale, linking two tests, among others. This paper provides a review of item response theory, including its statistical framework and psychometric applications. We establish connections between item response theory and related topics in statistics, including empirical Bayes, nonparametric methods, matrix completion, regularized estimation and sequential analysis. Possible future directions of IRT are discussed from the perspective of statistical learning.
Keywords:	psychometrics; measurement theory; factor analysis; item response theory; latent trait; validity; reliability
JEL:	C1
Date:	2025–06–02
URL:	https://d.repec.org/n?u=RePEc:ehl:lserod:120810

Improving Score-Driven Density Forecasts with an Application to Implied Volatility Surface Dynamics

By:	Xia Zou (Vrije Universiteit Amsterdam and Tinbergen Institute); Yicong Lin (Vrije Universiteit Amsterdam and Tinbergen Institute); André Lucas (Vrije Universiteit Amsterdam and Tinbergen Institute)
Abstract:	Point forecasts of score-driven models have been shown to behave at par with those of state-space models under a variety of circumstances. We show, however, that density rather than point forecasts of plain-vanilla score-driven models substantially underperform their state-space counterparts in a factor model context. We uncover the origins of this phenomenon and show how a simple adjustment of the measurement density of the score-driven model can put score-driven and state-space models approximately back on an equal footing again. The score-driven models can subsequently easily be extended with non-Gaussian features to fit the data even better without complicating parameter estimation. We illustrate our findings using a factor model for the implied volatility surface of S&P500 index options data.
JEL:	C32 C38
Date:	2025–05–30
URL:	https://d.repec.org/n?u=RePEc:tin:wpaper:20250036

Clustering Extreme Value Indices in Large Panels

By:	Chenhui Wang (Vrije Universiteit Amsterdam); Juan Juan Cai (Vrije Universiteit Amsterdam and Tinbergen Institute); Yicong Lin (Vrije Universiteit Amsterdam and Tinbergen Institute); Julia Schaumburg (Vrije Universiteit Amsterdam and Tinbergen Institute)
Abstract:	We analyze a large panel of units grouped by shared extreme value indices (EVIs) and aim to identify these unknown groups. To achieve this, we order the Hill estimates of individual EVIs and segment them by minimizing the total squared distance between each estimate and its corresponding group average. We show that our method consistently recovers group memberships, and we establish the asymptotic normality of the proposed group estimator. The group estimator attains a faster convergence rate than the individual Hill estimator, leading to improved estimation accuracy. Simulation results reveal that our method achieves high empirical segmentation accuracy, and the resulting group EVI estimates substantially reduce mean absolute errors compared to individual estimates. We apply the proposed method to analyze a rainfall dataset collected from 4,735 stations across Europe, covering the winter seasons from January 1, 1950, to December 31, 2020, and find statistically significant evidence of an increase in the highest and a decrease in the lowest group EVI estimates, suggesting growing variability and intensification of extreme rainfall events across Europe.
JEL:	C1 C23 C38
Date:	2025–04–25
URL:	https://d.repec.org/n?u=RePEc:tin:wpaper:20250029

Conditional Fat Tails and Scale Dynamics for Intraday Discrete Price Changes

By:	Daan Schoemaker (Vrije Universiteit Amsterdam and Tinbergen Institute); André Lucas (Vrije Universiteit Amsterdam and Tinbergen Institute); Anne Opschoor (Vrije Universiteit Amsterdam and Tinbergen Institute)
Abstract:	We investigate the conditional tail behaviour of asset price changes at high (10-second) frequencies using a new dynamic model for integer-valued tick data. The model has fat tails, scale dynamics, and allows for possible over- or under-representation of zero price changes. The model can be easily estimated using standard maximum likelihood methods and accommodates both polynomially (fat) and geometrically declining tails. In an application to stock, cryptocurrency and foreign exchange markets during the COVID-19 crisis, we find that conditional fat-tailedness is empirically important for many assets, even at such high frequencies. The new model outperforms the thin-tailed (zero-initiated) dynamic benchmark Skellam model by a wide margin, both in-sample and out-of-sample.
Keywords:	high frequency tick data, polynomial tails, discrete data, Hurwitz zeta function, score-driven dynamics
JEL:	C22 C46 C58
Date:	2025–06–26
URL:	https://d.repec.org/n?u=RePEc:tin:wpaper:20250039

Limited Self-Knowledge and Survey Response Behavior

By:	Armin Falk; Luca Henkel; Thomas Neuber; Philipp Strack
Abstract:	We study response behavior in surveys and propose a method to identify and improve the informativeness of survey evidence. First, we develop a choice model of survey response behavior under the assumption that responses imperfectly reveal respondents' characteristics due to limited self-knowledge, inattention, or lack of engagement. Respondents receive individual-specific signals about their characteristics and choose their responses accordingly. We identify the conditions under which this process leads to biased inference from survey evidence and demonstrate how focusing on respondents with high signal precision mitigates bias. Importantly, we show that a respondent's signal precision can be inferred from observed response patterns. Second, based on these insights, we develop a consistent and unbiased estimator for a respondent's signal precision. Third, we provide experimental and survey evidence concerning the performance of the model and estimator. We experimentally test the model's key predictions in a context where the researcher knows the true characteristics. The data confirm both the model's predictions and the estimator's validity. Using a large survey, we show how our estimator can be used to improve survey evidence. Our estimator significantly increases the explanatory power of self-assessments and their association with behavior, and performs well relative to alternative methods proposed in the literature.
Keywords:	survey research, rational inattention, online experiment, non-cognitive skills, preferences
JEL:	C83 D83 C91 D91 J24
Date:	2025
URL:	https://d.repec.org/n?u=RePEc:ces:ceswps:_11968

Estimated Monthly National Accounts for the United States

By:	Mr. Philip Barrett
Abstract:	I jointly estimate monthly series for GDP and eight subcomponents for the US since 1950. The series match 1) quarterly national accounts equivalents, 2) exact data on monthly consumption, and 3) past relationships with other monthly indicators. I estimate the Kalman filter parameters by GMM, allowing fast calculation of confidence intervals for monthly estimates including parameter uncertainty, and validate the confidence intervals. After 1970 standard errors are tight, less than 0.3pp of GDP, and point estimates informative, with standard deviations four times the standard error. I provide confidence intervals for recessions and show that output peaks line up well with the onset of NBER recessions, but troughs often predate NBER equivalents.
Keywords:	Kalman Filter; GDP; recession; GMM
Date:	2025–07–04
URL:	https://d.repec.org/n?u=RePEc:imf:imfwpa:2025/134

Clustering Approaches for Mixed‐Type Data: A Comparative Study

By:	Badih Ghattas (AMSE - Aix-Marseille Sciences Economiques - EHESS - École des hautes études en sciences sociales - AMU - Aix Marseille Université - ECM - École Centrale de Marseille - CNRS - Centre National de la Recherche Scientifique); Alvaro Sanchez San-Benito (Airbus Helicopters - Aeroport International de Marseille-Provence)
Abstract:	Clustering is widely used in unsupervised learning to find homogeneous groups of observations within a dataset. However, clustering mixed-type data remains a challenge, as few existing approaches are suited for this task. This study presents the state-of-the-art of these approaches and compares them using various simulation models. The compared methods include the distance-based approaches k-prototypes, PDQ, and convex k-means, and the probabilistic methods KAy-means for MIxed LArge data (KAMILA), the mixture of Bayesian networks (MBNs), and latent class model (LCM). The aim is to provide insights into the behavior of different methods across a wide range of scenarios by varying some experimental factors such as the number of clusters, cluster overlap, sample size, dimension, proportion of continuous variables in the dataset, and clusters' distribution. The degree of cluster overlap and the proportion of continuous variables in the dataset and the sample size have a significant impact on the observed performances. When strong interactions exist between variables alongside an explicit dependence on cluster membership, none of the evaluated methods demonstrated satisfactory performance. In our experiments KAMILA, LCM, and k-prototypes exhibited the best performance, with respect to the adjusted Rand index (ARI). All the methods are available in R.
Keywords:	Bayesian networks, clustering, KAMILA, LCM, mixed-type data
Date:	2025–01
URL:	https://d.repec.org/n?u=RePEc:hal:journl:hal-05069567

The Retreat from Causality: Local Projections and Econometrics Without Economics

By:	Chung, Sungyup
Abstract:	This paper critically examines the local projections method introduced by Jordà (2005), questioning its validity in the absence of a well-defined causal framework. Although local projections have been widely adopted for their technical simplicity and perceived robustness, this study argues that such methods may obscure, rather than clarify, underlying causal mechanisms. At the heart of the critique is the method's neglect of the recursive and interdependent structure of the data-generating process, which results in logically inconsistent assumptions and potentially distorted forecasts.
Keywords:	Local Projection, Causality, Econometrics
JEL:	A11 A12 C18 C50
Date:	2025–04–30
URL:	https://d.repec.org/n?u=RePEc:pra:mprapa:125198

Measuring and Explaining the CDS-Bond Basis Term-Structure Shape and Dynamics

By:	Yonas Khanna (ING Bank); André Lucas (Vrije Universiteit Amsterdam and Tinbergen Institute); Norman Seeger (Vrije Universiteit Amsterdam and Tinbergen Institute)
Abstract:	The CDS-bond basis quantifies the difference in risk premia between credit default swap (CDS) and bond markets. It is hard to measure at the individual firm level given substantial missing-value problems (30%-100%) in either or both markets, even for highly liquid blue-chip financial firms. We propose a novel imputation approach to obtain full historical firm-level basis term-structures across all maturities. Our approach can accommodate different term-structure interpolation methods, including Nearest-Neighbor, spline, and Nelson-Siegel interpolation. Using the new methodology, we construct the full history of the 2011-2021 JP Morgan (JPM) basis term-structure and use it to analyze its empirical determinants. We find that factors like market liquidity, funding liquidity, counterparty risk, and the default premium all impact the basis term-structure, though not all at the same moment in time. All factors are statistically significant during the Covid-19 pandemic. The various empirical limits-to-arbitrage proxies correlate differently with different parts of the basis term-structure, stressing the need to model the full basis term-structure rather than assuming it to be flat. The results are robust for other blue-chip financials, each time requiring the full basis term-structure imputation approach as proposed in this paper.
Keywords:	CDS-bond basis, missing value imputation, high-dimensional panel data, multi-curve modeling, time-varying spline interpolation, dynamic Nelson-Siegel, Kalman filter
JEL:	C32 C33 C58 G12 G32
Date:	2025–05–30
URL:	https://d.repec.org/n?u=RePEc:tin:wpaper:20250037

This nep-ecm issue is ©2025 by Sune Karlsson. It is provided as is without any express or implied warranty. It may be freely redistributed in whole or in part for any purpose. If distributed in part, please include this notice.

General information on the NEP project can be found at https://nep.repec.org. For comments please write to the director of NEP, Marco Novarese at <director@nep.repec.org>. Put “NEP” in the subject, otherwise your mail may be rejected.

NEP’s infrastructure is sponsored by the Griffith Business School of Griffith University in Australia.