
New Economics Papers on Econometrics 
By:  Jiatong Li; Hongqiang Yan 
Abstract:  We develop uniform inference for high-dimensional threshold regression parameters and valid inference for the threshold parameter in this paper. We first establish oracle inequalities for prediction errors and $\ell_1$ estimation errors for the Lasso estimator of the slope parameters and the threshold parameter, allowing for heteroskedastic non-sub-Gaussian error terms and non-sub-Gaussian covariates. Next, we derive the asymptotic distribution of tests involving an increasing number of slope parameters by debiasing (or desparsifying) the scaled Lasso estimator. The asymptotic distribution of tests without the threshold effect is identical to that with a fixed effect. Moreover, we perform valid inference for the threshold parameter using a subsampling method. Finally, we conduct simulation studies to demonstrate the performance of our method in finite samples. 
Date:  2024–04 
URL:  http://d.repec.org/n?u=RePEc:arx:papers:2404.08105&r=ecm 
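A minimal sketch of the first step described above, profiling a penalized Lasso objective over candidate thresholds; the synthetic data, scikit-learn's coordinate-descent Lasso, the grid, and the penalty level are all illustrative assumptions, not the authors' exact procedure:

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n, p = 400, 20
X = rng.normal(size=(n, p))
q = rng.uniform(size=n)                 # threshold (forcing) variable
beta = np.zeros(p); beta[0] = 1.0       # sparse baseline slopes
delta = np.zeros(p); delta[1] = 2.0     # sparse threshold shift
gamma0 = 0.5                            # true threshold
y = X @ beta + (X @ delta) * (q > gamma0) + 0.3 * rng.normal(size=n)

def threshold_lasso(X, y, q, grid, alpha=0.05):
    """Fit a Lasso at each candidate threshold and keep the threshold
    value that minimizes the penalized objective (profiled Lasso)."""
    best = None
    for g in grid:
        Z = np.hstack([X, X * (q > g)[:, None]])   # slopes + regime shifts
        m = Lasso(alpha=alpha, fit_intercept=False).fit(Z, y)
        r = y - m.predict(Z)
        obj = r @ r / (2 * len(y)) + alpha * np.abs(m.coef_).sum()
        if best is None or obj < best[0]:
            best = (obj, g, m.coef_)
    return best[1], best[2]

gamma_hat, coef_hat = threshold_lasso(X, y, q, np.linspace(0.1, 0.9, 33))
```

With a strong threshold effect the profiled objective is minimized near the true split, while the slope estimates stay sparse.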
By:  Dmitry Arkhangelsky; Aleksei Samkov 
Abstract:  We study the estimation of treatment effects of a binary policy in environments with a staggered treatment rollout. We propose a new estimator, the Sequential Synthetic Difference-in-Differences (Sequential SDiD) estimator, and establish its theoretical properties in a linear model with interactive fixed effects. Our estimator is based on sequentially applying the original SDiD estimator proposed in Arkhangelsky et al. (2021) to appropriately aggregated data. To establish the theoretical properties of our method, we compare it to an infeasible OLS estimator based on knowledge of the subspaces spanned by the interactive fixed effects. We show that this OLS estimator has a sequential representation and use this result to show that it is asymptotically equivalent to the Sequential SDiD estimator. This result implies the asymptotic normality of our estimator along with corresponding efficiency guarantees. The method developed in this paper presents a natural alternative to conventional DiD strategies in staggered adoption designs. 
Date:  2024–03 
URL:  http://d.repec.org/n?u=RePEc:arx:papers:2404.00164&r=ecm 
By:  Vogelsang, Timothy J. (Department of Economics, Michigan State University); Wagner, Martin (Department of Economics, University of Klagenfurt, Bank of Slovenia, Ljubljana and Institute for Advanced Studies, Vienna) 
Abstract:  This paper shows that the integrated modified OLS (IM-OLS) estimator developed for cointegrating linear regressions in Vogelsang and Wagner (2014a) can be straightforwardly extended to cointegrating multivariate polynomial regressions. These are regression models that include deterministic variables, integrated processes, and products of (non-negative) integer powers of these variables as regressors. The stationary errors are allowed to be serially correlated and the regressors are allowed to be endogenous. The IM-OLS estimator is tuning-parameter-free and does not require the estimation of any long-run variances. A scalar long-run variance, however, has to be estimated and scaled out when using IM-OLS for inference. In this respect, we consider both standard asymptotic inference and fixed-b inference. Fixed-b inference requires that the regression model is of full design. The results may be particularly interesting for specification testing of cointegrating relationships, with RESET-type specification tests following immediately. The simulation section also zooms in on RESET specification testing and illustrates that the performance of IM-OLS is qualitatively comparable to its performance in cointegrating linear regressions. 
Keywords:  Cointegration, fixed-b asymptotics, IM-OLS, multivariate polynomials, nonlinearity, RESET 
JEL:  C12 C13 C32 
Date:  2024–04 
URL:  http://d.repec.org/n?u=RePEc:ihs:ihswps:53&r=ecm 
By:  Bai, Jushan; Wang, Peng 
Abstract:  We propose a framework for causal inference using factor models. We base our identification strategy on the assumption that policy interventions cause structural breaks in the factor loadings for the treated units. The method allows for heterogeneous trends and is easy to implement. We compare our method with the synthetic control methods of Abadie et al. (2010, 2015) and obtain similar results. Additionally, we provide confidence intervals for the causal effects. Our approach expands the toolset for causal inference. 
Keywords:  synthetic control, difference-in-differences, structural breaks, latent factors 
JEL:  C1 C23 C33 C51 
Date:  2024–03–31 
URL:  http://d.repec.org/n?u=RePEc:pra:mprapa:120585&r=ecm 
By:  Lena S. Bjerkander; Jonas Dovern; Hans Manner 
Abstract:  We review tests of null hypotheses that consist of many subsidiary null hypotheses, including tests that have not received much attention in the econometrics literature. We study test performance in the context of specification testing for linear regressions based on a Monte Carlo study. Overall, parametric tests that use (transformed) P-values corresponding to all subsidiary null hypotheses outperform the well-known minimum P-value test and a recently proposed test that relies on the nonparametric estimation of the joint density of all subsidiary test statistics. 
Keywords:  combined hypothesis, P-value, multiple hypothesis testing, Fisher test 
JEL:  C12 C15 
Date:  2024 
URL:  http://d.repec.org/n?u=RePEc:ces:ceswps:_11027&r=ecm 
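The Fisher test named in the keywords is the simplest of the parametric tests that transform all subsidiary P-values: under the joint null with independent subsidiary tests, T = -2·Σ log p_i is chi-squared with 2k degrees of freedom. A small sketch using SciPy's chi-squared tail and made-up P-values:

```python
import numpy as np
from scipy import stats

def fisher_combined(pvals):
    """Fisher's method: combine k subsidiary P-values via T = -2*sum(log p),
    chi-squared with 2k degrees of freedom under the joint null
    (independent subsidiary tests)."""
    p = np.asarray(pvals, dtype=float)
    T = -2.0 * np.log(p).sum()
    return stats.chi2.sf(T, df=2 * len(p))

# three unremarkable P-values: the combined test should not reject
p_null = fisher_combined([0.40, 0.60, 0.55])
# one strong subsidiary rejection pulls the combined P-value down
p_alt = fisher_combined([0.001, 0.60, 0.55])
```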
By:  Matteo Mogliani; Anna Simoni 
Abstract:  We propose a Machine Learning approach for optimal macroeconomic forecasting in a high-dimensional setting with covariates presenting a known group structure. Our model encompasses forecasting settings with many series, mixed frequencies, and unknown nonlinearities. We introduce into time-series econometrics the concept of bi-level sparsity, i.e. sparsity holds at both the group level and within groups, and we assume the true model satisfies this assumption. We propose a prior that induces bi-level sparsity, and the corresponding posterior distribution is demonstrated to contract at the minimax-optimal rate, recover the model parameters, and have a support that includes the support of the model asymptotically. Our theory allows for correlation between groups, while predictors in the same group can be characterized by strong covariation as well as common characteristics and patterns. Finite-sample performance is illustrated through comprehensive Monte Carlo experiments and a real-data nowcasting exercise for the US GDP growth rate. 
Date:  2024–04 
URL:  http://d.repec.org/n?u=RePEc:arx:papers:2404.02671&r=ecm 
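The bi-level sparsity notion above has a well-known frequentist analogue in the sparse-group lasso, whose proximal operator zeroes out whole groups and individual coefficients within surviving groups. The sketch below illustrates only that penalty structure, not the paper's Bayesian prior:

```python
import numpy as np

def sgl_prox(beta, groups, lam1, lam2):
    """Proximal operator of the sparse-group-lasso penalty
    lam1*||b||_1 + lam2*sum_g ||b_g||_2: element-wise soft-thresholding
    followed by group-level soft-thresholding."""
    out = np.zeros_like(beta)
    for g in groups:
        b = np.sign(beta[g]) * np.maximum(np.abs(beta[g]) - lam1, 0.0)
        norm = np.linalg.norm(b)
        if norm > lam2:
            out[g] = (1.0 - lam2 / norm) * b   # group survives, shrunk
        # else: the entire group is set to zero
    return out

beta = np.array([3.0, -0.05, 0.0, 0.2, -0.1, 0.05])
groups = [slice(0, 3), slice(3, 6)]
b = sgl_prox(beta, groups, lam1=0.1, lam2=0.5)
```

Here the first group survives with one nonzero coordinate (within-group sparsity) while the second group is eliminated entirely (group-level sparsity).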
By:  Nir Billfeld; Moshe Kim 
Abstract:  We develop a novel identification strategy as well as a new estimator for context-dependent causal inference in nonparametric triangular models with nonseparable disturbances. Departing from common practice, our analysis does not rely on the strict monotonicity assumption. Our key contribution lies in leveraging diffusion models to formulate the structural equations as a system evolving from noise accumulation to account for the influence of the latent context (confounder) on the outcome. Our identification strategy involves a system of Fredholm integral equations expressing the distributional relationship between a latent context variable and a vector of observables. These integral equations involve an unknown kernel and are governed by a set of structural form functions, inducing a non-monotonic inverse problem. We prove that if the kernel density can be represented as an infinite mixture of Gaussians, then there exists a unique solution for the unknown function. This is a significant result, as it shows that it is possible to solve a non-monotonic inverse problem even when the kernel is unknown. On the methodological front, we develop a novel and enriched Contaminated Generative Adversarial (Neural) Network (CONGAN), which we provide as a solution to the non-monotonic inverse problem. 
Date:  2024–04 
URL:  http://d.repec.org/n?u=RePEc:arx:papers:2404.05021&r=ecm 
By:  James A. Duffy; Sophocles Mavroeidis 
Abstract:  While it is widely recognised that linear (structural) VARs may omit important features of economic time series, the use of nonlinear SVARs has to date been almost entirely confined to the modelling of stationary time series, because of a lack of understanding as to how common stochastic trends may be accommodated within nonlinear VAR models. This has unfortunately circumscribed the range of series to which such models can be applied (and/or required that these series first be transformed to stationarity, a potential source of misspecification) and prevented the use of long-run identifying restrictions in these models. To address these problems, we develop a flexible class of additively time-separable nonlinear SVARs, which subsume models with threshold-type endogenous regime switching, of both the piecewise linear and smooth transition varieties. We extend the Granger-Johansen representation theorem to this class of models, obtaining conditions that specialise exactly to the usual ones when the model is linear. We further show that, as a corollary, these models are capable of supporting the same kinds of long-run identifying restrictions as are available in linear cointegrated SVARs. 
Date:  2024–04 
URL:  http://d.repec.org/n?u=RePEc:arx:papers:2404.05349&r=ecm 
By:  Anirban Mukherjee; Hannah Hanwen Chang 
Abstract:  Social science research often hinges on the relationship between categorical variables and outcomes. We introduce CAVIAR, a novel method for embedding categorical variables that assume values in a high-dimensional ambient space but are sampled from an underlying manifold. Our theoretical and numerical analyses outline the challenges posed by such categorical variables in causal inference. Specifically, dynamically varying and sparse levels can lead to violations of the Donsker conditions and a failure of the estimation functionals to converge to a tight Gaussian process. Traditional approaches, including the exclusion of rare categorical levels and principled variable-selection models like LASSO, fall short. CAVIAR embeds the data into a lower-dimensional global coordinate system. The mapping can be derived from both structured and unstructured data, and ensures stable and robust estimates through dimensionality reduction. In a dataset of direct-to-consumer apparel sales, we illustrate how high-dimensional categorical variables, such as zip codes, can be succinctly represented, facilitating inference and analysis. 
Date:  2024–04 
URL:  http://d.repec.org/n?u=RePEc:arx:papers:2404.04979&r=ecm 
By:  Artem Kraevskiy; Artem Prokhorov; Evgeniy Sokolovskiy 
Abstract:  We develop and apply a new online early warning system (EWS) for what is known in machine learning as concept drift, in economics as a regime shift, and in statistics as a change point. The system goes beyond the linearity assumed in many conventional methods, and is robust to heavy tails and tail dependence in the data, making it particularly suitable for emerging markets. The key component is an effective change-point detection mechanism for the conditional entropy of the data, rather than for a particular indicator of interest. Combined with recent advances in machine learning methods for high-dimensional random forests, the mechanism is capable of finding significant shifts in information transfer between interdependent time series when traditional methods fail. We explore when this happens using simulations and we provide illustrations by applying the method to Uzbekistan's commodity and equity markets as well as to Russia's equity market in 2021–2023. 
Date:  2024–04 
URL:  http://d.repec.org/n?u=RePEc:arx:papers:2404.03319&r=ecm 
By:  Peter Reinhard Hansen; Chen Tong 
Abstract:  We introduce a new class of multivariate heavy-tailed distributions that are convolutions of heterogeneous multivariate t-distributions. Unlike commonly used heavy-tailed distributions, the multivariate convolution-t distributions embody cluster structures with flexible nonlinear dependencies and heterogeneous marginal distributions. Importantly, convolution-t distributions have simple density functions that facilitate estimation and likelihood-based inference. The characteristic features of convolution-t distributions are found to be important in an empirical analysis of realized volatility measures and help identify their underlying factor structure. 
Date:  2024–03 
URL:  http://d.repec.org/n?u=RePEc:arx:papers:2404.00864&r=ecm 
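A convolution-t draw is simply a sum of independent multivariate t vectors with heterogeneous degrees of freedom. The sketch below simulates one such convolution and checks that the margins are heavier-tailed than Gaussian; the loadings and degrees of freedom are arbitrary illustrations, not the paper's parameterization:

```python
import numpy as np

def mvt(nu, chol, n, rng):
    """Multivariate Student-t draws: Gaussian rows scaled by sqrt(nu/chi2_nu)."""
    z = rng.normal(size=(n, chol.shape[0])) @ chol.T
    s = np.sqrt(nu / rng.chisquare(nu, size=n))
    return z * s[:, None]

rng = np.random.default_rng(2)
n, d = 100_000, 4
# convolution of two heterogeneous t components, each with its own
# degrees of freedom and loading matrix (simple diagonal loadings here)
X = mvt(9.0, np.eye(d), n, rng) + mvt(20.0, 0.5 * np.eye(d), n, rng)

x0 = X[:, 0] - X[:, 0].mean()
kurt = (x0**4).mean() / (x0**2).mean()**2   # ~3 for a Gaussian margin
```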
By:  Eric Auerbach; Yong Cai; Ahnaf Rafi 
Abstract:  Researchers who estimate treatment effects using a regression discontinuity design (RDD) typically assume that there are no spillovers between the treated and control units. This may be unrealistic. We characterize the estimand of RDD in a setting where spillovers occur between units that are close in their values of the running variable. Under the assumption that spillovers are linear-in-means, we show that the estimand depends on the ratio of two terms: (1) the radius over which spillovers occur and (2) the choice of bandwidth used for the local linear regression. Specifically, RDD estimates the direct treatment effect when the radius is of larger order than the bandwidth, and the total treatment effect when the radius is of smaller order than the bandwidth. In the more realistic regime where the radius is of similar order to the bandwidth, the RDD estimand is a mix of the above effects. To recover direct and spillover effects, we propose incorporating estimated spillover terms into the local linear regression, the local analog of a peer-effects regression. We also clarify the settings under which the donut-hole RD is able to eliminate the effects of spillovers. 
Date:  2024–04 
URL:  http://d.repec.org/n?u=RePEc:arx:papers:2404.06471&r=ecm 
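The proposed fix, adding an estimated linear-in-means spillover term to the local linear regression, can be sketched on simulated data; the DGP, radius, and bandwidth below are illustrative assumptions rather than the authors' empirical setup:

```python
import numpy as np

rng = np.random.default_rng(3)
n, c, r, h = 5000, 0.0, 0.05, 0.2          # cutoff, spillover radius, bandwidth
x = rng.uniform(-1.0, 1.0, n)              # running variable
D = (x >= c).astype(float)                 # treatment at the cutoff
# linear-in-means spillover: mean treatment among units within radius r
S = np.array([D[np.abs(x - xi) <= r].mean() for xi in x])
tau_d, tau_s = 1.0, 0.5                    # direct and spillover effects
y = 0.8 * x + tau_d * D + tau_s * S + 0.1 * rng.normal(size=n)

# local linear regression inside the bandwidth, with the estimated
# spillover term included as an extra regressor
w = np.abs(x - c) <= h
Z = np.column_stack([np.ones(w.sum()), D[w], x[w] - c, (x[w] - c) * D[w], S[w]])
coef, *_ = np.linalg.lstsq(Z, y[w], rcond=None)
tau_d_hat, tau_s_hat = coef[1], coef[4]
```

Identification of the spillover coefficient comes from the ramp in S over the interval of width 2r around the cutoff, where S departs from a function of D and x alone.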
By:  Cisil Sarisoy; Bas J.M. Werker 
Abstract:  This paper analyzes the properties of expected return estimators on individual assets implied by the linear factor models of asset pricing, i.e., the product of β and λ. We provide the asymptotic properties of factor-model-based expected return estimators, which yield the standard errors for risk premium estimators for individual assets. We show that using factor-model-based risk premium estimates leads to sizable precision gains compared to using historical averages. Finally, inference about expected returns does not suffer from a small-beta bias when factors are traded. The more precise factor-model-based estimates of expected returns translate into sizable improvements in the out-of-sample performance of optimal portfolios. 
Keywords:  Cross section of expected returns; Risk premium; Small β’s 
JEL:  C13 G11 C38 
Date:  2024–03–28 
URL:  http://d.repec.org/n?u=RePEc:fip:fedgfe:202414&r=ecm 
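The expected-return estimator above is the product of estimated betas and risk premia; for a traded factor, λ can be estimated by the factor's sample mean. A simulation sketch (all numbers illustrative) comparing the precision of β̂λ̂ against historical averages:

```python
import numpy as np

rng = np.random.default_rng(4)
T, N, K = 600, 50, 1
lam = np.full(K, 0.5)                       # true factor risk premium
f = lam + rng.normal(size=(T, K))           # traded factor returns
beta = rng.uniform(0.5, 1.5, size=(N, K))   # factor loadings
r = f @ beta.T + 2.0 * rng.normal(size=(T, N))

# time-series regressions give beta_hat; a traded factor's risk premium
# is estimated by the factor's sample mean
A = np.column_stack([np.ones(T), f])
beta_hat = np.linalg.lstsq(A, r, rcond=None)[0][1:].T
lam_hat = f.mean(axis=0)

mu_factor = beta_hat @ lam_hat              # factor-model-based estimates
mu_hist = r.mean(axis=0)                    # historical averages
true_mu = beta @ lam
mse_factor = ((mu_factor - true_mu) ** 2).mean()
mse_hist = ((mu_hist - true_mu) ** 2).mean()
```

The factor-based estimate averages out the idiosyncratic noise that dominates historical means, which is the source of the precision gain the abstract describes.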
By:  Christopher P. Chambers; Christopher Turansick 
Abstract:  We study identification and linear independence in random utility models. We characterize the dimension of the random utility model as the cyclomatic complexity of a specific graphical representation of stochastic choice data. We show that, as the number of alternatives grows, any linearly independent set of preferences is a vanishingly small subset of the set of all preferences. We introduce a new condition on sets of preferences which is sufficient for linear independence. We demonstrate by example that the condition is not necessary, but is strictly weaker than other existing sufficient conditions. 
Date:  2024–03 
URL:  http://d.repec.org/n?u=RePEc:arx:papers:2403.13773&r=ecm 
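The cyclomatic complexity in the characterization above is the graph invariant E − V + C (edges minus vertices plus connected components). A small sketch for an arbitrary undirected graph, with the caveat that the paper's specific graph built from stochastic choice data is not reproduced here:

```python
def cyclomatic_number(n_vertices, edges):
    """Cyclomatic number E - V + C of an undirected graph, where C is the
    number of connected components (found via union-find)."""
    parent = list(range(n_vertices))

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]   # path halving
            a = parent[a]
        return a

    for u, v in edges:
        parent[find(u)] = find(v)
    c = len({find(v) for v in range(n_vertices)})
    return len(edges) - n_vertices + c

# a tree has no independent cycles; a 4-cycle has exactly one
tree = cyclomatic_number(4, [(0, 1), (1, 2), (2, 3)])
ring = cyclomatic_number(4, [(0, 1), (1, 2), (2, 3), (3, 0)])
```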
By:  Eric Luxenberg; Stephen Boyd 
Abstract:  An exponentially weighted moving model (EWMM) for a vector time series fits a new data model each time period, based on an exponentially fading loss function on past observed data. The well-known and widely used exponentially weighted moving average (EWMA) is a special case that estimates the mean using a square loss function. For quadratic loss functions, EWMMs can be fit using a simple recursion that updates the parameters of a quadratic function. For other loss functions, the entire past history must be stored, and the fitting problem grows in size as time increases. We propose a general method for computing an approximation of EWMM, which requires storing only a window of a fixed number of past samples, and uses an additional quadratic term to approximate the loss associated with the data before the window. This approximate EWMM relies on convex optimization, and solves problems that do not grow with time. We compare the estimates produced by our approximation with the estimates from the exact EWMM method. 
Date:  2024–04 
URL:  http://d.repec.org/n?u=RePEc:arx:papers:2404.08136&r=ecm 
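For the square-loss special case above, the entire past enters only through two exponentially discounted sums, which is why EWMA needs O(1) state; a minimal sketch (the general non-quadratic EWMM has no such recursion, which is what motivates the paper's windowed approximation):

```python
def ewmm_mean(xs, lam=0.9):
    """Exact EWMM under square loss: at each step the fitted constant is the
    exponentially weighted mean, updated with O(1) state (s, w)."""
    s = w = 0.0
    means = []
    for x in xs:
        s = lam * s + x          # discounted sum of observations
        w = lam * w + 1.0        # discounted sum of weights
        means.append(s / w)
    return means

est = ewmm_mean([1.0, 2.0, 3.0])
```

For a constant input the estimate is that constant; more generally each entry equals the exponentially weighted average of the observations seen so far.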
By:  Willem P Sijp; Anastasios Panagiotelis 
Abstract:  A new methodology is proposed to approximate the time-dependent house price distribution at a fine regional scale using Gaussian mixtures. The means, variances, and weights of the mixture components are related to time, location, and dwelling type through a nonlinear function trained by a deep functional approximator. Price indices are derived as means, medians, quantiles, or other functions of the estimated distributions. Price densities for larger regions, such as a city, are calculated via a weighted sum of the component density functions. The method is applied to a data set covering all of Australia at a fine spatial and temporal resolution. In addition to enabling a detailed exploration of the data, the proposed index yields lower prediction errors in the practical task of projecting individual dwelling prices from previous sale values within the three major Australian cities. The estimated quantiles are also found to be well calibrated empirically, capturing the complexity of house price distributions. 
Date:  2024–04 
URL:  http://d.repec.org/n?u=RePEc:arx:papers:2404.05178&r=ecm 
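Since each regional price distribution above is a Gaussian mixture (and a city-level density, being a weighted sum, is again a mixture), medians and other quantile-based indices can be read off by inverting the mixture CDF. A sketch with made-up component parameters:

```python
from math import erf, sqrt

def mixture_cdf(x, weights, means, sds):
    """CDF of a univariate Gaussian mixture at scalar x."""
    return sum(w * 0.5 * (1.0 + erf((x - m) / (s * sqrt(2.0))))
               for w, m, s in zip(weights, means, sds))

def mixture_quantile(q, weights, means, sds, lo=0.0, hi=5e6):
    """Invert the mixture CDF by bisection, e.g. for a median price index."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if mixture_cdf(mid, weights, means, sds) < q:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# hypothetical two-component mixture for one region (prices in AUD)
w_, m_, s_ = [0.7, 0.3], [600e3, 1.4e6], [150e3, 400e3]
median_price = mixture_quantile(0.5, w_, m_, s_)
```

Bisection suffices because a mixture CDF is continuous and monotone; the same routine evaluated on the aggregated weights yields city-level quantiles.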