New Economics Papers on Econometrics
By: | Jiatong Li; Hongqiang Yan |
Abstract: | In this paper, we develop uniform inference for high-dimensional threshold regression parameters and valid inference for the threshold parameter. We first establish oracle inequalities for the prediction errors and $\ell_1$ estimation errors of the Lasso estimator of the slope parameters and the threshold parameter, allowing for heteroskedastic non-subgaussian error terms and non-subgaussian covariates. Next, we derive the asymptotic distribution of tests involving an increasing number of slope parameters by debiasing (or desparsifying) the scaled Lasso estimator. The asymptotic distribution of tests without the threshold effect is identical to that with a fixed effect. Moreover, we perform valid inference for the threshold parameter using a subsampling method. Finally, we conduct simulation studies to demonstrate the performance of our method in finite samples.
Date: | 2024–04 |
URL: | http://d.repec.org/n?u=RePEc:arx:papers:2404.08105&r=ecm |
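As a minimal illustration of the debiasing step described above, the sketch below computes a desparsified Lasso coefficient and confidence interval in a plain high-dimensional linear model; the threshold component and the scaled Lasso are omitted, and the penalty levels and data-generating process are illustrative assumptions, not the paper's.

```python
# A minimal sketch, assuming illustrative penalty levels and DGP; the
# paper's threshold component and scaled Lasso are omitted.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n, p, j = 200, 500, 0
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[:3] = [1.0, -0.5, 0.25]
y = X @ beta + rng.standard_normal(n)

lasso = Lasso(alpha=0.1).fit(X, y)          # initial sparse fit
resid = y - lasso.predict(X)

# nodewise Lasso: residualize X_j on the remaining columns
X_mj = np.delete(X, j, axis=1)
node = Lasso(alpha=0.1).fit(X_mj, X[:, j])
z = X[:, j] - node.predict(X_mj)

b_j = lasso.coef_[j] + z @ resid / (z @ X[:, j])   # debiased coefficient
se = np.sqrt(np.mean(resid**2)) * np.linalg.norm(z) / abs(z @ X[:, j])
print(f"debiased beta_{j}: {b_j:.3f} +/- {1.96 * se:.3f}")
```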
By: | Dmitry Arkhangelsky; Aleksei Samkov |
Abstract: | We study the estimation of treatment effects of a binary policy in environments with a staggered treatment rollout. We propose a new estimator -- Sequential Synthetic Difference in Difference (Sequential SDiD) -- and establish its theoretical properties in a linear model with interactive fixed effects. Our estimator is based on sequentially applying the original SDiD estimator proposed in Arkhangelsky et al. (2021) to appropriately aggregated data. To establish the theoretical properties of our method, we compare it to an infeasible OLS estimator based on the knowledge of the subspaces spanned by the interactive fixed effects. We show that this OLS estimator has a sequential representation and use this result to show that it is asymptotically equivalent to the Sequential SDiD estimator. This result implies the asymptotic normality of our estimator along with corresponding efficiency guarantees. The method developed in this paper presents a natural alternative to the conventional DiD strategies in staggered adoption designs. |
Date: | 2024–03 |
URL: | http://d.repec.org/n?u=RePEc:arx:papers:2404.00164&r=ecm |
By: | Vogelsang, Timothy J. (Department of Economics, Michigan State University); Wagner, Martin (Department of Economics, University of Klagenfurt, Bank of Slovenia, Ljubljana and Institute for Advanced Studies, Vienna) |
Abstract: | This paper shows that the integrated modified OLS (IM-OLS) estimator developed for cointegrating linear regressions in Vogelsang and Wagner (2014a) can be straightforwardly extended to cointegrating multivariate polynomial regressions. These are regression models that include deterministic variables, integrated processes, and products of (non-negative) integer powers of these variables as regressors. The stationary errors are allowed to be serially correlated and the regressors are allowed to be endogenous. The IM-OLS estimator is tuning-parameter free and does not require the estimation of any long-run variances. A scalar long-run variance, however, has to be estimated and scaled out when using IM-OLS for inference. In this respect, we consider both standard asymptotic inference and fixed-b inference. Fixed-b inference requires that the regression model is of full design. The results may be particularly interesting for specification testing of cointegrating relationships, with RESET-type specification tests following immediately. The simulation section also zooms in on RESET specification testing and illustrates that the performance of IM-OLS is qualitatively comparable to its performance in cointegrating linear regressions.
Keywords: | Cointegration, fixed-b asymptotics, IM-OLS, multivariate polynomials, nonlinearity, RESET |
JEL: | C12 C13 C32 |
Date: | 2024–04 |
URL: | http://d.repec.org/n?u=RePEc:ihs:ihswps:53&r=ecm |
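A hedged sketch of the IM-OLS idea in a simple cointegrating regression with an intercept: regress the partial sums of y on a trend (the cumulated intercept), the partial sums of x, and x itself, whose inclusion corrects for regressor endogeneity. Polynomial terms such as x^2 would enter analogously; the data-generating process below is illustrative.

```python
# A minimal sketch, assuming an illustrative DGP: y_t = 1 + 0.5 x_t + u_t
# with x_t ~ I(1). IM-OLS regresses cumulated y on a trend (the cumulated
# intercept), cumulated x, and x itself; the last term corrects for
# endogeneity.
import numpy as np

rng = np.random.default_rng(1)
T = 500
x = np.cumsum(rng.standard_normal(T))        # integrated regressor
y = 1.0 + 0.5 * x + rng.standard_normal(T)   # cointegrating relation

Sy, Sx = np.cumsum(y), np.cumsum(x)
trend = np.arange(1, T + 1)                  # partial sum of the intercept
Z = np.column_stack([trend, Sx, x])
theta, *_ = np.linalg.lstsq(Z, Sy, rcond=None)
print("IM-OLS slope estimate:", round(theta[1], 3))   # approx. 0.5
```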
By: | Bai, Jushan; Wang, Peng |
Abstract: | We propose a framework for causal inference using factor models. Our identification strategy is based on the assumption that policy interventions cause structural breaks in the factor loadings of the treated units. The method allows for heterogeneous trends and is easy to implement. We compare our method with the synthetic control methods of Abadie et al. (2010, 2015) and obtain similar results. Additionally, we provide confidence intervals for the causal effects. Our approach expands the toolset for causal inference.
Keywords: | synthetic control, difference-in-differences, structural breaks, latent factors
JEL: | C1 C23 C33 C51 |
Date: | 2024–03–31 |
URL: | http://d.repec.org/n?u=RePEc:pra:mprapa:120585&r=ecm |
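The following is a hedged sketch of the general factor-model counterfactual logic, not the authors' exact estimator: factors are extracted from the control units by principal components, the treated unit's loadings are fit on pre-treatment data, and the post-treatment gap estimates the causal effect. All dimensions and parameter values are illustrative.

```python
# A hedged sketch, not the authors' exact estimator: PCA factors from
# controls, pre-treatment loadings for the treated unit, counterfactual
# comparison after T0. All values are illustrative.
import numpy as np

rng = np.random.default_rng(2)
T, N, T0, r = 60, 30, 40, 2
F = rng.standard_normal((T, r))                    # latent factors
L = rng.standard_normal((N, r))                    # control loadings
Y_control = F @ L.T + 0.3 * rng.standard_normal((T, N))
y_treated = F @ np.array([1.0, -0.5]) + 0.3 * rng.standard_normal(T)
y_treated[T0:] += 2.0                              # true effect after T0

U, s, Vt = np.linalg.svd(Y_control, full_matrices=False)
F_hat = U[:, :r] * np.sqrt(T)                      # principal-components factors

lam_hat, *_ = np.linalg.lstsq(F_hat[:T0], y_treated[:T0], rcond=None)
effect = y_treated[T0:] - F_hat[T0:] @ lam_hat     # treated minus counterfactual
print("estimated average effect:", round(effect.mean(), 2))   # approx. 2
```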
By: | Lena S. Bjerkander; Jonas Dovern; Hans Manner |
Abstract: | We review tests of null hypotheses that consist of many subsidiary null hypotheses, including tests that have not received much attention in the econometrics literature. We study test performance in the context of specification testing for linear regressions based on a Monte Carlo study. Overall, parametric tests that use (transformed) P-values corresponding to all subsidiary null hypotheses outperform the well-known minimum P-value test and a recently proposed test that relies on the non-parametric estimation of the joint density of all subsidiary test statistics. |
Keywords: | combined hypothesis, P-value, multiple hypothesis testing, Fisher test |
JEL: | C12 C15 |
Date: | 2024 |
URL: | http://d.repec.org/n?u=RePEc:ces:ceswps:_11027&r=ecm |
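Two of the combination tests compared above are easy to state in code: Fisher's statistic −2 Σ log p, which is χ²(2m) under independent subsidiary nulls, and a Bonferroni-adjusted minimum P-value test. The p-values below are invented for illustration.

```python
# A minimal sketch with invented p-values: Fisher's combination test and a
# Bonferroni-adjusted minimum p-value test for a combined null of m
# subsidiary hypotheses.
import numpy as np
from scipy import stats

p = np.array([0.03, 0.20, 0.45, 0.01, 0.70])   # subsidiary p-values
m = p.size

fisher_stat = -2 * np.log(p).sum()             # ~ chi2(2m) under the null
fisher_p = stats.chi2.sf(fisher_stat, df=2 * m)
minp_p = min(1.0, m * p.min())                 # Bonferroni bound on min-p

print(f"Fisher: stat = {fisher_stat:.2f}, p = {fisher_p:.4f}")
print(f"min-p (Bonferroni): p = {minp_p:.4f}")
```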
By: | Matteo Mogliani; Anna Simoni |
Abstract: | We propose a machine learning approach for optimal macroeconomic forecasting in a high-dimensional setting with covariates presenting a known group structure. Our model encompasses forecasting settings with many series, mixed frequencies, and unknown nonlinearities. We introduce into time-series econometrics the concept of bi-level sparsity, i.e., sparsity that holds at both the group level and within groups, and we assume the true model satisfies this assumption. We propose a prior that induces bi-level sparsity, and the corresponding posterior distribution is shown to contract at the minimax-optimal rate, recover the model parameters, and have a support that asymptotically includes the support of the model. Our theory allows for correlation between groups, while predictors in the same group can be characterized by strong covariation as well as common characteristics and patterns. Finite-sample performance is illustrated through comprehensive Monte Carlo experiments and a real-data nowcasting exercise for the US GDP growth rate.
Date: | 2024–04 |
URL: | http://d.repec.org/n?u=RePEc:arx:papers:2404.02671&r=ecm |
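The paper's approach is Bayesian, but a frequentist analogue of bi-level sparsity is the sparse-group Lasso; the hedged sketch below solves it by proximal gradient descent, where each step soft-thresholds coefficients (within-group sparsity) and then shrinks whole groups (group sparsity). The penalty levels and design are illustrative.

```python
# A frequentist stand-in for bi-level sparsity (the paper is Bayesian):
# sparse-group Lasso via proximal gradient. Soft-thresholding enforces
# within-group sparsity; group shrinkage zeroes out whole groups.
import numpy as np

rng = np.random.default_rng(3)
n, G, pg = 200, 10, 5                          # 10 groups of 5 predictors
X = rng.standard_normal((n, G * pg))
beta = np.zeros(G * pg)
beta[:3] = [2.0, -1.5, 1.0]                    # only group 0 is active
y = X @ beta + rng.standard_normal(n)

lam1, lam2 = 0.05, 0.05
step = n / np.linalg.norm(X, 2) ** 2           # 1 / Lipschitz constant
b = np.zeros(G * pg)
for _ in range(2000):
    z = b - step * (X.T @ (X @ b - y) / n)     # gradient step
    z = np.sign(z) * np.maximum(np.abs(z) - step * lam1, 0.0)  # within-group L1
    for k in range(G):                         # group soft-threshold
        blk = slice(k * pg, (k + 1) * pg)
        nrm = np.linalg.norm(z[blk])
        z[blk] *= max(0.0, 1 - step * lam2 * np.sqrt(pg) / nrm) if nrm > 0 else 0.0
    b = z
active = [k for k in range(G) if np.abs(b[k * pg:(k + 1) * pg]).max() > 1e-6]
print("active groups:", active)                # expected: [0]
```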
By: | Nir Billfeld; Moshe Kim |
Abstract: | We develop a novel identification strategy as well as a new estimator for context-dependent causal inference in non-parametric triangular models with non-separable disturbances. Departing from common practice, our analysis does not rely on the strict monotonicity assumption. Our key contribution lies in leveraging diffusion models to formulate the structural equations as a system evolving from noise accumulation, in order to account for the influence of the latent context (confounder) on the outcome. Our identification strategy involves a system of Fredholm integral equations expressing the distributional relationship between a latent context variable and a vector of observables. These integral equations involve an unknown kernel and are governed by a set of structural form functions, inducing a non-monotonic inverse problem. We prove that if the kernel density can be represented as an infinite mixture of Gaussians, then there exists a unique solution for the unknown function. This is a significant result, as it shows that it is possible to solve a non-monotonic inverse problem even when the kernel is unknown. On the methodological front, we leverage a novel and enriched Contaminated Generative Adversarial (Neural) Networks (CONGAN) estimator, which we propose as a solution to the non-monotonic inverse problem.
Date: | 2024–04 |
URL: | http://d.repec.org/n?u=RePEc:arx:papers:2404.05021&r=ecm |
By: | James A. Duffy; Sophocles Mavroeidis |
Abstract: | While it is widely recognised that linear (structural) VARs may omit important features of economic time series, the use of nonlinear SVARs has to date been almost entirely confined to the modelling of stationary time series, because of a lack of understanding as to how common stochastic trends may be accommodated within nonlinear VAR models. This has unfortunately circumscribed the range of series to which such models can be applied -- and/or required that these series be first transformed to stationarity, a potential source of misspecification -- and prevented the use of long-run identifying restrictions in these models. To address these problems, we develop a flexible class of additively time-separable nonlinear SVARs, which subsume models with threshold-type endogenous regime switching, both of the piecewise linear and smooth transition varieties. We extend the Granger-Johansen representation theorem to this class of models, obtaining conditions that specialise exactly to the usual ones when the model is linear. We further show that, as a corollary, these models are capable of supporting the same kinds of long-run identifying restrictions as are available in linear cointegrated SVARs. |
Date: | 2024–04 |
URL: | http://d.repec.org/n?u=RePEc:arx:papers:2404.05349&r=ecm |
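A hedged simulation sketch of the kind of model subsumed by this class: a bivariate threshold VECM whose error-correction strength switches with the sign of the equilibrium error. The cointegrating combination is stationary in both regimes while the orthogonal combination remains a pure random walk, i.e., a common stochastic trend. Parameter values are illustrative.

```python
# A hedged simulation sketch with illustrative parameters: a bivariate
# threshold VECM in which the adjustment vector switches with the sign of
# the equilibrium error. Both regimes are stable, and y1 + y2 stays a pure
# random walk -- the common stochastic trend.
import numpy as np

rng = np.random.default_rng(4)
T = 1000
beta = np.array([1.0, -1.0])                   # cointegrating vector
alpha_lo = np.array([-0.10, 0.10])             # adjustment when z <= 0
alpha_hi = np.array([-0.30, 0.30])             # adjustment when z > 0

y = np.zeros((T, 2))
for t in range(1, T):
    z = beta @ y[t - 1]                        # equilibrium error
    alpha = alpha_hi if z > 0 else alpha_lo    # endogenous regime switch
    y[t] = y[t - 1] + alpha * z + 0.5 * rng.standard_normal(2)

print("sd of equilibrium error:", round((y @ beta).std(), 2))   # bounded
print("sd of y1 + y2:", round(y.sum(axis=1).std(), 2))          # trending
```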
By: | Anirban Mukherjee; Hannah Hanwen Chang |
Abstract: | Social science research often hinges on the relationship between categorical variables and outcomes. We introduce CAVIAR, a novel method for embedding categorical variables that assume values in a high-dimensional ambient space but are sampled from an underlying manifold. Our theoretical and numerical analyses outline challenges posed by such categorical variables in causal inference. Specifically, dynamically varying and sparse levels can lead to violations of the Donsker conditions and a failure of the estimation functionals to converge to a tight Gaussian process. Traditional approaches, including the exclusion of rare categorical levels and principled variable selection models like LASSO, fall short. CAVIAR embeds the data into a lower-dimensional global coordinate system. The mapping can be derived from both structured and unstructured data, and ensures stable and robust estimates through dimensionality reduction. In a dataset of direct-to-consumer apparel sales, we illustrate how high-dimensional categorical variables, such as zip codes, can be succinctly represented, facilitating inference and analysis. |
Date: | 2024–04 |
URL: | http://d.repec.org/n?u=RePEc:arx:papers:2404.04979&r=ecm |
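The sketch below is not the CAVIAR algorithm, only a generic illustration of the underlying idea: levels of a high-cardinality categorical are mapped to a low-dimensional global coordinate system, here via truncated SVD of a level-by-feature summary matrix. The synthetic data and the choice of SVD are assumptions made for illustration.

```python
# Not the CAVIAR algorithm -- a generic sketch of embedding a
# high-cardinality categorical into a low-dimensional global coordinate
# system via truncated SVD of a level-by-feature summary matrix.
import numpy as np

rng = np.random.default_rng(5)
n_levels, n_feats, d = 1000, 20, 3             # e.g. 1000 zip codes
latent = rng.standard_normal((n_levels, d))    # levels lie on a low-dim manifold
M = latent @ rng.standard_normal((d, n_feats)) # per-level feature summaries
M += 0.1 * rng.standard_normal(M.shape)

Mc = M - M.mean(axis=0)
U, s, Vt = np.linalg.svd(Mc, full_matrices=False)
embedding = U[:, :d] * s[:d]                   # one d-vector per level
print("embedding shape:", embedding.shape)     # (1000, 3)
```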
By: | Artem Kraevskiy; Artem Prokhorov; Evgeniy Sokolovskiy |
Abstract: | We develop and apply a new online early warning system (EWS) for what is known in machine learning as concept drift, in economics as a regime shift and in statistics as a change point. The system goes beyond linearity assumed in many conventional methods, and is robust to heavy tails and tail-dependence in the data, making it particularly suitable for emerging markets. The key component is an effective change-point detection mechanism for conditional entropy of the data, rather than for a particular indicator of interest. Combined with recent advances in machine learning methods for high-dimensional random forests, the mechanism is capable of finding significant shifts in information transfer between interdependent time series when traditional methods fail. We explore when this happens using simulations and we provide illustrations by applying the method to Uzbekistan's commodity and equity markets as well as to Russia's equity market in 2021-2023. |
Date: | 2024–04 |
URL: | http://d.repec.org/n?u=RePEc:arx:papers:2404.03319&r=ecm |
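A heavily simplified sketch of the mechanism: track an entropy estimate of the data in a rolling window and raise an alarm when a one-sided CUSUM of its deviations from a calibration level crosses a threshold. The Gaussian plug-in entropy and all tuning constants below stand in for the paper's conditional-entropy and random-forest machinery.

```python
# A heavily simplified sketch: a Gaussian plug-in entropy in a rolling
# window plus a one-sided CUSUM alarm. Window size, drift k, and threshold
# h are illustrative.
import numpy as np

rng = np.random.default_rng(6)
x = np.concatenate([rng.normal(0, 1, 500), rng.normal(0, 3, 500)])  # break at 500

w = 50
ent = np.array([0.5 * np.log(2 * np.pi * np.e * x[t - w:t].var())
                for t in range(w, len(x) + 1)])  # rolling entropy estimate

mu0 = ent[:100].mean()                         # calibrate on an early stretch
S, k, h, alarm = 0.0, 0.2, 5.0, None
for i, e in enumerate(ent):
    S = max(0.0, S + (e - mu0 - k))            # one-sided CUSUM recursion
    if S > h:
        alarm = i + w                          # map back to the original index
        break
print("alarm raised at t =", alarm)            # shortly after the true break
```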
By: | Peter Reinhard Hansen; Chen Tong |
Abstract: | We introduce a new class of multivariate heavy-tailed distributions that are convolutions of heterogeneous multivariate t-distributions. Unlike commonly used heavy-tailed distributions, the multivariate convolution-t distributions embody cluster structures with flexible nonlinear dependencies and heterogeneous marginal distributions. Importantly, convolution-t distributions have simple density functions that facilitate estimation and likelihood-based inference. The characteristic features of convolution-t distributions are found to be important in an empirical analysis of realized volatility measures and help identify their underlying factor structure. |
Date: | 2024–03 |
URL: | http://d.repec.org/n?u=RePEc:arx:papers:2404.00864&r=ecm |
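Sampling from a convolution-t distribution is direct, which the sketch below illustrates: sum independent multivariate t vectors with heterogeneous degrees of freedom and loading matrices, each generated as a Gaussian scaled by an independent chi-square mixing variable. Dimensions and parameters are illustrative.

```python
# A minimal sketch with illustrative parameters: draw from a convolution-t
# distribution by summing independent multivariate t blocks with
# heterogeneous degrees of freedom.
import numpy as np

rng = np.random.default_rng(7)
n, d = 50_000, 3
nus = [5.0, 20.0]                              # heterogeneous tail indices
A = [0.5 * rng.standard_normal((d, d)) for _ in nus]   # cluster loadings

X = np.zeros((n, d))
for nu, Ag in zip(nus, A):
    Z = rng.standard_normal((n, d))
    W = rng.chisquare(nu, size=(n, 1)) / nu    # chi-square mixing variable
    X += (Z / np.sqrt(W)) @ Ag.T               # one multivariate-t block

Xc = X - X.mean(axis=0)
kurt = (Xc**4).mean(axis=0) / Xc.var(axis=0) ** 2 - 3
print("excess kurtosis per margin:", kurt.round(2))    # heavy tails
```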
By: | Eric Auerbach; Yong Cai; Ahnaf Rafi |
Abstract: | Researchers who estimate treatment effects using a regression discontinuity design (RDD) typically assume that there are no spillovers between the treated and control units. This may be unrealistic. We characterize the estimand of RDD in a setting where spillovers occur between units that are close in their values of the running variable. Under the assumption that spillovers are linear-in-means, we show that the estimand depends on the ratio of two terms: (1) the radius over which spillovers occur and (2) the choice of bandwidth used for the local linear regression. Specifically, RDD estimates the direct treatment effect when the radius is of larger order than the bandwidth, and the total treatment effect when the radius is of smaller order than the bandwidth. In the more realistic regime where the radius is of similar order to the bandwidth, the RDD estimand is a mix of the two effects. To recover the direct and spillover effects, we propose incorporating estimated spillover terms into the local linear regression -- the local analog of a peer effects regression. We also clarify the settings under which donut-hole RD is able to eliminate the effects of spillovers.
Date: | 2024–04 |
URL: | http://d.repec.org/n?u=RePEc:arx:papers:2404.06471&r=ecm |
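A hedged sketch of the proposed remedy: augment the local linear RD regression with an estimated linear-in-means exposure term, here the share of treated units within radius r in the running variable. The data-generating process, bandwidth, and radius are illustrative assumptions.

```python
# A hedged sketch with an illustrative DGP: a local linear RD regression
# augmented with a linear-in-means exposure term (share of treated units
# within radius r of each unit's running-variable value). The true direct
# effect is 2.0 and the spillover coefficient is 1.0.
import numpy as np

rng = np.random.default_rng(8)
n, h, r = 2000, 0.3, 0.1                       # sample, bandwidth, radius
x = rng.uniform(-1, 1, n)                      # running variable
D = (x >= 0).astype(float)                     # sharp RD treatment
expo = np.array([D[np.abs(x - xi) <= r].mean() for xi in x])  # exposure
y = 1.0 + 0.5 * x + 2.0 * D + 1.0 * expo + rng.standard_normal(n)

keep = np.abs(x) <= h                          # local window
Z = np.column_stack([np.ones(keep.sum()), x[keep], D[keep],
                     x[keep] * D[keep], expo[keep]])
coef, *_ = np.linalg.lstsq(Z, y[keep], rcond=None)
print(f"direct effect: {coef[2]:.2f}, spillover coefficient: {coef[4]:.2f}")
```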
By: | Cisil Sarisoy; Bas J.M. Werker |
Abstract: | This paper analyzes the properties of expected return estimators for individual assets implied by the linear factor models of asset pricing, i.e., the product of β and λ. We provide the asymptotic properties of factor-model-based expected return estimators, which yield the standard errors of risk premium estimators for individual assets. We show that using factor-model-based risk premium estimates leads to sizable precision gains compared to using historical averages. Finally, inference about expected returns does not suffer from a small-beta bias when factors are traded. The more precise factor-model-based estimates of expected returns translate into sizable improvements in the out-of-sample performance of optimal portfolios.
Keywords: | Cross section of expected returns; Risk premium; Small β’s |
JEL: | C13 G11 C38 |
Date: | 2024–03–28 |
URL: | http://d.repec.org/n?u=RePEc:fip:fedgfe:2024-14&r=ecm |
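A hedged sketch of the factor-model-based estimator of expected returns via a standard two-pass procedure (the paper's asymptotic theory is not reproduced): betas come from time-series regressions on the factors, lambda from a cross-sectional regression of average returns on the betas, and expected returns are their product. The monthly-style calibration is illustrative, and both RMSEs are printed so the factor-based and historical-average estimators can be compared.

```python
# A hedged two-pass sketch with an illustrative calibration: time-series
# regressions give betas, a cross-sectional regression of mean returns on
# betas gives lambda, and mu = beta @ lambda.
import numpy as np

rng = np.random.default_rng(9)
T, N, K = 600, 50, 2
F = 0.05 * rng.standard_normal((T, K))         # zero-mean factor shocks
beta = rng.standard_normal((N, K))
lam = np.array([0.010, 0.008])                 # factor risk premia
r = (lam + F) @ beta.T + 0.08 * rng.standard_normal((T, N))
mu_true = beta @ lam

B = np.linalg.lstsq(np.column_stack([np.ones(T), F]), r, rcond=None)[0][1:].T
lam_hat, *_ = np.linalg.lstsq(B, r.mean(axis=0), rcond=None)
mu_hat = B @ lam_hat                           # factor-model expected returns

rmse = lambda e: np.sqrt((e**2).mean())
print("factor-based RMSE:   ", round(rmse(mu_hat - mu_true), 5))
print("historical-mean RMSE:", round(rmse(r.mean(axis=0) - mu_true), 5))
```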
By: | Christopher P. Chambers; Christopher Turansick |
Abstract: | We study identification and linear independence in random utility models. We characterize the dimension of the random utility model as the cyclomatic complexity of a specific graphical representation of stochastic choice data. We show that, as the number of alternatives grows, any linearly independent set of preferences is a vanishingly small subset of the set of all preferences. We introduce a new condition on sets of preferences which is sufficient for linear independence. We demonstrate by example that the condition is not necessary, but is strictly weaker than other existing sufficient conditions. |
Date: | 2024–03 |
URL: | http://d.repec.org/n?u=RePEc:arx:papers:2403.13773&r=ecm |
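The cyclomatic complexity (circuit rank) of an undirected graph is E − V + C, where C is the number of connected components. The sketch below computes it for a toy graph; the paper's specific graphical representation of stochastic choice data is not reconstructed here.

```python
# A minimal sketch: cyclomatic complexity (circuit rank) of an undirected
# graph, E - V + C. The toy graph is illustrative.
import networkx as nx

G = nx.Graph([(1, 2), (2, 3), (3, 1), (3, 4), (4, 5), (5, 3)])
E, V = G.number_of_edges(), G.number_of_nodes()
C = nx.number_connected_components(G)
print("cyclomatic complexity:", E - V + C)     # 6 - 5 + 1 = 2
```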
By: | Eric Luxenberg; Stephen Boyd |
Abstract: | An exponentially weighted moving model (EWMM) for a vector time series fits a new data model each time period, based on an exponentially fading loss function on past observed data. The well-known and widely used exponentially weighted moving average (EWMA) is a special case that estimates the mean using a square loss function. For quadratic loss functions, EWMMs can be fit using a simple recursion that updates the parameters of a quadratic function. For other loss functions, the entire past history must be stored, and the fitting problem grows in size as time increases. We propose a general method for computing an approximation of the EWMM that requires storing only a window of a fixed number of past samples, and uses an additional quadratic term to approximate the loss associated with the data before the window. This approximate EWMM relies on convex optimization and solves problems that do not grow with time. We compare the estimates produced by our approximation with the estimates from the exact EWMM method.
Date: | 2024–04 |
URL: | http://d.repec.org/n?u=RePEc:arx:papers:2404.08136&r=ecm |
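For the quadratic-loss case mentioned above, the exact recursion is standard: with square loss the EWMM is exponentially weighted least squares, so it suffices to update the quadratic-term matrix and linear-term vector with a forgetting factor. The sketch below illustrates this under an illustrative data-generating process.

```python
# A minimal sketch of the square-loss case: exponentially weighted least
# squares updated by a one-step recursion on the quadratic coefficients
# P (faded sum of x x') and q (faded sum of y x). Forgetting factor and
# DGP are illustrative.
import numpy as np

rng = np.random.default_rng(10)
gamma, p = 0.95, 3                             # forgetting factor, dimension
P = 1e-6 * np.eye(p)                           # quadratic term (regularized)
q = np.zeros(p)                                # linear term
theta_true = np.array([1.0, -2.0, 0.5])

for t in range(500):
    x = rng.standard_normal(p)
    y = x @ theta_true + 0.1 * rng.standard_normal()
    P = gamma * P + np.outer(x, x)             # fade old data, add new
    q = gamma * q + y * x
theta = np.linalg.solve(P, q)                  # EWMM estimate at the last t
print("estimate:", theta.round(3))             # close to theta_true
```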
By: | Willem P Sijp; Anastasios Panagiotelis |
Abstract: | A new methodology is proposed to approximate the time-dependent house price distribution at a fine regional scale using Gaussian mixtures. The means, variances and weights of the mixture components are related to time, location and dwelling type through a nonlinear function trained by a deep functional approximator. Price indices are derived as means, medians, quantiles or other functions of the estimated distributions. Price densities for larger regions, such as a city, are calculated via a weighted sum of the component density functions. The method is applied to a data set covering all of Australia at a fine spatial and temporal resolution. In addition to enabling a detailed exploration of the data, the proposed index yields lower prediction errors in the practical task of projecting individual dwelling prices from previous sale values within the three major Australian cities. The estimated quantiles are also found to be well calibrated empirically, capturing the complexity of house price distributions.
Date: | 2024–04 |
URL: | http://d.repec.org/n?u=RePEc:arx:papers:2404.05178&r=ecm |
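A hedged sketch of the aggregation step: given Gaussian mixtures of log prices for individual locations, a larger region's density is a weighted sum of the component densities, and regional quantiles follow by inverting the mixture CDF numerically. The mixture parameters and weights below are invented for illustration.

```python
# A hedged sketch of the aggregation step with invented mixture parameters:
# the regional density is a dwelling-weighted sum of local Gaussian
# mixtures of log prices; quantiles come from inverting the mixture CDF.
import numpy as np
from scipy import optimize, stats

mixes = [dict(w=[0.6, 0.4], mu=[13.0, 13.8], sd=[0.30, 0.40]),
         dict(w=[0.5, 0.5], mu=[13.4, 14.2], sd=[0.25, 0.35])]
region_w = [0.7, 0.3]                          # dwelling-count weights

def region_cdf(x):
    return sum(rw * sum(w * stats.norm.cdf(x, m, s)
                        for w, m, s in zip(mx["w"], mx["mu"], mx["sd"]))
               for rw, mx in zip(region_w, mixes))

median_log = optimize.brentq(lambda x: region_cdf(x) - 0.5, 10.0, 17.0)
print("regional median price:", round(float(np.exp(median_log))))
```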