nep-ecm New Economics Papers
on Econometrics
Issue of 2023‒04‒10
nineteen papers chosen by
Sune Karlsson
Örebro universitet

  1. Identification-robust inference for the LATE with high-dimensional covariates By Yukun Ma
  2. Inference of Grouped Time-Varying Network Vector Autoregression Models By Degui Li; Bin Peng; Songqiao Tang; Weibiao Wu
  3. Heteroskedasticity and Clustered Covariances from a Bayesian Perspective By Lewis, Gabriel
  4. Bootstrap based asymptotic refinements for high-dimensional nonlinear models By Joel L. Horowitz; Ahnaf Rafi
  5. Identification- and many instrument-robust inference via invariant moment conditions By Tom Boot; Johannes W. Ligtenberg
  6. Improved inference in financial factor models By Elliot Beck; Gianluca De Nard; Michael Wolf
  7. A Framework for the Estimation of Demand for Differentiated Products with Simultaneous Consumer Search By José L. Moraga González; Zsolt Sándor; Matthijs Wildenbeest
  8. Counterfactual Copula and Its Application to the Effects of College Education on Intergenerational Mobility By Tsung-Chih Lai; Jiun-Hua Su
  9. A User's Guide to Inference in Models Defined by Moment Inequalities By Ivan Canay; Gastón Illanes; Amilcar Velez
  10. Estimation of Probabilities for Ordered Sets and Application to Calibration of Rating Models By Gustavo F. Serenelli; Emiliano Delfau
  11. Adaptive Estimation of Intersection Bounds: a Classification Approach By Vira Semenova
  12. A Sufficient Statistical Test for Dynamic Stability By Ahmed, Muhammad Ashfaq; Nawaz, Nasreen
  13. Generalized Cumulative Shrinkage Process Priors with Applications to Sparse Bayesian Factor Analysis By Sylvia Frühwirth-Schnatter
  14. Understanding Model Complexity for temporal tabular and multi-variate time series, case study with Numerai data science tournament By Thomas Wong; Prof. Mauricio Barahona
  15. Machine Learning as a Tool for Hypothesis Generation By Jens Ludwig; Sendhil Mullainathan
  16. Conditional and unconditional quantile regressions By Federico Favata; Gabriel Rojas Montes; Martín Trombetta; Javier Alejo
  17. Threshold autoregressive model blind identification based on array clustering By Jean-Marc Le Caillec
  18. Extreme expectile estimation for short-tailed data, with an application to market risk assessment By Daouia, Abdelaati; Padoan, Simone A.; Stupfler, Gilles
  19. Multivariate Probabilistic CRPS Learning with an Application to Day-Ahead Electricity Prices By Jonathan Berrisch; Florian Ziel

  1. By: Yukun Ma
    Abstract: This paper investigates the local average treatment effect (LATE) with high-dimensional covariates, regardless of the strength of identification. We propose a novel test statistic for the high-dimensional LATE, and show that our test has uniformly correct asymptotic size. Applying the double/debiased machine learning (DML) method to estimate nuisance parameters, we develop easy-to-implement algorithms for inference on, and confidence intervals for, the high-dimensional LATE. Simulations indicate that our test is efficient in the strongly identified LATE model.
    Date: 2023–02
  2. By: Degui Li; Bin Peng; Songqiao Tang; Weibiao Wu
    Abstract: This paper considers statistical inference of time-varying network vector autoregression models for large-scale time series. A latent group structure is imposed on the heterogeneous and node-specific time-varying momentum and network spillover effects so that the number of unknown time-varying coefficients to be estimated can be reduced considerably. A classic agglomerative clustering algorithm with normalized distance matrix estimates is combined with a generalized information criterion to consistently estimate the latent group number and membership. A post-grouping local linear smoothing method is proposed to estimate the group-specific time-varying momentum and network effects, substantially improving the convergence rates of the preliminary estimates which ignore the latent structure. In addition, a post-grouping specification test is conducted to verify the validity of the parametric model assumption for group-specific time-varying coefficient functions, and the asymptotic theory is derived for the test statistic constructed via a kernel weighted quadratic form under the null and alternative hypotheses. Numerical studies including Monte-Carlo simulation and an empirical application to the global trade flow data are presented to examine the finite-sample performance of the developed model and methodology.
    Date: 2023–03
  3. By: Lewis, Gabriel
    Abstract: We show that root-n-consistent heteroskedasticity-robust and cluster-robust regression estimators and confidence intervals can be derived from fully Bayesian models of population sampling. In our model, the vexed question of how and when to “cluster” is answered by the sampling design encoded in the model: simple random sampling implies a heteroskedasticity-robust Bayesian estimator, and clustered sampling implies a cluster-robust Bayesian estimator, providing a Bayesian parallel to the work of Abadie et al. (2017). Our model is based on the Finite Dirichlet Process (FDP), a well-studied population sampling process that apparently originates with R.A. Fisher, and our findings may not be surprising to readers familiar with the frequentist properties of the closely related Bayesian Bootstrap, Dirichlet Process, and Efron “pairs” or “block” bootstraps. However, our application of FDP to robust regression is novel, and it fills a gap concerning Bayesian cluster-robust regression. Our approach has several advantages over related methods: we present a full probability model with clear assumptions about a sampling design, one that does not assume that all possible data-values have been observed (unlike many bootstrap procedures); and our posterior estimates and credible intervals can be regularized toward reasonable prior values in small samples, while achieving the desirable frequency properties of a bootstrap in moderate and large samples. However, our approach also illustrates some limitations of “robust” procedures.
    Keywords: Bayesian; Heteroskedastic; Clustered Covariance; Robust Covariance; Sandwich Estimator
    JEL: C1 C11 C14 C5 C83
    Date: 2022–12–14
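    The closely related Bayesian bootstrap mentioned in the abstract can be sketched in a few lines: draw Dirichlet(1, ..., 1) observation weights and recompute the weighted regression slope for each draw. This is only an illustration of the weighting idea, not the authors' FDP model; the data-generating process below is hypothetical.

    ```python
    import numpy as np

    rng = np.random.default_rng(0)

    # Hypothetical heteroskedastic data: error s.d. grows with |x|
    n = 500
    x = rng.normal(size=n)
    y = 1.0 + 2.0 * x + rng.normal(size=n) * (1 + np.abs(x))

    def bayesian_bootstrap_slope(x, y, draws=2000, rng=rng):
        """Posterior draws of the regression slope under
        Bayesian-bootstrap (Dirichlet(1, ..., 1)) observation weights."""
        n = len(y)
        slopes = np.empty(draws)
        for b in range(draws):
            w = rng.dirichlet(np.ones(n))        # random observation weights
            xbar = np.sum(w * x)
            ybar = np.sum(w * y)
            slopes[b] = np.sum(w * (x - xbar) * (y - ybar)) / np.sum(w * (x - xbar) ** 2)
        return slopes

    draws = bayesian_bootstrap_slope(x, y)
    lo, hi = np.quantile(draws, [0.025, 0.975])  # 95% credible interval
    ```

    In large samples this interval behaves like a heteroskedasticity-robust frequentist interval; drawing the weights within clusters instead would mimic the cluster-robust case.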
  4. By: Joel L. Horowitz; Ahnaf Rafi
    Abstract: We consider penalized extremum estimation of a high-dimensional, possibly nonlinear model that is sparse in the sense that most of its parameters are zero but some are not. We use the SCAD penalty function, which provides model selection consistent and oracle efficient estimates under suitable conditions. However, asymptotic approximations based on the oracle model can be inaccurate with the sample sizes found in many applications. This paper gives conditions under which the bootstrap, based on estimates obtained through SCAD penalization with thresholding, provides asymptotic refinements of size O(n⁻²) for the error in the rejection (coverage) probability of a symmetric hypothesis test (confidence interval) and O(n⁻¹) for the error in rejection (coverage) probability of a one-sided or equal tailed test (confidence interval). The results of Monte Carlo experiments show that the bootstrap can provide large reductions in errors in coverage probabilities. The bootstrap is consistent, though it does not necessarily provide asymptotic refinements, even if some parameters are close but not equal to zero. Random-coefficients logit and probit models and nonlinear moment models are examples of models to which the procedure applies.
    Date: 2023–03–29
  5. By: Tom Boot; Johannes W. Ligtenberg
    Abstract: Identification-robust hypothesis tests are commonly based on the continuous updating objective function or its score. When the number of moment conditions grows proportionally with the sample size, the large-dimensional weighting matrix prohibits the use of conventional asymptotic approximations and the behavior of these tests remains unknown. We show that the structure of the weighting matrix opens up an alternative route to asymptotic results when, under the null hypothesis, the distribution of the moment conditions is reflection invariant. In a heteroskedastic linear instrumental variables model, we then establish asymptotic normality of conventional test statistics under many instrument sequences. A key result is that the additional terms that appear in the variance are negative. Revisiting a study on the elasticity of substitution between immigrant and native workers where the number of instruments is over a quarter of the sample size, the many instrument-robust approximation indeed leads to substantially narrower confidence intervals.
    Date: 2023–03
  6. By: Elliot Beck; Gianluca De Nard; Michael Wolf
    Abstract: Conditional heteroskedasticity of the error terms is a common occurrence in financial factor models, such as the CAPM and Fama-French factor models. This feature necessitates the use of heteroskedasticity consistent (HC) standard errors to make valid inference for regression coefficients. In this paper, we show that using weighted least squares (WLS) or adaptive least squares (ALS) to estimate model parameters generally leads to smaller HC standard errors compared to ordinary least squares (OLS), which translates into improved inference in the form of shorter confidence intervals and more powerful hypothesis tests. In an extensive empirical analysis based on historical stock returns and commonly used factors, we find that conditional heteroskedasticity is pronounced and that WLS and ALS can dramatically shorten confidence intervals compared to OLS, especially during times of financial turmoil.
    Keywords: CAPM, conditional heteroskedasticity, factor models, HC standard errors
    JEL: C12 C13 C21
    Date: 2023–03
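    The OLS-versus-WLS comparison can be illustrated with statsmodels. This is a minimal sketch under a simulated, hypothetical skedastic function that is (unrealistically) treated as known; the paper's ALS procedure, which estimates the skedastic function from the data, is not reproduced here.

    ```python
    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(1)

    # Hypothetical one-factor regression with conditional heteroskedasticity
    n = 1000
    factor = rng.normal(size=n)
    sigma = 0.5 + np.abs(factor)            # error s.d. depends on the factor
    ret = 0.1 + 1.2 * factor + sigma * rng.normal(size=n)
    X = sm.add_constant(factor)

    # OLS with HC3 (heteroskedasticity-consistent) standard errors
    ols = sm.OLS(ret, X).fit(cov_type="HC3")

    # WLS using the known skedastic function, again with HC3 standard errors
    wls = sm.WLS(ret, X, weights=1.0 / sigma**2).fit(cov_type="HC3")

    # WLS typically yields smaller HC standard errors, hence shorter CIs
    print(ols.bse[1], wls.bse[1])
    ```

    With strong conditional heteroskedasticity, as here, the efficiency gain of WLS over OLS is substantial even though both sets of standard errors remain valid.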
  7. By: Josè L. Moraga González (Vrije Universiteit Amsterdam); Zsolt Sándor (Sapientia Hungarian University); Matthijs Wildenbeest (University of Arizona)
    Abstract: We propose a tractable method for estimation of a simultaneous search model for differentiated products that allows for observed and unobserved heterogeneity in both preferences and search costs. We show that for type I extreme value distributed search costs, expressions for search and purchase probabilities can be obtained in closed form. We show that our search model belongs to the generalized extreme value (GEV) class, which implies that it has a full information discrete-choice equivalent, and hence search data are necessary to distinguish between the search model and the equivalent full information model. We allow for price endogeneity when estimating the model and show how to obtain parameter estimates using a combination of aggregate market share data and individual level data on search and purchases. To deal with the dimensionality problem that typically arises in search models due to a large number of consideration sets we propose a novel Monte Carlo estimator for the search and purchase probabilities. Monte Carlo experiments highlight the importance of allowing for sufficient consumer heterogeneity when doing policy counterfactuals and show that our Monte Carlo estimator is accurate and computationally fast. Finally, a behavioral assumption on how consumers search provides a micro-foundation for consideration probabilities widely used in the literature.
    Keywords: demand estimation, price endogeneity, simultaneous search, differentiated products
    JEL: C14 D83 L13
    Date: 2023–03–20
  8. By: Tsung-Chih Lai; Jiun-Hua Su
    Abstract: This paper proposes a nonparametric estimator of the counterfactual copula of two outcome variables that would be affected by a policy intervention. The proposed estimator allows policymakers to conduct ex-ante evaluations by comparing the estimated counterfactual and actual copulas as well as their corresponding measures of association. Asymptotic properties of the counterfactual copula estimator are established under regularity conditions. These conditions are also used to validate the nonparametric bootstrap for inference on counterfactual quantities. Simulation results indicate that our estimation and inference procedures perform well in moderately sized samples. Applying the proposed method to studying the effects of college education on intergenerational income mobility under two counterfactual scenarios, we find that while providing some college education to all children is unlikely to promote mobility, offering a college degree to children from less educated families can significantly reduce income persistence across generations.
    Date: 2023–03
  9. By: Ivan Canay; Gastón Illanes; Amilcar Velez
    Abstract: Models defined by moment inequalities have become a standard modeling framework for empirical economists, spreading over a wide range of fields within economics. From the point of view of an empirical researcher, the literature on inference in moment inequality models is large and complex, including multiple survey papers that document the non-standard features these models possess, the main novel concepts behind inference in these models, and the most recent developments that bring advances in accuracy and computational tractability. In this paper we present a guide to empirical practice intended to help applied researchers navigate all the decisions required to frame a model as a moment inequality model and then to construct confidence intervals for the parameters of interest. We divide our template into four main steps: (a) a behavioral decision model, (b) moving from the decision model to a moment inequality model, (c) choosing a test statistic and critical value, and (d) accounting for computational challenges. We split each of these steps into a discussion of the “how” and the “why”, and then illustrate how to take these steps to practice in an empirical application that studies identification of expected sunk costs of offering a product in a market.
    JEL: C12 C14
    Date: 2023–03
  10. By: Gustavo F. Serenelli; Emiliano Delfau
    Abstract: The goal of this document is to present a methodology for estimating probabilities for ordered sets. This may have several practical applications, such as calibration of rating models, estimation of mortality tables, or measurement of side effects related to different dose sizes. To do this, an objective (non-informative) Bayesian approach is applied, through which, using a multidimensional Jeffreys prior, a posterior distribution may be inferred for each of the probabilities being estimated.
    Keywords: Bayesian estimation, probability estimation, uninformative prior, rating calibration, low default, ordered sets, Jeffreys prior
    Date: 2023–03
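    For a single rating grade, the Jeffreys-prior calculation reduces to a Beta(1/2, 1/2) prior on a binomial default probability, giving a Beta(k + 1/2, n - k + 1/2) posterior after k defaults in n trials. A sketch with hypothetical default counts follows; the paper's multidimensional Jeffreys prior over jointly ordered probabilities is more involved and is not reproduced here.

    ```python
    from scipy import stats

    # Hypothetical (defaults k, obligors n) per rating grade
    grades = {"A": (1, 400), "B": (4, 300), "C": (9, 150)}

    # Jeffreys prior for a binomial probability is Beta(1/2, 1/2);
    # the posterior after k defaults in n trials is Beta(k + 1/2, n - k + 1/2)
    posteriors = {
        g: stats.beta(k + 0.5, n - k + 0.5) for g, (k, n) in grades.items()
    }

    for g, post in posteriors.items():
        lo, hi = post.ppf(0.025), post.ppf(0.975)
        print(f"{g}: PD mean {post.mean():.4f}, 95% interval [{lo:.4f}, {hi:.4f}]")
    ```

    For low-default grades (small k) the Jeffreys posterior still produces a non-degenerate probability of default estimate, which is the motivation for this family of priors in rating calibration.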
  11. By: Vira Semenova
    Abstract: This paper studies averages of intersection bounds -- the bounds defined by the infimum of a collection of regression functions -- and other similar functionals of these bounds, such as averages of saddle values. Examples of such parameters are Fréchet-Hoeffding bounds and Makarov (1981) bounds on distributional effects. The proposed estimator classifies covariate values into the regions corresponding to the identity of the binding regression function and takes the sample average. The paper shows that the proposed moment function is insensitive to first-order classification mistakes, enabling various nonparametric and regularized/machine learning classifiers in the first (classification) step. The result is generalized to cover bounds on the values of linear programming problems and the best linear predictor of intersection bounds.
    Date: 2023–03
  12. By: Ahmed, Muhammad Ashfaq; Nawaz, Nasreen
    Abstract: In the existing statistics and econometrics literature, there is no statistical test covering all kinds of roots of the characteristic polynomial that lead to an unstable dynamic response, i.e., positive and negative real unit roots, complex unit roots, and roots lying inside the unit circle. This paper develops a test which is sufficient to prove dynamic stability (in the context of roots of the characteristic polynomial) of a univariate as well as a multivariate time series without a structural break. It covers all roots (positive and negative real unit roots, complex unit roots, and roots inside the unit circle, whether single or multiple) which may lead to an unstable dynamic response. Furthermore, it also indicates the number of roots causing instability in the time series. The test is much simpler to apply than the existing tests, as the series is strictly stationary under the null.
    Keywords: Dynamic stability, Real and complex roots, Unit circle
    JEL: C01 C12
    Date: 2023–03–16
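    The deterministic property that such a test targets can be checked directly when the AR(p) coefficients are known: the process is dynamically stable if and only if all eigenvalues of the companion matrix lie strictly inside the unit circle. The sketch below is this population-level root check, not the paper's statistical test.

    ```python
    import numpy as np

    def ar_companion_roots(phi):
        """Eigenvalues of the companion matrix of an AR(p) process
        y_t = phi[0] y_{t-1} + ... + phi[p-1] y_{t-p} + e_t."""
        p = len(phi)
        C = np.zeros((p, p))
        C[0, :] = phi                 # first row holds the AR coefficients
        if p > 1:
            C[1:, :-1] = np.eye(p - 1)  # subdiagonal identity block
        return np.linalg.eigvals(C)

    def is_stable(phi, tol=1e-8):
        """Stable iff every companion eigenvalue has modulus < 1."""
        return bool(np.all(np.abs(ar_companion_roots(phi)) < 1 - tol))

    print(is_stable([0.5]))          # stable AR(1)
    print(is_stable([1.0]))          # real unit root: unstable
    print(is_stable([0.5, -0.9]))    # complex roots with modulus sqrt(0.9)
    ```

    The eigenvalue moduli also reveal how many roots lie on or outside the unit circle, echoing the paper's count of instability-causing roots.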
  13. By: Sylvia Frühwirth-Schnatter
    Abstract: The paper discusses shrinkage priors which impose increasing shrinkage in a sequence of parameters. We review the cumulative shrinkage process (CUSP) prior of Legramanti et al. (2020), which is a spike-and-slab shrinkage prior where the spike probability is stochastically increasing and constructed from the stick-breaking representation of a Dirichlet process prior. As a first contribution, this CUSP prior is extended by involving arbitrary stick-breaking representations arising from beta distributions. As a second contribution, we prove that exchangeable spike-and-slab priors, which are popular and widely used in sparse Bayesian factor analysis, can be represented as a finite generalized CUSP prior, which is easily obtained from the decreasing order statistics of the slab probabilities. Hence, exchangeable spike-and-slab shrinkage priors imply increasing shrinkage as the column index in the loading matrix increases, without imposing explicit order constraints on the slab probabilities. An application to sparse Bayesian factor analysis illustrates the usefulness of the findings of this paper. A new exchangeable spike-and-slab shrinkage prior based on the triple gamma prior of Cadonna et al. (2020) is introduced and shown to be helpful for estimating the unknown number of factors in a simulation study.
    Date: 2023–03
  14. By: Thomas Wong; Prof. Mauricio Barahona
    Abstract: In this paper, we explore the use of different feature engineering and dimensionality reduction methods in multi-variate time-series modelling. Using a feature-target cross correlation time series dataset created from the Numerai tournament, we demonstrate that, in the over-parameterised regime, both the performance and the predictions of different feature engineering methods converge to the same equilibrium, which can be characterised by the reproducing kernel Hilbert space. We suggest a new ensemble method, which combines different random non-linear transforms followed by ridge regression for modelling high-dimensional time series. Compared to some commonly used deep learning models for sequence modelling, such as LSTMs and transformers, our method is more robust (lower model variance over different random seeds and less sensitive to the choice of architecture) and more efficient. An additional advantage of our method is model simplicity, as there is no need to use sophisticated deep learning frameworks such as PyTorch. The learned feature rankings are then applied to the temporal tabular prediction problem in the Numerai tournament, and the predictive power of feature rankings obtained from our method is better than the baseline prediction model based on moving averages.
    Date: 2023–03
  15. By: Jens Ludwig; Sendhil Mullainathan
    Abstract: While hypothesis testing is a highly formalized activity, hypothesis generation remains largely informal. We propose a systematic procedure to generate novel hypotheses about human behavior, which uses the capacity of machine learning algorithms to notice patterns people might not. We illustrate the procedure with a concrete application: judge decisions about who to jail. We begin with a striking fact: The defendant’s face alone matters greatly for the judge’s jailing decision. In fact, an algorithm given only the pixels in the defendant’s mugshot accounts for up to half of the predictable variation. We develop a procedure that allows human subjects to interact with this black-box algorithm to produce hypotheses about what in the face influences judge decisions. The procedure generates hypotheses that are both interpretable and novel: They are not explained by demographics (e.g. race) or existing psychology research; nor are they already known (even if tacitly) to people or even experts. Though these results are specific, our procedure is general. It provides a way to produce novel, interpretable hypotheses from any high-dimensional dataset (e.g. cell phones, satellites, online behavior, news headlines, corporate filings, and high-frequency time series). A central tenet of our paper is that hypothesis generation is in and of itself a valuable activity, and we hope this encourages future work in this largely “pre-scientific” stage of science.
    JEL: B4 C01
    Date: 2023–03
  16. By: Federico Favata; Gabriel Rojas Montes; Martín Trombetta; Javier Alejo
    Keywords: Quantile regression, unconditional quantile regression, influence functions
    JEL: J01 J31
    Date: 2021–11
  17. By: Jean-Marc Le Caillec (IMT Atlantique, Lab-STICC, CNRS, Institut Mines-Télécom)
    Abstract: In this paper, we propose a new algorithm to estimate all the parameters of a Self-Exciting Threshold AutoRegressive (SETAR) model from an observed time series. The aim of this algorithm is to relax all the hypotheses concerning the SETAR model, for instance the knowledge (or assumption) of the number of regimes, the switching variables, and the switching function. For this, we reverse the usual framework of SETAR model identification of previous papers: we first identify the AR models using array clustering (instead of the switching variables and function), and second the switching conditions (instead of the AR models). The proposed algorithm is a pipeline of well-known algorithms in image/data processing, allowing us to deal with the statistical non-stationarity of the observed time series. We pay special attention to the results of each step and to the discrepancies they may induce in the following step. Since we do not assume any SETAR model property, asymptotic properties of the identification results are difficult to derive. Thus, we validate our approach on several experiment sets. To assess the performance of our algorithm, we introduce global metrics and ancillary metrics to validate each step of the proposed algorithm.
    Date: 2021–07
  18. By: Daouia, Abdelaati; Padoan, Simone A.; Stupfler, Gilles
    Abstract: The use of expectiles in risk management has recently gathered remarkable momentum due to their excellent axiomatic and probabilistic properties. In particular, the class of elicitable law-invariant coherent risk measures only consists of expectiles. While the theory of expectile estimation at central levels is substantial, tail estimation at extreme levels has so far only been considered when the tail of the underlying distribution is heavy. This article is the first work to handle the short-tailed setting where the loss (e.g. negative log-returns) distribution of interest is bounded to the right and the corresponding extreme value index is negative. We derive an asymptotic expansion of tail expectiles in this challenging context under a general second-order extreme value condition, which allows us to come up with two semiparametric estimators of extreme expectiles, and with their asymptotic properties in a general model of strictly stationary but weakly dependent observations. A simulation study and a real data analysis from a forecasting perspective are performed to verify and compare the proposed competing estimation procedures.
    Keywords: Expectiles; Extreme values; Second-order condition; Short tails; Weak dependence
    Date: 2023–03–07
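    The central-level sample expectile that such extreme-value methods extrapolate from can be computed by asymmetric least squares. A minimal sketch follows; the paper's semiparametric extreme-level estimators are not reproduced here.

    ```python
    import numpy as np

    def expectile(x, tau, tol=1e-10, max_iter=1000):
        """Sample tau-expectile via the asymmetric-least-squares fixed point
        e = sum(w_i x_i) / sum(w_i), with w_i = tau if x_i > e else 1 - tau.
        The 0.5-expectile is the sample mean."""
        x = np.asarray(x, dtype=float)
        e = x.mean()
        for _ in range(max_iter):
            w = np.where(x > e, tau, 1.0 - tau)   # asymmetric weights
            e_new = np.sum(w * x) / np.sum(w)
            if abs(e_new - e) < tol:
                return e_new
            e = e_new
        return e

    rng = np.random.default_rng(2)
    sample = rng.uniform(-1, 1, size=10_000)   # a short-tailed (bounded) sample
    print(expectile(sample, 0.5))              # equals the sample mean
    print(expectile(sample, 0.99))             # high expectile, below the endpoint 1
    ```

    For bounded data as in the short-tailed setting above, empirical expectiles at levels near 1 sit close to, but below, the right endpoint, which is precisely where the paper's extrapolation techniques become necessary.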
  19. By: Jonathan Berrisch; Florian Ziel
    Abstract: This paper presents a new method for combining (or aggregating or ensembling) multivariate probabilistic forecasts, taking into account dependencies between quantiles and covariates through a smoothing procedure that allows for online learning. Two smoothing methods are discussed: dimensionality reduction using basis matrices and penalized smoothing. The new online learning algorithm generalizes the standard CRPS learning framework to multivariate dimensions. It is based on Bernstein Online Aggregation (BOA) and yields optimal asymptotic learning properties. We provide an in-depth discussion of possible extensions of the algorithm and several nested cases related to the existing literature on online forecast combination. The methodology is applied to forecasting day-ahead electricity prices, which are 24-dimensional distributional forecasts. The proposed method yields significant improvements over uniform combination in terms of continuous ranked probability score (CRPS). We discuss the temporal evolution of the weights and hyperparameters and present the results of reduced versions of the preferred model. A fast C++ implementation of all discussed methods is provided in the R package profoc.
    Date: 2023–03
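    The scoring rule being optimized, the CRPS, has a simple sample-based estimator: CRPS(F, y) = E|X - y| - 0.5 E|X - X'| for independent X, X' ~ F. The sketch below shows the score separating a calibrated from a biased ensemble; the forecaster names are hypothetical, and the BOA weighting scheme itself is not implemented here.

    ```python
    import numpy as np

    def crps_ensemble(samples, y):
        """Sample-based CRPS estimate for an ensemble forecast:
        CRPS(F, y) = E|X - y| - 0.5 * E|X - X'|, with X, X' ~ F independent."""
        s = np.asarray(samples, dtype=float)
        term1 = np.mean(np.abs(s - y))
        term2 = 0.5 * np.mean(np.abs(s[:, None] - s[None, :]))
        return term1 - term2

    # A perfect deterministic forecast scores exactly zero
    print(crps_ensemble([42.0, 42.0, 42.0], 42.0))

    # A calibrated versus a biased ensemble for a single observation
    rng = np.random.default_rng(3)
    obs = 100.0
    fc_a = rng.normal(100, 5, size=1000)   # well-calibrated forecaster
    fc_b = rng.normal(120, 5, size=1000)   # biased forecaster
    print(crps_ensemble(fc_a, obs), crps_ensemble(fc_b, obs))
    ```

    An online combination scheme such as BOA would repeatedly compute such scores over time and shift combination weight toward the forecaster with the lower cumulative CRPS.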

This nep-ecm issue is ©2023 by Sune Karlsson. It is provided as is without any express or implied warranty. It may be freely redistributed in whole or in part for any purpose. If distributed in part, please include this notice.
General information on the NEP project can be found at For comments please write to the director of NEP, Marco Novarese at <>. Put “NEP” in the subject, otherwise your mail may be rejected.
NEP’s infrastructure is sponsored by the School of Economics and Finance of Massey University in New Zealand.