nep-ecm New Economics Papers
on Econometrics
Issue of 2022‒04‒11
twelve papers chosen by
Sune Karlsson
Örebro universitet

  1. Nonparametric Estimation of Large Spot Volatility Matrices for High-Frequency Financial Data By Bu, R.; Li, D.; Linton, O.; Wang, H.
  2. Long-term Causal Inference Under Persistent Confounding via Data Combination By Guido Imbens; Nathan Kallus; Xiaojie Mao; Yuhao Wang
  3. Optimal pooling and distributed inference for the tail index and extreme quantiles By Daouia, Abdelaati; Padoan, Simone A.; Stupfler, Gilles
  4. Inference for extremal regression with dependent heavy-tailed data By Daouia, Abdelaati; Stupfler, Gilles; Usseglio-Carleve, Antoine
  5. Asymptotics of Cointegration Tests for High-Dimensional VAR($k$) By Anna Bykhovskaya; Vadim Gorin
  6. A generalized framework for estimating spatial econometric interaction models By Dargel, Lukas; Thomas-Agnan, Christine
  7. Dynamic Autoregressive Liquidity (DArLiQ) By Hafner, Christian; Linton, Oliver; Wang, Linqi
  8. Revisiting the Great Ratios Hypothesis By Chudik, A.; Pesaran, M. H.; Smith, R. P.
  9. Real-time nowcasting with sparse factor models By Hauber, Philipp
  10. Detecting outliers in compositional data using Invariant Coordinate Selection By Ruiz-Gazen, Anne; Thomas-Agnan, Christine; Laurent, Thibault; Mondon, Camille
  11. Deep Regression Ensembles By Antoine Didisheim; Bryan T. Kelly; Semyon Malamud
  12. Evaluating probabilistic forecasts of football matches: the case against the ranked probability score By Wheatcroft, Edward

  1. By: Bu, R.; Li, D.; Linton, O.; Wang, H.
    Abstract: In this paper, we consider estimating spot/instantaneous volatility matrices of high-frequency data collected for a large number of assets. We first combine classic nonparametric kernel-based smoothing with a generalised shrinkage technique in the matrix estimation for noise-free data under a uniform sparsity assumption, a natural extension of the approximate sparsity commonly used in the literature. The uniform consistency property is derived for the proposed spot volatility matrix estimator with convergence rates comparable to the optimal minimax one. For high-frequency data contaminated by microstructure noise, we introduce a localised pre-averaging estimation method in the high-dimensional setting which first pre-whitens the data via a kernel filter and then uses the estimation tool developed in the noise-free scenario, and we further derive uniform convergence rates for the resulting spot volatility matrix estimator. In addition, we combine the kernel smoothing with the shrinkage technique to estimate the time-varying volatility matrix of the high-dimensional noise vector, and establish the relevant uniform consistency result. Numerical studies examine the performance of the proposed estimation methods in finite samples.
    Keywords: Brownian semi-martingale, Kernel smoothing, Microstructure noise, Sparsity, Spot volatility matrix, Uniform consistency
    JEL: C10 C14 C22
    Date: 2022–03–16
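The two-step construction described in the abstract, kernel smoothing of return outer products followed by entrywise shrinkage, can be sketched as follows. This is a minimal illustration under simplifying assumptions (Gaussian kernel, soft-thresholding of off-diagonal entries, equally spaced noise-free observations), not the authors' estimator; all names are hypothetical.

```python
import numpy as np

def soft_threshold(M, lam):
    """Shrink off-diagonal entries towards zero; keep the diagonal intact."""
    S = np.sign(M) * np.maximum(np.abs(M) - lam, 0.0)
    np.fill_diagonal(S, np.diag(M))
    return S

def spot_vol_matrix(returns, times, t, h, lam):
    """Kernel-weighted sum of return outer products, then shrinkage.

    returns: (n, p) log-returns over equally spaced intervals on [0, 1]
    times:   (n,) observation times; t: evaluation time; h: bandwidth
    """
    w = np.exp(-0.5 * ((times - t) / h) ** 2)   # Gaussian kernel weights
    w = w / w.sum()
    n = returns.shape[0]
    # Local average of n * r_i r_i', a crude spot (per-unit-time) covariance
    raw = n * (returns * w[:, None]).T @ returns
    return soft_threshold(raw, lam)
```

The threshold level `lam` plays the role of the shrinkage parameter; in practice it would be tuned, e.g. by cross-validation.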
  2. By: Guido Imbens; Nathan Kallus; Xiaojie Mao; Yuhao Wang
    Abstract: We study the identification and estimation of long-term treatment effects when both experimental and observational data are available. Since the long-term outcome is observed only after a long delay, it is not measured in the experimental data, but only recorded in the observational data. However, both types of data include observations of some short-term outcomes. In this paper, we uniquely tackle the challenge of persistent unmeasured confounders, i.e., some unmeasured confounders that can simultaneously affect the treatment, short-term outcomes and the long-term outcome, noting that they invalidate identification strategies in previous literature. To address this challenge, we exploit the sequential structure of multiple short-term outcomes, and develop three novel identification strategies for the average long-term treatment effect. We further propose three corresponding estimators and prove their asymptotic consistency and asymptotic normality. We finally apply our methods to estimate the effect of a job training program on long-term employment using semi-synthetic data. We numerically show that our proposals outperform existing methods that fail to handle persistent confounders.
    Date: 2022–02
  3. By: Daouia, Abdelaati; Padoan, Simone A.; Stupfler, Gilles
    Abstract: This paper investigates pooling strategies for tail index and extreme quantile estimation from heavy-tailed data. To fully exploit the information contained in several samples, we present general weighted pooled Hill estimators of the tail index and weighted pooled Weissman estimators of extreme quantiles calculated through a nonstandard geometric averaging scheme. We develop their large-sample asymptotic theory across a fixed number of samples, covering the general framework of heterogeneous sample sizes with different and asymptotically dependent distributions. Our results include optimal choices of pooling weights based on asymptotic variance and MSE minimization. In the important application of distributed inference, we prove that the variance-optimal distributed estimators are asymptotically equivalent to the benchmark Hill and Weissman estimators based on the unfeasible combination of subsamples, while the AMSE-optimal distributed estimators enjoy a smaller AMSE than the benchmarks in the case of large bias. We consider additional scenarios where the number of subsamples grows with the total sample size and effective subsample sizes can be low. We extend our methodology to handle serial dependence and the presence of covariates. Simulations confirm the statistical inferential theory of our pooled estimators. Two applications to real weather and insurance data are showcased.
    Keywords: Extreme values ; Heavy tails ; Distributed inference ; Pooling ; Testing
    Date: 2022–03–21
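For reference, the classical building blocks pooled here, the Hill tail-index estimator and the Weissman extreme-quantile extrapolation, can be sketched as follows. This is a minimal illustration with simple weighted averaging across samples; the paper's geometric averaging scheme and its variance- and AMSE-optimal weights are not reproduced, and all names are hypothetical.

```python
import numpy as np

def hill(sample, k):
    """Hill estimator of the tail index from the k largest order statistics."""
    x = np.sort(np.asarray(sample))
    return float(np.mean(np.log(x[-k:]) - np.log(x[-k - 1])))

def weissman_quantile(sample, k, p, gamma):
    """Weissman extrapolated quantile of level 1 - p for tail index gamma."""
    x = np.sort(np.asarray(sample))
    n = len(x)
    return float(x[-k - 1] * (k / (n * p)) ** gamma)

def pooled_hill(samples, ks, weights):
    """Weighted pooled Hill estimator across several (sub)samples."""
    g = np.array([hill(s, k) for s, k in zip(samples, ks)])
    w = np.asarray(weights, dtype=float)
    return float(np.sum(w * g) / np.sum(w))
```

In the distributed-inference setting of the paper, each `samples[i]` would live on a separate machine and only the per-sample Hill estimates would be communicated.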
  4. By: Daouia, Abdelaati; Stupfler, Gilles; Usseglio-Carleve, Antoine
    Abstract: Nonparametric inference on tail conditional quantiles and their least squares analogs, expectiles, remains limited to i.i.d. data. Expectiles are themselves quantiles of a transformation of the underlying distribution. We develop a fully operational kernel-based inferential theory for extreme conditional quantiles and expectiles in the challenging framework of α-mixing, conditional heavy-tailed data whose tail index may vary with covariate values. This extreme value problem requires a dedicated treatment to deal with data sparsity in the far tail of the response, in addition to handling difficulties inherent to mixing, smoothing, and sparsity associated with covariate localization. We prove the pointwise asymptotic normality of our estimators and obtain optimal rates of convergence reminiscent of those found in the i.i.d. regression setting, but which had not been established in the conditional extreme value literature so far. Our mathematical assumptions are satisfied in location-scale models with possible temporal misspecification, nonlinear regression models, and autoregressive models, among others. We propose full bias and variance reduction procedures, and simple but effective data-based rules for selecting tuning hyperparameters. Our inference strategy is shown to perform well in finite samples and is showcased in applications to stock returns and tornado loss data.
    Keywords: Conditional quantiles; Conditional expectiles; Extreme value analysis; Heavy tails; Inference; Mixing; Nonparametric regression
    Date: 2022–03–21
  5. By: Anna Bykhovskaya; Vadim Gorin
    Abstract: The paper studies non-stationary high-dimensional vector autoregressions of order $k$, VAR($k$). Additional deterministic terms such as trend or seasonality are allowed. The number of time periods, $T$, and number of coordinates, $N$, are assumed to be large and of the same order. Under this regime the first-order asymptotics of the Johansen likelihood ratio (LR), Pillai-Bartlett, and Hotelling-Lawley tests for cointegration is derived: Test statistics converge to non-random integrals. For more refined analysis, the paper proposes and analyzes a modification of the Johansen test. The new test for the absence of cointegration converges to the partial sum of the Airy$_1$ point process. Supporting Monte Carlo simulations indicate that the same behavior persists universally in many situations beyond our theorems. The paper presents an empirical implementation of the approach to the analysis of stocks in the S&P 100 and of cryptocurrencies. The latter example has strong presence of multiple cointegrating relationships, while the former is consistent with the null of no cointegration.
    Date: 2022–02
  6. By: Dargel, Lukas; Thomas-Agnan, Christine
    Abstract: In the framework of spatial econometric interaction models for origin-destination flows, we develop an estimation method for the case when the list of origins may be distinct from the list of destinations, and when the origin-destination matrix may be sparse. The proposed model resembles a weighted version of that of LeSage and Pace (2008), and we are able to retain most of the efficiency gains associated with the matrix form estimation, which we illustrate for the maximum likelihood estimator. We also derive computationally feasible tests for the coherence of the estimation results and present an efficient approximation of the conditional expectation of the flows.
    Keywords: spatial econometric interaction models; zero flow problem
    JEL: C01 C21
    Date: 2022–03–01
  7. By: Hafner, Christian (Université catholique de Louvain, LIDAM/ISBA, Belgium); Linton, Oliver; Wang, Linqi (Université catholique de Louvain, LIDAM/LFIN, Belgium)
    Abstract: We introduce a new class of semiparametric dynamic autoregressive models for the Amihud illiquidity measure, which captures both the long-run trend in the illiquidity series with a nonparametric component and the short-run dynamics with an autoregressive component. We develop a GMM estimator based on conditional moment restrictions and an efficient semiparametric ML estimator based on an iid assumption. We derive large sample properties for both estimators. We further develop a methodology to detect the occurrence of permanent and transitory breaks in the illiquidity process. Finally, we demonstrate the model performance and its empirical relevance in two applications. First, we study the impact of stock splits on the illiquidity dynamics of the five largest US technology company stocks. Second, we investigate how the different components of the illiquidity process obtained from our model relate to the stock market risk premium using data on the S&P 500 stock market index.
    Keywords: Nonparametric ; Semiparametric ; Splits ; Structural Change
    JEL: C12 C14
    Date: 2022–02–23
  8. By: Chudik, A.; Pesaran, M. H.; Smith, R. P.
    Abstract: The idea that certain economic variables are roughly constant in the long-run is an old one. Kaldor described them as stylized facts, whereas Klein and Kosobud labelled them great ratios. While such ratios are widely adopted in theoretical models in economics as conditions for balanced growth, arbitrage or solvency, the empirical literature has tended to find little evidence for them. We argue that this outcome could be due to episodic failure of cointegration, possible two-way causality between the variables in the ratios, and cross-country error dependence due to latent factors. We propose a new system pooled mean group estimator (SPMG) to deal with these features. Using this new panel estimator and a dataset spanning almost one and a half centuries and seventeen countries, we find support for five out of the seven great ratios that we consider. Extensive Monte Carlo experiments also show that the SPMG estimator with bootstrapped confidence intervals stands out as the only estimator with satisfactory small sample properties.
    Keywords: Great ratios, debt, consumption, and investment to GDP ratios, arbitrage conditions, heterogeneous panels, episodic cointegration, two-way long-run causality, error cross-sectional dependence
    JEL: B40 C18 C33 C50
    Date: 2022–03–04
  9. By: Hauber, Philipp
    Abstract: Factor models feature prominently in the macroeconomic nowcasting literature, yet no clear consensus has emerged regarding the question of how many and which variables to select in such applications. Examples of both large-scale models, estimated with data sets consisting of over 100 time series, and small-scale models based on only a few pre-selected variables can be found in the literature. To address the issue of variable selection in factor models, in this paper we employ sparse priors on the loadings matrix. These priors concentrate more mass at zero than those conventionally used in the literature while retaining fat tails to capture signals. As a result, variable selection and estimation can be performed simultaneously in a Bayesian framework. Using large data sets consisting of over 100 variables, we evaluate the performance of sparse factor models in real time for US and German GDP point and density nowcasts. We find that sparse priors lead to relatively small gains in nowcast accuracy compared to a benchmark Normal prior. Moreover, different types of sparse priors discussed in the literature yield very similar results. Our findings are compatible with the hypothesis that large macroeconomic data sets typically used in now- or forecasting applications are not sparse but dense.
    Keywords: factor models, sparsity, nowcasting, variable selection
    JEL: C11 C53 C55 E37
    Date: 2022
  10. By: Ruiz-Gazen, Anne; Thomas-Agnan, Christine; Laurent, Thibault; Mondon, Camille
    Abstract: Invariant Coordinate Selection (ICS) is a multivariate statistical method introduced by Tyler et al. (2009) and based on the simultaneous diagonalization of two scatter matrices. A model-based approach to ICS, called Invariant Coordinate Analysis, has already been adapted for compositional data in Muehlmann et al. (2021). In a model-free context, ICS is also helpful for identifying outliers (Nordhausen and Ruiz-Gazen, 2022). We propose to develop a version of ICS for outlier detection in compositional data. This version is first introduced in coordinate space for a specific choice of ilr coordinate system associated with a contrast matrix, and follows the outlier detection procedure proposed by Archimbaud et al. (2018a). We then show that the procedure is independent of the choice of contrast matrix and can be defined directly in the simplex. To do so, we first establish some properties of the set of matrices satisfying the zero-sum property and introduce a simplex definition of the Mahalanobis distance and of the one-step M-estimator class of scatter matrices. We also need to define the family of elliptical distributions in the simplex. We then show how to interpret the results directly in the simplex using two artificial datasets and a real dataset of market shares in the automobile industry.
    Date: 2022–03–18
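The core of ICS, simultaneous diagonalisation of two scatter matrices, can be sketched as follows in ordinary Euclidean coordinates (for compositional data one would first map observations to ilr coordinates). This minimal illustration pairs the covariance with a standard fourth-moment scatter and scores observations by their squared distance on the leading invariant coordinate, in the spirit of Archimbaud et al. (2018a); it is not the authors' implementation, and all names are hypothetical.

```python
import numpy as np

def cov4(X):
    """Fourth-moment scatter matrix, a common companion to the covariance."""
    Xc = X - X.mean(axis=0)
    S1 = np.cov(X, rowvar=False)
    d2 = np.einsum('ij,jk,ik->i', Xc, np.linalg.inv(S1), Xc)  # Mahalanobis^2
    p = X.shape[1]
    return (Xc * d2[:, None]).T @ Xc / (len(X) * (p + 2))

def ics_outlier_scores(X, n_comp=1):
    """Squared distances on the leading invariant coordinate(s).

    The eigenvectors of S1^{-1} S2 give the ICS directions; normalisation
    of the coordinates (B' S1 B = I) is omitted since it does not affect
    the ranking of observations within a component.
    """
    S1 = np.cov(X, rowvar=False)
    S2 = cov4(X)
    vals, vecs = np.linalg.eig(np.linalg.solve(S1, S2))
    order = np.argsort(np.real(vals))[::-1]   # sort by decreasing eigenvalue
    B = np.real(vecs[:, order])
    Z = (X - X.mean(axis=0)) @ B              # invariant coordinates
    return np.sum(Z[:, :n_comp] ** 2, axis=1)
```

Observations with the largest scores are flagged as outlier candidates; the number of retained components `n_comp` would be chosen by the screening rules of Archimbaud et al. (2018a).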
  11. By: Antoine Didisheim (Swiss Finance Institute, UNIL); Bryan T. Kelly (Yale SOM; AQR Capital Management, LLC; National Bureau of Economic Research (NBER)); Semyon Malamud (Ecole Polytechnique Federale de Lausanne; Centre for Economic Policy Research (CEPR); Swiss Finance Institute)
    Abstract: We introduce a methodology for designing and training deep neural networks (DNN) that we call “Deep Regression Ensembles” (DRE). It bridges the gap between DNN and two-layer neural networks trained with random feature regression. Each layer of DRE has two components: randomly drawn input weights and output weights trained myopically (as if the final output layer) using linear ridge regression. Within a layer, each neuron uses a different subset of inputs and a different ridge penalty, constituting an ensemble of random feature ridge regressions. Our experiments show that a single DRE architecture matches or exceeds state-of-the-art DNNs on many data sets. Yet, because DRE neural weights are either known in closed form or randomly drawn, its computational cost is orders of magnitude smaller than that of DNNs.
    Keywords: Deep learning, Neural network, Random features, Ensembles
    Date: 2022–03
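The layer construction described above can be sketched as a single random-feature ridge block: frozen random input weights, ReLU features, and output weights trained myopically by closed-form ridge regression. This sketch omits the per-neuron input subsets and per-neuron ridge penalties that make each layer an ensemble in the paper; it is an illustration of the idea, not the authors' code.

```python
import numpy as np

def dre_layer(X, y, width, ridge, rng):
    """One simplified DRE-style layer.

    Input weights are random and never trained; output weights are fit
    myopically (as if this were the final layer) by ridge regression.
    """
    W = rng.normal(size=(X.shape[1], width))   # frozen random input weights
    H = np.maximum(X @ W, 0.0)                 # ReLU random features
    A = H.T @ H + ridge * np.eye(width)        # closed-form ridge system
    beta = np.linalg.solve(A, H.T @ y)
    return H, beta

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 10))
y = 2.0 * X[:, 0] + rng.normal(scale=0.1, size=500)
H, beta = dre_layer(X, y, width=200, ridge=1.0, rng=rng)
in_sample_r2 = 1.0 - np.var(y - H @ beta) / np.var(y)
```

Depth is obtained by feeding the features `H` of one layer as the input of the next; since every fit is closed-form, no backpropagation is needed.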
  12. By: Wheatcroft, Edward
    Abstract: A scoring rule is a function of a probabilistic forecast and a corresponding outcome used to evaluate forecast performance. There is some debate as to which scoring rules are most appropriate for evaluating forecasts of sporting events. This paper focuses on forecasts of the outcomes of football matches. The ranked probability score (RPS) is often recommended since it is 'sensitive to distance', that is, it takes into account the ordering of the outcomes (a home win is 'closer' to a draw than it is to an away win). In this paper, this reasoning is disputed on the basis that it adds nothing in terms of the usual aims of using scoring rules. A local scoring rule is one that only takes the probability placed on the outcome into consideration. Two simulation experiments are carried out to compare the performance of the RPS, which is non-local and sensitive to distance, the Brier score, which is non-local and insensitive to distance, and the Ignorance score, which is local and insensitive to distance. The Ignorance score outperforms both the RPS and the Brier score, casting doubt on the value of non-locality and sensitivity to distance as properties of scoring rules in this context.
    Keywords: football forecasting; forecast evaluation; ignorance score; ranked probability score; scoring rules
    JEL: C1
    Date: 2021–12–01
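The three scoring rules compared in the paper are standard and easy to state for a football forecast p = (p_home, p_draw, p_away). A minimal sketch, with the outcome coded 0 = home win, 1 = draw, 2 = away win, and lower scores better:

```python
import numpy as np

def brier(p, outcome):
    """Brier score: squared error against the one-hot outcome (non-local)."""
    e = np.zeros(len(p)); e[outcome] = 1.0
    return float(np.sum((np.asarray(p) - e) ** 2))

def rps(p, outcome):
    """Ranked probability score: squared error on cumulative probabilities,
    which is what makes it 'sensitive to distance' (non-local)."""
    e = np.zeros(len(p)); e[outcome] = 1.0
    return float(np.sum((np.cumsum(p) - np.cumsum(e)) ** 2))

def ignorance(p, outcome):
    """Ignorance (log) score: depends only on the probability placed on the
    realised outcome, hence local and insensitive to distance."""
    return float(-np.log2(p[outcome]))
```

For a home-leaning forecast such as p = (0.5, 0.3, 0.2), the RPS penalises an away win more heavily than a draw, illustrating sensitivity to distance, while the Ignorance score of each outcome depends only on its own forecast probability.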

This nep-ecm issue is ©2022 by Sune Karlsson. It is provided as is without any express or implied warranty. It may be freely redistributed in whole or in part for any purpose. If distributed in part, please include this notice.
General information on the NEP project can be found at . For comments, please write to the director of NEP, Marco Novarese, at <>. Put “NEP” in the subject, otherwise your mail may be rejected.
NEP’s infrastructure is sponsored by the School of Economics and Finance of Massey University in New Zealand.