
on Econometrics 
By:  Bu, R.; Li, D.; Linton, O.; Wang, H. 
Abstract:  In this paper, we consider estimating spot/instantaneous volatility matrices of high-frequency data collected for a large number of assets. We first combine classic nonparametric kernel-based smoothing with a generalised shrinkage technique in the matrix estimation for noise-free data under a uniform sparsity assumption, a natural extension of the approximate sparsity commonly used in the literature. The uniform consistency property is derived for the proposed spot volatility matrix estimator with convergence rates comparable to the optimal minimax one. For the high-frequency data contaminated by the microstructure noise, we introduce a localised pre-averaging estimation method in the high-dimensional setting which first pre-whitens data via a kernel filter and then uses the estimation tool developed in the noise-free scenario, and further derive the uniform convergence rates for the developed spot volatility matrix estimator. In addition, we also combine the kernel smoothing with the shrinkage technique to estimate the time-varying volatility matrix of the high-dimensional noise vector, and establish the relevant uniform consistency result. Numerical studies are provided to examine performance of the proposed estimation methods in finite samples.
Keywords:  Brownian semimartingale, Kernel smoothing, Microstructure noise, Sparsity, Spot volatility matrix, Uniform consistency 
JEL:  C10 C14 C22 
Date:  2022–03–16 
URL:  http://d.repec.org/n?u=RePEc:cam:camjip:2208&r= 
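As a rough illustration of the noise-free estimator described in the abstract above, the sketch below combines a kernel-weighted sum of outer products of return increments with entrywise shrinkage. The Epanechnikov kernel, soft thresholding applied to off-diagonal entries only, the bandwidth `h`, the threshold `lam`, the simulated data, and all function names are assumptions made for illustration; the paper's generalised shrinkage class and tuning rules are not reproduced here.

```python
import math
import random


def epanechnikov(u):
    """Epanechnikov kernel on [-1, 1] (an assumed kernel choice)."""
    return 0.75 * (1.0 - u * u) if abs(u) < 1.0 else 0.0


def soft_threshold(x, lam):
    """Shrink x towards zero by lam; one simple member of the shrinkage family."""
    return math.copysign(max(abs(x) - lam, 0.0), x)


def kernel_spot_cov(increments, times, t, h):
    """Kernel-weighted sum of outer products of increments around time t."""
    p = len(increments[0])
    cov = [[0.0] * p for _ in range(p)]
    for dx, tk in zip(increments, times):
        w = epanechnikov((tk - t) / h) / h
        if w == 0.0:
            continue  # observation lies outside the kernel window
        for i in range(p):
            for j in range(p):
                cov[i][j] += w * dx[i] * dx[j]
    return cov


def shrink_off_diagonal(cov, lam):
    """Soft-threshold the off-diagonal entries and keep the diagonal as is."""
    p = len(cov)
    return [[cov[i][j] if i == j else soft_threshold(cov[i][j], lam)
             for j in range(p)] for i in range(p)]


# Hypothetical data: three independent assets with unit spot variance on [0, 1].
random.seed(1)
n, p = 2000, 3
dt = 1.0 / n
times = [(k + 1) * dt for k in range(n)]
increments = [[random.gauss(0.0, math.sqrt(dt)) for _ in range(p)]
              for _ in range(n)]

raw = kernel_spot_cov(increments, times, t=0.5, h=0.2)
est = shrink_off_diagonal(raw, lam=0.3)
```

In this simulated example the true spot covariance matrix is the identity, so the thresholding step should remove the small spurious off-diagonal entries left by the kernel estimate while leaving the diagonal close to one.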
By:  Guido Imbens; Nathan Kallus; Xiaojie Mao; Yuhao Wang 
Abstract:  We study the identification and estimation of long-term treatment effects when both experimental and observational data are available. Since the long-term outcome is observed only after a long delay, it is not measured in the experimental data, but only recorded in the observational data. However, both types of data include observations of some short-term outcomes. In this paper, we uniquely tackle the challenge of persistent unmeasured confounders, i.e., some unmeasured confounders that can simultaneously affect the treatment, short-term outcomes and the long-term outcome, noting that they invalidate identification strategies in previous literature. To address this challenge, we exploit the sequential structure of multiple short-term outcomes, and develop three novel identification strategies for the average long-term treatment effect. We further propose three corresponding estimators and prove their asymptotic consistency and asymptotic normality. We finally apply our methods to estimate the effect of a job training program on long-term employment using semi-synthetic data. We numerically show that our proposals outperform existing methods that fail to handle persistent confounders.
Date:  2022–02 
URL:  http://d.repec.org/n?u=RePEc:arx:papers:2202.07234&r= 
By:  Daouia, Abdelaati; Padoan, Simone A.; Stupfler, Gilles 
Abstract:  This paper investigates pooling strategies for tail index and extreme quantile estimation from heavy-tailed data. To fully exploit the information contained in several samples, we present general weighted pooled Hill estimators of the tail index and weighted pooled Weissman estimators of extreme quantiles calculated through a nonstandard geometric averaging scheme. We develop their large-sample asymptotic theory across a fixed number of samples, covering the general framework of heterogeneous sample sizes with different and asymptotically dependent distributions. Our results include optimal choices of pooling weights based on asymptotic variance and MSE minimization. In the important application of distributed inference, we prove that the variance-optimal distributed estimators are asymptotically equivalent to the benchmark Hill and Weissman estimators based on the unfeasible combination of subsamples, while the AMSE-optimal distributed estimators enjoy a smaller AMSE than the benchmarks in the case of large bias. We consider additional scenarios where the number of subsamples grows with the total sample size and effective subsample sizes can be low. We extend our methodology to handle serial dependence and the presence of covariates. Simulations confirm the statistical inferential theory of our pooled estimators. Two applications to real weather and insurance data are showcased.
Keywords:  Extreme values ; Heavy tails ; Distributed inference ; Pooling ; Testing 
Date:  2022–03–21 
URL:  http://d.repec.org/n?u=RePEc:tse:wpaper:126783&r= 
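To make the idea of a weighted pooled Hill estimator in the abstract above concrete, here is a minimal sketch that computes the classical Hill estimator on each sample and combines the results with weights summing to one. The default weights (proportional to the number of top order statistics `ks`), the simulated Pareto samples, and the function names are assumptions for illustration only; they are not the variance-optimal or AMSE-optimal weights derived in the paper.

```python
import math
import random


def hill(sample, k):
    """Hill estimator of the tail index from the k largest observations."""
    xs = sorted(sample, reverse=True)
    threshold = xs[k]  # the (k+1)-th largest value serves as the threshold
    return sum(math.log(x / threshold) for x in xs[:k]) / k


def pooled_hill(samples, ks, weights=None):
    """Weighted average of per-sample Hill estimates (weights sum to one)."""
    estimates = [hill(s, k) for s, k in zip(samples, ks)]
    if weights is None:
        # Assumed default: weight each sample by its share of top order statistics.
        total = float(sum(ks))
        weights = [k / total for k in ks]
    return sum(w * g for w, g in zip(weights, estimates))


# Hypothetical data: two Pareto samples with shape 2, i.e. tail index 0.5.
random.seed(0)
samples = [[random.paretovariate(2.0) for _ in range(n)] for n in (4000, 6000)]
estimate = pooled_hill(samples, ks=[400, 600])
```

With exact Pareto data the pooled estimate should land near the true tail index of 0.5; passing an explicit `weights` list lets other pooling schemes be tried on the same per-sample estimates.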
By:  Daouia, Abdelaati; Stupfler, Gilles; Usseglio-Carleve, Antoine 
Abstract:  Nonparametric inference on tail conditional quantiles and their least squares analogs, expectiles, remains limited to i.i.d. data. Expectiles are themselves quantiles of a transformation of the underlying distribution. We develop a fully operational kernel-based inferential theory for extreme conditional quantiles and expectiles in the challenging framework of α-mixing, conditional heavy-tailed data whose tail index may vary with covariate values. This extreme value problem requires a dedicated treatment to deal with data sparsity in the far tail of the response, in addition to handling difficulties inherent to mixing, smoothing, and sparsity associated to covariate localization. We prove the pointwise asymptotic normality of our estimators and obtain optimal rates of convergence reminiscent of those found in the i.i.d. regression setting, but which had not been established in the conditional extreme value literature so far. Our mathematical assumptions are satisfied in location-scale models with possible temporal misspecification, nonlinear regression models, and autoregressive models, among others. We propose full bias and variance reduction procedures, and simple but effective data-based rules for selecting tuning hyperparameters. Our inference strategy is shown to perform well in finite samples and is showcased in applications to stock returns and tornado loss data. 
Keywords:  Conditional quantiles ; Conditional expectiles ; Extreme value analysis ; Heavy tails ; Inference ; Mixing ; Nonparametric regression 
Date:  2022–03–21 
URL:  http://d.repec.org/n?u=RePEc:tse:wpaper:126785&r= 
By:  Anna Bykhovskaya; Vadim Gorin 
Abstract:  The paper studies nonstationary high-dimensional vector autoregressions of order $k$, VAR($k$). Additional deterministic terms such as trend or seasonality are allowed. The number of time periods, $T$, and number of coordinates, $N$, are assumed to be large and of the same order. Under such a regime, the first-order asymptotics of the Johansen likelihood ratio (LR), Pillai-Bartlett, and Hotelling-Lawley tests for cointegration are derived: test statistics converge to nonrandom integrals. For more refined analysis, the paper proposes and analyzes a modification of the Johansen test. The new test for the absence of cointegration converges to the partial sum of the Airy$_1$ point process. Supporting Monte Carlo simulations indicate that the same behavior persists universally in many situations beyond our theorems. The paper presents an empirical implementation of the approach to the analysis of stocks in the S&P 100 and of cryptocurrencies. The latter example shows a strong presence of multiple cointegrating relationships, while the former is consistent with the null of no cointegration.
Date:  2022–02 
URL:  http://d.repec.org/n?u=RePEc:arx:papers:2202.07150&r= 
By:  Dargel, Lukas; Thomas-Agnan, Christine 
Abstract:  In the framework of spatial econometric interaction models for origin-destination flows, we develop an estimation method for the case when the list of origins may be distinct from the list of destinations, and when the origin-destination matrix may be sparse. The proposed model resembles a weighted version of the one of LeSage and Pace (2008) and we are able to retain most of the efficiency gains associated with the matrix form estimation, which we illustrate for the maximum likelihood estimator. We also derive computationally feasible tests for the coherence of the estimation results and present an efficient approximation of the conditional expectation of the flows. 
Keywords:  spatial econometric interaction models; zero flow problem 
JEL:  C01 C21 
Date:  2022–03–01 
URL:  http://d.repec.org/n?u=RePEc:tse:wpaper:126685&r= 
By:  Hafner, Christian (Université catholique de Louvain, LIDAM/ISBA, Belgium); Linton, Oliver (obl20@cam.ac.uk); Wang, Linqi (Université catholique de Louvain, LIDAM/LFIN, Belgium) 
Abstract:  We introduce a new class of semiparametric dynamic autoregressive models for the Amihud illiquidity measure, which captures both the long-run trend in the illiquidity series with a nonparametric component and the short-run dynamics with an autoregressive component. We develop a GMM estimator based on conditional moment restrictions and an efficient semiparametric ML estimator based on an iid assumption. We derive large sample properties for both estimators. We further develop a methodology to detect the occurrence of permanent and transitory breaks in the illiquidity process. Finally, we demonstrate the model performance and its empirical relevance on two applications. First, we study the impact of stock splits on the illiquidity dynamics of the five largest US technology company stocks. Second, we investigate how the different components of the illiquidity process obtained from our model relate to the stock market risk premium using data on the S&P 500 stock market index.
Keywords:  Nonparametric ; Semiparametric ; Splits ; Structural Change 
JEL:  C12 C14 
Date:  2022–02–23 
URL:  http://d.repec.org/n?u=RePEc:ajf:louvlf:2022002&r= 
By:  Chudik, A.; Pesaran, M. H.; Smith, R. P. 
Abstract:  The idea that certain economic variables are roughly constant in the long run is an old one. Kaldor described them as stylized facts, whereas Klein and Kosobud labelled them great ratios. While such ratios are widely adopted in theoretical models in economics as conditions for balanced growth, arbitrage or solvency, the empirical literature has tended to find little evidence for them. We argue that this outcome could be due to episodic failure of cointegration, possible two-way causality between the variables in the ratios, and cross-country error dependence due to latent factors. We propose a new system pooled mean group estimator (SPMG) to deal with these features. Using this new panel estimator and a dataset spanning almost one and a half centuries and seventeen countries, we find support for five out of the seven great ratios that we consider. Extensive Monte Carlo experiments also show that the SPMG estimator with bootstrapped confidence intervals stands out as the only estimator with satisfactory small sample properties. 
Keywords:  Great ratios, debt, consumption, and investment to GDP ratios, arbitrage conditions, heterogeneous panels, episodic cointegration, two-way long-run causality, error cross-sectional dependence 
JEL:  B40 C18 C33 C50 
Date:  2022–03–04 
URL:  http://d.repec.org/n?u=RePEc:cam:camdae:2215&r= 
By:  Hauber, Philipp 
Abstract:  Factor models feature prominently in the macroeconomic nowcasting literature, yet no clear consensus has emerged regarding the question of how many and which variables to select in such applications. Examples of both large-scale models, estimated with data sets consisting of over 100 time series, as well as small-scale models based on only a few, pre-selected variables can be found in the literature. To address the issue of variable selection in factor models, in this paper we employ sparse priors on the loadings matrix. These priors concentrate more mass at zero than those conventionally used in the literature while retaining fat tails to capture signals. As a result, variable selection and estimation can be performed simultaneously in a Bayesian framework. Using large data sets consisting of over 100 variables, we evaluate the performance of sparse factor models in real time for US and German GDP point and density nowcasts. We find that sparse priors lead to relatively small gains in nowcast accuracy compared to a benchmark Normal prior. Moreover, different types of sparse priors discussed in the literature yield very similar results. Our findings are compatible with the hypothesis that large macroeconomic data sets typically used in now- or forecasting applications are not sparse but dense. 
Keywords:  factor models, sparsity, nowcasting, variable selection 
JEL:  C11 C53 C55 E37 
Date:  2022 
URL:  http://d.repec.org/n?u=RePEc:zbw:esprep:251551&r= 
By:  Ruiz-Gazen, Anne; Thomas-Agnan, Christine; Laurent, Thibault; Mondon, Camille 
Abstract:  Invariant Coordinate Selection (ICS) is a multivariate statistical method introduced by Tyler et al. (2009) and based on the simultaneous diagonalization of two scatter matrices. A model-based approach of ICS, called Invariant Coordinate Analysis, has already been adapted for compositional data in Muehlmann et al. (2021). In a model-free context, ICS is also helpful at identifying outliers (Nordhausen and Ruiz-Gazen, 2022). We propose to develop a version of ICS for outlier detection in compositional data. This version is first introduced in coordinate space for a specific choice of ilr coordinate system associated to a contrast matrix and follows the outlier detection procedure proposed by Archimbaud et al. (2018a). We then show that the procedure is independent of the choice of contrast matrix and can be defined directly in the simplex. To do so, we first establish some properties of the set of matrices satisfying the zero-sum property and introduce a simplex definition of the Mahalanobis distance and the one-step M-estimators class of scatter matrices. We also need to define the family of elliptical distributions in the simplex. We then show how to interpret the results directly in the simplex using two artificial datasets and a real dataset of market shares in the automobile industry. 
Date:  2022–03–18 
URL:  http://d.repec.org/n?u=RePEc:tse:wpaper:126752&r= 
By:  Antoine Didisheim (Swiss Finance Institute, UNIL); Bryan T. Kelly (Yale SOM; AQR Capital Management, LLC; National Bureau of Economic Research (NBER)); Semyon Malamud (Ecole Polytechnique Federale de Lausanne; Centre for Economic Policy Research (CEPR); Swiss Finance Institute) 
Abstract:  We introduce a methodology for designing and training deep neural networks (DNN) that we call “Deep Regression Ensembles” (DRE). It bridges the gap between DNN and two-layer neural networks trained with random feature regression. Each layer of DRE has two components, randomly drawn input weights and output weights trained myopically (as if the final output layer) using linear ridge regression. Within a layer, each neuron uses a different subset of inputs and a different ridge penalty, constituting an ensemble of random feature ridge regressions. Our experiments show that a single DRE architecture is on par with or exceeds state-of-the-art DNN in many data sets. Yet, because DRE neural weights are either known in closed form or randomly drawn, its computational cost is orders of magnitude smaller than DNN.
Keywords:  Deep learning, Neural network, Random features, Ensembles 
Date:  2022–03 
URL:  http://d.repec.org/n?u=RePEc:chf:rpseri:rp2220&r= 
By:  Wheatcroft, Edward 
Abstract:  A scoring rule is a function of a probabilistic forecast and a corresponding outcome used to evaluate forecast performance. There is some debate as to which scoring rules are most appropriate for evaluating forecasts of sporting events. This paper focuses on forecasts of the outcomes of football matches. The ranked probability score (RPS) is often recommended since it is 'sensitive to distance', that is, it takes into account the ordering in the outcomes (a home win is 'closer' to a draw than it is to an away win). In this paper, this reasoning is disputed on the basis that it adds nothing in terms of the usual aims of using scoring rules. A local scoring rule is one that only takes the probability placed on the outcome into consideration. Two simulation experiments are carried out to compare the performance of the RPS, which is non-local and sensitive to distance, the Brier score, which is non-local and insensitive to distance, and the Ignorance score, which is local and insensitive to distance. The Ignorance score outperforms both the RPS and the Brier score, casting doubt on the value of non-locality and sensitivity to distance as properties of scoring rules in this context.
Keywords:  football forecasting; forecast evaluation; ignorance score; ranked probability score; scoring rules 
JEL:  C1 
Date:  2021–12–01 
URL:  http://d.repec.org/n?u=RePEc:ehl:lserod:111494&r= 
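The three scoring rules compared in the abstract above can be sketched in a few lines. The code uses the textbook definitions for a three-outcome match ordered as home win, draw, away win; the normalisation of the RPS by the number of outcomes minus one, the unnormalised multi-category Brier score, the base-2 logarithm for the Ignorance score, and the example forecasts are assumptions for illustration and may differ from the conventions used in the paper.

```python
import math


def rps(probs, outcome):
    """Ranked probability score for ordered outcomes (lower is better)."""
    cum_p = 0.0
    cum_o = 0.0
    total = 0.0
    for i in range(len(probs) - 1):
        cum_p += probs[i]
        cum_o += 1.0 if i == outcome else 0.0
        total += (cum_p - cum_o) ** 2
    return total / (len(probs) - 1)


def brier(probs, outcome):
    """Multi-category Brier score: squared error over all outcomes."""
    return sum((p - (1.0 if i == outcome else 0.0)) ** 2
               for i, p in enumerate(probs))


def ignorance(probs, outcome):
    """Ignorance score: depends only on the probability of the realised outcome."""
    return -math.log2(probs[outcome])


# Hypothetical forecasts (home win, draw, away win); the match ends in a home win.
a = [0.5, 0.4, 0.1]    # remaining mass mostly on the neighbouring outcome (draw)
b = [0.5, 0.1, 0.4]    # remaining mass mostly on the distant outcome (away win)
c = [0.5, 0.25, 0.25]  # remaining mass split evenly
home_win = 0

scores = {name: (rps(f, home_win), brier(f, home_win), ignorance(f, home_win))
          for name, f in (("a", a), ("b", b), ("c", c))}
```

All three forecasts place probability 0.5 on the realised outcome, so the local Ignorance score is 1 bit for each. The RPS prefers `a` to `b` (0.13 against 0.205) because it is sensitive to distance. The Brier score gives `a` and `b` the same value (0.42) but scores `c` differently (0.375), which shows that it is insensitive to distance yet non-local.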