nep-ecm New Economics Papers
on Econometrics
Issue of 2023‒04‒17
fourteen papers chosen by
Sune Karlsson
Örebro universitet

  1. Using Forests in Multivariate Regression Discontinuity Designs By Yiqi Liu; Yuan Qi
  2. Quasi Maximum Likelihood Estimation of High-Dimensional Factor Models By Matteo Barigozzi
  3. Heckman sample selection estimators under heteroskedasticity By Alyssa Carlson; Wei Zhao
  4. Inference on Optimal Dynamic Policies via Softmax Approximation By Qizhao Chen; Morgane Austern; Vasilis Syrgkanis
  5. Estimation of Asymmetric Stochastic Volatility in Mean Models By Antonis Demos
  6. Standard errors when a regressor is randomly assigned By Denis Chetverikov; Jinyong Hahn; Zhipeng Liao; Andres Santos
  7. On the Existence and Information of Orthogonal Moments For Inference By Facundo Argañaraz; Juan Carlos Escanciano
  8. High-Frequency Volatility Estimation with Fast Multiple Change Points Detection By Greeshma Balabhadra; El Mehdi Ainasse; Pawel Polak
  9. Distributional Vector Autoregression: Eliciting Macro and Financial Dependence By Yunyun Wang; Tatsushi Oka; Dan Zhu
  10. Statistical error bounds for weighted mean and median, with application to robust aggregation of cryptocurrency data By Michaël Allouche; Mnacho Echenim; Emmanuel Gobet; Anne-Claire Maurice
  11. How Much Should We Trust Instrumental Variable Estimates in Political Science? Practical Advice Based on Over 60 Replicated Studies By Apoorva Lal; Mac Lockhart; Yiqing Xu; Ziwen Zhu
  12. Network log-ARCH models for forecasting stock market volatility By Raffaele Mattera; Philipp Otto
  13. A Multilevel Stochastic Approximation Algorithm for Value-at-Risk and Expected Shortfall Estimation By Stéphane Crépey; Noufel Frikha; Azar Louzi
  14. A Distributionally Robust Random Utility Model By David Müller; Emerson Melo; Ruben Schlotter

  1. By: Yiqi Liu; Yuan Qi
    Abstract: We discuss estimating conditional treatment effects in regression discontinuity designs with multiple scores. While local linear regressions have been popular in settings where the treatment status is completely described by one running variable, they do not easily generalize to empirical applications involving multiple treatment assignment rules. In practice, the multivariate problem is usually reduced to a univariate one where using local linear regressions is suitable. Instead, we propose a forest-based estimator that can flexibly model multivariate scores, where we build two honest forests in the sense of Wager and Athey (2018) on both sides of the treatment boundary. This estimator is asymptotically normal and sidesteps the pitfalls of running local linear regressions in higher dimensions. In simulations, we find our proposed estimator outperforms local linear regressions in multivariate designs and is competitive against the minimax-optimal estimator of Imbens and Wager (2019). The implementation of this estimator is simple, can readily accommodate any (fixed) number of running variables, and does not require estimating any nuisance parameters of the data generating process.
    Date: 2023–03
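The two-forests construction can be illustrated in miniature: fit one flexible regression per side of the treatment boundary and difference the two fits at a boundary point. The numpy sketch below uses a k-nearest-neighbour smoother as a stand-in for the honest forests of Wager and Athey (2018); the simulated design, the cutoffs at (0, 0), and the constant effect tau = 2 are illustrative assumptions, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two running variables; units are treated iff both scores exceed their cutoffs (0, 0)
n = 4000
X = rng.uniform(-1, 1, size=(n, 2))
treated = (X[:, 0] > 0) & (X[:, 1] > 0)
tau = 2.0                                   # true (constant) treatment effect
y = X[:, 0] + X[:, 1] + tau * treated + rng.normal(0, 0.1, n)

def knn_predict(Xs, ys, point, k=50):
    """Mean response of the k observations whose scores are nearest to `point`."""
    d = np.linalg.norm(Xs - point, axis=1)
    return ys[np.argsort(d)[:k]].mean()

def cate_at_boundary(point):
    """Fit one model per side of the boundary, difference the fits at `point`."""
    mu1 = knn_predict(X[treated], y[treated], point)
    mu0 = knn_predict(X[~treated], y[~treated], point)
    return mu1 - mu0

est = cate_at_boundary(np.array([0.0, 0.5]))
```

One-sided smoothing at the boundary leaves a small bandwidth bias, which is exactly the kind of higher-dimensional boundary issue the forest estimator is designed to handle.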
  2. By: Matteo Barigozzi
    Abstract: We review Quasi Maximum Likelihood estimation of factor models for high-dimensional panels of time series. We consider two cases: (1) estimation when no dynamic model for the factors is specified (Bai and Li, 2016); (2) estimation based on the Kalman smoother and the Expectation Maximization algorithm, which allows the factor dynamics to be modeled explicitly (Doz et al., 2012). Our interest is in approximate factor models, i.e., models in which the idiosyncratic components are allowed to be mildly cross-sectionally, as well as serially, correlated. Although such a setting apparently makes estimation harder, we show that, in fact, factor models do not suffer from the curse of dimensionality but instead enjoy a blessing of dimensionality property. In particular, we show that if the cross-sectional dimension of the data, $N$, grows to infinity, then: (i) identification of the model is still possible; (ii) the mis-specification error due to the use of an exact factor model log-likelihood vanishes. Moreover, if we also let the sample size, $T$, grow to infinity, we can consistently estimate all parameters of the model and make inference. The same is true for estimation of the latent factors, which can be carried out by weighted least squares, linear projection, or Kalman filtering/smoothing. We also compare the approaches presented with Principal Component analysis and with the classical, fixed-$N$, exact Maximum Likelihood approach. We conclude with a discussion of the efficiency of the considered estimators.
    Date: 2023–03
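For intuition on the blessing of dimensionality, the Principal Component estimator that the paper compares against can be sketched in a few lines of numpy: with N = 100 series, the common component is recovered accurately even though factors and loadings are identified only up to rotation. The simulated dimensions and noise level are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
N, T, r = 100, 500, 2
F = rng.normal(size=(T, r))                  # latent factors
L = rng.normal(size=(N, r))                  # loadings
e = rng.normal(scale=0.5, size=(T, N))       # idiosyncratic noise
X = F @ L.T + e

# PCA: top eigenvectors of the sample covariance estimate the loading space
S = (X.T @ X) / T
vals, vecs = np.linalg.eigh(S)               # eigenvalues in ascending order
Lhat = vecs[:, -r:] * np.sqrt(N)             # top-r eigenvectors, normalized
Fhat = X @ Lhat / N                          # factor estimates by projection

# The common component is identified even though (F, L) are only known up to rotation
C_true, C_hat = F @ L.T, Fhat @ Lhat.T
rel_err = np.linalg.norm(C_hat - C_true) / np.linalg.norm(C_true)
```

The relative error of the fitted common component shrinks as both N and T grow, which is the "blessing of dimensionality" at work.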
  3. By: Alyssa Carlson (Department of Economics, University of Missouri); Wei Zhao (Department of Economics, University of Missouri)
    Abstract: This paper studies the properties of two Heckman sample selection estimators, full information maximum likelihood (FIML) and limited information maximum likelihood (LIML), under heteroskedasticity. In this case, FIML is inconsistent while LIML can be consistent in certain settings. For the LIML estimator, we provide robust asymptotic variance formulas, not currently provided by standard Stata commands. Since heteroskedasticity affects these two estimators’ performance, this paper also offers guidance on how to properly test for heteroskedasticity. We propose a new demeaned Breusch–Pagan test to detect general heteroskedasticity in sample selection settings, as well as a test for when LIML is consistent under heteroskedasticity. Monte Carlo simulations illustrate that both of the proposed test procedures perform well.
    Keywords: sample selection, heteroskedasticity, Breusch–Pagan test, Hausman test
    JEL: C13 C24
    Date: 2023–04
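To fix ideas, the mechanics of Heckman selection can be sketched with the classic two-step (Heckit) estimator, a close cousin of LIML: a probit first step, then OLS with the inverse Mills ratio added as a regressor. This minimal numpy sketch assumes the homoskedastic, jointly normal design under which the correction is valid; the paper's robust variance formulas and heteroskedasticity tests are not reproduced here, and the simulated coefficients are illustrative.

```python
import numpy as np
from math import erf

nerf = np.vectorize(erf)
Phi = lambda t: 0.5 * (1.0 + nerf(t / np.sqrt(2.0)))     # standard normal cdf
phi = lambda t: np.exp(-t**2 / 2.0) / np.sqrt(2.0 * np.pi)

rng = np.random.default_rng(2)
n = 20000
z = rng.normal(size=n)                                   # exclusion restriction
x = rng.normal(size=n)
u, v = rng.multivariate_normal([0, 0], [[1, 0.5], [0.5, 1]], n).T
selected = (0.5 + x + z + v > 0)                         # selection equation
y = 1.0 + 2.0 * x + u                                    # outcome equation

# Step 1: probit of selection on (1, x, z) by Fisher scoring
W = np.column_stack([np.ones(n), x, z])
g = np.zeros(3)
for _ in range(30):
    xb = W @ g
    p = np.clip(Phi(xb), 1e-9, 1 - 1e-9)
    f = phi(xb)
    score = W.T @ ((selected - p) * f / (p * (1 - p)))
    info = (W * (f**2 / (p * (1 - p)))[:, None]).T @ W
    g = g + np.linalg.solve(info, score)

# Step 2: OLS on the selected sample, adding the inverse Mills ratio
imr = phi(W @ g) / Phi(W @ g)
Xs = np.column_stack([np.ones(n), x, imr])[selected]
b = np.linalg.lstsq(Xs, y[selected], rcond=None)[0]

# Naive OLS that ignores selection gives a biased slope here
b_naive = np.linalg.lstsq(np.column_stack([np.ones(n), x])[selected],
                          y[selected], rcond=None)[0]
```

The corrected slope lands near the true value 2, while the naive slope is biased because the selection shock v is correlated with the outcome error u.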
  4. By: Qizhao Chen; Morgane Austern; Vasilis Syrgkanis
    Abstract: Estimating optimal dynamic policies from offline data is a fundamental problem in dynamic decision making. In the context of causal inference, the problem is known as estimating the optimal dynamic treatment regime. Even though there exists a plethora of methods for estimation, constructing confidence intervals for the value of the optimal regime and the structural parameters associated with it is inherently harder, as it involves non-linear and non-differentiable functionals of unknown quantities that need to be estimated. Prior work resorted to sub-sample approaches that can deteriorate the quality of the estimate. We show that a simple softmax approximation to the optimal treatment regime, for an appropriately fast growing temperature parameter, can achieve valid inference on the truly optimal regime. We illustrate our result for a two-period optimal dynamic regime, though our approach should directly extend to the finite horizon case. Our work combines techniques from semi-parametric inference and $g$-estimation, together with an appropriate triangular array central limit theorem, as well as a novel analysis of the asymptotic influence and asymptotic bias of softmax approximations.
    Date: 2023–03
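The core idea, replacing the non-differentiable argmax policy by a softmax whose temperature sharpens towards the max, can be seen numerically in a single-period toy example. In the sketch below the conditional mean outcomes are assumed known (in practice they are estimated nuisances); the softmax value converges to the value of the argmax policy as the inverse temperature beta grows.

```python
import numpy as np

rng = np.random.default_rng(3)

# Toy conditional mean outcomes for two treatments, as functions of a state x
x = rng.uniform(-1, 1, 10000)
mu = np.column_stack([x, 0.5 * np.ones_like(x)])   # mu(a=0|x) = x, mu(a=1|x) = 0.5

# Value of the (non-differentiable) argmax policy
hard_value = np.maximum(mu[:, 0], mu[:, 1]).mean()

def soft_value(beta):
    """Softmax-weighted value: a smooth, differentiable surrogate for the max."""
    s = beta * mu
    w = np.exp(s - s.max(axis=1, keepdims=True))   # stabilized softmax weights
    w /= w.sum(axis=1, keepdims=True)
    return (w * mu).sum(axis=1).mean()
```

The softmax value increases monotonically in beta and approaches the hard value from below, which is why a fast-growing temperature keeps the approximation bias asymptotically negligible.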
  5. By: Antonis Demos
    Abstract: Here we investigate the estimation of asymmetric Autoregressive Stochastic Volatility models with possibly time varying risk premia. We employ the Indirect Inference estimation developed in Gallant and Tauchen (1996), with a first step estimator either the Generalized Quadratic ARCH or the Exponential GARCH. We employ Monte-Carlo simulations to compare the two first step models in terms of bias and root Mean Squared Error. We apply the developed methods for the estimation of an asymmetric autoregressive SV-M model to international stock markets excess returns.
    Keywords: Stochastic Volatility, estimation, asymmetry, leverage, indirect inference
    Date: 2023–03–21
  6. By: Denis Chetverikov; Jinyong Hahn; Zhipeng Liao; Andres Santos
    Abstract: We examine asymptotic properties of the OLS estimator when the values of the regressor of interest are assigned randomly and independently of other regressors. We find that the OLS variance formula in this case is often simplified, sometimes substantially. In particular, when the regressor of interest is independent not only of other regressors but also of the error term, the textbook homoskedastic variance formula is valid even if the error term and auxiliary regressors exhibit a general dependence structure. In the context of randomized controlled trials, this conclusion holds in completely randomized experiments with constant treatment effects. When the error term is heteroskedastic with respect to the regressor of interest, the variance formula has to be adjusted not only for heteroskedasticity but also for the correlation structure of the error term. However, even in the latter case, some simplifications are possible, as only part of the correlation structure of the error term needs to be taken into account. In the context of randomized controlled trials, this implies that the textbook homoskedastic variance formula is typically not valid if treatment effects are heterogeneous, but heteroskedasticity-robust variance formulas are valid if treatment effects are independent across units, even if the error term exhibits a general dependence structure. In addition, we extend the results to the case when the regressor of interest is assigned randomly at a group level, such as in randomized controlled trials with treatment assignment determined at a group (e.g., school/village) level.
    Date: 2023–03
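The first claim — that the homoskedastic formula remains valid under random assignment with constant effects, despite dependent errors — is easy to check by simulation. In this numpy sketch the error structure (equicorrelation within assumed clusters) is illustrative; the average homoskedastic standard error matches the true sampling variability of the OLS coefficient across replications.

```python
import numpy as np

rng = np.random.default_rng(4)
n, reps, tau = 200, 2000, 1.0
groups = np.repeat(np.arange(20), 10)        # 20 clusters of 10 units

est, se = [], []
for _ in range(reps):
    # Completely randomized treatment, independent of the errors
    d = rng.permutation(np.repeat([0.0, 1.0], n // 2))
    # Errors with a general dependence structure: shared cluster shocks
    e = rng.normal(size=20)[groups] + rng.normal(size=n)
    y = tau * d + e
    X = np.column_stack([np.ones(n), d])
    b = np.linalg.lstsq(X, y, rcond=None)[0]
    u = y - X @ b
    s2 = u @ u / (n - 2)                     # homoskedastic error variance
    se.append(np.sqrt(s2 * np.linalg.inv(X.T @ X)[1, 1]))
    est.append(b[1])

emp_sd = np.std(est)      # true sampling variability of the OLS coefficient
avg_se = np.mean(se)      # what the textbook homoskedastic formula reports
```

With heterogeneous effects or treatment-dependent heteroskedasticity this match would break down, as the abstract explains.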
  7. By: Facundo Argañaraz; Juan Carlos Escanciano
    Abstract: Locally Robust (LR)/Orthogonal/Debiased moments have proved useful with machine learning or high-dimensional first steps, but their existence has not been investigated for general models and parameters. In this paper, we provide a necessary and sufficient condition, referred to as Restricted Local Non-surjectivity (RLN), for the existence of such orthogonal moments to conduct robust inference on parameters of interest in regular semiparametric models. Importantly, RLN requires neither identification of the parameters of interest nor identification of the nuisance parameters. Thus, orthogonal moments exist under rather general conditions. However, for orthogonal moments to be informative for inference, the efficient Fisher Information matrix for the parameter must be non-zero (though possibly singular). We use these results to characterize the existence of orthogonal moments in a class of models with Unobserved Heterogeneity (UH) and to clarify the important role played by the support of UH in this characterization. Our results deliver functional differencing moments as a special case and also extend them to general functionals of UH. We further investigate the existence of orthogonal moments, and their relevance, for models defined by moment restrictions with possibly different conditioning variables, and we characterize orthogonal moments for heterogeneous parameters in treatment effects, for sample selection models, and for popular models of demand for differentiated products.
    Date: 2023–03
  8. By: Greeshma Balabhadra; El Mehdi Ainasse; Pawel Polak
    Abstract: We propose high-frequency volatility estimators with multiple change points that are $\ell_1$-regularized versions of two classical estimators: quadratic variation and bipower variation. We establish consistency of these estimators for the true unobserved volatility and the change points locations under general sub-Weibull distribution assumptions on the jump process. The proposed estimators employ the computationally efficient least angle regression algorithm for estimation purposes, followed by a reduced dynamic programming step to refine the final number of change points. In terms of numerical performance, the proposed estimators are computationally fast and accurately identify breakpoints near the end of the sample, which is highly desirable in today's electronic trading environment. In terms of out-of-sample volatility prediction, our new estimators provide more realistic and smoother volatility forecasts, and they outperform a wide range of classical and recent volatility estimators across various frequencies and forecasting horizons.
    Date: 2023–03
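As a stand-in for the paper's $\ell_1$-regularized estimators, the simplest piecewise-constant fit to squared returns — a least-squares search for a single break — already conveys the change-point idea. The volatility levels, break location, and single-break restriction below are illustrative assumptions, not the paper's LARS-plus-dynamic-programming algorithm.

```python
import numpy as np

rng = np.random.default_rng(5)
# Returns with one volatility break: sigma jumps from 1% to 3% at t = 600
r = np.concatenate([rng.normal(0, 0.01, 600), rng.normal(0, 0.03, 400)])

y = r**2                                     # noisy proxy for spot variance
csum, csq = np.cumsum(y), np.cumsum(y**2)

def sse(a, b):
    """Sum of squared errors of a constant fit to y[a:b], from cumulative sums."""
    s = csum[b - 1] - (csum[a - 1] if a else 0.0)
    q = csq[b - 1] - (csq[a - 1] if a else 0.0)
    return q - s * s / (b - a)

# Best single break point: minimize total SSE of the two-segment fit
m = len(y)
k = min(range(10, m - 10), key=lambda t: sse(0, t) + sse(t, m))
```

The detected break lands close to the true location; the paper's estimators generalize this to an unknown number of breaks with jump-robust (bipower) variation.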
  9. By: Yunyun Wang; Tatsushi Oka; Dan Zhu
    Abstract: Vector autoregression is an essential tool in empirical macroeconomics and finance for understanding the dynamic interdependencies among multivariate time series. In this study, we expand the scope of vector autoregression by incorporating a multivariate distributional regression framework and introducing a distributional impulse response function, providing a comprehensive view of dynamic heterogeneity. We propose a straightforward yet flexible estimation method and establish its asymptotic properties under weak dependence assumptions. Our empirical analysis examines the conditional joint distribution of GDP growth and financial conditions in the United States, with a focus on the global financial crisis. Our results show that tight financial conditions lead to a multimodal conditional joint distribution of GDP growth and financial conditions, and that easing financial conditions significantly affects long-term GDP growth, while improving GDP growth during the global financial crisis has limited effects on financial conditions.
    Date: 2023–03
  10. By: Michaël Allouche (Kaiko); Mnacho Echenim (LIG - Laboratoire d'Informatique de Grenoble - CNRS - Centre National de la Recherche Scientifique - UGA - Université Grenoble Alpes - Grenoble INP - Institut polytechnique de Grenoble - Grenoble Institute of Technology - UGA - Université Grenoble Alpes, Grenoble INP - Institut polytechnique de Grenoble - Grenoble Institute of Technology - UGA - Université Grenoble Alpes); Emmanuel Gobet (CMAP - Centre de Mathématiques Appliquées - Ecole Polytechnique - X - École polytechnique - CNRS - Centre National de la Recherche Scientifique); Anne-Claire Maurice (Kaiko)
    Abstract: We study price aggregation methodologies applied to cryptocurrency prices whose quotations are fragmented across different platforms. An intrinsic difficulty is that price returns and volumes are heavy-tailed, with many outliers, making averaging and aggregation challenging. While conventional methods rely on Volume-Weighted Average Prices (VWAPs) or Volume-Weighted Median prices (VWMs), we develop a new Robust Weighted Median (RWM) estimator that is robust to price and volume outliers. Our study is based on new probabilistic concentration inequalities for weighted means and weighted quantiles under different tail assumptions (heavy tails, sub-gamma tails, sub-Gaussian tails). These show that the fluctuations of VWAP and VWM are statistically significant given the heavy-tailed properties of volumes and/or prices. We show that our RWM estimator overcomes this problem while also satisfying all the desirable properties of a price aggregator. We illustrate the behavior of RWM on synthetic data (within a parametric model close to real data): our estimator achieves a statistical accuracy twice as good as its competitors and also recovers realized volatilities very accurately. Tests on real data are also performed and confirm the good behavior of the estimator across various use cases.
    Keywords: robust aggregation, weighted mean and quantile estimation, heavy tails, concentration inequalities, outliers
    Date: 2023–03–07
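The robustness gap between volume-weighted means and medians is easy to demonstrate: a single high-volume outlier quote drags the VWAP far from the consensus price while moving the weighted median only slightly. The quote distribution and the outlier below are assumed for illustration; the paper's RWM estimator further robustifies the weighted median against volume outliers.

```python
import numpy as np

def weighted_median(x, w):
    """Smallest x whose cumulative weight reaches half of the total weight."""
    idx = np.argsort(x)
    cw = np.cumsum(w[idx])
    return x[idx][np.searchsorted(cw, 0.5 * cw[-1])]

rng = np.random.default_rng(6)
prices = rng.normal(100.0, 0.05, 500)        # quotes around a "true" price of 100
volumes = rng.lognormal(0.0, 1.0, 500)       # heavy-tailed trade sizes

# One manipulated quote far from consensus, carrying a huge volume
prices = np.append(prices, 150.0)
volumes = np.append(volumes, volumes.sum() * 0.4)

vwap = np.average(prices, weights=volumes)   # pulled far from 100
vwm = weighted_median(prices, volumes)       # barely moves
```

The outlier carries about 29% of total volume, so the VWAP shifts by several percent while the weighted median's displacement is bounded — the bounded-influence property the concentration inequalities formalize.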
  11. By: Apoorva Lal; Mac Lockhart; Yiqing Xu; Ziwen Zhu
    Abstract: Instrumental variable (IV) strategies are widely used in political science to establish causal relationships. However, the identifying assumptions required by an IV design are demanding, and it remains challenging for researchers to assess their validity. In this paper, we replicate 67 papers published in three top journals in political science during 2010-2022 and identify several troubling patterns. First, researchers often overestimate the strength of their IVs due to non-i.i.d. errors, such as a clustering structure. Second, the most commonly used t-test for the two-stage-least-squares (2SLS) estimates often severely underestimates uncertainty. Using more robust inferential methods, we find that around 19-30% of the 2SLS estimates in our sample are underpowered. Third, in the majority of the replicated studies, the 2SLS estimates are much larger than the ordinary-least-squares estimates, and their ratio is negatively correlated with the strength of the IVs in studies where the IVs are not experimentally generated, suggesting potential violations of unconfoundedness or the exclusion restriction. To help researchers avoid these pitfalls, we provide a checklist for better practice.
    Date: 2023–03
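The 2SLS mechanics behind the replications — and the first-stage F statistic whose strength the authors find is often overstated under non-i.i.d. errors — can be sketched in numpy. The data-generating process below, with a valid and strong instrument, is an illustrative assumption; the F statistic computed is the homoskedastic version, the one the paper argues can mislead when errors are clustered.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 5000
z = rng.normal(size=n)                       # instrument
u = rng.normal(size=n)                       # confounder
d = 0.3 * z + 0.8 * u + rng.normal(size=n)   # endogenous treatment
y = 1.5 * d + u + rng.normal(size=n)         # true effect = 1.5

Z = np.column_stack([np.ones(n), z])
X = np.column_stack([np.ones(n), d])

# OLS slope: biased upward because d and the error share the confounder u
b_ols = np.linalg.lstsq(X, y, rcond=None)[0][1]

# 2SLS: project d on the instrument, then regress y on the fitted values
dhat = Z @ np.linalg.lstsq(Z, d, rcond=None)[0]
b_2sls = np.linalg.lstsq(np.column_stack([np.ones(n), dhat]), y,
                         rcond=None)[0][1]

# First-stage F statistic for the instrument (homoskedastic version)
g = np.linalg.lstsq(Z, d, rcond=None)[0]
res = d - Z @ g
se_g = np.sqrt((res @ res / (n - 2)) * np.linalg.inv(Z.T @ Z)[1, 1])
F = (g[1] / se_g) ** 2
```

With clustered errors, this F should instead be computed from a cluster-robust variance, which is one of the replication pitfalls the paper documents.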
  12. By: Raffaele Mattera; Philipp Otto
    Abstract: This paper presents a novel dynamic network autoregressive conditional heteroskedasticity (ARCH) model, based on spatiotemporal ARCH models, to forecast volatility in the US stock market. To improve forecasting accuracy, the model integrates temporally lagged volatility information and information from adjacent nodes, which may spill over instantaneously across the entire network. The model is also suitable for high-dimensional cases where multivariate ARCH models are typically no longer applicable. We adopt the theoretical foundations from spatiotemporal statistics and transfer the dynamic ARCH model from spatial processes to networks. This new approach is compared with independent univariate log-ARCH models, allowing us to quantify the improvements due to the instantaneous network ARCH effects, which are studied for the first time in this paper. The edges are determined based on various distance and correlation measures between the time series. The performance of the alternative network definitions is compared in terms of out-of-sample accuracy. Furthermore, we consider ensemble forecasts based on different network definitions.
    Date: 2023–03
  13. By: Stéphane Crépey (LPSM (UMR_8001) - Laboratoire de Probabilités, Statistique et Modélisation - SU - Sorbonne Université - CNRS - Centre National de la Recherche Scientifique - UPCité - Université Paris Cité); Noufel Frikha (CES - Centre d'économie de la Sorbonne - UP1 - Université Paris 1 Panthéon-Sorbonne - CNRS - Centre National de la Recherche Scientifique); Azar Louzi (LPSM (UMR_8001) - Laboratoire de Probabilités, Statistique et Modélisation - SU - Sorbonne Université - CNRS - Centre National de la Recherche Scientifique - UPCité - Université Paris Cité)
    Abstract: We propose a multilevel stochastic approximation (MLSA) scheme for the computation of the Value-at-Risk (VaR) and the Expected Shortfall (ES) of a financial loss, which can only be computed via simulations conditional on the realization of future risk factors. The problem of estimating the VaR and ES is thus nested in nature and can be viewed as an instance of a stochastic approximation problem with biased innovation. In this framework, for a prescribed accuracy ε, the optimal complexity of a standard stochastic approximation algorithm is shown to be of order ε^{-3}. To estimate the VaR, our MLSA algorithm attains an optimal complexity of order ε^{-2-δ}, where δ
    Keywords: Value-at-Risk, Expected Shortfall, stochastic approximation algorithm, Nested Monte Carlo, Multilevel Monte Carlo
    Date: 2023–03–22
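The nested structure of the problem is easy to see in a plain (non-multilevel) nested Monte Carlo sketch: each outer draw of the risk factor gets an inner-simulation estimate of the conditional loss, and VaR/ES are read off the outer empirical distribution. The Gaussian toy loss and the simulation sizes below are illustrative assumptions; the paper's MLSA scheme improves on the complexity of exactly this kind of brute-force estimator.

```python
import numpy as np

rng = np.random.default_rng(8)
alpha = 0.975
n_outer, n_inner = 20000, 100

# Outer stage: realizations of the risk factor Y
Y = rng.normal(size=n_outer)

# Inner stage: the conditional loss L(Y) = E[Y + Z | Y] (= Y in this toy model)
# is only available by simulation, so each outer draw gets a noisy estimate
Z = rng.normal(size=(n_outer, n_inner))
L_hat = (Y[:, None] + Z).mean(axis=1)

var_hat = np.quantile(L_hat, alpha)                 # Value-at-Risk estimate
es_hat = L_hat[L_hat >= var_hat].mean()             # Expected Shortfall estimate
```

For a standard normal loss the true values are VaR ≈ 1.96 and ES ≈ 2.34 at the 97.5% level; the inner-simulation noise is what makes the innovation of the corresponding stochastic approximation biased.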
  14. By: David Müller; Emerson Melo; Ruben Schlotter
    Abstract: This paper introduces the distributionally robust random utility model (DRO-RUM), which allows the preference shock (unobserved heterogeneity) distribution to be misspecified or unknown. We make three contributions using tools from the literature on robust optimization. First, by exploiting the notion of a distributionally robust social surplus function, we show that the DRO-RUM endogenously generates a shock distribution that incorporates a correlation between the utilities of the different alternatives. Second, we show that the gradient of the distributionally robust social surplus yields the choice probability vector. This result generalizes the celebrated Williams-Daly-Zachary theorem to environments where the shock distribution is unknown. Third, we show how the DRO-RUM allows us to nonparametrically identify the mean utility vector associated with choice market data. This result extends the demand inversion approach to environments where the shock distribution is unknown or misspecified. We carry out several numerical experiments comparing the performance of the DRO-RUM with the traditional multinomial logit and probit models.
    Date: 2023–03
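The second contribution specializes, in the classical logit case, to a familiar identity: the choice probabilities are the gradient of the social surplus, which for i.i.d. Gumbel shocks is the log-sum-exp of the utilities (up to an additive Euler constant that does not affect the gradient). The numpy check below verifies this Williams-Daly-Zachary identity numerically for an assumed utility vector.

```python
import numpy as np

def surplus(v):
    """Logit social surplus E[max_j (v_j + eps_j)] for i.i.d. Gumbel shocks,
    up to an additive Euler constant: the log-sum-exp of the utilities."""
    return np.log(np.exp(v).sum())

v = np.array([1.0, 0.5, -0.2])               # assumed mean utility vector

# Numerical gradient of the surplus by forward differences
h = 1e-6
grad = np.array([(surplus(v + h * np.eye(3)[j]) - surplus(v)) / h
                 for j in range(3)])

# Williams-Daly-Zachary: the gradient equals the logit choice probabilities
p = np.exp(v) / np.exp(v).sum()
```

The DRO-RUM result says this gradient-equals-probabilities structure survives even when the Gumbel assumption is replaced by a worst case over an ambiguity set of shock distributions.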

This nep-ecm issue is ©2023 by Sune Karlsson. It is provided as is without any express or implied warranty. It may be freely redistributed in whole or in part for any purpose. If distributed in part, please include this notice.
General information on the NEP project can be found at . For comments, please write to the director of NEP, Marco Novarese at <>. Put “NEP” in the subject; otherwise your mail may be rejected.
NEP’s infrastructure is sponsored by the School of Economics and Finance of Massey University in New Zealand.