nep-ecm New Economics Papers
on Econometrics
Issue of 2022‒07‒25
sixteen papers chosen by
Sune Karlsson
Örebro universitet

  1. Inference for Matched Tuples and Fully Blocked Factorial Designs By Yuehao Bai; Jizhou Liu; Max Tabord-Meehan
  2. Model Averaging Estimation of Panel Data Models with Many Instruments and Boosting By Hao Hao; Bai Huang; Tae-Hwy Lee
  3. Finite-Sample Guarantees for High-Dimensional DML By Victor Quintas-Martinez
  4. Targeting moments for calibration compared with indirect inference By Meenagh, David; Minford, Patrick; Xu, Yongdeng
  5. Randomization Inference Tests for Shift-Share Designs By Luis Alvarez; Bruno Ferman; Raoni Oliveira
  6. S-estimation in Linear Models with Structured Covariance Matrices By Ruiz-Gazen, Anne; Lopuhaä, Henrik Paul; Gares, Valérie
  7. Fast Two-Stage Variational Bayesian Approach to Estimating Panel Spatial Autoregressive Models with Unrestricted Spatial Weights Matrices By Deborah Gefang; Stephen G. Hall; George S. Tavlas
  8. Estimating spot volatility under infinite variation jumps with market microstructure noise By Qiang Liu; Zhi Liu
  9. Comparison of Bayesian and Sample Theory Parametric and Semiparametric Binary Response Models By Xiangjin Shen; Iskander Karibzhanov; Hiroki Tsurumi; Shiliang Li
  10. Data-driven stabilizations of goodness-of-fit tests By Fernández de Marcos Giménez de los Galanes, Alberto; García Portugues, Eduardo
  11. A Nonparametric Panel Model for Climate Data with Seasonal and Spatial Variation By Gao, J.; Linton, O.; Peng, B.
  12. QR Prediction for Statistical Data Integration By Medous, Estelle; Goga, Camelia; Ruiz-Gazen, Anne; Beaumont, Jean-François; Dessertaine, Alain; Puech, Pauline
  13. Time-Varying Multivariate Causal Processes By Jiti Gao; Bin Peng; Wei Biao Wu; Yayi Yan
  14. Automatic robust Box-Cox and extended Yeo-Johnson transformations in regression By Riani, Marco; Atkinson, Anthony C.; Corbellini, Aldo
  15. Economic variable selection By Koji Miyawaki; Steven N. MacEachern
  16. An interpretable machine learning workflow with an application to economic forecasting By Buckmann, Marcus; Joseph, Andreas

  1. By: Yuehao Bai; Jizhou Liu; Max Tabord-Meehan
    Abstract: This paper studies inference in randomized controlled trials with multiple treatments, where treatment status is determined according to a "matched tuples" design. Here, by a matched tuples design, we mean an experimental design where units are sampled i.i.d. from the population of interest, grouped into "homogeneous" blocks with cardinality equal to the number of treatments, and finally, within each block, each treatment is assigned exactly once uniformly at random. We first study estimation and inference for matched tuples designs in the general setting where the parameter of interest is a vector of linear contrasts over the collection of average potential outcomes for each treatment. Parameters of this form include but are not limited to standard average treatment effects used to compare one treatment relative to another. Specifically, we establish conditions under which a sample analogue estimator is asymptotically normal and construct a consistent estimator of its corresponding asymptotic variance. Combining these results establishes the asymptotic validity of tests based on these estimators. In contrast, we show that a common testing procedure based on a linear regression with block fixed effects and the usual heteroskedasticity-robust variance estimator is invalid in the sense that the resulting test may have a limiting rejection probability under the null hypothesis strictly greater than the nominal level. We then apply our results to study the asymptotic properties of what we call "fully-blocked" $2^K$ factorial designs, which are simply matched tuples designs applied to a full factorial experiment. Leveraging our previous results, we establish that our estimator achieves a lower asymptotic variance under the fully-blocked design than that under any stratified factorial design. A simulation study and empirical application illustrate the practical relevance of our results.
    Date: 2022–06
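A minimal Python sketch of the matched tuples assignment described in item 1, together with the sample-analogue estimate of a linear contrast of treatment means. It is illustrative only and not the authors' code: it assumes a single scalar covariate and forms the "homogeneous" blocks by sorting on that covariate and cutting consecutive groups whose size equals the number of treatments (the paper's matching rule may differ). The function names are hypothetical, and the paper's variance estimator is not reproduced.

```python
import random
from statistics import mean


def assign_matched_tuples(covariate, n_treatments, rng):
    """Return a treatment label in 0..n_treatments-1 for each unit.

    Assumed matching rule: sort units by a scalar covariate and treat each
    consecutive group of n_treatments units as one block. Within a block,
    every treatment is used exactly once, in uniformly random order.
    """
    n = len(covariate)
    if n % n_treatments != 0:
        raise ValueError("number of units must be a multiple of n_treatments")
    order = sorted(range(n), key=lambda i: covariate[i])
    labels = [None] * n
    for start in range(0, n, n_treatments):
        block = order[start:start + n_treatments]
        treatments = list(range(n_treatments))
        rng.shuffle(treatments)  # random permutation of treatments within the block
        for unit, d in zip(block, treatments):
            labels[unit] = d
    return labels


def contrast_estimate(outcomes, labels, weights):
    """Sample-analogue estimate of sum_d weights[d] * E[Y(d)]."""
    arm_means = [
        mean(y for y, d in zip(outcomes, labels) if d == k)
        for k in range(len(weights))
    ]
    return sum(w * m for w, m in zip(weights, arm_means))
```

With 12 units and 3 treatments, each consecutive triple in covariate order receives treatments 0, 1 and 2 exactly once, and `weights=[-1, 1, 0]` yields the estimated average effect of treatment 1 relative to treatment 0.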
  2. By: Hao Hao (Ford Motor Company); Bai Huang (Central University of Finance and Economics); Tae-Hwy Lee (Department of Economics, University of California Riverside)
    Abstract: Applied researchers often confront two issues when using the fixed effect-two-stage least squares (FE-2SLS) estimator for panel data models. One is that it may lose its consistency due to too many instruments. The other is that the gain of using FE-2SLS may not exceed its loss when the endogeneity is weak. In this paper, an L2Boosting regularization procedure for panel data models is proposed to tackle the many instruments issue. We then construct a Stein-like model-averaging estimator to take advantage of FE and FE-2SLS-Boosting estimators. Finite-sample properties are examined in Monte Carlo experiments, and an empirical application is presented.
    Keywords: FE-2SLS, weak endogeneity, combined estimator, many instruments, L2Boosting, FE-2SLS-Boosting
    JEL: C13 C33 C36 C52
    Date: 2022–07
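Item 2 uses L2Boosting to regularize a regression with many instruments. The sketch below shows generic componentwise L2Boosting for a single cross-section: at each step, fit every column of X to the current residual by univariate least squares, keep the column that most reduces the residual sum of squares, and add a shrunken (`nu`) version of that fit. It is a textbook-style illustration under stated assumptions (centered data, fixed number of steps), not the authors' panel FE-2SLS-Boosting procedure; how they remove fixed effects and choose the stopping point is not reproduced. In a first-stage application, X would presumably hold the candidate instruments and y the endogenous regressor. The function name is hypothetical.

```python
def componentwise_l2boost(X, y, n_steps=100, nu=0.1):
    """Componentwise L2Boosting with a one-column least-squares base learner.

    X is a list of columns (lists of floats) and y a list of floats; both are
    assumed centered, so no intercept is fitted. Returns the coefficients
    accumulated after n_steps iterations.
    """
    n, p = len(y), len(X)
    coef = [0.0] * p
    fitted = [0.0] * n
    for _ in range(n_steps):
        resid = [yi - fi for yi, fi in zip(y, fitted)]
        best_j, best_b, best_gain = None, 0.0, -1.0
        for j, col in enumerate(X):
            sxx = sum(v * v for v in col)
            if sxx == 0.0:
                continue  # an all-zero column cannot reduce the residual
            sxr = sum(v * r for v, r in zip(col, resid))
            gain = sxr * sxr / sxx  # drop in residual sum of squares
            if gain > best_gain:
                best_j, best_b, best_gain = j, sxr / sxx, gain
        if best_j is None:
            break
        # Shrunken update of the selected coefficient and of the fit.
        coef[best_j] += nu * best_b
        fitted = [fi + nu * best_b * v for fi, v in zip(fitted, X[best_j])]
    return coef
```

The step size `nu` and the number of steps jointly control regularization: small `nu` with early stopping shrinks coefficients toward zero and leaves irrelevant columns unselected.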
  3. By: Victor Quintas-Martinez
    Abstract: Debiased machine learning (DML) offers an attractive way to estimate treatment effects in observational settings, where identification of causal parameters requires a conditional independence or unconfoundedness assumption, since it allows researchers to control flexibly for a potentially very large number of covariates. This paper gives novel finite-sample guarantees for joint inference on high-dimensional DML, bounding how far the finite-sample distribution of the estimator is from its asymptotic Gaussian approximation. These guarantees are useful to applied researchers, as they are informative about how far off the coverage of joint confidence bands can be from the nominal level. There are many settings where high-dimensional causal parameters may be of interest, such as the ATE of many treatment profiles, or the ATE of a treatment on many outcomes. We also cover infinite-dimensional parameters, such as impacts on the entire marginal distribution of potential outcomes. The finite-sample guarantees in this paper complement the existing results on consistency and asymptotic normality of DML estimators, which are either asymptotic or treat only the one-dimensional case.
    Date: 2022–06
  4. By: Meenagh, David (Cardiff Business School); Minford, Patrick (Cardiff Business School); Xu, Yongdeng (Cardiff Business School)
    Abstract: A common practice in estimating parameters in DSGE models is to find a parameter set that, when simulated, gets close to an average of certain data moments; the model's simulated performance for other moments is then compared with the data as an informal test of the model. We call this procedure informal Indirect Inference, III. By contrast, what we call Formal Indirect Inference, FII, chooses a set of moments as the auxiliary model and computes the Wald statistic for the joint distribution of these moments according to the structural DSGE model; it tests the model according to the probability of obtaining the data moments. The FII estimator then chooses structural parameters that maximise this probability. We show in this note via Monte Carlo experiments that the FII estimator has low bias in small samples, whereas the III estimator has much higher bias. It follows that models estimated by III will typically also be rejected by formal indirect inference tests.
    Keywords: Moments, Indirect Inference
    JEL: C12 C32 C52
    Date: 2022–07
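Item 4's formal indirect inference test rests on a Wald statistic over a vector of auxiliary-model moments. A minimal sketch, assuming the distribution of the moments under the structural model is approximated from model simulations: W = (m_data - m̄)ᵀ Σ̂⁻¹ (m_data - m̄), where m̄ and Σ̂ are the mean and sample covariance of the simulated moments. The function names are hypothetical, and the authors' exact construction (for instance, how the p-value is derived) may differ. A small Gauss-Jordan solver keeps the sketch dependency-free.

```python
def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def fii_wald(data_moments, simulated_moments):
    """Wald distance of the data moments from the simulated moment distribution."""
    n_sim, k = len(simulated_moments), len(data_moments)
    mbar = [sum(s[j] for s in simulated_moments) / n_sim for j in range(k)]
    cov = [
        [
            sum((s[i] - mbar[i]) * (s[j] - mbar[j]) for s in simulated_moments)
            / (n_sim - 1)
            for j in range(k)
        ]
        for i in range(k)
    ]
    d = [data_moments[j] - mbar[j] for j in range(k)]
    x = solve(cov, d)  # x = cov^{-1} d without forming the inverse
    return sum(di * xi for di, xi in zip(d, x))
```

An estimator in the spirit of FII would then search over structural parameters, re-simulating the moments each time, for the values that make this distance small.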
  5. By: Luis Alvarez; Bruno Ferman; Raoni Oliveira
    Abstract: We consider the problem of inference in shift-share research designs. The choice between existing approaches that allow for unrestricted spatial correlation involves tradeoffs, varying in terms of their validity when there are relatively few or concentrated shocks, and in terms of the assumptions on the shock assignment process and treatment effects heterogeneity. We propose alternative randomization inference methods that combine the advantages of different approaches. These methods are valid in finite samples under relatively stronger assumptions, while asymptotically valid under weaker assumptions.
    Date: 2022–06
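Item 5 concerns randomization inference in shift-share designs, where a unit's exposure is x_i = Σ_k s_ik g_k for sector shares s_ik and shocks g_k. The basic idea can be sketched as a permutation test that reshuffles the shocks across sectors and recomputes the statistic. This is a generic illustration, not the authors' proposed methods, and it assumes (i) the sharp null of no effect and (ii) that shocks are exchangeable across sectors. All names are hypothetical.

```python
import random


def exposure(shares, shocks):
    """Shift-share exposure x_i = sum_k s_ik * g_k for each unit i."""
    return [sum(s * g for s, g in zip(row, shocks)) for row in shares]


def ols_slope(x, y):
    """Slope from a simple regression of y on x with an intercept."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxx = sum((a - mx) ** 2 for a in x)
    if sxx == 0.0:
        return 0.0  # degenerate draw: exposure has no variation across units
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx


def shock_permutation_pvalue(y, shares, shocks, n_draws=999, seed=0):
    """Randomization p-value under the sharp null of no effect,
    assuming shocks are exchangeable across sectors."""
    rng = random.Random(seed)
    observed = abs(ols_slope(exposure(shares, shocks), y))
    extreme = 0
    for _ in range(n_draws):
        permuted = list(shocks)
        rng.shuffle(permuted)
        if abs(ols_slope(exposure(shares, permuted), y)) >= observed:
            extreme += 1
    return (1 + extreme) / (1 + n_draws)
```

With only a few sectors there are few distinct permutations, so the attainable p-values become coarse.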
  6. By: Ruiz-Gazen, Anne; Lopuhaä, Henrik Paul; Gares, Valérie
    Abstract: We provide a unified approach to S-estimation in balanced linear models with structured covariance matrices. Of main interest are S-estimators for linear mixed effects models, but our approach also includes S-estimators in several other standard multivariate models, such as multiple regression, multivariate regression, and multivariate location and scatter. We provide sufficient conditions for the existence of S-functionals and S-estimators, establish asymptotic properties such as consistency and asymptotic normality, and derive their robustness properties in terms of breakdown point and influence function. All the results are obtained for general identifiable covariance structures and are established under mild conditions on the distribution of the observations, which go far beyond models with elliptically contoured densities. Some of our results are new and others are more general than existing ones in the literature. In this way, this manuscript completes and improves results on S-estimation in a wide variety of multivariate models. We illustrate our results by means of a simulation study and an application to data from a trial on the treatment of lead-exposed children.
    Date: 2022–06–22
  7. By: Deborah Gefang; Stephen G. Hall; George S. Tavlas
    Abstract: This paper proposes a fast two-stage variational Bayesian algorithm for estimating panel spatial autoregressive models with unknown spatial weights matrices. Using Dirichlet-Laplace global-local shrinkage priors, we are able to uncover the spatial impacts between cross-sectional units without imposing any a priori restrictions. Monte Carlo experiments show that our approach works well for both long and short panels. We are also the first in the literature to develop VB methods to estimate large covariance matrices with unrestricted sparsity patterns. The method is important in itself because of its relevance to other popular large data models such as Bayesian vector autoregressions. Matlab code is provided.
    Date: 2022–05
  8. By: Qiang Liu; Zhi Liu
    Abstract: Jumps and market microstructure noise are stylized features of high-frequency financial data. It is well known that they introduce bias in the estimation of volatility (including integrated and spot volatilities) of assets, and many methods have been proposed to deal with this problem. When the jumps are intensive, with infinite variation, no estimator of spot volatility in a noisy setting is available, and one is therefore needed. To this end, we propose a novel estimator of spot volatility with a hybrid use of the pre-averaging technique and the empirical characteristic function. Under mild assumptions, we establish the consistency and asymptotic normality of our estimator. Furthermore, we show that our estimator achieves an almost efficient convergence rate with optimal variance. Simulation studies verify our theoretical conclusions. We also apply our proposed estimator to conduct empirical analyses, such as estimating the weekly volatility curve using second-by-second transaction price data.
    Date: 2022–05
  9. By: Xiangjin Shen; Iskander Karibzhanov; Hiroki Tsurumi; Shiliang Li
    Abstract: This study proposes a Bayesian semiparametric binary response model estimated with Markov chain Monte Carlo algorithms, since this Bayesian approach works when maximum likelihood estimation fails. Graphics processing unit (GPU) computing reduces computation time because it estimates the optimal bandwidth of the kernel density efficiently. The study employs simulated data and Monte Carlo experiments to compare the performance of the parametric and semiparametric models. We use mean squared errors, receiver operating characteristic curves and marginal effects as model assessment criteria. Finally, we present an application to evaluate consumer bankruptcy rates based on Canadian TransUnion data.
    Keywords: Credit risk management; Econometric and statistical methods transmission
    JEL: C1 C14 C35 C51 C63 D1
    Date: 2022–07
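Item 9 lists receiver operating characteristic (ROC) curves among its assessment criteria. The area under the ROC curve can be computed through its Mann-Whitney interpretation: the probability that a randomly chosen event case (for example, a bankruptcy) receives a higher predicted score than a randomly chosen non-event case, with ties counted as one half. The sketch below is a generic illustration, not the authors' code, and the function name is hypothetical.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise (Mann-Whitney) comparisons.

    labels are 1 for an event and 0 otherwise; ties count one half.
    This is O(n_pos * n_neg), which is fine for a small illustration.
    """
    pos = [s for s, lab in zip(scores, labels) if lab == 1]
    neg = [s for s, lab in zip(scores, labels) if lab == 0]
    if not pos or not neg:
        raise ValueError("need at least one event and one non-event case")
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation of events from non-events.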
  10. By: Fernández de Marcos Giménez de los Galanes, Alberto; García Portugues, Eduardo
    Abstract: Exact null distributions of goodness-of-fit test statistics are generally challenging to obtain in tractable forms. Practitioners are therefore usually obliged to rely on asymptotic null distributions or Monte Carlo methods, either in the form of a lookup table or carried out on demand, to apply a goodness-of-fit test. Stephens (1970) provided remarkably simple and useful transformations of several classic goodness-of-fit test statistics that stabilized their exact-n critical values for varying sample sizes n. However, detail on the accuracy of these and subsequent transformations in yielding exact p-values, or even a deep understanding of the derivation of several transformations, is still scarce. We illuminate and automate, using modern tools, the latter stabilization approach to (i) expand its scope of applicability and (ii) yield semi-continuous exact p-values, as opposed to exact critical values for fixed significance levels. We show improvements on the stabilization accuracy of the exact null distributions of the Kolmogorov-Smirnov, Cramér-von Mises, Anderson-Darling, Kuiper, and Watson test statistics. In addition, we provide a parameter-dependent exact-n stabilization for several novel statistics for testing uniformity on the hypersphere of arbitrary dimension. A data application in astronomy illustrates the benefits of the advocated stabilization for quickly analyzing small-to-moderate sequentially-measured samples.
    Keywords: Exact Distribution; Goodness-Of-Fit; P-Value; Stabilization; Uniformity
    Date: 2022–06–28
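Item 10 builds on the stabilizing transformations of Stephens (1970). To illustrate the classic idea (not the authors' new data-driven stabilizations), the sketch computes the Kolmogorov-Smirnov statistic D for testing uniformity on [0, 1] with a fully specified null, then applies the modification commonly attributed to Stephens, D·(√n + 0.12 + 0.11/√n), whose upper-tail critical values are approximately free of n. The constants and critical values (about 1.224, 1.358 and 1.628 at the 10%, 5% and 1% levels) are as commonly tabulated; they are an assumption here and should be checked against a reference. Function names are hypothetical.

```python
import math

# Approximate upper-tail critical values of the modified statistic
# (continuous, fully specified null); treated here as assumed constants.
MODIFIED_KS_CRITICAL = {0.10: 1.224, 0.05: 1.358, 0.01: 1.628}


def ks_uniform(sample):
    """Kolmogorov-Smirnov statistic D_n against the Uniform(0, 1) distribution."""
    u = sorted(sample)
    n = len(u)
    d_plus = max((i + 1) / n - x for i, x in enumerate(u))
    d_minus = max(x - i / n for i, x in enumerate(u))
    return max(d_plus, d_minus)


def stephens_modified(d, n):
    """Stabilized statistic D * (sqrt(n) + 0.12 + 0.11 / sqrt(n))."""
    root = math.sqrt(n)
    return d * (root + 0.12 + 0.11 / root)


def reject_uniformity(sample, level=0.05):
    """Compare the modified statistic with a fixed, n-free critical value."""
    d = ks_uniform(sample)
    return stephens_modified(d, len(sample)) > MODIFIED_KS_CRITICAL[level]
```

The point of the transformation is that one small table of critical values serves every sample size, instead of a separate exact-n table for each n.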
  11. By: Gao, J.; Linton, O.; Peng, B.
    Abstract: In this paper, we consider a panel data model that allows for heterogeneous time trends at different locations. We propose a new estimation method for the model and establish an asymptotic theory for it. For inferential purposes, we develop a bootstrap method for the case where weak correlation is present in both dimensions of the error terms. We examine the finite-sample properties of the proposed model and estimation method through extensive simulation studies. Finally, we use the newly proposed model and method to investigate U.K. rainfall, temperature and sunshine data separately. Overall, we find that winter weather has changed dramatically over the past fifty years. Changes in the other seasons may vary with location.
    Keywords: Bootstrap method, Interactive fixed-effect, Panel rainfall data, Time trend
    JEL: Q50 C23
    Date: 2022–06–20
  12. By: Medous, Estelle; Goga, Camelia; Ruiz-Gazen, Anne; Beaumont, Jean-François; Dessertaine, Alain; Puech, Pauline
    Abstract: In this paper, we investigate how a big non-probability database can be used to improve estimates from a small probability sample through data integration techniques. In the situation where the study variable is observed in both data sources, Kim and Tam (2021) proposed two design-consistent estimators that can be justified through dual frame survey theory. First, we provide conditions ensuring that these estimators are more efficient than the Horvitz-Thompson estimator when the probability sample is selected using either Poisson sampling or simple random sampling without replacement. Then, we study the class of QR predictors, proposed by Särndal and Wright (1984) to handle the case where the non-probability database contains auxiliary variables but no study variable. We provide conditions ensuring that the QR predictor is asymptotically design-unbiased. Assuming the probability sampling design is not informative, the QR predictor is also model-unbiased regardless of the validity of those conditions. We compare the design properties of different predictors in the class of QR predictors through a simulation study. They include a model-based predictor, a model-assisted estimator and a cosmetic estimator. In our simulation setups, the cosmetic estimator performed slightly better than the model-assisted estimator. As expected, the model-based predictor did not perform well when the underlying model was misspecified.
    Keywords: cosmetic estimator; dual-frame; GREG estimator; non-probability sample; probability sample
    Date: 2022–06–23
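Item 12 uses the Horvitz-Thompson estimator as its efficiency benchmark. A minimal sketch of that benchmark for a population total: each sampled unit's value is weighted by the inverse of its inclusion probability π_i. Under Poisson sampling, units are selected independently, so an unbiased variance estimator takes the simple form Σ (1 - π_i) y_i² / π_i² over sampled units. This illustrates the benchmark only, not the Kim and Tam (2021) estimators or the QR predictors; function names are hypothetical.

```python
def horvitz_thompson_total(y, pi):
    """HT estimator of a population total: sum over sampled units of y_i / pi_i."""
    if any(p <= 0 or p > 1 for p in pi):
        raise ValueError("inclusion probabilities must lie in (0, 1]")
    return sum(yi / p for yi, p in zip(y, pi))


def ht_variance_poisson(y, pi):
    """Unbiased variance estimator of the HT total under Poisson sampling:
    sum over sampled units of (1 - pi_i) * y_i**2 / pi_i**2."""
    return sum((1 - p) * yi ** 2 / p ** 2 for yi, p in zip(y, pi))
```

For two sampled units with y = (10, 20) and π = (0.5, 0.25), the estimated total is 10/0.5 + 20/0.25 = 100.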
  13. By: Jiti Gao; Bin Peng; Wei Biao Wu; Yayi Yan
    Abstract: In this paper, we consider a wide class of time-varying multivariate causal processes that nests many classic and new examples as special cases. We first prove the existence of a weakly dependent stationary approximation for our model, which is the foundation of the theoretical development. We then consider the quasi-maximum likelihood estimation (QMLE) approach and provide both point-wise and simultaneous inferences on the coefficient functions. In addition, we demonstrate the theoretical findings through both simulated and real data examples. In particular, we show the empirical relevance of our study using an application to evaluate the conditional correlations between the stock markets of China and the U.S. We find that the interdependence between the two stock markets is increasing over time.
    Date: 2022–06
  14. By: Riani, Marco; Atkinson, Anthony C.; Corbellini, Aldo
    Abstract: The paper introduces an automatic procedure for the parametric transformation of the response in regression models to approximate normality. We consider the Box-Cox transformation and its generalization to the extended Yeo-Johnson transformation which allows for both positive and negative responses. A simulation study illuminates the superior comparative properties of our automatic procedure for the Box-Cox transformation. The usefulness of our procedure is demonstrated on four sets of data, two including negative observations. An important theoretical development is an extension of the Bayesian Information Criterion (BIC) to the comparison of models following the deletion of observations, the number deleted here depending on the transformation parameter.
    Keywords: Bayesian Information Criterion (BIC); constructed variable; extended coefficient of determination (R2); forward search; negative observations; simultaneous test
    JEL: C1
    Date: 2022–04–19
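Item 14's automatic procedure chooses the transformation parameter from the data; that search (including the forward search and the extended BIC) is not reproduced here. For reference, the sketch implements the standard Box-Cox transformation, which needs positive responses, and the standard Yeo-Johnson transformation, which the paper's extended version generalizes; the extended form itself is not implemented. The parameter λ (`lam`) is supplied rather than estimated, and the function names are hypothetical.

```python
import math


def box_cox(y, lam):
    """Box-Cox transformation; requires y > 0."""
    if y <= 0:
        raise ValueError("Box-Cox needs a strictly positive response")
    if lam == 0:
        return math.log(y)
    return (y ** lam - 1) / lam


def yeo_johnson(y, lam):
    """Standard Yeo-Johnson transformation; defined for any real y."""
    if y >= 0:
        # Box-Cox applied to y + 1 on the non-negative half-line.
        return math.log1p(y) if lam == 0 else ((y + 1) ** lam - 1) / lam
    # Mirror image with parameter 2 - lam on the negative half-line.
    if lam == 2:
        return -math.log1p(-y)
    return -((1 - y) ** (2 - lam) - 1) / (2 - lam)
```

With λ = 1 the Yeo-Johnson transformation is the identity, which makes it a convenient check; values of λ below 1 compress large positive responses.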
  15. By: Koji Miyawaki; Steven N. MacEachern
    Abstract: Regression plays a central role in the discipline of Statistics and is the primary analytic technique in many research areas. Variable selection is a classic and major problem for regression. This study emphasizes the economic aspect of variable selection. The problem is formulated in terms of the cost of predictors to be purchased for future use: only the subset of covariates used in the model will need to be purchased. This leads to a decision-theoretic formulation of the variable selection problem that includes the cost of predictors as well as their effect. We adopt a Bayesian perspective and propose two approaches to address uncertainty about the model and model parameters. These approaches, termed the restricted and extended approaches, lead us to rethink model averaging. From an objective or robust Bayes point of view, the former is preferred. The proposed method is applied to three popular datasets for illustration.
    Date: 2022–03
  16. By: Buckmann, Marcus (Bank of England); Joseph, Andreas (Bank of England)
    Abstract: We propose a generic workflow for the use of machine learning models to inform decision making and to communicate modelling results with stakeholders. It involves three steps: (1) a comparative model evaluation, (2) a feature importance analysis and (3) statistical inference based on Shapley value decompositions. We discuss the different steps of the workflow in detail and demonstrate each by forecasting changes in US unemployment one year ahead using the well-established FRED-MD dataset. We find that universal function approximators from the machine learning literature, including gradient boosting and artificial neural networks, outperform more conventional linear models. This better performance is associated with greater flexibility, allowing the machine learning models to account for time-varying and nonlinear relationships in the data generating process. The Shapley value decomposition identifies economically meaningful nonlinearities learned by the models. Shapley regressions for statistical inference on machine learning models enable us to assess and communicate variable importance akin to conventional econometric approaches. While we also explore high-dimensional models, our findings suggest that the best trade-off between interpretability and performance of the models is achieved when a small set of variables is selected by domain experts.
    Keywords: machine learning; model interpretability; forecasting; unemployment; Shapley values
    JEL: C14 C38 C45 C52 C53 C71 E24
    Date: 2022–06–01
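Step (3) of the workflow in item 16 rests on Shapley value decompositions of model predictions. For a model with only a few features, exact Shapley values can be computed by enumerating feature subsets: feature j receives a weighted average of its marginal contributions v(S ∪ {j}) - v(S), with weight |S|!(p - |S| - 1)!/p!. The sketch assumes a simple value function in which features outside a coalition are set to a single reference (baseline) point; practical implementations often average over a background dataset instead, and the Shapley regression step used for inference is not shown. Names are hypothetical.

```python
from itertools import combinations
from math import factorial


def shapley_values(f, x, baseline):
    """Exact Shapley values of prediction f(x) relative to f(baseline).

    Assumed value function: features outside the coalition take their
    baseline values. Cost grows as 2**p, so this suits small p only.
    """
    p = len(x)

    def value(coalition):
        z = [x[j] if j in coalition else baseline[j] for j in range(p)]
        return f(z)

    phi = [0.0] * p
    for j in range(p):
        others = [k for k in range(p) if k != j]
        for size in range(p):
            weight = factorial(size) * factorial(p - size - 1) / factorial(p)
            for subset in combinations(others, size):
                s = set(subset)
                phi[j] += weight * (value(s | {j}) - value(s))
    return phi
```

By the efficiency property, the values sum to f(x) - f(baseline); for a linear model with a zero baseline, each feature's value is simply its coefficient times its value.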

This nep-ecm issue is ©2022 by Sune Karlsson. It is provided as is without any express or implied warranty. It may be freely redistributed in whole or in part for any purpose. If distributed in part, please include this notice.
General information on the NEP project can be found at For comments please write to the director of NEP, Marco Novarese at <>. Put “NEP” in the subject, otherwise your mail may be rejected.
NEP’s infrastructure is sponsored by the School of Economics and Finance of Massey University in New Zealand.