nep-ecm New Economics Papers
on Econometrics
Issue of 2022‒07‒18
twenty papers chosen by
Sune Karlsson
Örebro universitet

  1. Bayesian and Frequentist Inference for Synthetic Controls By Ignacio Martinez; Jaume Vives-i-Bastida
  2. Semiparametric Single-Index Estimation for Average Treatment Effects By Difang Huang; Jiti Gao; Tatsushi Oka
  3. Debiased Machine Learning without Sample-Splitting for Stable Estimators By Qizhao Chen; Vasilis Syrgkanis; Morgane Austern
  4. Debiased Semiparametric U-Statistics: Machine Learning Inference on Inequality of Opportunity By Juan Carlos Escanciano; Joël Robert Terschuur
  5. A Structural Dynamic Factor Model for Daily Global Stock Market Returns By Linton, O. B.; Tang, H.; Wu, J.
  6. Tensor Factor Model Estimation by Iterative Projection By Yuefeng Han; Rong Chen; Dan Yang; Cun-Hui Zhang
  7. Parametric Estimation of Long Memory in Factor Models By Yunus Emre Ergemen
  8. Forecasting macroeconomic data with Bayesian VARs: Sparse or dense? It depends! By Luis Gruber; Gregor Kastner
  9. Nonlinear Forecasts and Impulse Responses for Causal-Noncausal (S)VAR Models By Christian Gourieroux; Joann Jasiak
  10. Choosing between persistent and stationary volatility By Chronopoulos, Ilias; Giraitis, Liudas; Kapetanios, George
  11. A Robust Test for Weak Instruments with Multiple Endogenous Regressors By Daniel J. Lewis; Karel Mertens
  12. Using hierarchical aggregation constraints to nowcast regional economic aggregates By Gary Koop; Stuart McIntyre; James Mitchell; Aubrey Poon
  13. Forgetting Approaches to Improve Forecasting By Paulo M.M. Rodrigues; Robert Hill
  14. Assessing Omitted Variable Bias when the Controls are Endogenous By Paul Diegert; Matthew A. Masten; Alexandre Poirier
  15. Provably Auditing Ordinary Least Squares in Low Dimensions By Ankur Moitra; Dhruv Rohatgi
  16. Estimation of Parametric Binary Outcome Models with Degenerate Pure Choice-Based Data with Application to COVID-19-Positive Tests from British Columbia By Nail Kashaev
  17. Markovian Interference in Experiments By Vivek F. Farias; Andrew A. Li; Tianyi Peng; Andrew T. Zheng
  18. Prices, Profits, Proxies, and Production By Victor H. Aguiar; Nail Kashaev; Roy Allen
  19. Nowcasting in the presence of large measurement errors and revisions By Martin Weale; Paul Labonne
  20. RMT-Net: Reject-aware Multi-Task Network for Modeling Missing-not-at-random Data in Financial Credit Scoring By Qiang Liu; Yingtao Luo; Shu Wu; Zhen Zhang; Xiangnan Yue; Hong Jin; Liang Wang

  1. By: Ignacio Martinez; Jaume Vives-i-Bastida
    Abstract: The synthetic control method has become a widely popular tool to estimate causal effects with observational data. Despite this, inference for synthetic control methods remains challenging. Often, inferential results rely on linear factor model data generating processes. In this paper, we characterize the conditions on the factor model primitives (the factor loadings) for which the statistical risk minimizers are synthetic controls (in the simplex). Then, we propose a Bayesian alternative to the synthetic control method that preserves the main features of the standard method and provides a new way of doing valid inference. We explore a Bernstein-von Mises style result to link our Bayesian inference to the frequentist inference. For linear factor model frameworks we show that a maximum likelihood estimator (MLE) of the synthetic control weights can consistently estimate the predictive function of the potential outcomes for the treated unit and that our Bayes estimator is asymptotically close to the MLE in the total variation sense. Through simulations, we show that the Bayes and frequentist approaches converge even in sparse settings. Finally, we apply the method to revisit the study of the economic costs of German reunification. The Bayesian synthetic control method is available in the bsynth R package.
    Date: 2022–06
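
As a rough illustration of what "synthetic controls (in the simplex)" means in the abstract above, the sketch below fits the standard (non-Bayesian) synthetic control weights: non-negative donor weights summing to one that minimise the pre-treatment squared error. It is not the paper's Bayesian estimator and does not use the bsynth package; the function names, the projected-gradient solver, and the toy data are all assumptions made for this example.

```python
def project_to_simplex(v):
    """Euclidean projection of v onto {w : w_i >= 0, sum(w) = 1}."""
    u = sorted(v, reverse=True)
    cumulative = 0.0
    theta = 0.0
    for j, u_j in enumerate(u, start=1):
        cumulative += u_j
        t = (cumulative - 1.0) / j
        if u_j - t > 0:  # holds for a prefix of j; keep the last threshold
            theta = t
    return [max(x - theta, 0.0) for x in v]


def synthetic_control_weights(donors, treated, iterations=5000):
    """Minimise sum_t (treated[t] - sum_j w[j] * donors[j][t])**2 over the simplex.

    donors: list of J pre-treatment series, each of length T (hypothetical layout).
    treated: pre-treatment series of the treated unit, length T.
    Uses projected gradient descent with a conservative fixed step size.
    """
    n_donors = len(donors)
    n_periods = len(treated)
    # 2 * squared Frobenius norm bounds the Lipschitz constant of the gradient.
    frob_sq = sum(x * x for series in donors for x in series)
    step = 1.0 / (2.0 * frob_sq)
    w = [1.0 / n_donors] * n_donors
    for _ in range(iterations):
        residual = [
            sum(w[j] * donors[j][t] for j in range(n_donors)) - treated[t]
            for t in range(n_periods)
        ]
        grad = [
            2.0 * sum(donors[j][t] * residual[t] for t in range(n_periods))
            for j in range(n_donors)
        ]
        w = project_to_simplex([w[j] - step * grad[j] for j in range(n_donors)])
    return w


# Toy data: the treated unit is an exact 0.3/0.7 mix of the first two donors.
donors = [[1.0, 2.0, 3.0, 4.0], [2.0, 1.0, 4.0, 3.0], [5.0, 5.0, 5.0, 5.0]]
treated = [0.3 * a + 0.7 * b for a, b in zip(donors[0], donors[1])]
weights = synthetic_control_weights(donors, treated)
print([round(x, 4) for x in weights])
```

The simplex constraint is what keeps the counterfactual an interpolation of donor units rather than an extrapolation; a Bayesian version would replace the point estimate with a posterior over such weights.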
  2. By: Difang Huang; Jiti Gao; Tatsushi Oka
    Abstract: We propose a semiparametric method to estimate the average treatment effect under the assumption of unconfoundedness given observational data. Our estimation method alleviates misspecification issues of the propensity score function by estimating the single-index link function involved through Hermite polynomials. Our approach is computationally tractable and allows for moderately high-dimensional covariates. We provide the large sample properties of the estimator and show its validity. Also, the average treatment effect estimator achieves the parametric rate and asymptotic normality. Our extensive Monte Carlo study shows that the proposed estimator is valid in finite samples. We also provide an empirical analysis of the effect of maternal smoking on babies' birth weight and the effect of a job training program on future earnings.
    Date: 2022–06
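
The abstract above approximates an unknown link function with a Hermite polynomial series. The sketch below only shows how such a basis can be evaluated with the three-term recurrence and combined with coefficients. It assumes the probabilists' convention (He_0 = 1, He_1 = x, He_{n+1} = x He_n - n He_{n-1}); the abstract does not say which convention or weighting the authors use, and the function names and coefficients are hypothetical.

```python
def hermite_basis(x, degree):
    """Return [He_0(x), ..., He_degree(x)] (probabilists' Hermite polynomials)."""
    if degree < 0:
        raise ValueError("degree must be non-negative")
    values = [1.0]
    if degree >= 1:
        values.append(float(x))
    for n in range(1, degree):
        # Three-term recurrence: He_{n+1}(x) = x * He_n(x) - n * He_{n-1}(x)
        values.append(x * values[n] - n * values[n - 1])
    return values


def link_approximation(index_value, coefficients):
    """Series approximation g(v) ~ sum_k c_k * He_k(v) at a single index value."""
    basis = hermite_basis(index_value, len(coefficients) - 1)
    return sum(c * b for c, b in zip(coefficients, basis))


# He_2(x) = x^2 - 1 and He_3(x) = x^3 - 3x, evaluated at x = 2.
print(hermite_basis(2.0, 3))
print(link_approximation(2.0, [1.0, 0.0, 1.0]))
```

In an actual estimator the coefficients would be fitted jointly with the index parameters; here they are fixed numbers chosen purely for illustration.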
  3. By: Qizhao Chen; Vasilis Syrgkanis; Morgane Austern
    Abstract: Estimation and inference on causal parameters is typically reduced to a generalized method of moments problem, which involves auxiliary functions that correspond to solutions to a regression or classification problem. A recent line of work on debiased machine learning shows how one can use generic machine learning estimators for these auxiliary problems and still maintain asymptotic normality and root-$n$ consistency of the target parameter of interest, requiring only mean-squared-error guarantees from the auxiliary estimation algorithms. The literature typically requires that these auxiliary problems are fitted on a separate sample or in a cross-fitting manner. We show that when these auxiliary estimation algorithms satisfy natural leave-one-out stability properties, then sample splitting is not required. This allows for sample re-use, which can be beneficial in moderately sized sample regimes. For instance, we show that the stability properties that we propose are satisfied for ensemble bagged estimators, built via sub-sampling without replacement, a popular technique in machine learning practice.
    Date: 2022–06
  4. By: Juan Carlos Escanciano; Joël Robert Terschuur
    Abstract: We construct locally robust/orthogonal moments in a semiparametric U-statistics setting. These are quadratic moments in the distribution of the data with a zero derivative with respect to first steps at their limit, which reduces model selection bias with machine learning first steps. We use orthogonal moments to propose new debiased estimators and valid inferences in a variety of applications ranging from Inequality of Opportunity (IOp) to distributional treatment effects. U-statistics with machine learning first steps arise naturally in these and many other applications. A leading example in IOp is the Gini coefficient of machine learning fitted values. We introduce a novel U-moment representation of the First Step Influence Function (U-FSIF) to take into account the effect of the first step estimation on an identifying quadratic moment. Adding the U-FSIF to the identifying quadratic moment gives rise to an orthogonal quadratic moment. Our leading and motivating application is measuring IOp, for which we propose a simple debiased estimator and the first available inferential methods. We give general and simple regularity conditions for asymptotic theory, and demonstrate an improved finite sample performance in simulations for our debiased measures of IOp. In an empirical application, we find that standard measures of IOp are about six times more sensitive to first step machine learners than our debiased measures, and that between $42\%$ and $46\%$ of income inequality in Spain is explained by circumstances out of the control of the individual.
    Date: 2022–06
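
To make the "Gini coefficient of machine learning fitted values" concrete, the sketch below computes the plug-in Gini coefficient as a second-order U-statistic: the average of the kernel |a - b| over all unordered pairs, divided by twice the mean. This is only the uncorrected plug-in measure, not the debiased estimator proposed in the paper; the function name and the fitted values are hypothetical.

```python
from itertools import combinations


def gini_u_statistic(values):
    """Gini coefficient as (mean absolute pairwise difference) / (2 * mean)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    mean = sum(values) / n
    if mean <= 0:
        raise ValueError("mean must be positive")
    # U-statistic with kernel h(a, b) = |a - b| over all n(n-1)/2 unordered pairs.
    n_pairs = n * (n - 1) / 2
    mean_abs_diff = sum(abs(a - b) for a, b in combinations(values, 2)) / n_pairs
    return mean_abs_diff / (2 * mean)


# Hypothetical fitted values from a first-step learner predicting income
# from circumstances.
fitted = [10.0, 20.0, 30.0, 40.0]
print(gini_u_statistic(fitted))
```

Because the fitted values enter through a pairwise kernel, first-step estimation error propagates into the statistic in a way that a plain sample average would not capture, which is the setting the orthogonal moments in the abstract are designed for.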
  5. By: Linton, O. B.; Tang, H.; Wu, J.
    Abstract: Most stock markets are open for 6-8 hours per trading day. The Asian, European and American stock markets are separated in time by time-zone differences. We propose a statistical dynamic factor model for a large number of daily returns across multiple time zones. Our model has a common global factor as well as continental factors. Under a mild fixed-signs assumption, our model is identified and has a structural interpretation. We propose several estimators of the model: the maximum likelihood estimator-one day (MLE-one day), the quasi-maximum likelihood estimator (QMLE), an improved estimator from QMLE (QMLE-md), the QMLE-res (similar to MLE-one day), and a Bayesian estimator (Gibbs sampling). We establish consistency, the rates of convergence and the asymptotic distributions of the QMLE and the QMLE-md. We next provide a heuristic procedure for conducting inference for the MLE-one day and the QMLE-res. Monte Carlo simulations reveal that the MLE-one day, the QMLE-res and the QMLE-md work well. We then apply our model to two real data sets: (1) equity portfolio returns from Japan, Europe and the US; (2) MSCI equity indices of 41 developed and emerging markets. Some new insights about linkages among different markets are drawn.
    Keywords: Daily Global Stock Market Returns, Expectation Maximization Algorithm, Minimum Distance, Quasi Maximum Likelihood, Structural Dynamic Factor Model, Time-Zone Differences
    JEL: C55 C58 G15
    Date: 2022–06–15
  6. By: Yuefeng Han; Rong Chen; Dan Yang; Cun-Hui Zhang
    Abstract: Tensor time series, which is a time series consisting of tensorial observations, has become ubiquitous. It typically exhibits high dimensionality. One approach for dimension reduction is to use a factor model structure, in a form similar to Tucker tensor decomposition, except that the time dimension is treated as a dynamic process with a time dependent structure. In this paper we introduce two approaches to estimate such a tensor factor model by using iterative orthogonal projections of the original tensor time series. The approaches extend the existing estimation procedures and our theoretical investigation shows that they improve the estimation accuracy and convergence rate significantly. The developed approaches are similar to higher order orthogonal projection methods for tensor decomposition, but with significant differences and theoretical properties. A simulation study is conducted to further illustrate the statistical properties of these estimators.
    Date: 2020–06
  7. By: Yunus Emre Ergemen (Aarhus University, Department of Economics and Business Economics, and CREATES)
    Abstract: A dynamic factor model is proposed in which factor dynamics are driven by stochastic time trends describing arbitrary persistence levels. The proposed model is essentially a long memory factor model, which nests standard I(0) and I(1) behavior smoothly in common factors. In the estimation, principal components analysis (PCA) and conditional sum of squares (CSS) estimations are employed. For the dynamic model parameters, centered normal asymptotics are established at the usual parametric rates, and their small-sample properties are explored via Monte Carlo experiments. The method is then applied to a panel of U.S. industry realized volatilities.
    Keywords: Factor models, long memory, conditional sum of squares, principal components analysis, realized volatility
    JEL: C12 C13 C33
    Date: 2022–06–24
  8. By: Luis Gruber; Gregor Kastner
    Abstract: Vector autoregressions (VARs) are widely applied when it comes to modeling and forecasting macroeconomic variables. In high dimensions, however, they are prone to overfitting. Bayesian methods, more concretely shrinkage priors, have been shown to be successful in improving prediction performance. In the present paper we introduce the recently developed $R^2$-induced Dirichlet-decomposition prior to the VAR framework and compare it to refinements of well-known priors in the VAR literature. We demonstrate the virtues of the proposed prior in an extensive simulation study and in an empirical application forecasting data of the US economy. Further, we shed more light on the ongoing Illusion of Sparsity debate. We find that forecasting performances under sparse/dense priors vary across evaluated economic variables and across time frames; dynamic model averaging, however, can combine the merits of both worlds. All priors are implemented using the reduced-form VAR and all models feature stochastic volatility in the variance-covariance matrix.
    Date: 2022–06
  9. By: Christian Gourieroux; Joann Jasiak
    Abstract: We introduce the closed-form formulas of nonlinear forecasts and nonlinear impulse response functions (IRF) for the mixed causal-noncausal (Structural) Vector Autoregressive (S)VAR models. We also discuss the identification of nonlinear causal innovations of the model to which the shocks are applied. Our approach is illustrated by a simulation study and an application to a bivariate process of Bitcoin/USD and Ethereum/USD exchange rates.
    Date: 2022–05
  10. By: Chronopoulos, Ilias; Giraitis, Liudas; Kapetanios, George
    Abstract: This paper suggests a multiplicative volatility model where volatility is decomposed into a stationary and a non-stationary persistent part. We provide a testing procedure to determine which type of volatility is prevalent in the data. The persistent part of volatility is associated with a nonstationary persistent process satisfying some smoothness and moment conditions. The stationary part is related to stationary conditional heteroskedasticity. We outline theory and conditions that allow the extraction of the persistent part from the data and enable standard conditional heteroskedasticity tests to detect stationary volatility after persistent volatility is taken into account. Monte Carlo results support the testing strategy in small samples. The empirical application of the theory supports the persistent volatility paradigm, suggesting that stationary conditional heteroskedasticity is considerably less pronounced than previously thought.
    Date: 2022–06–21
  11. By: Daniel J. Lewis; Karel Mertens
    Abstract: We extend the popular bias-based test of Stock and Yogo (2005) for instrument strength in linear instrumental variables regressions with multiple endogenous regressors to be robust to heteroskedasticity and autocorrelation. Equivalently, we extend the robust test of Montiel Olea and Pflueger (2013) for one endogenous regressor to the general case with multiple endogenous regressors. We describe a simple procedure for applied researchers to conduct our generalized first-stage test of instrument strength and provide efficient and easy-to-use Matlab code for its implementation. We demonstrate our testing procedures by considering the estimation of the state-dependent effects of fiscal policy as in Ramey and Zubairy (2018).
    Keywords: instrumental variables; weak instruments test; multiple endogenous regressors; heteroskedasticity; serial correlation
    JEL: C26 C36
    Date: 2022–06–01
  12. By: Gary Koop; Stuart McIntyre; James Mitchell; Aubrey Poon
    Abstract: Recent decades have seen advances in using econometric methods to produce more timely and higher frequency estimates of economic activity at the national level, enabling better tracking of the economy in real-time. These advances have not generally been replicated at the sub-national level, likely because of the empirical challenges that nowcasting at a regional level presents, notably the short time series of available data, changes in data frequency over time, and the hierarchical structure of the data. This paper develops a mixed-frequency Bayesian VAR model to address common features of the regional nowcasting context, using an application to regional productivity in the UK. We evaluate the contribution that different features of our model provide to the accuracy of point and density nowcasts, in particular the role of hierarchical aggregation constraints. We show that these aggregation constraints, imposed in stochastic form, play a key role in delivering improved regional nowcasts; they prove more important than adding region specific predictors when the equivalent national data are known, but not when this aggregate is unknown.
    Keywords: bayesian methods, mixed frequency nowcasting, real-time data, regional data
    JEL: C32 C53 E37
    Date: 2022–03
  13. By: Paulo M.M. Rodrigues; Robert Hill
    Abstract: There is widespread evidence of parameter instability in the literature. One way to account for this feature is through the use of time-varying parameter (TVP) models that discount older data in favour of more recent data. This practice is often known as forgetting and can be applied in several different ways. This paper introduces and examines the performance of different (flexible) forgetting methodologies in the context of the Kalman filter. We review and develop the theoretical background and investigate the performance of each methodology in simulations as well as in two empirical forecast exercises using dynamic model averaging (DMA). Specifically, out-of-sample DMA forecasts of CPI inflation and S&P500 returns obtained using different forgetting approaches are compared. Results show that basing the amount of forgetting on the forecast error does not perform as well as avoiding instability by placing bounds on the parameter covariance matrix.
    JEL: C22 C51 C53
    Date: 2022
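
For readers unfamiliar with forgetting in the Kalman filter, the sketch below shows the simplest textbook variant for a scalar time-varying-parameter regression y_t = x_t * beta_t + e_t: instead of specifying a state innovation variance, the predicted state variance is inflated by dividing by a constant forgetting factor. This is an assumed baseline, not one of the flexible schemes the paper compares; the function name, default values, and simulated data are illustrative only.

```python
def forgetting_filter(ys, xs, forgetting=0.95, beta0=0.0, p0=1e6, noise_var=1.0):
    """Filtered coefficient path for y_t = x_t * beta_t + e_t with constant forgetting.

    forgetting = 1.0 reduces to recursive least squares (no discounting);
    smaller values discount older observations more heavily.
    """
    beta, p = beta0, p0
    path = []
    for y, x in zip(ys, xs):
        p_pred = p / forgetting                      # inflate state variance
        gain = p_pred * x / (x * x * p_pred + noise_var)
        beta = beta + gain * (y - x * beta)          # update with forecast error
        p = (1.0 - gain * x) * p_pred                # posterior state variance
        path.append(beta)
    return path


# Noise-free toy series with a break: the coefficient jumps from 1 to 3.
xs = [1.0] * 60
ys = [1.0] * 50 + [3.0] * 10
no_forgetting = forgetting_filter(ys, xs, forgetting=1.0)[-1]
with_forgetting = forgetting_filter(ys, xs, forgetting=0.9)[-1]
print(round(no_forgetting, 3), round(with_forgetting, 3))
```

With forgetting, the final estimate moves much closer to the post-break value, at the cost of a noisier estimate when the coefficient is in fact stable; that trade-off is what adaptive or bounded forgetting rules try to manage.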
  14. By: Paul Diegert; Matthew A. Masten; Alexandre Poirier
    Abstract: Omitted variables are one of the most important threats to the identification of causal effects. Several widely used methods, including Oster (2019), have been developed to assess the impact of omitted variables on empirical conclusions. These methods all require an exogenous controls assumption: the omitted variables must be uncorrelated with the included controls. This is often considered a strong and implausible assumption. We provide a new approach to sensitivity analysis that allows for endogenous controls, while still letting researchers calibrate sensitivity parameters by comparing the magnitude of selection on observables with the magnitude of selection on unobservables. We illustrate our results in an empirical study of the effect of historical American frontier life on modern cultural beliefs. Finally, we implement these methods in the companion Stata module regsensitivity for easy use in practice.
    Date: 2022–06
  15. By: Ankur Moitra; Dhruv Rohatgi
    Abstract: Measuring the stability of conclusions derived from Ordinary Least Squares linear regression is critically important, but most metrics either only measure local stability (i.e. against infinitesimal changes in the data), or are only interpretable under statistical assumptions. Recent work proposes a simple, global, finite-sample stability metric: the minimum number of samples that need to be removed so that rerunning the analysis overturns the conclusion, specifically meaning that the sign of a particular coefficient of the estimated regressor changes. However, besides the trivial exponential-time algorithm, the only approach for computing this metric is a greedy heuristic that lacks provable guarantees under reasonable, verifiable assumptions; the heuristic provides a loose upper bound on the stability and also cannot certify lower bounds on it. We show that in the low-dimensional regime where the number of covariates is a constant but the number of samples is large, there are efficient algorithms for provably estimating (a fractional version of) this metric. Applying our algorithms to the Boston Housing dataset, we exhibit regression analyses where we can estimate the stability up to a factor of $3$ better than the greedy heuristic, and analyses where we can certify stability to dropping even a majority of the samples.
    Date: 2022–05
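
The stability metric described above (the minimum number of samples whose removal flips the sign of a coefficient) can be computed by the "trivial exponential-time algorithm" the abstract mentions. The sketch below does exactly that for the slope of a simple regression with an intercept. It is the brute-force baseline, not the authors' efficient algorithm, and the function names and data are made up for illustration.

```python
from itertools import combinations


def ols_slope(xs, ys):
    """Slope of y on x with an intercept; None if x has no variation."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        return None
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return sxy / sxx


def min_removals_to_flip_sign(xs, ys):
    """Smallest k such that dropping some k samples flips the slope's sign.

    Returns (k, dropped_indices), or None if no subset leaving at least two
    points flips the sign. Cost grows exponentially with the sample size.
    """
    base = ols_slope(xs, ys)
    if base is None or base == 0:
        raise ValueError("baseline slope must be non-zero")
    n = len(xs)
    for k in range(1, n - 1):  # always keep at least two samples
        for drop in combinations(range(n), k):
            dropped = set(drop)
            kept_x = [x for i, x in enumerate(xs) if i not in dropped]
            kept_y = [y for i, y in enumerate(ys) if i not in dropped]
            slope = ols_slope(kept_x, kept_y)
            if slope is not None and slope * base < 0:
                return k, drop
    return None


# Four points on a downward line plus one influential outlier.
xs = [1.0, 2.0, 3.0, 4.0, 10.0]
ys = [4.0, 3.0, 2.0, 1.0, 10.0]
print(ols_slope(xs, ys) > 0, min_removals_to_flip_sign(xs, ys))
```

Even at a few dozen samples the number of subsets becomes unmanageable, which is why provable approximations in the low-dimensional regime are of interest.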
  16. By: Nail Kashaev (University of Western Ontario)
    Abstract: I propose a generalized method of moments type procedure to estimate parametric binary choice models when the researcher only observes degenerate pure choice-based or presence-only data and has some information about the distribution of the covariates. This auxiliary information comes in the form of moments. I present an application based on the data on all COVID-19-positive tests from British Columbia. Publicly available demographic information on the population in British Columbia allows me to estimate the probability of a person testing positive for COVID-19 conditional on demographics.
    Keywords: Pure Choice-Based Data, Presence-Only Data, Data Combination, Missing Data, Epidemiology, Novel Coronavirus
    JEL: C2 C81 I19
    Date: 2022
  17. By: Vivek F. Farias; Andrew A. Li; Tianyi Peng; Andrew T. Zheng
    Abstract: We consider experiments in dynamical systems where interventions on some experimental units impact other units through a limiting constraint (such as a limited inventory). Despite outsize practical importance, the best estimators for this 'Markovian' interference problem are largely heuristic in nature, and their bias is not well understood. We formalize the problem of inference in such experiments as one of policy evaluation. Off-policy estimators, while unbiased, apparently incur a large penalty in variance relative to state-of-the-art heuristics. We introduce an on-policy estimator: the Differences-In-Q's (DQ) estimator. We show that the DQ estimator can in general have exponentially smaller variance than off-policy evaluation. At the same time, its bias is second order in the impact of the intervention. This yields a striking bias-variance tradeoff so that the DQ estimator effectively dominates state-of-the-art alternatives. From a theoretical perspective, we introduce three separate novel techniques that are of independent interest in the theory of Reinforcement Learning (RL). Our empirical evaluation includes a set of experiments on a city-scale ride-hailing simulator.
    Date: 2022–06
  18. By: Victor H. Aguiar (University of Western Ontario); Nail Kashaev (University of Western Ontario); Roy Allen (University of Western Ontario)
    Abstract: This paper studies nonparametric identification and counterfactual bounds for heterogeneous firms that can be ranked in terms of productivity. Our approach works when quantities and prices are latent, rendering standard approaches inapplicable. Instead, we require observation of profits or other optimizing-values such as costs or revenues, and either prices or price proxies of flexibly chosen variables. We extend classical duality results for price-taking firms to a setup with discrete heterogeneity, endogeneity, and limited variation in possibly latent prices. Finally, we show that convergence results for nonparametric estimators may be directly converted to convergence results for production sets.
    Keywords: Counterfactual Bounds, Cost Minimization, Nonseparable Heterogeneity, Partial Identification, Profit Maximization, Production Set, Revenue Maximization, Shape Restrictions
    JEL: C5 D24
    Date: 2022
  19. By: Martin Weale; Paul Labonne
    Abstract: This paper extends the temporal disaggregation approach of Labonne and Weale (2020) to tackle another feature of the VAT data: the delay in, and the highly noisy nature of, the early figures. The main contribution of this paper lies in the presentation and illustration of a cleaning method which can deal with non-Gaussian features in the distribution of measurement errors such as asymmetry and extreme observations.
    Keywords: cleaning, fat tails, measurement errors, nowcasting, score driven model
    JEL: C32 C53
    Date: 2022–03
  20. By: Qiang Liu; Yingtao Luo; Shu Wu; Zhen Zhang; Xiangnan Yue; Hong Jin; Liang Wang
    Abstract: In financial credit scoring, loan applications may be approved or rejected. We can only observe default/non-default labels for approved samples but have no observations for rejected samples, which leads to missing-not-at-random selection bias. Machine learning models trained on such biased data are inevitably unreliable. In this work, we find that the default/non-default classification task and the rejection/approval classification task are highly correlated, according to both real-world data study and theoretical analysis. Consequently, the learning of default/non-default can benefit from rejection/approval. Accordingly, we propose, for the first time, to model the biased credit scoring data with Multi-Task Learning (MTL). Specifically, we propose a novel Reject-aware Multi-Task Network (RMT-Net), which learns the task weights that control the information sharing from the rejection/approval task to the default/non-default task by a gating network based on rejection probabilities. RMT-Net leverages the relation between the two tasks: the larger the rejection probability, the more the default/non-default task needs to learn from the rejection/approval task. Furthermore, we extend RMT-Net to RMT-Net++ for modeling scenarios with multiple rejection/approval strategies. Extensive experiments are conducted on several datasets and strongly verify the effectiveness of RMT-Net on both approved and rejected samples. In addition, RMT-Net++ further improves RMT-Net's performance.
    Date: 2022–06

This nep-ecm issue is ©2022 by Sune Karlsson. It is provided as is without any express or implied warranty. It may be freely redistributed in whole or in part for any purpose. If distributed in part, please include this notice.
For comments, please write to the director of NEP, Marco Novarese. Put “NEP” in the subject, otherwise your mail may be rejected.
NEP’s infrastructure is sponsored by the School of Economics and Finance of Massey University in New Zealand.