nep-ecm 2020-05-25 papers

on Econometrics

Issue of 2020‒05‒25
twenty papers chosen by
Sune Karlsson
Örebro universitet

Fast and Accurate Variational Inference for Models with Many Latent Variables By Rubén Loaiza-Maya; Michael Stanley Smith; David J. Nott; Peter J. Danaher
Bootstrapping Quasi Likelihood Ratio Tests under Misspecification By Lavergne, Pascal; Bertail, Patrice
Modeling High-Dimensional Unit-Root Time Series By Zhaoxing Gao; Ruey S. Tsay
Bayesian dynamic variable selection in high dimensions By Gary Koop; Dimitris Korobilis
General Doubly Robust Identification and Estimation By Arthur Lewbel; Jin-Young Choi; Zhuzhu Zhou
Detecting Latent Communities in Network Formation Models By Shujie Ma; Liangjun Su; Yichong Zhang
On unbalanced data and common shock models in stochastic loss reserving By Benjamin Avanzi; Phuong Anh Vu; Gregory Clive Taylor; Bernard Wong
Smooth marginalized particle filters for dynamic network effect models By Dieter Wang; Julia Schaumburg
A New Method for Estimating Teacher Value-Added By Michael Gilraine; Jiaying Gu; Robert McMillan
Multivariate non-Gaussian models for financial applications By Michele Leonardo Bianchi; Asmerilda Hitaj; Gian Luca Tassinari
Arbitrage Pricing, Weak Beta, Strong Beta: Identification-Robust and Simultaneous Inference By Marie-Claude Beaulieu; Jean-Marie Dufour; Lynda Khalaf
Kotlarski with a Factor Loading By Arthur Lewbel
Time Varying Markov Process with Partially Observed Aggregate Data; An Application to Coronavirus By Christian GOURIEROUX; Joann JASIAK
Combining Population and Study Data for Inference on Event Rates By Christoph Rothe
Size does matter. A study on the required window size for optimal quality market risk models By Mateusz Buczyński; Marcin Chlebus
A Machine Learning Approach for Flagging Incomplete Bid-rigging Cartels By Wallimann, Hannes; Imhof, David; Huber, Martin
The Influence of Hidden Researcher Decisions in Applied Microeconomics By Huntington-Klein, Nick; Arenas, Andreu; Beam, Emily; Bertoni, Marco; Bloem, Jeffrey R.; Burli, Pralhad; Chen, Naibin; Greico, Paul; Ekpe, Godwin; Pugatch, Todd; Saavedra, Martin; Stopnitzky, Yaniv
Bounds on direct and indirect effects under treatment/mediator endogeneity and outcome attrition By Huber, Martin; Laffers, Lukáš
Social Networks with Misclassified or Unobserved Links By Arthur Lewbel; Xi Qu; Xun Tang
Tactics for design and inference in synthetic control studies: An applied example using high-dimensional data By Hollingsworth, Alex; Wing, Coady

Fast and Accurate Variational Inference for Models with Many Latent Variables

By:	Rubén Loaiza-Maya; Michael Stanley Smith; David J. Nott; Peter J. Danaher
Abstract:	Models with a large number of latent variables are often used to fully utilize the information in big or complex data. However, they can be difficult to estimate using standard approaches, and variational inference methods are a popular alternative. Key to the success of these is the selection of an approximation to the target density that is accurate, tractable and fast to calibrate using optimization methods. Mean field or structured Gaussian approximations are common, but these can be inaccurate and slow to calibrate when there are many latent variables. Instead, we propose a family of tractable variational approximations that are more accurate and faster to calibrate for this case. The approximation is a parsimonious copula model for the parameter posterior, combined with the exact conditional posterior of the latent variables. We derive a simplified expression for the re-parameterization gradient of the variational lower bound, which is the main ingredient of efficient optimization algorithms used to implement variational estimation. We illustrate using two substantive econometric examples. The first is a nonlinear state space model for U.S. inflation. The second is a random coefficients tobit model applied to a rich marketing dataset with one million sales observations from a panel of 10,000 individuals. In both cases, we show that our approximating family is faster to calibrate than either mean field or structured Gaussian approximations, and that the gains in posterior estimation accuracy are considerable.
Date:	2020–05
URL:	http://d.repec.org/n?u=RePEc:arx:papers:2005.07430&r=all

Bootstrapping Quasi Likelihood Ratio Tests under Misspecification

By:	Lavergne, Pascal; Bertail, Patrice
Abstract:	We consider quasi likelihood ratio (QLR) tests for restrictions on parameters under potential model misspecification. For convex M-estimation, including quantile regression, we propose a general and simple nonparametric bootstrap procedure that yields asymptotically valid critical values. The method modifies the bootstrap objective function to mimic what happens under the null hypothesis. When testing for a univariate restriction, we show how the test statistic can be made asymptotically pivotal. Our bootstrap can then provide asymptotic refinements as illustrated for a linear regression model. A Monte-Carlo study and an empirical application illustrate that double bootstrap of the QLR test controls level well and is powerful.
Date:	2020–05
URL:	http://d.repec.org/n?u=RePEc:tse:wpaper:124273&r=all

Modeling High-Dimensional Unit-Root Time Series

By:	Zhaoxing Gao; Ruey S. Tsay
Abstract:	In this paper, we propose a new procedure to build a structural-factor model for a vector unit-root time series. For a $p$-dimensional unit-root process, we assume that each component consists of a set of common factors, which may be unit-root non-stationary, and a set of stationary components, which contain the cointegrations among the unit-root processes. To further reduce the dimensionality, we also postulate that the stationary part of the series is a nonsingular linear transformation of certain common factors and idiosyncratic white noise components as in Gao and Tsay (2019a, b). The estimation of linear loading spaces of the unit-root factors and the stationary components is achieved by an eigenanalysis of some nonnegative definite matrix, and the separation between the stationary factors and the white noises is based on an eigenanalysis and a projected principal component analysis. Asymptotic properties of the proposed method are established for both fixed $p$ and diverging $p$ as the sample size $n$ tends to infinity. Both simulated and real examples are used to demonstrate the performance of the proposed method in finite samples.
Date:	2020–05
URL:	http://d.repec.org/n?u=RePEc:arx:papers:2005.03496&r=all

Bayesian dynamic variable selection in high dimensions

By:	Gary Koop; Dimitris Korobilis
Abstract:	This paper proposes a variational Bayes algorithm for computationally efficient posterior and predictive inference in time-varying parameter (TVP) models. Within this context we specify a new dynamic variable/model selection strategy for TVP dynamic regression models in the presence of a large number of predictors. This strategy allows for assessing in individual time periods which predictors are relevant (or not) for forecasting the dependent variable. The new algorithm is evaluated numerically using synthetic data and its computational advantages are established. Using macroeconomic data for the US we find that regression models that combine time-varying parameters with the information in many predictors have the potential to improve forecasts of price inflation over a number of alternative forecasting models.
Keywords:	dynamic linear model; approximate posterior inference; dynamic variable selection; forecasting
JEL:	C11 C13 C52 C53 C61
Date:	2020–05
URL:	http://d.repec.org/n?u=RePEc:gla:glaewp:2020_11&r=all

General Doubly Robust Identification and Estimation

By:	Arthur Lewbel (Boston College); Jin-Young Choi (Xiamen University); Zhuzhu Zhou (Boston College)
Abstract:	Consider two parametric models. At least one is correctly specified, but we don't know which. Both models include a common vector of parameters. An estimator for this common parameter vector is called Doubly Robust (DR) if it's consistent no matter which model is correct. We provide a general technique for constructing DR estimators. Our General Doubly Robust (GDR) technique is a simple extension of the Generalized Method of Moments. We illustrate our GDR with a variety of models, including average treatment effect estimation. Our empirical application is instrumental variables estimation, where either one of two instrument vectors might be invalid.
Keywords:	Doubly Robust Estimation, Generalized Method of Moments, Instrumental Variables, Average Treatment Effects, Parametric Models
JEL:	C51 C36 C31
Date:	2019–12–15
URL:	http://d.repec.org/n?u=RePEc:boc:bocoec:1003&r=all

Detecting Latent Communities in Network Formation Models

By:	Shujie Ma; Liangjun Su; Yichong Zhang
Abstract:	This paper proposes a logistic undirected network formation model which allows for assortative matching on observed individual characteristics and the presence of edge-wise fixed effects. We model the coefficients of observed characteristics to have a latent community structure and the edge-wise fixed effects to be of low rank. We propose a multi-step estimation procedure involving nuclear norm regularization, sample splitting, iterative logistic regression and spectral clustering to detect the latent communities. We show that the latent communities can be exactly recovered when the expected degree of the network is of order log n or higher, where n is the number of nodes in the network. The finite sample performance of the new estimation and inference methods is illustrated through both simulated and real datasets.
Date:	2020–05
URL:	http://d.repec.org/n?u=RePEc:arx:papers:2005.03226&r=all

On unbalanced data and common shock models in stochastic loss reserving

By:	Benjamin Avanzi; Phuong Anh Vu; Gregory Clive Taylor; Bernard Wong
Abstract:	Introducing common shocks is a popular dependence modelling approach, with some recent applications in loss reserving. The main advantage of this approach is the ability to capture structural dependence coming from known relationships. In addition, it helps with the parsimonious construction of correlation matrices of large dimensions. However, complications arise in the presence of "unbalanced data", that is, when (expected) magnitude of observations over a single triangle, or between triangles, can vary substantially. Specifically, if a single common shock is applied to all of these cells, it can contribute insignificantly to the larger values and/or swamp the smaller ones, unless careful adjustments are made. This problem is further complicated in applications involving negative claim amounts. In this paper, we address this problem in the loss reserving context using a common shock Tweedie approach for unbalanced data. We show that the solution not only provides a much better balance of the common shock proportions relative to the unbalanced data, but it is also parsimonious. Finally, the common shock Tweedie model also provides distributional tractability.
Date:	2020–05
URL:	http://d.repec.org/n?u=RePEc:arx:papers:2005.03500&r=all

Smooth marginalized particle filters for dynamic network effect models

By:	Dieter Wang (Vrije Universiteit Amsterdam); Julia Schaumburg (Vrije Universiteit Amsterdam)
Abstract:	We propose a dynamic network model for the study of high-dimensional panel data. Cross-sectional dependencies between units are captured via one or multiple observed networks and a low-dimensional vector of latent stochastic network intensity parameters. The parameter-driven, nonlinear structure of the model requires simulation-based filtering and estimation, for which we suggest using the smooth marginalized particle filter (SMPF). In a Monte Carlo simulation study, we demonstrate the SMPF's good performance relative to benchmarks, particularly when the cross-section dimension is large and the network is dense. An empirical application on the propagation of COVID-19 through international travel networks illustrates the usefulness of our method.
Keywords:	Dynamic network effects, Multiple networks, Nonlinear state-space model, Smooth marginalized particle filter, COVID-19
JEL:	C63 C32 C33
Date:	2020–05–10
URL:	http://d.repec.org/n?u=RePEc:tin:wpaper:20200023&r=all

A New Method for Estimating Teacher Value-Added

By:	Michael Gilraine; Jiaying Gu; Robert McMillan
Abstract:	This paper proposes a new methodology for estimating teacher value-added. Rather than imposing a normality assumption on unobserved teacher quality (as in the standard empirical Bayes approach), our nonparametric estimator permits the underlying distribution to be estimated directly and in a computationally feasible way. The resulting estimates fit the unobserved distribution very well regardless of the form it takes, as we show in Monte Carlo simulations. Implementing the nonparametric approach in practice using two separate large-scale administrative data sets, we find the estimated teacher value-added distributions depart from normality and differ from each other. To draw out the policy implications of our method, we first consider a widely-discussed policy to release teachers at the bottom of the value-added distribution, comparing predicted test score gains under our nonparametric approach with those using parametric empirical Bayes. Here the parametric method predicts similar policy gains in one data set while overestimating those in the other by a substantial margin. We also show the predicted gains from teacher retention policies can be underestimated significantly based on the parametric method. In general, the results highlight the benefit of our nonparametric empirical Bayes approach, given that the unobserved distribution of value-added is likely to be context-specific.
JEL:	C14 H75 I21 J24 J45
Date:	2020–05
URL:	http://d.repec.org/n?u=RePEc:nbr:nberwo:27094&r=all
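
The parametric empirical Bayes benchmark that this abstract contrasts with can be sketched as follows. This is a minimal illustration, not the authors' nonparametric estimator: it assumes a normal prior on teacher quality, known sampling standard errors, and a method-of-moments prior variance. The function name and the input numbers are hypothetical.

```python
from statistics import fmean


def parametric_eb_shrinkage(estimates, std_errors):
    """Shrink noisy teacher effects toward the grand mean (normal prior assumed)."""
    mu = fmean(estimates)
    # Method-of-moments prior variance: dispersion of the raw estimates
    # minus the average sampling variance, floored at zero.
    total_var = fmean((e - mu) ** 2 for e in estimates)
    noise_var = fmean(se ** 2 for se in std_errors)
    prior_var = max(total_var - noise_var, 0.0)

    shrunk = []
    for e, se in zip(estimates, std_errors):
        # Reliability weight: noisier estimates are pulled harder toward mu.
        weight = prior_var / (prior_var + se ** 2) if prior_var > 0 else 0.0
        shrunk.append(mu + weight * (e - mu))
    return shrunk


if __name__ == "__main__":
    raw = [0.30, -0.20, 0.10, -0.40]   # made-up raw value-added estimates
    ses = [0.10, 0.10, 0.30, 0.30]     # made-up standard errors
    print(parametric_eb_shrinkage(raw, ses))
```

The normal prior is the assumption the paper relaxes. When the true distribution is skewed or heavy-tailed, the linear shrinkage rule above need not be appropriate for teachers in the tails.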

Multivariate non-Gaussian models for financial applications

By:	Michele Leonardo Bianchi; Asmerilda Hitaj; Gian Luca Tassinari
Abstract:	In this paper we consider several continuous-time multivariate non-Gaussian models applied to finance and proposed in the literature in recent years. We study the models focusing on the parsimony of the number of parameters, the properties of the dependence structure, and the computational tractability. For each model we analyze the main features and provide the characteristic function, the marginal moments up to order four, the covariances and the correlations. We then describe how to calibrate them on time series of log-returns, with a view toward practical applications and possible numerical issues. To empirically compare these models, we conduct an analysis on a five-dimensional series of stock index log-returns.
Date:	2020–05
URL:	http://d.repec.org/n?u=RePEc:arx:papers:2005.06390&r=all

Arbitrage Pricing, Weak Beta, Strong Beta: Identification-Robust and Simultaneous Inference

By:	Marie-Claude Beaulieu; Jean-Marie Dufour; Lynda Khalaf
Abstract:	Factor models based on Arbitrage Pricing Theory (APT) characterize key parameters jointly and nonlinearly, which complicates identification. We propose simultaneous inference methods which preserve equilibrium relations between all model parameters including ex-post sample-dependent ones, without assuming identification. Confidence sets based on inverting joint tests are derived, and tractable analytical solutions are supplied. These allow one to assess whether traded and nontraded factors are priced risk-drivers, and to take account of cross-sectional intercepts. A formal test for traded factor assumptions is proposed. Simulation and empirical analyses are conducted with Fama-French factors. Simulation results underscore the information content of cross-sectional intercept and traded factor restrictions. Three empirical results are especially noteworthy: (1) the Fama-French three factors are priced before 1970; thereafter, we find no evidence favoring any factor relative to the market; (2) heterogeneity is not sufficient to distinguish priced momentum from profitability or investment risk; (3) after the 1970s, factors are rejected or appear to be weak, depending on intercept restrictions or test portfolios.
Keywords:	Capital Asset Pricing Model, CAPM, Arbitrage Pricing Theory, Black, Fama-French Factors, Mean-variance Efficiency, Non-Normality, Weak Identification, Identification-Robust, Projection, Fieller, Multivariate Linear Regression, Uniform Linear Hypothesis, Exact Test, Monte Carlo Test, Bootstrap, Nuisance Parameters
JEL:	C1 C12 C3 C38 C58 G1 G11 G12
Date:	2020–05–08
URL:	http://d.repec.org/n?u=RePEc:cir:cirwor:2020s-30&r=all

Kotlarski with a Factor Loading

By:	Arthur Lewbel (Boston College)
Abstract:	This note extends the Kotlarski (1967) Lemma to show exactly what is identified when we allow for an unknown factor loading on the common unobserved factor. Potential applications include measurement error models and panel data factor models.
Keywords:	unobserved factor, factor loading
Date:	2020–04–15
URL:	http://d.repec.org/n?u=RePEc:boc:bocoec:1001&r=all

Time Varying Markov Process with Partially Observed Aggregate Data; An Application to Coronavirus

By:	Christian GOURIEROUX (University of Toronto, Toulouse School of Economics and CREST); Joann JASIAK (York University, Canada)
Abstract:	A major difficulty in the analysis of propagation of the coronavirus is that many infected individuals show no symptoms of Covid-19. This implies a lack of information on the total counts of infected individuals and of recovered and immunized individuals. In this paper, we consider parametric time varying Markov processes of Coronavirus propagation and show how to estimate the model parameters and approximate the unobserved counts from daily numbers of infected and detected individuals and total daily death counts. This model-based approach is illustrated in an application to French data.
Keywords:	Markov Process; Partial Observability; Information Recovery; Estimating Equations; SIR Model; Coronavirus; Infection Rate.
Date:	2020–03–31
URL:	http://d.repec.org/n?u=RePEc:crs:wpaper:2020-11&r=all

Combining Population and Study Data for Inference on Event Rates

By:	Christoph Rothe
Abstract:	This note considers the problem of conducting statistical inference on the share of individuals in some subgroup of a population that experience some event. The specific complication is that the size of the subgroup needs to be estimated, whereas the number of individuals that experience the event is known. The problem is motivated by the recent study of Streeck et al. (2020), who estimate the infection fatality rate (IFR) of SARS-CoV-2 infection in a German town that experienced a super-spreading event in mid-February 2020. In their case the subgroup of interest is comprised of all infected individuals, and the event is death caused by the infection. We clarify issues with the precise definition of the target parameter in this context, and propose confidence intervals (CIs) based on classical statistical principles that result in good coverage properties.
Date:	2020–05
URL:	http://d.repec.org/n?u=RePEc:arx:papers:2005.06769&r=all
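
The structure of the problem in this abstract (a known event count divided by an estimated subgroup size) can be illustrated with a short sketch. It is not the interval construction proposed in the note. It assumes the number of deaths is fixed and known, that prevalence is estimated from a simple random sample with a perfect test, and it uses a Wilson interval for prevalence. All function names and input numbers are hypothetical.

```python
from math import sqrt


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion."""
    p_hat = successes / n
    denom = 1 + z ** 2 / n
    centre = (p_hat + z ** 2 / (2 * n)) / denom
    half = z * sqrt(p_hat * (1 - p_hat) / n + z ** 2 / (4 * n ** 2)) / denom
    return centre - half, centre + half


def ifr_interval(deaths, population, positives, tested, z=1.96):
    """Point estimate and interval for deaths / (population * prevalence)."""
    if not 0 < positives <= tested:
        raise ValueError("need at least one positive test and positives <= tested")
    lo_p, hi_p = wilson_interval(positives, tested, z)
    point = deaths / (population * positives / tested)
    # The rate is decreasing in prevalence, so the prevalence bounds swap roles.
    return point, deaths / (population * hi_p), deaths / (population * lo_p)


if __name__ == "__main__":
    # Made-up inputs: 7 deaths, town of 12,000, 150 positives among 1,000 tested.
    print(ifr_interval(deaths=7, population=12_000, positives=150, tested=1_000))
```

Treating the death count as fixed is a modelling choice that affects what the target parameter is, which is one of the definitional issues the note examines.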

Size does matter. A study on the required window size for optimal quality market risk models

By:	Mateusz Buczyński (Interdisciplinary Doctoral School, University of Warsaw); Marcin Chlebus (Faculty of Economic Sciences, University of Warsaw)
Abstract:	When it comes to market risk models, should we use all the data we possess, or rather find a sufficient subsample? We study fixed moving windows of different lengths (from 300 to 2000 observations) for three Value-at-Risk models (historical simulation, GARCH and CAViaR) and three indexes (WIG20, S&P500 and FTSE100). Each testing sample contained 250 observations and ended at the end of one of the years 2015-2019. We also address the subjectivity of choosing the window size by testing two change-point detection algorithms, binary segmentation and PELT, to find the best-matching cut-off point. The results indicate that a training sample larger than 900-1000 observations does not increase the quality of the model, while lengths below that cut-off give unsatisfactory results and decrease the model's conservatism. Change-point detection methods provide more accurate models. Applying the algorithms at every model recalculation improves results by one exceedance on average. Our recommendation is to use a GARCH or CAViaR model with a recalculated window size.
Keywords:	Value at Risk; historical simulation; CAViaR; GARCH; forecast comparison; sample size
JEL:	G32 C52 C53 C58
Date:	2020
URL:	http://d.repec.org/n?u=RePEc:war:wpaper:2020-09&r=all
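
The window-size experiment described in this abstract can be sketched for the simplest of the three models, historical simulation. The code below is an illustration under stated assumptions, not the authors' implementation: the 1% level, the order-statistic quantile convention, and the simulated Gaussian returns are all choices made here, and the function names are hypothetical.

```python
import random


def historical_var(window, alpha=0.01):
    """Historical-simulation VaR: a low order statistic of past returns."""
    ordered = sorted(window)
    # Simple convention: take the order statistic at index floor(alpha * n).
    return ordered[int(alpha * len(ordered))]


def count_exceedances(returns, window_size, test_size=250, alpha=0.01):
    """Count test-period days whose return falls below the rolling VaR forecast."""
    start = len(returns) - test_size
    if start < window_size:
        raise ValueError("not enough history for this window size")
    exceedances = 0
    for t in range(start, len(returns)):
        # The forecast for day t uses only the window_size returns before day t.
        var_t = historical_var(returns[t - window_size:t], alpha)
        if returns[t] < var_t:
            exceedances += 1
    return exceedances


if __name__ == "__main__":
    rng = random.Random(0)
    simulated = [rng.gauss(0.0, 0.01) for _ in range(2250)]  # made-up returns
    for size in (300, 500, 1000, 2000):
        print(size, count_exceedances(simulated, size))
```

With 250 test observations and a 1% level, a correctly calibrated model is expected to produce about 2.5 exceedances, which gives a reference point for comparing window lengths.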

A Machine Learning Approach for Flagging Incomplete Bid-rigging Cartels

By:	Wallimann, Hannes (Faculty of Economics and Social Sciences); Imhof, David; Huber, Martin
Abstract:	We propose a new method for flagging bid rigging, which is particularly useful for detecting incomplete bid-rigging cartels. Our approach combines screens, i.e. statistics derived from the distribution of bids in a tender, with machine learning to predict the probability of collusion. As a methodological innovation, we calculate such screens for all possible subgroups of three or four bids within a tender and use summary statistics like the mean, median, maximum, and minimum of each screen as predictors in the machine learning algorithm. This approach tackles the issue that competitive bids in incomplete cartels distort the statistical signals produced by bid rigging. We demonstrate that our algorithm outperforms previously suggested methods in applications to incomplete cartels based on empirical data from Switzerland.
Keywords:	Bid rigging detection; screening methods; descriptive statistics; machine learning; random forest; lasso; ensemble methods
JEL:	C21 C45 C52 D22 D40 K40
Date:	2020–04–01
URL:	http://d.repec.org/n?u=RePEc:fri:fribow:fribow00513&r=all
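
The subgroup step described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' code: it uses the coefficient of variation as the only screen, pools subgroups of three and four bids into one set of summary statistics (the paper may treat them separately), and stops before the machine learning stage. The function names and the bid values are hypothetical.

```python
from itertools import combinations
from statistics import mean, median, pstdev


def coefficient_of_variation(bids):
    """One illustrative screen: standard deviation of bids relative to their mean."""
    return pstdev(bids) / mean(bids)


def subgroup_screen_features(bids, sizes=(3, 4), screen=coefficient_of_variation):
    """Summarise a screen computed on every subgroup of the given sizes."""
    values = [screen(group) for k in sizes for group in combinations(bids, k)]
    if not values:
        raise ValueError("tender has too few bids for the requested subgroup sizes")
    return {
        "mean": mean(values),
        "median": median(values),
        "max": max(values),
        "min": min(values),
    }


if __name__ == "__main__":
    tender = [100.0, 101.0, 100.5, 118.0, 131.0]  # made-up bids
    print(subgroup_screen_features(tender))
```

The minimum across subgroups is the informative feature here: a tight cluster of bids inside an otherwise dispersed tender yields a low value for some subgroup even when the tender-level statistic looks competitive.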

The Influence of Hidden Researcher Decisions in Applied Microeconomics

By:	Huntington-Klein, Nick; Arenas, Andreu; Beam, Emily; Bertoni, Marco; Bloem, Jeffrey R.; Burli, Pralhad; Chen, Naibin; Greico, Paul; Ekpe, Godwin; Pugatch, Todd; Saavedra, Martin; Stopnitzky, Yaniv
Abstract:	Researchers make hundreds of decisions about data collection, preparation, and analysis in their research. We use a many-analysts approach to measure the extent and impact of these decisions. Two published causal empirical results are replicated by seven replicators each. We find large differences in data preparation and analysis decisions, many of which would not likely be reported in a publication. No two replicators reported the same sample size. Statistical significance varied across replications, and for one of the studies the effect’s sign varied as well. The standard deviation of estimates across replications was 3-4 times the typical reported standard error.
Keywords:	Replication, Metascience, Research
JEL:	C81 C10 B41
Date:	2020
URL:	http://d.repec.org/n?u=RePEc:zbw:glodps:537&r=all

Bounds on direct and indirect effects under treatment/mediator endogeneity and outcome attrition

By:	Huber, Martin; Laffers, Lukáš (Matej Bel University)
Abstract:	Causal mediation analysis aims at disentangling a treatment effect into an indirect mechanism operating through an intermediate outcome or mediator, as well as the direct effect of the treatment on the outcome of interest. However, the evaluation of direct and indirect effects is frequently complicated by non-ignorable selection into the treatment and/or mediator, even after controlling for observables, as well as sample selection/outcome attrition. We propose a method for bounding direct and indirect effects in the presence of such complications using a method that is based on a sequence of linear programming problems. Considering inverse probability weighting by propensity scores, we compute the weights that would yield identification in the absence of complications and perturb them by an entropy parameter reflecting a specific amount of propensity score misspecification to set-identify the effects of interest. We apply our method to data from the National Longitudinal Survey of Youth 1979 to derive bounds on the explained and unexplained components of a gender wage gap decomposition that is likely prone to non-ignorable mediator selection and outcome attrition.
Keywords:	Causal mechanisms; direct effects; indirect effects; causal channels; mediation analysis; sample selection; bounds
JEL:	C21
Date:	2020–05–01
URL:	http://d.repec.org/n?u=RePEc:fri:fribow:fribow00514&r=all

Social Networks with Misclassified or Unobserved Links

By:	Arthur Lewbel (Boston College); Xi Qu (Shanghai Jiao Tong University); Xun Tang (Rice University)
Abstract:	We identify and estimate social network models when network links are either misclassified or unobserved in the data. First, we derive conditions under which some misclassification of links does not interfere with the consistency or asymptotic properties of standard instrumental variable estimators of social effects. Second, we construct a consistent estimator of social effects in a model where network links are not observed in the data at all. Our method does not require repeated observations of individual network members. We apply our estimator to data from Tennessee's Student/Teacher Achievement Ratio (STAR) Project. Without observing the latent network in each classroom, we identify and estimate peer and contextual effects on students' performance in mathematics. We found that peer effects tend to be larger in bigger classes, and that increasing peer effects would significantly improve students' average test scores.
Keywords:	Social networks, Peer effects, Misclassified links, Missing links, Mismeasured network, Unobserved network, Classroom performance
Date:	2019–07–30
URL:	http://d.repec.org/n?u=RePEc:boc:bocoec:1004&r=all

Tactics for design and inference in synthetic control studies: An applied example using high-dimensional data

By:	Hollingsworth, Alex; Wing, Coady
Abstract:	We describe identification assumptions underlying synthetic control studies and offer recommendations for key---and normally ad hoc---implementation decisions, focusing on model selection; model fit; cross-validation; and decision rules for inference. We outline how to implement a Synthetic Control Using Lasso (SCUL). The method---available as an R package---allows for a high-dimensional donor pool; automates model selection; includes donors from a wide range of variable types; and permits both extrapolation and negative weights. In an application, we employ our recommendations and the SCUL strategy to estimate how recreational marijuana legalization affects sales of alcohol and over-the-counter painkillers, finding reductions in alcohol sales.
Date:	2020–05–03
URL:	http://d.repec.org/n?u=RePEc:osf:socarx:fc9xt&r=all

This nep-ecm issue is ©2020 by Sune Karlsson. It is provided as is without any express or implied warranty. It may be freely redistributed in whole or in part for any purpose. If distributed in part, please include this notice.

General information on the NEP project can be found at http://nep.repec.org. For comments please write to the director of NEP, Marco Novarese at <director@nep.repec.org>. Put “NEP” in the subject, otherwise your mail may be rejected.

NEP’s infrastructure is sponsored by the School of Economics and Finance of Massey University in New Zealand.