Econometrics
http://lists.repec.org/mailman/listinfo/nep-ecm
2020-05-25
Fast and Accurate Variational Inference for Models with Many Latent Variables
http://d.repec.org/n?u=RePEc:arx:papers:2005.07430&r=ecm
Models with a large number of latent variables are often used to fully utilize the information in big or complex data. However, they can be difficult to estimate using standard approaches, and variational inference methods are a popular alternative. Key to the success of these is the selection of an approximation to the target density that is accurate, tractable and fast to calibrate using optimization methods. Mean field or structured Gaussian approximations are common, but these can be inaccurate and slow to calibrate when there are many latent variables. Instead, we propose a family of tractable variational approximations that are more accurate and faster to calibrate for this case. The approximation is a parsimonious copula model for the parameter posterior, combined with the exact conditional posterior of the latent variables. We derive a simplified expression for the re-parameterization gradient of the variational lower bound, which is the main ingredient of efficient optimization algorithms used to implement variational estimation. We illustrate using two substantive econometric examples. The first is a nonlinear state space model for U.S. inflation. The second is a random coefficients tobit model applied to a rich marketing dataset with one million sales observations from a panel of 10,000 individuals. In both cases, we show that our approximating family is faster to calibrate than either mean field or structured Gaussian approximations, and that the gains in posterior estimation accuracy are considerable.
Rubén Loaiza-Maya
Michael Stanley Smith
David J. Nott
Peter J. Danaher
2020-05
Bootstrapping Quasi Likelihood Ratio Tests under Misspecification
http://d.repec.org/n?u=RePEc:tse:wpaper:124273&r=ecm
We consider quasi likelihood ratio (QLR) tests for restrictions on parameters under potential model misspecification. For convex M-estimation, including quantile regression, we propose a general and simple nonparametric bootstrap procedure that yields asymptotically valid critical values. The method modifies the bootstrap objective function to mimic what happens under the null hypothesis. When testing for a univariate restriction, we show how the test statistic can be made asymptotically pivotal. Our bootstrap can then provide asymptotic refinements as illustrated for a linear regression model. A Monte-Carlo study and an empirical application illustrate that a double bootstrap of the QLR test controls level well and is powerful.
Lavergne, Pascal
Bertail, Patrice
2020-05
Modeling High-Dimensional Unit-Root Time Series
http://d.repec.org/n?u=RePEc:arx:papers:2005.03496&r=ecm
In this paper, we propose a new procedure to build a structural-factor model for a vector unit-root time series. For a $p$-dimensional unit-root process, we assume that each component consists of a set of common factors, which may be unit-root non-stationary, and a set of stationary components, which contain the cointegrations among the unit-root processes. To further reduce the dimensionality, we also postulate that the stationary part of the series is a nonsingular linear transformation of certain common factors and idiosyncratic white noise components as in Gao and Tsay (2019a, b). The estimation of linear loading spaces of the unit-root factors and the stationary components is achieved by an eigenanalysis of some nonnegative definite matrix, and the separation between the stationary factors and the white noises is based on an eigenanalysis and a projected principal component analysis. Asymptotic properties of the proposed method are established for both fixed $p$ and diverging $p$ as the sample size $n$ tends to infinity. Both simulated and real examples are used to demonstrate the performance of the proposed method in finite samples.
Zhaoxing Gao
Ruey S. Tsay
2020-05
Bayesian dynamic variable selection in high dimensions
http://d.repec.org/n?u=RePEc:gla:glaewp:2020_11&r=ecm
This paper proposes a variational Bayes algorithm for computationally efficient posterior and predictive inference in time-varying parameter (TVP) models. Within this context we specify a new dynamic variable/model selection strategy for TVP dynamic regression models in the presence of a large number of predictors. This strategy allows for assessing in individual time periods which predictors are relevant (or not) for forecasting the dependent variable. The new algorithm is evaluated numerically using synthetic data and its computational advantages are established. Using macroeconomic data for the US we find that regression models that combine time-varying parameters with the information in many predictors have the potential to improve forecasts of price inflation over a number of alternative forecasting models.
Gary Koop
Dimitris Korobilis
dynamic linear model; approximate posterior inference; dynamic variable selection; forecasting
2020-05
General Doubly Robust Identification and Estimation
http://d.repec.org/n?u=RePEc:boc:bocoec:1003&r=ecm
Consider two parametric models. At least one is correctly specified, but we don't know which. Both models include a common vector of parameters. An estimator for this common parameter vector is called Doubly Robust (DR) if it's consistent no matter which model is correct. We provide a general technique for constructing DR estimators. Our General Doubly Robust (GDR) technique is a simple extension of the Generalized Method of Moments. We illustrate our GDR with a variety of models, including average treatment effect estimation. Our empirical application is instrumental variables estimation, where either one of two instrument vectors might be invalid.
Arthur Lewbel
Jin-Young Choi
Zhuzhu Zhou
Doubly Robust Estimation, Generalized Method of Moments, Instrumental Variables, Average Treatment Effects, Parametric Models
2019-12-15
Detecting Latent Communities in Network Formation Models
http://d.repec.org/n?u=RePEc:arx:papers:2005.03226&r=ecm
This paper proposes a logistic undirected network formation model which allows for assortative matching on observed individual characteristics and the presence of edge-wise fixed effects. We model the coefficients of observed characteristics to have a latent community structure and the edge-wise fixed effects to be of low rank. We propose a multi-step estimation procedure involving nuclear norm regularization, sample splitting, iterative logistic regression and spectral clustering to detect the latent communities. We show that the latent communities can be exactly recovered when the expected degree of the network is of order log n or higher, where n is the number of nodes in the network. The finite sample performance of the new estimation and inference methods is illustrated through both simulated and real datasets.
Shujie Ma
Liangjun Su
Yichong Zhang
2020-05
On unbalanced data and common shock models in stochastic loss reserving
http://d.repec.org/n?u=RePEc:arx:papers:2005.03500&r=ecm
Introducing common shocks is a popular dependence modelling approach, with some recent applications in loss reserving. The main advantage of this approach is the ability to capture structural dependence coming from known relationships. In addition, it helps with the parsimonious construction of correlation matrices of large dimensions. However, complications arise in the presence of "unbalanced data", that is, when the (expected) magnitude of observations over a single triangle, or between triangles, can vary substantially. Specifically, if a single common shock is applied to all of these cells, it can contribute insignificantly to the larger values and/or swamp the smaller ones, unless careful adjustments are made. This problem is further complicated in applications involving negative claim amounts. In this paper, we address this problem in the loss reserving context using a common shock Tweedie approach for unbalanced data. We show that the solution not only provides a much better balance of the common shock proportions relative to the unbalanced data, but it is also parsimonious. Finally, the common shock Tweedie model also provides distributional tractability.
Benjamin Avanzi
Phuong Anh Vu
Gregory Clive Taylor
Bernard Wong
2020-05
Smooth marginalized particle filters for dynamic network effect models
http://d.repec.org/n?u=RePEc:tin:wpaper:20200023&r=ecm
We propose a dynamic network model for the study of high-dimensional panel data. Cross-sectional dependencies between units are captured via one or multiple observed networks and a low-dimensional vector of latent stochastic network intensity parameters. The parameter-driven, nonlinear structure of the model requires simulation-based filtering and estimation, for which we suggest using the smooth marginalized particle filter (SMPF). In a Monte Carlo simulation study, we demonstrate the SMPF's good performance relative to benchmarks, particularly when the cross-section dimension is large and the network is dense. An empirical application on the propagation of COVID-19 through international travel networks illustrates the usefulness of our method.
Dieter Wang
Julia Schaumburg
Dynamic network effects, Multiple networks, Nonlinear state-space model, Smooth marginalized particle filter, COVID-19
2020-05-10
A New Method for Estimating Teacher Value-Added
http://d.repec.org/n?u=RePEc:nbr:nberwo:27094&r=ecm
This paper proposes a new methodology for estimating teacher value-added. Rather than imposing a normality assumption on unobserved teacher quality (as in the standard empirical Bayes approach), our nonparametric estimator permits the underlying distribution to be estimated directly and in a computationally feasible way. The resulting estimates fit the unobserved distribution very well regardless of the form it takes, as we show in Monte Carlo simulations. Implementing the nonparametric approach in practice using two separate large-scale administrative data sets, we find the estimated teacher value-added distributions depart from normality and differ from each other. To draw out the policy implications of our method, we first consider a widely-discussed policy to release teachers at the bottom of the value-added distribution, comparing predicted test score gains under our nonparametric approach with those using parametric empirical Bayes. Here the parametric method predicts similar policy gains in one data set while overestimating those in the other by a substantial margin. We also show the predicted gains from teacher retention policies can be underestimated significantly based on the parametric method. In general, the results highlight the benefit of our nonparametric empirical Bayes approach, given that the unobserved distribution of value-added is likely to be context-specific.
Michael Gilraine
Jiaying Gu
Robert McMillan
2020-05
Multivariate non-Gaussian models for financial applications
http://d.repec.org/n?u=RePEc:arx:papers:2005.06390&r=ecm
In this paper we consider several continuous-time multivariate non-Gaussian models applied to finance and proposed in the literature in recent years. We study the models focusing on the parsimony of the number of parameters, the properties of the dependence structure, and the computational tractability. For each model we analyze the main features and provide the characteristic function, the marginal moments up to order four, the covariances and the correlations. We then describe how to calibrate the models on time series of log-returns, with a view toward practical applications and possible numerical issues. To empirically compare these models, we conduct an analysis on a five-dimensional series of stock index log-returns.
Michele Leonardo Bianchi
Asmerilda Hitaj
Gian Luca Tassinari
2020-05
Arbitrage Pricing, Weak Beta, Strong Beta: Identification-Robust and Simultaneous Inference
http://d.repec.org/n?u=RePEc:cir:cirwor:2020s-30&r=ecm
Factor models based on Arbitrage Pricing Theory (APT) characterize key parameters jointly and nonlinearly, which complicates identification. We propose simultaneous inference methods which preserve equilibrium relations between all model parameters including ex-post sample-dependent ones, without assuming identification. Confidence sets based on inverting joint tests are derived, and tractable analytical solutions are supplied. These allow one to assess whether traded and nontraded factors are priced risk-drivers, and to take account of cross-sectional intercepts. A formal test for traded factor assumptions is proposed. Simulation and empirical analyses are conducted with Fama-French factors. Simulation results underscore the information content of cross-sectional intercept and traded factor restrictions. Three empirical results are especially noteworthy: (1) the Fama-French three factors are priced before 1970; thereafter, we find no evidence favoring any factor relative to the market; (2) heterogeneity is not sufficient to distinguish priced momentum from profitability or investment risk; (3) after the 1970s, factors are rejected or appear to be weak, depending on intercept restrictions or test portfolios.
Marie-Claude Beaulieu
Jean-Marie Dufour
Lynda Khalaf
Capital Asset Pricing Model,CAPM,Arbitrage Pricing Theory,Black,Fama-French Factors,Mean-variance Efficiency,Non-Normality,Weak Identification,Identification-Robust,Projection,Fieller,Multivariate Linear Regression,Uniform Linear Hypothesis,Exact Test,Monte Carlo Test,Bootstrap,Nuisance Parameters
2020-05-08
Kotlarski with a Factor Loading
http://d.repec.org/n?u=RePEc:boc:bocoec:1001&r=ecm
This note extends the Kotlarski (1967) Lemma to show exactly what is identified when we allow for an unknown factor loading on the common unobserved factor. Potential applications include measurement error models and panel data factor models.
Arthur Lewbel
unobserved factor, factor loading
2020-04-15
Time Varying Markov Process with Partially Observed Aggregate Data; An Application to Coronavirus
http://d.repec.org/n?u=RePEc:crs:wpaper:2020-11&r=ecm
A major difficulty in the analysis of propagation of the coronavirus is that many infected individuals show no symptoms of Covid-19. This implies a lack of information on the total counts of infected individuals and of recovered and immunized individuals. In this paper, we consider parametric time varying Markov processes of Coronavirus propagation and show how to estimate the model parameters and approximate the unobserved counts from daily numbers of infected and detected individuals and total daily death counts. This model-based approach is illustrated in an application to French data.
Christian GOURIEROUX
Joann JASIAK
Markov Process; Partial Observability; Information Recovery; Estimating Equations; SIR Model; Coronavirus; Infection Rate.
2020-03-31
Combining Population and Study Data for Inference on Event Rates
http://d.repec.org/n?u=RePEc:arx:papers:2005.06769&r=ecm
This note considers the problem of conducting statistical inference on the share of individuals in some subgroup of a population that experience some event. The specific complication is that the size of the subgroup needs to be estimated, whereas the number of individuals that experience the event is known. The problem is motivated by the recent study of Streeck et al. (2020), who estimate the infection fatality rate (IFR) of SARS-CoV-2 infection in a German town that experienced a super-spreading event in mid-February 2020. In their case the subgroup of interest is comprised of all infected individuals, and the event is death caused by the infection. We clarify issues with the precise definition of the target parameter in this context, and propose confidence intervals (CIs) based on classical statistical principles that result in good coverage properties.
Christoph Rothe
2020-05
Size does matter. A study on the required window size for optimal quality market risk models
http://d.repec.org/n?u=RePEc:war:wpaper:2020-09&r=ecm
When it comes to market risk models, should we use all the data we possess or rather find a sufficient subsample? We have conducted a study of different fixed moving-window lengths (from 300 to 2000 observations) for three Value-at-Risk models (historical simulation, GARCH and CAViaR) and three different indexes: WIG20, S&P500 and FTSE100. Testing samples contained 250 observations, each ending with the end of one of the years 2015-2019. We have also addressed the subjectivity of choosing the window size by testing change-point detection algorithms (binary segmentation and PELT) to find the best matching cut-off point. Results indicate that a training sample larger than 900-1000 observations doesn’t increase the quality of the model, while lengths below this cut-off provide unsatisfactory results and decrease the model’s conservatism. Change-point detection methods provide more accurate models. Applying the algorithms with every recalculation of the model improves results by 1 exceedance on average. Our recommendation is to use a GARCH or CAViaR model with a recalculated window size.
Mateusz Buczyński
Marcin Chlebus
Value at Risk; historical simulation; CAViaR; GARCH; forecast comparison; sample size
2020
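The window-size comparison described in the entry above can be sketched for the simplest of the three models, historical simulation. This is only an illustration under stated assumptions, not the authors' code: the 1% VaR level, the order-statistic quantile rule, the simulated Gaussian returns, and the names `historical_var` and `count_exceedances` are all hypothetical choices made here. Only the 250-observation test sample and the window lengths between 300 and 2000 come from the abstract.

```python
import random


def historical_var(returns, level=0.01):
    """Historical-simulation VaR: an empirical lower quantile of past returns.

    The quantile rule (a plain order statistic) is an assumption of this sketch.
    """
    ordered = sorted(returns)
    index = max(int(level * len(ordered)) - 1, 0)
    return ordered[index]


def count_exceedances(returns, window, test_size=250, level=0.01):
    """Roll a fixed-length window over the last `test_size` observations
    and count the days on which the return falls below the VaR forecast."""
    start = len(returns) - test_size
    if start < window:
        raise ValueError("not enough history for this window length")
    breaches = 0
    for t in range(start, len(returns)):
        # Forecast for day t uses only the `window` observations before t.
        var_t = historical_var(returns[t - window:t], level)
        if returns[t] < var_t:
            breaches += 1
    return breaches


# Simulated returns stand in for index data (assumption: i.i.d. Gaussian).
random.seed(0)
returns = [random.gauss(0.0, 0.01) for _ in range(2250)]
results = {w: count_exceedances(returns, w) for w in (300, 1000, 2000)}
```

Here `results` maps each window length to its number of VaR breaches over the 250-day test sample. With simulated i.i.d. data no ranking of window lengths should be read into the counts; the sketch only shows the mechanics of the comparison.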
A Machine Learning Approach for Flagging Incomplete Bid-rigging Cartels
http://d.repec.org/n?u=RePEc:fri:fribow:fribow00513&r=ecm
We propose a new method for flagging bid rigging, which is particularly useful for detecting incomplete bid-rigging cartels. Our approach combines screens, i.e. statistics derived from the distribution of bids in a tender, with machine learning to predict the probability of collusion. As a methodological innovation, we calculate such screens for all possible subgroups of three or four bids within a tender and use summary statistics like the mean, median, maximum, and minimum of each screen as predictors in the machine learning algorithm. This approach tackles the issue that competitive bids in incomplete cartels distort the statistical signals produced by bid rigging. We demonstrate that our algorithm outperforms previously suggested methods in applications to incomplete cartels based on empirical data from Switzerland.
Wallimann, Hannes
Imhof, David
Huber, Martin
Bid rigging detection; screening methods; descriptive statistics; machine learning; random forest; lasso; ensemble methods
2020-04-01
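The subgroup idea in the entry above (compute a screen on every subgroup of three or four bids, then summarize it by mean, median, maximum and minimum) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the coefficient of variation is used as an example screen, subgroups of size three and four are pooled before summarizing, and the function names and the sample tender are hypothetical.

```python
from itertools import combinations
from statistics import mean, median, pstdev


def coefficient_of_variation(bids):
    """Example screen (an assumption here): standard deviation over mean."""
    return pstdev(bids) / mean(bids)


def subgroup_screen_summary(bids, sizes=(3, 4), screen=coefficient_of_variation):
    """Evaluate `screen` on every subgroup of the given sizes and summarize.

    Pooling the subgroup sizes into one list is a choice of this sketch.
    """
    values = [
        screen(group)
        for k in sizes
        if k <= len(bids)
        for group in combinations(bids, k)
    ]
    if not values:
        raise ValueError("tender has fewer bids than the smallest subgroup size")
    return {
        "mean": mean(values),
        "median": median(values),
        "max": max(values),
        "min": min(values),
    }


# Hypothetical tender: three close bids plus two outlying ones.
tender = [100.0, 101.0, 102.0, 130.0, 95.0]
summary = subgroup_screen_summary(tender)
```

The four entries of `summary` would then serve as predictors for a classifier. The minimum is the informative one in this toy tender: it picks out the tightly clustered subgroup even though the other bids inflate the tender-wide dispersion.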
The Influence of Hidden Researcher Decisions in Applied Microeconomics
http://d.repec.org/n?u=RePEc:zbw:glodps:537&r=ecm
Researchers make hundreds of decisions about data collection, preparation, and analysis in their research. We use a many-analysts approach to measure the extent and impact of these decisions. Two published causal empirical results are replicated by seven replicators each. We find large differences in data preparation and analysis decisions, many of which would not likely be reported in a publication. No two replicators reported the same sample size. Statistical significance varied across replications, and for one of the studies the effect’s sign varied as well. The standard deviation of estimates across replications was 3-4 times the typical reported standard error.
Huntington-Klein, Nick
Arenas, Andreu
Beam, Emily
Bertoni, Marco
Bloem, Jeffrey R.
Burli, Pralhad
Chen, Naibin
Greico, Paul
Ekpe, Godwin
Pugatch, Todd
Saavedra, Martin
Stopnitzky, Yaniv
Replication,Metascience,Research
2020
Bounds on direct and indirect effects under treatment/mediator endogeneity and outcome attrition
http://d.repec.org/n?u=RePEc:fri:fribow:fribow00514&r=ecm
Causal mediation analysis aims at disentangling a treatment effect into an indirect mechanism operating through an intermediate outcome or mediator, as well as the direct effect of the treatment on the outcome of interest. However, the evaluation of direct and indirect effects is frequently complicated by non-ignorable selection into the treatment and/or mediator, even after controlling for observables, as well as sample selection/outcome attrition. We propose a method for bounding direct and indirect effects in the presence of such complications that is based on a sequence of linear programming problems. Considering inverse probability weighting by propensity scores, we compute the weights that would yield identification in the absence of complications and perturb them by an entropy parameter reflecting a specific amount of propensity score misspecification to set-identify the effects of interest. We apply our method to data from the National Longitudinal Survey of Youth 1979 to derive bounds on the explained and unexplained components of a gender wage gap decomposition that is likely prone to non-ignorable mediator selection and outcome attrition.
Huber, Martin
Laffers, Lukáš
Causal mechanisms; direct effects; indirect effects; causal channels; mediation analysis; sample selection; bounds
2020-05-01
Social Networks with Misclassified or Unobserved Links
http://d.repec.org/n?u=RePEc:boc:bocoec:1004&r=ecm
We identify and estimate social network models when network links are either misclassified or unobserved in the data. First, we derive conditions under which some misclassification of links does not interfere with the consistency or asymptotic properties of standard instrumental variable estimators of social effects. Second, we construct a consistent estimator of social effects in a model where network links are not observed in the data at all. Our method does not require repeated observations of individual network members. We apply our estimator to data from Tennessee's Student/Teacher Achievement Ratio (STAR) Project. Without observing the latent network in each classroom, we identify and estimate peer and contextual effects on students' performance in mathematics. We found that peer effects tend to be larger in bigger classes, and that increasing peer effects would significantly improve students' average test scores.
Arthur Lewbel
Xi Qu
Xun Tang
Social networks, Peer effects, Misclassified links, Missing links, Mismeasured network, Unobserved network, Classroom performance
2019-07-30
Tactics for design and inference in synthetic control studies: An applied example using high-dimensional data
http://d.repec.org/n?u=RePEc:osf:socarx:fc9xt&r=ecm
We describe identification assumptions underlying synthetic control studies and offer recommendations for key---and normally ad hoc---implementation decisions, focusing on model selection; model fit; cross-validation; and decision rules for inference. We outline how to implement a Synthetic Control Using Lasso (SCUL). The method---available as an R package---allows for a high-dimensional donor pool; automates model selection; includes donors from a wide range of variable types; and permits both extrapolation and negative weights. In an application, we employ our recommendations and the SCUL strategy to estimate how recreational marijuana legalization affects sales of alcohol and over-the-counter painkillers, finding reductions in alcohol sales.
Hollingsworth, Alex
Wing, Coady
2020-05-03