nep-ecm 2020-07-20 papers

on Econometrics

Issue of 2020‒07‒20
35 papers chosen by
Sune Karlsson
Örebro universitet

Semiparametric Discrete Choice Models for Bundles By Fu Ouyang; Thomas Tao Yang
Hypothesis tests with a repeatedly singular information matrix By Amengual, Dante; Bei, Xinyue; Sentana, Enrique
Minimax Estimation of Conditional Moment Models By Nishanth Dikkala; Greg Lewis; Lester Mackey; Vasilis Syrgkanis
Endogenous Treatment Effect Estimation with some Invalid and Irrelevant Instruments By Qingliang Fan; Yaqian Wu
Estimation of Covid-19 Prevalence from Serology Tests: A Partial Identification Approach By Panos Toulis
On the Exact Statistical Distribution of Econometric Estimators and Test Statistics By Yong Bao; Xiaotian Liu; Aman Ullah
Estimation of High-Dimensional Dynamic Conditional Precision Matrices with an Application to Forecast Combination By Tae-Hwy Lee; Millie Yi Mao; Aman Ullah
Unified Principal Component Analysis for Sparse and Dense Functional Data under Spatial Dependency By Haozhe Zhang; Yehua Li
Approximate Maximum Likelihood for Complex Structural Models By Veronika Czellar; David T. Frazier; Eric Renault
Seemingly Unrelated Regression with Measurement Error: Estimation via Markov chain Monte Carlo and Mean Field Variational Bayes Approximation By Georges Bresson; Anoop Chaturvedi; Mohammad Arshad Rahman; Shalabh
Periodic autoregressive conditional duration By Aknouche, Abdelhakim; Almohaimeed, Bader; Dimitrakopoulos, Stefanos
Valid Causal Inference with (Some) Invalid Instruments By Jason Hartford; Victor Veitch; Dhanya Sridhar; Kevin Leyton-Brown
Sparse Quantile Regression By Le-Yu Chen; Sokbae Lee
Treatment Effects in Interactive Fixed Effects Models By Brantly Callaway; Sonia Karami
Detangling robustness in high dimensions: composite versus model-averaged estimation By Jing Zhou; Gerda Claeskens; Jelena Bradic
Heteroskedastic Proxy Vector Autoregressions By Helmut Lütkepohl; Thore Schlaak
Improved Average Estimation in Seemingly Unrelated Regressions By Ali Mehrabani; Aman Ullah
Synthetic Interventions By Anish Agarwal; Abdullah Alomar; Romain Cosson; Devavrat Shah; Dennis Shen
Sketching for Two-Stage Least Squares Estimation By Sokbae Lee; Serena Ng
Robust and Efficient Approximate Bayesian Computation: A Minimum Distance Approach By David T. Frazier
Hidden Markov Models Applied To Intraday Momentum Trading With Side Information By Hugh Christensen; Simon Godsill; Richard Turner
Quantitative Statistical Robustness for Tail-Dependent Law Invariant Risk Measures By Wei Wang; Huifu Xu; Tiejun Ma
A Semiparametric Network Formation Model with Unobserved Linear Heterogeneity By Candelaria, Luis E.
Horseshoe Prior Bayesian Quantile Regression By David Kohns; Tibor Szendrei
Cointegration in large VARs By Anna Bykhovskaya; Vadim Gorin
Identification of intertemporal preferences in history-dependent dynamic discrete choice models By Levy, Matthew; Schiraldi, Pasquale
Combining Experimental and Observational Data to Estimate Treatment Effects on Long Term Outcomes By Susan Athey; Raj Chetty; Guido Imbens
Flexible Mixture Priors for Time-varying Parameter Models By Niko Hauzenberger
Time series copula models using d-vines and v-transforms: an alternative to GARCH modelling By Martin Bladt; Alexander J. McNeil
Does the choice of balance-measure matter under Genetic Matching? By Adeola Oyenubi; Martin Wittenberg
Identification and Formal Privacy Guarantees By Tatiana Komarova; Denis Nekipelov
Biases in Long-Horizon Predictive Regressions By Jacob Boudoukh; Ronen Israel; Matthew P. Richardson
Bridging the COVID-19 Data and the Epidemiological Model using Time Varying Parameter SIRD Model By Cem Cakmakli; Yasin Simsek
Unified Theory for the Large Family of Time Varying Models with Arma Representations: One Solution Fits All. By Karanasos, Menelaos; Paraskevopoulos, Alexandros; Canepa, Alessandra
Proper scoring rules for evaluating asymmetry in density forecasting By Matteo Iacopini; Francesco Ravazzolo; Luca Rossini

Semiparametric Discrete Choice Models for Bundles

By:	Fu Ouyang (School of Economics, University of Queensland); Thomas Tao Yang (Australian National University)
Abstract:	We propose new identification and estimation approaches to semiparametric discrete choice models for bundles in both cross-sectional and panel data settings. The random utility functions of these models take the usual parametric form, while no distributional assumption is imposed on the stochastic disturbances. Our proposed methods permit certain forms of heteroskedasticity and arbitrary correlation in the disturbances across choices. Our identification approach is matching-based; it matches observed covariates across agents for the cross-sectional case, and over time for the panel data case. For the cross-sectional model, we propose a kernel-weighted rank procedure and establish N-asymptotic normality of the resulting estimators. We show the validity of the nonparametric bootstrap for the inference. For the panel data model, we propose localized maximum score type estimators which have a non-standard asymptotic distribution. We show that the numerical bootstrap developed by Hong and Li (2020) is a valid inference method for our panel data estimators. Monte Carlo experiments demonstrate that our proposed estimation and inference procedures perform adequately in finite samples.
Keywords:	Bundle choices; rank estimation; panel data; bootstrap.
JEL:	C13 C14 C35
Date:	2020–06–12
URL:	http://d.repec.org/n?u=RePEc:qld:uq2004:625&r=all

Hypothesis tests with a repeatedly singular information matrix

By:	Amengual, Dante; Bei, Xinyue; Sentana, Enrique
Abstract:	We study score-type tests in likelihood contexts in which the nullity of the information matrix under the null is larger than one, thereby generalizing earlier results in the literature. Examples include multivariate skew normal distributions, Hermite expansions of Gaussian copulas, purely non-linear predictive regressions, multiplicative seasonal time series models and multivariate regression models with selectivity. Our proposal, which involves higher order derivatives, is asymptotically equivalent to the likelihood ratio but only requires estimation under the null. We conduct extensive Monte Carlo exercises that study the finite sample size and power properties of our proposal and compare it to alternative approaches.
Keywords:	Generalized extremum tests; Higher-order identifiability; Likelihood ratio test; Non-Gaussian copulas; Predictive regressions; Skew normal distributions
JEL:	C12 C22 C34 C46 C58
Date:	2020–02
URL:	http://d.repec.org/n?u=RePEc:cpr:ceprdp:14415&r=all

Minimax Estimation of Conditional Moment Models

By:	Nishanth Dikkala; Greg Lewis; Lester Mackey; Vasilis Syrgkanis
Abstract:	We develop an approach for estimating models described via conditional moment restrictions, with a prototypical application being non-parametric instrumental variable regression. We introduce a min-max criterion function, under which the estimation problem can be thought of as solving a zero-sum game between a modeler who is optimizing over the hypothesis space of the target model and an adversary who identifies violating moments over a test function space. We analyze the statistical estimation rate of the resulting estimator for arbitrary hypothesis spaces, with respect to an appropriate analogue of the mean squared error metric, for ill-posed inverse problems. We show that when the minimax criterion is regularized with a second moment penalty on the test function and the test function space is sufficiently rich, then the estimation rate scales with the critical radius of the hypothesis and test function spaces, a quantity which typically gives tight fast rates. Our main result follows from a novel localized Rademacher analysis of statistical learning problems defined via minimax objectives. We provide applications of our main results for several hypothesis spaces used in practice such as: reproducing kernel Hilbert spaces, high dimensional sparse linear functions, spaces defined via shape constraints, ensemble estimators such as random forests, and neural networks. For each of these applications we provide computationally efficient optimization methods for solving the corresponding minimax problem (e.g. stochastic first-order heuristics for neural networks). In several applications, we show how our modified mean squared error rate, combined with conditions that bound the ill-posedness of the inverse problem, leads to mean squared error rates. We conclude with an extensive experimental analysis of the proposed methods.
Date:	2020–06
URL:	http://d.repec.org/n?u=RePEc:arx:papers:2006.07201&r=all

Endogenous Treatment Effect Estimation with some Invalid and Irrelevant Instruments

By:	Qingliang Fan; Yaqian Wu
Abstract:	Instrumental variables (IV) regression is a popular method for the estimation of endogenous treatment effects. Conventional IV methods require that all the instruments be relevant and valid. However, this is impractical, especially in high-dimensional models when we consider a large set of candidate IVs. In this paper, we propose an IV estimator robust to the existence of both invalid and irrelevant instruments (called R2IVE) for the estimation of endogenous treatment effects. This paper extends the scope of Kang et al. (2016) by considering a true high-dimensional IV model and a nonparametric reduced form equation. It is shown that our procedure can select the relevant and valid instruments consistently and the proposed R2IVE is root-n consistent and asymptotically normal. Monte Carlo simulations demonstrate that the R2IVE performs favorably compared to the existing high-dimensional IV estimators (such as NAIVE (Fan and Zhong, 2018) and sisVIVE (Kang et al., 2016)) when invalid instruments exist. In the empirical study, we revisit the classic question of trade and growth (Frankel and Romer, 1999).
Date:	2020–06
URL:	http://d.repec.org/n?u=RePEc:arx:papers:2006.14998&r=all

Estimation of Covid-19 Prevalence from Serology Tests: A Partial Identification Approach

By:	Panos Toulis
Abstract:	We propose a partial identification method for estimating disease prevalence from serology studies. Our data are results from antibody tests in some population sample, where the test parameters, such as the true/false positive rates, are unknown. Our method scans the entire parameter space, and rejects parameter values using the joint data density as the test statistic. The proposed method is conservative for marginal inference, in general, but its key advantage over more standard approaches is that it is valid in finite samples even when the underlying model is not point identified. Moreover, our method requires only independence of serology test results, and does not rely on asymptotic arguments, normality assumptions, or other approximations. We use recent Covid-19 serology studies in the US, and show that the parameter confidence set is generally wide, and cannot support definite conclusions. Specifically, recent serology studies from California suggest a prevalence anywhere in the range 0%-2% (at the time of study), and are therefore inconclusive. However, this range could be narrowed down to 0.7%-1.5% if the actual false positive rate of the antibody test was indeed near its empirical estimate (~0.5%). In another study from New York state, Covid-19 prevalence is confidently estimated in the range 13%-17% in mid-April of 2020, which also suggests significant geographic variation in Covid-19 exposure across the US. Combining all datasets yields a 5%-8% prevalence range. Our results overall suggest that serology testing on a massive scale can give crucial information for future policy design, even when such tests are imperfect and their parameters unknown.
Date:	2020–06
URL:	http://d.repec.org/n?u=RePEc:arx:papers:2006.16214&r=all
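
Editorial illustration (not taken from the paper): the identification problem described in the abstract can be seen from the standard accounting identity for an imperfect binary test. The symbols $q$ (probability of a positive test result), $\pi$ (prevalence), $\mathrm{TPR}$ and $\mathrm{FPR}$ (true and false positive rates) are notation assumed here for the sketch.

```latex
\begin{align*}
q &= \pi \,\mathrm{TPR} + (1-\pi)\,\mathrm{FPR}, \\
\pi &= \frac{q - \mathrm{FPR}}{\mathrm{TPR} - \mathrm{FPR}}
      \qquad (\mathrm{TPR} \neq \mathrm{FPR}).
\end{align*}
```

The population survey informs only $q$, so different combinations of $(\pi, \mathrm{TPR}, \mathrm{FPR})$ can produce the same positive rate. When the test parameters are themselves only imprecisely estimated, prevalence is therefore better described by a set of compatible values than by a single point, which is consistent with the wide ranges reported above when the positive rate is close to the false positive rate.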

On the Exact Statistical Distribution of Econometric Estimators and Test Statistics

By:	Yong Bao (Purdue University); Xiaotian Liu (Purdue University); Aman Ullah (Department of Economics, University of California Riverside)
Abstract:	Barry Arnold has made many fundamental and innovative contributions in different areas of statistics and econometrics, including estimation and inference, distribution theory, Bayesian inference, order statistics, income inequality measures, and characterization problems. His extensive work in the area of distribution theory includes studies on income distributions and Lorenz curves, the exact sampling distribution theory of test statistics, and the characterization of distributions. In our paper here we consider the problem of developing exact sampling distributions of various econometric and statistical estimators and test statistics. The motivation stems from the fact that inference procedures based on the asymptotic distributions may provide misleading results if the sample size is small or moderately large. In view of this we develop a unified procedure by first observing that a large number of econometric and statistical estimators can be written as ratios of quadratic forms. Their distributions can then be straightforwardly analyzed by using Imhof's (1961) method. We show the applications of this procedure to develop the distribution of some commonly used statistics in applied work. The exact results developed will be helpful for practitioners to conduct appropriate inference for any given size of the sample data.
Keywords:	Exact Distribution, Sharpe Ratio, Coefficient of Variation, Durbin-Watson Test, Moran Test, Imhof Distribution, R-square
Date:	2020–01
URL:	http://d.repec.org/n?u=RePEc:ucr:wpaper:202014&r=all
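
Editorial illustration (a sketch under stated assumptions, not reproduced from the paper): the step from a ratio of quadratic forms to Imhof's method can be written out for the simplest case. Assume $y \sim N(0, I_n)$, a symmetric matrix $A$, and a symmetric positive definite matrix $B$; the symbols $A$, $B$, $c$, $\lambda_j$, $\theta$ and $\rho$ are notation chosen here.

```latex
\begin{align*}
\Pr\!\left( \frac{y'Ay}{y'By} \le c \right)
  &= \Pr\!\left( y'(A - cB)\,y \le 0 \right)
   = \Pr\!\left( \sum_{j=1}^{n} \lambda_j z_j^2 \le 0 \right),
   \qquad z_j \overset{iid}{\sim} N(0,1), \\
\Pr\!\left( \sum_{j=1}^{n} \lambda_j z_j^2 > x \right)
  &= \frac{1}{2} + \frac{1}{\pi} \int_0^{\infty}
     \frac{\sin \theta(u)}{u\,\rho(u)} \, du, \\
\theta(u) &= \frac{1}{2} \sum_{j=1}^{n} \arctan(\lambda_j u) - \frac{1}{2}\, x u,
\qquad
\rho(u) = \prod_{j=1}^{n} \left( 1 + \lambda_j^2 u^2 \right)^{1/4}.
\end{align*}
```

Here $\lambda_1, \dots, \lambda_n$ are the eigenvalues of $A - cB$, and the second line is Imhof's (1961) inversion formula for a weighted sum of independent central chi-square variables with one degree of freedom each. Evaluating it at $x = 0$ for each $c$ traces out the exact distribution function of the ratio by one-dimensional numerical integration.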

Estimation of High-Dimensional Dynamic Conditional Precision Matrices with an Application to Forecast Combination

By:	Tae-Hwy Lee (Department of Economics, University of California Riverside); Millie Yi Mao (Azusa Pacific University); Aman Ullah (University of California, Riverside)
Abstract:	The estimation of a large covariance matrix is challenging when the dimension p is large relative to the sample size n. Common approaches to deal with the challenge have been based on thresholding or shrinkage methods in estimating covariance matrices. However, in many applications (e.g., regression, forecast combination, portfolio selection), what we need is not the covariance matrix but its inverse (the precision matrix). In this paper we introduce a method of estimating the high-dimensional "dynamic conditional precision" (DCP) matrices. The proposed DCP algorithm is based on the estimator of a large unconditional precision matrix by Fan and Lv (2016) to deal with the high-dimension and the dynamic conditional correlation (DCC) model by Engle (2002) to embed a dynamic structure to the conditional precision matrix. The simulation results show that the DCP method performs substantially better than the methods of estimating covariance matrices based on thresholding or shrinkage methods. Finally, inspired by Hsiao and Wan (2014), we examine the "forecast combination puzzle" using the DCP, thresholding, and shrinkage methods.
Keywords:	High-dimensional conditional precision matrix, ISEE, DCP, Forecast combination puzzle.
JEL:	C3 C4 C5
Date:	2020–06
URL:	http://d.repec.org/n?u=RePEc:ucr:wpaper:202012&r=all

Unified Principal Component Analysis for Sparse and Dense Functional Data under Spatial Dependency

By:	Haozhe Zhang; Yehua Li
Abstract:	We consider spatially dependent functional data collected under a geostatistics setting, where spatial locations are irregular and random. The functional response is the sum of a spatially dependent functional effect and a spatially independent functional nugget effect. Observations on each function are made on discrete time points and contaminated with measurement errors. Under the assumption of spatial stationarity and isotropy, we propose a tensor product spline estimator for the spatio-temporal covariance function. When a coregionalization covariance structure is further assumed, we propose a new functional principal component analysis method that borrows information from neighboring functions. The proposed method also generates nonparametric estimators for the spatial covariance functions, which can be used for functional kriging. Under a unified framework for sparse and dense functional data, infill and increasing domain asymptotic paradigms, we develop the asymptotic convergence rates for the proposed estimators. Advantages of the proposed approach are demonstrated through simulation studies and two real data applications representing sparse and dense functional data, respectively.
Date:	2020–06
URL:	http://d.repec.org/n?u=RePEc:arx:papers:2006.13489&r=all

Approximate Maximum Likelihood for Complex Structural Models

By:	Veronika Czellar; David T. Frazier; Eric Renault
Abstract:	Indirect Inference (I-I) is a popular technique for estimating complex parametric models whose likelihood function is intractable; however, the statistical efficiency of I-I estimation is questionable. While the efficient method of moments, Gallant and Tauchen (1996), promises efficiency, the price to pay for this efficiency is a loss of parsimony and thereby a potential lack of robustness to model misspecification. This stands in contrast to simpler I-I estimation strategies, which are known to display less sensitivity to model misspecification precisely due to their focus on specific elements of the underlying structural model. In this research, we propose a new simulation-based approach that maintains the parsimony of I-I estimation, which is often critical in empirical applications, but can also deliver estimators that are nearly as efficient as maximum likelihood. This new approach is based on using a constrained approximation to the structural model, which ensures identification and can deliver estimators that are nearly efficient. We demonstrate this approach through several examples, and show that this approach can deliver estimators that are nearly as efficient as maximum likelihood, when feasible, but can be employed in many situations where maximum likelihood is infeasible.
Date:	2020–06
URL:	http://d.repec.org/n?u=RePEc:arx:papers:2006.10245&r=all

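Seemingly Unrelated Regression with Measurement Error: Estimation via Markov chain Monte Carlo and Mean Field Variational Bayes Approximation

By:	Georges Bresson; Anoop Chaturvedi; Mohammad Arshad Rahman; Shalabh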
Abstract:	Linear regression with measurement error in the covariates is a heavily studied topic; however, the statistics/econometrics literature is almost silent on estimating a multi-equation model with measurement error. This paper considers a seemingly unrelated regression model with measurement error in the covariates and introduces two novel estimation methods: a pure Bayesian algorithm (based on Markov chain Monte Carlo techniques) and its mean field variational Bayes (MFVB) approximation. The MFVB method has the added advantage of being computationally fast and can handle big data. An issue pertinent to measurement error models is parameter identification, and this is resolved by employing a prior distribution on the measurement error variance. The methods are shown to perform well in multiple simulation studies, where we analyze the impact on posterior estimates of different values of the reliability ratio or the variance of the true unobserved quantity used in the data generating process. The paper further implements the proposed algorithms in an application drawn from the health literature and shows that modeling measurement error in the data can improve model fitting.
Date:	2020–06
URL:	http://d.repec.org/n?u=RePEc:arx:papers:2006.07074&r=all

Periodic autoregressive conditional duration

By:	Aknouche, Abdelhakim; Almohaimeed, Bader; Dimitrakopoulos, Stefanos
Abstract:	We propose an autoregressive conditional duration (ACD) model with periodic time-varying parameters and multiplicative error form. We name this model periodic autoregressive conditional duration (PACD). First, we study its stability properties and moment structures. Second, we estimate the model parameters, using (profile and two-stage) Gamma quasi-maximum likelihood estimates (QMLEs), the asymptotic properties of which are examined under general regularity conditions. Our estimation method encompasses the exponential QMLE as a particular case. The proposed methodology is illustrated with simulated data and two empirical applications on forecasting Bitcoin trading volume and realized volatility. We found that the PACD produces better in-sample and out-of-sample forecasts than the standard ACD.
Keywords:	Positive time series, autoregressive conditional duration, periodic time-varying models, multiplicative error models, exponential QMLE, two-stage Gamma QMLE.
JEL:	C13 C18 C4 C41 C5 C51 C58
Date:	2020–07–08
URL:	http://d.repec.org/n?u=RePEc:pra:mprapa:101696&r=all

Valid Causal Inference with (Some) Invalid Instruments

By:	Jason Hartford; Victor Veitch; Dhanya Sridhar; Kevin Leyton-Brown
Abstract:	Instrumental variable methods provide a powerful approach to estimating causal effects in the presence of unobserved confounding. But a key challenge when applying them is the reliance on untestable "exclusion" assumptions that rule out any relationship between the instrument variable and the response that is not mediated by the treatment. In this paper, we show how to perform consistent IV estimation despite violations of the exclusion assumption. In particular, we show that when one has multiple candidate instruments, only a majority of these candidates---or, more generally, the modal candidate-response relationship---needs to be valid to estimate the causal effect. Our approach uses an estimate of the modal prediction from an ensemble of instrumental variable estimators. The technique is simple to apply and is "black-box" in the sense that it may be used with any instrumental variable estimator as long as the treatment effect is identified for each valid instrument independently. As such, it is compatible with recent machine-learning based estimators that allow for the estimation of conditional average treatment effects (CATE) on complex, high dimensional data. Experimentally, we achieve accurate estimates of conditional average treatment effects using an ensemble of deep network-based estimators, including on a challenging simulated Mendelian Randomization problem.
Date:	2020–06
URL:	http://d.repec.org/n?u=RePEc:arx:papers:2006.11386&r=all

Sparse Quantile Regression

By:	Le-Yu Chen; Sokbae Lee
Abstract:	We consider both $\ell _{0}$-penalized and $\ell _{0}$-constrained quantile regression estimators. For the $\ell _{0}$-penalized estimator, we derive an exponential inequality on the tail probability of excess quantile prediction risk and apply it to obtain non-asymptotic upper bounds on the mean-square parameter and regression function estimation errors. We also derive analogous results for the $\ell _{0}$-constrained estimator. The resulting rates of convergence are minimax-optimal and the same as those for $\ell _{1}$-penalized estimators. Further, we characterize expected Hamming loss for the $\ell _{0}$-penalized estimator. We implement the proposed procedure via mixed integer linear programming and also a more scalable first-order approximation algorithm. We illustrate the finite-sample performance of our approach in Monte Carlo experiments and its usefulness in a real data application concerning conformal prediction of infant birth weights (with $n\approx 10^{3}$ and up to $p>10^{3}$). In sum, our $\ell _{0}$-based method produces a much sparser estimator than the $\ell _{1}$-penalized approach without compromising precision.
Date:	2020–06
URL:	http://d.repec.org/n?u=RePEc:arx:papers:2006.11201&r=all

Treatment Effects in Interactive Fixed Effects Models

By:	Brantly Callaway; Sonia Karami
Abstract:	This paper considers identifying and estimating the Average Treatment Effect on the Treated (ATT) in interactive fixed effects models. We focus on the case where there is a single unobserved time-invariant variable whose effect is allowed to change over time, though we also allow for time fixed effects and unobserved individual-level heterogeneity. The models that we consider in this paper generalize many commonly used models in the treatment effects literature including difference in differences and individual-specific linear trend models. Unlike the majority of the literature on interactive fixed effects models, we do not require the number of time periods to go to infinity to consistently estimate the ATT. Our main identification result relies on having the effect of some time invariant covariate (e.g., race or sex) not vary over time. Using our approach, we show that the ATT can be identified with as few as three time periods and with panel or repeated cross sections data.
Date:	2020–06
URL:	http://d.repec.org/n?u=RePEc:arx:papers:2006.15780&r=all

Detangling robustness in high dimensions: composite versus model-averaged estimation

By:	Jing Zhou; Gerda Claeskens; Jelena Bradic
Abstract:	Robust methods, though ubiquitous in practice, are yet to be fully understood in the context of regularized estimation and high dimensions. Even simple questions become challenging very quickly. For example, classical statistical theory identifies equivalence between model-averaged and composite quantile estimation. However, little to nothing is known about such equivalence between methods that encourage sparsity. This paper provides a toolbox to further study robustness in these settings and focuses on prediction. In particular, we study optimally weighted model-averaged as well as composite $l_1$-regularized estimation. Optimal weights are determined by minimizing the asymptotic mean squared error. This approach incorporates the effects of regularization, without the assumption of perfect selection, as is often used in practice. Such weights are then optimal for prediction quality. Through an extensive simulation study, we show that no single method systematically outperforms others. We find, however, that model-averaged and composite quantile estimators often outperform least-squares methods, even in the case of Gaussian model noise. Real data application witnesses the method's practical use through the reconstruction of compressed audio signals.
Date:	2020–06
URL:	http://d.repec.org/n?u=RePEc:arx:papers:2006.07457&r=all

Heteroskedastic Proxy Vector Autoregressions

By:	Helmut Lütkepohl; Thore Schlaak
Abstract:	In proxy vector autoregressive models, the structural shocks of interest are identified by an instrument. Although heteroskedasticity is occasionally allowed for, it is typically taken for granted that the impact effects of the structural shocks are time-invariant despite the change in their variances. We develop a test for this implicit assumption and present evidence that the assumption of time-invariant impact effects may be violated in previously used empirical models.
Keywords:	Structural vector autoregression, proxy VAR, identification through heteroskedasticity
JEL:	C32
Date:	2020
URL:	http://d.repec.org/n?u=RePEc:diw:diwwpp:dp1876&r=all

Improved Average Estimation in Seemingly Unrelated Regressions

By:	Ali Mehrabani (UCR); Aman Ullah (Department of Economics, University of California Riverside)
Abstract:	In this paper, we propose an efficient weighted average estimator in Seemingly Unrelated Regressions. This average estimator shrinks a generalized least squares (GLS) estimator towards a restricted GLS estimator, where the restrictions represent possible parameter homogeneity specifications. The shrinkage weight is inversely proportional to a weighted quadratic loss function. The approximate bias and second moment matrix of the average estimator using the large-sample approximations are provided. We give the conditions under which the average estimator dominates the GLS estimator on the basis of their mean squared errors. We illustrate our estimator by applying it to a cost system for U.S. commercial banks over the period from 2000 to 2018. Our results indicate that on average most of the banks have been operating under increasing returns to scale. We find that over the recent years, scale economies are a plausible reason for the growth in average size of banks and the tendency toward increasing scale is likely to continue.
Keywords:	Stein-type Shrinkage Estimator; Asymptotic Approximations; SUR; GLS
Date:	2020–01
URL:	http://d.repec.org/n?u=RePEc:ucr:wpaper:202013&r=all

Synthetic Interventions

By:	Anish Agarwal; Abdullah Alomar; Romain Cosson; Devavrat Shah; Dennis Shen
Abstract:	We develop a method to help quantify the impact different levels of mobility restrictions could have had on COVID-19 related deaths across nations. Synthetic control (SC) has emerged as a standard tool in such scenarios to produce counterfactual estimates if a particular intervention had not occurred, using just observational data. However, how to extend SC to obtain counterfactual estimates if a particular intervention had occurred remains an important open problem - this is exactly the question of the impact of mobility restrictions stated above. As our main contribution, we introduce synthetic interventions (SI), which helps resolve this open problem by allowing one to produce counterfactual estimates if there are multiple interventions of interest. We prove SI produces consistent counterfactual estimates under a tensor factor model. Our finite sample analysis shows the test error decays as $1/T_0$, where $T_0$ is the amount of observed pre-intervention data. As a special case, this improves upon the $1/\sqrt{T_0}$ bound on test error for SC in prior works. Our test error bound holds under a certain "subspace inclusion" condition; we furnish a data-driven hypothesis test with provable guarantees to check for this condition. This also provides a quantitative hypothesis test for when to use SC, currently absent in the literature. Technically, we establish that the parameter estimation and test error for Principal Component Regression (a key subroutine in SI and several SC variants) under the setting of error-in-variable regression decay as $1/T_0$, where $T_0$ is the number of samples observed; this improves the best prior test error bound of $1/\sqrt{T_0}$. In addition to the COVID-19 case study, we show how SI can be used to run data-efficient, personalized randomized control trials using real data from a large e-commerce website and a large developmental economics study.
Date:	2020–06
URL:	http://d.repec.org/n?u=RePEc:arx:papers:2006.07691&r=all

Sketching for Two-Stage Least Squares Estimation

By:	Sokbae Lee; Serena Ng
Abstract:	When there is so much data that it becomes a computational burden, it is not uncommon to compute quantities of interest using a sketch of data of size $m$ instead of the full sample of size $n$. This paper investigates the implications for two-stage least squares (2SLS) estimation when the sketches are obtained by a computationally efficient method known as CountSketch. We obtain three results. First, we establish conditions under which, given the full sample, a sketched 2SLS estimate can be arbitrarily close to the full-sample 2SLS estimate with high probability. Second, we give conditions under which the sketched 2SLS estimator converges in probability to the true parameter at a rate of $m^{-1/2}$ and is asymptotically normal. Third, we show that the asymptotic variance can be consistently estimated using the sketched sample and suggest methods for determining an inference-conscious sketch size $m$. The sketched 2SLS estimator is used to estimate returns to education.
Date:	2020–07
URL:	http://d.repec.org/n?u=RePEc:arx:papers:2007.07781&r=all
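
Editorial illustration (a minimal sketch, not the authors' code): the idea of running 2SLS on a CountSketch of the data can be shown on simulated data. The function names `countsketch`, `tsls` and `sketched_tsls`, the simulated design, and the sizes `n` and `m` are all assumptions made for this example. CountSketch here means that each of the $n$ rows is assigned to one of $m$ buckets at random, multiplied by a random sign, and summed within its bucket.

```python
import numpy as np


def countsketch(A, m, rng):
    """Compress the n rows of A into m rows: random bucket plus random sign per row."""
    n = A.shape[0]
    buckets = rng.integers(0, m, size=n)        # bucket index for each row
    signs = rng.choice([-1.0, 1.0], size=n)     # random sign for each row
    SA = np.zeros((m, A.shape[1]))
    np.add.at(SA, buckets, signs[:, None] * A)  # sum signed rows within each bucket
    return SA


def tsls(y, X, Z):
    """Textbook 2SLS: project X on Z, then regress y on the fitted values."""
    X_hat = Z @ np.linalg.lstsq(Z, X, rcond=None)[0]
    return np.linalg.lstsq(X_hat, y, rcond=None)[0]


def sketched_tsls(y, X, Z, m, rng):
    """2SLS on a CountSketch of (y, X, Z); one sketch is shared by all columns."""
    k_x = X.shape[1]
    S = countsketch(np.column_stack([y, X, Z]), m, rng)
    return tsls(S[:, 0], S[:, 1:1 + k_x], S[:, 1 + k_x:])


# Simulated example (hypothetical design): one endogenous regressor, one instrument.
rng = np.random.default_rng(0)
n, m = 20_000, 2_000
z = rng.normal(size=(n, 1))                       # instrument
u = rng.normal(size=n)                            # source of endogeneity
x = z[:, 0] + u                                   # endogenous regressor
y = 1.5 * x + 0.8 * u + rng.normal(size=n)        # true coefficient is 1.5

beta_full = tsls(y, x[:, None], z)
beta_sketch = sketched_tsls(y, x[:, None], z, m, rng)
print("full-sample 2SLS:", beta_full[0])
print("sketched 2SLS:   ", beta_sketch[0])
```

Stacking `y`, `X` and `Z` before sketching matters: every variable must share the same bucket and sign for a given row, otherwise the cross-products that 2SLS relies on are destroyed. The sketched estimate is noisier than the full-sample one because it effectively uses `m` rather than `n` observations, in line with the $m^{-1/2}$ rate stated in the abstract.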

Robust and Efficient Approximate Bayesian Computation: A Minimum Distance Approach

By:	David T. Frazier
Abstract:	In many instances, the application of approximate Bayesian methods is hampered by two practical features: 1) the requirement to project the data down to a low-dimensional summary, including the choice of this projection, which ultimately yields inefficient inference; 2) a possible lack of robustness to deviations from the underlying model structure. Motivated by these efficiency and robustness concerns, we construct a new Bayesian method that can deliver efficient estimators when the underlying model is well-specified, and which is simultaneously robust to certain forms of model misspecification. This new approach bypasses the calculation of summaries by considering a norm between empirical and simulated probability measures. For specific choices of the norm, we demonstrate that this approach can deliver point estimators that are as efficient as those obtained using exact Bayesian inference, while also simultaneously displaying robustness to deviations from the underlying model assumptions.
Date:	2020–06
URL:	http://d.repec.org/n?u=RePEc:arx:papers:2006.14126&r=all

Hidden Markov Models Applied To Intraday Momentum Trading With Side Information

By:	Hugh Christensen; Simon Godsill; Richard Turner
Abstract:	A Hidden Markov Model for intraday momentum trading is presented which specifies a latent momentum state responsible for generating the observed securities' noisy returns. Existing momentum trading models suffer from time-lagging caused by the delayed frequency response of digital filters. Time-lagging results in a momentum signal of the wrong sign when the market changes trend direction. A key feature of this state space formulation is that no such lagging occurs, allowing for accurate shifts in signal sign at market change points. The number of latent states in the model is estimated using three techniques: cross validation, penalized likelihood criteria and simulation-based model selection for the marginal likelihood. All three techniques suggest either 2 or 3 hidden states. Model parameters are then found using Baum-Welch and Markov Chain Monte Carlo, whilst assuming a single (discretized) univariate Gaussian distribution for the emission matrix. Often a momentum trader will want to condition their trading signals on additional information. To reflect this, learning is also carried out in the presence of side information. Two sets of side information are considered, namely a ratio of realized volatilities and intraday seasonality. It is shown that splines can be used to capture statistically significant relationships from this information, allowing returns to be predicted. An Input Output Hidden Markov Model is used to incorporate these univariate predictive signals into the transition matrix, presenting a possible solution for dealing with the signal combination problem. Bayesian inference is then carried out to predict the security's $t+1$ return using the forward algorithm. Simple modifications to the current framework allow for a fully non-parametric model with asynchronous prediction.
Date:	2020–06
URL:	http://d.repec.org/n?u=RePEc:arx:papers:2006.08307&r=all
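
Illustration: a minimal sketch of one-step-ahead return prediction with the forward algorithm, as mentioned in the abstract above. It is not the authors' model: it assumes two latent states ("down" and "up"), a fixed transition matrix without side information, and continuous Gaussian emissions rather than the discretized ones the paper uses. The function name `forward_filter` and all parameter values are hypothetical.

```python
import numpy as np

def forward_filter(returns, P, pi0, means, sds):
    """Normalized forward recursion for a Gaussian-emission HMM.

    Returns the filtered state probabilities at the last observation
    and the predicted state probabilities for the next period."""
    prior = np.asarray(pi0, dtype=float)
    for r in returns:
        # Gaussian emission density of the observed return in each state
        lik = np.exp(-0.5 * ((r - means) / sds) ** 2) / (sds * np.sqrt(2 * np.pi))
        filtered = prior * lik
        filtered /= filtered.sum()   # Bayes update, then normalize
        prior = filtered @ P         # propagate one step through the chain
    return filtered, prior

# Hypothetical parameters: state 0 = downward momentum, state 1 = upward.
P = np.array([[0.9, 0.1],
              [0.1, 0.9]])
pi0 = np.array([0.5, 0.5])
means = np.array([-0.001, 0.001])
sds = np.array([0.002, 0.002])

returns = np.array([0.002, 0.003, 0.001])
filtered, predicted = forward_filter(returns, P, pi0, means, sds)
expected_next_return = predicted @ means
print(filtered, predicted, expected_next_return)
```

Normalizing at each step keeps the recursion numerically stable on long intraday series; the sign of `expected_next_return` plays the role of the momentum signal.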

Quantitative Statistical Robustness for Tail-Dependent Law Invariant Risk Measures

By:	Wei Wang; Huifu Xu; Tiejun Ma
Abstract:	When estimating the risk of a financial position with empirical data or Monte Carlo simulations via a tail-dependent law invariant risk measure such as the Conditional Value-at-Risk (CVaR), it is important to ensure the robustness of the statistical estimator particularly when the data contain noise. Kratscher et al. [1] propose a new framework to examine the qualitative robustness of estimators for tail-dependent law invariant risk measures on Orlicz spaces, which is a step further from earlier work for studying the robustness of risk measurement procedures by Cont et al. [2]. In this paper, we follow the stream of research to propose a quantitative approach for verifying the statistical robustness of tail-dependent law invariant risk measures. A distinct feature of our approach is that we use the Fortet-Mourier metric to quantify the variation of the true underlying probability measure in the analysis of the discrepancy between the laws of the plug-in estimators of law invariant risk measure based on the true data and perturbed data, which enables us to derive an explicit error bound for the discrepancy when the risk functional is Lipschitz continuous with respect to a class of admissible laws. Moreover, the newly introduced notion of Lipschitz continuity allows us to examine the degree of robustness for tail-dependent risk measures. Finally, we apply our quantitative approach to some well-known risk measures to illustrate our theory.
Date:	2020–06
URL:	http://d.repec.org/n?u=RePEc:arx:papers:2006.15491&r=all
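
Illustration: a minimal sketch of the plug-in (empirical) CVaR estimator that the abstract above takes as its running example, together with a crude look at how one contaminated observation moves it. The sketch assumes the Rockafellar-Uryasev representation evaluated at the empirical quantile; the function name `empirical_cvar` and the toy loss sample are hypothetical, and nothing here reproduces the paper's Fortet-Mourier error bounds.

```python
import numpy as np

def empirical_cvar(losses, alpha):
    """Plug-in CVaR at level alpha: VaR plus the scaled mean excess over VaR."""
    losses = np.asarray(losses, dtype=float)
    var = np.quantile(losses, alpha)              # empirical Value-at-Risk
    excess = np.maximum(losses - var, 0.0)        # losses beyond VaR
    return var + excess.mean() / (1.0 - alpha)

# Toy sample (assumption): losses 1, 2, ..., 100.
losses = np.arange(1, 101)
cvar_clean = empirical_cvar(losses, 0.9)

# Perturb a single observation to mimic noisy data.
noisy = losses.astype(float)
noisy[-1] = 1000.0
cvar_noisy = empirical_cvar(noisy, 0.9)

print(cvar_clean, cvar_noisy)
```

On the clean sample the estimator equals the average of the ten largest losses. Because the estimator averages over the tail, a single extreme observation shifts it noticeably, which is the kind of sensitivity a quantitative robustness analysis aims to bound.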

A Semiparametric Network Formation Model with Unobserved Linear Heterogeneity

By:	Candelaria, Luis E. (University of Warwick)
Abstract:	This paper analyzes a semiparametric model of network formation in the presence of unobserved agent-specific heterogeneity. The objective is to identify and estimate the preference parameters associated with homophily on observed attributes when the distributions of the unobserved factors are not parametrically specified. This paper offers two main contributions to the literature on network formation. First, it establishes a new point identification result for the vector of parameters that relies on the existence of a special regressor. The identification proof is constructive and characterizes a closed-form for the parameter of interest. Second, it introduces a simple two-step semiparametric estimator for the vector of parameters with a first-step kernel estimator. The estimator is computationally tractable and can be applied to both dense and sparse networks. Moreover, I show that the estimator is consistent and has a limiting normal distribution as the number of individuals in the network increases. Monte Carlo experiments demonstrate that the estimator performs well in finite samples and in networks with different levels of sparsity.
Keywords:	Network formation ; Unobserved heterogeneity ; Semiparametrics ; Special regressor ; Inverse weighting
Date:	2020
URL:	http://d.repec.org/n?u=RePEc:wrk:warwec:1279&r=all

Horseshoe Prior Bayesian Quantile Regression

By:	David Kohns; Tibor Szendrei
Abstract:	This paper extends the horseshoe prior of Carvalho et al. (2010) to the Bayesian quantile regression (HS-BQR) and provides a fast sampling algorithm that speeds up computation significantly in high dimensions. The performance of the HS-BQR is tested on large scale Monte Carlo simulations and an empirical application relevant to macroeconomics. The Monte Carlo design considers several sparsity structures (sparse, dense, block) and error structures (i.i.d. errors and heteroskedastic errors). A number of LASSO based estimators (frequentist and Bayesian) are pitted against the HS-BQR to better gauge the performance of the method on the different designs. The HS-BQR performs as well as, or better than, the other estimators considered when evaluated using coefficient bias and forecast error. We find that the HS-BQR is particularly potent in sparse designs and when estimating extreme quantiles. The simulations also highlight how the high dimensional quantile estimators fail to correctly identify the quantile function of the variables when both location and scale effects are present. In the empirical application, in which we evaluate forecast densities of US inflation, the HS-BQR provides well calibrated forecast densities whose individual quantiles have the highest pseudo R squared, highlighting its potential for Value-at-Risk estimation.
Date:	2020–06
URL:	http://d.repec.org/n?u=RePEc:arx:papers:2006.07655&r=all

Cointegration in large VARs

By:	Anna Bykhovskaya; Vadim Gorin
Abstract:	The paper analyses cointegration in vector autoregressive processes (VARs) for the cases when both the number of coordinates, $N$, and the number of time periods, $T$, are large and of the same order. We propose a way to examine a VAR for the presence of cointegration based on a modification of the Johansen likelihood ratio test. The advantage of our procedure over the original Johansen test and its finite sample corrections is that our test does not suffer from over-rejection. This is achieved through novel asymptotic theorems for eigenvalues of matrices in the test statistic in the regime of proportionally growing $N$ and $T$. Our theoretical findings are supported by Monte Carlo simulations and an empirical illustration. Moreover, we find a surprising connection with multivariate analysis of variance (MANOVA) and explain why it emerges.
Date:	2020–06
URL:	http://d.repec.org/n?u=RePEc:arx:papers:2006.14179&r=all

Identification of intertemporal preferences in history-dependent dynamic discrete choice models

By:	Levy, Matthew; Schiraldi, Pasquale
Abstract:	We study the identification of intertemporal preferences in a stationary dynamic discrete decision model. We propose a new approach which focuses on problems which are intrinsically dynamic: either there is endogenous variation in the choice set, or preferences depend directly on the history. History dependence links the choices of the decision-maker across periods in a more fundamental sense than standard dynamic discrete choice models typically assume. We consider both the exponential discounting and the quasi-hyperbolic discounting models of time preferences. We show that if the utility function or the choice set depends on the current states as well as the past choices and/or states, then time preferences are non-parametrically point-identified separately from the utility function under mild conditions on the data, and we may also recover the instantaneous utility function without imposing any normalization on the utility across states.
Keywords:	dynamic discrete choice; identification; quasi-hyperbolic discounting; Time preferences
Date:	2020–02
URL:	http://d.repec.org/n?u=RePEc:cpr:ceprdp:14447&r=all

Combining Experimental and Observational Data to Estimate Treatment Effects on Long Term Outcomes

By:	Susan Athey; Raj Chetty; Guido Imbens
Abstract:	There has been an increase in interest in experimental evaluations to estimate causal effects, partly because their internal validity tends to be high. At the same time, as part of the big data revolution, large, detailed, and representative, administrative data sets have become more widely available. However, the credibility of estimates of causal effects based on such data sets alone can be low. In this paper, we develop statistical methods for systematically combining experimental and observational data to obtain credible estimates of the causal effect of a binary treatment on a primary outcome that we only observe in the observational sample. Both the observational and experimental samples contain data about a treatment, observable individual characteristics, and a secondary (often short term) outcome. To estimate the effect of a treatment on the primary outcome while addressing the potential confounding in the observational sample, we propose a method that makes use of estimates of the relationship between the treatment and the secondary outcome from the experimental sample. If assignment to the treatment in the observational sample were unconfounded, we would expect the treatment effects on the secondary outcome in the two samples to be similar. We interpret differences in the estimated causal effects on the secondary outcome between the two samples as evidence of unobserved confounders in the observational sample, and develop control function methods for using those differences to adjust the estimates of the treatment effects on the primary outcome. We illustrate these ideas by combining data on class size and third grade test scores from the Project STAR experiment with observational data on class size and both third and eighth grade test scores from the New York school system.
Date:	2020–06
URL:	http://d.repec.org/n?u=RePEc:arx:papers:2006.09676&r=all

Flexible Mixture Priors for Time-varying Parameter Models

By:	Niko Hauzenberger
Abstract:	Time-varying parameter (TVP) models often assume that the TVPs evolve according to a random walk. This assumption, however, might be questionable since it implies that coefficients change smoothly and in an unbounded manner. In this paper, we relax this assumption by proposing a flexible law of motion for the TVPs in large-scale vector autoregressions (VARs). Instead of imposing a restrictive random walk evolution of the latent states, we carefully design hierarchical mixture priors on the coefficients in the state equation. These priors effectively allow for discriminating between periods where coefficients evolve according to a random walk and times where the TVPs are better characterized by a stationary stochastic process. Moreover, this approach is capable of introducing dynamic sparsity by pushing small parameter changes towards zero if necessary. The merits of the model are illustrated by means of two applications. Using synthetic data we show that our approach yields precise parameter estimates. When applied to US data, the model reveals interesting patterns of low-frequency dynamics in coefficients and forecasts well relative to a wide range of competing models.
Date:	2020–06
URL:	http://d.repec.org/n?u=RePEc:arx:papers:2006.10088&r=all

Time series copula models using d-vines and v-transforms: an alternative to GARCH modelling

By:	Martin Bladt; Alexander J. McNeil
Abstract:	An approach to modelling volatile financial return series using d-vine copulas combined with uniformity preserving transformations known as v-transforms is proposed. By generalizing the concept of stochastic inversion of v-transforms, models are obtained that can describe both stochastic volatility in the magnitude of price movements and serial correlation in their directions. In combination with parametric marginal distributions it is shown that these models can rival and sometimes outperform well-known models in the extended GARCH family.
Date:	2020–06
URL:	http://d.repec.org/n?u=RePEc:arx:papers:2006.11088&r=all

Does the choice of balance-measure matter under Genetic Matching?

By:	Adeola Oyenubi; Martin Wittenberg
Abstract:	In applied studies, the influence of balance measures on the performance of matching estimators is often taken for granted. This paper considers the performance of different balance measures that have been used in the literature when balance is being optimized. We also propose the use of the entropy measure in assessing balance. To examine the effect of balance measures, we conduct a simulation study where we optimize balance using a Genetic Algorithm (GenMatch). We find that balance measures do influence matching estimates under the GenMatch algorithm. The bias and Root Mean Square Error (RMSE) of the estimated treatment effect vary with the choice of balance measure. In the artificial Data Generating Process (DGP) with one covariate considered in this study, the proposed entropy balance measure has the lowest RMSE. The implication of these results is that sensitivity of matching estimates to the choice of balance measure should be given greater attention in empirical studies.
Keywords:	Genetic matching, balance measures, Information Theory, entropy metric
JEL:	I38 H53 C21 D13
Date:	2020–05
URL:	http://d.repec.org/n?u=RePEc:rza:wpaper:819&r=all

Identification and Formal Privacy Guarantees

By:	Tatiana Komarova; Denis Nekipelov
Abstract:	Empirical economic research crucially relies on highly sensitive individual datasets. At the same time, increasing availability of public individual-level data makes it possible for adversaries to potentially de-identify anonymized records in sensitive research datasets. This increasing disclosure risk has incentivised large data curators, most notably the US Census bureau and several large companies including Apple, Facebook and Microsoft to look for algorithmic solutions to provide formal non-disclosure guarantees for their secure data. The most commonly accepted formal data security concept in the Computer Science community is differential privacy. It restricts the interaction of researchers with the data by allowing them to issue queries to the data. The differential privacy mechanism then replaces the actual outcome of the query with a randomised outcome. While differential privacy does provide formal data security guarantees, its impact on the identification of empirical economic models and on the performance of estimators in those models has not been sufficiently studied. Since privacy protection mechanisms are inherently finite-sample procedures, we define the notion of identifiability of the parameter of interest as a property of the limit of experiments. It is linked to the asymptotic behavior in measure of differentially private estimators. We demonstrate that particular instances of regression discontinuity design and average treatment effect may be problematic for inference with differential privacy because their estimators can only be ensured to converge weakly with their asymptotic limit remaining random and, thus, may not be estimated consistently. This result is clearly supported by our simulation evidence. Our analysis suggests that many other estimators that rely on nuisance parameters may have similar properties with the requirement of differential privacy.
Date:	2020–06
URL:	http://d.repec.org/n?u=RePEc:arx:papers:2006.14732&r=all

Biases in Long-Horizon Predictive Regressions

By:	Jacob Boudoukh; Ronen Israel; Matthew P. Richardson
Abstract:	Analogous to Stambaugh (1999), this paper derives the small sample bias of estimators in J-horizon predictive regressions, providing a plug-in adjustment for these estimators. A number of surprising results emerge, including (i) a higher bias for overlapping than nonoverlapping regressions despite the greater number of observations, and (ii) particularly higher bias for an alternative long-horizon predictive regression commonly advocated for in the literature. For large J, the bias is linear in (J/T) with a slope that depends on the predictive variable’s persistence. The bias adjustment substantially reduces the existing magnitude of long-horizon estimates of predictability.
JEL:	C01 C1 C22 C53 C58 G12 G17
Date:	2020–06
URL:	http://d.repec.org/n?u=RePEc:nbr:nberwo:27410&r=all

Bridging the COVID-19 Data and the Epidemiological Model using Time Varying Parameter SIRD Model

By:	Cem Cakmakli (Koç University); Yasin Simsek (Koç University)
Abstract:	This paper extends the canonical model of epidemiology, the SIRD model, to allow for time varying parameters for real-time measurement of the stance of the COVID-19 pandemic. Time variation in model parameters is captured using the generalized autoregressive score modelling structure designed for the typically daily count data related to the pandemic. The resulting specification permits a flexible yet parsimonious model structure with a very low computational cost. This is especially crucial at the onset of the pandemic when the data are scarce and the uncertainty is abundant. Full sample results show that countries including the US, Brazil and Russia are still not able to contain the pandemic, with the US having the worst performance. Furthermore, Iran and South Korea are likely to experience a second wave of the pandemic. A real-time exercise shows that the proposed structure delivers timely and precise information on the current stance of the pandemic ahead of the competitors that use a rolling window. This, in turn, translates into accurate short-term predictions of the active cases. We further modify the model to allow for unreported cases. Results suggest that the effects of the presence of these cases on the estimation results diminish towards the end of the sample with the increasing number of tests.
Keywords:	COVID-19, SIRD, Observation driven models, Score models, Count data, Time varying parameters.
JEL:	C13 C32 C51 I19
Date:	2020–07
URL:	http://d.repec.org/n?u=RePEc:koc:wpaper:2013&r=all

Unified Theory for the Large Family of Time Varying Models with ARMA Representations: One Solution Fits All.

By:	Karanasos, Menelaos; Paraskevopoulos, Alexandros; Canepa, Alessandra (University of Turin)
Abstract:	For the large family of ARMA models with variable coefficients we obtain an explicit and computationally tractable solution that generates all their fundamental properties, including the Wold-Cramer decomposition and their covariance structure, thus unifying the invertibility conditions which guarantee both their asymptotic stability and main properties. The one sided Green's function, associated with the homogeneous solution, is expressed as a banded Hessenbergian formulated exclusively in terms of the autoregressive parameters of the model. The proposed methodology allows for a unified treatment of these 'time varying' systems. We also illustrate mathematically one of the focal points in Hallin's (1986) analysis, namely that in a time varying setting the backward asymptotic efficiency is different from the forward one. Equally important, it is shown how the linear algebra techniques used to obtain the general solution are equivalent to a simple procedure for manipulating polynomials with variable coefficients. The practical significance of the suggested approach is illustrated with an application to U.S. inflation data. The main finding is that inflation persistence increased after 1976, whereas from 1986 onwards the persistence reduces and stabilizes to even lower levels than the pre-1976 period.
Date:	2020–04
URL:	http://d.repec.org/n?u=RePEc:uto:dipeco:202008&r=all

Proper scoring rules for evaluating asymmetry in density forecasting

By:	Matteo Iacopini; Francesco Ravazzolo; Luca Rossini
Abstract:	This paper proposes a novel asymmetric continuous probabilistic score (ACPS) for evaluating and comparing density forecasts. It extends the proposed score and defines a weighted version, which emphasizes regions of interest, such as the tails or the center of a variable's range. The ACPS is of general use in any situation where the decision maker has asymmetric preferences in the evaluation of the forecasts. In an artificial experiment, the implications of varying the level of asymmetry are illustrated. Then, the proposed score is applied to assess and compare density forecasts of macroeconomically relevant datasets (unemployment rate) and of commodity prices (oil and electricity prices), with a particular focus on the recent COVID crisis period.
Date:	2020–06
URL:	http://d.repec.org/n?u=RePEc:arx:papers:2006.11265&r=all

This nep-ecm issue is ©2020 by Sune Karlsson. It is provided as is without any express or implied warranty. It may be freely redistributed in whole or in part for any purpose. If distributed in part, please include this notice.

General information on the NEP project can be found at http://nep.repec.org. For comments please write to the director of NEP, Marco Novarese at <director@nep.repec.org>. Put “NEP” in the subject, otherwise your mail may be rejected.

NEP’s infrastructure is sponsored by the School of Economics and Finance of Massey University in New Zealand.