
New Economics Papers on Econometrics
By:  Fu Ouyang (School of Economics, University of Queensland); Thomas Tao Yang (Australian National University) 
Abstract:  We propose new identification and estimation approaches to semiparametric discrete choice models for bundles in both cross-sectional and panel data settings. The random utility functions of these models take the usual parametric form, while no distributional assumption is imposed on the stochastic disturbances. Our proposed methods permit certain forms of heteroskedasticity and arbitrary correlation in the disturbances across choices. Our identification approach is matching-based; it matches observed covariates across agents for the cross-sectional case, and over time for the panel data case. For the cross-sectional model, we propose a kernel-weighted rank procedure and establish the root-N asymptotic normality of the resulting estimators. We show the validity of the nonparametric bootstrap for the inference. For the panel data model, we propose localized maximum-score-type estimators which have a nonstandard asymptotic distribution. We show that the numerical bootstrap developed by Hong and Li (2020) is a valid inference method for our panel data estimators. Monte Carlo experiments demonstrate that our proposed estimation and inference procedures perform adequately in finite samples. 
Keywords:  Bundle choices; rank estimation; panel data; bootstrap. 
JEL:  C13 C14 C35 
Date:  2020–06–12 
URL:  http://d.repec.org/n?u=RePEc:qld:uq2004:625&r=all 
By:  Amengual, Dante; Bei, Xinyue; Sentana, Enrique 
Abstract:  We study score-type tests in likelihood contexts in which the nullity of the information matrix under the null is larger than one, thereby generalizing earlier results in the literature. Examples include multivariate skew normal distributions, Hermite expansions of Gaussian copulas, purely nonlinear predictive regressions, multiplicative seasonal time series models and multivariate regression models with selectivity. Our proposal, which involves higher-order derivatives, is asymptotically equivalent to the likelihood ratio but only requires estimation under the null. We conduct extensive Monte Carlo exercises that study the finite sample size and power properties of our proposal and compare it to alternative approaches. 
Keywords:  Generalized extremum tests; Higher-order identifiability; Likelihood ratio test; Non-Gaussian copulas; Predictive regressions; Skew normal distributions 
JEL:  C12 C22 C34 C46 C58 
Date:  2020–02 
URL:  http://d.repec.org/n?u=RePEc:cpr:ceprdp:14415&r=all 
By:  Nishanth Dikkala; Greg Lewis; Lester Mackey; Vasilis Syrgkanis 
Abstract:  We develop an approach for estimating models described via conditional moment restrictions, with a prototypical application being nonparametric instrumental variable regression. We introduce a min-max criterion function, under which the estimation problem can be thought of as solving a zero-sum game between a modeler who is optimizing over the hypothesis space of the target model and an adversary who identifies violating moments over a test function space. We analyze the statistical estimation rate of the resulting estimator for arbitrary hypothesis spaces, with respect to an appropriate analogue of the mean squared error metric, for ill-posed inverse problems. We show that when the minimax criterion is regularized with a second moment penalty on the test function and the test function space is sufficiently rich, the estimation rate scales with the critical radius of the hypothesis and test function spaces, a quantity which typically yields tight fast rates. Our main result follows from a novel localized Rademacher analysis of statistical learning problems defined via minimax objectives. We provide applications of our main results for several hypothesis spaces used in practice, such as reproducing kernel Hilbert spaces, high-dimensional sparse linear functions, spaces defined via shape constraints, ensemble estimators such as random forests, and neural networks. For each of these applications we provide computationally efficient optimization methods for solving the corresponding minimax problem (e.g., stochastic first-order heuristics for neural networks). In several applications, we show how our modified mean squared error rate, combined with conditions that bound the ill-posedness of the inverse problem, leads to mean squared error rates. We conclude with an extensive experimental analysis of the proposed methods. 
Date:  2020–06 
URL:  http://d.repec.org/n?u=RePEc:arx:papers:2006.07201&r=all 
By:  Qingliang Fan; Yaqian Wu 
Abstract:  Instrumental variables (IV) regression is a popular method for the estimation of endogenous treatment effects. Conventional IV methods require that all the instruments be relevant and valid. However, this is impractical, especially in high-dimensional models when we consider a large set of candidate IVs. In this paper, we propose an IV estimator robust to the existence of both invalid and irrelevant instruments (called R2IVE) for the estimation of endogenous treatment effects. This paper extends the scope of Kang et al. (2016) by considering a true high-dimensional IV model and a nonparametric reduced-form equation. It is shown that our procedure can select the relevant and valid instruments consistently, and that the proposed R2IVE is root-n consistent and asymptotically normal. Monte Carlo simulations demonstrate that R2IVE performs favorably compared to existing high-dimensional IV estimators (such as NAIVE (Fan and Zhong, 2018) and sisVIVE (Kang et al., 2016)) when invalid instruments exist. In the empirical study, we revisit the classic question of trade and growth (Frankel and Romer, 1999). 
Date:  2020–06 
URL:  http://d.repec.org/n?u=RePEc:arx:papers:2006.14998&r=all 
By:  Panos Toulis 
Abstract:  We propose a partial identification method for estimating disease prevalence from serology studies. Our data are results from antibody tests in some population sample, where the test parameters, such as the true/false positive rates, are unknown. Our method scans the entire parameter space and rejects parameter values using the joint data density as the test statistic. The proposed method is conservative for marginal inference in general, but its key advantage over more standard approaches is that it is valid in finite samples even when the underlying model is not point identified. Moreover, our method requires only independence of serology test results, and does not rely on asymptotic arguments, normality assumptions, or other approximations. We use recent COVID-19 serology studies in the US, and show that the parameter confidence set is generally wide and cannot support definite conclusions. Specifically, recent serology studies from California suggest a prevalence anywhere in the range 0%–2% (at the time of study), and are therefore inconclusive. However, this range could be narrowed down to 0.7%–1.5% if the actual false positive rate of the antibody test was indeed near its empirical estimate (~0.5%). In another study from New York state, COVID-19 prevalence is confidently estimated in the range 13%–17% in mid-April of 2020, which also suggests significant geographic variation in COVID-19 exposure across the US. Combining all datasets yields a 5%–8% prevalence range. Our results overall suggest that serology testing on a massive scale can give crucial information for future policy design, even when such tests are imperfect and their parameters unknown. 
Date:  2020–06 
URL:  http://d.repec.org/n?u=RePEc:arx:papers:2006.16214&r=all 
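The scanning idea in the abstract above can be illustrated with a deliberately simplified sketch that is not from the paper: prevalence, test sensitivity, and false-positive rate are scanned over grids, and a prevalence value is kept whenever the observed positive count is not rejected by an exact binomial test for at least one admissible (sensitivity, false-positive-rate) pair. The grids, the binomial test, and the function names are illustrative assumptions; the paper's procedure uses the joint data density across studies.

```python
import math

_LOG_COMB = {}  # cache of log binomial coefficients, keyed by n

def binom_pvalue(k, n, p):
    """Exact two-sided binomial p-value: total probability of outcomes
    no more likely than the observed count k under Binomial(n, p)."""
    if p <= 0.0:
        return 1.0 if k == 0 else 0.0
    if p >= 1.0:
        return 1.0 if k == n else 0.0
    if n not in _LOG_COMB:
        _LOG_COMB[n] = [math.lgamma(n + 1) - math.lgamma(j + 1) - math.lgamma(n - j + 1)
                        for j in range(n + 1)]
    lp, l1p = math.log(p), math.log1p(-p)
    logs = [_LOG_COMB[n][j] + j * lp + (n - j) * l1p for j in range(n + 1)]
    cut = logs[k] + 1e-9  # tolerance for float ties
    return sum(math.exp(l) for l in logs if l <= cut)

def prevalence_confidence_set(k, n, alpha=0.05,
                              sens_grid=(0.85, 0.90, 0.95),
                              fpr_grid=(0.0, 0.005, 0.01, 0.02)):
    """Scan prevalence values, keeping those not rejected for at least one
    (sensitivity, false-positive-rate) pair in the assumed ranges."""
    kept = [pi for pi in (i / 200 for i in range(21))   # prevalence 0%..10%
            if any(binom_pvalue(k, n, pi * s + (1 - pi) * f) > alpha
                   for s in sens_grid for f in fpr_grid)]
    return min(kept), max(kept)
```

With, say, 50 positives out of 3,000 tests, the sketch reproduces the qualitative message of the abstract: zero prevalence cannot be ruled out unless the false-positive rate is pinned down.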
By:  Yong Bao (Purdue University); Xiaotian Liu (Purdue University); Aman Ullah (Department of Economics, University of California Riverside) 
Abstract:  Barry Arnold has made many fundamental and innovative contributions in different areas of statistics and econometrics, including estimation and inference, distribution theory, Bayesian inference, order statistics, income inequality measures, and characterization problems. His extensive work in the area of distribution theory includes studies on income distributions and Lorenz curves, the exact sampling distribution theory of test statistics, and the characterization of distributions. In this paper we consider the problem of developing exact sampling distributions of various econometric and statistical estimators and test statistics. The motivation stems from the fact that inference procedures based on asymptotic distributions may provide misleading results if the sample size is small or moderately large. In view of this, we develop a unified procedure by first observing that a large number of econometric and statistical estimators can be written as ratios of quadratic forms. Their distributions can then be straightforwardly analyzed using Imhof's (1961) method. We show the applications of this procedure to develop the distribution of some commonly used statistics in applied work. The exact results developed will be helpful for practitioners to conduct appropriate inference for any given size of the sample data. 
Keywords:  Exact Distribution, Sharpe Ratio, Coefficient of Variation, Durbin–Watson Test, Moran Test, Imhof Distribution, R-squared 
Date:  2020–01 
URL:  http://d.repec.org/n?u=RePEc:ucr:wpaper:202014&r=all 
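Since the abstract above centers on writing estimators as ratios of quadratic forms and applying Imhof's (1961) method, a minimal numerical sketch may help. It uses the standard reduction $P(x'Ax/x'Bx < r) = P(x'(A - rB)x < 0)$ for $x \sim N(0, I)$ and a truncated trapezoidal rule for Imhof's inversion integral; the grid, the cutoff, and the function names are ad hoc choices for illustration, not the paper's implementation.

```python
import numpy as np

def imhof_cdf(x, lambdas):
    """P(sum_j lambdas[j] * chi2_1 < x) for independent 1-df chi-squares,
    via Imhof's (1961) inversion formula on a truncated uniform grid."""
    lam = np.asarray(lambdas, dtype=float)
    u = np.linspace(1e-8, 200.0, 200001)          # ad hoc grid and cutoff
    ul = np.outer(u, lam)
    theta = 0.5 * np.arctan(ul).sum(axis=1) - 0.5 * x * u
    rho = np.prod((1.0 + ul ** 2) ** 0.25, axis=1)
    g = np.sin(theta) / (u * rho)
    du = u[1] - u[0]
    integral = du * (g.sum() - 0.5 * (g[0] + g[-1]))  # trapezoid rule
    return 0.5 - integral / np.pi

def ratio_quadratic_cdf(r, A, B):
    """P(x'Ax / x'Bx < r) for x ~ N(0, I), reduced to P(x'(A - rB)x < 0)."""
    return imhof_cdf(0.0, np.linalg.eigvalsh(A - r * B))
```

As a sanity check, a single unit-weight term reproduces the chi-squared(1) CDF, and for $A = \mathrm{diag}(1,0)$, $B = I_2$ the ratio is Beta(1/2, 1/2), whose median is 1/2.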
By:  TaeHwy Lee (Department of Economics, University of California Riverside); Millie Yi Mao (Azusa Pacific University); Aman Ullah (University of California, Riverside) 
Abstract:  The estimation of a large covariance matrix is challenging when the dimension p is large relative to the sample size n. Common approaches to deal with the challenge have been based on thresholding or shrinkage methods in estimating covariance matrices. However, in many applications (e.g., regression, forecast combination, portfolio selection), what we need is not the covariance matrix but its inverse (the precision matrix). In this paper we introduce a method of estimating the high-dimensional "dynamic conditional precision" (DCP) matrices. The proposed DCP algorithm is based on the estimator of a large unconditional precision matrix by Fan and Lv (2016) to deal with the high dimensionality, and on the dynamic conditional correlation (DCC) model by Engle (2002) to embed a dynamic structure into the conditional precision matrix. The simulation results show that the DCP method performs substantially better than methods of estimating covariance matrices based on thresholding or shrinkage. Finally, inspired by Hsiao and Wan (2014), we examine the "forecast combination puzzle" using the DCP, thresholding, and shrinkage methods. 
Keywords:  High-dimensional conditional precision matrix, ISEE, DCP, Forecast combination puzzle. 
JEL:  C3 C4 C5 
Date:  2020–06 
URL:  http://d.repec.org/n?u=RePEc:ucr:wpaper:202012&r=all 
By:  Haozhe Zhang; Yehua Li 
Abstract:  We consider spatially dependent functional data collected under a geostatistics setting, where spatial locations are irregular and random. The functional response is the sum of a spatially dependent functional effect and a spatially independent functional nugget effect. Observations on each function are made on discrete time points and contaminated with measurement errors. Under the assumption of spatial stationarity and isotropy, we propose a tensor product spline estimator for the spatiotemporal covariance function. When a coregionalization covariance structure is further assumed, we propose a new functional principal component analysis method that borrows information from neighboring functions. The proposed method also generates nonparametric estimators for the spatial covariance functions, which can be used for functional kriging. Under a unified framework for sparse and dense functional data, infill and increasing domain asymptotic paradigms, we develop the asymptotic convergence rates for the proposed estimators. Advantages of the proposed approach are demonstrated through simulation studies and two real data applications representing sparse and dense functional data, respectively. 
Date:  2020–06 
URL:  http://d.repec.org/n?u=RePEc:arx:papers:2006.13489&r=all 
By:  Veronika Czellar; David T. Frazier; Eric Renault 
Abstract:  Indirect Inference (II) is a popular technique for estimating complex parametric models whose likelihood function is intractable; however, the statistical efficiency of II estimation is questionable. While the efficient method of moments (Gallant and Tauchen, 1996) promises efficiency, the price to pay is a loss of parsimony and thereby a potential lack of robustness to model misspecification. This stands in contrast to simpler II estimation strategies, which are known to display less sensitivity to model misspecification precisely because of their focus on specific elements of the underlying structural model. In this research, we propose a new simulation-based approach that maintains the parsimony of II estimation, which is often critical in empirical applications, but can also deliver estimators that are nearly as efficient as maximum likelihood. This approach is based on a constrained approximation to the structural model, which ensures identification. We demonstrate the approach through several examples, and show that it delivers estimators that are nearly as efficient as maximum likelihood when the latter is feasible, while remaining applicable in many situations where maximum likelihood is infeasible. 
Date:  2020–06 
URL:  http://d.repec.org/n?u=RePEc:arx:papers:2006.10245&r=all 
By:  Georges Bresson; Anoop Chaturvedi; Mohammad Arshad Rahman; Shalabh 
Abstract:  Linear regression with measurement error in the covariates is a heavily studied topic; however, the statistics/econometrics literature is almost silent on estimating a multi-equation model with measurement error. This paper considers a seemingly unrelated regression model with measurement error in the covariates and introduces two novel estimation methods: a pure Bayesian algorithm (based on Markov chain Monte Carlo techniques) and its mean field variational Bayes (MFVB) approximation. The MFVB method has the added advantage of being computationally fast and able to handle big data. An issue pertinent to measurement error models is parameter identification, which we resolve by employing a prior distribution on the measurement error variance. The methods are shown to perform well in multiple simulation studies, where we analyze the impact on posterior estimates of different values of the reliability ratio, or variance of the true unobserved quantity, used in the data-generating process. The paper further implements the proposed algorithms in an application drawn from the health literature and shows that modeling measurement error in the data can improve model fit. 
Date:  2020–06 
URL:  http://d.repec.org/n?u=RePEc:arx:papers:2006.07074&r=all 
By:  Aknouche, Abdelhakim; Almohaimeed, Bader; Dimitrakopoulos, Stefanos 
Abstract:  We propose an autoregressive conditional duration (ACD) model with periodic time-varying parameters and multiplicative error form, which we name the periodic autoregressive conditional duration (PACD) model. First, we study its stability properties and moment structures. Second, we estimate the model parameters using (profile and two-stage) Gamma quasi-maximum likelihood estimates (QMLEs), the asymptotic properties of which are examined under general regularity conditions. Our estimation method encompasses the exponential QMLE as a particular case. The proposed methodology is illustrated with simulated data and two empirical applications on forecasting Bitcoin trading volume and realized volatility. We find that the PACD produces better in-sample and out-of-sample forecasts than the standard ACD. 
Keywords:  Positive time series, autoregressive conditional duration, periodic time-varying models, multiplicative error models, exponential QMLE, two-stage Gamma QMLE. 
JEL:  C13 C18 C4 C41 C5 C51 C58 
Date:  2020–07–08 
URL:  http://d.repec.org/n?u=RePEc:pra:mprapa:101696&r=all 
By:  Jason Hartford; Victor Veitch; Dhanya Sridhar; Kevin LeytonBrown 
Abstract:  Instrumental variable methods provide a powerful approach to estimating causal effects in the presence of unobserved confounding. But a key challenge when applying them is the reliance on untestable "exclusion" assumptions that rule out any relationship between the instrumental variable and the response that is not mediated by the treatment. In this paper, we show how to perform consistent IV estimation despite violations of the exclusion assumption. In particular, we show that when one has multiple candidate instruments, only a majority of these candidates (or, more generally, the modal candidate–response relationship) needs to be valid to estimate the causal effect. Our approach uses an estimate of the modal prediction from an ensemble of instrumental variable estimators. The technique is simple to apply and is "black-box" in the sense that it may be used with any instrumental variable estimator as long as the treatment effect is identified for each valid instrument independently. As such, it is compatible with recent machine-learning-based estimators that allow for the estimation of conditional average treatment effects (CATE) on complex, high-dimensional data. Experimentally, we achieve accurate estimates of conditional average treatment effects using an ensemble of deep network-based estimators, including on a challenging simulated Mendelian Randomization problem. 
Date:  2020–06 
URL:  http://d.repec.org/n?u=RePEc:arx:papers:2006.11386&r=all 
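The "modal candidate" idea can be sketched in a toy linear setting (the paper's ensemble uses flexible machine-learning CATE estimators; here each candidate instrument yields a just-identified Wald estimate, and the mode is located with a Gaussian kernel density over the ensemble). The bandwidth, function names, and data-generating process in the test are illustrative assumptions.

```python
import numpy as np

def per_instrument_iv(y, t, Z):
    """Just-identified Wald/IV estimate of the effect of t on y
    for each column of the candidate-instrument matrix Z."""
    yc, tc, Zc = y - y.mean(), t - t.mean(), Z - Z.mean(axis=0)
    return (Zc * yc[:, None]).sum(axis=0) / (Zc * tc[:, None]).sum(axis=0)

def modal_estimate(estimates, bandwidth):
    """Return the ensemble member with the highest Gaussian-kernel density:
    valid instruments cluster at the true effect, invalid ones scatter."""
    e = np.asarray(estimates, dtype=float)
    dens = np.exp(-0.5 * ((e[:, None] - e[None, :]) / bandwidth) ** 2).sum(axis=1)
    return e[np.argmax(dens)]
```

In a simulation with three valid and two exclusion-violating instruments, the three valid Wald estimates cluster at the true effect, and the modal estimate ignores the two outliers.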
By:  LeYu Chen; Sokbae Lee 
Abstract:  We consider both $\ell_{0}$-penalized and $\ell_{0}$-constrained quantile regression estimators. For the $\ell_{0}$-penalized estimator, we derive an exponential inequality on the tail probability of excess quantile prediction risk and apply it to obtain nonasymptotic upper bounds on the mean-square parameter and regression function estimation errors. We also derive analogous results for the $\ell_{0}$-constrained estimator. The resulting rates of convergence are minimax-optimal and the same as those for $\ell_{1}$-penalized estimators. Further, we characterize expected Hamming loss for the $\ell_{0}$-penalized estimator. We implement the proposed procedure via mixed integer linear programming and also a more scalable first-order approximation algorithm. We illustrate the finite-sample performance of our approach in Monte Carlo experiments and its usefulness in a real data application concerning conformal prediction of infant birth weights (with $n\approx 10^{3}$ and up to $p>10^{3}$). In sum, our $\ell_{0}$-based method produces a much sparser estimator than the $\ell_{1}$-penalized approach without compromising precision. 
Date:  2020–06 
URL:  http://d.repec.org/n?u=RePEc:arx:papers:2006.11201&r=all 
By:  Brantly Callaway; Sonia Karami 
Abstract:  This paper considers identifying and estimating the Average Treatment Effect on the Treated (ATT) in interactive fixed effects models. We focus on the case where there is a single unobserved time-invariant variable whose effect is allowed to change over time, though we also allow for time fixed effects and unobserved individual-level heterogeneity. The models that we consider in this paper generalize many commonly used models in the treatment effects literature, including difference-in-differences and individual-specific linear trend models. Unlike the majority of the literature on interactive fixed effects models, we do not require the number of time periods to go to infinity to consistently estimate the ATT. Our main identification result relies on the effect of some time-invariant covariate (e.g., race or sex) not varying over time. Using our approach, we show that the ATT can be identified with as few as three time periods and with panel or repeated cross-section data. 
Date:  2020–06 
URL:  http://d.repec.org/n?u=RePEc:arx:papers:2006.15780&r=all 
By:  Jing Zhou; Gerda Claeskens; Jelena Bradic 
Abstract:  Robust methods, though ubiquitous in practice, are yet to be fully understood in the context of regularized estimation and high dimensions. Even simple questions become challenging very quickly. For example, classical statistical theory identifies equivalence between model-averaged and composite quantile estimation. However, little to nothing is known about such equivalence between methods that encourage sparsity. This paper provides a toolbox to further study robustness in these settings, with a focus on prediction. In particular, we study optimally weighted model-averaged as well as composite $\ell_1$-regularized estimation. Optimal weights are determined by minimizing the asymptotic mean squared error. This approach incorporates the effects of regularization without the assumption of perfect selection, as is often used in practice. Such weights are then optimal for prediction quality. Through an extensive simulation study, we show that no single method systematically outperforms others. We find, however, that model-averaged and composite quantile estimators often outperform least-squares methods, even in the case of Gaussian model noise. A real-data application demonstrates the method's practical use through the reconstruction of compressed audio signals. 
Date:  2020–06 
URL:  http://d.repec.org/n?u=RePEc:arx:papers:2006.07457&r=all 
By:  Helmut Lütkepohl; Thore Schlaak 
Abstract:  In proxy vector autoregressive models, the structural shocks of interest are identified by an instrument. Although heteroskedasticity is occasionally allowed for, it is typically taken for granted that the impact effects of the structural shocks are time-invariant despite the change in their variances. We develop a test for this implicit assumption and present evidence that the assumption of time-invariant impact effects may be violated in previously used empirical models. 
Keywords:  Structural vector autoregression, proxy VAR, identification through heteroskedasticity 
JEL:  C32 
Date:  2020 
URL:  http://d.repec.org/n?u=RePEc:diw:diwwpp:dp1876&r=all 
By:  Ali Mehrabani (UCR); Aman Ullah (Department of Economics, University of California Riverside) 
Abstract:  In this paper, we propose an efficient weighted average estimator in Seemingly Unrelated Regressions. This average estimator shrinks a generalized least squares (GLS) estimator towards a restricted GLS estimator, where the restrictions represent possible parameter homogeneity specifications. The shrinkage weight is inversely proportional to a weighted quadratic loss function. The approximate bias and second moment matrix of the average estimator using large-sample approximations are provided. We give the conditions under which the average estimator dominates the GLS estimator on the basis of their mean squared errors. We illustrate our estimator by applying it to a cost system for U.S. commercial banks over the period 2000–2018. Our results indicate that on average most of the banks have been operating under increasing returns to scale. We find that over recent years, scale economies are a plausible reason for the growth in the average size of banks, and the tendency toward increasing scale is likely to continue. 
Keywords:  Stein-type Shrinkage Estimator; Asymptotic Approximations; SUR; GLS 
Date:  2020–01 
URL:  http://d.repec.org/n?u=RePEc:ucr:wpaper:202013&r=all 
By:  Anish Agarwal; Abdullah Alomar; Romain Cosson; Devavrat Shah; Dennis Shen 
Abstract:  We develop a method to help quantify the impact different levels of mobility restrictions could have had on COVID-19 related deaths across nations. Synthetic control (SC) has emerged as a standard tool in such scenarios to produce counterfactual estimates if a particular intervention had not occurred, using just observational data. However, it remains an important open problem how to extend SC to obtain counterfactual estimates if a particular intervention had occurred; this is exactly the question about the impact of mobility restrictions stated above. As our main contribution, we introduce synthetic interventions (SI), which helps resolve this open problem by allowing one to produce counterfactual estimates when there are multiple interventions of interest. We prove SI produces consistent counterfactual estimates under a tensor factor model. Our finite sample analysis shows the test error decays as $1/T_0$, where $T_0$ is the amount of observed pre-intervention data. As a special case, this improves upon the $1/\sqrt{T_0}$ bound on test error for SC in prior works. Our test error bound holds under a certain "subspace inclusion" condition; we furnish a data-driven hypothesis test with provable guarantees to check for this condition. This also provides a quantitative hypothesis test for when to use SC, currently absent in the literature. Technically, we establish that the parameter estimation and test error for Principal Component Regression (a key subroutine in SI and several SC variants) in the setting of error-in-variable regression decay as $1/T_0$, where $T_0$ is the number of samples observed; this improves the best prior test error bound of $1/\sqrt{T_0}$. In addition to the COVID-19 case study, we show how SI can be used to run data-efficient, personalized randomized controlled trials using real data from a large e-commerce website and a large developmental economics study. 
Date:  2020–06 
URL:  http://d.repec.org/n?u=RePEc:arx:papers:2006.07691&r=all 
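Principal Component Regression, the key subroutine named in the abstract above, can be sketched in a few lines (a generic textbook implementation, not the authors' code): least squares on the rank-r SVD truncation of the covariate matrix, which is what gives robustness to measurement error in the error-in-variable setting.

```python
import numpy as np

def pcr(X, y, rank):
    """Principal Component Regression: keep the top `rank` singular
    directions of X, then solve least squares there,
    i.e. beta = V_r diag(1/s_r) U_r' y."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return Vt[:rank].T @ ((U[:, :rank].T @ y) / s[:rank])
```

On an exactly low-rank design the truncation discards nothing, so the fitted values match ordinary least squares on the column space.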
By:  Sokbae Lee; Serena Ng 
Abstract:  When there is so much data that it becomes a computational burden, it is not uncommon to compute quantities of interest using a sketch of the data of size $m$ instead of the full sample of size $n$. This paper investigates the implications for two-stage least squares (2SLS) estimation when the sketches are obtained by a computationally efficient method known as CountSketch. We obtain three results. First, we establish conditions under which, given the full sample, a sketched 2SLS estimate can be arbitrarily close to the full-sample 2SLS estimate with high probability. Second, we give conditions under which the sketched 2SLS estimator converges in probability to the true parameter at a rate of $m^{1/2}$ and is asymptotically normal. Third, we show that the asymptotic variance can be consistently estimated using the sketched sample and suggest methods for determining an inference-conscious sketch size $m$. The sketched 2SLS estimator is used to estimate returns to education. 
Date:  2020–07 
URL:  http://d.repec.org/n?u=RePEc:arx:papers:2007.07781&r=all 
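A minimal sketch of CountSketch-then-2SLS may make the abstract above concrete (illustrative only; the paper's inference-conscious choice of $m$ and its variance estimator are not reproduced, and the function names are invented). Each data row is added, with a random sign, to one of $m$ buckets, and 2SLS is then run on the $m$ sketched rows of the stacked data.

```python
import numpy as np

def countsketch(M, m, rng):
    """CountSketch the rows of M down to m rows: each row is added,
    with a random sign, to one uniformly chosen bucket."""
    buckets = rng.integers(0, m, size=M.shape[0])
    signs = rng.choice([-1.0, 1.0], size=M.shape[0])
    S = np.zeros((m, M.shape[1]))
    np.add.at(S, buckets, signs[:, None] * M)   # scatter-add rows into buckets
    return S

def tsls(y, X, Z):
    """Two-stage least squares: project X on Z, then regress y on the projection."""
    Xhat = Z @ np.linalg.lstsq(Z, X, rcond=None)[0]
    return np.linalg.lstsq(Xhat, y, rcond=None)[0]

def sketched_tsls(y, X, Z, m, rng):
    """Run 2SLS on a single CountSketch of the stacked data [y, X, Z]."""
    D = countsketch(np.column_stack([y, X, Z]), m, rng)
    k = X.shape[1]
    return tsls(D[:, 0], D[:, 1:1 + k], D[:, 1 + k:])
```

Because the sketch approximately preserves all cross moments of the stacked data, the sketched estimate stays near the full-sample one while using far fewer rows.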
By:  David T. Frazier 
Abstract:  In many instances, the application of approximate Bayesian methods is hampered by two practical features: 1) the requirement to project the data down to a low-dimensional summary, including the choice of this projection, which ultimately yields inefficient inference; 2) a possible lack of robustness to deviations from the underlying model structure. Motivated by these efficiency and robustness concerns, we construct a new Bayesian method that can deliver efficient estimators when the underlying model is well-specified, and which is simultaneously robust to certain forms of model misspecification. This new approach bypasses the calculation of summaries by considering a norm between the empirical and simulated probability measures. For specific choices of the norm, we demonstrate that this approach can deliver point estimators that are as efficient as those obtained using exact Bayesian inference, while also simultaneously displaying robustness to deviations from the underlying model assumptions. 
Date:  2020–06 
URL:  http://d.repec.org/n?u=RePEc:arx:papers:2006.14126&r=all 
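The "norm between empirical and simulated probability measures" can be illustrated with the one-dimensional Wasserstein-1 distance, for which a minimum-distance point estimate takes a few lines (a toy frequentist sketch with a fixed simulation seed and grid search, not the paper's Bayesian procedure; the function names are invented).

```python
import random

def w1_distance(a, b):
    """Empirical Wasserstein-1 distance between equal-size 1-D samples:
    the mean absolute difference of the sorted values."""
    return sum(abs(x - y) for x, y in zip(sorted(a), sorted(b))) / len(a)

def minimum_distance_estimate(data, simulate, grid):
    """Pick the parameter whose simulated sample is closest to the data in W1."""
    return min(grid, key=lambda theta: w1_distance(data, simulate(theta)))
```

Fixing the simulator's seed (common random numbers) makes the criterion a smooth function of the parameter, so a coarse grid already recovers the location of a Gaussian model.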
By:  Hugh Christensen; Simon Godsill; Richard Turner 
Abstract:  A Hidden Markov Model for intraday momentum trading is presented which specifies a latent momentum state responsible for generating the observed securities' noisy returns. Existing momentum trading models suffer from time-lagging caused by the delayed frequency response of digital filters. Time-lagging results in a momentum signal of the wrong sign when the market changes trend direction. A key feature of this state-space formulation is that no such lagging occurs, allowing for accurate shifts in signal sign at market change points. The number of latent states in the model is estimated using three techniques: cross-validation, penalized likelihood criteria, and simulation-based model selection for the marginal likelihood. All three techniques suggest either 2 or 3 hidden states. Model parameters are then found using Baum–Welch and Markov chain Monte Carlo, whilst assuming a single (discretized) univariate Gaussian distribution for the emission matrix. Often a momentum trader will want to condition their trading signals on additional information. To reflect this, learning is also carried out in the presence of side information. Two sets of side information are considered, namely a ratio of realized volatilities and intraday seasonality. It is shown that splines can be used to capture statistically significant relationships from this information, allowing returns to be predicted. An Input-Output Hidden Markov Model is used to incorporate these univariate predictive signals into the transition matrix, presenting a possible solution to the signal combination problem. Bayesian inference is then carried out to predict the security's $t+1$ return using the forward algorithm. Simple modifications to the current framework allow for a fully nonparametric model with asynchronous prediction. 
Date:  2020–06 
URL:  http://d.repec.org/n?u=RePEc:arx:papers:2006.08307&r=all 
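The forward algorithm used for prediction in the abstract above amounts to a predict/update recursion. A minimal Gaussian-emission sketch follows (generic HMM filtering under simplifying assumptions such as a shared emission standard deviation, not the authors' fitted model); with a sticky transition matrix, the filtered state flips immediately when the observations change sign, which is the "no lagging" property emphasized in the abstract.

```python
import math

def forward_filter(returns, trans, means, sd):
    """Filtered state probabilities P(state_t | r_1..r_t) for an HMM with
    Gaussian emissions: alternate a transition-matrix predict step with a
    likelihood-weighted update step, normalising at each time."""
    k = len(means)
    prob = [1.0 / k] * k                       # uniform initial distribution
    path = []
    for r in returns:
        pred = [sum(prob[i] * trans[i][j] for i in range(k)) for j in range(k)]
        lik = [math.exp(-0.5 * ((r - m) / sd) ** 2) for m in means]
        post = [p * l for p, l in zip(pred, lik)]
        z = sum(post)
        prob = [p / z for p in post]
        path.append(prob)
    return path
```
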
By:  Wei Wang; Huifu Xu; Tiejun Ma 
Abstract:  When estimating the risk of a financial position with empirical data or Monte Carlo simulations via a tail-dependent law-invariant risk measure such as the Conditional Value-at-Risk (CVaR), it is important to ensure the robustness of the statistical estimator, particularly when the data contain noise. Krätschmer et al. [1] propose a new framework to examine the qualitative robustness of estimators for tail-dependent law-invariant risk measures on Orlicz spaces, which is a step beyond the earlier work of Cont et al. [2] on the robustness of risk measurement procedures. In this paper, we follow this stream of research and propose a quantitative approach for verifying the statistical robustness of tail-dependent law-invariant risk measures. A distinct feature of our approach is that we use the Fortet–Mourier metric to quantify the variation of the true underlying probability measure in the analysis of the discrepancy between the laws of the plug-in estimators of a law-invariant risk measure based on the true data and on perturbed data, which enables us to derive an explicit error bound for the discrepancy when the risk functional is Lipschitz continuous with respect to a class of admissible laws. Moreover, the newly introduced notion of Lipschitz continuity allows us to examine the degree of robustness for tail-dependent risk measures. Finally, we apply our quantitative approach to some well-known risk measures to illustrate our theory. 
Date:  2020–06 
URL:  http://d.repec.org/n?u=RePEc:arx:papers:2006.15491&r=all 
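For concreteness, the plug-in CVaR estimator whose robustness is at issue in the abstract above is simply the average of the worst alpha-fraction of losses (a minimal sketch; real implementations interpolate at the quantile boundary rather than rounding):

```python
def cvar(losses, alpha):
    """Plug-in CVaR_alpha estimate: the average of the worst
    round(alpha * n) losses (larger numbers mean larger losses)."""
    worst = sorted(losses, reverse=True)
    k = max(1, round(alpha * len(losses)))
    return sum(worst[:k]) / k
```

Since the estimate averages only k tail observations, perturbing a single data point moves it by at most roughly 1/k of that point's change, which is the kind of sensitivity the paper's quantitative bounds control.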
By:  Candelaria, Luis E. (University of Warwick) 
Abstract:  This paper analyzes a semiparametric model of network formation in the presence of unobserved agent-specific heterogeneity. The objective is to identify and estimate the preference parameters associated with homophily on observed attributes when the distributions of the unobserved factors are not parametrically specified. This paper offers two main contributions to the literature on network formation. First, it establishes a new point identification result for the vector of parameters that relies on the existence of a special regressor. The identification proof is constructive and characterizes a closed-form expression for the parameter of interest. Second, it introduces a simple two-step semiparametric estimator for the vector of parameters with a first-step kernel estimator. The estimator is computationally tractable and can be applied to both dense and sparse networks. Moreover, I show that the estimator is consistent and has a limiting normal distribution as the number of individuals in the network increases. Monte Carlo experiments demonstrate that the estimator performs well in finite samples and in networks with different levels of sparsity. 
Keywords:  Network formation; Unobserved heterogeneity; Semiparametrics; Special regressor; Inverse weighting 
Date:  2020 
URL:  http://d.repec.org/n?u=RePEc:wrk:warwec:1279&r=all 
By:  David Kohns; Tibor Szendrei 
Abstract:  This paper extends the horseshoe prior of Carvalho et al. (2010) to Bayesian quantile regression (HSBQR) and provides a fast sampling algorithm that speeds up computation significantly in high dimensions. The performance of the HSBQR is tested on large-scale Monte Carlo simulations and an empirical application relevant to macroeconomics. The Monte Carlo design considers several sparsity structures (sparse, dense, block) and error structures (i.i.d. and heteroskedastic errors). A number of LASSO-based estimators (frequentist and Bayesian) are pitted against the HSBQR to better gauge the performance of the method on the different designs. The HSBQR yields performance as good as, or better than, the other estimators considered when evaluated on coefficient bias and forecast error. We find that the HSBQR is particularly potent in sparse designs and when estimating extreme quantiles. The simulations also highlight how high-dimensional quantile estimators fail to correctly identify the quantile function of the variables when both location and scale effects are present. In the empirical application, in which we evaluate forecast densities of US inflation, the HSBQR provides well-calibrated forecast densities whose individual quantiles have the highest pseudo-R-squared, highlighting its potential for Value-at-Risk estimation. 
Date:  2020–06 
URL:  http://d.repec.org/n?u=RePEc:arx:papers:2006.07655&r=all 
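The objective underlying any quantile regression, Bayesian or frequentist, is the check ("pinball") loss that defines conditional quantiles; a minimal sketch for evaluating it (names are ours, not the authors' code):

```python
import numpy as np

def pinball_loss(y, q_hat, tau):
    """Average check loss of quantile forecasts q_hat for outcomes y at
    level tau: tau*u for under-predictions (u > 0), (tau-1)*u otherwise."""
    u = np.asarray(y, dtype=float) - np.asarray(q_hat, dtype=float)
    return float(np.mean(np.maximum(tau * u, (tau - 1.0) * u)))
```

Comparing this loss across estimators at extreme tau values is the kind of quantile-specific evaluation behind the forecast-error comparisons in the abstract.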
By:  Anna Bykhovskaya; Vadim Gorin 
Abstract:  The paper analyses cointegration in vector autoregressive processes (VARs) for the cases when both the number of coordinates, $N$, and the number of time periods, $T$, are large and of the same order. We propose a way to examine a VAR for the presence of cointegration based on a modification of the Johansen likelihood ratio test. The advantage of our procedure over the original Johansen test and its finite sample corrections is that our test does not suffer from overrejection. This is achieved through novel asymptotic theorems for eigenvalues of matrices in the test statistic in the regime of proportionally growing $N$ and $T$. Our theoretical findings are supported by Monte Carlo simulations and an empirical illustration. Moreover, we find a surprising connection with multivariate analysis of variance (MANOVA) and explain why it emerges. 
Date:  2020–06 
URL:  http://d.repec.org/n?u=RePEc:arx:papers:2006.14179&r=all 
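For reference, the Johansen trace statistic that the proposed procedure modifies is a simple function of the estimated eigenvalues (squared sample canonical correlations); a sketch of the textbook formula, with names of our choosing:

```python
import numpy as np

def johansen_trace_stat(eigvals, T, r):
    """Johansen trace statistic for H0: cointegration rank <= r,
    LR(r) = -T * sum_{i>r} log(1 - lambda_i), where the lambda_i are the
    squared canonical correlations sorted in decreasing order."""
    lam = np.sort(np.asarray(eigvals, dtype=float))[::-1]
    return float(-T * np.sum(np.log(1.0 - lam[r:])))
```

The paper's contribution concerns the behaviour of these eigenvalues when N and T grow proportionally, where the classical chi-squared-type critical values over-reject.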
By:  Levy, Matthew; Schiraldi, Pasquale 
Abstract:  We study the identification of intertemporal preferences in a stationary dynamic discrete decision model. We propose a new approach which focuses on problems that are intrinsically dynamic: either there is endogenous variation in the choice set, or preferences depend directly on the history. History dependence links the choices of the decision-maker across periods in a more fundamental sense than standard dynamic discrete choice models typically assume. We consider both exponential discounting and the quasi-hyperbolic discounting model of time preferences. We show that if the utility function or the choice set depends on the current states as well as on past choices and/or states, then time preferences are nonparametrically point-identified separately from the utility function under mild conditions on the data, and we may also recover the instantaneous utility function without imposing any normalization of the utility across states. 
Keywords:  dynamic discrete choice; identification; quasi-hyperbolic discounting; time preferences 
Date:  2020–02 
URL:  http://d.repec.org/n?u=RePEc:cpr:ceprdp:14447&r=all 
By:  Susan Athey; Raj Chetty; Guido Imbens 
Abstract:  There has been an increase in interest in experimental evaluations to estimate causal effects, partly because their internal validity tends to be high. At the same time, as part of the big data revolution, large, detailed, and representative administrative data sets have become more widely available. However, the credibility of estimates of causal effects based on such data sets alone can be low. In this paper, we develop statistical methods for systematically combining experimental and observational data to obtain credible estimates of the causal effect of a binary treatment on a primary outcome that we only observe in the observational sample. Both the observational and experimental samples contain data about a treatment, observable individual characteristics, and a secondary (often short-term) outcome. To estimate the effect of a treatment on the primary outcome while addressing the potential confounding in the observational sample, we propose a method that makes use of estimates of the relationship between the treatment and the secondary outcome from the experimental sample. If assignment to the treatment in the observational sample were unconfounded, we would expect the treatment effects on the secondary outcome in the two samples to be similar. We interpret differences in the estimated causal effects on the secondary outcome between the two samples as evidence of unobserved confounders in the observational sample, and develop control function methods for using those differences to adjust the estimates of the treatment effects on the primary outcome. We illustrate these ideas by combining data on class size and third grade test scores from the Project STAR experiment with observational data on class size and both third and eighth grade test scores from the New York school system. 
Date:  2020–06 
URL:  http://d.repec.org/n?u=RePEc:arx:papers:2006.09676&r=all 
By:  Niko Hauzenberger 
Abstract:  Time-varying parameter (TVP) models often assume that the TVPs evolve according to a random walk. This assumption, however, might be questionable since it implies that coefficients change smoothly and in an unbounded manner. In this paper, we relax this assumption by proposing a flexible law of motion for the TVPs in large-scale vector autoregressions (VARs). Instead of imposing a restrictive random walk evolution of the latent states, we carefully design hierarchical mixture priors on the coefficients in the state equation. These priors effectively allow for discriminating between periods where coefficients evolve according to a random walk and times where the TVPs are better characterized by a stationary stochastic process. Moreover, this approach is capable of introducing dynamic sparsity by pushing small parameter changes towards zero if necessary. The merits of the model are illustrated by means of two applications. Using synthetic data we show that our approach yields precise parameter estimates. When applied to US data, the model reveals interesting patterns of low-frequency dynamics in coefficients and forecasts well relative to a wide range of competing models. 
Date:  2020–06 
URL:  http://d.repec.org/n?u=RePEc:arx:papers:2006.10088&r=all 
By:  Martin Bladt; Alexander J. McNeil 
Abstract:  An approach to modelling volatile financial return series using d-vine copulas combined with uniformity-preserving transformations known as v-transforms is proposed. By generalizing the concept of stochastic inversion of v-transforms, models are obtained that can describe both stochastic volatility in the magnitude of price movements and serial correlation in their directions. In combination with parametric marginal distributions, it is shown that these models can rival and sometimes outperform well-known models in the extended GARCH family. 
Date:  2020–06 
URL:  http://d.repec.org/n?u=RePEc:arx:papers:2006.11088&r=all 
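The simplest member of the v-transform family is the symmetric fold V(u) = |2u - 1|, which maps a Unif(0,1) probability-integral-transform value to another Unif(0,1) value while discarding the sign of the underlying movement; a minimal sketch (Bladt and McNeil's construction covers more general, asymmetric v-transforms):

```python
import numpy as np

def v_transform(u):
    """Symmetric v-transform V(u) = |2u - 1|. If U ~ Unif(0,1) then
    V(U) ~ Unif(0,1): the transform is uniformity-preserving, folding the
    direction of a price movement into its magnitude."""
    return np.abs(2.0 * np.asarray(u, dtype=float) - 1.0)
```

Stochastic inversion of this map, i.e. randomly re-assigning a sign when going back, is what lets the models capture serial correlation in the directions of returns as well as in their magnitudes.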
By:  Adeola Oyenubi; Martin Wittenberg 
Abstract:  In applied studies, the influence of balance measures on the performance of matching estimators is often taken for granted. This paper considers the performance of different balance measures that have been used in the literature when balance is being optimized. We also propose the use of an entropy measure in assessing balance. To examine the effect of balance measures, we conduct a simulation study where we optimize balance using the Genetic Matching algorithm (GenMatch). We find that balance measures do influence matching estimates under the GenMatch algorithm: the bias and root mean square error (RMSE) of the estimated treatment effect vary with the choice of balance measure. In the artificial data generating process (DGP) with one covariate considered in this study, the proposed entropy balance measure has the lowest RMSE. The implication of these results is that the sensitivity of matching estimates to the choice of balance measure should be given greater attention in empirical studies. 
Keywords:  Genetic matching, balance measures, Information Theory, entropy metric 
JEL:  I38 H53 C21 D13 
Date:  2020–05 
URL:  http://d.repec.org/n?u=RePEc:rza:wpaper:819&r=all 
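As an illustration of what an entropy-type balance measure can look like, the sketch below computes a symmetrized Kullback-Leibler (Jeffreys) divergence between binned covariate distributions of treated and control units; this is our own toy construction for intuition, not necessarily the authors' exact metric:

```python
import numpy as np

def entropy_balance(x_treat, x_ctrl, bins=10):
    """Jeffreys divergence between binned covariate distributions of
    treated and control units; zero under perfect balance, larger when
    the two distributions diverge (illustrative, hypothetical measure)."""
    lo = min(np.min(x_treat), np.min(x_ctrl))
    hi = max(np.max(x_treat), np.max(x_ctrl))
    edges = np.linspace(lo, hi, bins + 1)
    p, _ = np.histogram(x_treat, edges)
    q, _ = np.histogram(x_ctrl, edges)
    p = (p + 1e-9) / (p + 1e-9).sum()  # smooth to avoid log(0) in empty bins
    q = (q + 1e-9) / (q + 1e-9).sum()
    return float(np.sum((p - q) * np.log(p / q)))
```

A GenMatch-style optimizer would minimize such a scalar over matching weights; the paper's point is that which scalar one chooses changes the resulting treatment-effect estimates.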
By:  Tatiana Komarova; Denis Nekipelov 
Abstract:  Empirical economic research crucially relies on highly sensitive individual datasets. At the same time, the increasing availability of public individual-level data makes it possible for adversaries to potentially de-identify anonymized records in sensitive research datasets. This increasing disclosure risk has incentivised large data curators, most notably the US Census Bureau and several large companies including Apple, Facebook and Microsoft, to look for algorithmic solutions that provide formal non-disclosure guarantees for their secure data. The most commonly accepted formal data security concept in the computer science community is differential privacy. It restricts the interaction of researchers with the data by allowing them to issue queries to the data. The differential privacy mechanism then replaces the actual outcome of the query with a randomised outcome. While differential privacy does provide formal data security guarantees, its impact on the identification of empirical economic models and on the performance of estimators in those models has not been sufficiently studied. Since privacy protection mechanisms are inherently finite-sample procedures, we define the notion of identifiability of the parameter of interest as a property of the limit of experiments. It is linked to the asymptotic behavior in measure of differentially private estimators. We demonstrate that particular instances of regression discontinuity design and average treatment effect estimation may be problematic for inference with differential privacy because their estimators can only be ensured to converge weakly, with their asymptotic limit remaining random, and thus may not be estimated consistently. This result is clearly supported by our simulation evidence. Our analysis suggests that many other estimators that rely on nuisance parameters may have similar properties under the requirement of differential privacy. 
Date:  2020–06 
URL:  http://d.repec.org/n?u=RePEc:arx:papers:2006.14732&r=all 
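The query-randomization step described in the abstract is typically implemented with the standard Laplace mechanism; a minimal sketch (the function name and interface are ours):

```python
import numpy as np

def laplace_mechanism(query_value, sensitivity, epsilon, seed=None):
    """Release an epsilon-differentially-private answer to a numeric
    query by adding Laplace noise with scale sensitivity/epsilon.
    'sensitivity' is the maximum change in the query from altering one
    individual's record."""
    rng = np.random.default_rng(seed)
    return float(query_value + rng.laplace(0.0, sensitivity / epsilon))
```

Because every released statistic carries noise of this kind, estimators built from privatized queries inherit extra finite-sample randomness, which is the source of the inconsistency results the paper establishes.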
By:  Jacob Boudoukh; Ronen Israel; Matthew P. Richardson 
Abstract:  Analogous to Stambaugh (1999), this paper derives the small-sample bias of estimators in J-horizon predictive regressions, providing a plug-in adjustment for these estimators. A number of surprising results emerge, including (i) a higher bias for overlapping than non-overlapping regressions despite the greater number of observations, and (ii) a particularly higher bias for an alternative long-horizon predictive regression commonly advocated in the literature. For large J, the bias is linear in (J/T) with a slope that depends on the predictive variable’s persistence. The bias adjustment substantially reduces the existing magnitude of long-horizon estimates of predictability. 
JEL:  C01 C1 C22 C53 C58 G12 G17 
Date:  2020–06 
URL:  http://d.repec.org/n?u=RePEc:nbr:nberwo:27410&r=all 
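A J-horizon overlapping predictive regression of the kind analyzed here regresses the moving sum of J one-period returns on a lagged predictor; a minimal sketch of the construction (names are ours, and the paper's bias adjustment itself is not reproduced):

```python
import numpy as np

def overlapping_horizon_returns(r, J):
    """J-period overlapping cumulative returns from one-period returns r:
    y_t = r_t + ... + r_{t+J-1}, one observation per starting date."""
    r = np.asarray(r, dtype=float)
    return np.convolve(r, np.ones(J), mode="valid")  # moving sum of length J

def predictive_slope(y, x):
    """OLS slope from regressing y on a constant and the predictor x;
    it is this slope whose small-sample bias grows with J/T."""
    X = np.column_stack([np.ones(len(x)), np.asarray(x, dtype=float)])
    beta = np.linalg.lstsq(X, np.asarray(y, dtype=float), rcond=None)[0]
    return float(beta[1])
```

Because consecutive y_t share J-1 of their summands, the overlap induces strong serial correlation in the regression errors, which is where the extra bias relative to non-overlapping regressions comes from.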
By:  Cem Cakmakli (Koç University); Yasin Simsek (Koç University) 
Abstract:  This paper extends the canonical model of epidemiology, the SIRD model, to allow for time-varying parameters for real-time measurement of the stance of the COVID-19 pandemic. Time variation in the model parameters is captured using the generalized autoregressive score modelling structure designed for the typically daily count data related to the pandemic. The resulting specification permits a flexible yet parsimonious model structure with a very low computational cost. This is especially crucial at the onset of the pandemic, when data are scarce and uncertainty is abundant. Full-sample results show that countries including the US, Brazil and Russia are still not able to contain the pandemic, with the US having the worst performance. Furthermore, Iran and South Korea are likely to experience a second wave of the pandemic. A real-time exercise shows that the proposed structure delivers timely and precise information on the current stance of the pandemic ahead of competitors that use a rolling window. This, in turn, translates into accurate short-term predictions of the active cases. We further modify the model to allow for unreported cases. Results suggest that the effect of the presence of these cases on the estimation results diminishes towards the end of the sample as the amount of testing increases. 
Keywords:  COVID-19, SIRD, Observation-driven models, Score models, Count data, Time-varying parameters. 
JEL:  C13 C32 C51 I19 
Date:  2020–07 
URL:  http://d.repec.org/n?u=RePEc:koc:wpaper:2013&r=all 
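For reference, one discrete-time step of the constant-parameter SIRD model that the paper takes as its starting point can be written as follows; the paper's extension lets beta, gamma and nu vary over time via a score-driven recursion (variable names are ours):

```python
def sird_step(S, I, R, D, beta, gamma, nu, N):
    """One discrete-time SIRD update for a population of size N:
    beta is the infection rate, gamma the recovery rate and nu the
    death rate (held constant in this sketch)."""
    new_inf = beta * S * I / N   # susceptibles newly infected
    new_rec = gamma * I          # infected who recover
    new_dead = nu * I            # infected who die
    return (S - new_inf,
            I + new_inf - new_rec - new_dead,
            R + new_rec,
            D + new_dead)
```

Note the update conserves the total population S + I + R + D, so the pandemic's stance at any date is summarized by the current compartment shares and the prevailing parameter values.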
By:  Karanasos, Menelaos; Paraskevopoulos, Alexandros; Canepa, Alessandra (University of Turin) 
Abstract:  For the large family of ARMA models with variable coefficients we obtain an explicit and computationally tractable solution that generates all their fundamental properties, including the Wold-Cramér decomposition and their covariance structure, thus unifying the invertibility conditions which guarantee both their asymptotic stability and main properties. The one-sided Green's function, associated with the homogeneous solution, is expressed as a banded Hessenbergian formulated exclusively in terms of the autoregressive parameters of the model. The proposed methodology allows for a unified treatment of these `time-varying' systems. We also illustrate mathematically one of the focal points in Hallin's (1986) analysis, namely that in a time-varying setting the backward asymptotic efficiency is different from the forward one. Equally important, it is shown how the linear algebra techniques used to obtain the general solution are equivalent to a simple procedure for manipulating polynomials with variable coefficients. The practical significance of the suggested approach is illustrated with an application to U.S. inflation data. The main finding is that inflation persistence increased after 1976, whereas from 1986 onwards persistence declines and stabilizes at even lower levels than in the pre-1976 period. 
Date:  2020–04 
URL:  http://d.repec.org/n?u=RePEc:uto:dipeco:202008&r=all 
By:  Matteo Iacopini; Francesco Ravazzolo; Luca Rossini 
Abstract:  This paper proposes a novel asymmetric continuous probabilistic score (ACPS) for evaluating and comparing density forecasts. It further defines a weighted version of the score, which emphasizes regions of interest such as the tails or the center of a variable's range. The ACPS is of general use in any situation where the decision maker has asymmetric preferences in the evaluation of forecasts. In an artificial experiment, the implications of varying the level of asymmetry are illustrated. Then, the proposed score is applied to assess and compare density forecasts of macroeconomically relevant datasets (unemployment rate) and of commodity prices (oil and electricity prices), with a particular focus on the recent COVID crisis period. 
Date:  2020–06 
URL:  http://d.repec.org/n?u=RePEc:arx:papers:2006.11265&r=all 
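The symmetric benchmark that the ACPS generalizes is the continuous ranked probability score (CRPS); a sketch of its sample-based form, E|X - y| - 0.5 E|X - X'|, for a forecast density represented by draws (names are ours, and the asymmetric weighting that defines the ACPS is not reproduced):

```python
import numpy as np

def sample_crps(forecast_draws, y):
    """Sample-based CRPS of a density forecast (given as draws from the
    predictive distribution) for realized outcome y; lower is better."""
    x = np.asarray(forecast_draws, dtype=float)
    term1 = np.mean(np.abs(x - y))                       # E|X - y|
    term2 = 0.5 * np.mean(np.abs(x[:, None] - x[None, :]))  # 0.5 * E|X - X'|
    return float(term1 - term2)
```

The CRPS penalizes forecast errors symmetrically around the realization; the ACPS replaces this symmetric treatment with one that can weight, say, the left tail more heavily when the decision maker cares more about downside outcomes.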