
on Econometrics 
By:  Jinyuan Chang; Zhentao Shi; Jia Zhang 
Abstract:  Models defined by moment conditions are at the center of structural econometric estimation, but economic theory is mostly silent about moment selection. A large pool of valid moments can potentially improve estimation efficiency, whereas a few invalid ones may undermine consistency. This paper investigates the empirical likelihood estimation of these moment-defined models in high-dimensional settings. We propose a penalized empirical likelihood (PEL) estimation and show that it achieves the oracle property under which the invalid moments can be consistently detected. While the PEL estimator is asymptotically normally distributed, a projected PEL procedure can further eliminate its asymptotic bias and provide a more accurate normal approximation to the finite sample distribution. Simulation exercises are carried out to demonstrate the excellent numerical performance of these methods in estimation and inference. 
Date:  2021–08 
URL:  http://d.repec.org/n?u=RePEc:arx:papers:2108.03382&r= 
By:  Hanno Reuvers; Etienne Wijler 
Abstract:  We consider sparse estimation of a class of high-dimensional spatio-temporal models. Unlike classical spatial autoregressive models, we do not rely on a predetermined spatial interaction matrix. Instead, under the assumption of sparsity, we estimate the relationships governing both the spatial and temporal dependence in a fully data-driven way by penalizing a set of Yule-Walker equations. While this regularization can be left unstructured, we also propose a customized form of shrinkage to further exploit diagonally structured forms of sparsity that follow intuitively when observations originate from spatial grids such as satellite images. We derive finite sample error bounds for this estimator, as well as estimation consistency in an asymptotic framework wherein the sample size and the number of spatial units diverge jointly. A simulation exercise shows strong finite sample performance compared to competing procedures. As an empirical application, we model satellite-measured NO2 concentrations in London. Our approach delivers forecast improvements over a competitive benchmark, and we discover evidence of strong spatial interactions between subregions. 
Date:  2021–08 
URL:  http://d.repec.org/n?u=RePEc:arx:papers:2108.02864&r= 
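The core idea of the abstract above, penalizing a set of Yule-Walker equations, can be sketched for a simple VAR(1) with a lasso penalty. The function name, proximal-gradient solver, and default tuning values below are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

def penalized_yule_walker(X, lam=0.05, n_iter=500, step=None):
    """Sketch: lasso-penalized Yule-Walker estimation of a VAR(1)
    coefficient matrix A, solving min_A ||G1 - A G0||_F^2 + lam*||A||_1
    by proximal gradient descent (soft-thresholding)."""
    T, N = X.shape
    Xc = X - X.mean(axis=0)
    G0 = Xc[:-1].T @ Xc[:-1] / (T - 1)   # sample autocovariance, lag 0
    G1 = Xc[1:].T @ Xc[:-1] / (T - 1)    # sample autocovariance, lag 1
    if step is None:
        step = 1.0 / (2 * np.linalg.norm(G0, 2) ** 2)  # safe step size
    A = np.zeros((N, N))
    for _ in range(n_iter):
        grad = 2 * (A @ G0 - G1) @ G0
        A = A - step * grad
        A = np.sign(A) * np.maximum(np.abs(A) - step * lam, 0.0)  # prox step
    return A
```

The paper's estimator handles a richer spatio-temporal structure and diagonally structured shrinkage; this sketch only conveys the penalized Yule-Walker principle.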
By:  Stephen Coussens; Jann Spiess 
Abstract:  Instrumental variables (IV) regression is widely used to estimate causal treatment effects in settings where receipt of treatment is not fully random, but there exists an instrument that generates exogenous variation in treatment exposure. While IV can recover consistent treatment effect estimates, these estimates are often noisy. Building upon earlier work in biostatistics (Joffe and Brensinger, 2003) and relating to an evolving literature in econometrics (including Abadie et al., 2019; Huntington-Klein, 2020; Borusyak and Hull, 2020), we study how to improve the efficiency of IV estimates by exploiting the predictable variation in the strength of the instrument. In the case where both the treatment and instrument are binary and the instrument is independent of baseline covariates, we study weighting each observation according to its estimated compliance (that is, its conditional probability of being affected by the instrument), which we motivate from a (constrained) solution of the first-stage prediction problem implicit to IV. The resulting estimator can leverage machine learning to estimate compliance as a function of baseline covariates. We derive the large-sample properties of a specific implementation of a weighted IV estimator in the potential outcomes and local average treatment effect (LATE) frameworks, and provide tools for inference that remain valid even when the weights are estimated nonparametrically. With both theoretical results and a simulation study, we demonstrate that compliance weighting meaningfully reduces the variance of IV estimates when first-stage heterogeneity is present, and that this improvement often outweighs any difference between the compliance-weighted and unweighted IV estimands. These results suggest that in a variety of applied settings, the precision of IV estimates can be substantially improved by incorporating compliance estimation. 
Date:  2021–08 
URL:  http://d.repec.org/n?u=RePEc:arx:papers:2108.03726&r= 
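A minimal sketch of the compliance-weighting idea above, assuming a binary instrument, a binary treatment, and a discrete covariate. The function name and the simple cell-mean compliance estimator are illustrative; the paper allows compliance to be estimated by machine learning:

```python
import numpy as np

def compliance_weighted_iv(y, d, z, x):
    """Sketch of compliance-weighted IV: estimate each covariate cell's
    compliance (first-stage effect of instrument z on treatment d), then
    run IV weighting each observation by its estimated compliance.
    A simplified illustration, not the authors' exact estimator."""
    w = np.empty_like(y, dtype=float)
    for v in np.unique(x):
        m = x == v
        # estimated compliance: P(d=1 | z=1, x) - P(d=1 | z=0, x)
        w[m] = d[m][z[m] == 1].mean() - d[m][z[m] == 0].mean()
    w = np.clip(w, 1e-6, None)            # guard against negative weights
    zt = z - np.average(z, weights=w)     # weighted demeaning of the instrument
    return (w * zt * y).sum() / (w * zt * d).sum()
```

Observations in cells where the instrument barely moves treatment receive little weight, which is the source of the variance reduction.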
By:  Tadao Hoshino; Takahide Yanagi 
Abstract:  In this paper, we investigate a treatment effect model in which individuals interact in a social network and may not comply with the assigned treatments. We introduce a new concept of exposure mapping, which summarizes spillover effects into a fixed-dimensional statistic of instrumental variables, and we call this mapping the instrumental exposure mapping (IEM). We investigate identification conditions for the intention-to-treat effect and the average causal effect for compliers, while explicitly considering the possibility of misspecification of the IEM. Based on our identification results, we develop nonparametric estimation procedures for the treatment parameters. Their asymptotic properties, including consistency and asymptotic normality, are investigated using the approximate neighborhood interference framework of Leung (2021). For an empirical illustration of our proposed method, we revisit Paluck et al.'s (2016) experimental data on the anti-conflict intervention school program. 
Date:  2021–08 
URL:  http://d.repec.org/n?u=RePEc:arx:papers:2108.07455&r= 
By:  Kenwin Maung 
Abstract:  Maximum likelihood estimation of large Markov-switching vector autoregressions (MS-VARs) can be challenging or infeasible due to parameter proliferation. To accommodate situations where dimensionality may be of comparable order to, or exceed, the sample size, we adopt a sparse framework and propose two penalized maximum likelihood estimators with either the Lasso or the smoothly clipped absolute deviation (SCAD) penalty. We show that both estimators are estimation consistent, while the SCAD estimator also selects relevant parameters with probability approaching one. A modified EM algorithm is developed for the case of Gaussian errors, and simulations show that the algorithm exhibits desirable finite sample performance. In an application to short-horizon return predictability in the US, we estimate a 15-variable, 2-state MS-VAR(1) and obtain the often-reported countercyclicality in predictability. The variable selection property of our estimators helps to identify predictors that contribute strongly to predictability during economic contractions but are otherwise irrelevant in expansions. Furthermore, out-of-sample analyses indicate that large MS-VARs can significantly outperform "hard-to-beat" predictors like the historical average. 
Date:  2021–07 
URL:  http://d.repec.org/n?u=RePEc:arx:papers:2107.12552&r= 
By:  Dante Amengual (CEMFI, Centro de Estudios Monetarios y Financieros); Xinyue Bei (Duke University); Enrique Sentana (CEMFI, Centro de Estudios Monetarios y Financieros) 
Abstract:  We propose a multivariate normality test against skew normal distributions using higher-order log-likelihood derivatives which is asymptotically equivalent to the likelihood ratio but only requires estimation under the null. Numerically, it is the supremum of the univariate skewness coefficient test over all linear combinations of the variables. We can simulate its exact finite sample distribution for any multivariate dimension and sample size. Our Monte Carlo exercises confirm its power advantages over alternative approaches. Finally, we apply it to the joint distribution of US city sizes in two consecutive censuses, finding that non-normality is very clearly seen in their growth rates. 
Keywords:  City size distribution, exact test, extremum test, Gibrat's law, skew normal distribution. 
JEL:  C46 R11 
Date:  2021–05 
URL:  http://d.repec.org/n?u=RePEc:cmf:wpaper:wp2021_2104&r= 
By:  Ruoxuan Xiong; Allison Koenecke; Michael Powell; Zhu Shen; Joshua T. Vogelstein; Susan Athey 
Abstract:  Analyzing observational data from multiple sources can be useful for increasing statistical power to detect a treatment effect; however, practical constraints such as privacy considerations may restrict individual-level information sharing across data sets. This paper develops federated methods that only utilize summary-level information from heterogeneous data sets. Our federated methods provide doubly-robust point estimates of treatment effects as well as variance estimates. We derive the asymptotic distributions of our federated estimators, which are shown to be asymptotically equivalent to the corresponding estimators from the combined, individual-level data. We show that to achieve these properties, federated methods should be adjusted based on conditions such as whether models are correctly specified and stable across heterogeneous data sets. 
Date:  2021–07 
URL:  http://d.repec.org/n?u=RePEc:arx:papers:2107.11732&r= 
By:  Guido Imbens; Nathan Kallus; Xiaojie Mao 
Abstract:  We develop a new approach for identifying and estimating average causal effects in panel data under a linear factor model with unmeasured confounders. Compared to other methods tackling factor models, such as synthetic controls and matrix completion, our method does not require the number of time periods to grow infinitely. Instead, we draw inspiration from the two-way fixed effect model as a special case of the linear factor model, where a simple difference-in-differences transformation identifies the effect. We show that analogous, albeit more complex, transformations exist in the more general linear factor model, providing a new means to identify the effect in that model. In fact, many such transformations exist, called bridge functions, all identifying the same causal effect estimand. This poses a unique challenge for estimation and inference, which we solve by targeting the minimal bridge function using a regularized estimation approach. We prove that our resulting average causal effect estimator is root-N consistent and asymptotically normal, and we provide asymptotically valid confidence intervals. Finally, we provide extensions for the case of a linear factor model with time-varying unmeasured confounders. 
Date:  2021–08 
URL:  http://d.repec.org/n?u=RePEc:arx:papers:2108.03849&r= 
By:  Sakae Oya; Teruo Nakatsuma 
Abstract:  Harvey et al. (2010) extended the Bayesian estimation method by Sahu et al. (2003) to a multivariate skew-elliptical distribution with a general skewness matrix, and applied it to Bayesian portfolio optimization with higher moments. Although their method is epochal in the sense that it can handle the skewness dependency among asset returns and incorporate higher moments into portfolio optimization, it cannot identify all elements in the skewness matrix due to label switching in the Gibbs sampler. To deal with this identification issue, we propose to modify their sampling algorithm by imposing a positive lower-triangular constraint on the skewness matrix of the multivariate skew-elliptical distribution, which also improves interpretability. Furthermore, we propose a Bayesian sparse estimation of the skewness matrix with the horseshoe prior to further improve accuracy. In the simulation study, we demonstrate that the proposed method with the identification constraint can successfully estimate the true structure of the skewness dependency, while the existing method suffers from the identification issue. 
Date:  2021–08 
URL:  http://d.repec.org/n?u=RePEc:arx:papers:2108.04019&r= 
By:  Amadou Barry; Karim Oualkacha; Arthur Charpentier 
Abstract:  The fixed-effects model estimates the regressor effects on the mean of the response, which is inadequate to summarize the variable relationships in the presence of heteroscedasticity. In this paper, we adapt the asymmetric least squares (expectile) regression to the fixed-effects model and propose a new model: expectile regression with fixed effects (ERFE). The ERFE model applies the within transformation strategy to concentrate out the incidental parameters and estimates the regressor effects on the expectiles of the response distribution. The ERFE model captures the data heteroscedasticity and eliminates any bias resulting from the correlation between the regressors and the omitted factors. We derive the asymptotic properties of the ERFE estimators and suggest robust estimators of its covariance matrix. Our simulations show that the ERFE estimator is unbiased and outperforms its competitors. Our real data analysis shows its ability to capture data heteroscedasticity (see our R package, github.com/AmBarry/erfe). 
Date:  2021–08 
URL:  http://d.repec.org/n?u=RePEc:arx:papers:2108.04737&r= 
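The abstract's two ingredients, a within transformation and asymmetric least squares, can be sketched as follows. This is a simplified illustration: plain within-demeaning coincides with the ERFE transformation only at tau = 0.5, and the paper's treatment of general tau is more involved; the function name and defaults are assumptions:

```python
import numpy as np

def erfe(y, X, ids, tau=0.5, n_iter=200):
    """Illustrative sketch of expectile regression with fixed effects:
    within-demean y and X by unit, then fit the tau-th expectile by
    iteratively reweighted least squares (asymmetric least squares)."""
    y, X = y.astype(float).copy(), X.astype(float).copy()
    for g in np.unique(ids):           # within transformation per unit
        m = ids == g
        y[m] -= y[m].mean()
        X[m] -= X[m].mean(axis=0)
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    for _ in range(n_iter):
        # weight tau above the fitted expectile, 1 - tau below it
        w = np.where(y - X @ beta < 0, 1 - tau, tau)
        beta = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * y))
    return beta
```

At tau = 0.5 the weights are constant and the procedure reduces to the usual within (fixed-effects) estimator.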
By:  Nicolas Debarsy (LEM  Lille économie management  UMR 9221  UA  Université d'Artois  UCL  Université catholique de Lille  Université de Lille  CNRS  Centre National de la Recherche Scientifique); James Lesage (Texas State University) 
Abstract:  There is a great deal of literature regarding use of non-geographically based connectivity matrices or combinations of geographic and non-geographic structures in spatial econometrics models. We focus on convex combinations of weight matrices that result in a single weight matrix reflecting multiple types of connectivity, where coefficients from the convex combination can be used for inference regarding the relative importance of each type of connectivity. This type of model specification raises the question of which connectivity matrices should be used and which should be ignored. For example, in the case of L candidate weight matrices, there are M = 2^L - L - 1 possible ways to employ two or more of the L weight matrices in alternative model specifications. When L = 5, we have M = 26 possible models involving two or more weight matrices, and for L = 10, M = 1,013. We use Metropolis-Hastings guided Monte Carlo integration during MCMC estimation of the models to produce log-marginal likelihoods and associated posterior model probabilities for the set of M possible models, which allows for Bayesian model-averaged estimates. We focus on MCMC estimation for a set of M models, estimates of posterior model probabilities, model-averaged estimates of the parameters, scalar summary measures of the nonlinear partial derivative impacts, and associated empirical measures of dispersion for the impacts. 
Keywords:  cross-sectional dependence, SAR, block sampling parameters for a convex combination, Markov Chain Monte Carlo estimation, hedonic price model 
Date:  2021 
URL:  http://d.repec.org/n?u=RePEc:hal:journl:hal03046651&r= 
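The model count quoted in the abstract follows from simple subset counting: with L candidate matrices there are 2^L subsets, and the empty set and the L singletons are excluded because a convex combination needs two or more matrices.

```python
def n_models(L):
    """Number of convex-combination models using two or more of the
    L candidate weight matrices: all 2**L subsets minus the empty set
    and the L singletons."""
    return 2 ** L - L - 1
```

This reproduces the abstract's figures: 26 models for L = 5 and 1,013 for L = 10.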
By:  Bastian Schäfer (Paderborn University); Yuanhua Feng (Paderborn University) 
Abstract:  This paper examines data-driven estimation of the mean surface in nonparametric regression for huge functional time series. In this framework, we consider the use of the double conditional smoothing (DCS), an equivalent but much faster translation of the 2D kernel regression. An even faster, but again equivalent, functional DCS (FCDS) scheme and a boundary correction method for the DCS/FCDS are proposed. The asymptotically optimal bandwidths are obtained and selected by an IPI (iterative plug-in) algorithm. We show that the IPI algorithm works well in practice in a simulation study and apply the proposals to estimate the spot volatility and trading volume surface in high-frequency financial data under a functional representation. Our proposals also apply to large lattice spatial or spatial-temporal data from any research area. 
Keywords:  Spatial nonparametric regression, boundary correction, functional double conditional smoothing, bandwidth selection, spot volatility surface 
JEL:  C14 C51 
Date:  2021–08 
URL:  http://d.repec.org/n?u=RePEc:pdn:ciepap:143&r= 
By:  Zhaonan Qu; Ruoxuan Xiong; Jizhou Liu; Guido Imbens 
Abstract:  In many observational studies in social science and medical applications, subjects or individuals are connected, and one unit's treatment and attributes may affect another unit's treatment and outcome, violating the stable unit treatment value assumption (SUTVA) and resulting in interference. To enable feasible inference, many previous works assume the "exchangeability" of interfering units, under which the effect of interference is captured by the number or ratio of treated neighbors. However, in many applications with distinctive units, interference is heterogeneous. In this paper, we focus on the partial interference setting, and restrict units to be exchangeable conditional on observable characteristics. Under this framework, we propose generalized augmented inverse propensity weighted (AIPW) estimators for general causal estimands that include direct treatment effects and spillover effects. We show that they are consistent, asymptotically normal, semiparametric efficient, and robust to heterogeneous interference as well as model misspecifications. We also apply our method to the Add Health dataset and find that smoking behavior exhibits interference on academic outcomes. 
Date:  2021–07 
URL:  http://d.repec.org/n?u=RePEc:arx:papers:2107.12420&r= 
By:  Igor L. Kheifets; Peter C. B. Phillips 
Abstract:  Multicointegration is traditionally defined as a particular long run relationship among variables in a parametric vector autoregressive model that introduces additional cointegrating links between these variables and partial sums of the equilibrium errors. This paper departs from the parametric model, using a semiparametric formulation that reveals the explicit role that singularity of the long run conditional covariance matrix plays in determining multicointegration. The semiparametric framework has the advantage that short run dynamics do not need to be modeled and estimation by standard techniques such as fully modified least squares (FMOLS) on the original I(1) system is straightforward. The paper derives FMOLS limit theory in the multicointegrated setting, showing how faster rates of convergence are achieved in the direction of singularity and that the limit distribution depends on the distribution of the conditional one-sided long run covariance estimator used in FMOLS estimation. Wald tests of restrictions on the regression coefficients have nonstandard limit theory which depends on nuisance parameters in general. The usual tests are shown to be conservative when the restrictions are isolated to the directions of singularity and, under certain conditions, are invariant to singularity otherwise. Simulations show that approximations derived in the paper work well in finite samples. The findings are illustrated empirically in an analysis of fiscal sustainability of the US government over the postwar period. 
Date:  2021–08 
URL:  http://d.repec.org/n?u=RePEc:arx:papers:2108.03486&r= 
By:  Subhadeep Mukhopadhyay 
Abstract:  This article introduces a general statistical modeling principle called "Density Sharpening" and applies it to the analysis of discrete count data. The underlying foundation is based on a new theory of nonparametric approximation and smoothing methods for discrete distributions which play a useful role in explaining and uniting a large class of applied statistical methods. The proposed modeling framework is illustrated using several real applications, from seismology to healthcare to physics. 
Date:  2021–08 
URL:  http://d.repec.org/n?u=RePEc:arx:papers:2108.07372&r= 
By:  Zheng Fang 
Abstract:  This paper makes the following econometric contributions. First, we develop a unifying framework for testing shape restrictions based on the Wald principle. Second, we examine the applicability and usefulness of some prominent shape-enforcing operators in implementing our test, including rearrangement and the greatest convex minorization (or the least concave majorization). In particular, the influential rearrangement operator is inapplicable due to a lack of convexity, while the greatest convex minorization is shown to enjoy the analytic properties required to employ our framework. The importance of convexity in establishing size control has been noted elsewhere in the literature. Third, we show that, although the projection operator may not be well-defined or well-behaved in general non-Hilbert parameter spaces (e.g., ones defined by uniform norms), one may nonetheless devise a powerful distance-based test by applying our framework. The finite sample performance of our test is evaluated through Monte Carlo simulations, and its empirical relevance is showcased by investigating the relationship between weekly working hours and annual wage growth in the high-end labor market. 
Date:  2021–07 
URL:  http://d.repec.org/n?u=RePEc:arx:papers:2107.12494&r= 
By:  Michael Keane (School of Economics); Timothy Neal (UNSW School of Economics) 
Abstract:  We provide a simple survey of the weak instrument literature, aimed at giving practical advice to applied researchers. It is well-known that 2SLS has poor properties if instruments are exogenous but weak. We clarify these properties, explain weak instrument tests, and examine how the behavior of 2SLS depends on instrument strength. A common standard for acceptable instruments is a first-stage F-statistic of at least 10. But 2SLS has poor properties in that context: it has very little power, and generates artificially low standard errors precisely in those samples where it generates estimates most contaminated by endogeneity. This causes standard t-tests to give misleading results. In fact, one-tailed 2SLS t-tests suffer from severe size distortions unless F is in the thousands. Anderson-Rubin and conditional t-tests alleviate this problem, and should be used even with strong instruments. A first-stage F of 50 or more is necessary to give reasonable confidence that 2SLS will outperform OLS. Otherwise, OLS combined with controls for sources of endogeneity may be a superior research strategy to IV. 
Keywords:  Instrumental variables, weak instruments, 2SLS, endogeneity, F-test, size distortions of tests, Anderson-Rubin test, conditional t-test, Fuller, JIVE 
Date:  2021–06 
URL:  http://d.repec.org/n?u=RePEc:swe:wpaper:202105a&r= 
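The two diagnostics the survey above leans on, the first-stage F-statistic and the Anderson-Rubin test, can be sketched for a single endogenous regressor. These are textbook formulations written for illustration, not the authors' code:

```python
import numpy as np
from scipy import stats

def first_stage_F(d, Z):
    """First-stage F-statistic from regressing treatment d on
    instruments Z (a constant is added internally)."""
    n, k = Z.shape
    X = np.column_stack([np.ones(n), Z])
    b = np.linalg.lstsq(X, d, rcond=None)[0]
    e = d - X @ b
    rss = e @ e
    tss = ((d - d.mean()) ** 2).sum()
    return ((tss - rss) / k) / (rss / (n - k - 1))

def anderson_rubin_pvalue(y, d, Z, beta0):
    """Anderson-Rubin test of H0: beta = beta0. Regress y - beta0*d on
    the instruments and test that they are jointly irrelevant; valid
    regardless of instrument strength."""
    F = first_stage_F(y - beta0 * d, Z)
    n, k = Z.shape
    return 1 - stats.f.cdf(F, k, n - k - 1)
```

Inverting the Anderson-Rubin test (collecting the beta0 values that are not rejected) yields the weak-instrument-robust confidence sets the survey recommends.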
By:  Yuanhua Feng (Paderborn University); Bastian Schäfer (Paderborn University) 
Abstract:  This paper discusses the suitable choice of the weighting function at a boundary point in local polynomial regression and introduces two new boundary modi cation methods by adapting known ideas for generating boundary kernels. Now continuous estimates at endpoints are achievable. Under given conditions the use of those quite different weighting functions at an interior point is equivalent. At a boundary point the use of dierent methods will lead to different estimates. It is also shown that the optimal weighting function at the endpoints is a natural extension of one of the optimal weighting functions in the interior. Furthermore, it is shown that the most well known boundary kernels proposed in the literature can be generated by local polynomial regression using corresponding weighting functions. The proposals are particularly useful, when oneside smoothing or de tection of change points in nonparametric regression are considered. 
Keywords:  Local polynomial regression, equivalent weighting methods, boundary modification, boundary kernels, finite sample property 
JEL:  C14 C51 
Date:  2021–08 
URL:  http://d.repec.org/n?u=RePEc:pdn:ciepap:144&r= 
By:  Serena Ng 
Abstract:  The coronavirus is a global event of historical proportions, and in just a few months it changed the time series properties of the data in ways that make many pre-covid forecasting models inadequate. It also creates a new problem for estimation of economic factors and dynamic causal effects, because the variations around the outbreak can be interpreted as outliers, as shifts to the distribution of existing shocks, or as the addition of new shocks. I take the latter view and use covid indicators as controls to 'de-covid' the data prior to estimation. I find that economic uncertainty remains high at the end of 2020 even though real economic activity has recovered and covid uncertainty has receded. Dynamic responses of variables to shocks in a VAR similar in magnitude and shape to the ones identified before 2020 can be recovered by directly or indirectly modeling covid and treating it as exogenous. These responses to economic shocks are distinctly different from those to a covid shock, and distinguishing between the two types of shocks can be important in macroeconomic modeling post-covid. 
JEL:  C18 E0 E32 
Date:  2021–07 
URL:  http://d.repec.org/n?u=RePEc:nbr:nberwo:29060&r= 
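The 'de-covid' step described above, using covid indicators as controls prior to estimation, can be sketched as an OLS residualization. The function name and interface are assumptions; the paper considers richer direct and indirect ways of modeling covid:

```python
import numpy as np

def decovid(X, C):
    """Sketch of 'de-coviding': partial covid indicator controls C
    (n x q) out of the data matrix X (n x p) by OLS with a constant,
    keeping the residuals for subsequent factor estimation."""
    C1 = np.column_stack([np.ones(len(C)), C])
    B = np.linalg.lstsq(C1, X, rcond=None)[0]  # coefficients of X on [1, C]
    return X - C1 @ B                          # residualized data
```

Factor models or VARs would then be estimated on the residualized series rather than the raw data.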
By:  Muhammed Taher Al-Mudafer; Benjamin Avanzi; Greg Taylor; Bernard Wong 
Abstract:  Neural networks offer a versatile, flexible and accurate approach to loss reserving. However, such applications have focused primarily on the (important) problem of fitting accurate central estimates of the outstanding claims. In practice, properties regarding the variability of outstanding claims are equally important (e.g., quantiles for regulatory purposes). In this paper we fill this gap by applying a Mixture Density Network ("MDN") to loss reserving. The approach combines a neural network architecture with a mixture Gaussian distribution to achieve simultaneously an accurate central estimate along with flexible distributional choice. Model fitting is done using a rolling-origin approach. Our approach consistently outperforms the classical over-dispersed model both for central estimates and quantiles of interest, when applied to a wide range of simulated environments of various complexity and specifications. We further extend the MDN approach by proposing two extensions. Firstly, we present a hybrid GLM-MDN approach called "ResMDN". This hybrid approach balances the tractability and ease of understanding of a traditional GLM model on one hand, with the additional accuracy and distributional flexibility provided by the MDN on the other. We show that it can successfully improve the errors of the baseline ccODP, although there is generally a loss of performance when compared to the MDN in the examples we considered. Secondly, we allow for explicit projection constraints, so that actuarial judgement can be directly incorporated into the modelling process. Throughout, we focus on aggregate loss triangles, and show that our methodologies are tractable, and that they outperform traditional approaches even with relatively limited amounts of data. We use both simulated data, to validate properties, and real data, to illustrate and ascertain the practicality of the approaches. 
Date:  2021–08 
URL:  http://d.repec.org/n?u=RePEc:arx:papers:2108.07924&r= 
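The distributional output of an MDN, as described above, is a Gaussian mixture whose parameters are produced by the network; training minimizes the mixture's negative log-likelihood. A minimal sketch of that loss (the function name and array layout are illustrative):

```python
import numpy as np

def mdn_nll(y, pi, mu, sigma):
    """Negative log-likelihood of a Gaussian mixture output layer, the
    loss an MDN minimizes. pi, mu, sigma are (n, K) arrays of mixture
    weights, means and standard deviations produced by the network for
    each of the n observations; y is the (n,) vector of targets."""
    comp = pi * np.exp(-0.5 * ((y[:, None] - mu) / sigma) ** 2) \
           / (sigma * np.sqrt(2 * np.pi))     # per-component densities
    return -np.log(comp.sum(axis=1)).mean()   # mixture density, averaged
```

Because the fitted object is a full predictive density, central estimates and reserve quantiles come from the same model, which is the gap the paper highlights.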
By:  Emmanouil Sfendourakis; Ioane Muni Toke 
Abstract:  A point process model for order flows in limit order books is proposed, in which the conditional intensity is the product of a Hawkes component and a state-dependent factor. In the LOB context, state observations may include the observed imbalance or the observed spread. Full technical details for the computationally efficient estimation of such a process are provided, using either direct likelihood maximization or EM-type estimation. Applications include models for bid and ask market orders, or for upwards and downwards price movements. Empirical results on multiple stocks traded in Euronext Paris underline the benefits of state-dependent formulations for LOB modeling, e.g. in terms of goodness-of-fit to financial data. 
Date:  2021–07 
URL:  http://d.repec.org/n?u=RePEc:arx:papers:2107.12872&r= 
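The intensity described above, a Hawkes component multiplied by a state-dependent factor, can be sketched with an exponential kernel. The argument names and the scalar state factor are illustrative assumptions (in the paper the state may be the observed spread or imbalance regime):

```python
import numpy as np

def intensity(t, history, mu, alpha, beta, state_factor):
    """Sketch of the conditional intensity: a Hawkes component with
    baseline mu and exponential self-excitation kernel alpha*exp(-beta*s),
    multiplied by a factor depending on the current LOB state."""
    past = history[history < t]                       # events before t
    hawkes = mu + alpha * np.exp(-beta * (t - past)).sum()
    return hawkes * state_factor
```

Each past event lifts the intensity and the effect decays at rate beta, while the multiplicative factor shifts the whole intensity up or down with the order book state.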
By:  Jing Tian; Jan P.A.M. Jacobs; Denise R. Osborn 
Abstract:  Multivariate analysis can help to focus on economic phenomena, including trend and cyclical movements. To allow for potential correlation with seasonality, the present paper studies a three-component multivariate unobserved component model, focusing on the case of quarterly data and showing that economic restrictions, including common trends and common cycles, can ensure identification. Applied to seasonal aggregate gender employment in Australia, a bivariate male/female model with a common cycle is preferred to both univariate correlated component and bivariate uncorrelated component specifications. This model evidences distinct gender-based seasonal patterns, with seasonality declining over time for females and increasing for males. 
Keywords:  trend-cycle-seasonal decomposition, multivariate unobserved components models, correlated component models, identification, gender employment, Australia 
JEL:  C22 E24 E32 E37 F01 
Date:  2021–08 
URL:  http://d.repec.org/n?u=RePEc:een:camaaa:202172&r= 
By:  Arie Beresteanu 
Abstract:  We provide a sharp identification region for discrete choice models in which consumers' preferences are not necessarily complete and only aggregate choice data is available to the analysts. Behavior with incomplete preferences is modeled using an upper and a lower utility for each alternative, so that non-comparability can arise. The identification region places intuitive bounds on the probability distribution of upper and lower utilities. We show that the existence of an instrumental variable can be used to reject the hypothesis that all consumers' preferences are complete, while attention sets can be used to rule out the hypothesis that all individuals cannot compare any two alternatives. We apply our methods to data from the 2018 midterm elections in Ohio. 
Date:  2021–01 
URL:  http://d.repec.org/n?u=RePEc:pit:wpaper:7145&r= 