
on Econometrics 
By:  Ulrich Hounyo (University at Albany and CREATES); Kajal Lahiri (University at Albany) 
Abstract:  This paper considers bootstrap inference in model averaging for predictive regressions. We first consider two different types of bootstrap methods in predictive regressions: the standard pairwise bootstrap and the standard fixed-design residual-based bootstrap. We show that these procedures are not valid in the context of model averaging: both induce a bias-related term in the bootstrap variance of averaging estimators. We then propose and justify a fixed-design residual-based bootstrap resampling approach for model averaging. In a local asymptotic framework, we show the validity of the bootstrap in estimating the variance of a combined forecast and the asymptotic covariance matrix of a combined parameter vector with fixed weights. Our proposed method simultaneously preserves, nonparametrically, the cross-sectional dependence between different models and the time series dependence in the errors. The finite sample performance of these methods is assessed via Monte Carlo simulations. We illustrate our approach with an empirical study of the Taylor rule equation with 24 alternative specifications. 
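As a generic illustration of the two standard schemes the abstract contrasts, the sketch below applies a pairwise bootstrap and a fixed-design residual-based bootstrap to a single simulated regression. The data-generating process and sample sizes are arbitrary assumptions; this is not the paper's model-averaging correction.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
x = rng.normal(size=n)
y = 1.0 + 0.5 * x + rng.normal(scale=0.3, size=n)

X = np.column_stack([np.ones(n), x])
beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]
resid = y - X @ beta_hat

B = 500
pairs_betas = np.empty((B, 2))
fixed_betas = np.empty((B, 2))
for b in range(B):
    # Pairwise bootstrap: resample (x_t, y_t) pairs jointly.
    idx = rng.integers(0, n, size=n)
    pairs_betas[b] = np.linalg.lstsq(X[idx], y[idx], rcond=None)[0]
    # Fixed-design residual bootstrap: keep X fixed, resample residuals.
    y_star = X @ beta_hat + rng.choice(resid, size=n, replace=True)
    fixed_betas[b] = np.linalg.lstsq(X, y_star, rcond=None)[0]

se_pairs = pairs_betas.std(axis=0)   # bootstrap standard errors
se_fixed = fixed_betas.std(axis=0)
```

For a single well-specified regression both schemes give similar standard errors; the paper's point is that in model averaging both acquire an extra bias-related term.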
Keywords:  Bootstrap, Local asymptotic theory, Model average estimators, Wild bootstrap, Variance of consensus forecast 
JEL:  C33 C53 C80 
Date:  2021–09–28 
URL:  http://d.repec.org/n?u=RePEc:aah:create:202114&r= 
By:  Pengzhou Wu; Kenji Fukumizu 
Abstract:  As an important problem of causal inference, we discuss the estimation of treatment effects (TEs) under unobserved confounding. Representing the confounder as a latent variable, we propose Intact-VAE, a new variant of variational autoencoder (VAE), motivated by the prognostic score that is sufficient for identifying TEs. Our VAE also naturally yields a representation balanced across treatment groups, using its prior. Experiments on (semi-)synthetic datasets show state-of-the-art performance under diverse settings. Based on the identifiability of our model, further theoretical developments on identification and consistent estimation are also discussed. This paves the way towards principled causal effect estimation by deep neural networks. 
Date:  2021–09 
URL:  http://d.repec.org/n?u=RePEc:arx:papers:2109.15062&r= 
By:  Savi Virolainen 
Abstract:  A new mixture vector autoregressive model based on Gaussian and Student's $t$ distributions is introduced. The G-StMVAR model incorporates conditionally homoskedastic linear Gaussian vector autoregressions and conditionally heteroskedastic linear Student's $t$ vector autoregressions as its mixture components, with mixing weights that, for a $p$th-order model, depend on the full distribution of the preceding $p$ observations. A structural version of the model with a time-varying B-matrix and statistically identified shocks is also proposed. We derive the stationary distribution of $p+1$ consecutive observations and show that the process is ergodic. It is also shown that the maximum likelihood estimator is strongly consistent and thereby has the conventional limiting distribution under conventional high-level conditions. 
Date:  2021–09 
URL:  http://d.repec.org/n?u=RePEc:arx:papers:2109.13648&r= 
By:  Anish Agarwal; Munther Dahleh; Devavrat Shah; Dennis Shen 
Abstract:  Matrix completion is the study of recovering an underlying matrix from a sparse subset of noisy observations. Traditionally, it is assumed that the entries of the matrix are "missing completely at random" (MCAR), i.e., each entry is revealed at random, independent of everything else, with uniform probability. This is likely unrealistic due to the presence of "latent confounders", i.e., unobserved factors that determine both the entries of the underlying matrix and the missingness pattern in the observed matrix. For example, in the context of movie recommender systems (a canonical application for matrix completion), a user who vehemently dislikes horror films is unlikely to ever watch horror films. In general, these confounders yield "missing not at random" (MNAR) data, which can severely impact any inference procedure that does not correct for this bias. We develop a formal causal model for matrix completion through the language of potential outcomes, and provide novel identification arguments for a variety of causal estimands of interest. We design a procedure, which we call "synthetic nearest neighbors" (SNN), to estimate these causal estimands. We prove finite-sample consistency and asymptotic normality of our estimator. Our analysis also leads to new theoretical results for the matrix completion literature. In particular, we establish entrywise (i.e., max-norm) finite-sample consistency and asymptotic normality results for matrix completion with MNAR data. As a special case, this also provides entrywise bounds for matrix completion with MCAR data. Across simulated and real data, we demonstrate the efficacy of our proposed estimator. 
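For context, a minimal "hard-impute" baseline under the MCAR assumption the abstract criticizes can be sketched as follows; the low-rank setup and iteration count are illustrative assumptions, and this is not the authors' SNN estimator.

```python
import numpy as np

rng = np.random.default_rng(1)
n, m, r = 60, 40, 2
U = rng.normal(size=(n, r))
V = rng.normal(size=(m, r))
M = U @ V.T                      # low-rank ground truth
mask = rng.random((n, m)) < 0.5  # MCAR: each entry observed w.p. 0.5

# Hard-impute: alternate a rank-r SVD projection with re-imposing
# the observed entries until the imputed matrix stabilizes.
X = np.where(mask, M, 0.0)
for _ in range(200):
    u, s, vt = np.linalg.svd(X, full_matrices=False)
    X_low = (u[:, :r] * s[:r]) @ vt[:r]
    X = np.where(mask, M, X_low)

err = np.abs(X_low - M).max()    # entrywise (max-norm) recovery error
```

Under MNAR missingness the mask depends on M itself, and this naive projection becomes biased, which is the gap the paper's causal framing addresses.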
Date:  2021–09 
URL:  http://d.repec.org/n?u=RePEc:arx:papers:2109.15154&r= 
By:  Hugo Freeman; Martin Weidner 
Abstract:  This paper studies linear panel regression models in which the unobserved error term is an unknown smooth function of two-way unobserved fixed effects. In standard additive or interactive fixed effect models the individual-specific and time-specific effects are assumed to enter with a known functional form (additive or multiplicative), while we allow this functional form to be more general and unknown. We discuss two different estimation approaches that allow consistent estimation of the regression parameters in this setting as the number of individuals and the number of time periods grow to infinity. The first approach uses the interactive fixed effect estimator in Bai (2009), which is still applicable here, as long as the number of factors in the estimation grows asymptotically. The second approach first discretizes the two-way unobserved heterogeneity (similar to what Bonhomme, Lamadon and Manresa 2017 do for one-way heterogeneity) and then estimates a simple linear fixed effect model with additive two-way grouped fixed effects. For both estimation methods we obtain asymptotic convergence results, perform Monte Carlo simulations, and employ the estimators in an empirical application to UK house price data. 
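The second (discretization) approach can be sketched in a toy form: group units, then absorb group-by-time effects. Quantile grouping on outcome means is a crude stand-in for the k-means-style discretization, so a small discretization bias remains; the design, group count, and smooth function are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(8)
N, T, G = 200, 50, 5
alpha = rng.normal(size=N)                 # unit heterogeneity
gamma = rng.normal(size=T)                 # time heterogeneity
h = np.sin(alpha[:, None] + gamma[None, :])  # unknown smooth two-way effect
x = rng.normal(size=(N, T)) + 0.5 * h
y = 2.0 * x + h + rng.normal(size=(N, T))  # true beta = 2

# Discretize units into G groups by quantiles of their outcome means
# (a crude stand-in for k-means grouping), then demean within each
# (group, time) cell to absorb the grouped two-way heterogeneity.
edges = np.quantile(y.mean(axis=1), np.linspace(0, 1, G + 1)[1:-1])
groups = np.searchsorted(edges, y.mean(axis=1))

xd, yd = x.copy(), y.copy()
for g in range(G):
    msk = groups == g
    xd[msk] -= x[msk].mean(axis=0)
    yd[msk] -= y[msk].mean(axis=0)

beta = (xd.ravel() @ yd.ravel()) / (xd.ravel() @ xd.ravel())
```

With coarse groups the estimate sits near, but not exactly at, the true coefficient; the paper's asymptotics let the number of groups grow with the sample.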
Date:  2021–09 
URL:  http://d.repec.org/n?u=RePEc:arx:papers:2109.11911&r= 
By:  Loh, Wen Wei; Ren, Dongning 
Abstract:  Valid inference of cause-and-effect relations in observational studies necessitates adjusting for common causes of the focal predictor (i.e., treatment) and the outcome. When such common causes, henceforth termed confounders, remain unadjusted for, they generate spurious correlations that lead to biased causal effect estimates. But routine adjustment for all available covariates, when only a subset are truly confounders, is known to yield potentially inefficient and unstable estimators. In this article, we introduce a data-driven confounder selection strategy that focuses on stable estimation of the treatment effect. The approach exploits the causal knowledge that after adjusting for confounders to eliminate all confounding biases, adding any remaining non-confounding covariates associated with only treatment or outcome, but not both, should not systematically change the effect estimator. The strategy proceeds in two steps. First, we prioritize covariates for adjustment by probing how strongly each covariate is associated with treatment and outcome. Next, we gauge the stability of the effect estimator by evaluating its trajectory adjusting for different covariate subsets. The smallest subset that yields a stable effect estimate is then selected. Thus, the strategy offers direct insight into the (in)sensitivity of the effect estimator to the chosen covariates for adjustment. The ability to correctly select confounders and yield valid causal inference following data-driven covariate selection is evaluated empirically using extensive simulation studies. Furthermore, we compare the proposed method empirically with routine variable selection methods. Finally, we demonstrate the procedure using two publicly available real-world datasets. 
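A stylized version of the two-step stability idea (prioritize covariates by their association with both treatment and outcome, then track the effect estimate along the adjustment path) can be sketched as follows; the scoring rule and data-generating process are simplifying assumptions, not the authors' exact procedure.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 2000
c = rng.normal(size=n)   # true confounder (affects treatment and outcome)
z = rng.normal(size=n)   # affects treatment only
w = rng.normal(size=n)   # affects outcome only
a = 0.8 * c + 0.5 * z + rng.normal(size=n)             # treatment
y = 1.0 * a + 1.2 * c + 0.7 * w + rng.normal(size=n)   # true effect = 1

def effect(covs):
    """OLS coefficient of a on y adjusting for the given covariates."""
    X = np.column_stack([np.ones(n), a] + covs)
    return np.linalg.lstsq(X, y, rcond=None)[0][1]

# Step 1: prioritize by a crude joint association score with a and y.
cands = {"c": c, "z": z, "w": w}
score = {k: abs(np.corrcoef(v, a)[0, 1]) * abs(np.corrcoef(v, y)[0, 1])
         for k, v in cands.items()}
order = sorted(cands, key=score.get, reverse=True)

# Step 2: trajectory of the effect estimate along the adjustment path.
path, covs = [effect([])], []
for k in order:
    covs.append(cands[k])
    path.append(effect(covs))
```

The unadjusted estimate is confounded; once the true confounder enters, the trajectory stabilizes near the true effect and the remaining covariates barely move it.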
Date:  2021–09–24 
URL:  http://d.repec.org/n?u=RePEc:osf:osfxxx:yve6u&r= 
By:  David C. Mallinson 
Abstract:  Sibling fixed effects (FE) models are useful for estimating causal treatment effects while offsetting unobserved siblinginvariant confounding. However, treatment estimates are biased if an individual's outcome affects their sibling's outcome. We propose a robustness test for assessing the presence of outcometooutcome interference in linear twosibling FE models. We regress a gainscorethe difference between siblings' continuous outcomeson both siblings' treatments and on a pretreatment observed FE. Under certain restrictions, the observed FE's partial regression coefficient signals the presence of outcometooutcome interference. Monte Carlo simulations demonstrated the robustness test under several models. We found that an observed FE signaled outcometooutcome spillover if it was directly associated with an siblinginvariant confounder of treatments and outcomes, directly associated with a sibling's treatment, or directly and equally associated with both siblings' outcomes. However, the robustness test collapsed if the observed FE was directly but differentially associated with siblings' outcomes or if outcomes affected siblings' treatments. 
Date:  2021–09 
URL:  http://d.repec.org/n?u=RePEc:arx:papers:2109.13399&r= 
By:  Michael Keane (School of Economics); Timothy Neal (UNSW School of Economics) 
Abstract:  We provide a simple survey of the literature on weak instruments, aimed at giving practical advice to applied researchers. It is well-known that 2SLS has poor properties if instruments are exogenous but weak. We clarify these properties, explain weak instrument tests, and examine how the behavior of 2SLS depends on instrument strength. A common standard for acceptable instruments is a first-stage F-statistic of at least 10. But 2SLS has poor properties in that context: it has very little power, and it generates artificially low standard errors precisely in those samples where it generates estimates most contaminated by endogeneity. This causes t-tests to give misleading results. In fact, the distribution of t-statistics is highly non-normal unless F is in the thousands. Anderson-Rubin and conditional t-tests greatly alleviate this problem, and should be used even with strong instruments. A first-stage F well above 10 is necessary to give high confidence that 2SLS will outperform OLS. Otherwise, OLS combined with controls for sources of endogeneity may be a superior research strategy to IV. 
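With a single instrument, the first-stage F-statistic and the Anderson-Rubin statistic both reduce to squared t-statistics from auxiliary regressions, which can be sketched as follows; the simulated design and coefficients are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 1000
z = rng.normal(size=n)                   # instrument
u = rng.normal(size=n)                   # unobserved confounder
x = 0.5 * z + u + rng.normal(size=n)     # endogenous regressor (first stage)
y = 1.0 * x + u + rng.normal(size=n)     # structural equation, true beta = 1

Z = np.column_stack([np.ones(n), z])

def tsq_on_z(dep):
    """Squared t-statistic on the z coefficient in an OLS of dep on (1, z)."""
    g = np.linalg.lstsq(Z, dep, rcond=None)[0]
    r = dep - Z @ g
    se = np.sqrt((r @ r / (n - 2)) * np.linalg.inv(Z.T @ Z)[1, 1])
    return (g[1] / se) ** 2

# First-stage F (one instrument, so F = t^2 on the instrument).
F_first = tsq_on_z(x)

def ar_stat(beta0):
    """Anderson-Rubin statistic for H0: beta = beta0 (one instrument)."""
    return tsq_on_z(y - beta0 * x)
```

Under the true value the AR statistic behaves like a chi-squared(1) draw regardless of instrument strength, which is why AR inference stays valid where the 2SLS t-test does not.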
Keywords:  Instrumental variables, weak instruments, 2SLS, endogeneity, F-test, size distortions of tests, Anderson-Rubin test, conditional t-test, Fuller, JIVE 
Date:  2021–08 
URL:  http://d.repec.org/n?u=RePEc:swe:wpaper:202105b&r= 
By:  Tim Janke; Mohamed Ghanmi; Florian Steinke 
Abstract:  Copulas are a powerful tool for modeling multivariate distributions, as they allow the univariate marginal distributions and the joint dependency structure to be estimated separately. However, known parametric copulas offer limited flexibility, especially in high dimensions, while commonly used nonparametric methods suffer from the curse of dimensionality. A popular remedy is to construct a tree-based hierarchy of conditional bivariate copulas. In this paper, we propose a flexible yet conceptually simple alternative based on implicit generative neural networks. The key challenge is to ensure the marginal uniformity of the estimated copula distribution. We achieve this by learning a multivariate latent distribution with unspecified marginals but the desired dependency structure. By applying the probability integral transform, we can then obtain samples from the high-dimensional copula distribution without relying on parametric assumptions or the need to find a suitable tree structure. Experiments on synthetic and real data from finance, physics, and image generation demonstrate the performance of this approach. 
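The marginal-uniformity requirement and the probability integral transform can be illustrated with empirical ranks (pseudo-observations); this generic construction is not the authors' generative-network estimator, and the simulated marginals are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 5000
# Dependent data with non-Gaussian marginals: transform a correlated Gaussian.
z = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.8], [0.8, 1.0]], size=n)
x = np.column_stack([np.exp(z[:, 0]), z[:, 1] ** 3])

# Probability integral transform via empirical ranks: each margin maps to (0,1).
ranks = x.argsort(axis=0).argsort(axis=0)
u = (ranks + 1) / (n + 1)          # pseudo-observations of the copula

# The monotone marginal transforms do not disturb the rank dependence.
rho = np.corrcoef(u.T)[0, 1]       # Spearman-type correlation of the copula
```

Each transformed margin is (discretely) uniform by construction, while the dependency structure survives, which is exactly the separation of marginals and dependence that a copula encodes.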
Date:  2021–09 
URL:  http://d.repec.org/n?u=RePEc:arx:papers:2109.14567&r= 
By:  Milian Bachem; Lerby Ergun; Casper de Vries 
Abstract:  Scaling behavior measured in cross-sectional studies through the tail index of a power law is prone to bias. This hampers inference; in particular, time variation in estimated tail indices may be erroneous. In the case of a linear factor model, the factor biases the tail indices in the left and right tails in opposite directions. This fact can be exploited to reduce the bias. We show how the bias arises from the factor, how to remedy it, and how to apply our methods to financial data and geographic location data. 
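The tail index in question is typically estimated with the Hill estimator, which can be computed separately for the left and right tails as sketched below; the Student-t design and the choice of k are illustrative assumptions, and the factor-bias correction itself is not shown.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 100_000
# Student-t(3) returns: both tails follow a power law with tail index ~3.
r = rng.standard_t(df=3, size=n)

def hill(sample, k):
    """Hill estimator of the tail index from the k largest observations."""
    s = np.sort(sample)[-k:]           # top-k order statistics, s[0] = threshold
    return 1.0 / np.mean(np.log(s / s[0]))

k = 1000
alpha_right = hill(r[r > 0], k)        # right-tail index
alpha_left = hill(-r[r < 0], k)        # left-tail index (reflect the left tail)
```

In the paper's linear factor setting the two estimates would be pushed in opposite directions by the common factor, which is the asymmetry their correction exploits.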
Keywords:  Econometric and statistical methods 
JEL:  C01 C14 C58 
Date:  2021–09 
URL:  http://d.repec.org/n?u=RePEc:bca:bocawp:2145&r= 
By:  Anik Burman; Sayantan Banerjee 
Abstract:  We consider the problem of optimizing a portfolio of financial assets, where the number of assets can be much larger than the number of observations. The optimal portfolio weights require estimating the inverse covariance matrix of excess asset returns, classical estimators of which behave badly in high-dimensional scenarios. We propose to use a regression-based joint shrinkage method for estimating the partial correlations among the assets. Extensive simulation studies illustrate the superior performance of the proposed method with respect to variance, weight, and risk estimation errors compared with competing methods, for both global minimum variance portfolios and Markowitz mean-variance portfolios. We also demonstrate the excellent empirical performance of our method on daily and monthly returns of the components of the S&P 500 index. 
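For reference, the global minimum variance weights that any covariance (or inverse-covariance) estimate feeds into are w = S^{-1} 1 / (1' S^{-1} 1). A low-dimensional sketch with the plain sample covariance follows; this is the classical plug-in the paper improves upon, not its shrinkage estimator, and the return-generating process is an arbitrary assumption.

```python
import numpy as np

rng = np.random.default_rng(6)
T, p = 500, 10
# Simulated excess returns with heterogeneous variances.
returns = rng.multivariate_normal(
    np.zeros(p), np.diag(np.linspace(1.0, 4.0, p)), size=T)

# Global minimum variance weights: w = S^{-1} 1 / (1' S^{-1} 1).
S = np.cov(returns, rowvar=False)
ones = np.ones(p)
w = np.linalg.solve(S, ones)
w /= ones @ w

port_var = w @ S @ w    # in-sample portfolio variance
```

When p approaches or exceeds T, S becomes ill-conditioned or singular and `solve` amplifies estimation noise in the weights, which is the high-dimensional failure the abstract refers to.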
Date:  2021–09 
URL:  http://d.repec.org/n?u=RePEc:arx:papers:2109.13633&r= 
By:  Sarun Kamolthip 
Abstract:  This paper demonstrates the potential of the long short-term memory (LSTM) network when applied to macroeconomic time series data sampled at different frequencies. We first present how the conventional LSTM model can be adapted to time series observed at mixed frequencies when the same mismatch ratio applies to all pairs of low-frequency output and higher-frequency variable. To generalize the LSTM to the case of multiple mismatch ratios, we adopt the unrestricted Mixed Data Sampling (U-MIDAS) scheme (Foroni et al., 2015) into the LSTM architecture. We assess the out-of-sample predictive performance via both Monte Carlo simulations and an empirical application. Our proposed models outperform the restricted MIDAS model even in a setup favorable to the MIDAS estimator. As a real-world application, we study forecasting the quarterly growth rate of Thai real GDP using a vast array of macroeconomic indicators, both quarterly and monthly. Our LSTM with the U-MIDAS scheme easily beats the simple benchmark AR(1) model at all horizons, but outperforms the strong benchmark univariate LSTM only at one and six months ahead. Nonetheless, we find that our proposed model can be very helpful for short-term forecasts in periods of large economic downturns. Simulation and empirical results support the use of our proposed LSTM with the U-MIDAS scheme in nowcasting applications. 
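The basic frequency alignment behind U-MIDAS-style models, stacking the m within-quarter monthly values as separate unrestricted regressors, can be sketched as follows; the toy series and mismatch ratio are illustrative assumptions, and the LSTM layer itself is omitted.

```python
import numpy as np

# Toy series: 8 quarters of a low-frequency target and the matching
# 24 monthly observations of one high-frequency indicator (ratio m = 3).
quarterly_target = np.arange(8, dtype=float)
monthly_indicator = np.arange(24, dtype=float)
m = 3

# U-MIDAS alignment: each row holds the m within-quarter monthly values,
# each of which receives its own unrestricted coefficient (no lag polynomial).
X = monthly_indicator.reshape(-1, m)   # shape (8, 3)
y = quarterly_target
```

A restricted MIDAS model would instead tie the m columns together through a low-dimensional lag polynomial; the unrestricted scheme leaves them free, which is what the paper carries into the LSTM input layer.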
Date:  2021–09 
URL:  http://d.repec.org/n?u=RePEc:arx:papers:2109.13777&r= 
By:  Ioanna Arkoudi; Carlos Lima Azevedo; Francisco C. Pereira 
Abstract:  This study proposes a novel approach that combines theory- and data-driven choice models using Artificial Neural Networks (ANNs). In particular, we use continuous vector representations, called embeddings, to encode categorical or discrete explanatory variables, with a special focus on interpretability and model transparency. Although embedding representations within the logit framework were conceptualized by Camara (2019), their dimensions do not have an absolute definitive meaning and hence offer limited behavioral insight. The novelty of our work lies in enforcing interpretability of the embedding vectors by formally associating each of their dimensions with a choice alternative. Thus, our approach brings benefits well beyond a simple parsimonious improvement over dummy encoding, as it provides behaviorally meaningful outputs that can be used in travel demand analysis and policy decisions. Additionally, in contrast to previously suggested ANN-based Discrete Choice Models (DCMs) that either sacrifice interpretability for performance or are only partially interpretable, our models preserve the interpretability of the utility coefficients for all input variables despite being based on ANN principles. The proposed models were tested on two real-world datasets and evaluated against benchmark and baseline models that use dummy encoding. The results of the experiments indicate that our models deliver state-of-the-art predictive performance, outperforming existing ANN-based models while drastically reducing the number of required network parameters. 
Date:  2021–09 
URL:  http://d.repec.org/n?u=RePEc:arx:papers:2109.12042&r= 
By:  Anthony D. Hall; Annastiina Silvennoinen (NCER, Queensland University of Technology); Timo Teräsvirta (Aarhus University, CREATES, C.A.S.E, HumboldtUniversität zu Berlin) 
Abstract:  This paper examines changes in the correlations of daily returns between the four major banks in Australia. The findings are important to investors, and also to government, given the large proportion of the highly concentrated financial sector that relies on the stability of the Big Four. For this purpose, a methodology for building Multivariate Time-Varying STCC-GARCH models is developed. The novel contributions in this area are the specification tests related to the correlation component, the extension of the general model to allow for additional correlation regimes, and a detailed exposition of the systematic, improved modelling cycle required for such nonlinear models. An R package implements the steps of the modelling cycle. Simulations demonstrate the robustness of the recommended model-building approach. The empirical analysis reveals an increase in the correlations of Australia's four largest banks that coincides with the stagnation of the home loan market, technology changes, the mining boom, and Basel II alignment, increasing the exposure of the Australian financial sector to shocks. 
Keywords:  Unconditional correlation, modelling volatility, modelling correlations, multivariate autoregressive conditional heteroskedasticity 
JEL:  C32 C52 C58 
Date:  2021–09–28 
URL:  http://d.repec.org/n?u=RePEc:aah:create:202113&r= 
By:  Lena Janys 
Abstract:  It is widely accepted that women are underrepresented in academia in general, and in economics in particular. This paper introduces a test to detect an under-researched form of hiring bias: implicit quotas. Under the null of random hiring, I derive a test that requires no information about individual hires under some assumptions. I derive the asymptotic distribution of this test statistic and, as an alternative, propose a parametric bootstrap procedure that samples from the exact distribution. The test can be used to analyze a variety of other hiring settings. I analyze the distribution of female professors at German universities across 50 different disciplines. I show that the distribution of women, given the average number of women in the respective field, is highly unlikely to result from a random allocation of women across departments and is more likely to stem from an implicit quota of one or two women at the department level. I also show that a large part of the variation in the share of women across STEM and non-STEM disciplines could be explained by a two-women quota at the department level. These findings have important implications for the potential effectiveness of policies aimed at reducing underrepresentation, and they provide evidence on how stakeholders perceive and evaluate diversity. 
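The parametric bootstrap logic can be sketched with a stylized statistic: under a hard quota, counts of women barely vary across departments, while random hiring implies binomial variation. The variance statistic and the parameters below are illustrative assumptions, not the paper's exact test.

```python
import numpy as np

rng = np.random.default_rng(7)
D, hires, p = 50, 20, 0.2   # departments, positions each, share of women

# "Observed" data generated under an implicit two-women quota per department.
observed = np.full(D, 2)

def stat(counts):
    """Cross-department variance of the counts of women."""
    return counts.var()

# Parametric bootstrap under the null of random hiring: the number of women
# per department is Binomial(hires, p).
boot = np.array([stat(rng.binomial(hires, p, size=D)) for _ in range(2000)])

# A quota implies too little variation, so the one-sided p-value asks how
# often random hiring produces variance as small as the observed one.
pval = np.mean(boot <= stat(observed))
```

Here the observed variance is zero by construction, far below anything random hiring generates, so the null of random allocation is rejected.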
Date:  2021–09 
URL:  http://d.repec.org/n?u=RePEc:arx:papers:2109.14343&r= 
By:  Evan Munro; Stefan Wager; Kuang Xu 
Abstract:  In evaluating social programs, it is important to measure treatment effects within a market economy, where interference arises due to individuals buying and selling various goods at the prevailing market price. We introduce a stochastic model of potential outcomes in market equilibrium, where the market price is an exposure mapping. We prove that average direct and indirect treatment effects converge to interpretable mean-field treatment effects, and provide estimators for these effects through a unit-level randomized experiment augmented with randomization in prices. We also provide a central limit theorem for the estimators that depends on the sensitivity of outcomes to prices. For a variant where treatments are continuous, we show that the sum of direct and indirect effects converges to the total effect of a marginal policy change. We illustrate the coverage and consistency properties of the estimators in simulations of different interventions in a two-sided market. 
Date:  2021–09 
URL:  http://d.repec.org/n?u=RePEc:arx:papers:2109.11647&r= 