
on Econometrics 
By:  Sayoni Roychowdhury; Indrila Ganguly; Abhik Ghosh 
Abstract:  In order to evaluate the impact of a policy intervention on a group of units over time, it is important to correctly estimate the average treatment effect (ATE). Due to the lack of robustness of existing procedures for estimating the ATE from panel data, in this paper we introduce a robust estimator of the ATE, and the subsequent inference procedures, using the popular approach of minimum density power divergence inference. Asymptotic properties of the proposed ATE estimator are derived and used to construct robust test statistics for testing parametric hypotheses related to the ATE. Besides asymptotic analyses of efficiency and power, extensive simulation studies are conducted to study the finite-sample performance of our proposed estimation and testing procedures under both pure and contaminated data. The robustness of the ATE estimator is further investigated theoretically through influence function analyses. Finally, our proposal is applied to study the long-term economic effects of the 2004 Indian Ocean earthquake and tsunami on the (per-capita) gross domestic product (GDP) of five of the most affected countries, namely Indonesia, Sri Lanka, Thailand, India and the Maldives. 
Date:  2021–12 
URL:  http://d.repec.org/n?u=RePEc:arx:papers:2112.13228&r= 
By:  Doko Tchatoka, Firmin; Wang, Wenjie 
Abstract:  Pretesting for exogeneity has become routine in many empirical applications involving instrumental variables, to decide whether the ordinary least squares or the two-stage least squares (2SLS) method is appropriate. Guggenberger (2010) shows that the second-stage t-test (based on the outcome of a Durbin-Wu-Hausman-type pretest for exogeneity in the first stage) has extreme size distortion, with asymptotic size equal to 1 when the standard asymptotic critical values are used. In this paper, we first show that, both conditionally and unconditionally on the data, the standard wild bootstrap procedures are invalid for the two-stage testing and for a closely related shrinkage method, and are therefore not viable solutions to this size-distortion problem. Then, we propose a novel size-corrected wild bootstrap approach, which combines certain wild bootstrap critical values with an appropriate size-correction method. We establish uniform validity of this procedure under either conditional heteroskedasticity or clustering, in the sense that the resulting tests achieve correct asymptotic size. Monte Carlo simulations confirm our theoretical findings. In particular, our proposed method has remarkable power gains over the standard 2SLS-based t-test in many settings, especially when identification is not strong. 
Keywords:  DWH Pretest; Shrinkage; Instrumental Variable; Asymptotic Size; Wild Bootstrap; Bonferroni-based Size-correction; Clustering. 
JEL:  C12 C13 C26 
Date:  2021–11–29 
URL:  http://d.repec.org/n?u=RePEc:pra:mprapa:110899&r= 
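The wild bootstrap that the paper above builds on can be sketched in a few lines. The following is a minimal simulated illustration, not the paper's size-corrected procedure: a standard restricted wild bootstrap with Rademacher multipliers for the heteroskedasticity-robust t-test of a hypothetical null beta1 = 0.5 in an OLS regression. All data, sample sizes, and parameter values are invented for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated sketch: wild bootstrap with Rademacher weights for the
# HC0-robust t-test of H0: beta1 = 0.5 (illustrative values throughout).
n, B, beta1_null = 200, 499, 0.5
x = rng.normal(size=n)
y = beta1_null * x + np.abs(x) * rng.normal(size=n)   # heteroskedastic errors
X = np.column_stack([np.ones(n), x])

def ols_slope_se(X, y):
    """OLS coefficients and the HC0 (White) standard error of the slope."""
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    u = y - X @ beta
    V = XtX_inv @ (X.T @ (X * (u ** 2)[:, None])) @ XtX_inv
    return beta, np.sqrt(V[1, 1])

beta_hat, se = ols_slope_se(X, y)
t_obs = (beta_hat[1] - beta1_null) / se

beta_r = beta_hat.copy()                  # impose the null before resampling
beta_r[1] = beta1_null
u_r = y - X @ beta_r

t_star = np.empty(B)
for b in range(B):
    w = rng.choice([-1.0, 1.0], size=n)   # Rademacher multipliers
    b_hat, se_b = ols_slope_se(X, X @ beta_r + u_r * w)
    t_star[b] = (b_hat[1] - beta1_null) / se_b

p_value = np.mean(np.abs(t_star) >= np.abs(t_obs))
```

The paper's contribution is precisely that this standard procedure fails after a DWH pretest and must be combined with a size-correction step.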
By:  Aknouche, Abdelhakim; Dimitrakopoulos, Stefanos 
Abstract:  We propose a multiplicative autoregressive conditional proportion (ARCP) model for (0,1)-valued time series, in the spirit of GARCH (generalized autoregressive conditional heteroscedastic) and ACD (autoregressive conditional duration) models. In particular, our underlying process is defined as the product of a (0,1)-valued iid sequence and the inverted conditional mean, which, in turn, depends on past reciprocal observations in such a way that it is larger than unity. The probability structure of the model is studied in the context of stochastic recurrence equation theory, while estimation of the model parameters is performed by the exponential quasi-maximum likelihood estimator (EQMLE). The consistency and asymptotic normality of the EQMLE are both established under general regularity assumptions. Finally, the usefulness of our proposed model is illustrated with simulated data and two real datasets. 
Keywords:  Proportional time series data, Beta-ARMA model, Simplex ARMA, Autoregressive conditional duration, Exponential QMLE. 
JEL:  C13 C22 C25 C46 C51 C58 
Date:  2021–12–06 
URL:  http://d.repec.org/n?u=RePEc:pra:mprapa:110954&r= 
By:  Victor Chernozhukov; Carlos Cinelli; Whitney Newey; Amit Sharma; Vasilis Syrgkanis 
Abstract:  We derive general, yet simple, sharp bounds on the size of the omitted variable bias for a broad class of causal parameters that can be identified as linear functionals of the conditional expectation function of the outcome. Such functionals encompass many of the traditional targets of investigation in causal inference studies, such as, for example, (weighted) averages of potential outcomes, average treatment effects (including subgroup effects, such as the effect on the treated), (weighted) average derivatives, and policy effects from shifts in the covariate distribution, all for general, nonparametric causal models. Our construction relies on the Riesz-Frechet representation of the target functional. Specifically, we show how the bound on the bias depends only on the additional variation that the latent variables create both in the outcome and in the Riesz representer for the parameter of interest. Moreover, in many important cases (e.g., average treatment effects in partially linear models, or in nonseparable models with a binary treatment) the bound is shown to depend on two easily interpretable quantities: the nonparametric partial $R^2$ (Pearson's "correlation ratio") of the unobserved variables with the treatment and with the outcome. Therefore, simple plausibility judgments about the maximum explanatory power of the omitted variables (in explaining treatment and outcome variation) are sufficient to place overall bounds on the size of the bias. Finally, leveraging debiased machine learning, we provide flexible and efficient statistical inference methods to estimate the components of the bounds that are identifiable from the observed distribution. 
Date:  2021–12 
URL:  http://d.repec.org/n?u=RePEc:arx:papers:2112.13398&r= 
By:  Purevdorj Tuvaandorj 
Abstract:  This paper develops permutation versions of identification-robust tests in linear instrumental variables (IV) regression. Unlike the existing randomization and rank-based tests, in which independence between the instruments and the error terms is assumed, the permutation Anderson-Rubin (AR), Lagrange Multiplier (LM) and Conditional Likelihood Ratio (CLR) tests are asymptotically similar and robust to conditional heteroskedasticity under the standard exclusion restriction, i.e., orthogonality between the instruments and the error terms. Moreover, when the instruments are independent of the structural error term, the permutation AR tests are exact, and hence robust to heavy tails. As such, these tests share the strengths of the rank-based tests and the wild bootstrap AR tests. Numerical illustrations corroborate the theoretical results. 
Date:  2021–11 
URL:  http://d.repec.org/n?u=RePEc:arx:papers:2111.13774&r= 
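The permutation logic of the paper above can be illustrated with a toy version. This sketch is not the paper's procedure: it uses the familiar LM-form statistic n * R^2 from regressing the null-restricted residual y - Y*beta0 on the instruments, and permutes the rows of the instrument matrix to build a reference distribution. The data-generating process and all values are invented.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy permutation test in the spirit of a permutation Anderson-Rubin test
# for H0: beta = beta0 in y = Y*beta + u with instruments Z (illustrative).
n, beta0 = 300, 1.0
Z = rng.normal(size=(n, 2))                     # two instruments
v = rng.normal(size=n)
Y = Z @ np.array([0.8, -0.5]) + v               # first-stage relationship
y = beta0 * Y + 0.3 * v + rng.normal(size=n)    # Y is endogenous

def ar_stat(Z, e):
    """n * R^2 from regressing e on Z (with an intercept)."""
    W = np.column_stack([np.ones(len(e)), Z])
    e_hat = W @ np.linalg.lstsq(W, e, rcond=None)[0]
    ess = np.sum((e_hat - e.mean()) ** 2)
    tss = np.sum((e - e.mean()) ** 2)
    return len(e) * ess / tss

e0 = y - beta0 * Y                              # residual under the null
ar_obs = ar_stat(Z, e0)

# Permute instrument rows to generate the reference distribution.
perm = np.array([ar_stat(Z[rng.permutation(n)], e0) for _ in range(499)])
p_value = (1 + np.sum(perm >= ar_obs)) / (1 + len(perm))
```

Under independence of Z and the structural error, a test of this form is exact by the exchangeability of the permuted rows, which is the property the paper exploits.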
By:  Marc Hallin; Hongjian Shi; Mathias Drton; Fang Han 
Abstract:  Defining multivariate generalizations of the classical univariate ranks has been a long-standing open problem in statistics. Optimal transport has been shown to offer a solution by transporting data points to a grid approximating a reference measure (Chernozhukov et al., 2017; Hallin, 2017; Hallin et al., 2021a). We take up this new perspective to develop and study multivariate analogues of popular correlation measures, including the sign covariance, Kendall's tau and Spearman's rho. Our tests are genuinely distribution-free, hence valid irrespective of the actual (absolutely continuous) distributions of the observations. We present asymptotic distribution theory for these new statistics, providing asymptotic approximations to critical values to be used for testing independence, as well as an analysis of the power of the resulting tests. Interestingly, we are able to establish a multivariate elliptical Chernoff-Savage property, which guarantees that, under ellipticity, our nonparametric tests of independence, when compared to Gaussian procedures, enjoy an asymptotic relative efficiency of one or larger. Hence, the nonparametric tests constitute a safe replacement for procedures based on multivariate Gaussianity. 
Keywords:  distribution-freeness, vector independence, rank tests, multivariate ranks 
Date:  2021–12 
URL:  http://d.repec.org/n?u=RePEc:eca:wpaper:2013/334590&r= 
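The grid-transport construction in the abstract above amounts to a linear assignment problem: each data point is matched one-to-one to a grid point approximating the reference measure, minimizing total transport cost. A minimal sketch, with an invented data cloud, a uniform grid on the unit square, and squared-distance cost, assuming SciPy's `linear_sum_assignment` is available:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

rng = np.random.default_rng(2)

# Illustrative multivariate "grid ranks" via optimal transport: match each
# of n bivariate data points to one point of a fixed grid, minimizing the
# total squared distance (all choices here are illustrative).
n = 25
X = rng.normal(size=(n, 2))

# A 5x5 regular grid on [0,1]^2 discretizing the reference measure.
g = (np.arange(5) + 0.5) / 5
grid = np.array([(a, b) for a in g for b in g])

cost = ((X[:, None, :] - grid[None, :, :]) ** 2).sum(axis=2)
rows, cols = linear_sum_assignment(cost)   # optimal one-to-one matching

multivariate_ranks = grid[cols]            # the "rank" of X[i] is grid[cols[i]]
```

The rank statistics studied in the paper are then functions of these grid positions rather than of the raw data, which is what delivers distribution-freeness.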
By:  Yuichi Goto; Tobias Kley; Ria Van Hecke; Stanislav Volgushev; Holger Dette; Marc Hallin 
Abstract:  Frequency domain methods form a ubiquitous part of the statistical toolbox for time series analysis. In recent years, considerable interest has been devoted to the development of new spectral methodology and tools capturing dynamics in the entire joint distributions, thus avoiding the limitations of classical, L2-based spectral methods. Most of the spectral concepts proposed in that literature suffer from one major drawback, though: their estimation requires the choice of a smoothing parameter, which has a considerable impact on estimation quality and poses challenges for statistical inference. In this paper, associated with the concept of copula-based spectrum, we introduce the notion of a copula spectral distribution function, or integrated copula spectrum. This integrated copula spectrum retains the advantages of copula-based spectra but can be estimated without the need for smoothing parameters. We provide such estimators, along with a thorough theoretical analysis, based on a functional central limit theorem, of their asymptotic properties. We leverage these results to test various hypotheses that cannot be addressed by classical spectral methods, such as the lack of time-reversibility or asymmetry in tail dynamics. 
Keywords:  Copula; Ranks; Time series; Frequency domain; Time-reversibility 
Date:  2021–12 
URL:  http://d.repec.org/n?u=RePEc:eca:wpaper:2013/335426&r= 
By:  Odran Bonnet; Alfred Galichon; YuWei Hsieh; Keith O'Hara; Matt Shum 
Abstract:  The problem of demand inversion, a crucial step in the estimation of random utility discrete-choice models, is equivalent to the determination of stable outcomes in two-sided matching models. This equivalence applies to random utility models that are not necessarily additive, smooth, nor even invertible. Based on this equivalence, algorithms for the determination of stable matchings provide effective computational methods for estimating these models. For non-invertible models, the identified set of utility vectors is a lattice, and the matching algorithms recover sharp upper and lower bounds on the utilities. Our matching approach facilitates estimation of models that were previously difficult to estimate, such as the pure characteristics model. An empirical application to voting data from the 1999 European Parliament elections illustrates the good performance of our matching-based demand inversion algorithms in practice. 
Date:  2021–11 
URL:  http://d.repec.org/n?u=RePEc:arx:papers:2111.13744&r= 
By:  Florian Gunsilius; Yuliang Xu 
Abstract:  Matching on covariates is a well-established framework for estimating causal effects in observational studies. The principal challenge in these settings stems from the often high-dimensional structure of the problem. Many methods have been introduced to deal with this challenge, with different advantages and drawbacks in computational and statistical performance and in interpretability. Moreover, the methodological focus has been on matching two samples in binary treatment scenarios, but a dedicated method that can optimally balance samples across multiple treatments has so far been unavailable. This article introduces a natural optimal matching method based on entropy-regularized multimarginal optimal transport that possesses many useful properties to address these challenges. It provides interpretable weights of matched individuals that converge at the parametric rate to the optimal weights in the population, can be efficiently implemented via the classical iterative proportional fitting procedure, and can even match several treatment arms simultaneously. It also possesses demonstrably excellent finite-sample properties. 
Date:  2021–12 
URL:  http://d.repec.org/n?u=RePEc:arx:papers:2112.04398&r= 
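The iterative proportional fitting procedure mentioned in the abstract above (also known as the Sinkhorn algorithm) is short enough to sketch. This is a minimal two-marginal illustration with invented covariate samples for two treatment arms; the paper's multimarginal, population-level construction is considerably more general, and the sample sizes and regularization strength below are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(3)

# Entropy-regularized optimal transport between two hypothetical
# treatment-arm covariate samples, solved by iterative proportional
# fitting (Sinkhorn scaling).  All values are illustrative.
n, m = 40, 60
X0 = rng.normal(size=(n, 2))             # covariates, control arm
X1 = rng.normal(loc=0.3, size=(m, 2))    # covariates, treated arm

C = ((X0[:, None, :] - X1[None, :, :]) ** 2).sum(axis=2)  # squared distances
reg = 0.1 * C.max()                      # entropic regularization strength
K = np.exp(-C / reg)
a = np.full(n, 1.0 / n)                  # uniform marginal, control arm
b = np.full(m, 1.0 / m)                  # uniform marginal, treated arm

u, v = np.ones(n), np.ones(m)
for _ in range(2000):                    # alternately fit the two marginals
    u = a / (K @ v)
    v = b / (K.T @ u)

P = u[:, None] * K * v[None, :]          # matching weights between the arms
```

Each entry P[i, j] is an interpretable, nonnegative weight linking unit i in one arm to unit j in the other, with rows and columns summing to the prescribed marginals.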
By:  YuChin Hsu; Robert P. Lieli 
Abstract:  We provide a comprehensive theory for conducting in-sample statistical inference about receiver operating characteristic (ROC) curves that are based on predicted values from a first-stage model with estimated parameters (such as a logit regression). The term "in-sample" refers to the practice of using the same data for model estimation (training) and subsequent evaluation, i.e., the construction of the ROC curve. We show that in this case the first-stage estimation error has a generally non-negligible impact on the asymptotic distribution of the ROC curve, and we develop the appropriate pointwise and functional limit theory. We propose methods for simulating the distribution of the limit process and show how to use the results in practice in comparing ROC curves. 
Date:  2021–12 
URL:  http://d.repec.org/n?u=RePEc:arx:papers:2112.01772&r= 
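The object under study in the abstract above — an in-sample ROC curve built from a first-stage logit — can be constructed in a few lines. This is a hypothetical illustration with an invented data-generating process; the paper's point is that the estimation error in the fitted coefficients below affects the distribution of the resulting curve.

```python
import numpy as np

rng = np.random.default_rng(4)

# In-sample ROC from a first-stage logit fitted by Newton's method on the
# same data used to trace the curve (all values illustrative).
n = 500
x = rng.normal(size=n)
d = (rng.uniform(size=n) < 1 / (1 + np.exp(-(0.5 + 1.5 * x)))).astype(float)
X = np.column_stack([np.ones(n), x])

beta = np.zeros(2)
for _ in range(25):                      # Newton-Raphson for the logit MLE
    p = 1 / (1 + np.exp(-X @ beta))
    W = p * (1 - p)
    beta += np.linalg.solve(X.T @ (X * W[:, None]), X.T @ (d - p))

score = X @ beta                         # in-sample predicted index

# Trace the ROC curve: for each cutoff c, the true and false positive
# rates of the classification rule "score > c".
cuts = np.sort(score)[::-1]
tpr = np.array([np.mean(score[d == 1] > c) for c in cuts])
fpr = np.array([np.mean(score[d == 0] > c) for c in cuts])
auc = np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2)   # trapezoid rule
```

Naive pointwise confidence bands around (fpr, tpr) that ignore the first-stage estimation step are exactly what the paper shows to be generally invalid.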
By:  Laurens Cherchye; Thomas Demuynck; Bram De Rock; Cédric Duprez; Glenn Magerman; Marijn Verschelde 
Abstract:  We propose a novel method for structural production analysis in the presence of unobserved heterogeneity in productivity. Our approach is intrinsically nonparametric and does not require the stringent assumption of Hicks neutrality. We assume cost minimization as the firms’ behavioral objective, and we model the productivity on which firms condition their input demand. Our model can equivalently be represented in terms of endogenously chosen latent input costs, which avoids an endogeneity bias in a natural way. Our empirical application to unique and detailed Belgian manufacturing data shows that our method allows for drawing strong and robust conclusions, despite its nonparametric orientation. For example, we confirm the well-documented productivity slowdown, and we highlight a potential bias when using a common-scale intermediate inputs price deflator in the estimation of productivity. In addition, we provide robust empirical evidence against the assumption of Hicks neutrality for the setting at hand. 
Keywords:  productivity, unobserved heterogeneity, simultaneity bias, nonparametric production analysis, cost minimisation, manufacturing 
Date:  2021–12 
URL:  http://d.repec.org/n?u=RePEc:eca:wpaper:2013/335089&r= 
By:  Niko Hauzenberger; Florian Huber; Massimiliano Marcellino; Nico Petz 
Abstract:  We develop a nonparametric multivariate time series model that remains agnostic about the precise relationship between a (possibly) large set of macroeconomic time series and their lagged values. The main building block of our model is a Gaussian Process prior on the functional relationship that determines the conditional mean of the model, hence the name Gaussian Process Vector Autoregression (GP-VAR). We control for changes in the error variances by introducing a stochastic volatility specification. To facilitate computation in high dimensions and to introduce convenient statistical properties tailored to match stylized facts commonly observed in macro time series, we assume that the covariance of the Gaussian Process is scaled by the latent volatility factors. We illustrate the use of the GP-VAR by analyzing the effects of macroeconomic uncertainty, with a particular emphasis on time variation and asymmetries in the transmission mechanisms. Using US data, we find that uncertainty shocks have time-varying effects: they are less persistent during recessions, but their larger size in those periods causes more-than-proportional effects on real growth and employment. 
Date:  2021–12 
URL:  http://d.repec.org/n?u=RePEc:arx:papers:2112.01995&r= 
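The Gaussian Process building block behind the model above can be sketched in its simplest form: the conditional mean of a series given its lag is left nonparametric and given a GP prior. This toy sketch uses a single series, an RBF kernel, and a known noise variance, all invented for the example; the paper's full GP-VAR, with many series and stochastic-volatility scaling of the kernel, is far richer.

```python
import numpy as np

rng = np.random.default_rng(7)

# Toy GP regression of y_t on y_{t-1}: a nonparametric estimate of the
# conditional-mean function (all values illustrative).
T = 200
y = np.zeros(T)
for t in range(1, T):                    # simulate nonlinear AR(1) data
    y[t] = np.tanh(0.9 * y[t - 1]) + 0.3 * rng.normal()

X, Y = y[:-1, None], y[1:]               # lagged value -> next value

def rbf(A, B, ell=0.5):
    """Squared-exponential kernel between column vectors A and B."""
    return np.exp(-((A - B.T) ** 2) / (2 * ell ** 2))

sigma2 = 0.09                            # noise variance, assumed known here
K = rbf(X, X) + sigma2 * np.eye(T - 1)
alpha = np.linalg.solve(K, Y)

grid = np.linspace(-1.0, 1.0, 7)[:, None]
f_hat = rbf(grid, X) @ alpha             # GP posterior mean of the lag map
```

The posterior mean recovers the nonlinear shape of the lag map without specifying a parametric functional form, which is the sense in which the GP-VAR remains agnostic about the lag relationship.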
By:  Kumar Yashaswi 
Abstract:  Stochastic volatility models have existed in option pricing theory ever since the crash of 1987, which violated the Black-Scholes assumption of constant volatility. The Heston model is one such stochastic volatility model that is widely used for volatility estimation and option pricing. In this paper, we design a novel method to estimate the parameters of the Heston model under a state-space representation using Bayesian filtering theory and the Posterior Cramer-Rao Lower Bound (PCRLB), integrating it with the Normal Maximum Likelihood Estimation (NMLE) proposed in [1]. Several Bayesian filters, such as the Extended Kalman Filter (EKF), the Unscented Kalman Filter (UKF) and the Particle Filter (PF), are used for latent state and parameter estimation. We employ a switching strategy proposed in [2] for adaptive state estimation of a nonlinear, discrete-time state-space model (SSM) like the Heston model. We use a particle-filter-approximated PCRLB [3] as a performance measure to judge the best filter at each time step. We test our proposed framework on pricing data from the S&P 500 and the NSE Index, estimating the underlying volatility and parameters from the index. Our proposed method is compared with the VIX measure and historical volatility for both indexes. The results indicate an effective framework for estimating volatility adaptively under changing market dynamics. 
Date:  2021–12 
URL:  http://d.repec.org/n?u=RePEc:arx:papers:2112.04576&r= 
By:  Xinkun Nie; Guido Imbens; Stefan Wager 
Abstract:  The ability to generalize experimental results from randomized controlled trials (RCTs) across locations is crucial for informing policy decisions in targeted regions. Such generalization is often hindered by a lack of identifiability due to unmeasured effect modifiers, which compromise the direct transport of treatment effect estimates from one location to another. We build upon sensitivity analysis in observational studies and propose an optimization procedure that allows us to obtain bounds on the treatment effects in targeted regions. Furthermore, we construct more informative bounds by balancing on the moments of covariates. In simulation experiments, we show that the covariate balancing approach is promising in obtaining sharper identification intervals. 
Date:  2021–12 
URL:  http://d.repec.org/n?u=RePEc:arx:papers:2112.04723&r= 
By:  Marc Hallin; Gilles Mordant 
Abstract:  Extending the dual univariate concepts of ranks and quantiles to dimension 2 and higher has remained an open problem for more than half a century. Based on measure transportation results, a solution has been proposed recently under the name of center-outward ranks and quantiles which, contrary to previous proposals, enjoys all the properties that make univariate ranks a successful tool for statistical inference. Just as their univariate counterparts (to which they reduce in dimension one), center-outward ranks allow for the construction of distribution-free and asymptotically efficient tests for a variety of problems where the density of some noise or innovation remains unspecified. The actual implementation of these tests involves the somewhat arbitrary choice of a grid. While the asymptotic impact of that choice is nil, its finite-sample consequences are not. In this note, we investigate the finite-sample impact of that choice in the typical context of the multivariate two-sample location problem. 
Keywords:  Ranks; Time series; Frequency domain; Time-reversibility 
Date:  2021–11 
URL:  http://d.repec.org/n?u=RePEc:eca:wpaper:2013/335403&r= 
By:  Patrick Bajari; Brian Burdick; Guido W. Imbens; Lorenzo Masoero; James McQueen; Thomas Richardson; Ido M. Rosen 
Abstract:  In this study we introduce a new class of experimental designs. In a classical randomized controlled trial (RCT), or A/B test, a randomly selected subset of a population of units (e.g., individuals, plots of land, or experiences) is assigned to a treatment (treatment A), and the remainder of the population is assigned to the control treatment (treatment B). The difference in average outcome by treatment group is an estimate of the average effect of the treatment. However, motivating our study, the setting for modern experiments is often different, with the outcomes and treatment assignments indexed by multiple populations. For example, outcomes may be indexed by buyers and sellers, by content creators and subscribers, by drivers and riders, or by travelers and airlines and travel agents, with treatments potentially varying across these indices. Spillovers or interference can arise from interactions between units across populations. For example, sellers' behavior may depend on buyers' treatment assignment, or vice versa. This can invalidate the simple comparison of means as an estimator for the average effect of the treatment in classical RCTs. We propose new experiment designs for settings in which multiple populations interact. We show how these designs allow us to study questions about interference that cannot be answered by classical randomized experiments. Finally, we develop new statistical methods for analyzing these Multiple Randomization Designs. 
Date:  2021–12 
URL:  http://d.repec.org/n?u=RePEc:arx:papers:2112.13495&r= 
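The core idea of the designs described above — randomizing the two sides of a market independently, so that buyer-seller pairs fall into four cells — can be sketched with simulated data. This is a hypothetical illustration, not the paper's estimators or inference theory: the outcome model, the direct effect (1.0) and the spillover (0.4) are all invented.

```python
import numpy as np

rng = np.random.default_rng(5)

# Toy double randomization: buyers and sellers are randomized
# independently, so each buyer-seller pair lands in one of four cells
# (treated/treated, treated/control, control/treated, control/control).
n_buyers, n_sellers = 100, 80
buyer_treated = rng.uniform(size=n_buyers) < 0.5
seller_treated = rng.uniform(size=n_sellers) < 0.5

noise = rng.normal(size=(n_buyers, n_sellers))
y = (1.0 * buyer_treated[:, None]        # direct effect on the buyer side
     + 0.4 * seller_treated[None, :]     # spillover from the seller side
     + noise)

# Cell means separate the direct effect from the spillover, which a
# single-population comparison of means cannot do.
cell = {(bt, st): y[np.ix_(buyer_treated == bt, seller_treated == st)].mean()
        for bt in (0, 1) for st in (0, 1)}

direct_hat = ((cell[1, 0] - cell[0, 0]) + (cell[1, 1] - cell[0, 1])) / 2
spillover_hat = ((cell[0, 1] - cell[0, 0]) + (cell[1, 1] - cell[1, 0])) / 2
```

A classical single-population A/B test would pool the spillover into the treatment contrast; the four-cell comparison is what makes the interference question answerable.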
By:  Konrad Menzel 
Abstract:  This paper explores the use of deep neural networks for semiparametric estimation of economic models of maximizing behavior in production or discrete choice. We argue that certain deep networks are particularly well suited as a nonparametric sieve to approximate regression functions that result from nonlinear latent variable models of continuous or discrete optimization. Multi-stage models of this type will typically generate rich interaction effects between regressors ("inputs") in the regression function, so that there may be no plausible separability restrictions on the "reduced-form" mapping from inputs to outputs to alleviate the curse of dimensionality. Rather, economic shape, sparsity, or separability restrictions, either at a global level or at intermediate stages, are usually stated in terms of the latent variable model. We show that restrictions of this kind are imposed in a more straightforward manner if a sufficiently flexible version of the latent variable model is in fact used to approximate the unknown regression function. 
Date:  2021–12 
URL:  http://d.repec.org/n?u=RePEc:arx:papers:2112.01377&r= 
By:  Clément de Chaisemartin; Xavier D'Haultfoeuille 
Abstract:  Linear regressions with period and group fixed effects are widely used to estimate policies' effects: 26 of the 100 most cited papers published by the American Economic Review from 2015 to 2019 estimate such regressions. It has recently been shown that those regressions may produce misleading estimates if the policy's effect is heterogeneous between groups or over time, as is often the case. This survey reviews a fast-growing literature that documents this issue and proposes alternative estimators robust to heterogeneous effects. 
Date:  2021–12 
URL:  http://d.repec.org/n?u=RePEc:arx:papers:2112.04565&r= 
By:  Mike Tsionas (Montpellier Business School & Lancaster University Management School); Valentin Zelenyuk (School of Economics, University of Queensland, Brisbane, Qld 4072, Australia) 
Abstract:  We propose a very general approach for modeling production technologies that allows for modeling both inefficiency and noise that are specific to each input and each output. The approach is based on amalgamating ideas from nonparametric activity analysis models for production and consumption theory with stochastic frontier models. We do this by effectively reinterpreting the activity analysis models as simultaneous equations models within Bayesian compression and artificial neural network frameworks. We make minimal assumptions about noise in the data, and we allow for flexible approximations of input- and output-specific slacks. We use compression to solve the problem of an excessive number of parameters in general production technologies, and we also incorporate environmental variables in the estimation. We present Monte Carlo simulation results, along with an empirical illustration and comparison of this approach on US banking data. 
Date:  2021–12 
URL:  http://d.repec.org/n?u=RePEc:qld:uqcepa:172&r= 
By:  Filippo Pellegrino 
Abstract:  This article proposes an extension of standard time-series regression tree modelling to handle predictors that show irregularities such as missing observations, periodic patterns in the form of seasonality and cycles, and non-stationary trends. In doing so, this approach also permits enriching the information set used in tree-based autoregressions via unobserved components. Furthermore, this manuscript illustrates a relevant approach to control overfitting based on ensemble learning and recent developments in the jackknife literature. This is strongly beneficial when the number of observed time periods is small, and advantageous compared to benchmark resampling methods. Empirical results show the benefits of predicting equity squared returns as a function of their own past and a set of macroeconomic data via factor-augmented tree ensembles, relative to simpler benchmarks. As a by-product, this approach allows one to study the real-time importance of economic news for equity volatility. 
Date:  2021–11 
URL:  http://d.repec.org/n?u=RePEc:arx:papers:2111.14000&r= 
By:  JeanDavid Fermanian (EnsaeCrest); Dominique Guégan (Université Paris1 PanthéonSorbonne, Centre d'Economie de la Sorbonne,  Ca' Foscari University of Venezia) 
Abstract:  The central question of this paper is how to enhance supervised learning algorithms with a fairness requirement ensuring that no sensitive input "unfairly" influences the outcome of the learning algorithm. To attain this objective, we proceed in three steps. First, after introducing several notions of fairness in a uniform approach, we introduce a more general notion through a conditional fairness definition, which encompasses most of the well-known fairness definitions. Second, we use an ensemble of binary and continuous classifiers to obtain an optimal solution for a fair predictive outcome, using a related post-processing procedure without any transformation of the data or of the training algorithms. Finally, we introduce several tests to verify the fairness of the predictions. Some empirics are provided to illustrate our approach. 
Keywords:  fairness; nonparametric regression; classification; accuracy 
JEL:  C10 C38 C53 
Date:  2021–11 
URL:  http://d.repec.org/n?u=RePEc:mse:cesdoc:21034&r= 
By:  Jianian Wang; Sheng Zhang; Yanghua Xiao; Rui Song 
Abstract:  Preserving individual features while capturing complicated relations, graph data are widely utilized and investigated. Being able to capture structural information by updating and aggregating nodes' representations, graph neural network (GNN) models are gaining popularity. In the financial context, graphs are constructed from real-world data, which leads to complex graph structures and thus requires sophisticated methodology. In this work, we provide a comprehensive review of GNN models in recent financial contexts. We first categorize the commonly used financial graphs and summarize the feature processing step for each node. Then we summarize the GNN methodology for each graph type and its applications in each area, and propose some potential research directions. 
Date:  2021–11 
URL:  http://d.repec.org/n?u=RePEc:arx:papers:2111.15367&r= 
By:  Grigory Franguridi 
Abstract:  I suggest an enhancement of the procedure of Chiong, Hsieh, and Shum (2017) for calculating bounds on counterfactual demand in semiparametric discrete choice models. Their algorithm relies on a system of inequalities indexed by cycles of a large number $M$ of observed markets and hence seems to require a computationally infeasible enumeration of all such cycles. I show that such enumeration is unnecessary, because solving the "fully efficient" inequality system exploiting cycles of all possible lengths $K=1,\dots,M$ can be reduced to finding the length of the shortest path between every pair of vertices in a complete bidirected weighted graph on $M$ vertices. The latter problem can be solved using the Floyd-Warshall algorithm with computational complexity $O\left(M^3\right)$, which takes only seconds to run even for thousands of markets. Monte Carlo simulations illustrate the efficiency gain from using cycles of all lengths, which turns out to be positive but small. 
Date:  2021–12 
URL:  http://d.repec.org/n?u=RePEc:arx:papers:2112.04637&r= 
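The all-pairs shortest-path computation named in the abstract above is the classical Floyd-Warshall algorithm. A minimal sketch on a complete directed graph with invented nonnegative edge weights standing in for the market-cycle costs (the economic construction of the weights is the paper's, not shown here):

```python
import numpy as np

rng = np.random.default_rng(6)

# Floyd-Warshall all-pairs shortest paths on a complete directed weighted
# graph of M "markets"; O(M^3) overall.  Edge weights are illustrative.
M = 50
W = rng.uniform(0.0, 1.0, size=(M, M))   # hypothetical edge weights
np.fill_diagonal(W, 0.0)

D = W.copy()
for k in range(M):                       # allow paths passing through vertex k
    D = np.minimum(D, D[:, [k]] + D[[k], :])

# D[i, j] now holds the length of the shortest path from i to j.
```

Even the naive triple loop runs in seconds for M in the thousands, which is the computational point of the paper; the vectorized inner update above simply replaces the two innermost loops with one array operation per k.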