New Economics Papers on Econometrics |
By: | Jooyoung Cha; Harold D. Chiang; Yuya Sasaki |
Abstract: | This paper proposes a new method of inference in high-dimensional regression models and high-dimensional IV regression models. Estimation is based on a combined use of the orthogonal greedy algorithm, the high-dimensional Akaike information criterion, and double/debiased machine learning. The method of inference for any low-dimensional subvector of the high-dimensional parameters is based on root-$N$ asymptotic normality, which is shown to hold without requiring the exact sparsity condition or the $L^p$ sparsity condition. Simulation studies demonstrate superior finite-sample performance of the proposed method over alternatives based on the LASSO or random forests, especially under less sparse models. We illustrate an application to production analysis with a panel of Chilean firms. |
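As context for the estimation step, here is a minimal Python sketch of the orthogonal greedy algorithm (orthogonal matching pursuit) on which the method builds; the HDAIC penalty shown in the closing comment and the double/debiased machine learning steps are illustrative assumptions, not the paper's exact procedure.

```python
import numpy as np

def orthogonal_greedy(X, y, max_steps):
    """Orthogonal greedy algorithm (orthogonal matching pursuit):
    at each step, add the column most correlated with the current
    residual, then refit by least squares on the active set."""
    n, _ = X.shape
    active, path = [], []
    resid = y.astype(float).copy()
    for _ in range(max_steps):
        scores = np.abs(X.T @ resid)
        scores[active] = -np.inf          # never re-select a column
        active.append(int(np.argmax(scores)))
        beta, *_ = np.linalg.lstsq(X[:, active], y, rcond=None)
        resid = y - X[:, active] @ beta
        path.append((list(active), beta, float(resid @ resid) / n))
    return path  # one (active set, coefficients, mse) triple per step

# A model along the path could then be chosen by an information
# criterion in the spirit of HDAIC, e.g. n*log(mse) + c*|active|*log(p).
```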
Date: | 2021–08 |
URL: | http://d.repec.org/n?u=RePEc:arx:papers:2108.09520&r= |
By: | Zongwu Cai (Department of Economics, The University of Kansas, Lawrence, KS 66045, USA); Ying Fang (The Wang Yanan Institute for Studies in Economics, Xiamen University, Xiamen, Fujian 361005, China and Department of Statistics, School of Economics, Xiamen University, Xiamen, Fujian 361005, China); Ming Lin (The Wang Yanan Institute for Studies in Economics, Xiamen University, Xiamen, Fujian 361005, China and Department of Statistics, School of Economics, Xiamen University, Xiamen, Fujian 361005, China); Shengfang Tang (Department of Statistics, School of Economics, Xiamen University, Xiamen, Fujian 361005, China) |
Abstract: | This paper proposes a novel test of whether the distributional effect of an intervention on an outcome of interest is heterogeneous across sub-populations defined by covariates of interest. Specifically, we develop a consistent nonparametric test statistic of the Cramér-von Mises type for the null hypothesis that the treatment has a constant quantile effect across all subpopulations defined by the covariates of interest. Under regularity conditions, we establish the asymptotic distribution of the proposed test statistic under the null hypothesis and its consistency against fixed alternatives, and we study its power against a sequence of local alternatives. Furthermore, a nonparametric bootstrap procedure is proposed to approximate the finite-sample null distribution of the test, and the asymptotic validity of the bootstrap test is established. Through Monte Carlo simulations, we demonstrate the power properties of the test in finite samples. Finally, the proposed approach is applied to investigate whether the quantile treatment effect of maternal smoking during pregnancy on infant birth weight is heterogeneous across mothers' age groups. |
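To fix ideas, a minimal sketch of a Cramér-von Mises-type heterogeneity statistic and its bootstrap p-value; the paper's statistic, weighting, and bootstrap scheme will differ in detail, and both `tau_hat` (estimated conditional QTEs at the observed covariates) and the `resample` callable are assumptions for illustration.

```python
import numpy as np

def cvm_heterogeneity(tau_hat):
    """Sample analogue of a Cramér-von Mises-type criterion,
    mean of (tau(x) - tau_bar)**2 over the covariate distribution:
    large values indicate that the conditional quantile treatment
    effect varies with the covariates."""
    tau_hat = np.asarray(tau_hat, dtype=float)
    return float(np.mean((tau_hat - tau_hat.mean()) ** 2))

def bootstrap_pvalue(tau_hat, resample, n_boot=499, seed=0):
    """Nonparametric-bootstrap p-value sketch: `resample(rng)` is an
    assumed user-supplied function that re-estimates the conditional
    QTEs on a bootstrap draw imposing the constant-effect null."""
    rng = np.random.default_rng(seed)
    stat = cvm_heterogeneity(tau_hat)
    boot = [cvm_heterogeneity(resample(rng)) for _ in range(n_boot)]
    return (1 + sum(b >= stat for b in boot)) / (n_boot + 1)
```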
Keywords: | Bootstrap; Conditional quantile treatment effect; Heterogeneity test; Nonparametric quantile regression; Nonparametric test. |
JEL: | C12 C13 C14 |
Date: | 2021–08 |
URL: | http://d.repec.org/n?u=RePEc:kan:wpaper:202117&r= |
By: | Zhihao Xu; Clifford M. Hurvich |
Abstract: | We propose a unified frequency domain cross-validation (FDCV) method for obtaining heteroskedasticity and autocorrelation consistent (HAC) standard errors. The proposed method allows for model/tuning-parameter selection across parametric and nonparametric spectral estimators simultaneously. Our candidate class consists of restricted maximum likelihood (REML)-based autoregressive spectral estimators and lag-weights estimators with the Parzen kernel. We provide a method for efficiently computing the REML estimators of the autoregressive models. In simulations, we demonstrate the reliability of the FDCV method compared with the popular HAC estimators of Andrews-Monahan and Newey-West. Supplementary material for the article is available online. |
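For concreteness, a small Python sketch of the lag-weights (Parzen-kernel) long-run variance estimator that appears in the candidate class; the FDCV selection step itself is omitted, and `bandwidth` stands in for the tuning parameter the procedure would choose.

```python
import numpy as np

def parzen(x):
    """Parzen lag window."""
    x = abs(x)
    if x <= 0.5:
        return 1 - 6 * x**2 + 6 * x**3
    if x <= 1.0:
        return 2 * (1 - x)**3
    return 0.0

def lag_weights_lrv(u, bandwidth):
    """Lag-weights estimate of the long-run variance of a scalar
    series u (2*pi times the spectral density at frequency zero),
    with Parzen weights."""
    u = np.asarray(u, dtype=float)
    u = u - u.mean()
    n = len(u)
    lrv = u @ u / n                       # lag-0 autocovariance
    for k in range(1, n):
        w = parzen(k / bandwidth)
        if w == 0.0:
            break
        lrv += 2 * w * (u[k:] @ u[:-k]) / n
    return lrv
```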
Date: | 2021–08 |
URL: | http://d.repec.org/n?u=RePEc:arx:papers:2108.06093&r= |
By: | Royer, Julien |
Abstract: | We consider an extension of ARCH($\infty$) models that accounts for conditional asymmetry in the presence of high persistence. After stating existence and stationarity conditions, the paper develops the statistical inference for such models, proving the consistency and deriving the asymptotic distribution of a quasi-maximum likelihood estimator. Some particular specifications are studied, and we introduce a portmanteau goodness-of-fit test. In addition, test procedures for asymmetry and GARCH validity are derived. Finally, we present an application to a set of equity indices to reexamine the preeminence of GARCH(1,1) specifications. We find strong evidence that the short-memory feature of such models is not suitable for peripheral assets. |
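The asymmetric ARCH($\infty$) family studied here nests standard GARCH; as a baseline illustration, a minimal Gaussian quasi-maximum likelihood fit of the GARCH(1,1) benchmark being reexamined, assuming scipy is available. The paper's asymmetric specification and boundary-testing procedures are not reproduced.

```python
import numpy as np
from scipy.optimize import minimize

def garch11_qmle(eps):
    """Gaussian QMLE for a GARCH(1,1), the short-memory benchmark:
    sigma2[t] = omega + alpha * eps[t-1]**2 + beta * sigma2[t-1]."""
    eps = np.asarray(eps, dtype=float)

    def neg_quasi_loglik(theta):
        omega, alpha, beta = theta
        # crude enforcement of positivity and stationarity
        if omega <= 0 or alpha < 0 or beta < 0 or alpha + beta >= 1:
            return np.inf
        sigma2 = np.empty_like(eps)
        sigma2[0] = eps.var()
        for t in range(1, len(eps)):
            sigma2[t] = omega + alpha * eps[t - 1] ** 2 + beta * sigma2[t - 1]
        return 0.5 * np.sum(np.log(sigma2) + eps ** 2 / sigma2)

    start = np.array([0.1 * eps.var(), 0.05, 0.90])
    return minimize(neg_quasi_loglik, start, method="Nelder-Mead").x
```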
Keywords: | Quasi Maximum Likelihood Estimation, Moderate memory, Testing parameters on the boundary, Recursive design bootstrap |
JEL: | C22 C51 C58 |
Date: | 2021–07 |
URL: | http://d.repec.org/n?u=RePEc:pra:mprapa:109118&r= |
By: | Kedagni, Desire |
Abstract: | In this paper, I consider identification of treatment effects when the treatment is endogenous. The use of instrumental variables is a popular solution to deal with endogeneity, but it may give misleading answers when the instrument is invalid. I show that when the instrument is invalid due to correlation with the first-stage unobserved heterogeneity, a second (also possibly invalid) instrument allows one to partially identify not only the local average treatment effect but also the entire potential outcome distributions for compliers. I exploit the fact that the distribution of the observed outcome in each group defined by the treatment and the instrument is a mixture of the distributions of interest. I write the identified set in the form of conditional moment inequalities and provide an easily implementable inference procedure. Under some (testable) tail restrictions, the potential outcome distributions are point-identified for compliers. Finally, I illustrate my methodology on data from the National Longitudinal Survey of Young Men to estimate returns to college, using college proximity as a (potential) instrument. I find that a college degree increases the average hourly wage of compliers by 38-79%. |
Date: | 2021–06–05 |
URL: | http://d.repec.org/n?u=RePEc:isu:genstf:202106050700001056&r= |
By: | Sung Hoon Choi |
Abstract: | I develop a feasible weighted projected principal component (FPPC) analysis for factor models in which observable characteristics partially explain the latent factors. This novel method provides more efficient and accurate estimators than existing methods. To increase estimation efficiency, I take into account both cross-sectional dependence and heteroskedasticity by using a consistent estimator of the inverse error covariance matrix as the weight matrix. To improve accuracy, I employ a projection approach using characteristics because it removes noise components in high-dimensional factor analysis. By using the FPPC method, estimators of the factors and loadings have faster rates of convergence than those of conventional factor analysis. Moreover, I propose an FPPC-based diffusion index forecasting model. The limiting distribution of the parameter estimates and the rate of convergence for forecast errors are obtained. Using U.S. bond market and macroeconomic data, I demonstrate that the proposed model outperforms models based on conventional principal component estimators. I also show that the proposed model performs well among a large group of machine learning techniques in forecasting excess bond returns. |
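A stylized sketch of the projection idea: regress the panel on a sieve basis of observable characteristics, then run PCA on the fitted values. The inverse-error-covariance weighting that makes the estimator "feasible weighted" is omitted, and the normalizations below are illustrative assumptions rather than the paper's exact ones.

```python
import numpy as np

def projected_pca(Y, Phi, k):
    """Projection step of (F)PPC: project the n x T panel Y onto the
    column space of a characteristic sieve basis Phi (n x m), then
    extract the top-k principal components of the projected data."""
    n, T = Y.shape
    P = Phi @ np.linalg.solve(Phi.T @ Phi, Phi.T)   # n x n projection matrix
    Y_proj = P @ Y                                   # characteristic-smoothed panel
    evals, evecs = np.linalg.eigh(Y_proj.T @ Y_proj)
    F_hat = np.sqrt(T) * evecs[:, -k:][:, ::-1]      # T x k factors, F'F/T = I
    L_hat = Y_proj @ F_hat / T                       # n x k loadings
    return F_hat, L_hat
```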
Date: | 2021–08 |
URL: | http://d.repec.org/n?u=RePEc:arx:papers:2108.10250&r= |
By: | Matias D. Cattaneo; Rocio Titiunik |
Abstract: | The Regression Discontinuity (RD) design is one of the most widely used non-experimental methods for causal inference and program evaluation. Over the last two decades, statistical and econometric methods for RD analysis have expanded and matured, and there is now a large number of methodological results for RD identification, estimation, inference, and validation. We offer a curated review of this methodological literature organized around the two most popular frameworks for the analysis and interpretation of RD designs: the continuity framework and the local randomization framework. For each framework, we discuss three main areas: (i) designs and parameters, which focuses on different types of RD settings and treatment effects of interest; (ii) estimation and inference, which presents the most popular methods based on local polynomial regression and analysis of experiments, as well as refinements, extensions and other methods; and (iii) validation and falsification, which summarizes an array of mostly empirical approaches to support the validity of RD designs in practice. |
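As a pointer to the continuity framework's workhorse estimator, a minimal local linear sharp-RD sketch with a triangular kernel; in practice one would use a data-driven bandwidth and robust bias-corrected inference (e.g., the rdrobust toolkit), neither of which is shown here.

```python
import numpy as np

def rd_local_linear(y, x, cutoff, h):
    """Sharp-RD local linear estimate: fit separate weighted linear
    regressions on each side of the cutoff within bandwidth h; the
    effect is the gap between the two intercepts at the cutoff."""
    def side_fit(mask):
        xs, ys = x[mask] - cutoff, y[mask]
        w = np.clip(1 - np.abs(xs) / h, 0, None)     # triangular kernel
        X = np.column_stack([np.ones_like(xs), xs])
        WX = X * w[:, None]
        beta = np.linalg.solve(X.T @ WX, WX.T @ ys)  # weighted least squares
        return beta[0]                               # intercept at the cutoff
    below = (x < cutoff) & (x >= cutoff - h)
    above = (x >= cutoff) & (x <= cutoff + h)
    return side_fit(above) - side_fit(below)
```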
Date: | 2021–08 |
URL: | http://d.repec.org/n?u=RePEc:arx:papers:2108.09400&r= |
By: | Subhadeep Mukhopadhyay |
Abstract: | A new nonparametric model of maximum-entropy (MaxEnt) copula density function is proposed, which offers the following advantages: (i) it is valid for mixed random vectors. By `mixed' we mean the method works for any combination of discrete and continuous variables in a fully automated manner; (ii) it yields a bona fide density estimate with interpretable parameters. By `bona fide' we mean the estimate is guaranteed to be a non-negative function that integrates to 1; and (iii) it plays a unifying role in our understanding of a large class of statistical methods. |
Date: | 2021–08 |
URL: | http://d.repec.org/n?u=RePEc:arx:papers:2108.09438&r= |
By: | Seisho Sato (University of Tokyo); Naoto Kunitomo (Tokyo Keizai University) |
Abstract: | We develop a new regression method called frequency regression and smoothing. It is based on the separating information maximum likelihood (SIML) method developed by Kunitomo and Sato (2021) and Sato and Kunitomo (2020) for estimating the hidden states of random variables and handling noisy, nonstationary (small-sample) time series data. Many economic time series include not only trend-cycle, seasonal, and measurement-error components but also factors such as structural breaks, abrupt changes, trading-day effects, and institutional changes. Frequency regression and smoothing can handle such factors in nonstationary time series. The proposed method is simple and applicable to several problems in analyzing nonstationary economic time series and handling seasonal adjustment. An illustrative empirical analysis of macro consumption in Japan is provided. |
Date: | 2021–08 |
URL: | http://d.repec.org/n?u=RePEc:cfi:fseres:cf519&r= |
By: | Shantanu Gupta; Zachary C. Lipton; David Childers |
Abstract: | Researchers often face data fusion problems, where multiple data sources are available, each capturing a distinct subset of variables. While problem formulations typically take the data as given, in practice, data acquisition can be an ongoing process. In this paper, we aim to estimate any functional of a probabilistic model (e.g., a causal effect) as efficiently as possible, by deciding, at each time, which data source to query. We propose online moment selection (OMS), a framework in which structural assumptions are encoded as moment conditions. The optimal action at each step depends, in part, on the very moments that identify the functional of interest. Our algorithms balance exploration with choosing the best action as suggested by current estimates of the moments. We propose two selection strategies: (1) explore-then-commit (OMS-ETC) and (2) explore-then-greedy (OMS-ETG), proving that both achieve zero asymptotic regret as assessed by MSE. We instantiate our setup for average treatment effect estimation, where structural assumptions are given by a causal graph and data sources may include subsets of mediators, confounders, and instrumental variables. |
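A heavily simplified sketch of the explore-then-commit idea for choosing data sources; `query(j)` and `est_mse(data, j)` are hypothetical user-supplied callables, and the paper's OMS-ETC chooses allocation proportions over sources rather than committing to a single source as this sketch does.

```python
def oms_etc(query, k_sources, n_explore, n_total, est_mse):
    """Explore-then-commit sketch: spend the exploration budget
    uniformly across sources to estimate the moments, then commit
    the remaining budget to the source whose data minimize the
    estimated MSE of the target functional."""
    data = {j: [query(j) for _ in range(n_explore // k_sources)]
            for j in range(k_sources)}
    best = min(range(k_sources), key=lambda j: est_mse(data, j))
    data[best] += [query(best) for _ in range(n_total - n_explore)]
    return data, best
```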
Date: | 2021–08 |
URL: | http://d.repec.org/n?u=RePEc:arx:papers:2108.09265&r= |
By: | Simon Freyaldenhoven; Christian Hansen; Jorge Pérez Pérez; Jesse M. Shapiro |
Abstract: | Linear panel models, and the "event-study plots" that often accompany them, are popular tools for learning about policy effects. We discuss the construction of event-study plots and suggest ways to make them more informative. We examine the economic content of different possible identifying assumptions. We explore the performance of the corresponding estimators in simulations, highlighting that a given estimator can perform well or poorly depending on the economic environment. An accompanying Stata package, -xtevent-, facilitates adoption of our suggestions. |
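For readers outside Stata, a minimal Python analogue of an event-study regression with leads and lags (event time -1 omitted as the baseline); it assumes a panel with columns `unit`, `time`, and a precomputed relative event time, and -xtevent- handles many refinements this sketch does not.

```python
import statsmodels.formula.api as smf

def event_study(df, y="y", k="event_time", window=(-3, 3)):
    """Event-study sketch: outcome on event-time dummies plus unit
    and calendar-time fixed effects; the window endpoints absorb
    farther leads and lags, and never-treated rows with missing
    event time are dropped."""
    d = df.dropna(subset=[k]).copy()
    d["kbin"] = d[k].clip(*window).astype(int)
    formula = f"{y} ~ C(kbin, Treatment(reference=-1)) + C(unit) + C(time)"
    return smf.ols(formula, data=d).fit(
        cov_type="cluster", cov_kwds={"groups": d["unit"]})

# Plotting the C(kbin)[...] coefficients against event time yields the
# event-study plot discussed above.
```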
JEL: | C23 C52 |
Date: | 2021–08 |
URL: | http://d.repec.org/n?u=RePEc:nbr:nberwo:29170&r= |
By: | Luca Rigotti; Arie Beresteanu |
Abstract: | We provide a sharp identification region for discrete choice models in which consumers' preferences are not necessarily complete and only aggregate choice data are available to the analyst. Behavior under incomplete preferences is modeled using an upper and a lower utility for each alternative, so that non-comparability can arise. The identification region places intuitive bounds on the probability distribution of the upper and lower utilities. We show that the existence of an instrumental variable can be used to reject the hypothesis that all consumers' preferences are complete, while attention sets can be used to rule out the hypothesis that no individual can compare any two alternatives. We apply our methods to data from the 2018 mid-term elections in Ohio. |
Date: | 2021–08 |
URL: | http://d.repec.org/n?u=RePEc:arx:papers:2108.06282&r= |
By: | Paul Hünermund (Copenhagen Business School); Beyers Louw (Maastricht University); Itamar Caspi (Bank of Israel) |
Abstract: | Double machine learning (DML) is becoming an increasingly popular tool for automated model selection in high-dimensional settings. At its core, DML assumes unconfoundedness, or exogeneity of all considered controls, which is likely to be violated if the covariate space is large. In this paper, we lay out a theory of bad controls building on the graph-theoretic approach to causality. We then demonstrate, based on simulation studies and an application to real-world data, that DML is very sensitive to the inclusion of bad controls and exhibits considerable bias even with only a few endogenous variables present in the conditioning set. The extent of this bias depends on the precise nature of the assumed causal model, which calls into question the ability to select appropriate controls for regressions in a purely data-driven way. |
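To make the object under study concrete, a cross-fitted partially linear DML sketch in the style of Chernozhukov et al. (2018), with random forests for the nuisances; if `X` includes bad controls (colliders, mediators), the point estimate is biased regardless of how well the nuisances fit, which is the paper's warning.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import KFold

def dml_plr(y, d, X, n_folds=5, seed=0):
    """Cross-fitted partially linear DML: residualize the outcome y
    and the treatment d on controls X with ML nuisance fits, then
    regress residual on residual."""
    y_res = np.empty(len(y))
    d_res = np.empty(len(d))
    folds = KFold(n_splits=n_folds, shuffle=True, random_state=seed)
    for train, test in folds.split(X):
        m_y = RandomForestRegressor(random_state=seed).fit(X[train], y[train])
        m_d = RandomForestRegressor(random_state=seed).fit(X[train], d[train])
        y_res[test] = y[test] - m_y.predict(X[test])
        d_res[test] = d[test] - m_d.predict(X[test])
    return (d_res @ y_res) / (d_res @ d_res)  # estimated treatment coefficient
```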
Date: | 2021–08 |
URL: | http://d.repec.org/n?u=RePEc:arx:papers:2108.11294&r= |
By: | Varsha S. Kulkarni |
Abstract: | Higher-dimensional autoregressive models can describe some econometric processes relatively generically if they incorporate heterogeneity in dependence over time. This paper analyzes the stationarity of an autoregressive process of dimension $k>1$ having a sequence of coefficients $\beta$ multiplied by successively increasing powers of $0<\delta<1$. |
Date: | 2021–08 |
URL: | http://d.repec.org/n?u=RePEc:arx:papers:2108.09083&r= |
By: | Yufeng Mao; Bin Peng; Mervyn J Silvapulle; Param Silvapulle; Yanrong Yang |
Abstract: | This study decomposes bilateral trade flows using a three-dimensional panel data model. Under the scenario in which all three dimensions diverge to infinity, we propose an estimation approach that identifies the number of global shocks and country-specific shocks sequentially, and we establish the corresponding asymptotic theory. From a practical point of view, being able to separate pervasive and non-pervasive shocks in multi-dimensional panel data is crucial for a range of applications, such as international financial linkages and migration flows. In the numerical studies, we first conduct intensive simulations to examine the theoretical findings, and then use the proposed approach to investigate international trade flows among two major trading groups (APEC and EU) over 1982-2019 and to quantify the network of bilateral trade. |
Keywords: | three-dimensional panel data, bilateral trade, asymptotic theory |
JEL: | C23 P45 |
Date: | 2021 |
URL: | http://d.repec.org/n?u=RePEc:msh:ebswps:2021-7&r= |
By: | Lin William Cong; Ke Tang; Jingyuan Wang; Yang Zhang |
Abstract: | We predict asset returns and measure risk premia using a prominent technique from artificial intelligence: deep sequence modeling. Because asset returns often exhibit sequential dependence that may not be effectively captured by conventional time series models, sequence modeling offers a promising path with its data-driven approach and superior performance. In this paper, we first overview the development of deep sequence models, introduce their applications in asset pricing, and discuss their advantages and limitations. We then perform a comparative analysis of these methods using data on U.S. equities. We demonstrate how sequence modeling benefits investors in general through incorporating complex historical path dependence, and show that Long Short-Term Memory (LSTM) based models tend to have the best out-of-sample performance. |
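A minimal sketch of an LSTM return-prediction model of the kind surveyed here, assuming PyTorch; the architecture, feature count, and training data below are illustrative assumptions, not the paper's specification.

```python
import torch
import torch.nn as nn

class ReturnLSTM(nn.Module):
    """Map a lookback window of features to a one-step-ahead
    expected return via an LSTM and a linear head."""
    def __init__(self, n_features, hidden=32):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, x):                  # x: (batch, lookback, n_features)
        out, _ = self.lstm(x)
        return self.head(out[:, -1, :])    # read off the last hidden state

# training-loop sketch on synthetic data
model = ReturnLSTM(n_features=10)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
x = torch.randn(64, 12, 10)               # 64 windows, 12-period lookback
y = torch.randn(64, 1)                    # next-period returns
for _ in range(100):
    opt.zero_grad()
    nn.functional.mse_loss(model(x), y).backward()
    opt.step()
```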
Date: | 2021–08 |
URL: | http://d.repec.org/n?u=RePEc:arx:papers:2108.08999&r= |
By: | Heidar Eyjolfsson; Dag Tjøstheim |
Abstract: | The paper discusses multivariate self- and cross-exciting processes. We define a class of multivariate point processes via their corresponding stochastic intensity processes, which are driven by stochastic jumps. Essentially, there is a jump in an intensity process whenever the corresponding point process records an event. An attribute of our modelling class is that each event records not only a jump but also its magnitude, which allows large jumps to influence the intensity more than small jumps. We give conditions that guarantee the process is stable, in the sense that it does not explode, and provide a detailed discussion of when the subclass of linear models is stable. Finally, we fit our model to financial time series data from the S&P 500 and Nikkei 225 indices. We conclude that a nonlinear variant from our modelling class fits the data best. This supports the observation that in times of crisis (high intensity) jumps tend to arrive in clusters, whereas there are typically longer times between jumps when markets are calmer. We moreover observe more variability in jump sizes when the intensity is high than when it is low. |
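As a univariate illustration of the mark-dependent excitation the abstract emphasizes, an Ogata-thinning simulator for a marked Hawkes process with an exponential kernel; the unit-exponential marks and the specific parametrization are assumptions, not the paper's (possibly nonlinear, multivariate) model.

```python
import numpy as np

def simulate_marked_hawkes(mu, alpha, beta, T, seed=0):
    """Ogata thinning for a marked Hawkes process with intensity
    lambda(t) = mu + sum_{t_i < t} alpha * m_i * exp(-beta * (t - t_i)),
    so larger marks m_i excite the intensity more.  Stable (non-explosive)
    when alpha * E[m] / beta < 1."""
    rng = np.random.default_rng(seed)
    events = []  # list of (time, mark)

    def lam(t):
        return mu + sum(alpha * m * np.exp(-beta * (t - s)) for s, m in events)

    t = 0.0
    while True:
        lam_bar = lam(t)       # valid bound: intensity decays until the next jump
        t += rng.exponential(1.0 / lam_bar)
        if t >= T:
            break
        if rng.uniform() <= lam(t) / lam_bar:
            events.append((t, rng.exponential(1.0)))   # accept with mark
    return events
```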
Date: | 2021–08 |
URL: | http://d.repec.org/n?u=RePEc:arx:papers:2108.10176&r= |