
on Econometrics 
By:  Cheng Chou (University of Leicester); Geert Ridder (University of Southern California); Ruoyao Shi (Department of Economics, University of California Riverside) 
Abstract:  In a dynamic binary choice model that allows for general forms of nonstationarity, we transform the identification of the flow utility parameters into the solution of a (linear) system of equations. The identification of the parameters, therefore, follows the usual argument for linear GMM. In particular, we show that the state transition distribution is not essential for the identification and estimation of the parameters. We propose a three-step conditional-choice-probability-based semiparametric estimator that bypasses estimating and simulating from the state transition distribution. Simulation experiments show that our estimator gives comparable or better estimates than a competitor estimator, yet it requires fewer assumptions in certain scenarios, is substantially easier to implement, and is computationally much less demanding. The asymptotic distribution of the estimator is provided, and the sensitivity of the estimator to a key assumption is also examined. 
Keywords:  dynamic binary choice model, Markov property, linear system, identification, semiparametric estimation 
JEL:  C35 
Date:  2024–02 
URL:  http://d.repec.org/n?u=RePEc:ucr:wpaper:202402&r=ecm 
By:  Yike Wang; Chris Gu; Taisuke Otsu 
Abstract:  This paper presents a novel application of graph neural networks for modeling and estimating network heterogeneity. Network heterogeneity is characterized by variations in a unit's decisions or outcomes that depend not only on its own attributes but also on the conditions of its surrounding neighborhood. We delineate the convergence rate of the graph neural network estimator, as well as its applicability in semiparametric causal inference with heterogeneous treatment effects. The finite-sample performance of our estimator is evaluated through Monte Carlo simulations. In an empirical setting related to microfinance program participation, we apply the new estimator to examine the average treatment effects and outcomes of counterfactual policies, and to propose an enhanced strategy for selecting the initial recipients of program information in social networks. 
Date:  2024–01 
URL:  http://d.repec.org/n?u=RePEc:arx:papers:2401.16275&r=ecm 
By:  Dante Amengual (CEMFI, Centro de Estudios Monetarios y Financieros); Gabriele Fiorentini (Università di Firenze and RCEA); Enrique Sentana (CEMFI, Centro de Estudios Monetarios y Financieros) 
Abstract:  In incomplete data models, the EM principle implies that the moments the Information Matrix test assesses are the expectations, given the observations, of the moments it would assess were the underlying components observed. This principle also leads to interpretable expressions for their asymptotic covariance matrix, adjusted for sampling variability in the parameter estimators under correct specification. Monte Carlo simulations for finite Gaussian mixtures indicate that the parametric bootstrap provides reliable finite sample sizes and good power against various misspecification alternatives. We confirm that 3-component Gaussian mixtures accurately describe cross-sectional distributions of per capita income in the 1960–2000 Penn World Tables. 
Keywords:  Expectation-Maximisation principle, incomplete data, Hessian matrix, outer product of the score. 
JEL:  C46 C52 O47 
Date:  2024–02 
URL:  http://d.repec.org/n?u=RePEc:cmf:wpaper:wp2024_2401&r=ecm 
By:  Victor Chernozhukov; Iván Fernández-Val; Chen Huang; Weining Wang 
Abstract:  The Arellano-Bond estimator can be severely biased when the time series dimension of the data, $T$, is long. The source of the bias is the large degree of overidentification. We propose a simple two-step approach to deal with this problem. The first step applies LASSO to the cross-section data at each time period to select the most informative moment conditions. The second step applies a linear instrumental variable estimator using the instruments constructed from the moment conditions selected in the first step. The two stages are combined using sample splitting and cross-fitting to avoid overfitting bias. Using asymptotic sequences where the two dimensions of the panel grow with the sample size, we show that the new estimator is consistent and asymptotically normal under much weaker conditions on $T$ than the Arellano-Bond estimator. Our theory covers models with high-dimensional covariates, including multiple lags of the dependent variable, which are common in modern applications. We illustrate our approach with an application to the short- and long-term effects of the opening of K-12 schools and other policies on the spread of COVID-19, using weekly county-level panel data from the United States. 
Date:  2024–02 
URL:  http://d.repec.org/n?u=RePEc:arx:papers:2402.00584&r=ecm 
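The two-step procedure described above (LASSO selection of moment conditions, then linear IV on the selected instruments) can be sketched on simulated panel data. This is a minimal illustration of the idea, not the authors' implementation: it omits the sample splitting and cross-fitting, uses a fixed LASSO penalty, and all variable names are ours.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
N, T, rho = 2000, 10, 0.5

# Simulate a dynamic panel: y_it = rho * y_i,t-1 + alpha_i + eps_it
alpha = rng.normal(size=N)
y = np.empty((N, T))
y[:, 0] = alpha + rng.normal(size=N)
for t in range(1, T):
    y[:, t] = rho * y[:, t - 1] + alpha + rng.normal(size=N)

# First differencing removes the fixed effect: dy_t = rho * dy_{t-1} + deps_t.
# Valid instruments for dy_{t-1} are the lagged levels y_0, ..., y_{t-2}.
num = den = 0.0
for t in range(2, T):
    dy_t = y[:, t] - y[:, t - 1]
    dy_lag = y[:, t - 1] - y[:, t - 2]
    Z = y[:, : t - 1]                          # all usable lagged levels
    # Step 1: LASSO first stage selects the informative instruments
    keep = np.flatnonzero(Lasso(alpha=0.05).fit(Z, dy_lag).coef_)
    if keep.size == 0:                         # degenerate case: keep newest lag
        keep = np.array([Z.shape[1] - 1])
    Zk = Z[:, keep]
    # Step 2: just-identified IV using the projected first stage as instrument
    z = Zk @ np.linalg.lstsq(Zk, dy_lag, rcond=None)[0]
    num += z @ dy_t
    den += z @ dy_lag

rho_hat = num / den
print(rho_hat)                                 # should be near the true rho = 0.5
```

Keeping only the selected instruments in each period holds the effective degree of overidentification down even as $T$ grows, which is the intuition behind the reduced bias relative to the standard Arellano-Bond estimator.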
By:  Chen, Weilin; Lam, Clifford 
Abstract:  The idiosyncratic components of a tensor time series factor model can exhibit serial correlations (e.g., in finance or economic data), ruling out many state-of-the-art methods that assume white/independent idiosyncratic components. While the traditional higher order orthogonal iteration (HOOI) is proved to converge to a set of factor loading matrices, their closeness to the true underlying factor loading matrices is in general not established, or is established only under i.i.d. Gaussian noises. Under the presence of serial and cross-correlations in the idiosyncratic components, and for time series variables with only bounded fourth-order moments, we propose, for tensor time series data with tensor order two or above, a pre-averaging procedure that can be considered a random projection method. The estimated directions corresponding to the strongest factors are then used for projecting the data for a potentially improved re-estimation of the factor loading spaces themselves, with theoretical guarantees and rates of convergence spelt out when not all factors are pervasive. We also propose a new rank estimation method which utilizes correlation information from the projected data. Extensive simulations are performed and compared to other state-of-the-art or traditional alternatives. A set of tensor-valued NYC taxi data is also analyzed. 
Keywords:  core rank tensor; tensor fibres pre-averaging; strongest factors projection; iterative projection algorithm; bootstrap tensor fibres 
JEL:  C1 
Date:  2023–12–30 
URL:  http://d.repec.org/n?u=RePEc:ehl:lserod:121958&r=ecm 
By:  Sven Klaassen; Jan Teichert-Kluge; Philipp Bach; Victor Chernozhukov; Martin Spindler; Suhas Vijaykumar 
Abstract:  This paper explores the use of unstructured, multimodal data, namely text and images, in causal inference and treatment effect estimation. We propose a neural network architecture that is adapted to the double machine learning (DML) framework, specifically the partially linear model. An additional contribution of our paper is a new method to generate a semi-synthetic dataset which can be used to evaluate the performance of causal effect estimation in the presence of text and images as confounders. The proposed methods and architectures are evaluated on the semi-synthetic dataset and compared to standard approaches, highlighting the potential benefit of using text and images directly in causal studies. Our findings have implications for researchers and practitioners in economics, marketing, finance, medicine and data science in general who are interested in estimating causal quantities using non-traditional data. 
Date:  2024–02 
URL:  http://d.repec.org/n?u=RePEc:arx:papers:2402.01785&r=ecm 
By:  Seyed Morteza Emadi 
Abstract:  We provide a copula-based approach to test the exogeneity of instrumental variables in linear regression models. We show that the exogeneity of instrumental variables is equivalent to the exogeneity of their standard normal transformations with the same CDF value. Then, we establish a Wald test for the exogeneity of the instrumental variables. We demonstrate the performance of our test using simulation studies. Our simulations show that if the instruments are actually endogenous, our test rejects the exogeneity hypothesis approximately 93% of the time at the 5% significance level. Conversely, when instruments are truly exogenous, it rejects the exogeneity assumption less than 30% of the time on average for data with 200 observations and less than 2% of the time for data with 1,000 observations. Our results demonstrate our test's effectiveness, offering significant value to applied econometricians. 
Date:  2024–01 
URL:  http://d.repec.org/n?u=RePEc:arx:papers:2401.15253&r=ecm 
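The building block of this approach, mapping an instrument to a standard normal variable sharing the same CDF value, is a simple rank transform. The sketch below shows only that transformation, using an empirical CDF; the Wald statistic itself depends on details not given in the abstract, so no test is attempted here.

```python
import numpy as np
from scipy.stats import norm, rankdata

def normal_scores(z):
    """Standard normal transform: z_i -> Phi^{-1}(F(z_i)), with F replaced by
    the empirical CDF scaled by n + 1 so values stay strictly inside (0, 1)."""
    u = rankdata(z) / (len(z) + 1)
    return norm.ppf(u)

rng = np.random.default_rng(1)
z = rng.exponential(size=1000)        # a skewed instrument
zs = normal_scores(z)
print(zs.mean(), zs.std())            # approximately 0 and 1
```

The transform is monotone, so it preserves the ranks of the instrument while giving it a standard normal marginal distribution.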
By:  Cai, Hengrui; Shi, Chengchun; Song, Rui; Lu, Wenbin 
Abstract:  An individualized decision rule (IDR) is a decision function that assigns each individual a given treatment based on his/her observed characteristics. Most of the existing works in the literature consider settings with binary or finitely many treatment options. In this paper, we focus on the continuous treatment setting and propose jump interval-learning to develop an individualized interval-valued decision rule (I2DR) that maximizes the expected outcome. Unlike IDRs that recommend a single treatment, the proposed I2DR yields an interval of treatment options for each individual, making it more flexible to implement in practice. To derive an optimal I2DR, our jump interval-learning method estimates the conditional mean of the outcome given the treatment and the covariates via jump penalized regression, and derives the corresponding optimal I2DR based on the estimated outcome regression function. The regressor is allowed to be either linear, for clear interpretation, or a deep neural network, to model complex treatment-covariate interactions. To implement jump interval-learning, we develop a searching algorithm based on dynamic programming that efficiently computes the outcome regression function. Statistical properties of the resulting I2DR are established when the outcome regression function is either a piecewise or a continuous function over the treatment space. We further develop a procedure to infer the mean outcome under the (estimated) optimal policy. Extensive simulations and a real data application to a Warfarin study are conducted to demonstrate the empirical validity of the proposed I2DR. 
Keywords:  continuous treatment; dynamic programming; individualized interval-valued decision rule; jump interval-learning; precision medicine 
JEL:  C1 
Date:  2023–02–13 
URL:  http://d.repec.org/n?u=RePEc:ehl:lserod:118231&r=ecm 
By:  Orville Mondal; Rui Wang 
Abstract:  This paper provides partial identification of various binary choice models with misreported dependent variables. We propose two distinct approaches, each exploiting a different type of instrumental variable. In the first approach, the instrument is assumed to affect only the true dependent variable, not the misreporting probabilities. The second approach uses an instrument that influences misreporting probabilities monotonically while having no effect on the true dependent variable. Moreover, we derive identification results under additional restrictions on misreporting, including bounded/monotone misreporting probabilities. We use simulations to demonstrate the robust performance of our approaches, and apply the method to study educational attainment. 
Date:  2024–01 
URL:  http://d.repec.org/n?u=RePEc:arx:papers:2401.17137&r=ecm 
By:  Zulj, Valentin (Department of Statistics, Uppsala University); Jin, Shaobo (Department of Statistics, Uppsala University) 
Abstract:  When drawing causal inferences from observational data, researchers often model the propensity score. To date, the literature on the estimation of propensity scores is vast, and includes covariate selection algorithms as well as super learners and model averaging procedures. The latter often tune the estimated scores to be either very accurate or to provide the best possible result in terms of covariate balance. This paper focuses on using inverse probability weighting to estimate average treatment effects, and argues that this context requires both accuracy and balance to yield suitable propensity scores. Using Monte Carlo simulation, the paper studies whether frequentist model averaging can be used to account for both balance and accuracy simultaneously in order to reduce the bias of estimated treatment effects. The candidate propensity scores are estimated using reproducing kernel Hilbert space regression, and the simulation results suggest that model averaging does not improve the performance of the individual estimators. 
JEL:  C59 
Date:  2024–01–30 
URL:  http://d.repec.org/n?u=RePEc:hhs:ifauwp:2024_001&r=ecm 
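For reference, the inverse probability weighting estimator of the average treatment effect that this paper builds on can be written in a few lines. This is the textbook Hajek (normalized) IPW estimator evaluated with the true propensity score on simulated data, not the paper's RKHS-based or model-averaged versions.

```python
import numpy as np

def ipw_ate(y, d, ps):
    """Hajek (normalized) inverse-probability-weighting estimate of the ATE."""
    w1, w0 = d / ps, (1 - d) / (1 - ps)
    return (w1 @ y) / w1.sum() - (w0 @ y) / w0.sum()

rng = np.random.default_rng(2)
n = 20_000
x = rng.normal(size=n)
ps = 1 / (1 + np.exp(-x))                 # true propensity score
d = rng.binomial(1, ps)                   # confounded treatment assignment
y = d + x + rng.normal(size=n)            # true ATE = 1

print(ipw_ate(y, d, ps))                  # close to 1
```

Because the weights are inverted probabilities, errors in the estimated propensity score near 0 or 1 are amplified, which is why both the accuracy and the balancing behavior of the score matter for the resulting treatment effect estimate.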
By:  Federico Crippa 
Abstract:  The causal inference model proposed by Lee (2008) for the regression discontinuity design (RDD) relies on assumptions that imply the continuity of the density of the assignment (running) variable. The test for this implication is commonly referred to as the manipulation test and is regularly reported in applied research to strengthen the design's validity. The multidimensional RDD (MRDD) extends the RDD to contexts where treatment assignment depends on several running variables. This paper introduces a manipulation test for the MRDD. First, it develops a theoretical model for causal inference with the MRDD, used to derive a testable implication on the conditional marginal densities of the running variables. Then, it constructs the test for the implication based on a quadratic form of a vector of statistics separately computed for each marginal density. Finally, the proposed test is compared with alternative procedures commonly employed in applied research. 
Date:  2024–02 
URL:  http://d.repec.org/n?u=RePEc:arx:papers:2402.10836&r=ecm 
By:  Niko Hauzenberger; Massimiliano Marcellino; Michael Pfarrhofer; Anna Stelzer 
Abstract:  We propose and discuss Bayesian machine learning methods for mixed data sampling (MIDAS) regressions. This involves handling frequency mismatches with restricted and unrestricted MIDAS variants and specifying functional relationships between many predictors and the dependent variable. We use Gaussian processes (GP) and Bayesian additive regression trees (BART) as flexible extensions to linear penalized estimation. In a nowcasting and forecasting exercise we focus on quarterly US output growth and inflation in the GDP deflator. The new models leverage macroeconomic Big Data in a computationally efficient way and offer gains in predictive accuracy along several dimensions. 
Date:  2024–02 
URL:  http://d.repec.org/n?u=RePEc:arx:papers:2402.10574&r=ecm 
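In restricted MIDAS variants, the frequency mismatch is handled by aggregating high-frequency lags with a tightly parameterized weight function; the exponential Almon lag polynomial is the standard choice in this literature. A minimal sketch, with parameter values of our own choosing and no claim to match this paper's specification:

```python
import numpy as np

def exp_almon_weights(theta1, theta2, n_lags):
    """Exponential Almon lag weights used in restricted MIDAS:
    w_k proportional to exp(theta1 * k + theta2 * k^2), normalized to sum to one."""
    k = np.arange(1, n_lags + 1)
    w = np.exp(theta1 * k + theta2 * k**2)
    return w / w.sum()

# Collapse 12 monthly lags of an indicator into one quarterly regressor
w = exp_almon_weights(0.05, -0.01, 12)
monthly_lags = np.linspace(1.0, 0.0, 12)   # stand-in for observed monthly data
quarterly_regressor = w @ monthly_lags
print(w.sum(), quarterly_regressor)
```

With a negative quadratic coefficient the weights rise to a hump and then decay, so two parameters govern an arbitrary number of lags; the unrestricted MIDAS variant instead estimates a free coefficient per lag.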
By:  Brantly Callaway; Andrew GoodmanBacon; Pedro H. C. Sant'Anna 
Abstract:  This paper considers methods for defining aggregate parameters of interest in a difference-in-differences design with a continuous and staggered treatment. It also discusses how aggregation choices often simplify estimation. 
JEL:  C01 C14 
Date:  2024–02 
URL:  http://d.repec.org/n?u=RePEc:nbr:nberwo:32118&r=ecm 
By:  Gareth Liu-Evans; Garry D. A. Phillips 
Keywords:  LIML, Modified LIML, 2SLS, bias approximation, bias correction 
URL:  http://d.repec.org/n?u=RePEc:liv:livedp:202303&r=ecm 
By:  Lukas Leitner; 
Abstract:  The subjective wellbeing (SWB) method has become a popular tool to estimate the willingness to pay for non-market goods. In this method, the willingness-to-pay measure is the ratio of two coefficients (on the non-market good and on consumption), both estimated in a regression on subjective wellbeing. Computing confidence intervals for such ratios turns out to be error-prone, in particular when the consumption coefficient is imprecisely estimated. In this paper, five different ways of computing the confidence intervals are compared: the delta method, the Fieller method, parametric and nonparametric bootstrapping, and a numerical integration of Hinkley's formula. Using a large number of simulated SWB data sets, confidence intervals and their coverage rates are computed for each method. The findings suggest that the delta method is accurate only if the consumption coefficient is estimated with very high precision. All other methods turn out to be more robust, with minor differences in accuracy. 
Date:  2023–09 
URL:  http://d.repec.org/n?u=RePEc:hdl:wpaper:2310&r=ecm 
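Two of the compared intervals are easy to state in code. Below is a sketch of the delta-method and Fieller intervals for a ratio b1/b2 of estimated coefficients, with made-up inputs; when the denominator coefficient is precisely estimated the two nearly coincide, consistent with the finding that the delta method is adequate only in that regime.

```python
import numpy as np

def delta_ratio_ci(b1, b2, v11, v22, v12, z=1.96):
    """Delta-method CI for b1 / b2 given the coefficient covariance entries."""
    r = b1 / b2
    # Var(r) via the gradient (1/b2, -b1/b2^2) of the ratio
    se = np.sqrt((v11 + r**2 * v22 - 2 * r * v12) / b2**2)
    return r - z * se, r + z * se

def fieller_ratio_ci(b1, b2, v11, v22, v12, z=1.96):
    """Fieller CI: the set of r with (b1 - r*b2)^2 <= z^2 * Var(b1 - r*b2)."""
    a = b2**2 - z**2 * v22
    b = -2 * (b1 * b2 - z**2 * v12)
    c = b1**2 - z**2 * v11
    disc = b**2 - 4 * a * c
    if a <= 0 or disc < 0:
        return -np.inf, np.inf       # imprecise denominator: unbounded set
    lo = (-b - np.sqrt(disc)) / (2 * a)
    hi = (-b + np.sqrt(disc)) / (2 * a)
    return lo, hi

d_lo, d_hi = delta_ratio_ci(2.0, 1.0, 0.01, 1e-4, 0.0)
f_lo, f_hi = fieller_ratio_ci(2.0, 1.0, 0.01, 1e-4, 0.0)
print((d_lo, d_hi), (f_lo, f_hi))    # nearly identical intervals here
```

Unlike the delta interval, the Fieller set is allowed to be unbounded when the denominator coefficient is statistically indistinguishable from zero, which is exactly the case where the delta method breaks down.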
By:  Nicolas Astier; Frank A. Wolak 
Abstract:  Econometric software packages typically report a fixed number of decimal digits for coefficient estimates and their associated standard errors. This practice misses the opportunity to use rounding rules that convey statistical precision. Using insights from the literature on testing statistical hypotheses of equivalence, we propose a methodology that reports only those decimal digits in a parameter estimate that reject a hypothesis of statistical equivalence. Applying this methodology to all articles published in the American Economic Review between 2000 and 2022, we find that over 60% of the printed digits in coefficient estimates do not convey statistically meaningful information according to our definition of a significant digit. If one additional digit beyond the last significant digit is reported for each coefficient estimate, then approximately one-third of the printed digits in our sample would not be reported. 
JEL:  C01 C10 C12 
Date:  2024–02 
URL:  http://d.repec.org/n?u=RePEc:nbr:nberwo:32124&r=ecm 
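The flavor of such a reporting rule can be conveyed by a simple heuristic that prints digits only down to the decimal place on the order of the confidence half-width. The paper's actual rule is based on formal equivalence tests, so treat this as an approximation we constructed, not the authors' procedure.

```python
import math

def report(estimate, se, z=1.96):
    """Round an estimate so the last printed decimal place is on the order of
    the confidence half-width z * se (a heuristic stand-in for the paper's
    equivalence-test rule)."""
    half_width = z * se
    if half_width <= 0:
        return str(estimate)
    # decimals such that 10^(-decimals) is the largest power of ten <= half_width
    decimals = max(0, -math.floor(math.log10(half_width)))
    return f"{estimate:.{decimals}f}"

print(report(0.123456, 0.04))     # -> "0.12"
print(report(0.123456, 0.0004))   # -> "0.1235"
```

Under this heuristic, a coefficient with a large standard error is printed with few decimals, making the statistical precision visible in the number itself.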
By:  James D. Hamilton; Jin Xi 
Abstract:  This paper develops a procedure for uncovering the common cyclical factors that drive a mix of stationary and nonstationary variables. The method does not require knowing which variables are nonstationary or the nature of the nonstationarity. An application to the FRED-MD macroeconomic dataset demonstrates that the approach offers similar benefits to those of traditional principal component analysis with some added advantages. 
JEL:  C55 E30 
Date:  2024–01 
URL:  http://d.repec.org/n?u=RePEc:nbr:nberwo:32068&r=ecm 
By:  Guillaume Coqueret 
Abstract:  We argue that spanning large numbers of degrees of freedom in empirical analysis allows better characterizations of effects and thus improves the trustworthiness of conclusions. Our ideas are illustrated in three studies: equity premium prediction, asset pricing anomalies and risk premia estimation. In the first, we find that each additional degree of freedom in the protocol expands the average range of $t$-statistics by at least 30%. In the second, we show that resorting to forking paths instead of bootstrapping in multiple testing raises the bar of significance for anomalies: at the 5% confidence level, the threshold for bootstrapped statistics is 4.5, whereas with paths, it is at least 8.2, a bar much higher than those currently used in the literature. In our third application, we reveal the importance of particular steps in the estimation of premia. In addition, we use paths to corroborate prior findings in the three topics. We document heterogeneity in our ability to replicate prior studies: some conclusions seem robust, others do not align with the paths we were able to generate. 
Date:  2023–11 
URL:  http://d.repec.org/n?u=RePEc:arx:papers:2401.08606&r=ecm 