on Econometrics |
By: | Daniele Ballinari |
Abstract: | Machine learning methods, particularly the double machine learning (DML) estimator (Chernozhukov et al., 2018), are increasingly popular for the estimation of the average treatment effect (ATE). However, datasets often exhibit unbalanced treatment assignments where only a few observations are treated, leading to unstable propensity score estimations. We propose a simple extension of the DML estimator which undersamples data for propensity score modeling and calibrates scores to match the original distribution. The paper provides theoretical results showing that the estimator retains the DML estimator's asymptotic properties. A simulation study illustrates the finite sample performance of the estimator. |
Date: | 2024–03 |
URL: | http://d.repec.org/n?u=RePEc:arx:papers:2403.01585&r=ecm |
By: | Alberto Abadie; Anish Agarwal; Raaz Dwivedi; Abhin Shah |
Abstract: | This article introduces a new framework for estimating average treatment effects under unobserved confounding in modern data-rich environments featuring large numbers of units and outcomes. The proposed estimator is doubly robust, combining outcome imputation, inverse probability weighting, and a novel cross-fitting procedure for matrix completion. We derive finite-sample and asymptotic guarantees, and show that the error of the new estimator converges to a mean-zero Gaussian distribution at a parametric rate. Simulation results demonstrate the practical relevance of the formal properties of the estimators analyzed in this article. |
Date: | 2024–02 |
URL: | http://d.repec.org/n?u=RePEc:arx:papers:2402.11652&r=ecm |
By: | Ruixuan Liu; Zhengfei Yu |
Abstract: | We consider a quasi-Bayesian method that combines a frequentist estimation in the first stage and a Bayesian estimation/inference approach in the second stage. The study is motivated by structural discrete choice models that use the control function methodology to correct for endogeneity bias. In this scenario, the first stage estimates the control function using some frequentist parametric or nonparametric approach. The structural equation in the second stage, associated with certain complicated likelihood functions, can be more conveniently dealt with using a Bayesian approach. This paper studies the asymptotic properties of the quasi-posterior distributions obtained from the second stage. We prove that the corresponding quasi-Bayesian credible set does not have the desired coverage in large samples. Nonetheless, the quasi-Bayesian point estimator remains consistent and is asymptotically equivalent to a frequentist two-stage estimator. We show that one can obtain valid inference by bootstrapping the quasi-posterior that takes into account the first-stage estimation uncertainty. |
Date: | 2024–02 |
URL: | http://d.repec.org/n?u=RePEc:arx:papers:2402.17374&r=ecm |
By: | Sukjin Han; Adam McCloskey |
Abstract: | Interval identification of parameters such as average treatment effects, average partial effects and welfare is particularly common when using observational data and experimental data with imperfect compliance due to the endogeneity of individuals' treatment uptake. In this setting, a treatment or policy will typically become an object of interest to the researcher when it is either selected from the estimated set of best-performers or arises from a data-dependent selection rule. In this paper, we develop new inference tools for interval-identified parameters chosen via these forms of selection. We develop three types of confidence intervals for data-dependent and interval-identified parameters, discuss how they apply to several examples of interest and prove their uniform asymptotic validity under weak assumptions. |
Date: | 2024–03 |
URL: | http://d.repec.org/n?u=RePEc:arx:papers:2403.00422&r=ecm |
By: | Tom Boot; Didier Nibbering |
Abstract: | In theory, two-stage least squares (TSLS) identifies a weighted average of covariate-specific local average treatment effects (LATEs) from a saturated specification without making parametric assumptions on how available covariates enter the model. In practice, TSLS is severely biased when saturation leads to a number of control dummies that is of the same order of magnitude as the sample size, and the use of many, arguably weak, instruments. This paper derives asymptotically valid tests and confidence intervals for an estimand that identifies the weighted average of LATEs targeted by saturated TSLS, even when the number of control dummies and instrument interactions is large. The proposed inference procedure is robust against four key features of saturated economic data: treatment effect heterogeneity, covariates with rich support, weak identification strength, and conditional heteroskedasticity. |
Date: | 2024–02 |
URL: | http://d.repec.org/n?u=RePEc:arx:papers:2402.12607&r=ecm |
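As a point of reference for the saturated construction, the just-identified case with a single binary instrument reduces to the textbook Wald ratio (a sketch of the baseline estimand, not the paper's robust inference procedure):

```python
def iv_wald(y, d, z):
    """Wald IV estimate cov(y, z) / cov(d, z): with a binary instrument z
    and treatment d, this recovers the LATE under the usual relevance,
    exogeneity and monotonicity conditions."""
    n = len(y)
    my, md, mz = sum(y) / n, sum(d) / n, sum(z) / n
    cov_yz = sum((yi - my) * (zi - mz) for yi, zi in zip(y, z))
    cov_dz = sum((di - md) * (zi - mz) for di, zi in zip(d, z))
    return cov_yz / cov_dz
```

Saturated TSLS interacts the instrument with covariate cells, producing the covariate-weighted average of such LATEs that the paper targets.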
By: | Yuya Sasaki; Jing Tao; Yulong Wang |
Abstract: | Motivated by the empirical power law of the distributions of credits (e.g., the number of "likes") of viral posts in social media, we introduce the high-dimensional tail index regression and methods of estimation and inference for its parameters. We propose a regularized estimator, establish its consistency, and derive its convergence rate. To conduct inference, we propose to debias the regularized estimate, and establish the asymptotic normality of the debiased estimator. Simulation studies support our theory. These methods are applied to text analyses of viral posts in X (formerly Twitter) concerning LGBTQ+. |
Date: | 2024–03 |
URL: | http://d.repec.org/n?u=RePEc:arx:papers:2403.01318&r=ecm |
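For context, the classical low-dimensional building block is the Hill estimator of the tail index from the largest order statistics; the sketch below shows that baseline (the paper's contribution, a regularized regression version for high-dimensional covariates, is not reproduced here):

```python
import math
import random

def hill_tail_index(xs, k):
    """Hill estimator of the tail index alpha from the k largest
    order statistics of the sample xs."""
    s = sorted(xs, reverse=True)
    threshold = s[k]  # (k+1)-th largest observation
    xi = sum(math.log(s[i] / threshold) for i in range(k)) / k
    return 1.0 / xi   # tail index alpha = 1 / xi

# Pareto sample with true tail index 2: the estimate should be near 2
random.seed(0)
sample = [random.paretovariate(2.0) for _ in range(20000)]
alpha_hat = hill_tail_index(sample, k=500)
```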
By: | Sung Hoon Choi; Donggyu Kim |
Abstract: | In this paper, we introduce a novel method for predicting intraday instantaneous volatility based on Ito semimartingale models using high-frequency financial data. Several studies have highlighted stylized volatility time series features, such as interday auto-regressive dynamics and the intraday U-shaped pattern. To accommodate these volatility features, we propose an interday-by-intraday instantaneous volatility matrix process that can be decomposed into low-rank conditional expected instantaneous volatility and noise matrices. To predict the low-rank conditional expected instantaneous volatility matrix, we propose the Two-sIde Projected-PCA (TIP-PCA) procedure. We establish asymptotic properties of the proposed estimators and conduct a simulation study to assess the finite sample performance of the proposed prediction method. Finally, we apply the TIP-PCA method to an out-of-sample instantaneous volatility vector prediction study using high-frequency data from the S&P 500 index and 11 sector index funds. |
Date: | 2024–03 |
URL: | http://d.repec.org/n?u=RePEc:arx:papers:2403.02591&r=ecm |
By: | Philipp Aschersleben; Julian Granna; Thomas Kneib; Stefan Lang; Nikolaus Umlauf; Winfried Steiner |
Abstract: | Gaussian Structured Additive Regression provides a flexible framework for additive decomposition of the expected value with nonlinear covariate effects and time trends, unit- or cluster-specific heterogeneity, spatial heterogeneity, and complex interactions between covariates of different types. Within this framework, we present a simultaneous estimation approach for highly complex multiplicative interaction effects. In particular, a possibly nonlinear function f(z) of a covariate z may be scaled by a multiplicative effect of the form exp(η), where η is another possibly structured additive predictor. Inference is fully Bayesian and based on highly efficient Markov Chain Monte Carlo (MCMC) algorithms. We investigate the statistical properties of our approach in extensive simulation experiments. Furthermore, we apply and illustrate the methodology to an analysis of asking prices for 200,000 dwellings in Germany. |
Keywords: | IWLS proposals, MCMC, multiplicative interaction effects, structured additive predictor |
Date: | 2024–01 |
URL: | http://d.repec.org/n?u=RePEc:inn:wpaper:2024-01&r=ecm |
By: | Giovanni Angelini; Luca Fanelli; Luca Neri |
Abstract: | When in proxy-SVARs the covariance matrix of VAR disturbances is subject to exogenous, permanent, nonrecurring breaks that generate target impulse response functions (IRFs) that change across volatility regimes, even strong, exogenous external instruments can result in inconsistent estimates of the dynamic causal effects of interest if the breaks are not properly accounted for. In such cases, it is essential to explicitly incorporate the shifts in unconditional volatility in order to point-identify the target structural shocks and possibly restore consistency. We demonstrate that, under a necessary and sufficient rank condition that leverages moments implied by changes in volatility, the target IRFs can be point-identified and consistently estimated. Importantly, standard asymptotic inference remains valid in this context despite (i) the covariance between the proxies and the instrumented structural shocks being local-to-zero, as in Staiger and Stock (1997), and (ii) the potential failure of instrument exogeneity. We introduce a novel identification strategy that appropriately combines external instruments with "informative" changes in volatility, thus obviating the need to assume proxy relevance and exogeneity in estimation. We illustrate the effectiveness of the suggested method by revisiting a fiscal proxy-SVAR previously estimated in the literature, complementing the fiscal instruments with information derived from the massive reduction in volatility observed in the transition from the Great Inflation to the Great Moderation regimes. |
JEL: | C32 C51 C52 E62 |
Date: | 2024–03 |
URL: | http://d.repec.org/n?u=RePEc:bol:bodewp:wp1193&r=ecm |
By: | Yiyan Huang; Cheuk Hang Leung; Siyi Wang; Yijun Li; Qi Wu |
Abstract: | The growing demand for personalized decision-making has led to a surge of interest in estimating the Conditional Average Treatment Effect (CATE). The intersection of machine learning and causal inference has yielded various effective CATE estimators. However, deploying these estimators in practice is often hindered by the absence of counterfactual labels, making it challenging to select a desirable CATE estimator using conventional model selection procedures like cross-validation. Existing approaches for CATE estimator selection, such as plug-in and pseudo-outcome metrics, face two inherent challenges. First, they require the researcher to choose the metric form and the underlying machine learning models for fitting nuisance parameters or plug-in learners. Second, they lack a specific focus on selecting a robust estimator. To address these challenges, this paper introduces a novel approach, the Distributionally Robust Metric (DRM), for CATE estimator selection. The proposed DRM not only eliminates the need to fit additional models but also excels at selecting a robust CATE estimator. Experimental studies demonstrate the efficacy of the DRM method, showcasing its consistent effectiveness in identifying superior estimators while mitigating the risk of selecting inferior ones. |
Date: | 2024–02 |
URL: | http://d.repec.org/n?u=RePEc:arx:papers:2402.18392&r=ecm |
By: | Santiago Pereda-Fernández |
Abstract: | This paper addresses computational challenges in estimating Quantile Regression with Selection (QRS). The estimation of the parameters that model self-selection requires the estimation of the entire quantile process several times. Moreover, closed-form expressions of the asymptotic variance are too cumbersome, making the bootstrap more convenient to perform inference. Taking advantage of recent advancements in the estimation of quantile regression, along with some specific characteristics of the QRS estimation problem, I propose streamlined algorithms for the QRS estimator. These algorithms significantly reduce computation time through preprocessing techniques and quantile grid reduction for the estimation of the copula and slope parameters. I show the optimization enhancements with some simulations. Lastly, I show how preprocessing methods can improve the precision of the estimates without sacrificing computational efficiency. Hence, they constitute practical solutions for estimators with non-differentiable and non-convex criterion functions such as those based on copulas. |
Date: | 2024–02 |
URL: | http://d.repec.org/n?u=RePEc:arx:papers:2402.16693&r=ecm |
By: | Mohamed Doukali; Xiaojun Song; Abderrahim Taamouti |
Abstract: | We propose an optimization-based estimation of Value-at-Risk that corrects for the effect of measurement errors in prices. We show that measurement errors might pose serious problems for estimating risk measures like Value-at-Risk. In particular, when the stock prices are contaminated, the existing estimators of Value-at-Risk are inconsistent and might lead to an underestimation of risk, which might result in extreme leverage ratios within the held portfolios. Using Fourier transform and a deconvolution kernel estimator of the probability distribution function of true latent prices, we derive a robust estimator of Value-at-Risk in the presence of measurement errors. Monte Carlo simulations and a real data analysis illustrate satisfactory performance of the proposed method. |
Keywords: | Deconvolution kernel, Fourier transform, measurement errors, market microstructure noise, optimization, Value-at-Risk |
JEL: | G11 G19 C14 C61 C63 |
Date: | 2022–03 |
URL: | http://d.repec.org/n?u=RePEc:liv:livedp:202209&r=ecm |
By: | Sukjin Han; Hiroaki Kaido |
Abstract: | The control function approach allows the researcher to identify various causal effects of interest. While powerful, it requires a strong invertibility assumption, which limits its applicability. This paper expands the scope of the nonparametric control function approach by allowing the control function to be set-valued and deriving sharp bounds on structural parameters. The proposed generalization accommodates a wide range of selection processes involving discrete endogenous variables, random coefficients, treatment selections with interference, and dynamic treatment selections. |
Date: | 2024–03 |
URL: | http://d.repec.org/n?u=RePEc:arx:papers:2403.00347&r=ecm |
By: | Man Chon Iao; Yatheesan J. Selvakumar |
Abstract: | We propose an indirect inference strategy for estimating heterogeneous-agent business cycle models with micro data. At its heart is a first-order vector autoregression that is grounded in linear filtering theory as the cross-section grows large. The result is a fast, simple and robust algorithm for computing an approximate likelihood that can be easily paired with standard classical or Bayesian methods. Importantly, our method is compatible with the popular sequence-space solution method, unlike existing state-of-the-art approaches. We test-drive our method by estimating a canonical HANK model with shocks in both the aggregate and cross-section. Not only do simulation results demonstrate the appeal of our method, they also emphasize the important information contained in the entire micro-level distribution over and above simple moments. |
Date: | 2024–02 |
URL: | http://d.repec.org/n?u=RePEc:arx:papers:2402.11379&r=ecm |
By: | Kettlewell, Nathan (University of Technology, Sydney); Walker, Matthew J. (Newcastle University); Yoo, Hong Il (Loughborough University) |
Abstract: | Discrete choice experiments (DCEs) often present concise choice scenarios that may appear incomplete to respondents. To allow respondents to express uncertainty arising from this incompleteness, DCEs may ask them to state probabilities with which they expect to make specific choices. The workhorse method for analyzing the elicited probabilities involves semi-parametric estimation of population average preferences. Despite flexible distributional assumptions, this method presents challenges in estimating unobserved preference heterogeneity, a key element in non-market valuation studies. We introduce a fractional response model based on a mixture of beta distributions. The model enables researchers to uncover preference heterogeneity under comparable parametric assumptions as adopted in conventional choice analysis, and can accommodate multiplicative forms of heterogeneity that make the semi-parametric method inconsistent. Using a DCE on alternative fuel vehicles, we illustrate the complementary roles of the parametric and semi-parametric approaches. We also undertake a separate analysis in which respondents are randomized to either a DCE employing a conventional choice elicitation format or a parallel DCE employing the probability elicitation format. |
Keywords: | discrete choice experiment, probability elicitation, mixed logit, beta regression, willingness to pay |
JEL: | C35 D12 D84 Q42 R41 |
Date: | 2024–02 |
URL: | http://d.repec.org/n?u=RePEc:iza:izadps:dp16821&r=ecm |
By: | Grant Hillier; Kees Jan van Garderen; Noud van Giersbergen |
Abstract: | Testing for a mediation effect is important in many disciplines, but is made difficult - even asymptotically - by the influence of nuisance parameters. Classical tests such as likelihood ratio (LR) and Wald (Sobel) tests have very poor power properties in parts of the parameter space, and many attempts have been made to produce improved tests, with limited success. In this paper we show that augmenting the critical region of the LR test can produce a test with much improved behavior everywhere. In fact, we first show that there exists a test of this type that is (asymptotically) exact for certain test levels $\alpha$, including the common choices $\alpha = .01, .05, .10$. The critical region of this exact test has some undesirable properties. We go on to show that there is a very simple class of augmented LR critical regions which provides tests that are nearly exact, and avoid the issues inherent in the exact test. We suggest an optimal and coherent member of this class, and provide the table needed to implement the test and to report p-values if desired. Simulation confirms validity with non-Gaussian disturbances, under heteroskedasticity, and in a nonlinear (logit) model. A short application of the method to an entrepreneurial attitudes study is included for illustration. |
Date: | 2024–03 |
URL: | http://d.repec.org/n?u=RePEc:arx:papers:2403.02144&r=ecm |
By: | Victor Chernozhukov; Christian Hansen; Nathan Kallus; Martin Spindler; Vasilis Syrgkanis |
Abstract: | An introduction to the emerging fusion of machine learning and causal inference. The book presents ideas from classical structural equation models (SEMs) and their modern AI equivalents, directed acyclic graphs (DAGs) and structural causal models (SCMs), and covers Double/Debiased Machine Learning methods for inference in such models using modern predictive tools. |
Date: | 2024–03 |
URL: | http://d.repec.org/n?u=RePEc:arx:papers:2403.02467&r=ecm |
By: | Johannes Carow (Johannes Gutenberg University Mainz) |
Abstract: | Since the seminal work of Bertrand & Schoar (2003), the separate estimation of person effects and firm effects remains a widely used method in the analysis of firm-level dependent variables. Recently, this class of models has drawn serious methodological criticism arguing that person effects only reflect spurious variation. Rather than rejecting this estimation technique per se, I recommend a strategy based on simulation analysis to test for the presence of person effects. This strategy takes into account limitations of a previous test for idiosyncratic person effects. Further, I show that the estimation of person effects is subject to attenuation bias and that the size of this bias increases in the number of persons per firm-year. I also demonstrate that the use of Unconditional Quantile Regressions for estimated person effects can produce statistical artefacts at the upper and lower tail of the distribution. Additionally, attenuation bias impairs the analysis of the correlation of person effects pertaining to different dependent variables. |
Keywords: | Two-way fixed-effects, simulations, managers, spurious variation, attenuation bias |
JEL: | C15 C18 C21 L25 |
Date: | 2024–03–18 |
URL: | http://d.repec.org/n?u=RePEc:jgu:wpaper:2405&r=ecm |
By: | Sina Akbari; Negar Kiyavash |
Abstract: | The renowned difference-in-differences (DiD) estimator relies on the assumption of 'parallel trends', which does not hold in many practical applications. To address this issue, the econometrics literature has turned to the triple difference estimator. Both DiD and triple difference are limited to assessing average effects exclusively. An alternative avenue is offered by the changes-in-changes (CiC) estimator, which provides an estimate of the entire counterfactual distribution at the cost of relying on (stronger) distributional assumptions. In this work, we extend the triple difference estimator to accommodate the CiC framework, presenting the 'triple changes estimator' and its identification assumptions, thereby expanding the scope of the CiC paradigm. Subsequently, we empirically evaluate the proposed framework and apply it to a study examining the impact of Medicaid expansion on children's preventive care. |
Date: | 2024–02 |
URL: | http://d.repec.org/n?u=RePEc:arx:papers:2402.12583&r=ecm |
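The two-period, two-group CiC transform that the triple changes estimator builds on has a simple empirical form: map each treated pre-period outcome through the control group's period-0 CDF and period-1 quantile function. A minimal sketch (the baseline CiC step only, not the triple-difference extension):

```python
import math

def cic_counterfactual(y00, y01, y10):
    """Changes-in-changes: estimate the treated group's counterfactual
    period-1 outcomes as F01^{-1}(F00(y)) for each treated pre-period
    outcome y, using the control group's empirical distributions in
    period 0 (y00) and period 1 (y01)."""
    s00, s01 = sorted(y00), sorted(y01)
    out = []
    for y in y10:
        # empirical CDF of the control group in period 0
        u = sum(1 for v in s00 if v <= y) / len(s00)
        # matching empirical quantile of the control group in period 1
        k = max(math.ceil(u * len(s01)) - 1, 0)
        out.append(s01[k])
    return out
```

If the control distribution simply shifts by one unit between periods, the counterfactual applies the same shift to the treated outcomes, as the test case below illustrates.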
By: | Andrii Babii; Marine Carrasco; Idriss Tsafack |
Abstract: | We consider the functional linear regression model with a scalar response and a Hilbert space-valued predictor, a well-known ill-posed inverse problem. We propose a new formulation of the functional partial least-squares (PLS) estimator related to the conjugate gradient method. We show that the estimator achieves the (nearly) optimal convergence rate on a class of ellipsoids and we introduce an early stopping rule which adapts to the unknown degree of ill-posedness. Theoretical and simulation comparisons between the estimator and the principal component regression estimator are provided. |
Date: | 2024–02 |
URL: | http://d.repec.org/n?u=RePEc:arx:papers:2402.11134&r=ecm |
By: | Yuchen Hu; Henry Zhu; Emma Brunskill; Stefan Wager |
Abstract: | Randomized controlled trials (RCTs) are often run in settings with many subpopulations that may have differential benefits from the treatment being evaluated. We consider the problem of sample selection, i.e., whom to enroll in an RCT, such as to optimize welfare in a heterogeneous population. We formalize this problem within the minimax-regret framework, and derive optimal sample-selection schemes under a variety of conditions. We also highlight how different objectives and decisions can lead to notably different guidance regarding optimal sample allocation through a synthetic experiment leveraging historical COVID-19 trial data. |
Date: | 2024–03 |
URL: | http://d.repec.org/n?u=RePEc:arx:papers:2403.01386&r=ecm |
By: | Richard Schnorrenberger; Aishameriane Schmidt; Guilherme Valle Moura |
Abstract: | We investigate the predictive ability of machine learning methods to produce weekly inflation nowcasts using high-frequency macro-financial indicators and a survey of professional forecasters. Within an unrestricted mixed-frequency ML framework, we provide clear guidelines to improve inflation nowcasts upon forecasts made by specialists. First, we find that variable selection performed via the LASSO is fundamental for crafting an effective ML model for inflation nowcasting. Second, we underscore the relevance of timely data on price indicators and SPF expectations to better discipline our model-based nowcasts, especially during the inflationary surge following the COVID-19 crisis. Third, we show that predictive accuracy substantially increases when the model specification is free of ragged edges and guided by the real-time data release of price indicators. Finally, incorporating the most recent high-frequency signal is already sufficient for real-time updates of the nowcast, eliminating the need to account for lagged high-frequency information. |
Keywords: | inflation nowcasting; machine learning; mixed-frequency data; survey of professional forecasters |
JEL: | E31 E37 C53 C55 |
Date: | 2024–03 |
URL: | http://d.repec.org/n?u=RePEc:dnb:dnbwpp:806&r=ecm |
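The LASSO step the authors single out can be written as a coordinate descent with soft-thresholding. The sketch below is a pedagogical implementation on synthetic data, not the authors' mixed-frequency nowcasting pipeline; it illustrates how the penalty zeroes out uninformative indicators:

```python
import random

def soft_threshold(z, g):
    return z - g if z > g else (z + g if z < -g else 0.0)

def lasso_cd(X, y, lam, n_iter=200):
    """Coordinate descent for (1/2n)||y - Xb||^2 + lam * ||b||_1."""
    n, p = len(X), len(X[0])
    col_ms = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    b = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            # correlation of predictor j with the partial residual
            rho = sum(X[i][j] * (y[i] - sum(X[i][k] * b[k]
                      for k in range(p) if k != j))
                      for i in range(n)) / n
            b[j] = soft_threshold(rho, lam) / col_ms[j]
    return b

# Synthetic example: only the first two of six indicators matter
random.seed(1)
n, p = 200, 6
X = [[random.gauss(0, 1) for _ in range(p)] for _ in range(n)]
y = [3 * row[0] - 2 * row[1] + random.gauss(0, 0.5) for row in X]
b_hat = lasso_cd(X, y, lam=0.3)
```

With the penalty active, the four irrelevant coefficients are driven to (near) zero while the two informative ones survive, which is the variable-selection behavior the abstract identifies as fundamental.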
By: | Pengfei Zhao; Haoren Zhu; Wilfred Siu Hung NG; Dik Lun Lee |
Abstract: | Volatility, as a measure of uncertainty, plays a crucial role in numerous financial activities such as risk management. The Econometrics and Machine Learning communities have developed two distinct approaches for financial volatility forecasting: the stochastic approach and the neural network (NN) approach. Despite their individual strengths, these methodologies have conventionally evolved in separate research trajectories with little interaction between them. This study endeavors to bridge this gap by establishing an equivalence relationship between models of the GARCH family and their corresponding NN counterparts. With the equivalence relationship established, we introduce an innovative approach, named GARCH-NN, for constructing NN-based volatility models. It obtains the NN counterparts of GARCH models and integrates them as components into an established NN architecture, thereby seamlessly infusing volatility stylized facts (SFs) inherent in the GARCH models into the neural network. We develop the GARCH-LSTM model to showcase the power of the GARCH-NN approach. Experiment results validate that amalgamating the NN counterparts of the GARCH family models into established NN models leads to enhanced outcomes compared to employing the stochastic and NN models in isolation. |
Date: | 2024–01 |
URL: | http://d.repec.org/n?u=RePEc:arx:papers:2402.06642&r=ecm |
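The classical recursion at the heart of the GARCH-NN correspondence is the GARCH(1, 1) conditional-variance update, which has the same shape as a one-unit recurrent cell's state update (the exact NN mapping is the paper's; only the textbook recursion is sketched here):

```python
def garch11_variances(returns, omega, alpha, beta, sigma2_0):
    """GARCH(1,1) conditional-variance recursion:
        sigma2_t = omega + alpha * r_{t-1}**2 + beta * sigma2_{t-1}
    starting from an initial variance sigma2_0."""
    sigma2 = [sigma2_0]
    for r in returns[:-1]:
        sigma2.append(omega + alpha * r * r + beta * sigma2[-1])
    return sigma2
```

Because the update depends only on the previous squared return and the previous state, it can be embedded as a recurrent component inside an LSTM-style architecture, which is how the GARCH-NN approach carries volatility stylized facts into the network.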