|
on Econometrics |
By: | Alkis Kalavasis; Anay Mehrotra; Manolis Zampetakis |
Abstract: | Inverse propensity-score weighted (IPW) estimators are prevalent in causal inference for estimating average treatment effects in observational studies. Under unconfoundedness, given accurate propensity scores and $n$ samples, the size of confidence intervals of IPW estimators scales down with $n$, and several of their variants improve the rate of scaling. However, neither IPW estimators nor their variants are robust to inaccuracies: even if a single covariate has an $\varepsilon>0$ additive error in the propensity score, the size of confidence intervals of these estimators can increase arbitrarily. Moreover, even without errors, the rate with which the confidence intervals of these estimators go to zero with $n$ can be arbitrarily slow in the presence of extreme propensity scores (those close to 0 or 1). We introduce a family of Coarse IPW (CIPW) estimators that captures existing IPW estimators and their variants. Each CIPW estimator is an IPW estimator on a coarsened covariate space, where certain covariates are merged. Under mild assumptions, e.g., Lipschitzness in expected outcomes and sparsity of extreme propensity scores, we give an efficient algorithm to find a robust estimator: given $\varepsilon$-inaccurate propensity scores and $n$ samples, its confidence interval size scales with $\varepsilon+1/\sqrt{n}$. In contrast, under the same assumptions, existing estimators' confidence interval sizes are $\Omega(1)$ irrespective of $\varepsilon$ and $n$. Crucially, our estimator is data-dependent and we show that no data-independent CIPW estimator can be robust to inaccuracies. |
Date: | 2024–10 |
URL: | https://d.repec.org/n?u=RePEc:arx:papers:2410.01658 |
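To make the sensitivity to extreme propensity scores concrete, the following is a minimal Python sketch of the standard (Horvitz-Thompson style) IPW estimate of the average treatment effect. It is not the paper's CIPW estimator; the function name `ipw_ate` and the toy data are illustrative assumptions.

```python
def ipw_ate(outcomes, treatments, propensities):
    """Horvitz-Thompson style IPW estimate of the average treatment effect."""
    n = len(outcomes)
    total = 0.0
    for y, t, e in zip(outcomes, treatments, propensities):
        if not 0.0 < e < 1.0:
            raise ValueError("propensity scores must lie strictly in (0, 1)")
        # Treated units are weighted by 1/e, control units by 1/(1 - e).
        total += t * y / e - (1 - t) * y / (1 - e)
    return total / n


# Toy data (hypothetical): four units, two of them treated.
y = [3.0, 1.0, 4.0, 2.0]
t = [1, 0, 1, 0]
e = [0.5, 0.5, 0.5, 0.5]
print(ipw_ate(y, t, e))

# An extreme score close to 0 for a treated unit inflates its weight 1/e,
# so a single observation can dominate the estimate.
e_extreme = [0.01, 0.5, 0.5, 0.5]
print(ipw_ate(y, t, e_extreme))
```

Comparing the two printed values shows how one unit with a propensity score near 0 can move the estimate by an order of magnitude, which is the instability the abstract describes.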
By: | Manuel Stapper |
Abstract: | Standard M-Estimation techniques are biased if an asymmetric distribution is assumed. This article proposes a novel approach that uses an adaptive asymmetric loss function to tackle the bias. Its consistency and asymptotic normality are proven. The robustness properties are assessed in a simulation study showing similar performance compared to existing approaches. Its versatility is demonstrated in three applications to time series data, an instrumental regression and a classification task. |
Keywords: | Robust Statistics, M-Estimation, Computational Statistics |
Date: | 2024–10 |
URL: | https://d.repec.org/n?u=RePEc:cqe:wpaper:10924 |
By: | Timothy B. Armstrong; Patrick Kline; Liyang Sun |
Abstract: | Empirical research typically involves a robustness-efficiency tradeoff. A researcher seeking to estimate a scalar parameter can invoke strong assumptions to motivate a restricted estimator that is precise but may be heavily biased, or they can relax some of these assumptions to motivate a more robust, but variable, unrestricted estimator. When a bound on the bias of the restricted estimator is available, it is optimal to shrink the unrestricted estimator towards the restricted estimator. For settings where a bound on the bias of the restricted estimator is unknown, we propose adaptive estimators that minimize the percentage increase in worst case risk relative to an oracle that knows the bound. We show that adaptive estimators solve a weighted convex minimax problem and provide lookup tables facilitating their rapid computation. Revisiting some well known empirical studies where questions of model specification arise, we examine the advantages of adapting to—rather than testing for—misspecification. |
Date: | 2024–10–01 |
URL: | https://d.repec.org/n?u=RePEc:azt:cemmap:18/24 |
By: | Zhexiao Lin; Pablo Crespo |
Abstract: | Online controlled experiments (A/B testing) are essential in data-driven decision-making for many companies. Increasing the sensitivity of these experiments, particularly with a fixed sample size, relies on reducing the variance of the estimator for the average treatment effect (ATE). Existing methods like CUPED and CUPAC use pre-experiment data to reduce variance, but their effectiveness depends on the correlation between the pre-experiment data and the outcome. In contrast, in-experiment data is often more strongly correlated with the outcome and thus more informative. In this paper, we introduce a novel method that combines both pre-experiment and in-experiment data to achieve greater variance reduction than CUPED and CUPAC, without introducing bias or additional computational complexity. We also establish asymptotic theory and provide consistent variance estimators for our method. Applying this method to multiple online experiments at Etsy, we reach substantial variance reduction over CUPAC with the inclusion of only a few in-experiment covariates. These results highlight the potential of our approach to significantly improve experiment sensitivity and accelerate decision-making. |
Date: | 2024–10 |
URL: | https://d.repec.org/n?u=RePEc:arx:papers:2410.09027 |
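For readers unfamiliar with the CUPED baseline mentioned in the abstract, here is a minimal Python sketch of the usual covariate adjustment, $Y - \theta (X - \bar{X})$ with $\theta = \mathrm{Cov}(X, Y)/\mathrm{Var}(X)$. It illustrates only the pre-experiment adjustment, not the paper's combined method; the function name `cuped_adjust` and the toy numbers are assumptions.

```python
from statistics import mean, variance


def cuped_adjust(y, x):
    """Return CUPED-adjusted outcomes y - theta * (x - mean(x))."""
    x_bar, y_bar = mean(x), mean(y)
    # Sample covariance between the pre-experiment covariate and the outcome.
    cov_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (len(x) - 1)
    theta = cov_xy / variance(x)
    return [yi - theta * (xi - x_bar) for xi, yi in zip(x, y)]


# Toy data (hypothetical): pre-experiment metric and in-experiment outcome.
pre = [1.0, 2.0, 3.0, 4.0, 5.0]
post = [2.1, 3.9, 6.2, 7.8, 10.1]
adjusted = cuped_adjust(post, pre)

print(mean(post), mean(adjusted))          # the mean is preserved
print(variance(post), variance(adjusted))  # the variance shrinks
```

The adjustment leaves the mean unchanged and removes the part of the variance explained by the covariate, so the gain depends on how strongly the covariate correlates with the outcome. In an actual experiment, $\theta$ is typically estimated on data pooled across arms and the ATE is the difference in adjusted means.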
By: | Reese, Benjamin F. (Georgetown University) |
Abstract: | Regression discontinuity designs (RDD) and regression kink designs (RKD) are popular identification strategies across the social sciences. The relatively weak assumptions required for identifying a causal effect with RDD/RKDs make the designs attractive quasi-experimental methods for causal inference. One limitation of RDD/RKDs is that they rely on an exogenously given and previously known treatment rule. To overcome this limitation, economists (Porter & Yu 2015, Hansen 2017, Boehnke & Bonaldi 2019, and Tanu 2020) and computer scientists (Herlands et al. 2018) have begun to develop methods and tests that can find unknown discontinuities. However, these tests are either largely theoretically focused, computationally intensive, inconvenient to implement, or otherwise do not fit the specific needs of applied political scientists. Accordingly, this paper presents a novel, more flexible, and extremely easy to implement approach to estimate unknown cut-points in both RDDs and RKDs. It is the first method that can detect unknown thresholds in both RDDs and RKDs. I call this method the unknown cut-point regression discontinuity/kink design (UCRDD/UCRKD). It works by uniformly dividing an assignment variable into quantiles to create a distribution of “candidate” cut-points. Each candidate threshold is then tested in an RDD model (Imbens & Kalyanaraman 2012) to find the “best” cut-point, the cut-point that has the largest substantive effect and highest degree of statistical significance. Researchers can use UCRDD/UCRKD to find “tipping-points” in behavior; to determine obscured or non-public policy criteria; and as a diagnostic tool to determine if there are other significant thresholds in their traditional RDD/RKD. In the application section, I apply UCRKD to assess how to define the concept of a minority district and to estimate the treatment effect of minority districts on electoral support for minority candidates. |
Date: | 2024–10–09 |
URL: | https://d.repec.org/n?u=RePEc:osf:socarx:63tns |
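The candidate-search idea in this abstract (quantile-based candidate cut-points, each tested for a jump) can be sketched roughly as follows. This is a simplified illustration, not the author's UCRDD procedure: a local difference in means with a Welch t-statistic stands in for the RDD model, candidates are ranked by the absolute t-statistic alone, and the names `jump_t_stat`, `find_cut_point`, the bandwidth, and the simulated data are all assumptions.

```python
import random
from statistics import mean, quantiles, variance


def jump_t_stat(x, y, cut, bandwidth):
    """Welch t-statistic for the mean outcome just right vs. just left of `cut`."""
    left = [yi for xi, yi in zip(x, y) if cut - bandwidth <= xi < cut]
    right = [yi for xi, yi in zip(x, y) if cut <= xi <= cut + bandwidth]
    if len(left) < 2 or len(right) < 2:
        return 0.0
    se = (variance(left) / len(left) + variance(right) / len(right)) ** 0.5
    return (mean(right) - mean(left)) / se if se > 0 else 0.0


def find_cut_point(x, y, n_candidates=19, bandwidth=0.1):
    """Pick the quantile-based candidate with the largest absolute jump statistic."""
    candidates = quantiles(x, n=n_candidates + 1)
    return max(candidates, key=lambda c: abs(jump_t_stat(x, y, c, bandwidth)))


# Simulated data (hypothetical): the outcome jumps by 2 at x = 0.6.
random.seed(0)
x = [random.random() for _ in range(2000)]
y = [(2.0 if xi >= 0.6 else 0.0) + random.gauss(0, 1) for xi in x]

best = find_cut_point(x, y)
print(round(best, 2))
```

Because the same data are used both to choose and to test the cut-point, the t-statistic at the selected candidate should not be read as a valid test without further correction.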
By: | Thomas R. Cook; Zach Modig; Nathan M. Palmer |
Abstract: | Machine learning and artificial intelligence are often described as “black boxes.” Traditional linear regression is interpreted through its marginal relationships as captured by regression coefficients. We show that the same marginal relationship can be described rigorously for any machine learning model by calculating the slope of the partial dependence functions, which we call the partial marginal effect (PME). We prove that the PME of OLS is analytically equivalent to the OLS regression coefficient. Bootstrapping provides standard errors and confidence intervals around the point estimates of the PMEs. We apply the PME to a hedonic house pricing example and demonstrate that the PMEs of neural networks, support vector machines, random forests, and gradient boosting models reveal the non-linear relationships discovered by the machine learning models and allow direct comparison between those models and a traditional linear regression. Finally we extend PME to a Shapley value decomposition and explore how it can be used to further explain model outputs. |
Keywords: | Machine learning; House prices; Statistical inference |
JEL: | C14 C18 C15 C45 C52 |
Date: | 2024–09–20 |
URL: | https://d.repec.org/n?u=RePEc:fip:fedgfe:2024-75 |
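The claim that the slope of a partial dependence function recovers the OLS coefficient can be checked numerically. The sketch below is an illustration under assumptions, not the authors' implementation: it approximates the slope with a central finite difference, and the function names, the two hypothetical "fitted" models, and the data rows are made up for the example.

```python
def partial_dependence(model, rows, j, value):
    """Average model prediction with feature j set to `value` in every row."""
    total = 0.0
    for row in rows:
        modified = list(row)
        modified[j] = value
        total += model(modified)
    return total / len(rows)


def partial_marginal_effect(model, rows, j, value, step=1e-4):
    """Central finite-difference slope of the partial dependence function."""
    hi = partial_dependence(model, rows, j, value + step)
    lo = partial_dependence(model, rows, j, value - step)
    return (hi - lo) / (2 * step)


# Hypothetical fitted models, written as plain functions of a feature row.
def linear(r):
    return 1.0 + 2.0 * r[0] - 0.5 * r[1]


def nonlinear(r):
    return r[0] ** 2 + r[1]


rows = [[0.0, 1.0], [1.0, 3.0], [2.0, 5.0]]

# For the linear model the slope matches the coefficient on feature 0.
print(partial_marginal_effect(linear, rows, 0, 1.0))
# For the nonlinear model the slope varies with the evaluation point.
print(partial_marginal_effect(nonlinear, rows, 0, 1.5))
```

For the linear model the partial dependence function is a straight line in the chosen feature, so its slope is the coefficient regardless of the evaluation point; for the nonlinear model the slope depends on where it is evaluated.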
By: | Robert Wojciechowski |
Abstract: | We identify the structural impulse responses of quantiles of the outcome variable to a shock. Our estimation strategy explicitly distinguishes treatment from control variables, allowing us to model responses of unconditional quantiles while using controls for identification. Disentangling the effect of adding control variables on identification versus interpretation brings our structural quantile impulse responses conceptually closer to structural mean impulse responses. Applying our methodology to study the impact of financial shocks on lower quantiles of output growth confirms that financial shocks have an outsized effect on growth-at-risk, but the magnitude of our estimates is more extreme than in previous studies. |
Date: | 2024–10 |
URL: | https://d.repec.org/n?u=RePEc:arx:papers:2410.04431 |
By: | Lucio Barabesi; Federico Crescenzi; Lorenzo Mori |
Abstract: | By assuming the design-based paradigm, an analysis of the Theil index and its estimation is carried out. First, by expressing the population Theil index as a statistical functional, we obtain its influence function and prove the corresponding properties. We also provide some new results on the influence function of the Gini index, which are suitable for a methodological comparison of the two inequality measures. Subsequently, on the basis of these findings, we introduce estimators of the Theil index and its variance. By means of a Monte Carlo study, we show that the variance estimator displays suitable performance in terms of bias and provides confidence intervals with adequate coverage. In addition, by considering such benchmarks, the suggested variance estimation outperforms the corresponding methods based on the nonparametric and parametric bootstrap. An application of our achievements is considered by using the data from the “Survey on Vulnerability to Poverty” held in 2021 in Tuscany (Italy) with the goal to map the socio-economic conditions and inequalities of households and individuals after the Covid-19 pandemic. |
Keywords: | design-based; inequality measure; influence function; variance estimation |
JEL: | C13 C15 E21 |
Date: | 2024–10 |
URL: | https://d.repec.org/n?u=RePEc:usi:wpaper:915 |
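As a reminder of the quantity being estimated, the Theil index of incomes $y_1, \dots, y_N$ with mean $\mu$ is $T = \frac{1}{N}\sum_i \frac{y_i}{\mu}\ln\frac{y_i}{\mu}$. The Python sketch below computes a weighted plug-in version in which survey weights replace the equal weights; this is an assumed, generic plug-in form and not necessarily the estimator proposed in the paper, and the function name `theil_index` and the toy incomes are hypothetical.

```python
import math


def theil_index(incomes, weights=None):
    """Weighted plug-in Theil index; equal weights give the population formula."""
    if weights is None:
        weights = [1.0] * len(incomes)
    if any(y <= 0 for y in incomes):
        raise ValueError("incomes must be strictly positive")
    w_total = sum(weights)
    # Weighted mean income.
    mu = sum(w * y for w, y in zip(weights, incomes)) / w_total
    return sum(
        w * (y / mu) * math.log(y / mu) for w, y in zip(weights, incomes)
    ) / w_total


# Toy data (hypothetical).
print(theil_index([10.0, 10.0, 10.0]))      # perfectly equal incomes
print(theil_index([1.0, 1.0, 1.0, 97.0]))   # highly concentrated incomes
```

The index is zero under perfect equality and is bounded above by $\ln N$, which it approaches as one unit comes to hold nearly all the income.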
By: | Xiaolin Sun; Xueyan Zhao; D. S. Poskitt |
Abstract: | This paper addresses the sample selection model within the context of the gender gap problem, where even random treatment assignment is affected by selection bias. By offering a robust alternative free from distributional or specification assumptions, we bound the treatment effect under the sample selection model with an exclusion restriction, an assumption whose validity is tested in the literature. This exclusion restriction allows for further segmentation of the population into distinct types based on observed and unobserved characteristics. For each type, we derive the proportions and bound the gender gap accordingly. Notably, trends in type proportions and gender gap bounds reveal an increasing proportion of always-working individuals over time, alongside variations in bounds, including a general decline across time and consistently higher bounds for those in high-potential wage groups. Further analysis, considering additional assumptions, highlights persistent gender gaps for some types, while other types exhibit differing or inconclusive trends. This underscores the necessity of separating individuals by type to understand the heterogeneous nature of the gender gap. |
Date: | 2024–10 |
URL: | https://d.repec.org/n?u=RePEc:arx:papers:2410.01159 |
By: | Ignace De Vos; Gerdie Everaert |
Abstract: | Local projections (LPs) are often regarded as more robust to model misspecification than impulse responses (IRs) derived from forward-iterated dynamic model estimates, as LPs impose fewer restrictions on the underlying dynamics. However, because forecast errors accumulate in the LP errors over the projection horizon, this robustness comes at the price of an increase in variance. To address this, several Generalized Least Squares (GLS) estimators have been proposed to reduce error accumulation and enhance efficiency. We demonstrate, however, that the implied conditioning on dynamic model (horizon-one LP) residuals imposes strong restrictions on the underlying data generating process, undermining the very robustness to misspecification that LPs are valued for. In fact, we show that these GLS LP estimators tend to align more closely with forward-iterated IRs from potentially misspecified models, than with OLS-estimated LPs. Furthermore, we find that conditioning on previous horizon LP residuals fails to deliver efficiency improvements over OLS-estimated LPs. |
Keywords: | Impulse response functions, local projections, dynamic models, generalized least squares, efficiency, robustness |
JEL: | C22 C13 C53 |
Date: | 2024–10 |
URL: | https://d.repec.org/n?u=RePEc:rug:rugwps:24/1095 |
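For context, an OLS-estimated local projection at horizon $h$ is the slope from regressing $y_{t+h}$ on the shock at time $t$. The sketch below illustrates this on a simulated AR(1) process whose true impulse response is $\rho^h$. It assumes the shock is directly observed and omits controls, and it does not implement the GLS variants discussed in the abstract; the function names and simulation settings are assumptions.

```python
import random


def ols_slope(x, y):
    """Slope of a bivariate OLS regression of y on x (with intercept)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    return sxy / sxx


def local_projection(shock, y, horizon):
    """Slope from regressing y[t + horizon] on shock[t]."""
    n = len(shock)
    return ols_slope(shock[: n - horizon], y[horizon:])


# Simulated AR(1) data (hypothetical): y_t = rho * y_{t-1} + shock_t.
random.seed(1)
rho, n_obs = 0.5, 5000
shock = [random.gauss(0, 1) for _ in range(n_obs)]
y = []
prev = 0.0
for e in shock:
    prev = rho * prev + e
    y.append(prev)

# Compare the LP estimate with the true impulse response rho ** h.
for h in range(4):
    print(h, round(local_projection(shock, y, h), 2), rho ** h)
```

No dynamic model is imposed on $y$, which is the source of the robustness the abstract refers to; the regression error at horizon $h$ absorbs all shocks that arrive between $t+1$ and $t+h$, which is the error accumulation that the GLS proposals aim to reduce.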
By: | Anna Kiriliouk; Chen Zhou |
Abstract: | This book chapter illustrates how to apply extreme value statistics to financial time series data. Such data often exhibits strong serial dependence, which complicates assessment of tail risks. We discuss the two main approaches to tail risk estimation, unconditional and conditional quantile forecasting. We use the S&P 500 index as a case study to assess serial (extremal) dependence, perform an unconditional and conditional risk analysis, and apply backtesting methods. Additionally, the chapter explores the impact of serial dependence on multivariate tail dependence. |
Date: | 2024–09 |
URL: | https://d.repec.org/n?u=RePEc:arx:papers:2409.18643 |
By: | Kiss, Tamás (Örebro University School of Business); Mazur, Stepan (Örebro University School of Business); Nguyen, Hoang (Linköping University); Österholm, Pär (Örebro University School of Business) |
Abstract: | In this paper, we extend the standard Gaussian stochastic-volatility Bayesian VAR by employing the generalized hyperbolic skew Student’s t distribution for the innovations. Allowing the skewness parameter to vary over time, our specification permits flexible modelling of innovations in terms of both fat tails and – potentially dynamic – asymmetry. In an empirical application using US data on industrial production, consumer prices and economic policy uncertainty, we find support – although to a moderate extent – for time-varying skewness. In addition, we find that shocks to economic policy uncertainty have a negative effect on both industrial production growth and CPI inflation. |
Keywords: | Bayesian VAR; Generalized hyperbolic skew Student’s t distribution; Stochastic volatility; Economic policy uncertainty |
JEL: | C11 C32 C52 E44 E47 G17 |
Date: | 2024–10–09 |
URL: | https://d.repec.org/n?u=RePEc:hhs:oruesi:2024_008 |
By: | Aknouche, Abdelhakim |
Abstract: | state spaces with periodically time-varying transition probabilities is introduced. The finite-dimensional probability distributions of these time-periodic chains are first studied and their correspondence with the marginal distributions and transition probabilities is shown. Then, the concepts of periodic stability/regularity and limiting behaviors are proposed. The communicability and classification of states necessary for establishing periodic stability are then examined. In particular, periodic irreducibility and the main solidarity/class properties are presented, namely periodic recurrence, periodic positive recurrence, periodic transience, and periodic aperiodicity. Furthermore, sufficient conditions for periodic stochastic stability of time-periodic Markov chains are derived. Finally, various applications to some operations research models and time series analysis are considered. In particular, periodic Markov decision processes, periodic integer-valued time series models, and periodic Markov-switching time series models are examined. |
Keywords: | Time-periodic Markov chains, Harris periodic ergodicity, periodic irreducibility, periodic recurrence, periodic stability, periodic invariant distributions, periodic integer-valued time series models, Markov-switching periodic models, periodic Markov decision process. |
JEL: | C01 C02 C30 |
Date: | 2024–10–04 |
URL: | https://d.repec.org/n?u=RePEc:pra:mprapa:122287 |
By: | Elisa Fusco (Dipartimento di Statistica, Informatica, Applicazioni "G. Parenti", Università di Firenze); Giuseppe Arbia (Dipartimento di Scienze Statistiche, Università Cattolica del Sacro Cuore, Roma); Francesco Vidoli (Dipartimento di Economia, Società, Politica (DESP), Università degli Studi di Urbino Carlo Bo); Vincenzo Nardelli (Università Cattolica del Sacro Cuore, Roma) |
Abstract: | In the literature on stochastic frontier models until the early 2000s, the joint consideration of spatial and temporal dimensions was often inadequately addressed, if not completely neglected. However, from an evolutionary economics perspective, the production process of the decision-making units constantly changes over both dimensions: it is not stable over time due to managerial enhancements and/or internal or external shocks, and is influenced by the nearest territorial neighbours. This paper proposes an extension of the Fusco and Vidoli (2013) SEM-like approach, which globally accounts for spatial and temporal effects in the term of inefficiency. In particular, coherently with the stochastic panel frontier literature, two different versions of the model are proposed: the time-invariant and the time-varying spatial stochastic frontier models. In order to evaluate the inferential properties of the proposed estimators, we first run Monte Carlo experiments and then present the results of an application to a set of commonly referenced data, demonstrating robustness and stability of estimates across all scenarios. |
Keywords: | Stochastic frontier analysis, Spatio-temporal effects, Productive efficiency |
JEL: | C21 D24 |
Date: | 2024–10 |
URL: | https://d.repec.org/n?u=RePEc:fir:econom:wp2024_09 |
By: | Smida, Zaineb; Laurent, Thibault; Cucala, Lionel |
Abstract: | A scan method for functional data indexed in space has been developed. The scan statistic is derived from the Hotelling test statistic for functional data, extending the univariate and multivariate Gaussian spatial scan statistics. This method consistently outperforms existing techniques in detecting and locating spatial clusters, as demonstrated through simulations. It has been applied to two types of real data: economic data in order to identify spatial clusters of abnormal unemployment rates in Spain and climatic data in order to detect unusual climate change patterns in Great Britain, Nigeria, Pakistan, and Venezuela. |
Keywords: | Cluster detection, Functional data, Hotelling T² test, Spatial scan statistic. |
JEL: | C12 C21 E24 Q54 |
Date: | 2024–10 |
URL: | https://d.repec.org/n?u=RePEc:tse:wpaper:129819 |
By: | Blazsek, Szabolcs; Escribano, Álvaro; Licht, Adrian |
Abstract: | We introduce the fractionally integrated quasi-vector autoregressive (FI-QVAR) model. We apply FI-QVAR to climate data and introduce the fractionally integrated score-driven ice-age model. We use global sea ice volume, atmospheric carbon dioxide (CO2) concentration, and Antarctic land surface temperature data from 798,000 to 1,000 years ago. We control for the eccentricity of the Earth’s orbit, the obliquity of Earth, and the precession of the equinoxes (i.e. Milankovitch cycles). We estimate FI-QVAR using the maximum likelihood (ML) method for fractional integration parameters in (−1/2, 1/2). The statistical and forecasting performances of FI-QVAR are superior to those of QVAR and VAR. The impulse response functions (IRF) for FI-QVAR capture the dynamic effects of the shocks better than those of QVAR and VAR. We confirm, with more confidence than previous works for these data, that for the last 12,000-15,000 years when humanity influenced the Earth’s climate (i.e. Anthropocene), the global sea ice volume forecasts are above the observed sea ice volume, the atmospheric CO2 concentration forecasts are below the observed atmospheric CO2 concentration, and the Antarctic land surface temperature forecasts are below the observed Antarctic land surface temperature, after controlling for natural forces of climate change due to orbital variables. |
Keywords: | Climate change; Fractional integration; Maximum likelihood (ML); Dynamic conditional score (DCS); Generalized autoregressive score (GAS) |
JEL: | C32 C51 C52 Q54 |
Date: | 2024–10–23 |
URL: | https://d.repec.org/n?u=RePEc:cte:werepe:44712 |