New Economics Papers on Econometrics
By:  Kaicheng Chen; Timothy J. Vogelsang 
Abstract:  This paper studies a variance estimator proposed by Chiang, Hansen and Sasaki (2022) that is robust to two-way clustering dependence with correlated common time effects in panels. First, we show algebraically that this variance estimator (CHS estimator, hereafter) is a linear combination of three common variance estimators: the cluster estimator by Arellano (1987), the "HAC of averages" estimator by Driscoll and Kraay (1998), and the "average of HACs" estimator (Newey and West (1987) and Vogelsang (2012)). Based on this finding, we obtain a fixed-b asymptotic result for the CHS estimator and corresponding test statistics as the cross-section and time sample sizes jointly go to infinity. As the ratio of the bandwidth to the time sample size goes to zero, the fixed-b asymptotic results match the asymptotic normality result in Chiang et al. (2022). Furthermore, we propose two alternative bias-corrected variance estimators and derive their fixed-b asymptotic limits. While the test statistics are not asymptotically pivotal with any of the three variance estimators under fixed-b asymptotics, we propose a plug-in method to simulate the fixed-b asymptotic critical values. In a simulation study, we compare the finite sample performance of confidence intervals based on the original CHS variance estimator and the two bias-corrected versions, each implemented with standard normal critical values and simulated fixed-b critical values. We find that the two bias-corrected variance estimators along with fixed-b critical values provide improvements in finite sample coverage probabilities. We illustrate the impact of bias correction and the use of fixed-b critical values on inference in an empirical example from Thompson (2011) on the relationship between industry profitability and market concentration. 
Date:  2023–09 
URL:  http://d.repec.org/n?u=RePEc:arx:papers:2309.08707&r=ecm 
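The decomposition reported in the abstract can be illustrated numerically. Below is a minimal sketch (not the paper's implementation) of the three component estimators for scalar scores u_it with a Bartlett kernel; the equal weights (1, 1, -1) used to combine them are an assumption for illustration only — the exact linear combination is derived in the paper.

```python
import numpy as np

def bartlett(h, M):
    # Bartlett kernel weight for lag h with bandwidth M
    return max(0.0, 1.0 - abs(h) / (M + 1))

def variance_components(u, M):
    """Three long-run variance estimators for scalar scores u (N x T),
    each normalized by N*T."""
    N, T = u.shape
    # Arellano (1987): cluster by unit i
    omega_a = (u.sum(axis=1) ** 2).sum() / (N * T)
    # Driscoll-Kraay (1998): HAC of cross-sectional sums
    v = u.sum(axis=0)
    omega_dk = sum(bartlett(t - s, M) * v[t] * v[s]
                   for t in range(T) for s in range(T)) / (N * T)
    # Average of HACs: Newey-West per unit, averaged over units
    omega_nw = sum(bartlett(t - s, M) * u[i, t] * u[i, s]
                   for i in range(N) for t in range(T) for s in range(T)) / (N * T)
    return omega_a, omega_dk, omega_nw

rng = np.random.default_rng(0)
u = rng.standard_normal((50, 40))
oa, odk, onw = variance_components(u, M=4)
# illustrative combination with assumed weights (1, 1, -1)
omega_chs = oa + odk - onw
```

Each component is a quadratic form with a positive semi-definite kernel, so the three pieces are individually nonnegative; only their combination can correct the double counting across the unit and time clustering dimensions.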
By:  Cavit Pakel; Martin Weidner 
Abstract:  Average effects in discrete choice panel data models with individual-specific fixed effects are generally only partially identified in short panels. While consistent estimation of the identified set is possible, it generally requires very large sample sizes, especially when the number of support points of the observed covariates is large, such as when the covariates are continuous. In this paper, we propose estimating outer bounds on the identified set of average effects. Our bounds are easy to construct, converge at the parametric rate, and are computationally simple to obtain even in moderately large samples, independent of whether the covariates are discrete or continuous. We also provide asymptotically valid confidence intervals on the identified set. Simulation studies confirm that our approach works well and is informative in finite samples. We also consider an application to labor force participation. 
Date:  2023–09 
URL:  http://d.repec.org/n?u=RePEc:arx:papers:2309.09299&r=ecm 
By:  Wei Zhao (University of Missouri) 
Abstract:  Instead of having a "yes" or "no" result from a test of the global null hypothesis that a function is increasing, I propose a multiple testing procedure to test at multiple points. If the global null is rejected, then this multiple testing provides more information about why. If the global null is not rejected, then multiple testing can provide stronger evidence in favor of increasingness, by rejecting the null hypotheses that the function is decreasing. With high-level assumptions that apply to a wide array of models, this approach can be used to test for monotonicity of a function in a broad class of structural and descriptive econometric models. By inverting the proposed multiple testing procedure that controls the familywise error rate, I also equivalently generate "inner" and "outer" confidence sets for the set of points at which the function is increasing. With high asymptotic probability, the inner confidence set is contained within the true set, whereas the outer confidence set contains the true set. I also improve power with step-down and two-stage procedures. Simulated and empirical examples (income–education conditional mean, and IV Engel curve) illustrate the methodology. 
Keywords:  multiple testing procedure, familywise error rate, inner confidence set, outer confidence set 
JEL:  C25 
Date:  2023–10 
URL:  http://d.repec.org/n?u=RePEc:umc:wpaper:2311&r=ecm 
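For background on the step-down idea mentioned in the abstract, Holm's procedure is the classic step-down method that controls the familywise error rate; the paper's procedure differs (it exploits the joint distribution of the test statistics to gain power), but a minimal sketch of generic step-down testing is:

```python
def holm_stepdown(pvals, alpha=0.05):
    """Holm's step-down procedure: controls the familywise error rate at alpha."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    rejected = [False] * m
    for rank, i in enumerate(order):
        # compare the rank-th smallest p-value against alpha / (m - rank)
        if pvals[i] <= alpha / (m - rank):
            rejected[i] = True
        else:
            break  # once one hypothesis survives, stop rejecting
    return rejected
```

Inverting which point-wise nulls are rejected at each point of the domain is what yields the inner and outer confidence sets for the region of increase.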
By:  Federico Zincenko 
Abstract:  Considering a continuous random variable Y together with a continuous random vector X, I propose a nonparametric estimator f^(.|x) for the conditional density of Y given X=x. This estimator takes the form of an exponential series whose coefficients T = (T1, ..., TJ) are the solution of a system of nonlinear equations that depends on an estimator of the conditional expectation E[p(Y)|X=x], where p(.) is a J-dimensional vector of basis functions. A key feature is that E[p(Y)|X=x] is estimated by generalized random forest (Athey, Tibshirani, and Wager, 2019), targeting the heterogeneity of T across x. I show that f^(.|x) is uniformly consistent and asymptotically normal, while allowing J to grow to infinity. I also provide a standard error formula to construct asymptotically valid confidence intervals. Results from Monte Carlo experiments and an empirical illustration are provided. 
Date:  2023–09 
URL:  http://d.repec.org/n?u=RePEc:arx:papers:2309.13251&r=ecm 
By:  Christis Katsouris 
Abstract:  This paper develops unified asymptotic distribution theory for dynamic quantile predictive regressions, which is useful when examining quantile predictability in stock returns under the possible presence of nonstationarity. 
Date:  2023–09 
URL:  http://d.repec.org/n?u=RePEc:arx:papers:2309.14160&r=ecm 
By:  Joseph Cummins (University of California, Riverside); Brock Smith (Montana State University); Douglas L. Miller (Cornell University); David Eliot Simon (University of Connecticut) 
Abstract:  We investigate the properties of a systematic bias that arises in the synthetic control estimator in panel data settings with finite pre-treatment periods, offering intuition and guidance to practitioners. The bias comes from matching to idiosyncratic error terms (noise) in the treated unit's and the donor units' pre-treatment outcome values. This in turn leads to a biased counterfactual for the post-treatment periods. We use Monte Carlo simulations to evaluate the determinants of the bias in terms of error term variance, sample characteristics, and DGP complexity, providing guidance as to which situations are likely to yield more bias. We also offer a direct computational bias-correction procedure, based on resampling from a pilot model, that can reduce the bias in empirically feasible implementations. As a final potential solution, we compare the performance of our corrections to that of an Interactive Fixed Effects model. An empirical application focused on trade liberalization indicates that the magnitude of the bias may be economically meaningful in a real-world setting. 
Keywords:  Synthetic Control, Overfitting 
JEL:  C23 C52 
Date:  2023–10 
URL:  http://d.repec.org/n?u=RePEc:uct:uconnp:202307&r=ecm 
By:  Riccardo Di Francesco 
Abstract:  Empirical studies in various social sciences often involve categorical outcomes with inherent ordering, such as self-evaluations of subjective well-being and self-assessments in health domains. While ordered choice models, such as the ordered logit and ordered probit, are popular tools for analyzing these outcomes, they may impose restrictive parametric and distributional assumptions. This paper introduces a novel estimator, the ordered correlation forest, that can naturally handle nonlinearities in the data and does not assume a specific error term distribution. The proposed estimator modifies a standard random forest splitting criterion to build a collection of forests, each estimating the conditional probability of a single class. Under an "honesty" condition, predictions are consistent and asymptotically normal. The weights induced by each forest are used to obtain standard errors for the predicted probabilities and the covariates' marginal effects. Evidence from synthetic data shows that the proposed estimator achieves superior prediction performance relative to alternative forest-based estimators and is able to construct valid confidence intervals for the covariates' marginal effects. 
Date:  2023–09 
URL:  http://d.repec.org/n?u=RePEc:arx:papers:2309.08755&r=ecm 
By:  Archer Gong Zhang; Jiahua Chen 
Abstract:  In many statistical and econometric applications, we gather individual samples from various interconnected populations that undeniably exhibit common latent structures. Utilizing a model that incorporates these latent structures for such data enhances the efficiency of inferences. Recently, many researchers have been adopting the semiparametric density ratio model (DRM) to address the presence of latent structures. The DRM enables estimation of each population distribution using pooled data, resulting in statistically more efficient estimations in contrast to nonparametric methods that analyze each sample in isolation. In this article, we investigate the limit of the efficiency improvement attainable through the DRM. We focus on situations where one population's sample size significantly exceeds those of the other populations. In such scenarios, we demonstrate that the DRM-based inferences for populations with smaller sample sizes achieve the highest attainable asymptotic efficiency, as if a parametric model were assumed. The estimands we consider include the model parameters, distribution functions, and quantiles. We use simulation experiments to support the theoretical findings with a specific focus on quantile estimation. Additionally, we provide an analysis of real revenue data from U.S. collegiate sports to illustrate the efficacy of our contribution. 
Date:  2023–09 
URL:  http://d.repec.org/n?u=RePEc:arx:papers:2309.09103&r=ecm 
By:  TaeHwy Lee (Department of Economics, University of California Riverside); Ekaterina Seregina (Colby College); Yaojue Xu (Colby College) 
Abstract:  In this paper, we construct a class of strictly consistent scoring functions based on the Bregman divergence measure, which jointly elicit the mean and variance. We use the scoring functions to develop a novel out-of-sample forecast encompassing test in volatility predictive models. We show the encompassing test is asymptotically normal. Simulation results demonstrate the merits of the proposed Bregman scoring functions and the forecast encompassing test. The forecast encompassing test exhibits proper size and good power in finite samples. In an empirical application, we investigate the predictive ability of macroeconomic and financial variables in forecasting the equity premium volatility. 
Keywords:  strictly consistent scoring function, elicitability, Bregman divergence, Granger causality, encompassing, model averaging, equity premium. 
JEL:  C53 E37 E27 
Date:  2023–09 
URL:  http://d.repec.org/n?u=RePEc:ucr:wpaper:202311&r=ecm 
By:  Alberto Quaini (Erasmus University Rotterdam); Fabio Trojani (University of Geneva; University of Turin; and Swiss Finance Institute); Ming Yuan (Columbia University) 
Abstract:  Tradable factor risk premia are defined by the negative factor covariance with the Stochastic Discount Factor projection on returns. They are robust to misspecification or weak identification in asset pricing models, and they are zero for any factor weakly correlated with returns. We propose a simple estimator of tradable factor risk premia that enjoys the Oracle Property, i.e., it performs as well as if the weak or useless factors were known. This estimator not only consistently removes such factors, but it also gives rise to reliable tests of asset pricing models. We study empirically a family of asset pricing models from the factor zoo and detect a robust subset of economically relevant and well-identified models, which are built out of factors with a nonzero tradable risk premium. Well-identified models feature a relatively low factor space dimension and some degree of misspecification, which harms the interpretation of other established notions of a factor risk premium in the literature. 
Keywords:  Testing of asset pricing models, factor risk premia, useless and weak factors, factor selection, model misspecification, Oracle estimation and inference 
JEL:  G12 C12 C13 C51 C52 C58 
Date:  2023–09 
URL:  http://d.repec.org/n?u=RePEc:chf:rpseri:rp2381&r=ecm 
By:  Harrison H. Li; Art B. Owen 
Abstract:  We consider an experiment with at least two stages or batches and $O(N)$ subjects per batch. First, we propose a semiparametric treatment effect estimator that efficiently pools information across the batches, and show it asymptotically dominates alternatives that aggregate single-batch estimates. Then, we consider the design problem of learning propensity scores for assigning treatment in the later batches of the experiment to maximize the asymptotic precision of this estimator. For two common causal estimands, we estimate this precision using observations from previous batches, and then solve a finite-dimensional concave maximization problem to adaptively learn flexible propensity scores that converge to suitably defined optima in each batch at rate $O_p(N^{-1/4})$. By extending the framework of double machine learning, we show this rate suffices for our pooled estimator to attain the targeted precision after each batch, as long as nuisance function estimates converge at rate $o_p(N^{-1/4})$. These relatively weak rate requirements enable the investigator to avoid the common practice of discretizing the covariate space for design and estimation in batch adaptive experiments while maintaining the advantages of pooling. Our numerical study shows that such discretization often leads to substantial asymptotic and finite sample precision losses outweighing any gains from design. 
Date:  2023–09 
URL:  http://d.repec.org/n?u=RePEc:arx:papers:2309.15297&r=ecm 
By:  Florian Gunsilius; David Van Dijcke 
Abstract:  Thresholds in treatment assignments can produce discontinuities in outcomes, revealing causal insights. In many contexts, like geographic settings, these thresholds are unknown and multivariate. We propose a nonparametric method to estimate the resulting discontinuities by segmenting the regression surface into smooth and discontinuous parts. This estimator uses a convex relaxation of the Mumford-Shah functional, for which we establish identification and convergence. Using our method, we estimate that an internet shutdown in India resulted in a reduction of economic activity by over 50%, greatly surpassing previous estimates and shedding new light on the true cost of such shutdowns for digital economies globally. 
Date:  2023–09 
URL:  http://d.repec.org/n?u=RePEc:arx:papers:2309.14630&r=ecm 
By:  Albert Chiu; Xingchen Lan; Ziyi Liu; Yiqing Xu 
Abstract:  Two-way fixed effects (TWFE) models are ubiquitous in causal panel analysis in political science. However, recent methodological discussions challenge their validity in the presence of heterogeneous treatment effects (HTE) and violations of the parallel trends assumption (PTA). This burgeoning literature has introduced multiple estimators and diagnostics, leading to confusion among empirical researchers on two fronts: the reliability of existing results based on TWFE models and the current best practices. To address these concerns, we examined, replicated, and reanalyzed 37 articles from three leading political science journals that employed observational panel data with binary treatments. Using six newly introduced HTE-robust estimators, we find that although precision may be affected, the core conclusions derived from TWFE estimates largely remain unchanged. PTA violations and insufficient statistical power, however, continue to be significant obstacles to credible inferences. Based on these findings, we offer recommendations for improving practice in empirical research. 
Date:  2023–09 
URL:  http://d.repec.org/n?u=RePEc:arx:papers:2309.15983&r=ecm 
By:  YeonKoo Che; Dong Woo Hahm; YingHua He 
Abstract:  Inferring applicant preferences is fundamental in many analyses of school-choice data. Application mistakes make this task challenging. We propose a novel approach to deal with the mistakes in a deferred-acceptance matching environment. The key insight is that the uncertainties faced by applicants, e.g., due to tie-breaking lotteries, render some mistakes costly, allowing us to reliably infer relevant preferences. Our approach extracts all information on preferences robustly to payoff-insignificant mistakes. We apply it to school-choice data from Staten Island, NYC. Counterfactual analysis suggests that we underestimate the effects of proposed desegregation reforms when applicants' mistakes are not accounted for in preference inference and estimation. 
Date:  2023–09 
URL:  http://d.repec.org/n?u=RePEc:arx:papers:2309.14297&r=ecm 
By:  Yichi Zhang; Mihai Cucuringu; Alexander Y. Shestopaloff; Stefan Zohren 
Abstract:  In multivariate time series systems, lead-lag relationships reveal dependencies between time series when they are shifted in time relative to each other. Uncovering such relationships is valuable in downstream tasks, such as control, forecasting, and clustering. By understanding the temporal dependencies between different time series, one can better comprehend the complex interactions and patterns within the system. We develop a cluster-driven methodology based on dynamic time warping for robust detection of lead-lag relationships in lagged multi-factor models. We establish connections to the multi-reference alignment problem for both the homogeneous and heterogeneous settings. Since multivariate time series are ubiquitous in a wide range of domains, we demonstrate that our algorithm is able to robustly detect lead-lag relationships in financial markets, which can be subsequently leveraged in trading strategies with significant economic benefits. 
Date:  2023–09 
URL:  http://d.repec.org/n?u=RePEc:arx:papers:2309.08800&r=ecm 
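To make the lead-lag notion concrete: a much simpler baseline than the dynamic-time-warping approach above is to scan lagged cross-correlations between a pair of series. The sketch below is that baseline (an assumption for illustration, not the paper's method); a positive returned lag means x leads y.

```python
import numpy as np

def estimate_lead_lag(x, y, max_lag):
    """Lag maximizing corr(x[t], y[t+lag]); positive lag means x leads y."""
    best_lag, best_corr = 0, -np.inf
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            a, b = x[:len(x) - lag], y[lag:]
        else:
            a, b = x[-lag:], y[:len(y) + lag]
        c = np.corrcoef(a, b)[0, 1]
        if c > best_corr:
            best_lag, best_corr = lag, c
    return best_lag

rng = np.random.default_rng(2)
x = rng.standard_normal(500)
y = np.roll(x, 3) + 0.1 * rng.standard_normal(500)  # y follows x with a 3-step delay
```

Pairwise lag estimates like this become unreliable under noise and heterogeneous groups, which is what motivates the clustering and multi-reference alignment machinery in the paper.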
By:  Karin Klieber 
Abstract:  This paper introduces nonlinear dimension reduction in factor-augmented vector autoregressions to analyze the effects of different economic shocks. I argue that controlling for nonlinearities between a large-dimensional dataset and the latent factors is particularly useful during turbulent times of the business cycle. In simulations, I show that nonlinear dimension reduction techniques yield good forecasting performance, especially when data is highly volatile. In an empirical application, I identify a monetary policy shock as well as an uncertainty shock, excluding and including observations from the COVID-19 pandemic. These two applications suggest that the nonlinear FAVAR approaches are capable of dealing with the large outliers caused by the COVID-19 pandemic and yield reliable results in both scenarios. 
Date:  2023–09 
URL:  http://d.repec.org/n?u=RePEc:arx:papers:2309.04821&r=ecm 
By:  Dylan BallaElliott 
Abstract:  Information provision experiments are an increasingly popular tool to identify how beliefs causally affect decision-making and behavior. In a simple Bayesian model of belief formation via costly information acquisition, people form precise beliefs when these beliefs are important for their decision-making. The precision of prior beliefs controls how much their beliefs shift when they are shown new information (i.e., the strength of the first stage). Since two-stage least squares (TSLS) targets a weighted average with weights proportional to the strength of the first stage, TSLS will overweight individuals with smaller causal effects and underweight those with larger effects, thus understating the average partial effect of beliefs on behavior. In experimental designs where all participants are exposed to new information, Bayesian updating implies that a control function can be used to identify the (unweighted) average partial effect. I apply this estimator to a recent study of the effects of beliefs about the gender wage gap on support for public policies (Settele, 2022) and find the average partial effect is 40% larger than the comparable TSLS estimate. This difference can be explained by the fact that the effects of beliefs are close to zero for people who update their beliefs the most and receive the most weight in TSLS specifications. 
Date:  2023–09 
URL:  http://d.repec.org/n?u=RePEc:arx:papers:2309.11387&r=ecm 
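The control-function logic can be illustrated in a linear special case (an assumption for exposition; the paper's setting is a Bayesian updating model with heterogeneous effects): regress the belief on the information shown, then include the first-stage residual as an extra regressor in the outcome equation.

```python
import numpy as np

def control_function_effect(belief, info, outcome):
    """Two-step control function: outcome on belief plus the first-stage residual."""
    # first stage: belief on a constant and the information treatment
    Z = np.column_stack([np.ones_like(info), info])
    resid = belief - Z @ np.linalg.lstsq(Z, belief, rcond=None)[0]
    # second stage: the residual absorbs the endogenous part of belief formation
    X = np.column_stack([np.ones_like(belief), belief, resid])
    beta = np.linalg.lstsq(X, outcome, rcond=None)[0]
    return beta[1]  # coefficient on the belief

rng = np.random.default_rng(3)
n = 5000
info = rng.standard_normal(n)      # exogenous information shock
v = rng.standard_normal(n)         # unobserved taste shifting beliefs
belief = info + v
outcome = 2.0 * belief + 3.0 * v + 0.1 * rng.standard_normal(n)
effect = control_function_effect(belief, info, outcome)  # close to the true 2.0
```

In this homogeneous-effects simulation the control function and TSLS coincide; the divergence the paper documents arises when effects are heterogeneous and correlated with updating intensity.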
By:  Dangxing Chen 
Abstract:  In recent years, explainable machine learning methods have been very successful. Despite their success, most explainable machine learning methods are applied to black-box models without any domain knowledge. By incorporating domain knowledge, science-informed machine learning models have demonstrated better generalization and interpretation. But do we obtain consistent scientific explanations if we apply explainable machine learning methods to science-informed machine learning models? This question is addressed in the context of monotonic models that exhibit three different types of monotonicity. To demonstrate monotonicity, we propose three axioms. Accordingly, this study shows that when only individual monotonicity is involved, the baseline Shapley value provides good explanations; however, when strong pairwise monotonicity is involved, the Integrated Gradients method provides reasonable explanations on average. 
Date:  2023–09 
URL:  http://d.repec.org/n?u=RePEc:arx:papers:2309.13246&r=ecm 
By:  Jinglong Zhao 
Abstract:  In experimental design, Neyman allocation refers to the practice of allocating subjects into treated and control groups, potentially in unequal numbers proportional to their respective standard deviations, with the objective of minimizing the variance of the treatment effect estimator. This widely recognized approach increases statistical power in scenarios where the treated and control groups have different standard deviations, as is often the case in social experiments, clinical trials, marketing research, and online A/B testing. However, Neyman allocation cannot be implemented unless the standard deviations are known in advance. Fortunately, the multi-stage nature of the aforementioned applications allows the use of earlier-stage observations to estimate the standard deviations, which further guide allocation decisions in later stages. In this paper, we introduce a competitive analysis framework to study this multi-stage experimental design problem. We propose a simple adaptive Neyman allocation algorithm, which almost matches the information-theoretic limit of conducting experiments. Using online A/B testing data from a social media site, we demonstrate the effectiveness of our adaptive Neyman allocation algorithm, highlighting its practicality even when applied with only a limited number of stages. 
Date:  2023–09 
URL:  http://d.repec.org/n?u=RePEc:arx:papers:2309.08808&r=ecm 
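Neyman allocation itself is a one-line rule. A minimal two-stage sketch (an illustration of the idea, not the paper's competitive-analysis algorithm) uses stage-1 outcomes to estimate the standard deviations and then splits stage 2 accordingly:

```python
import numpy as np

def neyman_fraction(s_treat, s_control):
    # Neyman allocation: treated share proportional to its standard deviation
    return s_treat / (s_treat + s_control)

def two_stage_allocation(y1_treat, y1_control, n2):
    """Estimate sds from stage-1 outcomes, then allocate n2 stage-2 subjects."""
    s1 = np.std(y1_treat, ddof=1)
    s0 = np.std(y1_control, ddof=1)
    n2_treat = int(round(n2 * neyman_fraction(s1, s0)))
    return n2_treat, n2 - n2_treat
```

For example, if the treated arm's outcomes are twice as dispersed as the control arm's, the rule assigns two thirds of the next batch to treatment.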
By:  Malte Jahn 
Abstract:  A nonlinear regression framework is proposed for time series and panel data for the situation where certain explanatory variables are available at a higher temporal resolution than the dependent variable. The main idea is to use the moments of the empirical distribution of these variables to construct regressors with the correct resolution. As the moments are likely to display nonlinear marginal and interaction effects, an artificial neural network regression function is proposed. The corresponding model operates within the traditional stochastic nonlinear least squares framework. In particular, a numerical Hessian is employed to calculate confidence intervals. The practical usefulness is demonstrated by analyzing the influence of daily temperatures in 260 European NUTS2 regions on the yearly growth of gross value added in these regions in the time period 2000 to 2021. In this particular example, the model allows for an appropriate assessment of regional economic impacts resulting from (future) changes in the regional temperature distribution (mean and variance). 
Date:  2023–09 
URL:  http://d.repec.org/n?u=RePEc:arx:papers:2309.10481&r=ecm 
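The regressor-construction step can be sketched as follows. This minimal version uses only the first two moments and an invented toy panel (the region codes and the choice of moments are assumptions for illustration; in the paper the resulting regressors feed into a neural network regression function):

```python
import numpy as np

def moment_regressors(daily_values):
    """Collapse a high-frequency sample to low-frequency regressors via moments."""
    x = np.asarray(daily_values, dtype=float)
    return np.array([x.mean(), x.var(ddof=1)])

# hypothetical toy panel: {(region, year): array of daily temperatures}
rng = np.random.default_rng(1)
panel = {("AT11", 2020): rng.normal(10, 5, 365),
         ("AT11", 2021): rng.normal(11, 6, 365)}
X = np.vstack([moment_regressors(v) for v in panel.values()])  # one row per region-year
```

Because each row of X has the same temporal resolution as the yearly dependent variable, the mismatch between daily regressors and annual outcomes disappears before estimation.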
By:  Anirban Basu 
Abstract:  This chapter reviews the econometric approaches typically used to deal with the spike of zeros when modeling nonnegative outcomes such as expenditures, income, or consumption. 
JEL:  C10 D0 I0 
Date:  2023–08 
URL:  http://d.repec.org/n?u=RePEc:nbr:nberwo:31632&r=ecm 
By:  Wei Jie Yeo; Wihan van der Heever; Rui Mao; Erik Cambria; Ranjan Satapathy; Gianmarco Mengaldo 
Abstract:  The success of artificial intelligence (AI), and deep learning models in particular, has led to their widespread adoption across various industries due to their ability to process huge amounts of data and learn complex patterns. However, due to their lack of explainability, there are significant concerns regarding their use in critical sectors, such as finance and healthcare, where decisionmaking transparency is of paramount importance. In this paper, we provide a comparative survey of methods that aim to improve the explainability of deep learning models within the context of finance. We categorize the collection of explainable AI methods according to their corresponding characteristics, and we review the concerns and challenges of adopting explainable AI methods, together with future directions we deemed appropriate and important. 
Date:  2023–09 
URL:  http://d.repec.org/n?u=RePEc:arx:papers:2309.11960&r=ecm 