nep-ecm New Economics Papers
on Econometrics
Issue of 2023‒10‒23
twenty-two papers chosen by
Sune Karlsson, Örebro universitet


  1. Fixed-b Asymptotics for Panel Models with Two-Way Clustering By Kaicheng Chen; Timothy J. Vogelsang
  2. Bounds on Average Effects in Discrete Choice Panel Data Models By Cavit Pakel; Martin Weidner
  3. Multiple Testing of a Function’s Monotonicity By Wei Zhao
  4. Nonparametric estimation of conditional densities by generalized random forests By Federico Zincenko
  5. Unified Inference for Dynamic Quantile Predictive Regression By Christis Katsouris
  6. Matching on Noise: Finite Sample Bias in the Synthetic Control Estimator By Joseph Cummins; Brock Smith; Douglas L. Miller; David Eliot Simon
  7. Ordered Correlation Forest By Riccardo Di Francesco
  8. Optimal Estimation under a Semiparametric Density Ratio Model By Archer Gong Zhang; Jiahua Chen
  9. Elicitability and Encompassing for Volatility Forecasts by Bregman Functions By Tae-Hwy Lee; Ekaterina Seregina; Yaojue Xu
  10. Tradable Factor Risk Premia and Oracle Tests of Asset Pricing Models By Alberto Quaini; Fabio Trojani; Ming Yuan
  11. Double machine learning and design in batch adaptive experiments By Harrison H. Li; Art B. Owen
  12. Free Discontinuity Design: With an Application to the Economic Effects of Internet Shutdowns By Florian Gunsilius; David Van Dijcke
  13. What To Do (and Not to Do) with Causal Panel Analysis under Parallel Trends: Lessons from A Large Reanalysis Study By Albert Chiu; Xingchen Lan; Ziyi Liu; Yiqing Xu
  14. Leveraging Uncertainties to Infer Preferences: Robust Analysis of School Choice By Yeon-Koo Che; Dong Woo Hahm; YingHua He
  15. Dynamic Time Warping for Lead-Lag Relationships in Lagged Multi-Factor Models By Yichi Zhang; Mihai Cucuringu; Alexander Y. Shestopaloff; Stefan Zohren
  16. Non-linear dimension reduction in factor-augmented vector autoregressions By Karin Klieber
  17. Identifying Causal Effects in Information Provision Experiments By Dylan Balla-Elliott
  18. Can I Trust the Explanations? Investigating Explainable Machine Learning Methods for Monotonic Models By Dangxing Chen
  19. Adaptive Neyman Allocation By Jinglong Zhao
  20. Regressing on distributions: The nonlinear effect of temperature on regional economic growth By Malte Jahn
  21. Hidden Figures: Uncovering Quantities Behind Zeros with Econometrics By Anirban Basu
  22. A Comprehensive Review on Financial Explainable AI By Wei Jie Yeo; Wihan van der Heever; Rui Mao; Erik Cambria; Ranjan Satapathy; Gianmarco Mengaldo

  1. By: Kaicheng Chen; Timothy J. Vogelsang
    Abstract: This paper studies a variance estimator proposed by Chiang, Hansen and Sasaki (2022) that is robust to two-way clustering dependence with correlated common time effects in panels. First, we show algebraically that this variance estimator (CHS estimator, hereafter) is a linear combination of three common variance estimators: the cluster estimator by Arellano (1987), the "HAC of averages" estimator by Driscoll and Kraay (1998), and the "average of HACs" estimator (Newey and West (1987) and Vogelsang (2012)). Based on this finding, we obtain a fixed-b asymptotic result for the CHS estimator and corresponding test statistics as the cross-section and time sample sizes jointly go to infinity. As the ratio of the bandwidth to the time sample size goes to zero, the fixed-b asymptotic results match the asymptotic normality result in Chiang et al. (2022). Furthermore, we propose two alternative bias-corrected variance estimators and derive their fixed-b asymptotic limits. While the test statistics are not asymptotically pivotal using any of the three variance estimators under fixed-b asymptotics, we propose a plug-in method to simulate the fixed-b asymptotic critical values. In a simulation study, we compare the finite sample performance of confidence intervals based on the original CHS variance estimator and the two bias-corrected versions, each implemented with standard normal critical values and simulated fixed-b critical values. We find that the two bias-corrected variance estimators along with fixed-b critical values provide improvements in finite sample coverage probabilities. We illustrate the impact of bias-correction and use of the fixed-b critical values on inference in an empirical example from Thompson (2011) on the relationship between industry profitability and market concentration.
    Date: 2023–09
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2309.08707&r=ecm
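    The three building blocks in the abstract are easy to compute from the panel score contributions u_{it} (e.g., regressor times residual). Below is a minimal NumPy sketch; the combination shown in the final comment (cluster + Driscoll-Kraay minus average-of-HACs) is the familiar two-way-clustering arithmetic and is offered only as an illustrative reading of the "linear combination", not the paper's exact formula or scaling.

```python
import numpy as np

def bartlett_hac(U, M):
    """HAC (Bartlett kernel, bandwidth M) of the time series of rows of U (T x k)."""
    V = U.T @ U
    for l in range(1, M + 1):
        G = U[l:].T @ U[:-l]
        V += (1 - l / (M + 1)) * (G + G.T)
    return V

def variance_components(u, M):
    """u: (N, T, k) array of score contributions u_{it}. Returns the three pieces."""
    s_i = u.sum(axis=1)                      # unit sums, shape (N, k)
    s_t = u.sum(axis=0)                      # cross-sectional sums, shape (T, k)
    V_arellano = s_i.T @ s_i                 # Arellano: cluster by unit
    V_dk = bartlett_hac(s_t, M)              # Driscoll-Kraay: HAC of averages
    # sum over units of unit-level HACs (the "average of HACs" up to scaling)
    V_nw = sum(bartlett_hac(u[i], M) for i in range(u.shape[0]))
    return V_arellano, V_dk, V_nw

u = np.random.default_rng(0).normal(size=(50, 40, 2))   # N=50, T=40, k=2 toy scores
V_a, V_dk, V_nw = variance_components(u, M=4)
# One plausible combination (see the paper for the exact CHS weights):
# V_chs = V_a + V_dk - V_nw
```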
  2. By: Cavit Pakel; Martin Weidner
    Abstract: Average effects in discrete choice panel data models with individual-specific fixed effects are generally only partially identified in short panels. While consistent estimation of the identified set is possible, it generally requires very large sample sizes, especially when the number of support points of the observed covariates is large, such as when the covariates are continuous. In this paper, we propose estimating outer bounds on the identified set of average effects. Our bounds are easy to construct, converge at the parametric rate, and are computationally simple to obtain even in moderately large samples, independent of whether the covariates are discrete or continuous. We also provide asymptotically valid confidence intervals on the identified set. Simulation studies confirm that our approach works well and is informative in finite samples. We also consider an application to labor force participation.
    Date: 2023–09
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2309.09299&r=ecm
  3. By: Wei Zhao (University of Missouri)
    Abstract: Instead of having a “yes” or “no” result from a test of the global null hypothesis that a function is increasing, I propose a multiple testing procedure to test at multiple points. If the global null is rejected, then this multiple testing provides more information about why. If the global null is not rejected, then multiple testing can provide stronger evidence in favor of increasingness, by rejecting the null hypotheses that the function is decreasing. With high-level assumptions that apply to a wide array of models, this approach can be used to test for monotonicity of a function in a broad class of structural and descriptive econometric models. By inverting the proposed multiple testing procedure that controls the familywise error rate, I also equivalently generate “inner” and “outer” confidence sets for the set of points at which the function is increasing. With high asymptotic probability, the inner confidence set is contained within the true set, whereas the outer confidence set contains the true set. I also improve power with stepdown and two-stage procedures. Simulated and empirical examples (income–education conditional mean, and IV Engel curve) illustrate the methodology.
    Keywords: multiple testing procedure, familywise error rate, inner confidence set, outer confidence set
    JEL: C25
    Date: 2023–10
    URL: http://d.repec.org/n?u=RePEc:umc:wpaper:2311&r=ecm
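    A stripped-down version of the idea, assuming one already has slope estimates and standard errors at a grid of points: a Holm stepdown procedure controls the familywise error rate, and the two families of one-sided rejections deliver the inner and outer confidence sets described in the abstract. The paper's procedures are more refined; everything below is a sketch.

```python
import numpy as np
from scipy.stats import norm

def holm_reject(pvals, alpha=0.05):
    """Holm stepdown: FWER-controlling rejections for a family of nulls."""
    m = len(pvals)
    reject = np.zeros(m, dtype=bool)
    for rank, idx in enumerate(np.argsort(pvals)):
        if pvals[idx] <= alpha / (m - rank):
            reject[idx] = True
        else:
            break
    return reject

# Estimates and standard errors of f'(x_j) at grid points x_j (toy numbers).
slope_hat = np.array([0.9, 0.4, 0.1, -0.05, -0.6])
se = np.full(5, 0.15)
t = slope_hat / se

# Reject "f is decreasing at x_j" (H0: f'(x_j) <= 0) -> confident increase.
inner = holm_reject(norm.sf(t))     # inner set: inside the true increasing region w.h.p.
# Fail to reject "f is increasing at x_j" (H0: f'(x_j) >= 0) -> cannot rule out.
outer = ~holm_reject(norm.cdf(t))   # outer set: contains the true region w.h.p.
print(inner, outer)
```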
  4. By: Federico Zincenko
    Abstract: Considering a continuous random variable Y together with a continuous random vector X, I propose a nonparametric estimator f^(.|x) for the conditional density of Y given X=x. This estimator takes the form of an exponential series whose coefficients T = (T1, ..., TJ) are the solution of a system of nonlinear equations that depends on an estimator of the conditional expectation E[p(Y)|X=x], where p(.) is a J-dimensional vector of basis functions. A key feature is that E[p(Y)|X=x] is estimated by generalized random forest (Athey, Tibshirani, and Wager, 2019), targeting the heterogeneity of T across x. I show that f^(.|x) is uniformly consistent and asymptotically normal, while allowing J to grow to infinity. I also provide a standard error formula to construct asymptotically valid confidence intervals. Results from Monte Carlo experiments and an empirical illustration are provided.
    Date: 2023–09
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2309.13251&r=ecm
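    A compact sketch of the two-step construction, with a plain random forest standing in for the generalized random forest and J = 2 basis functions (so the fitted exponential series is Gaussian-shaped); the data-generating process and all tuning choices here are illustrative.

```python
import numpy as np
from scipy.optimize import root
from sklearn.ensemble import RandomForestRegressor

def basis(y):                        # p(y) with J = 2: (y, y^2)
    return np.column_stack([y, y ** 2])

def density(theta, grid):            # f_theta(y) prop. to exp(theta' p(y)), normalized
    z = basis(grid) @ theta
    w = np.exp(z - z.max())
    return w / np.trapz(w, grid)

def moment_gap(theta, target, grid): # moments of f_theta minus estimated moments
    f = density(theta, grid)
    return np.trapz(basis(grid) * f[:, None], grid, axis=0) - target

# Step 1: estimate E[p(Y) | X = x] -- a plain forest as a stand-in for GRF.
rng = np.random.default_rng(0)
n = 2000
X = rng.uniform(-1, 1, size=(n, 3))
Y = X[:, 0] + rng.normal(scale=0.5 + 0.3 * np.abs(X[:, 1]), size=n)
forest = RandomForestRegressor(n_estimators=200, min_samples_leaf=20).fit(X, basis(Y))
m_hat = forest.predict([[0.2, -0.5, 0.1]]).ravel()

# Step 2: solve the J nonlinear moment equations for the series coefficients T.
grid = np.linspace(Y.min() - 1, Y.max() + 1, 400)
theta_hat = root(moment_gap, np.zeros(2), args=(m_hat, grid)).x
f_hat = density(theta_hat, grid)     # estimated conditional density of Y given X = x
```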
  5. By: Christis Katsouris
    Abstract: This paper develops a unified asymptotic distribution theory for dynamic quantile predictive regressions, which is useful when examining quantile predictability in stock returns under the possible presence of nonstationarity.
    Date: 2023–09
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2309.14160&r=ecm
  6. By: Joseph Cummins (University of California, Riverside); Brock Smith (Montana State University); Douglas L. Miller (Cornell University); David Eliot Simon (University of Connecticut)
    Abstract: We investigate the properties of a systematic bias that arises in the synthetic control estimator in panel data settings with finite pre-treatment periods, offering intuition and guidance to practitioners. The bias comes from matching to idiosyncratic error terms (noise) in the treated unit and the donor units’ pre-treatment outcome values. This in turn leads to a biased counterfactual for the post-treatment periods. We use Monte Carlo simulations to evaluate the determinants of the bias in terms of error term variance, sample characteristics and DGP complexity, providing guidance as to which situations are likely to yield more bias. We also offer a direct computational bias-correction procedure, based on re-sampling from a pilot model, that can reduce the bias in empirically feasible implementations. As a final potential solution, we compare the performance of our corrections to that of an Interactive Fixed Effects model. An empirical application focused on trade liberalization indicates that the magnitude of the bias may be economically meaningful in a real world setting.
    Keywords: Synthetic Control, Over-fitting
    JEL: C23 C52
    Date: 2023–10
    URL: http://d.repec.org/n?u=RePEc:uct:uconnp:2023-07&r=ecm
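    A minimal Monte Carlo harness in the spirit of the paper's simulations: standard simplex-constrained synthetic control weights are fitted on a short pre-period under a zero true effect, so the distribution of the estimates measures the finite-sample bias from matching on noise. The DGP parameters below are invented for illustration.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)

def sc_weights(Y0_pre, y1_pre):
    """Standard synthetic control: simplex-constrained least squares on pre-periods."""
    J = Y0_pre.shape[1]
    res = minimize(lambda w: np.sum((y1_pre - Y0_pre @ w) ** 2),
                   np.full(J, 1 / J), bounds=[(0, 1)] * J,
                   constraints=({'type': 'eq', 'fun': lambda w: w.sum() - 1},),
                   method='SLSQP')
    return res.x

def one_rep(T0=8, T1=4, J=20):
    f = rng.normal(size=T0 + T1).cumsum()            # common factor
    alpha = rng.uniform(-1, 1, J + 1)                # unit fixed effects
    Y = alpha + f[:, None] + rng.normal(size=(T0 + T1, J + 1))
    y1, Y0 = Y[:, 0], Y[:, 1:]                       # treated unit, donor pool
    w = sc_weights(Y0[:T0], y1[:T0])                 # weights fit the noise too
    return np.mean(y1[T0:] - Y0[T0:] @ w)            # estimated effect (truth: 0)

est = np.array([one_rep() for _ in range(200)])
print(f"mean estimate {est.mean():.3f}, sd {est.std():.3f}")
```

    Re-running with larger T0 shows how the distortion shrinks as the pre-period grows, which is the comparative static the paper studies systematically.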
  7. By: Riccardo Di Francesco
    Abstract: Empirical studies in various social sciences often involve categorical outcomes with inherent ordering, such as self-evaluations of subjective well-being and self-assessments in health domains. While ordered choice models, such as the ordered logit and ordered probit, are popular tools for analyzing these outcomes, they may impose restrictive parametric and distributional assumptions. This paper introduces a novel estimator, the ordered correlation forest, that can naturally handle non-linearities in the data and does not assume a specific error term distribution. The proposed estimator modifies a standard random forest splitting criterion to build a collection of forests, each estimating the conditional probability of a single class. Under an "honesty" condition, predictions are consistent and asymptotically normal. The weights induced by each forest are used to obtain standard errors for the predicted probabilities and the covariates' marginal effects. Evidence from synthetic data shows that the proposed estimator delivers superior prediction performance compared with alternative forest-based estimators and can construct valid confidence intervals for the covariates' marginal effects.
    Date: 2023–09
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2309.08755&r=ecm
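    For intuition, here is the simpler cumulative-probability forest idea that such estimators build on: one forest per threshold, with class probabilities recovered by differencing. The paper's estimator instead modifies the splitting criterion itself and delivers standard errors via the forest weights; this sketch does neither.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def ordered_forest_probs(X, y, X_test, classes):
    """P(Y = k | x) via forests for the cumulative probabilities P(Y <= k | x)."""
    cum = [RandomForestRegressor(n_estimators=300, min_samples_leaf=10)
           .fit(X, (y <= k).astype(float)).predict(X_test)
           for k in classes[:-1]]
    cum = np.column_stack(cum + [np.ones(len(X_test))])   # P(Y <= K) = 1
    probs = np.diff(cum, axis=1, prepend=0.0)             # difference to class probs
    return np.clip(probs, 0.0, None)                      # crude monotonization

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 4))
y = np.digitize(X[:, 0] + rng.normal(size=1000), [-1, 0, 1])  # ordered classes 0..3
print(ordered_forest_probs(X, y, X[:5], classes=np.arange(4)))
```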
  8. By: Archer Gong Zhang; Jiahua Chen
    Abstract: In many statistical and econometric applications, we gather individual samples from various interconnected populations that undeniably exhibit common latent structures. Utilizing a model that incorporates these latent structures for such data enhances the efficiency of inferences. Recently, many researchers have been adopting the semiparametric density ratio model (DRM) to address the presence of latent structures. The DRM enables estimation of each population distribution using pooled data, resulting in statistically more efficient estimations in contrast to nonparametric methods that analyze each sample in isolation. In this article, we investigate the limit of the efficiency improvement attainable through the DRM. We focus on situations where one population's sample size significantly exceeds those of the other populations. In such scenarios, we demonstrate that the DRM-based inferences for populations with smaller sample sizes achieve the highest attainable asymptotic efficiency as if a parametric model were assumed. The estimands we consider include the model parameters, distribution functions, and quantiles. We use simulation experiments to support the theoretical findings with a specific focus on quantile estimation. Additionally, we provide an analysis of real revenue data from U.S. collegiate sports to illustrate the efficacy of our contribution.
    Date: 2023–09
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2309.09103&r=ecm
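    For reference, the density ratio model links each population's distribution F_k to a baseline F_0 through a known vector of basis functions q(y) and unknown parameters, which is what lets the pooled data inform every distribution:

```latex
% Density ratio model (DRM): K + 1 connected samples, baseline F_0
\log \frac{\mathrm{d}F_k(y)}{\mathrm{d}F_0(y)} = \alpha_k + \beta_k^{\top} q(y),
\qquad k = 1, \dots, K.
```

    Choosing q(y) = (y, y^2), for instance, embeds the normal family while leaving F_0 itself unspecified.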
  9. By: Tae-Hwy Lee (Department of Economics, University of California Riverside); Ekaterina Seregina (Colby College); Yaojue Xu (Colby College)
    Abstract: In this paper, we construct a class of strictly consistent scoring functions based on the Bregman divergence measure, which jointly elicit the mean and variance. We use the scoring functions to develop a novel out-of-sample forecast encompassing test in volatility predictive models. We show the encompassing test is asymptotically normal. Simulation results demonstrate the merits of the proposed Bregman scoring functions and the forecast encompassing test. The forecast encompassing test exhibits a proper size and good power in finite samples. In an empirical application, we investigate the predictive ability of macroeconomic and financial variables in forecasting the equity premium volatility.
    Keywords: strictly consistent scoring function, elicitability, Bregman divergence, Granger-causality, encompassing, model averaging, equity premium.
    JEL: C53 E37 E27
    Date: 2023–09
    URL: http://d.repec.org/n?u=RePEc:ucr:wpaper:202311&r=ecm
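    The Bregman construction is easy to write down for the variance functional: since the conditional variance is the conditional mean of the squared return, any Bregman score for the mean, evaluated at r^2, is strictly consistent for it. A small sketch follows, with a plain Diebold-Mariano-type comparison standing in for the paper's encompassing test (which is built differently).

```python
import numpy as np

def bregman_score(h, r2, phi, dphi):
    """S(h; r2) = phi(r2) - phi(h) - dphi(h) (r2 - h), phi convex.
    Strictly consistent for the variance because sigma_t^2 = E[r_t^2 | past]."""
    return phi(r2) - phi(h) - dphi(h) * (r2 - h)

mse_phi, mse_dphi = (lambda t: t ** 2), (lambda t: 2 * t)        # -> squared error
ql_phi, ql_dphi = (lambda t: -np.log(t)), (lambda t: -1 / t)     # -> QLIKE family

rng = np.random.default_rng(0)
T = 1000
sigma2 = 0.5 + 0.4 * np.abs(np.sin(np.arange(T) / 30))           # true variance path
r2 = sigma2 * rng.chisquare(1, T)                                # squared returns
h1 = np.full(T, r2.mean())                                       # constant forecast
h2 = 0.9 * sigma2 + 0.1 * r2.mean()                              # (infeasible) better one

d = bregman_score(h1, r2, ql_phi, ql_dphi) - bregman_score(h2, r2, ql_phi, ql_dphi)
t_stat = d.mean() / (d.std(ddof=1) / np.sqrt(T))                 # DM-style t-statistic
print(f"t = {t_stat:.2f}  (positive favors forecast 2)")
```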
  10. By: Alberto Quaini (Erasmus University Rotterdam); Fabio Trojani (University of Geneva; University of Turin; and Swiss Finance Institute); Ming Yuan (Columbia University)
    Abstract: Tradable factor risk premia are defined by the negative factor covariance with the Stochastic Discount Factor projection on returns. They are robust to misspecification or weak identification in asset pricing models, and they are zero for any factor weakly correlated with returns. We propose a simple estimator of tradable factor risk premia that enjoys the Oracle Property, i.e., it performs as well as if the weak or useless factors were known. This estimator not only consistently removes such factors, but it also gives rise to reliable tests of asset pricing models. We study empirically a family of asset pricing models from the factor zoo and detect a robust subset of economically relevant and well-identified models, which are built out of factors with a nonzero tradable risk premium. Well-identified models feature a relatively low factor space dimension and some degree of misspecification, which harms the interpretation of other established notions of a factor risk premium in the literature.
    Keywords: Testing of asset pricing models, factor risk premia, useless and weak factors, factor selection, model misspecification, Oracle estimation and inference
    JEL: G12 C12 C13 C51 C52 C58
    Date: 2023–09
    URL: http://d.repec.org/n?u=RePEc:chf:rpseri:rp2381&r=ecm
  11. By: Harrison H. Li; Art B. Owen
    Abstract: We consider an experiment with at least two stages or batches and $O(N)$ subjects per batch. First, we propose a semiparametric treatment effect estimator that efficiently pools information across the batches, and show it asymptotically dominates alternatives that aggregate single batch estimates. Then, we consider the design problem of learning propensity scores for assigning treatment in the later batches of the experiment to maximize the asymptotic precision of this estimator. For two common causal estimands, we estimate this precision using observations from previous batches, and then solve a finite-dimensional concave maximization problem to adaptively learn flexible propensity scores that converge to suitably defined optima in each batch at rate $O_p(N^{-1/4})$. By extending the framework of double machine learning, we show this rate suffices for our pooled estimator to attain the targeted precision after each batch, as long as nuisance function estimates converge at rate $o_p(N^{-1/4})$. These relatively weak rate requirements enable the investigator to avoid the common practice of discretizing the covariate space for design and estimation in batch adaptive experiments while maintaining the advantages of pooling. Our numerical study shows that such discretization often leads to substantial asymptotic and finite sample precision losses outweighing any gains from design.
    Date: 2023–09
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2309.15297&r=ecm
  12. By: Florian Gunsilius; David Van Dijcke
    Abstract: Thresholds in treatment assignments can produce discontinuities in outcomes, revealing causal insights. In many contexts, like geographic settings, these thresholds are unknown and multivariate. We propose a non-parametric method to estimate the resulting discontinuities by segmenting the regression surface into smooth and discontinuous parts. This estimator uses a convex relaxation of the Mumford-Shah functional, for which we establish identification and convergence. Using our method, we estimate that an internet shutdown in India resulted in a reduction of economic activity by over 50%, greatly surpassing previous estimates and shedding new light on the true cost of such shutdowns for digital economies globally.
    Date: 2023–09
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2309.14630&r=ecm
  13. By: Albert Chiu; Xingchen Lan; Ziyi Liu; Yiqing Xu
    Abstract: Two-way fixed effects (TWFE) models are ubiquitous in causal panel analysis in political science. However, recent methodological discussions challenge their validity in the presence of heterogeneous treatment effects (HTE) and violations of the parallel trends assumption (PTA). This burgeoning literature has introduced multiple estimators and diagnostics, leading to confusion among empirical researchers on two fronts: the reliability of existing results based on TWFE models and the current best practices. To address these concerns, we examined, replicated, and reanalyzed 37 articles from three leading political science journals that employed observational panel data with binary treatments. Using six newly introduced HTE-robust estimators, we find that although precision may be affected, the core conclusions derived from TWFE estimates largely remain unchanged. PTA violations and insufficient statistical power, however, continue to be significant obstacles to credible inferences. Based on these findings, we offer recommendations for improving practice in empirical research.
    Date: 2023–09
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2309.15983&r=ecm
  14. By: Yeon-Koo Che; Dong Woo Hahm; YingHua He
    Abstract: Inferring applicant preferences is fundamental in many analyses of school-choice data. Application mistakes make this task challenging. We propose a novel approach to deal with the mistakes in a deferred-acceptance matching environment. The key insight is that the uncertainties faced by applicants, e.g., due to tie-breaking lotteries, render some mistakes costly, allowing us to reliably infer relevant preferences. Our approach extracts all preference information in a manner that is robust to payoff-insignificant mistakes. We apply it to school-choice data from Staten Island, NYC. Counterfactual analysis suggests that we underestimate the effects of proposed desegregation reforms when applicants' mistakes are not accounted for in preference inference and estimation.
    Date: 2023–09
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2309.14297&r=ecm
  15. By: Yichi Zhang; Mihai Cucuringu; Alexander Y. Shestopaloff; Stefan Zohren
    Abstract: In multivariate time series systems, lead-lag relationships reveal dependencies between time series when they are shifted in time relative to each other. Uncovering such relationships is valuable in downstream tasks, such as control, forecasting, and clustering. By understanding the temporal dependencies between different time series, one can better comprehend the complex interactions and patterns within the system. We develop a cluster-driven methodology based on dynamic time warping for robust detection of lead-lag relationships in lagged multi-factor models. We establish connections to the multireference alignment problem for both the homogeneous and heterogeneous settings. Since multivariate time series are ubiquitous in a wide range of domains, we demonstrate that our algorithm is able to robustly detect lead-lag relationships in financial markets, which can be subsequently leveraged in trading strategies with significant economic benefits.
    Date: 2023–09
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2309.08800&r=ecm
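    To make the core tool concrete, here is a self-contained dynamic time warping implementation and the simplest way to read a lead-lag off the optimal alignment path (the median index offset). The paper's cluster-driven pipeline and multireference-alignment connection go well beyond this sketch.

```python
import numpy as np

def dtw_path(a, b):
    """Classic dynamic time warping alignment between two 1-d series."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            c = (a[i - 1] - b[j - 1]) ** 2
            D[i, j] = c + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    path, (i, j) = [], (n, m)                 # backtrack the optimal warping path
    while (i, j) != (0, 0):
        path.append((i - 1, j - 1))
        i, j = min([(i - 1, j - 1), (i - 1, j), (i, j - 1)], key=lambda s: D[s])
    return path[::-1]

def lead_lag(a, b):
    """Positive value: a leads b (b's moves echo a's with a delay)."""
    return np.median([j - i for i, j in dtw_path(a, b)])

rng = np.random.default_rng(1)
x = rng.normal(size=200).cumsum()
y = np.roll(x, 5) + rng.normal(scale=0.2, size=200)   # y lags x by 5 steps
print(lead_lag(x, y))                                  # roughly 5
```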
  16. By: Karin Klieber
    Abstract: This paper introduces non-linear dimension reduction in factor-augmented vector autoregressions to analyze the effects of different economic shocks. I argue that controlling for non-linearities between a large-dimensional dataset and the latent factors is particularly useful during turbulent times of the business cycle. In simulations, I show that non-linear dimension reduction techniques yield good forecasting performance, especially when data is highly volatile. In an empirical application, I identify a monetary policy shock as well as an uncertainty shock, excluding and including observations from the COVID-19 pandemic. These two applications suggest that the non-linear FAVAR approaches are capable of dealing with the large outliers caused by the COVID-19 pandemic and yield reliable results in both scenarios.
    Date: 2023–09
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2309.04821&r=ecm
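    A skeletal FAVAR with a non-linear first step, using kernel PCA as one possible stand-in for the paper's dimension reduction techniques; the data below are random placeholders for a standardized macro panel.

```python
import numpy as np
from sklearn.decomposition import KernelPCA
from sklearn.preprocessing import StandardScaler
from statsmodels.tsa.api import VAR

rng = np.random.default_rng(0)
T, N = 240, 80
X_big = rng.normal(size=(T, N)).cumsum(axis=0)   # placeholder for a large macro panel

# Non-linear dimension reduction step: kernel PCA instead of the usual PCA.
Z = StandardScaler().fit_transform(np.diff(X_big, axis=0))
factors = KernelPCA(n_components=3, kernel='rbf').fit_transform(Z)

# Augment a small VAR with the non-linear factors (the "FA" in FAVAR).
y = rng.normal(size=(T - 1, 2))                  # placeholder observed variables
model = VAR(np.column_stack([y, factors])).fit(maxlags=4, ic='aic')
irf = model.irf(12)                              # impulse responses to shocks
```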
  17. By: Dylan Balla-Elliott
    Abstract: Information provision experiments are an increasingly popular tool to identify how beliefs causally affect decision-making and behavior. In a simple Bayesian model of belief formation via costly information acquisition, people form precise beliefs when these beliefs are important for their decision-making. The precision of prior beliefs controls how much their beliefs shift when they are shown new information (i.e., the strength of the first stage). Since two-stage least squares (TSLS) targets a weighted average with weights proportional to the strength of the first stage, TSLS will overweight individuals with smaller causal effects and underweight those with larger effects, thus understating the average partial effect of beliefs on behavior. In experimental designs where all participants are exposed to new information, Bayesian updating implies that a control function can be used to identify the (unweighted) average partial effect. I apply this estimator to a recent study of the effects of beliefs about the gender wage gap on support for public policies (Settele, 2022) and find the average partial effect is 40% larger than the comparable TSLS estimate. This difference can be explained by the fact that the effects of beliefs are close to zero for people who update their beliefs the most and receive the most weight in TSLS specifications.
    Date: 2023–09
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2309.11387&r=ecm
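    The abstract's weighting argument is easy to verify numerically. In the stylized simulation below, TSLS converges to the first-stage-weighted average E[a*beta]/E[a], which understates the unweighted average partial effect whenever effects are largest for weak updaters. The paper's control-function estimator, not coded here, recovers the unweighted average.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Stylized updating: posterior = (1 - a) * prior + a * signal.
prior = rng.normal(size=n)
signal = rng.normal(size=n)                # information shown to everyone
a = rng.uniform(0.1, 0.9, size=n)          # heterogeneous updating strength
posterior = (1 - a) * prior + a * signal

# Individual effects of beliefs on behavior are LARGER for weak updaters.
beta = 1 + (a < 0.3)
y = beta * posterior + rng.normal(size=n)

# TSLS with the signal as instrument converges to E[a * beta] / E[a].
tsls = np.cov(y, signal)[0, 1] / np.cov(posterior, signal)[0, 1]
print("average partial effect:", beta.mean())     # ~1.25
print("TSLS-weighted estimate:", round(tsls, 3))  # ~1.10, understated
```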
  18. By: Dangxing Chen
    Abstract: In recent years, explainable machine learning methods have been very successful. Despite their success, most explainable machine learning methods are applied to black-box models without any domain knowledge. By incorporating domain knowledge, science-informed machine learning models have demonstrated better generalization and interpretation. But do we obtain consistent scientific explanations if we apply explainable machine learning methods to science-informed machine learning models? This question is addressed in the context of monotonic models that exhibit three different types of monotonicity. To demonstrate monotonicity, we propose three axioms. Accordingly, this study shows that when only individual monotonicity is involved, the baseline Shapley value provides good explanations; however, when strong pairwise monotonicity is involved, the integrated gradients method provides reasonable explanations on average.
    Date: 2023–09
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2309.13246&r=ecm
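    Of the attribution methods the abstract compares, integrated gradients is compact enough to sketch from scratch. The final line performs the standard completeness check (attributions sum to f(x) - f(baseline)), and for a model that is monotonically increasing in its inputs, each attribution here is nonnegative; the toy model is invented for illustration.

```python
import numpy as np

def integrated_gradients(f, x, baseline, steps=100):
    """IG_k = (x_k - b_k) * integral_0^1 df/dx_k(b + t (x - b)) dt, via central
    finite differences averaged along the straight-line path."""
    path = baseline + np.linspace(0, 1, steps)[:, None] * (x - baseline)
    eps = 1e-5
    grads = np.empty_like(path)
    for k in range(len(x)):
        e = np.zeros_like(x)
        e[k] = eps
        grads[:, k] = (f(path + e) - f(path - e)) / (2 * eps)
    return (x - baseline) * grads.mean(axis=0)

f = lambda z: z[..., 0] + np.exp(z[..., 1])   # increasing in both inputs
x, b = np.array([1.0, 1.0]), np.zeros(2)
ig = integrated_gradients(f, x, b)
print(ig, "sum:", ig.sum(), "f(x)-f(b):", f(x) - f(b))   # completeness check
```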
  19. By: Jinglong Zhao
    Abstract: In experimental design, Neyman allocation refers to the practice of allocating subjects into treated and control groups, potentially in unequal numbers proportional to their respective standard deviations, with the objective of minimizing the variance of the treatment effect estimator. This widely recognized approach increases statistical power in scenarios where the treated and control groups have different standard deviations, as is often the case in social experiments, clinical trials, marketing research, and online A/B testing. However, Neyman allocation cannot be implemented unless the standard deviations are known in advance. Fortunately, the multi-stage nature of the aforementioned applications allows the use of earlier stage observations to estimate the standard deviations, which further guide allocation decisions in later stages. In this paper, we introduce a competitive analysis framework to study this multi-stage experimental design problem. We propose a simple adaptive Neyman allocation algorithm, which almost matches the information-theoretic limit of conducting experiments. Using online A/B testing data from a social media site, we demonstrate the effectiveness of our adaptive Neyman allocation algorithm, highlighting its practicality even when applied with only a limited number of stages.
    Date: 2023–09
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2309.08808&r=ecm
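    The basic two-stage heuristic behind adaptive Neyman allocation fits in a few lines: estimate the arm standard deviations from an equal-split pilot, then allocate the next batch proportionally to them. The paper's contribution is the competitive analysis and near-optimality guarantee for schemes of this kind, not the simple recipe itself; the numbers below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def neyman_share(s1, s0):
    """Neyman allocation: treated share proportional to the treated-arm sd."""
    return s1 / (s1 + s0)

n1, n2 = 200, 1000
sd1_true, sd0_true = 3.0, 1.0

# Stage 1: 50/50 pilot.
y1 = rng.normal(1.0, sd1_true, n1 // 2)   # treated pilot outcomes
y0 = rng.normal(0.0, sd0_true, n1 // 2)   # control pilot outcomes
p = neyman_share(y1.std(ddof=1), y0.std(ddof=1))

# Stage 2: allocate by the estimated Neyman share.
n_t = int(round(p * n2))
y1 = np.concatenate([y1, rng.normal(1.0, sd1_true, n_t)])
y0 = np.concatenate([y0, rng.normal(0.0, sd0_true, n2 - n_t)])

ate_hat = y1.mean() - y0.mean()
se = np.sqrt(y1.var(ddof=1) / len(y1) + y0.var(ddof=1) / len(y0))
print(f"ATE = {ate_hat:.3f} (SE {se:.3f}), treated share = {p:.2f}")  # share ~0.75
```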
  20. By: Malte Jahn
    Abstract: A nonlinear regression framework is proposed for time series and panel data for the situation where certain explanatory variables are available at a higher temporal resolution than the dependent variable. The main idea is to use the moments of the empirical distribution of these variables to construct regressors with the correct resolution. As the moments are likely to display nonlinear marginal and interaction effects, an artificial neural network regression function is proposed. The corresponding model operates within the traditional stochastic nonlinear least squares framework. In particular, a numerical Hessian is employed to calculate confidence intervals. The practical usefulness is demonstrated by analyzing the influence of daily temperatures in 260 European NUTS2 regions on the yearly growth of gross value added in these regions in the time period 2000 to 2021. In this particular example, the model allows for an appropriate assessment of regional economic impacts resulting from (future) changes in the regional temperature distribution (both mean and variance).
    Date: 2023–09
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2309.10481&r=ecm
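    A toy version of the framework with invented DGP numbers: summarize each region-year's daily temperatures by moments of their empirical distribution, then let a small neural network pick up nonlinear and interaction effects of those moments on yearly growth. (The paper works in a nonlinear least squares framework with a numerical Hessian for inference; sklearn's MLP is just a convenient stand-in.)

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
n_obs = 3000                              # region-year observations

# Daily temperatures (365 per region-year) from a heterogeneous DGP.
mu = rng.uniform(5, 20, n_obs)            # region-year mean temperature
sigma = rng.uniform(2, 8, n_obs)          # within-year dispersion
daily = rng.normal(mu[:, None], sigma[:, None], size=(n_obs, 365))

# Regressors at the outcome's (yearly) resolution: moments of the daily data.
M = np.column_stack([daily.mean(1), daily.std(1),
                     ((daily - daily.mean(1, keepdims=True)) ** 3).mean(1)])

# Nonlinear growth response: inverted U in the mean, damage from volatility.
growth = 2 - 0.03 * (mu - 14) ** 2 - 0.05 * sigma + rng.normal(0, 0.3, n_obs)

# ANN regression on the distribution moments.
net = MLPRegressor(hidden_layer_sizes=(16, 16), max_iter=2000).fit(M, growth)
print("in-sample R^2:", round(net.score(M, growth), 3))
```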
  21. By: Anirban Basu
    Abstract: This chapter reviews the econometric approaches typically used to deal with the spike of zeros when modeling non-negative outcomes such as expenditures, income, or consumption.
    JEL: C10 D0 I0
    Date: 2023–08
    URL: http://d.repec.org/n?u=RePEc:nbr:nberwo:31632&r=ecm
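    One of the workhorses such a review covers is the two-part model. A minimal sketch with simulated data and illustrative coefficients, showing how the participation and intensity margins combine into the overall mean E[y|x] = P(y>0|x) * E[y|y>0, x]:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 5000
x = rng.normal(size=n)
X = sm.add_constant(x)

# DGP with a spike at zero: participation, then positive expenditure.
participate = rng.uniform(size=n) < 1 / (1 + np.exp(-(0.3 + 0.8 * x)))
y = np.where(participate, np.exp(1.0 + 0.5 * x + rng.normal(0, 0.5, n)), 0.0)

# Part 1: probability of a nonzero outcome (logit).
part1 = sm.Logit((y > 0).astype(int), X).fit(disp=0)

# Part 2: level of the outcome among the nonzeros (GLM with log link).
pos = y > 0
part2 = sm.GLM(y[pos], X[pos],
               family=sm.families.Gamma(sm.families.links.Log())).fit()

# Overall mean combines both margins.
Ey = part1.predict(X) * part2.predict(X)
print("mean prediction vs sample mean:", Ey.mean().round(3), y.mean().round(3))
```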
  22. By: Wei Jie Yeo; Wihan van der Heever; Rui Mao; Erik Cambria; Ranjan Satapathy; Gianmarco Mengaldo
    Abstract: The success of artificial intelligence (AI), and deep learning models in particular, has led to their widespread adoption across various industries due to their ability to process huge amounts of data and learn complex patterns. However, due to their lack of explainability, there are significant concerns regarding their use in critical sectors, such as finance and healthcare, where decision-making transparency is of paramount importance. In this paper, we provide a comparative survey of methods that aim to improve the explainability of deep learning models within the context of finance. We categorize the collection of explainable AI methods according to their corresponding characteristics, and we review the concerns and challenges of adopting explainable AI methods, together with future directions we deemed appropriate and important.
    Date: 2023–09
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2309.11960&r=ecm

This nep-ecm issue is ©2023 by Sune Karlsson. It is provided as is without any express or implied warranty. It may be freely redistributed in whole or in part for any purpose. If distributed in part, please include this notice.
General information on the NEP project can be found at https://nep.repec.org. For comments please write to the director of NEP, Marco Novarese at <director@nep.repec.org>. Put “NEP” in the subject, otherwise your mail may be rejected.
NEP’s infrastructure is sponsored by the School of Economics and Finance of Massey University in New Zealand.