on Econometrics |
By: | Lucija Žignić; Stjepan Begušić; Zvonko Kostanjčar |
Abstract: | Estimation of high-dimensional covariance matrices in latent factor models is an important topic in many fields, especially finance. Because the number of financial assets grows while the estimation window remains of limited length, the often-used sample estimator yields noisy estimates that are not even positive definite. Under the latent factor model assumption, the covariance matrix decomposes into a common low-rank component and a full-rank idiosyncratic component. In this paper we focus on estimating the idiosyncratic component under the assumption of a grouped structure of the time series, which may arise from specific factors such as industries, asset classes or countries. We propose a generalized methodology for estimating the block-diagonal idiosyncratic component by clustering the residual series and applying shrinkage to the obtained blocks to ensure positive definiteness. We derive two estimators based on different clustering methods and test their performance on simulated and historical data. The proposed methods provide reliable estimates and outperform other state-of-the-art estimators based on thresholding methods. |
Date: | 2024–07 |
URL: | https://d.repec.org/n?u=RePEc:arx:papers:2407.03781&r= |
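A minimal Python sketch of the general recipe described in the abstract above: cluster the factor-model residuals, then apply shrinkage within each block so the block-diagonal estimate stays positive definite. The hierarchical clustering, the Ledoit-Wolf shrinkage, and all names below are illustrative assumptions, not the authors' exact estimator.

```python
# Illustrative block-diagonal idiosyncratic covariance: cluster residual series,
# then apply within-block shrinkage (not the authors' exact estimator).
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform
from sklearn.covariance import LedoitWolf

def block_diag_idio_cov(residuals, n_groups):
    """residuals: (T, N) array of factor-model residuals."""
    corr = np.corrcoef(residuals, rowvar=False)
    dist = squareform(1.0 - corr, checks=False)        # correlation distance
    labels = fcluster(linkage(dist, method="average"),
                      t=n_groups, criterion="maxclust")
    N = residuals.shape[1]
    sigma = np.zeros((N, N))
    for g in np.unique(labels):
        idx = np.where(labels == g)[0]
        block = LedoitWolf().fit(residuals[:, idx]).covariance_  # shrinkage keeps the block PD
        sigma[np.ix_(idx, idx)] = block
    return sigma

rng = np.random.default_rng(0)
sigma_hat = block_diag_idio_cov(rng.standard_normal((250, 40)), n_groups=5)
print(np.all(np.linalg.eigvalsh(sigma_hat) > 0))  # positive definite
```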
By: | Bulat Gafarov; Madina Karamysheva; Andrey Polbin; Anton Skrobotov |
Abstract: | We propose a novel approach to identification in structural vector autoregressions (SVARs) that uses external instruments for heteroskedasticity of a structural shock of interest. This approach does not require lead/lag exogeneity for identification, does not require heteroskedasticity to be persistent, and facilitates interpretation of the structural shocks. To implement this identification approach in applications, we develop a new method for simultaneous inference of structural impulse responses and other parameters, employing a dependent wild bootstrap of local projection estimators. This method is robust to an arbitrary number of unit roots and cointegration relationships, time-varying local means and drifts, and conditional heteroskedasticity of unknown form, and it can be used with other identification schemes, including Cholesky and the conventional external IV. We show how to construct pointwise and simultaneous confidence bounds for structural impulse responses and how to compute smoothed local projections with the corresponding confidence bounds. Using simulated data from a standard log-linearized DSGE model, we show that the method can reliably recover the true impulse responses in realistic datasets. As an empirical application, we apply the proposed method to identify a monetary policy shock using the dates of FOMC meetings in a standard six-variable VAR. The robustness of our identification and inference methods allows us to construct an instrumental variable for the monetary policy shock that dates back to 1965. The resulting impulse response functions for all variables align with the classical Cholesky identification scheme and differ from the narrative sign-restricted Bayesian VAR estimates. In particular, the response of inflation manifests a price puzzle that is indicative of the cost channel of interest rates. |
Date: | 2024–07 |
URL: | https://d.repec.org/n?u=RePEc:arx:papers:2407.03265&r= |
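To make the inference machinery concrete, here is a simplified local-projection sketch for the impulse response of an outcome to an externally instrumented shock, using HAC standard errors in place of the paper's dependent wild-bootstrap simultaneous bands. Variable names and the lag structure are illustrative.

```python
# Simplified local projections: IRF of y to a shock proxy z, horizon by horizon,
# with HAC standard errors (a stand-in for the paper's bootstrap inference).
import numpy as np
import statsmodels.api as sm

def local_projection_irf(y, z, max_h=12, n_lags=4):
    """y, z: 1-D arrays; returns point IRFs and HAC standard errors."""
    irf, se = [], []
    for h in range(max_h + 1):
        dep = y[h + n_lags:]                            # y_{t+h}
        rows = []
        for t in range(n_lags, len(y) - h):
            rows.append([z[t]] + list(y[t - n_lags:t][::-1]))   # shock proxy plus lags of y
        X = sm.add_constant(np.array(rows))
        fit = sm.OLS(dep, X).fit(cov_type="HAC", cov_kwds={"maxlags": h + n_lags})
        irf.append(fit.params[1])                       # coefficient on z_t
        se.append(fit.bse[1])
    return np.array(irf), np.array(se)

rng = np.random.default_rng(7)
z = rng.standard_normal(300)
y = np.convolve(z, [0.0, 1.0, 0.6, 0.3], mode="full")[: len(z)] + 0.2 * rng.standard_normal(300)
irf, se = local_projection_irf(y, z, max_h=8)
```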
By: | Alessandro Casini; Adam McCloskey |
Abstract: | We provide precise conditions for nonparametric identification of causal effects by high-frequency event study regressions, which have been used widely in the recent macroeconomics, financial economics and political economy literatures. The high-frequency event study method regresses changes in an outcome variable on a measure of unexpected changes in a policy variable in a narrow time window around an event or a policy announcement (e.g., a 30-minute window around an FOMC announcement). We show that, contrary to popular belief, the narrow size of the window is not sufficient for identification. Rather, the population regression coefficient identifies a causal estimand when (i) the effect of the policy shock on the outcome does not depend on the other shocks (separability) and (ii) the surprise component of the news or event dominates all other shocks that are present in the event window (relative exogeneity). Technically, the latter condition requires the policy shock to have infinite variance in the event window. Under these conditions, we establish the causal meaning of the event study estimand corresponding to the regression coefficient and the consistency and asymptotic normality of the event study estimator. Notably, this standard linear regression estimator is robust to general forms of nonlinearity. We apply our results to Nakamura and Steinsson's (2018a) analysis of the real economic effects of monetary policy, providing a simple empirical procedure to analyze the extent to which the standard event study estimator adequately estimates causal effects of interest. |
Date: | 2024–06 |
URL: | https://d.repec.org/n?u=RePEc:arx:papers:2406.15667&r= |
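The event-study regression itself is a narrow-window OLS of outcome changes on the policy surprise; a toy sketch with simulated data follows. The identification conditions discussed in the abstract (separability and relative exogeneity), not the regression mechanics, are what give the coefficient its causal meaning.

```python
# Toy high-frequency event-study regression: changes in an outcome over a narrow
# announcement window regressed on the policy surprise. Names are illustrative.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
surprise = rng.standard_normal(200)                            # unexpected policy change in the window
d_outcome = 0.8 * surprise + 0.1 * rng.standard_normal(200)    # e.g. change in a bond yield

fit = sm.OLS(d_outcome, sm.add_constant(surprise)).fit(cov_type="HC1")
print(fit.params[1], fit.bse[1])   # event-study estimand and robust standard error
```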
By: | Max Cytrynbaum |
Abstract: | We study estimation and inference on causal parameters under finely stratified rerandomization designs, which use baseline covariates to match units into groups (e.g. matched pairs), then rerandomize within-group treatment assignments until a balance criterion is satisfied. We show that finely stratified rerandomization does partially linear regression adjustment by design, providing nonparametric control over the covariates used for stratification, and linear control over the rerandomization covariates. We also introduce novel rerandomization criteria, allowing for nonlinear imbalance metrics and proposing a minimax scheme that optimizes the balance criterion using pilot data or prior information provided by the researcher. While the asymptotic distribution of generalized method of moments (GMM) estimators under stratified rerandomization is generically non-Gaussian, we show how to restore asymptotic normality using optimal ex-post linear adjustment. This allows us to provide simple asymptotically exact inference methods for superpopulation parameters, as well as efficient conservative inference methods for finite population parameters. |
Date: | 2024–07 |
URL: | https://d.repec.org/n?u=RePEc:arx:papers:2407.03279&r= |
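A rough sketch of the design idea: match units into pairs on a stratification covariate, then redraw within-pair assignments until an imbalance metric on a rerandomization covariate is small enough. The pairing rule, imbalance metric, and threshold below are illustrative assumptions, not the paper's criteria.

```python
# Illustrative finely stratified rerandomization: matched pairs plus an
# accept/reject loop on a simple mean-difference imbalance metric.
import numpy as np

def rerandomize_pairs(x_strat, x_balance, threshold, rng, max_draws=10_000):
    order = np.argsort(x_strat)                 # match pairs on one stratification covariate
    pairs = order[: len(order) // 2 * 2].reshape(-1, 2)
    for _ in range(max_draws):
        treat = np.zeros(len(x_strat), dtype=bool)
        flip = rng.integers(0, 2, size=len(pairs)).astype(bool)
        treat[pairs[:, 0]] = flip               # exactly one treated unit per pair
        treat[pairs[:, 1]] = ~flip
        imbalance = np.abs(x_balance[treat].mean() - x_balance[~treat].mean())
        if imbalance < threshold:               # accept the draw once balance is acceptable
            return treat
    raise RuntimeError("no acceptable assignment found")

rng = np.random.default_rng(2)
x1, x2 = rng.standard_normal(100), rng.standard_normal(100)
assignment = rerandomize_pairs(x1, x2, threshold=0.05, rng=rng)
```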
By: | Nicolas Apfel; Julia Hatamyar; Martin Huber; Jannis Kueck |
Abstract: | This study introduces a data-driven, machine learning-based method to detect suitable control variables and instruments for assessing the causal effect of a treatment on an outcome in observational data, if they exist. Our approach tests the joint existence of instruments, which are associated with the treatment but not directly with the outcome (at least conditional on observables), and suitable control variables, conditional on which the treatment is exogenous, and learns the partition of instruments and control variables from the observed data. The detection of sets of instruments and control variables relies on the condition that proper instruments are conditionally independent of the outcome given the treatment and suitable control variables. We establish the consistency of our method for detecting control variables and instruments under certain regularity conditions, investigate the finite sample performance through a simulation study, and provide an empirical application to labor market data from the Job Corps study. |
Date: | 2024–07 |
URL: | https://d.repec.org/n?u=RePEc:arx:papers:2407.04448&r= |
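The key testable condition is that proper instruments are conditionally independent of the outcome given the treatment and suitable controls. As a toy illustration only (the paper's procedure is machine-learning based and also learns the partition), a single candidate partition can be checked with a linear exclusion test:

```python
# Toy check of the exclusion condition for one candidate partition: given the
# treatment d and candidate controls X, the candidate instruments Z should carry
# no additional information about the outcome y. Here an F-test on Z in a linear
# regression; the paper's method is far more general.
import numpy as np
import statsmodels.api as sm

def candidate_partition_pvalue(y, d, X, Z):
    """p-value of H0: coefficients on Z are zero, controlling for d and X."""
    full = sm.OLS(y, sm.add_constant(np.column_stack([d, X, Z]))).fit()
    k_z = Z.shape[1]
    restriction = np.zeros((k_z, full.params.shape[0]))
    restriction[:, -k_z:] = np.eye(k_z)         # restrict the Z coefficients
    return float(full.f_test(restriction).pvalue)

rng = np.random.default_rng(8)
X, Z = rng.standard_normal((500, 2)), rng.standard_normal((500, 1))
d = Z[:, 0] + X @ np.array([0.5, -0.3]) + rng.standard_normal(500)
y = 1.0 * d + X @ np.array([0.2, 0.4]) + rng.standard_normal(500)
print(candidate_partition_pvalue(y, d, X, Z))   # typically non-small: Z affects y only through d here
```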
By: | Mikihito Nishi |
Abstract: | We consider estimating nonparametric time-varying parameters in linear models using kernel regression. Our contributions are twofold. First, we consider a broad class of time-varying parameters, including deterministic smooth functions, the rescaled random walk, structural breaks, the threshold model, and mixtures of these. We show that these time-varying parameters can be consistently estimated by kernel regression. Our analysis exploits the smoothness of time-varying parameters rather than their specific form. Second, we show that the bandwidth used in kernel regression determines the trade-off between the rate of convergence and the size of the class of time-varying parameters that can be estimated. An implication of our result is that the bandwidth should be proportional to $T^{-1/2}$ if the time-varying parameter follows the rescaled random walk, where $T$ is the sample size. We propose a specific choice of the bandwidth that accommodates a wide range of time-varying parameter models. An empirical application shows that the kernel-based estimator with this choice can capture the random-walk dynamics in time-varying parameters. |
Date: | 2024–06 |
URL: | https://d.repec.org/n?u=RePEc:arx:papers:2406.14046&r= |
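A minimal sketch of kernel estimation of time-varying coefficients in y_t = x_t' beta_t + e_t, using a local (kernel-weighted) least squares fit at each time point and a bandwidth proportional to T^{-1/2} as discussed in the abstract. The Epanechnikov kernel and the constant in the bandwidth rule are illustrative choices.

```python
# Kernel regression for time-varying coefficients: at each t, run weighted least
# squares with weights concentrated around t; bandwidth proportional to T^{-1/2}.
import numpy as np

def kernel_tvp(y, X, c=1.0):
    T = len(y)
    h = c * T ** (-0.5)                         # bandwidth rule from the abstract
    grid = (np.arange(T) + 0.5) / T
    betas = np.empty((T, X.shape[1]))
    for t in range(T):
        u = (grid - grid[t]) / h
        w = np.where(np.abs(u) <= 1, 0.75 * (1 - u ** 2), 0.0)   # Epanechnikov kernel
        Xw = X * w[:, None]
        betas[t] = np.linalg.solve(Xw.T @ X, Xw.T @ y)           # local WLS solution
    return betas

rng = np.random.default_rng(3)
T = 400
X = np.column_stack([np.ones(T), rng.standard_normal(T)])
beta_true = np.column_stack([np.sin(2 * np.pi * np.arange(T) / T), np.linspace(0, 1, T)])
y = (X * beta_true).sum(axis=1) + 0.1 * rng.standard_normal(T)
beta_hat = kernel_tvp(y, X, c=2.0)
```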
By: | Luofeng Liao; Christian Kroer |
Abstract: | We initiate the study of statistical inference and A/B testing for two market equilibrium models: linear Fisher market (LFM) equilibrium and first-price pacing equilibrium (FPPE). LFM arises from fair resource allocation systems such as allocation of food to food banks and notification opportunities to different types of notifications. For LFM, we assume that the data observed is captured by the classical finite-dimensional Fisher market equilibrium, and its steady-state behavior is modeled by a continuous limit Fisher market. The second type of equilibrium we study, FPPE, arises from internet advertising where advertisers are constrained by budgets and advertising opportunities are sold via first-price auctions. For platforms that use pacing-based methods to smooth out the spending of advertisers, FPPE provides a hindsight-optimal configuration of the pacing method. We propose a statistical framework for the FPPE model, in which a continuous limit FPPE models the steady-state behavior of the auction platform, and a finite FPPE provides the data to estimate primitives of the limit FPPE. Both LFM and FPPE have an Eisenberg-Gale convex program characterization, the pillar upon which we derive our statistical theory. We start by deriving basic convergence results for the finite market to the limit market. We then derive asymptotic distributions, and construct confidence intervals. Furthermore, we establish the asymptotic local minimax optimality of estimation based on finite markets. We then show that the theory can be used for conducting statistically valid A/B testing on auction platforms. Synthetic and semi-synthetic experiments verify the validity and practicality of our theory. |
Date: | 2024–06 |
URL: | https://d.repec.org/n?u=RePEc:arx:papers:2406.15522&r= |
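Both equilibria rest on an Eisenberg-Gale convex program; for the linear Fisher market it can be written down directly. A sketch using cvxpy (assumed installed) with made-up valuations and budgets:

```python
# Eisenberg-Gale program for a linear Fisher market: maximize budget-weighted log
# utilities subject to unit supply of each good; dual variables on the supply
# constraints are the equilibrium prices.
import cvxpy as cp
import numpy as np

rng = np.random.default_rng(4)
n_buyers, n_goods = 5, 8
V = rng.uniform(0.1, 1.0, size=(n_buyers, n_goods))    # valuations v_ij
B = rng.uniform(0.5, 1.5, size=n_buyers)               # budgets b_i

X = cp.Variable((n_buyers, n_goods), nonneg=True)      # allocations x_ij
utilities = cp.sum(cp.multiply(V, X), axis=1)
objective = cp.Maximize(cp.sum(cp.multiply(B, cp.log(utilities))))
constraints = [cp.sum(X, axis=0) <= 1]                 # each good has unit supply
prob = cp.Problem(objective, constraints)
prob.solve()
prices = constraints[0].dual_value                     # equilibrium prices from the duals
```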
By: | Fantazzini, Dean |
Abstract: | This paper investigates the estimation of the Value-at-Risk (VaR) across various probability levels for the log-returns of a comprehensive dataset comprising four thousand crypto-assets. Employing four recently introduced Adaptive Conformal Inference (ACI) algorithms, we aim to provide robust uncertainty estimates crucial for effective risk management in financial markets. We contrast the performance of these ACI algorithms with that of traditional benchmark models, including GARCH models and daily range models. Despite the substantial volatility observed in the majority of crypto-assets, our findings indicate that ACI algorithms exhibit notable efficacy. In contrast, daily range models, and to a lesser extent, GARCH models, encounter challenges related to numerical convergence issues and structural breaks. Among the ACI algorithms, the Fully Adaptive Conformal Inference (FACI) and the Scale-Free Online Gradient Descent (SF-OGD) stand out for their ability to provide precise VaR estimates across all quantiles examined. Conversely, the Aggregated Adaptive Conformal Inference (AgACI) and the Strongly Adaptive Online Conformal Prediction (SAOCP) demonstrate proficiency in estimating VaR for extreme quantiles but tend to be overly conservative for higher probability levels. These conclusions withstand robustness checks encompassing the market capitalization of crypto-assets, time series size, and different forecasting methods for asset log-returns. This study underscores the promise of ACI algorithms in enhancing risk assessment practices in the context of volatile and dynamic crypto-asset markets. |
Keywords: | Value at Risk (VaR); Adaptive Conformal Inference (ACI); Aggregated Adaptive Conformal Inference (AgACI); Fully Adaptive Conformal Inference (FACI); Scale-Free Online Gradient Descent (SF-OGD); Strongly Adaptive Online Conformal Prediction (SAOCP); GARCH; Daily Range; Risk Management |
JEL: | C14 C51 C53 C58 G17 G32 |
Date: | 2024 |
URL: | https://d.repec.org/n?u=RePEc:pra:mprapa:121214&r= |
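For intuition, here is the basic adaptive conformal inference update of Gibbs and Candès (2021) applied to one-sided VaR forecasting: the working coverage level is nudged up or down depending on whether yesterday's loss breached the previous VaR forecast. The FACI, SF-OGD, AgACI and SAOCP algorithms studied in the paper refine this scheme in different ways; the rolling-quantile forecaster below is an illustrative choice.

```python
# Basic ACI update applied to lower-tail VaR: recalibrate the working quantile
# level after each observed breach or non-breach.
import numpy as np

def aci_var(returns, alpha=0.05, gamma=0.01, window=250):
    alpha_t, forecasts = alpha, []
    for t in range(window, len(returns)):
        level = float(np.clip(alpha_t, 1e-4, 1 - 1e-4))
        var_t = np.quantile(returns[t - window:t], level)   # rolling empirical VaR forecast
        forecasts.append(var_t)
        breach = float(returns[t] < var_t)                  # 1 if the loss exceeded the VaR
        alpha_t += gamma * (alpha - breach)                 # ACI recalibration step
    return np.array(forecasts)

rng = np.random.default_rng(5)
var_series = aci_var(rng.standard_t(df=3, size=2000) * 0.02)
```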
By: | Stefan Faridani |
Abstract: | How many experimental studies would have come to different conclusions had they been run on larger samples? I show how to estimate the expected number of statistically significant results that a set of experiments would have reported had their sample sizes all been counterfactually increased by a chosen factor. The estimator is consistent and asymptotically normal. Unlike existing methods, my approach requires no assumptions about the distribution of true effects of the interventions being studied other than continuity. This method includes an adjustment for publication bias in the reported t-scores. An application to randomized controlled trials (RCTs) published in top economics journals finds that doubling every experiment's sample size would only increase the power of two-sided t-tests by 7.2 percentage points on average. This effect is small and is comparable to the effect for systematic replication projects in laboratory psychology where previous studies enabled accurate power calculations ex ante. These effects are both smaller than for non-RCTs. This comparison suggests that RCTs are on average relatively insensitive to sample size increases. The policy implication is that grant givers should generally fund more experiments rather than fewer, larger ones. |
Date: | 2024–06 |
URL: | https://d.repec.org/n?u=RePEc:arx:papers:2406.13122&r= |
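For intuition only, here is a naive plug-in version of the counterfactual-power question: treat each reported t-score as the true noncentrality and scale it by sqrt(k) for a k-fold larger sample. The paper's estimator is designed precisely to avoid the bias of this plug-in and to adjust for publication bias in the reported t-scores.

```python
# Naive plug-in (for intuition, not the paper's estimator): two-sided rejection
# probability when the noncentrality is scaled by sqrt(k).
import numpy as np
from scipy.stats import norm

def naive_power_gain(t_scores, k=2.0, crit=1.96):
    t = np.asarray(t_scores, dtype=float)
    power_now = norm.sf(crit - t) + norm.cdf(-crit - t)
    power_big = norm.sf(crit - np.sqrt(k) * t) + norm.cdf(-crit - np.sqrt(k) * t)
    return float(np.mean(power_big - power_now))

print(naive_power_gain([0.5, 1.2, 2.3, -0.8], k=2.0))
```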
By: | Mayo, Deborah |
Abstract: | In this paper I discuss a fundamental contrast between two types of statistical tests now in use: those where the post-data inferential assessment is sensitive to the method’s error probabilities—error statistical methods (e.g., statistical significance tests), and those where it is insensitive (e.g., Bayes factors). It might be thought that if a method is insensitive to error probabilities, it escapes the inferential consequences of inflated error rates due to biasing selection effects. I will argue that this is not the case. I discuss a recent paper advocating subjective Bayes factors (BFs) by van Dongen, Sprenger, and Wagenmakers (VSW 2022). VSW claim that the comparatively more likely hypothesis H passes a stringent test, despite insensitivity to the error statistical properties of that test. I argue that the BF test rule they advocate can accord strong evidence to a claim H, even though little has been done to rule out H’s flaws. There are two reasons the BF test fails to satisfy the minimal requirement for stringency: its insensitivity to biasing selection effects, and the fact that H and its competitor need not exhaust the space of possibilities. Data can be much more probable under hypothesis H than under a chosen non-exhaustive competitor H’, even though H is poorly warranted. I recommend that VSW supplement their BF tests with a report of how severely H has passed, in the frequentist error statistical sense. I begin by responding to the criticisms VSW raise for a severe testing reformulation of statistical significance tests. A post-data severity concept can supply a transparent way for skeptical consumers, who are not steeped in technical machinery, to check whether errors and biases are avoided in specific inferences that affect them. |
Date: | 2024–06–25 |
URL: | https://d.repec.org/n?u=RePEc:osf:osfxxx:tmgqd&r= |
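A small numeric illustration of the post-data severity assessment mentioned at the end of the abstract, for a one-sided Normal test of mu <= mu0 against mu > mu0 with known variance. The test setup and numbers are my assumptions, not taken from the paper.

```python
# Post-data severity for the claim mu > mu1 in a one-sided Normal test with known
# sigma: the probability of a result less extreme than the observed mean if mu
# were only mu1.
from math import sqrt
from scipy.stats import norm

def severity(xbar, mu1, sigma, n):
    return norm.cdf((xbar - mu1) * sqrt(n) / sigma)

# Observed mean 0.4 with sigma = 1, n = 100: the claim mu > 0.2 passes with high
# severity, while mu > 0.39 does not.
print(severity(0.4, 0.2, 1.0, 100))   # ~0.98
print(severity(0.4, 0.39, 1.0, 100))  # ~0.54
```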
By: | Jose Apesteguia; Miguel A. Ballester; Ángelo Gutiérrez-Daza |
Abstract: | This paper introduces the random discounted expected utility (RDEU) model, developed to deal with heterogeneous risk and time preferences. The RDEU model provides an explicit link between preference heterogeneity and choice heterogeneity. We prove that it has solid comparative statics, discuss its identification, and demonstrate its computational convenience. Finally, we use two distinct experimental datasets to illustrate the advantages of the RDEU model over common alternatives for estimating heterogeneity in preferences across individuals. |
Keywords: | Heterogeneity; Risk Preferences; Time Preferences; Comparative Statics; Random Utility Models |
JEL: | C01 D01 |
Date: | 2024–06 |
URL: | https://d.repec.org/n?u=RePEc:bdm:wpaper:2024-03&r= |
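An illustrative simulation of an RDEU-style choice probability: draw heterogeneous (risk aversion, discount factor) pairs from a population distribution, evaluate the discounted expected utility of two dated lotteries under each draw, and report the share of draws preferring one option. The CRRA utility, exponential discounting, and heterogeneity distributions are assumptions for illustration, not the paper's specification.

```python
# RDEU-style choice probability by simulation over random (risk, time) preference
# draws; functional forms below are illustrative assumptions.
import numpy as np

def rdeu_choice_prob(lottery_a, lottery_b, n_draws=100_000, seed=6):
    """Each lottery: list of (probability, payoff, delay_in_periods)."""
    rng = np.random.default_rng(seed)
    r = rng.uniform(0.0, 1.5, n_draws)          # CRRA coefficients
    delta = rng.beta(8, 2, n_draws)             # discount factors in (0, 1)

    def deu(lottery):
        value = np.zeros(n_draws)
        for p, x, t in lottery:
            value += p * delta ** t * x ** (1 - r) / (1 - r + 1e-12)
        return value

    return float(np.mean(deu(lottery_a) > deu(lottery_b)))

# sooner-smaller certain payoff vs. later-larger risky payoff
print(rdeu_choice_prob([(1.0, 10.0, 0)], [(0.8, 16.0, 2)]))
```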
By: | Robinson, Thomas; Tax, Niek; Mudd, Richard; Guy, Ido |
Abstract: | Active learning can improve the efficiency of training prediction models by identifying the most informative new labels to acquire. However, non-response to label requests can impact active learning’s effectiveness in real-world contexts. We conceptualise this degradation by considering the type of non-response present in the data, demonstrating that biased non-response is particularly detrimental to model performance. We argue that biased non-response is likely in contexts where the labelling process, by nature, relies on user interactions. To mitigate the impact of biased non-response, we propose a cost-based correction to the sampling strategy, the Upper Confidence Bound of the Expected Utility (UCB-EU), that can, plausibly, be applied to any active learning algorithm. Through experiments, we demonstrate that our method successfully reduces the harm from labelling non-response in many settings. However, we also characterise settings where the non-response bias in the annotations remains detrimental under UCB-EU for specific sampling methods and data generating processes. Finally, we evaluate our method on a real-world dataset from an e-commerce platform. We show that UCB-EU yields substantial performance improvements to conversion models that are trained on clicked impressions. Most generally, this research serves to both better conceptualise the interplay between types of non-response and model improvements via active learning, and to provide a practical, easy-to-implement correction that mitigates model degradation. |
Keywords: | active learning; non-response; missing data; e-commerce; CTR prediction; Springer deal |
JEL: | L81 |
Date: | 2024–05–25 |
URL: | https://d.repec.org/n?u=RePEc:ehl:lserod:123029&r= |
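One plausible reading of a cost-based correction such as UCB-EU, sketched below: weight each candidate's informativeness by an optimistic (upper-confidence-bound) estimate of the probability that the label request will actually be answered. The exact UCB-EU definition is given in the paper; the formula and names here are only my illustration.

```python
# Illustrative query selection: informativeness times an optimistic estimate of
# the label-response probability (a plausible reading of a UCB-EU-style rule,
# not the paper's exact definition).
import numpy as np

def select_query(uncertainty, response_successes, response_attempts, c=1.0):
    """uncertainty: per-candidate informativeness; responses: per-candidate counts."""
    attempts = np.maximum(response_attempts, 1)
    p_hat = response_successes / attempts
    ucb = np.minimum(p_hat + c * np.sqrt(np.log(attempts.sum()) / attempts), 1.0)
    expected_utility = uncertainty * ucb          # optimistic expected value of a label request
    return int(np.argmax(expected_utility))

idx = select_query(np.array([0.9, 0.5, 0.7]),
                   np.array([2, 9, 1]), np.array([10, 10, 2]))
```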