on Econometrics |
By: | Liang Jiang; Oliver B. Linton; Haihan Tang; Yichong Zhang |
Abstract: | We study regression adjustments with additional covariates in randomized experiments under covariate-adaptive randomizations (CARs) when subject compliance is imperfect. We develop a regression-adjusted local average treatment effect (LATE) estimator that is proven to improve efficiency in the estimation of LATEs under CARs. Our adjustments can be parametric in linear and nonlinear forms, nonparametric, and high-dimensional. Even when the adjustments are misspecified, our proposed estimator is still consistent and asymptotically normal, and the corresponding inference method still achieves exact asymptotic size under the null. When the adjustments are correctly specified, our estimator achieves the minimum asymptotic variance. When the adjustments are parametrically misspecified, we construct a new estimator which is weakly more efficient than the linearly and nonlinearly adjusted estimators, as well as the one without any adjustments. Simulation evidence and an empirical application confirm the efficiency gains achieved by regression adjustments relative to both the estimator without adjustment and the standard two-stage least squares estimator. |
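A minimal sketch of a covariate-adjusted Wald/LATE estimator in the spirit of the adjustments described, assuming a linear adjustment fit by scikit-learn; the function name and implementation are illustrative, not the authors' construction:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def adjusted_late(y, d, z, X):
    """Covariate-adjusted Wald estimator of the LATE: residualize the
    outcome on covariates, then form the usual ratio of
    intention-to-treat contrasts across assignment arms."""
    y_res = y - LinearRegression().fit(X, y).predict(X)
    itt_y = y_res[z == 1].mean() - y_res[z == 0].mean()
    itt_d = d[z == 1].mean() - d[z == 0].mean()
    return itt_y / itt_d
```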
Date: | 2022–01 |
URL: | http://d.repec.org/n?u=RePEc:arx:papers:2201.13004&r= |
By: | Jad Beyhum; Samuele Centorrino; Jean-Pierre Florens; Ingrid Van Keilegom |
Abstract: | This paper considers identification and estimation of the causal effect of the time Z until a subject is treated on a survival outcome T. The treatment is not randomly assigned, T is randomly right censored by a random variable C, and the time to treatment Z is right censored by min(T,C). The endogeneity issue is treated using an instrumental variable that explains Z and is independent of the error term of the model. We study identification in a fully nonparametric framework. We show that our specification generates an integral equation, of which the regression function of interest is a solution. We provide identification conditions that rely on this identification equation. For estimation purposes, we assume that the regression function follows a parametric model. We propose an estimation procedure and give conditions under which the estimator is asymptotically normal. The estimators exhibit good finite sample properties in simulations. Our methodology is applied to find evidence supporting the efficacy of a therapy for burn-out. |
Date: | 2022–01 |
URL: | http://d.repec.org/n?u=RePEc:arx:papers:2201.10826&r= |
By: | Harold D Chiang; Bruce E Hansen; Yuya Sasaki |
Abstract: | We propose improved standard errors and an asymptotic distribution theory for two-way clustered panels. Our proposed estimator and theory allow for arbitrary serial dependence in the common time effects, which is excluded by existing two-way methods, including the popular two-way cluster standard errors of Cameron, Gelbach, and Miller (2011) and the cluster bootstrap of Menzel (2021). Our asymptotic distribution theory is the first to allow for this level of inter-dependence among the observations. Under weak regularity conditions, we demonstrate that the least squares estimator is asymptotically normal, our proposed variance estimator is consistent, and t-ratios are asymptotically standard normal, permitting conventional inference. We present simulation evidence that confidence intervals constructed with our proposed standard errors achieve superior coverage relative to existing methods. We illustrate the relevance of the proposed method in an empirical application to a standard Fama-French three-factor regression. |
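For reference, a minimal sketch of the Cameron-Gelbach-Miller two-way cluster-robust variance that the abstract treats as the existing benchmark, not the authors' proposed estimator; function names are illustrative:

```python
import numpy as np

def _cluster_meat(X, resid, groups):
    """Sum over clusters g of (X_g' u_g)(X_g' u_g)'."""
    meat = np.zeros((X.shape[1], X.shape[1]))
    for g in np.unique(groups):
        s = X[groups == g].T @ resid[groups == g]
        meat += np.outer(s, s)
    return meat

def twoway_cluster_se(X, y, firm_id, time_id):
    """OLS with two-way clustered variance V = V_firm + V_time - V_intersection."""
    beta = np.linalg.solve(X.T @ X, X.T @ y)
    u = y - X @ beta
    bread = np.linalg.inv(X.T @ X)
    inter = np.array([f"{f}_{t}" for f, t in zip(firm_id, time_id)])
    meat = (_cluster_meat(X, u, firm_id)
            + _cluster_meat(X, u, time_id)
            - _cluster_meat(X, u, inter))
    V = bread @ meat @ bread
    return beta, np.sqrt(np.diag(V))
```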
Date: | 2022–01 |
URL: | http://d.repec.org/n?u=RePEc:arx:papers:2201.11304&r= |
By: | Alexander Georges Gretener; Matthias Neuenkirch; Dennis Umlandt |
Abstract: | We propose a novel dynamic mixture vector autoregressive (VAR) model in which time-varying mixture weights are driven by the predictive likelihood score. Intuitively, the state weight of the k-th component VAR model in the subsequent period is increased if the current observation is more likely to be drawn from this particular state. The model is not limited to a specific distributional assumption and allows for straightforward likelihood-based estimation and inference. We conduct a Monte Carlo study and find that the score-driven mixture VAR model is able to adequately filter the mixture dynamics from a variety of data generating processes with which most other observation-driven dynamic mixture VAR models cannot adequately cope. Finally, we illustrate our approach with an application in which we model the conditional joint distribution of economic and financial conditions and derive generalized impulse responses. |
Keywords: | Dynamic Mixture Models; Generalized Autoregressive Score Models; Macro-Financial Linkages; Nonlinear VAR |
JEL: | C32 C34 G17 |
Date: | 2022 |
URL: | http://d.repec.org/n?u=RePEc:trr:wpaper:202202&r= |
By: | Carolina Caetano; Brantly Callaway; Stroud Payne; Hugo Sant'Anna Rodrigues |
Abstract: | This paper considers identification and estimation of causal effect parameters from participating in a binary treatment in a difference-in-differences (DID) setup when the parallel trends assumption holds after conditioning on observed covariates. Relative to existing work in the econometrics literature, we consider the case where the value of covariates can change over time and, potentially, where participating in the treatment can affect the covariates themselves. We propose new empirical strategies in both cases. We also consider two-way fixed effects (TWFE) regressions that include time-varying regressors, which is the most common way that DID identification strategies are implemented under conditional parallel trends. We show that, even in the case with only two time periods, these TWFE regressions are not generally robust to (i) time-varying covariates being affected by the treatment, (ii) treatment effects and/or paths of untreated potential outcomes depending on the level of time-varying covariates in addition to the change in the covariates over time, (iii) treatment effects and/or paths of untreated potential outcomes depending on time-invariant covariates, (iv) treatment effect heterogeneity with respect to observed covariates, and (v) violations of strong functional form assumptions, both for outcomes over time and the propensity score, that are unlikely to be plausible in most DID applications. Thus, TWFE regressions can deliver misleading estimates of causal effect parameters in a number of empirically relevant cases. We propose both doubly robust estimands and regression adjustment/imputation strategies that are robust to these issues while not being substantially more challenging to implement. |
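A minimal two-period sketch of the regression-adjustment/imputation idea mentioned at the end of the abstract, assuming a linear outcome model in baseline covariates; names are illustrative and this is not the authors' exact estimator:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def att_regression_imputation(delta_y, treated, X_base):
    """Two-period DID ATT via regression imputation: model the untreated
    outcome change given baseline covariates, impute that trend for
    treated units, and average the difference."""
    model = LinearRegression().fit(X_base[treated == 0], delta_y[treated == 0])
    imputed_trend = model.predict(X_base[treated == 1])
    return np.mean(delta_y[treated == 1] - imputed_trend)
```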
Date: | 2022–02 |
URL: | http://d.repec.org/n?u=RePEc:arx:papers:2202.02903&r= |
By: | Ke-Li Xu (Indiana University Bloomington) |
Abstract: | We consider inference for predictive regressions with multiple predictors. Extant tests for predictability may perform unsatisfactorily and tend to discover spurious predictability as the number of predictors increases. We propose a battery of new instrumental-variables based tests which involve enforcement or partial enforcement of the null hypothesis in variance estimation. A test based on the few-predictors-at-a-time parsimonious system approach is recommended. Empirical Monte Carlo experiments demonstrate remarkable finite-sample performance regardless of the number of predictors and their persistence properties. An empirical application to equity premium predictability is provided. |
Keywords: | Uniform inference, impulse responses, local projections, persistence |
Date: | 2022–02 |
URL: | http://d.repec.org/n?u=RePEc:inu:caeprp:2022002&r= |
By: | H. Peter Boswijk; Giuseppe Cavaliere; Luca De Angelis; A. M. Robert Taylor |
Abstract: | Standard methods, such as sequential procedures based on Johansen's (pseudo-)likelihood ratio (PLR) test, for determining the co-integration rank of a vector autoregressive (VAR) system of variables integrated of order one can be significantly affected, even asymptotically, by unconditional heteroskedasticity (non-stationary volatility) in the data. Known solutions to this problem include wild bootstrap implementations of the PLR test or the use of an information criterion, such as the BIC, to select the co-integration rank. Although asymptotically valid in the presence of heteroskedasticity, these methods can display very low finite sample power under some patterns of non-stationary volatility. In particular, they do not exploit potential efficiency gains that could be realised in the presence of non-stationary volatility by using adaptive inference methods. Under the assumption of a known autoregressive lag length, Boswijk and Zu (2022) develop adaptive PLR test based methods using a non-parametric estimate of the covariance matrix process. It is well-known, however, that selecting an incorrect lag length can significantly impact the efficacy of both information criteria and bootstrap PLR tests to determine co-integration rank in finite samples. We show that adaptive information criteria-based approaches can be used to estimate the autoregressive lag order to use in connection with bootstrap adaptive PLR tests, or to jointly determine the co-integration rank and the VAR lag length, and that in both cases they are weakly consistent for these parameters in the presence of non-stationary volatility provided standard conditions hold on the penalty term. Monte Carlo simulations are used to demonstrate the potential gains from using adaptive methods, and an empirical application to the U.S. term structure is provided. |
Date: | 2022–02 |
URL: | http://d.repec.org/n?u=RePEc:arx:papers:2202.02532&r= |
By: | Juho Koistinen; Bernd Funovits |
Abstract: | We propose a new parametrization for the estimation and identification of the impulse-response functions (IRFs) of dynamic factor models (DFMs). The theoretical contribution of this paper concerns the problem of observational equivalence between different IRFs, which implies non-identification of the IRF parameters without further restrictions. We show how the minimal identification conditions proposed by Bai and Wang (2015) are nested in the proposed framework and can be further augmented with overidentifying restrictions leading to efficiency gains. The current standard practice for the IRF estimation of DFMs is based on principal components, compared to which the new parametrization is less restrictive and allows for modelling richer dynamics. As the empirical contribution of the paper, we develop an estimation method based on the EM algorithm, which incorporates the proposed identification restrictions. In the empirical application, we use a standard high-dimensional macroeconomic dataset to estimate the effects of a monetary policy shock. We estimate a strong reaction of the macroeconomic variables, while the benchmark models appear to give qualitatively counterintuitive results. The estimation methods are implemented in the accompanying R package. |
Date: | 2022–02 |
URL: | http://d.repec.org/n?u=RePEc:arx:papers:2202.00310&r= |
By: | Georg Keilbar; Juan M. Rodriguez-Poo; Alexandra Soberon; Weining Wang |
Abstract: | This paper presents a new approach to estimation and inference in panel data models with interactive fixed effects, where the unobserved factor loadings are allowed to be correlated with the regressors. A distinctive feature of the proposed approach is to assume a nonparametric specification for the factor loadings, which allows us to partial out the interactive effects using sieve basis functions and to estimate the slope parameters directly. The new estimator adopts the well-known partial least squares form, and its $\sqrt{NT}$-consistency and asymptotic normality are shown. The common factors are then estimated using principal component analysis (PCA), and the corresponding convergence rates are obtained. A Monte Carlo study indicates good performance in terms of mean squared error. We apply our methodology to analyze the determinants of growth rates in OECD countries. |
Date: | 2022–01 |
URL: | http://d.repec.org/n?u=RePEc:arx:papers:2201.11482&r= |
By: | Ayden Higgins; Koen Jochmans |
Abstract: | The maximum-likelihood estimator of nonlinear panel data models with fixed effects is consistent but asymptotically biased under rectangular-array asymptotics. The literature has thus far concentrated its effort on devising methods to correct the maximum-likelihood estimator for its bias as a means to salvage standard inferential procedures. Instead, we show that the parametric bootstrap replicates the distribution of the (uncorrected) maximum-likelihood estimator in large samples. This justifies the use of confidence sets constructed via standard bootstrap percentile methods. No adjustment for the presence of bias needs to be made. |
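A generic percentile-bootstrap skeleton of the procedure the abstract justifies; `simulate_panel` and `fit_mle` are user-supplied placeholders for the parametric model and its (uncorrected) maximum-likelihood estimator:

```python
import numpy as np

def parametric_bootstrap_ci(theta_hat, simulate_panel, fit_mle,
                            B=999, alpha=0.05, seed=0):
    """Percentile confidence interval from a parametric bootstrap:
    redraw panels from the fitted model and re-estimate on each draw."""
    rng = np.random.default_rng(seed)
    draws = np.array([fit_mle(simulate_panel(theta_hat, rng)) for _ in range(B)])
    lower, upper = np.quantile(draws, [alpha / 2, 1 - alpha / 2], axis=0)
    return lower, upper
```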
Date: | 2022–01 |
URL: | http://d.repec.org/n?u=RePEc:arx:papers:2201.11156&r= |
By: | AmirEmad Ghassami; Ilya Shpitser; Eric Tchetgen Tchetgen |
Abstract: | We consider the task of estimating the causal effect of a treatment variable on a long-term outcome variable using data from an observational domain and an experimental domain. The observational data is assumed to be confounded and hence without further assumptions, this dataset alone cannot be used for causal inference. Also, only a short-term version of the primary outcome variable of interest is observed in the experimental data, and hence, this dataset alone cannot be used for causal inference either. In a recent work, Athey et al. (2020) proposed a method for systematically combining such data for identifying the downstream causal effect in view. Their approach is based on the assumptions of internal and external validity of the experimental data, and an extra novel assumption called latent unconfoundedness. In this paper, we first review their proposed approach and discuss the latent unconfoundedness assumption. Then we propose two alternative approaches for data fusion for the purpose of estimating average treatment effect as well as the effect of treatment on the treated. Our first proposed approach is based on assuming equi-confounding bias for the short-term and long-term outcomes. Our second proposed approach is based on the proximal causal inference framework, in which we assume the existence of an extra variable in the system which is a proxy of the latent confounder of the treatment-outcome relation. |
Date: | 2022–01 |
URL: | http://d.repec.org/n?u=RePEc:arx:papers:2201.10743&r= |
By: | Bora Kim |
Abstract: | Empirical researchers are often interested not only in whether a treatment affects an outcome of interest, but also in how the treatment effect arises. Causal mediation analysis provides a formal framework to identify causal mechanisms through which a treatment affects an outcome. The most popular identification strategy relies on the so-called sequential ignorability (SI) assumption, which requires that there be no unobserved confounder lying in the causal paths between the treatment and the outcome. Despite its popularity, this assumption is deemed too strong in many settings because it excludes the existence of unobserved confounders. This limitation has inspired recent literature to consider an alternative identification strategy based on an instrumental variable (IV). This paper discusses the identification of causal mediation effects in a setting with a binary treatment and a binary instrumental variable, both of which are assumed to be random. We show that while IV methods allow for the possible existence of unobserved confounders, additional monotonicity assumptions are required unless a strong constant-effect assumption is imposed. Furthermore, even when such monotonicity assumptions are satisfied, IV estimands are not necessarily equivalent to the target parameters. |
Date: | 2022–01 |
URL: | http://d.repec.org/n?u=RePEc:arx:papers:2201.12752&r= |
By: | Gabriel Okasa |
Abstract: | Estimation of causal effects using machine learning methods has become an active research field in econometrics. In this paper, we study the finite sample performance of meta-learners for the estimation of heterogeneous treatment effects when sample-splitting and cross-fitting are used to reduce overfitting bias. In both synthetic and semi-synthetic simulations we find that the performance of the meta-learners in finite samples greatly depends on the estimation procedure. The results imply that sample-splitting and cross-fitting are beneficial in large samples for bias reduction and efficiency of the meta-learners, respectively, whereas full-sample estimation is preferable in small samples. Furthermore, we derive practical recommendations for the application of specific meta-learners in empirical studies depending on particular data characteristics such as treatment shares and sample size. |
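A minimal sketch of one cross-fitted meta-learner of the kind studied (a T-learner with random forests); the specific learner and fold count are illustrative choices, not the paper's full design:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import KFold

def cross_fitted_t_learner(X, y, d, n_splits=2, seed=0):
    """Cross-fitted T-learner: outcome models are fit on training folds
    and heterogeneous effects are predicted on the held-out fold."""
    cate = np.empty(len(y))
    for train, test in KFold(n_splits, shuffle=True, random_state=seed).split(X):
        m1 = RandomForestRegressor(random_state=seed).fit(
            X[train][d[train] == 1], y[train][d[train] == 1])
        m0 = RandomForestRegressor(random_state=seed).fit(
            X[train][d[train] == 0], y[train][d[train] == 0])
        cate[test] = m1.predict(X[test]) - m0.predict(X[test])
    return cate
```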
Date: | 2022–01 |
URL: | http://d.repec.org/n?u=RePEc:arx:papers:2201.12692&r= |
By: | Zheng, Bang Quan |
Abstract: | This paper assesses the performance of regularized generalized least squares (RGLS) and reweighted least squares (RLS) methodologies in a confirmatory factor analysis model. Normal theory maximum likelihood (ML) and GLS statistics are based on large sample statistical theory. However, violation of asymptotic sample size is ubiquitous in real applications of structural equation modeling (SEM), and ML and GLS goodness-of-fit tests in SEM often make incorrect decisions on the true model. The novel methods RGLS and RLS aim to correct the over-rejection by ML and under-rejection by GLS. Proposed by Arruda and Bentler (2017), RGLS replaces the GLS weight matrix with a regularized one. Rediscovered by Hayakawa (2019), RLS replaces this weight matrix with one derived from an ML function. Both of these methods outperform ML and GLS when samples are small, yet no studies have compared their relative performance. A confirmatory factor analysis Monte Carlo simulation study with normal and non-normal data was carried out to examine the statistical performance of these two methods at different sample sizes. Based on empirical rejection frequencies and empirical distributions of test statistics, we find that RLS and RGLS have equivalent performance when N≥70, whereas when N<70, RLS outperforms RGLS. Both methods clearly outperform ML and GLS with N≤400. Nonetheless, adopting the mean- and variance-adjusted test proposed by Hayakawa (2019) for non-normal data, our results show that RGLS slightly outperforms RLS. |
Date: | 2021–10–05 |
URL: | http://d.repec.org/n?u=RePEc:osf:socarx:aejgf&r= |
By: | Stephan Martin |
Abstract: | Nonparametric random coefficient (RC) density estimation has mostly been considered in the marginal density case under strict independence of RCs and covariates. This paper deals with the estimation of RC densities conditional on a (large-dimensional) set of control variables using machine learning techniques. The conditional RC density allows one to disentangle observable from unobservable heterogeneity in partial effects of continuous treatments, adding to a growing literature on heterogeneous effect estimation using machine learning. This paper proposes a two-stage sieve estimation procedure. First, a closed-form sieve approximation of the conditional RC density is derived in which each sieve coefficient can be expressed as a conditional expectation function varying with the controls. Second, the sieve coefficients are estimated with generic machine learning procedures under appropriate sample-splitting rules. The $L_2$-convergence rate of the conditional RC density estimator is derived. The rate is slower than typical rates of mean regression machine learning estimators, which is due to the ill-posedness of the RC density estimation problem. The performance and applicability of the estimator are illustrated using random forest algorithms over a range of Monte Carlo simulations and with real data from the SOEP-IS, where behavioral heterogeneity in an economic experiment on portfolio choice is studied. The method reveals two types of behavior in the population, one complying with economic theory and one not. The assignment to types appears largely based on unobservables not available in the data. |
Date: | 2022–01 |
URL: | http://d.repec.org/n?u=RePEc:arx:papers:2201.08366&r= |
By: | B. Cooper Boniece; Jos\'e E. Figueroa-L\'opez; Yuchen Han |
Abstract: | Statistical inference for stochastic processes based on high-frequency observations has been an active research area for more than a decade. One of the most well-known and widely studied problems is that of estimation of the quadratic variation of the continuous component of an It\^o semimartingale with jumps. Several rate- and variance-efficient estimators have been proposed in the literature when the jump component is of bounded variation. However, to date, very few methods can deal with jumps of unbounded variation. By developing new high-order expansions of the truncated moments of a L\'evy process, we construct a new rate- and variance-efficient estimator for a class of L\'evy processes of unbounded variation, whose small jumps behave like those of a stable L\'evy process with Blumenthal-Getoor index less than $8/5$. The proposed method is based on a two-step debiasing procedure for the truncated realized quadratic variation of the process. Our Monte Carlo experiments indicate that the method outperforms other efficient alternatives in the literature in the setting covered by our theoretical framework. |
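For context, a minimal sketch of the (un-debiased) truncated realized variance that the paper's two-step procedure builds on, with a bipower-based threshold of order Delta_n^varpi; the threshold constants are illustrative choices:

```python
import numpy as np

def truncated_realized_variance(log_prices, alpha=3.0, varpi=0.49):
    """Sum squared high-frequency log returns, discarding increments larger
    than a threshold of order Delta_n^varpi (Delta_n = 1/n on a unit interval)."""
    r = np.diff(np.asarray(log_prices))
    n = len(r)
    bpv = (np.pi / 2) * np.sum(np.abs(r[1:]) * np.abs(r[:-1]))  # bipower variation
    threshold = alpha * np.sqrt(bpv) * n ** (-varpi)
    return np.sum(r[np.abs(r) <= threshold] ** 2)
```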
Date: | 2022–02 |
URL: | http://d.repec.org/n?u=RePEc:arx:papers:2202.00877&r= |
By: | Benjamin Poignard; Manabu Asai |
Abstract: | Although multivariate stochastic volatility (MSV) models usually produce more accurate forecasts than MGARCH models, their estimation techniques, such as Bayesian MCMC, typically suffer from the curse of dimensionality. We propose a fast and efficient estimation approach for MSV based on a penalized OLS framework. Specifying the MSV model as a multivariate state-space model, we carry out a two-step penalized procedure. We provide the asymptotic properties of the two-step estimator and the oracle property of the first-step estimator when the number of parameters diverges. The performance of our method is illustrated through simulations and financial data. |
Date: | 2022–01 |
URL: | http://d.repec.org/n?u=RePEc:arx:papers:2201.08584&r= |
By: | Joshua Angrist |
Abstract: | The view that empirical strategies in economics should be transparent and credible now goes almost without saying. The local average treatment effects (LATE) framework for causal inference helped make this so. The LATE theorem tells us for whom particular instrumental variables (IV) and regression discontinuity estimates are valid. This lecture uses several empirical examples, mostly involving charter and exam schools, to highlight the value of LATE. A surprising exclusion restriction, an assumption central to the LATE interpretation of IV estimates, is shown to explain why enrollment at Chicago exam schools reduces student achievement. I also make two broader points: IV exclusion restrictions formalize commitment to clear and consistent explanations of reduced-form causal effects; compelling applications demonstrate the power of simple empirical strategies to generate new causal knowledge. |
JEL: | B23 I21 I28 J13 J22 |
Date: | 2022–02 |
URL: | http://d.repec.org/n?u=RePEc:nbr:nberwo:29726&r= |
By: | Roberto Casarin (University of Ca' Foscari of Venice); Stefano Grassi (University of Rome Tor Vergata); Francesco Ravazzolo (BI Norwegian Business School); Herman van Dijk (Erasmus University Rotterdam) |
Abstract: | A flexible predictive density combination model is introduced for large financial data sets which allows for dynamic weight learning and model set incompleteness. Dimension reduction procedures allocate the large sets of predictive densities and combination weights to relatively small sets. Given the representation of the probability model in extended nonlinear state-space form, efficient simulation-based Bayesian inference is proposed using parallel sequential clustering as well as nonlinear filtering, implemented on graphics processing units. The approach is applied to combine predictive densities based on a large number of individual stock returns of daily observations over a period that includes the Covid-19 crisis. Evidence on the quantification of predictive accuracy, uncertainty and risk, in particular in the tails, may provide useful information for investment fund management. Information on dynamic cluster composition, weight patterns and model set incompleteness also gives valuable signals for improved modelling and policy specification. |
Keywords: | Density Combination, Large Set of Predictive Densities, Dynamic Factor Models, Nonlinear state-space, Bayesian Inference |
JEL: | C11 C15 C53 E37 |
Date: | 2022–02–14 |
URL: | http://d.repec.org/n?u=RePEc:tin:wpaper:20220013&r= |
By: | Sergey Nadtochiy; Yuan Yin |
Abstract: | This paper presents a tractable sufficient condition for the consistency of maximum likelihood estimators (MLEs) in partially observed diffusion models, stated in terms of stationary distributions of the associated test processes, under the assumption that the set of unknown parameter values is finite. We illustrate the tractability of this sufficient condition by verifying it in the context of a latent price model of market microstructure. Finally, we describe an algorithm for computing MLEs in partially observed diffusion models and test it on historical data to estimate the parameters of the latent price model. |
Date: | 2022–01 |
URL: | http://d.repec.org/n?u=RePEc:arx:papers:2201.07656&r= |
By: | Wolf, Elias |
Abstract: | This paper proposes a Skewed Stochastic Volatility (SSV) model for time-varying, asymmetric forecast distributions to estimate Growth at Risk as introduced in Adrian, Boyarchenko, and Giannone's (2019) seminal paper "Vulnerable Growth". In contrast to their semi-parametric approach, the SSV model enables researchers to capture the evolution of the densities parametrically, to conduct statistical tests, and to compare different models. The SSV model forms a non-linear, non-Gaussian state space model that can be estimated using particle filtering and MCMC algorithms. To remedy drawbacks of standard bootstrap particle filters, I modify the Tempered Particle Filter of Herbst and Schorfheide (2019) to account for stochastic volatility and asymmetric measurement densities. Estimating the model on US data yields conditional forecast densities that closely resemble the findings of Adrian et al. (2019). Exploiting the advantages of the proposed model, I find that the estimated parameter values for the effect of financial conditions on the variance and skewness of the conditional distributions are statistically significant and in line with the intuition of the results found in the existing literature. |
Keywords: | Growth at Risk, Macro Finance, Bayesian Econometrics, Particle Filters |
JEL: | C10 E32 E58 G01 |
Date: | 2022 |
URL: | http://d.repec.org/n?u=RePEc:zbw:fubsbe:20222&r= |
By: | Isuru Ratnayake; V. A. Samaranayake |
Abstract: | This paper introduces a Threshold Asymmetric Conditional Autoregressive Range (TACARR) formulation for modeling the daily price ranges of financial assets. It is assumed that the process generating the conditional expected ranges at each time point switches between two regimes, labeled as upward market and downward market states. The disturbance term of the error process is also allowed to switch between two distributions depending on the regime. It is assumed that a self-adjusting threshold component driven by the past values of the time series determines the current market regime. The proposed model is able to capture aspects such as the asymmetric and heteroscedastic behavior of volatility in financial markets. It is an attempt to address several potential deficits found in existing price range models such as the Conditional Autoregressive Range (CARR), Asymmetric CARR (ACARR), Feedback ACARR (FACARR) and Threshold Autoregressive Range (TARR) models. Parameters of the model are estimated using the Maximum Likelihood (ML) method. A simulation study shows that the ML method performs well in estimating the TACARR model parameters. The empirical performance of the TACARR model was investigated using IBM index data, and the results show that the proposed model is a good alternative for in-sample prediction and out-of-sample forecasting of volatility. |
Keywords: | Volatility Modeling; Asymmetric Volatility; CARR Models; Regime Switching |
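A stylized two-regime CARR-type filter illustrating the recursion behind such models; the regime rule below is a simple placeholder, not the paper's self-adjusting threshold component:

```python
import numpy as np

def two_regime_carr_filter(R, params_up, params_down, window=20):
    """Conditional expected range: lambda_t = omega_s + alpha_s * R_{t-1} + beta_s * lambda_{t-1},
    with regime s switching on a placeholder moving-average rule."""
    R = np.asarray(R, dtype=float)
    lam = np.empty_like(R)
    lam[0] = R.mean()
    for t in range(1, len(R)):
        # placeholder regime rule: 'upward' if yesterday's range exceeds its recent mean
        regime_up = R[t - 1] >= R[max(0, t - window):t].mean()
        omega, alpha, beta = params_up if regime_up else params_down
        lam[t] = omega + alpha * R[t - 1] + beta * lam[t - 1]
    return lam
```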
Date: | 2022–02 |
URL: | http://d.repec.org/n?u=RePEc:arx:papers:2202.03351&r= |
By: | Meng-Chen Hsieh; Clifford Hurvich; Philippe Soulier |
Abstract: | We develop and justify methodology to consistently test for long-horizon return predictability based on realized variance. To accomplish this, we propose a parametric transaction-level model for the continuous-time log price process based on a pure jump point process. The model determines the returns and realized variance at any level of aggregation with properties shown to be consistent with the stylized facts in the empirical finance literature. Under our model, the long-memory parameter propagates unchanged from the transaction-level drift to the calendar-time returns and the realized variance, leading endogenously to a balanced predictive regression equation. We propose an asymptotic framework using power-law aggregation in the predictive regression. Within this framework, we propose a hypothesis test for long horizon return predictability which is asymptotically correctly sized and consistent. |
Date: | 2022–02 |
URL: | http://d.repec.org/n?u=RePEc:arx:papers:2202.00793&r= |
By: | Caio Almeida (Princeton University); Paul Schneider (University of Lugano) |
Abstract: | We develop a non-negative polynomial minimum-norm likelihood ratio (PLR) of two distributions of which only moments are known under shape restrictions. The PLR converges to the true, unknown, likelihood ratio. We show consistency, obtain the asymptotic distribution for the PLR coefficients estimated with sample moments, and present two applications. The first develops a PLR for the unknown transition density of a jump-diffusion process. The second modifies the Hansen-Jagannathan pricing kernel framework to accommodate polynomial return models consistent with no-arbitrage while simultaneously nesting the linear return model. |
Keywords: | Likelihood ratio, positive polynomial, Reproducing Kernel Hilbert Space (RKHS) |
JEL: | C13 C51 C61 |
Date: | 2021–10 |
URL: | http://d.repec.org/n?u=RePEc:pri:econom:2021-45&r= |
By: | Christian A. Scholbeck; Giuseppe Casalicchio; Christoph Molnar; Bernd Bischl; Christian Heumann |
Abstract: | Beta coefficients for linear regression models represent the ideal form of an interpretable feature effect. However, for non-linear models and especially generalized linear models, the estimated coefficients cannot be interpreted as a direct feature effect on the predicted outcome. Hence, marginal effects are typically used as approximations for feature effects, either in the shape of derivatives of the prediction function or forward differences in prediction due to a change in a feature value. While marginal effects are commonly used in many scientific fields, they have not yet been adopted as a model-agnostic interpretation method for machine learning models. This may stem from their inflexibility as a univariate feature effect and their inability to deal with the non-linearities found in black box models. We introduce a new class of marginal effects termed forward marginal effects. We argue to abandon derivatives in favor of better-interpretable forward differences. Furthermore, we generalize marginal effects based on forward differences to multivariate changes in feature values. To account for the non-linearity of prediction functions, we introduce a non-linearity measure for marginal effects. We argue against summarizing feature effects of a non-linear prediction function in a single metric such as the average marginal effect. Instead, we propose to partition the feature space to compute conditional average marginal effects on feature subspaces, which serve as conditional feature effect estimates. |
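A minimal, model-agnostic sketch of a forward marginal effect as described (a forward difference in predictions after a possibly multivariate step in feature values); names and the step specification are illustrative:

```python
import numpy as np

def forward_marginal_effects(predict, X, steps):
    """Observation-wise forward marginal effects f(x + h) - f(x), where
    `steps` maps feature column indices to forward step sizes h."""
    X = np.asarray(X, dtype=float)
    X_shifted = X.copy()
    for col, h in steps.items():
        X_shifted[:, col] += h
    return predict(X_shifted) - predict(X)

# usage with any fitted model exposing .predict, e.g. a random forest:
# fme = forward_marginal_effects(model.predict, X, steps={0: 1.0, 2: -0.5})
```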
Date: | 2022–01 |
URL: | http://d.repec.org/n?u=RePEc:arx:papers:2201.08837&r= |
By: | Federico Bassetti; Giulia Carallo; Roberto Casarin |
Abstract: | A new integer-valued autoregressive (INAR) process with Generalised Lagrangian Katz (GLK) innovations is defined. We show that our GLK-INAR process is stationary, discrete semi-self-decomposable, and infinitely divisible, and that it provides a flexible modelling framework for count data, allowing for under- and over-dispersion, asymmetry, and excess kurtosis. A Bayesian inference framework and an efficient posterior approximation procedure based on Markov chain Monte Carlo are provided. The proposed model family is applied to a Google Trends dataset which proxies public concern about climate change around the world. The empirical results provide new evidence of heterogeneity across countries and keywords in the persistence, uncertainty, and long-run public awareness level. |
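A minimal simulation sketch of an INAR(1) process with binomial thinning; the innovation law below is a stand-in (negative binomial, allowing over-dispersion) rather than the Generalised Lagrangian Katz distribution of the paper:

```python
import numpy as np

def simulate_inar1(n, alpha, innovation_sampler, x0=0, seed=0):
    """INAR(1): X_t = alpha o X_{t-1} + eps_t, where 'o' is binomial thinning
    and eps_t are i.i.d. count innovations drawn by `innovation_sampler`."""
    rng = np.random.default_rng(seed)
    x = np.empty(n, dtype=int)
    x[0] = x0
    for t in range(1, n):
        survivors = rng.binomial(x[t - 1], alpha)   # binomial thinning
        x[t] = survivors + innovation_sampler(rng)  # count innovation
    return x

# placeholder innovations: negative binomial counts
path = simulate_inar1(500, alpha=0.6,
                      innovation_sampler=lambda g: g.negative_binomial(2, 0.5))
```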
Date: | 2022–02 |
URL: | http://d.repec.org/n?u=RePEc:arx:papers:2202.02029&r= |
By: | Jose Blanchet; Fernando Hernandez; Viet Anh Nguyen; Markus Pelger; Xuhui Zhang |
Abstract: | Missing time-series data is a prevalent problem in finance. Imputation methods for time-series data are usually applied to the full panel data with the purpose of training a model for a downstream out-of-sample task. For example, the imputation of missing returns may be applied prior to estimating a portfolio optimization model. However, this practice can result in look-ahead bias in the future performance of the downstream task. There is an inherent trade-off between the look-ahead bias of using the full data set for imputation and the larger variance in the imputation from using only the training data. By connecting layers of information revealed in time, we propose a Bayesian consensus posterior that fuses an arbitrary number of posteriors to optimally control the variance and look-ahead-bias trade-off in the imputation. We derive tractable two-step optimization procedures for finding the optimal consensus posterior, with Kullback-Leibler divergence and Wasserstein distance as the measures of dissimilarity between posterior distributions. We demonstrate in simulations and an empirical study the benefit of our imputation mechanism for portfolio optimization with missing returns. |
Date: | 2022–02 |
URL: | http://d.repec.org/n?u=RePEc:arx:papers:2202.00871&r= |
By: | Stefanos Bennett; Mihai Cucuringu; Gesine Reinert |
Abstract: | In multivariate time series systems, it has been observed that certain groups of variables partially lead the evolution of the system, while other variables follow this evolution with a time delay; the result is a lead-lag structure amongst the time series variables. In this paper, we propose a method for the detection of lead-lag clusters of time series in multivariate systems. We demonstrate that the web of pairwise lead-lag relationships between time series can be helpfully construed as a directed network, for which there exist suitable algorithms for the detection of pairs of lead-lag clusters with high pairwise imbalance. Within our framework, we consider a number of choices for the pairwise lead-lag metric and directed network clustering components. Our framework is validated on both a synthetic generative model for multivariate lead-lag time series systems and daily real-world US equity prices data. We showcase that our method is able to detect statistically significant lead-lag clusters in the US equity market. We study the nature of these clusters in the context of the empirical finance literature on lead-lag relations and demonstrate how these can be used for the construction of predictive financial signals. |
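One possible pairwise lead-lag score of the kind this framework allows (lagged cross-correlations aggregated into a directed adjacency matrix); the paper considers several alternative metrics and clustering components, so this is only an illustrative choice:

```python
import numpy as np

def lead_lag_adjacency(returns, max_lag=5):
    """Directed adjacency matrix: entry (i, j) is positive when series i
    tends to lead series j over lags 1..max_lag."""
    returns = np.asarray(returns, dtype=float)
    _, N = returns.shape
    A = np.zeros((N, N))
    for i in range(N):
        for j in range(N):
            if i == j:
                continue
            score = 0.0
            for lag in range(1, max_lag + 1):
                score += abs(np.corrcoef(returns[:-lag, i], returns[lag:, j])[0, 1])
                score -= abs(np.corrcoef(returns[:-lag, j], returns[lag:, i])[0, 1])
            A[i, j] = max(score, 0.0)  # keep only the 'i leads j' direction
    return A
```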
Date: | 2022–01 |
URL: | http://d.repec.org/n?u=RePEc:arx:papers:2201.08283&r= |
By: | Rafael R. S. Guimaraes |
Abstract: | Limited datasets and complex nonlinear relationships are among the challenges that may emerge when applying econometrics to macroeconomic problems. This research proposes deep learning as an approach to transfer learning in the former case and to map relationships between variables in the latter case. Macroeconomists already apply transfer learning when they assume a given a priori distribution in a Bayesian context, estimate a structural VAR with sign restrictions, or calibrate parameters based on results observed in other models, to name a few examples; advancing a more systematic transfer learning strategy in applied macroeconomics is the innovation we introduce. We explore the proposed strategy empirically, showing that data from different but related domains, a type of transfer learning, help identify business cycle phases when there is no business cycle dating committee and quickly estimate an economics-based output gap. Next, since deep learning methods are a way of learning representations, formed by the composition of multiple non-linear transformations that yield more abstract representations, we apply deep learning to map low-frequency variables from high-frequency variables. The results obtained show the suitability of deep learning models applied to macroeconomic problems. First, the models learned to classify United States business cycles correctly. Then, applying transfer learning, they were able to identify the business cycles of out-of-sample Brazilian and European data. Along the same lines, the models learned to estimate the output gap based on U.S. data and obtained good performance when faced with Brazilian data. Additionally, deep learning proved adequate for mapping low-frequency variables from high-frequency data to interpolate, distribute, and extrapolate time series using related series. |
Date: | 2022–01 |
URL: | http://d.repec.org/n?u=RePEc:arx:papers:2201.13380&r= |
By: | Driver, Charles C |
Abstract: | The interpretation of cross-effects from vector autoregressive models to infer structure and causality amongst constructs is widespread and sometimes problematic. I first explain how hypothesis testing and regularization are invalidated when processes that are thought to fluctuate continuously in time are, as is typically done, modeled as changing only in discrete steps. I then describe an alternative interpretation of cross-effect parameters that incorporates correlated random changes for a potentially more realistic view of how processes are temporally coupled. Using an example based on wellbeing data, I demonstrate how some classical concerns, such as sign flipping and counterintuitive effect directions, can disappear when using this combined deterministic/stochastic interpretation. Models that treat processes as continuously interacting offer both a resolution to the hypothesis testing problem and the possibility of the combined stochastic/deterministic interpretation. |
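A minimal illustration of the discrete-time versus continuous-time distinction the abstract raises: mapping a discrete-step VAR(1) coefficient matrix to the implied continuous-time drift via the matrix logarithm (standard linear-SDE algebra, not the author's full model; the example matrix is illustrative):

```python
import numpy as np
from scipy.linalg import expm, logm

def implied_continuous_drift(Phi, dt=1.0):
    """If x(t) follows dx = A x dt + noise, discretely sampled data obey
    x_{t+dt} = expm(A * dt) x_t + error, so A = logm(Phi) / dt."""
    return np.real(logm(np.asarray(Phi, dtype=float))) / dt

Phi = np.array([[0.9, 0.2],
                [0.0, 0.8]])          # discrete-time cross-effect of 0.2
A = implied_continuous_drift(Phi)     # continuous-time coupling differs in magnitude
Phi_back = expm(A)                    # recovers Phi up to numerical error
```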
Date: | 2022–01–14 |
URL: | http://d.repec.org/n?u=RePEc:osf:osfxxx:xdf72&r= |
By: | Carlo Drago; Andrea Scozzari |
Abstract: | Modeling and forecasting of dynamically varying covariances have received much attention in the literature. The two most widely used conditional covariance and correlation models are BEKK and DCC. In this paper, we advance a new method to introduce targeting in both models to estimate matrices associated with financial time series. Our approach is based on specific groups of highly correlated assets in a financial market, with these relationships remaining unaltered over time. Based on the estimated parameters, we evaluate our targeting method on simulated series using two well-known loss functions introduced in the literature and network analysis. We find all the maximal cliques in correlation graphs to evaluate the effectiveness of our method. Results from an empirical case study are encouraging, mainly when the number of assets is not large. |
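A minimal sketch of the clique-based step mentioned in the abstract: build a graph linking assets whose correlation exceeds a threshold and enumerate its maximal cliques; the threshold value is an illustrative assumption:

```python
import numpy as np
import networkx as nx

def maximal_correlation_cliques(corr, threshold=0.7):
    """Return the maximal cliques of the graph whose edges connect asset
    pairs with |correlation| at or above the threshold."""
    corr = np.asarray(corr)
    n = corr.shape[0]
    G = nx.Graph()
    G.add_nodes_from(range(n))
    for i in range(n):
        for j in range(i + 1, n):
            if abs(corr[i, j]) >= threshold:
                G.add_edge(i, j)
    return list(nx.find_cliques(G))  # maximal cliques
```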
Date: | 2022–02 |
URL: | http://d.repec.org/n?u=RePEc:arx:papers:2202.02197&r= |
By: | Verhagen, Mark D. |
Abstract: | 'All models are wrong, but some are useful' is an often-used mantra, particularly when a model's ability to capture the full complexities of social life is questioned. However, an appropriate functional form is key to valid statistical inference, and under-estimating complexity can lead to biased results. Unfortunately, it is unclear a priori what the appropriate complexity of a functional form should be. I propose to use methods from machine learning to identify the appropriate complexity of the functional form by i) generating an estimate of the fit potential of the outcome given a set of explanatory variables, ii) comparing this potential with the fit of the functional form originally hypothesized by the researcher, and iii) in case a lack of fit is identified, using recent advances in the field of explainable AI to generate understanding of the missing complexity. I illustrate the approach with a range of simulation and real-world examples. |
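A minimal sketch of steps i) and ii): compare the out-of-sample fit of the hypothesized linear form with that of a flexible benchmark learner, where a large gap flags missing complexity; the learner and scoring choices are illustrative assumptions:

```python
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

def fit_potential_gap(X, y, cv=5):
    """Cross-validated R^2 of the hypothesized linear form versus a flexible
    benchmark; the difference estimates the unexploited fit potential."""
    r2_linear = cross_val_score(LinearRegression(), X, y, cv=cv, scoring="r2").mean()
    r2_flexible = cross_val_score(
        RandomForestRegressor(n_estimators=200, random_state=0),
        X, y, cv=cv, scoring="r2").mean()
    return r2_linear, r2_flexible, r2_flexible - r2_linear
```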
Date: | 2021–12–01 |
URL: | http://d.repec.org/n?u=RePEc:osf:socarx:bka76&r= |