
on Econometrics 
By:  Liang Jiang; Oliver B. Linton; Haihan Tang; Yichong Zhang 
Abstract:  We study regression adjustments with additional covariates in randomized experiments under covariate-adaptive randomizations (CARs) when subject compliance is imperfect. We develop a regression-adjusted local average treatment effect (LATE) estimator that is proven to improve efficiency in the estimation of LATEs under CARs. Our adjustments can be parametric in linear and nonlinear forms, nonparametric, and high-dimensional. Even when the adjustments are misspecified, our proposed estimator is still consistent and asymptotically normal, and the corresponding inference method still achieves the exact asymptotic size under the null. When the adjustments are correctly specified, our estimator achieves the minimum asymptotic variance. When the adjustments are parametrically misspecified, we construct a new estimator which is weakly more efficient than the linearly and nonlinearly adjusted estimators, as well as the one without any adjustments. Simulation evidence and an empirical application confirm the efficiency gains achieved by regression adjustments relative to both the estimator without adjustment and the standard two-stage least squares estimator. 
Date:  2022–01 
URL:  http://d.repec.org/n?u=RePEc:arx:papers:2201.13004&r= 
By:  Jad Beyhum; Samuele Centorrino; Jean-Pierre Florens; Ingrid Van Keilegom 
Abstract:  This paper considers identification and estimation of the causal effect of the time Z until a subject is treated on a survival outcome T. The treatment is not randomly assigned, T is randomly right censored by a random variable C, and the time to treatment Z is right censored by min(T,C). The endogeneity issue is addressed using an instrumental variable that explains Z and is independent of the error term of the model. We study identification in a fully nonparametric framework. We show that our specification generates an integral equation, of which the regression function of interest is a solution. We provide identification conditions that rely on this equation. For estimation purposes, we assume that the regression function follows a parametric model. We propose an estimation procedure and give conditions under which the estimator is asymptotically normal. The estimator exhibits good finite-sample properties in simulations. Our methodology is applied to find evidence supporting the efficacy of a therapy for burnout. 
Date:  2022–01 
URL:  http://d.repec.org/n?u=RePEc:arx:papers:2201.10826&r= 
By:  Harold D Chiang; Bruce E Hansen; Yuya Sasaki 
Abstract:  We propose improved standard errors and an asymptotic distribution theory for two-way clustered panels. Our proposed estimator and theory allow for arbitrary serial dependence in the common time effects, which is excluded by existing two-way methods, including the popular two-way cluster standard errors of Cameron, Gelbach, and Miller (2011) and the cluster bootstrap of Menzel (2021). Our asymptotic distribution theory is the first to allow for this level of interdependence among the observations. Under weak regularity conditions, we demonstrate that the least squares estimator is asymptotically normal, our proposed variance estimator is consistent, and t-ratios are asymptotically standard normal, permitting conventional inference. We present simulation evidence that confidence intervals constructed with our proposed standard errors attain superior coverage relative to existing methods. We illustrate the relevance of the proposed method in an empirical application to a standard Fama-French three-factor regression. 
Date:  2022–01 
URL:  http://d.repec.org/n?u=RePEc:arx:papers:2201.11304&r= 
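The two-way formula this paper builds on can be sketched numerically. Below is a minimal illustration of the Cameron-Gelbach-Miller decomposition for the variance of a simple sample mean; it is not the authors' refined estimator (which additionally permits serial dependence in the common time effects), and all function names are ours.

```python
from collections import defaultdict

def cluster_var_mean(values, cluster_ids):
    """One-way cluster variance of the sample mean: square the within-cluster
    sums of demeaned values, then divide by n^2."""
    n = len(values)
    mean = sum(values) / n
    sums = defaultdict(float)
    for v, g in zip(values, cluster_ids):
        sums[g] += v - mean
    return sum(s * s for s in sums.values()) / (n * n)

def twoway_cluster_var_mean(values, rows, cols):
    """Cameron-Gelbach-Miller combination: V = V_rows + V_cols - V_cells,
    where the last term clusters on the row-column intersection."""
    cells = list(zip(rows, cols))
    return (cluster_var_mean(values, rows)
            + cluster_var_mean(values, cols)
            - cluster_var_mean(values, cells))

v = twoway_cluster_var_mean([1.0, 2.0, 3.0, 4.0], rows=[0, 0, 1, 1], cols=[0, 1, 0, 1])
```

The intersection term prevents double-counting of observations that share both a row and a column cluster.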
By:  Alexander Georges Gretener; Matthias Neuenkirch; Dennis Umlandt 
Abstract:  We propose a novel dynamic mixture vector autoregressive (VAR) model in which time-varying mixture weights are driven by the predictive likelihood score. Intuitively, the weight of the k-th component VAR model in the subsequent period is increased if the current observation is more likely to have been drawn from this particular state. The model is not limited to a specific distributional assumption and allows for straightforward likelihood-based estimation and inference. We conduct a Monte Carlo study and find that the score-driven mixture VAR model adequately filters the mixture dynamics from a variety of data generating processes with which most other observation-driven dynamic mixture VAR models cannot appropriately cope. Finally, we illustrate our approach with an application in which we model the conditional joint distribution of economic and financial conditions and derive generalized impulse responses. 
Keywords:  Dynamic Mixture Models; Generalized Autoregressive Score Models; Macro-Financial Linkages; Nonlinear VAR 
JEL:  C32 C34 G17 
Date:  2022 
URL:  http://d.repec.org/n?u=RePEc:trr:wpaper:202202&r= 
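The intuition in the abstract above, that a component's weight rises when the current observation is more likely under that component, can be sketched with a simple multiplicative update. This is a stylized illustration with Gaussian component densities, not the authors' exact score-driven recursion.

```python
import math

def normal_pdf(y, mu, sigma):
    """Univariate Gaussian density, used here as a stand-in component likelihood."""
    z = (y - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def update_weights(weights, likelihoods):
    """Raise the weight of components under which the observation is likely,
    then renormalize so the weights sum to one."""
    unnorm = [w * l for w, l in zip(weights, likelihoods)]
    total = sum(unnorm)
    return [u / total for u in unnorm]

# Two states with means 0 and 2; observing y = 2.0 shifts weight to state 2.
y = 2.0
w = update_weights([0.5, 0.5], [normal_pdf(y, 0.0, 1.0), normal_pdf(y, 2.0, 1.0)])
```

Iterating this update over time produces weight paths that track which regime currently fits the data best.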
By:  Carolina Caetano; Brantly Callaway; Stroud Payne; Hugo Sant'Anna Rodrigues 
Abstract:  This paper considers identification and estimation of causal effect parameters from participating in a binary treatment in a difference-in-differences (DID) setup when the parallel trends assumption holds after conditioning on observed covariates. Relative to existing work in the econometrics literature, we consider the case where the value of covariates can change over time and, potentially, where participating in the treatment can affect the covariates themselves. We propose new empirical strategies in both cases. We also consider two-way fixed effects (TWFE) regressions that include time-varying regressors, which is the most common way that DID identification strategies are implemented under conditional parallel trends. We show that, even in the case with only two time periods, these TWFE regressions are not generally robust to (i) time-varying covariates being affected by the treatment, (ii) treatment effects and/or paths of untreated potential outcomes depending on the level of time-varying covariates in addition to their change over time, (iii) treatment effects and/or paths of untreated potential outcomes depending on time-invariant covariates, (iv) treatment effect heterogeneity with respect to observed covariates, and (v) violations of strong functional form assumptions, both for outcomes over time and for the propensity score, that are unlikely to be plausible in most DID applications. Thus, TWFE regressions can deliver misleading estimates of causal effect parameters in a number of empirically relevant cases. We propose both doubly robust estimands and regression adjustment/imputation strategies that are robust to these issues while not being substantially more challenging to implement. 
Date:  2022–02 
URL:  http://d.repec.org/n?u=RePEc:arx:papers:2202.02903&r= 
By:  Ke-Li Xu (Indiana University Bloomington) 
Abstract:  We consider inference for predictive regressions with multiple predictors. Extant tests for predictability may perform unsatisfactorily and tend to discover spurious predictability as the number of predictors increases. We propose a battery of new instrumental-variables-based tests which involve enforcement or partial enforcement of the null hypothesis in variance estimation. A test based on the few-predictors-at-a-time parsimonious system approach is recommended. Monte Carlo experiments demonstrate the tests' remarkable finite-sample performance regardless of the number of predictors and their persistence properties. An empirical application to equity premium predictability is provided. 
Keywords:  Uniform inference, impulse responses, local projections, persistence 
Date:  2022–02 
URL:  http://d.repec.org/n?u=RePEc:inu:caeprp:2022002&r= 
By:  H. Peter Boswijk; Giuseppe Cavaliere; Luca De Angelis; A. M. Robert Taylor 
Abstract:  Standard methods, such as sequential procedures based on Johansen's (pseudo-)likelihood ratio (PLR) test, for determining the cointegration rank of a vector autoregressive (VAR) system of variables integrated of order one can be significantly affected, even asymptotically, by unconditional heteroskedasticity (nonstationary volatility) in the data. Known solutions to this problem include wild bootstrap implementations of the PLR test or the use of an information criterion, such as the BIC, to select the cointegration rank. Although asymptotically valid in the presence of heteroskedasticity, these methods can display very low finite-sample power under some patterns of nonstationary volatility. In particular, they do not exploit potential efficiency gains that could be realised in the presence of nonstationary volatility by using adaptive inference methods. Under the assumption of a known autoregressive lag length, Boswijk and Zu (2022) develop adaptive PLR-test-based methods using a nonparametric estimate of the covariance matrix process. It is well known, however, that selecting an incorrect lag length can significantly impact the efficacy of both information criteria and bootstrap PLR tests in determining the cointegration rank in finite samples. We show that adaptive information-criteria-based approaches can be used to estimate the autoregressive lag order for use in connection with bootstrap adaptive PLR tests, or to jointly determine the cointegration rank and the VAR lag length, and that in both cases they are weakly consistent for these parameters in the presence of nonstationary volatility, provided standard conditions hold on the penalty term. Monte Carlo simulations demonstrate the potential gains from using adaptive methods, and an empirical application to the U.S. term structure is provided. 
Date:  2022–02 
URL:  http://d.repec.org/n?u=RePEc:arx:papers:2202.02532&r= 
By:  Juho Koistinen; Bernd Funovits 
Abstract:  We propose a new parametrization for the estimation and identification of the impulse-response functions (IRFs) of dynamic factor models (DFMs). The theoretical contribution of this paper concerns the problem of observational equivalence between different IRFs, which implies non-identification of the IRF parameters without further restrictions. We show how the minimal identification conditions proposed by Bai and Wang (2015) are nested in the proposed framework and can be further augmented with over-identifying restrictions leading to efficiency gains. The current standard practice for IRF estimation in DFMs is based on principal components, compared to which the new parametrization is less restrictive and allows for modelling richer dynamics. As the empirical contribution of the paper, we develop an estimation method based on the EM algorithm, which incorporates the proposed identification restrictions. In the empirical application, we use a standard high-dimensional macroeconomic dataset to estimate the effects of a monetary policy shock. We estimate a strong reaction of the macroeconomic variables, while the benchmark models appear to give qualitatively counterintuitive results. The estimation methods are implemented in the accompanying R package. 
Date:  2022–02 
URL:  http://d.repec.org/n?u=RePEc:arx:papers:2202.00310&r= 
By:  Georg Keilbar; Juan M. Rodriguez-Poo; Alexandra Soberon; Weining Wang 
Abstract:  This paper presents a new approach to estimation and inference in panel data models with interactive fixed effects, where the unobserved factor loadings are allowed to be correlated with the regressors. A distinctive feature of the proposed approach is to assume a nonparametric specification for the factor loadings, which allows us to partial out the interactive effects using sieve basis functions and to estimate the slope parameters directly. The new estimator adopts the well-known partial least squares form, and its $\sqrt{NT}$-consistency and asymptotic normality are shown. The common factors are then estimated using principal component analysis (PCA), and the corresponding convergence rates are obtained. A Monte Carlo study indicates good performance in terms of mean squared error. We apply our methodology to analyze the determinants of growth rates in OECD countries. 
Date:  2022–01 
URL:  http://d.repec.org/n?u=RePEc:arx:papers:2201.11482&r= 
By:  Ayden Higgins; Koen Jochmans 
Abstract:  The maximum-likelihood estimator of nonlinear panel data models with fixed effects is consistent but asymptotically biased under rectangular-array asymptotics. The literature has thus far concentrated its effort on devising methods to correct the maximum-likelihood estimator for its bias as a means to salvage standard inferential procedures. Instead, we show that the parametric bootstrap replicates the distribution of the (uncorrected) maximum-likelihood estimator in large samples. This justifies the use of confidence sets constructed via standard bootstrap percentile methods. No adjustment for the presence of bias needs to be made. 
Date:  2022–01 
URL:  http://d.repec.org/n?u=RePEc:arx:papers:2201.11156&r= 
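A percentile confidence set of the kind the paper justifies is formed directly from bootstrap replicates of the estimator. A minimal sketch (the index convention below is one common choice among several):

```python
def percentile_ci(boot_estimates, alpha=0.05):
    """Equal-tailed percentile interval from bootstrap replicates:
    take the alpha/2 and 1 - alpha/2 empirical quantiles of the replicates."""
    s = sorted(boot_estimates)
    n = len(s)
    lo = s[int(n * (alpha / 2.0))]
    hi = s[int(n * (1.0 - alpha / 2.0)) - 1]
    return lo, hi

# 100 hypothetical bootstrap replicates of some estimator, here just 0..99:
ci = percentile_ci([float(b) for b in range(100)])
```

The paper's point is that, for the parametric bootstrap in this setting, such an interval is valid without any prior bias correction of the point estimator.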
By:  AmirEmad Ghassami; Ilya Shpitser; Eric Tchetgen Tchetgen 
Abstract:  We consider the task of estimating the causal effect of a treatment variable on a long-term outcome variable using data from an observational domain and an experimental domain. The observational data are assumed to be confounded, and hence without further assumptions this dataset alone cannot be used for causal inference. Moreover, only a short-term version of the primary outcome variable of interest is observed in the experimental data, so this dataset alone cannot be used for causal inference either. In recent work, Athey et al. (2020) proposed a method for systematically combining such data to identify the downstream causal effect in view. Their approach is based on the assumptions of internal and external validity of the experimental data, and an extra novel assumption called latent unconfoundedness. In this paper, we first review their proposed approach and discuss the latent unconfoundedness assumption. We then propose two alternative approaches for data fusion for the purpose of estimating the average treatment effect as well as the effect of treatment on the treated. Our first proposed approach is based on assuming equi-confounding bias for the short-term and long-term outcomes. Our second proposed approach is based on the proximal causal inference framework, in which we assume the existence of an extra variable in the system which is a proxy for the latent confounder of the treatment-outcome relation. 
Date:  2022–01 
URL:  http://d.repec.org/n?u=RePEc:arx:papers:2201.10743&r= 
By:  Bora Kim 
Abstract:  Empirical researchers are often interested not only in whether a treatment affects an outcome of interest, but also in how the treatment effect arises. Causal mediation analysis provides a formal framework to identify causal mechanisms through which a treatment affects an outcome. The most popular identification strategy relies on the so-called sequential ignorability (SI) assumption, which requires that there be no unobserved confounder lying in the causal paths between the treatment and the outcome. Despite its popularity, this assumption is deemed too strong in many settings, as it excludes the existence of unobserved confounders. This limitation has inspired recent literature to consider an alternative identification strategy based on an instrumental variable (IV). This paper discusses the identification of causal mediation effects in a setting with a binary treatment and a binary instrumental variable, both of which are assumed to be random. We show that while IV methods allow for the possible existence of unobserved confounders, additional monotonicity assumptions are required unless strong constant-effect assumptions are imposed. Furthermore, even when such monotonicity assumptions are satisfied, IV estimands are not necessarily equivalent to the target parameters. 
Date:  2022–01 
URL:  http://d.repec.org/n?u=RePEc:arx:papers:2201.12752&r= 
By:  Gabriel Okasa 
Abstract:  Estimation of causal effects using machine learning methods has become an active research field in econometrics. In this paper, we study the finite-sample performance of meta-learners for estimation of heterogeneous treatment effects when sample-splitting and cross-fitting are used to reduce the overfitting bias. In both synthetic and semi-synthetic simulations, we find that the performance of the meta-learners in finite samples greatly depends on the estimation procedure. The results imply that sample-splitting and cross-fitting are beneficial in large samples for bias reduction and efficiency of the meta-learners, respectively, whereas full-sample estimation is preferable in small samples. Furthermore, we derive practical recommendations for the application of specific meta-learners in empirical studies depending on particular data characteristics such as treatment shares and sample size. 
Date:  2022–01 
URL:  http://d.repec.org/n?u=RePEc:arx:papers:2201.12692&r= 
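Cross-fitting as studied in this paper can be sketched in a few lines: outcome models are fit on all folds but one and the implied effect is recorded per fold, then the fold estimates are averaged. This toy version uses group means as the "learners" purely for illustration; any supervised learner could take their place.

```python
from statistics import mean

def cross_fit_ate(data, n_folds=2):
    """data: list of (outcome, treatment) pairs. For each fold, fit the
    treated/control outcome models on the remaining folds and record the
    implied effect; average the per-fold estimates."""
    folds = [data[i::n_folds] for i in range(n_folds)]
    estimates = []
    for k in range(n_folds):
        train = [obs for j, f in enumerate(folds) if j != k for obs in f]
        mu1 = mean(y for y, d in train if d == 1)   # treated outcome model
        mu0 = mean(y for y, d in train if d == 0)   # control outcome model
        estimates.append(mu1 - mu0)
    return sum(estimates) / n_folds

data = [(2.0, 1), (3.0, 1), (0.0, 0), (1.0, 0), (2.5, 1), (0.5, 0)]
ate = cross_fit_ate(data)
```

The paper's finding is about when this fold structure pays off: it helps with bias and efficiency in large samples, while full-sample fitting can be preferable when data are scarce.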
By:  Zheng, Bang Quan 
Abstract:  This paper assesses the performance of regularized generalized least squares (RGLS) and reweighted least squares (RLS) methodologies in a confirmatory factor analysis model. Normal-theory maximum likelihood (ML) and GLS statistics are based on large-sample statistical theory. However, violation of asymptotic sample size requirements is ubiquitous in real applications of structural equation modeling (SEM), and ML and GLS goodness-of-fit tests in SEM often make incorrect decisions about the true model. The novel methods RGLS and RLS aim to correct the over-rejection by ML and the under-rejection by GLS. Proposed by Arruda and Bentler (2017), RGLS replaces the GLS weight matrix with a regularized one. Rediscovered by Hayakawa (2019), RLS replaces this weight matrix with one that derives from the ML function. Both of these methods outperform ML and GLS when samples are small, yet no studies have compared their relative performance. A confirmatory factor analysis Monte Carlo simulation study with normal and non-normal data was carried out to examine the statistical performance of these two methods at different sample sizes. Based on empirical rejection frequencies and empirical distributions of test statistics, we find that RLS and RGLS have equivalent performance when N≥70, whereas when N<70, RLS outperforms RGLS. Both methods clearly outperform ML and GLS with N≤400. Nonetheless, when adopting the mean-and-variance-adjusted test proposed by Hayakawa (2019) for non-normal data, our results show that RGLS slightly outperforms RLS. 
Date:  2021–10–05 
URL:  http://d.repec.org/n?u=RePEc:osf:socarx:aejgf&r= 
By:  Stephan Martin 
Abstract:  Nonparametric random coefficient (RC) density estimation has mostly been considered in the marginal density case under strict independence of RCs and covariates. This paper deals with the estimation of RC densities conditional on a (large-dimensional) set of control variables using machine learning techniques. The conditional RC density allows one to disentangle observable from unobservable heterogeneity in partial effects of continuous treatments, adding to a growing literature on heterogeneous effect estimation using machine learning. This paper proposes a two-stage sieve estimation procedure. First, a closed-form sieve approximation of the conditional RC density is derived, in which each sieve coefficient can be expressed as a conditional expectation function varying with the controls. Second, the sieve coefficients are estimated with generic machine learning procedures under appropriate sample-splitting rules. The $L_2$ convergence rate of the conditional RC density estimator is derived. The rate is slower than the typical rates of mean-regression machine learning estimators, which is due to the ill-posedness of the RC density estimation problem. The performance and applicability of the estimator are illustrated using random forest algorithms over a range of Monte Carlo simulations and with real data from the SOEP-IS, where behavioral heterogeneity in an economic experiment on portfolio choice is studied. The method reveals two types of behavior in the population, one complying with economic theory and one not. The assignment to types appears largely driven by unobservables not available in the data. 
Date:  2022–01 
URL:  http://d.repec.org/n?u=RePEc:arx:papers:2201.08366&r= 
By:  B. Cooper Boniece; José E. Figueroa-López; Yuchen Han 
Abstract:  Statistical inference for stochastic processes based on high-frequency observations has been an active research area for more than a decade. One of the most well-known and widely studied problems is the estimation of the quadratic variation of the continuous component of an Itô semimartingale with jumps. Several rate- and variance-efficient estimators have been proposed in the literature when the jump component is of bounded variation. However, to date, very few methods can deal with jumps of unbounded variation. By developing new high-order expansions of the truncated moments of a Lévy process, we construct a new rate- and variance-efficient estimator for a class of Lévy processes of unbounded variation, whose small jumps behave like those of a stable Lévy process with Blumenthal-Getoor index less than $8/5$. The proposed method is based on a two-step debiasing procedure for the truncated realized quadratic variation of the process. Our Monte Carlo experiments indicate that the method outperforms other efficient alternatives in the literature in the setting covered by our theoretical framework. 
Date:  2022–02 
URL:  http://d.repec.org/n?u=RePEc:arx:papers:2202.00877&r= 
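The object being debiased here, the truncated realized quadratic variation, is easy to state: sum the squared high-frequency increments, discarding those exceeding a threshold so that large jumps are filtered out. A minimal sketch (the paper's two-step debiasing correction for small jumps of unbounded variation is not reproduced):

```python
def truncated_rv(increments, threshold):
    """Truncated realized variance: keep only increments whose absolute
    value is at or below the truncation threshold, then sum their squares."""
    return sum(x * x for x in increments if abs(x) <= threshold)

# Three diffusive-sized moves and one jump-like move of 1.5, truncated at 0.1:
trv = truncated_rv([0.01, -0.02, 1.5, 0.015], threshold=0.1)
```

The difficulty the paper addresses is that when small jumps have unbounded variation, the truncation leaves a non-negligible bias that the plain estimator above does not remove.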
By:  Benjamin Poignard; Manabu Asai 
Abstract:  Although multivariate stochastic volatility (MSV) models usually produce more accurate forecasts than MGARCH models, their estimation techniques, such as Bayesian MCMC, typically suffer from the curse of dimensionality. We propose a fast and efficient estimation approach for MSV based on a penalized OLS framework. Specifying the MSV model as a multivariate state-space model, we carry out a two-step penalized procedure. We provide the asymptotic properties of the two-step estimator and the oracle property of the first-step estimator when the number of parameters diverges. The performance of our method is illustrated through simulations and financial data. 
Date:  2022–01 
URL:  http://d.repec.org/n?u=RePEc:arx:papers:2201.08584&r= 
By:  Joshua Angrist 
Abstract:  The view that empirical strategies in economics should be transparent and credible now goes almost without saying. The local average treatment effects (LATE) framework for causal inference helped make this so. The LATE theorem tells us for whom particular instrumental variables (IV) and regression discontinuity estimates are valid. This lecture uses several empirical examples, mostly involving charter and exam schools, to highlight the value of LATE. A surprising exclusion restriction, an assumption central to the LATE interpretation of IV estimates, is shown to explain why enrollment at Chicago exam schools reduces student achievement. I also make two broader points: IV exclusion restrictions formalize commitment to clear and consistent explanations of reduced-form causal effects; compelling applications demonstrate the power of simple empirical strategies to generate new causal knowledge. 
JEL:  B23 I21 I28 J13 J22 
Date:  2022–02 
URL:  http://d.repec.org/n?u=RePEc:nbr:nberwo:29726&r= 
By:  Roberto Casarin (University of Ca' Foscari of Venice); Stefano Grassi (University of Rome Tor Vergata); Francesco Ravazzolo (BI Norwegian Business School); Herman van Dijk (Erasmus University Rotterdam) 
Abstract:  A flexible predictive density combination model is introduced for large financial data sets, allowing for dynamic weight learning and model set incompleteness. Dimension reduction procedures allocate the large sets of predictive densities and combination weights to relatively small sets. Given the representation of the probability model in extended nonlinear state-space form, efficient simulation-based Bayesian inference is proposed using parallel sequential clustering as well as nonlinear filtering, implemented on graphics processing units. The approach is applied to combine predictive densities based on a large number of individual stock returns of daily observations over a period that includes the Covid-19 crisis. Evidence on the quantification of predictive accuracy, uncertainty and risk, in particular in the tails, may provide useful information for investment fund management. Information on dynamic cluster composition, weight patterns and model set incompleteness also gives valuable signals for improved modelling and policy specification. 
Keywords:  Density Combination, Large Set of Predictive Densities, Dynamic Factor Models, Nonlinear State-Space, Bayesian Inference 
JEL:  C11 C15 C53 E37 
Date:  2022–02–14 
URL:  http://d.repec.org/n?u=RePEc:tin:wpaper:20220013&r= 
By:  Sergey Nadtochiy; Yuan Yin 
Abstract:  This paper presents a tractable sufficient condition for the consistency of maximum likelihood estimators (MLEs) in partially observed diffusion models, stated in terms of stationary distributions of the associated test processes, under the assumption that the set of unknown parameter values is finite. We illustrate the tractability of this sufficient condition by verifying it in the context of a latent price model of market microstructure. Finally, we describe an algorithm for computing MLEs in partially observed diffusion models and test it on historical data to estimate the parameters of the latent price model. 
Date:  2022–01 
URL:  http://d.repec.org/n?u=RePEc:arx:papers:2201.07656&r= 
By:  Wolf, Elias 
Abstract:  This paper proposes a Skewed Stochastic Volatility (SSV) model for time-varying, asymmetric forecast distributions to estimate Growth at Risk, as introduced in Adrian, Boyarchenko, and Giannone's (2019) seminal paper "Vulnerable Growth". In contrast to their semiparametric approach, the SSV model enables researchers to capture the evolution of the densities parametrically, so as to conduct statistical tests and compare different models. The SSV model forms a nonlinear, non-Gaussian state space model that can be estimated using particle filtering and MCMC algorithms. To remedy drawbacks of standard bootstrap particle filters, I modify the Tempered Particle Filter of Herbst and Schorfheide (2019) to account for stochastic volatility and asymmetric measurement densities. Estimating the model on US data yields conditional forecast densities that closely resemble the findings of Adrian et al. (2019). Exploiting the advantages of the proposed model, I find that the estimated parameters for the effect of financial conditions on the variance and skewness of the conditional distributions are statistically significant and in line with the intuition of the results found in the existing literature. 
Keywords:  Growth at Risk, Macro Finance, Bayesian Econometrics, Particle Filters 
JEL:  C10 E32 E58 G01 
Date:  2022 
URL:  http://d.repec.org/n?u=RePEc:zbw:fubsbe:20222&r= 
By:  Isuru Ratnayake; V. A. Samaranayake 
Abstract:  This paper introduces a Threshold Asymmetric Conditional Autoregressive Range (TACARR) formulation for modeling the daily price ranges of financial assets. It is assumed that the process generating the conditional expected ranges at each time point switches between two regimes, labeled as upward-market and downward-market states. The disturbance term of the error process is also allowed to switch between two distributions depending on the regime. A self-adjusting threshold component driven by the past values of the time series determines the current market regime. The proposed model is able to capture aspects such as the asymmetric and heteroscedastic behavior of volatility in financial markets. It is an attempt to address several potential deficits found in existing price range models such as the Conditional Autoregressive Range (CARR), Asymmetric CARR (ACARR), Feedback ACARR (FACARR), and Threshold Autoregressive Range (TARR) models. Parameters of the model are estimated using the Maximum Likelihood (ML) method. A simulation study shows that the ML method performs well in estimating the TACARR model parameters. The empirical performance of the TACARR model was investigated using IBM index data, and results show that the proposed model is a good alternative for in-sample prediction and out-of-sample forecasting of volatility. 
Keywords:  Volatility Modeling; Asymmetric Volatility; CARR Models; Regime Switching 
Date:  2022–02 
URL:  http://d.repec.org/n?u=RePEc:arx:papers:2202.03351&r= 
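The regime-switching recursion for the conditional expected range can be sketched as a threshold CARR-type update; the coefficient values below are arbitrary illustrations, not estimates from the paper, and the self-adjusting threshold that picks the regime is abstracted into a boolean flag.

```python
def next_expected_range(r_prev, lam_prev, regime_up, params):
    """One step of a threshold CARR-type recursion:
    lambda_t = omega + alpha * R_{t-1} + beta * lambda_{t-1},
    with the coefficients switching between the two market regimes."""
    omega, alpha, beta = params['up'] if regime_up else params['down']
    return omega + alpha * r_prev + beta * lam_prev

# Illustrative regime-specific (omega, alpha, beta) triples:
params = {'up': (0.10, 0.20, 0.70), 'down': (0.15, 0.30, 0.60)}
lam = next_expected_range(r_prev=1.0, lam_prev=0.5, regime_up=True, params=params)
```

In the full model, the regime flag itself is determined endogenously by past values of the series, and the error distribution also switches with the regime.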
By:  Meng-Chen Hsieh; Clifford Hurvich; Philippe Soulier 
Abstract:  We develop and justify methodology to consistently test for long-horizon return predictability based on realized variance. To accomplish this, we propose a parametric transaction-level model for the continuous-time log price process based on a pure jump point process. The model determines the returns and realized variance at any level of aggregation, with properties shown to be consistent with the stylized facts in the empirical finance literature. Under our model, the long-memory parameter propagates unchanged from the transaction-level drift to the calendar-time returns and the realized variance, leading endogenously to a balanced predictive regression equation. We propose an asymptotic framework using power-law aggregation in the predictive regression. Within this framework, we propose a hypothesis test for long-horizon return predictability which is asymptotically correctly sized and consistent. 
Date:  2022–02 
URL:  http://d.repec.org/n?u=RePEc:arx:papers:2202.00793&r= 
By:  Caio Almeida (Princeton University); Paul Schneider (University of Lugano) 
Abstract:  We develop a nonnegative polynomial minimum-norm likelihood ratio (PLR) of two distributions of which only moments are known, under shape restrictions. The PLR converges to the true, unknown likelihood ratio. We show consistency, obtain the asymptotic distribution of the PLR coefficients estimated with sample moments, and present two applications. The first develops a PLR for the unknown transition density of a jump-diffusion process. The second modifies the Hansen-Jagannathan pricing kernel framework to accommodate polynomial return models consistent with no-arbitrage while simultaneously nesting the linear return model. 
Keywords:  Likelihood ratio, positive polynomial, Reproducing Kernel Hilbert Space (RKHS) 
JEL:  C13 C51 C61 
Date:  2021–10 
URL:  http://d.repec.org/n?u=RePEc:pri:econom:202145&r= 
By:  Christian A. Scholbeck; Giuseppe Casalicchio; Christoph Molnar; Bernd Bischl; Christian Heumann 
Abstract:  Beta coefficients in linear regression models represent the ideal form of an interpretable feature effect. However, for nonlinear models and especially generalized linear models, the estimated coefficients cannot be interpreted as a direct feature effect on the predicted outcome. Hence, marginal effects are typically used as approximations to feature effects, either as derivatives of the prediction function or as forward differences in the prediction due to a change in a feature value. While marginal effects are commonly used in many scientific fields, they have not yet been adopted as a model-agnostic interpretation method for machine learning models. This may stem from their inflexibility as a univariate feature effect and their inability to deal with the nonlinearities found in black box models. We introduce a new class of marginal effects termed forward marginal effects. We argue for abandoning derivatives in favor of more interpretable forward differences. Furthermore, we generalize marginal effects based on forward differences to multivariate changes in feature values. To account for the nonlinearity of prediction functions, we introduce a nonlinearity measure for marginal effects. We argue against summarizing feature effects of a nonlinear prediction function in a single metric such as the average marginal effect. Instead, we propose to partition the feature space to compute conditional average marginal effects on feature subspaces, which serve as conditional feature effect estimates. 
Date:  2022–01 
URL:  http://d.repec.org/n?u=RePEc:arx:papers:2201.08837&r= 
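The forward-difference idea above can be sketched in a few lines. This is an illustrative stand-in, not the authors' implementation; the prediction function `f` is a made-up toy example.

```python
def forward_marginal_effect(f, x, j, h):
    """Forward difference in the prediction when feature j of point x moves by h."""
    x_step = list(x)
    x_step[j] += h
    return f(x_step) - f(x)

# Toy nonlinear black-box prediction function: f(x) = x0^2 + x1.
f = lambda x: x[0] ** 2 + x[1]

fme = forward_marginal_effect(f, [1.0, 2.0], j=0, h=1.0)  # (1+1)^2 - 1^2 = 3.0
deriv_approx = 2 * 1.0 * 1.0                              # derivative * h at x0 = 1
nonlinearity_gap = fme - deriv_approx                     # nonzero because f is nonlinear in x0
```

The gap between the forward difference and the derivative-based approximation is one crude way to see the nonlinearity that a derivative-based marginal effect misses.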
By:  Federico Bassetti; Giulia Carallo; Roberto Casarin 
Abstract:  A new integer-valued autoregressive process (INAR) with Generalised Lagrangian Katz (GLK) innovations is defined. We show that our GLK-INAR process is stationary, discrete semi-self-decomposable, infinitely divisible, and provides a flexible modelling framework for count data, allowing for under- and overdispersion, asymmetry, and excess kurtosis. A Bayesian inference framework and an efficient posterior approximation procedure based on Markov chain Monte Carlo are provided. The proposed model family is applied to a Google Trends dataset which proxies public concern about climate change around the world. The empirical results provide new evidence of heterogeneity across countries and keywords in the persistence, uncertainty, and long-run level of public awareness.
Date:  2022–02 
URL:  http://d.repec.org/n?u=RePEc:arx:papers:2202.02029&r= 
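The INAR recursion itself is simple to illustrate. The sketch below substitutes Poisson innovations for the paper's GLK innovations as a simpler stand-in, purely to show the thinning-plus-innovation structure X_t = a ∘ X_{t-1} + e_t.

```python
import math
import random

def binomial_thinning(x, a, rng):
    """Binomial thinning a ∘ x: each of the x units survives with probability a."""
    return sum(1 for _ in range(x) if rng.random() < a)

def poisson_draw(lam, rng):
    """Knuth's multiplicative Poisson(lam) draw."""
    threshold, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1

def simulate_inar1(n, a, lam, seed=0):
    """INAR(1): X_t = a ∘ X_{t-1} + e_t with Poisson(lam) innovations."""
    rng = random.Random(seed)
    x, path = 0, []
    for _ in range(n):
        x = binomial_thinning(x, a, rng) + poisson_draw(lam, rng)
        path.append(x)
    return path

path = simulate_inar1(200, a=0.5, lam=2.0)
# stationary mean of this INAR(1) is lam / (1 - a) = 4
```

Replacing `poisson_draw` with a sampler for a more flexible count distribution is what lets the model accommodate over- and underdispersion.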
By:  Jose Blanchet; Fernando Hernandez; Viet Anh Nguyen; Markus Pelger; Xuhui Zhang 
Abstract:  Missing time-series data is a prevalent problem in finance. Imputation methods for time-series data are usually applied to the full panel with the purpose of training a model for a downstream out-of-sample task. For example, the imputation of missing returns may be applied prior to estimating a portfolio optimization model. However, this practice can result in a look-ahead bias in the future performance of the downstream task. There is an inherent trade-off between the look-ahead bias of using the full data set for imputation and the larger variance of an imputation that uses only the training data. By connecting layers of information revealed in time, we propose a Bayesian consensus posterior that fuses an arbitrary number of posteriors to optimally control the variance and look-ahead-bias trade-off in the imputation. We derive tractable two-step optimization procedures for finding the optimal consensus posterior, with Kullback-Leibler divergence and Wasserstein distance as the measures of dissimilarity between posterior distributions. We demonstrate in simulations and in an empirical study the benefit of our imputation mechanism for portfolio optimization with missing returns.
Date:  2022–02 
URL:  http://d.repec.org/n?u=RePEc:arx:papers:2202.00871&r= 
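A minimal sketch of posterior fusion, not the paper's optimization procedure: assuming Gaussian posteriors, the weighted geometric pooling that minimizes the weighted sum of KL divergences KL(q‖p_i) reduces to precision weighting.

```python
def fuse_gaussian_posteriors(means, variances, weights=None):
    """Precision-weighted fusion of Gaussian posteriors N(m_i, v_i).

    With equal weights this is standard product-of-Gaussians pooling;
    unequal weights tilt the consensus toward selected posteriors.
    """
    if weights is None:
        weights = [1.0] * len(means)
    prec = [w / v for w, v in zip(weights, variances)]
    total = sum(prec)
    mean = sum(p * m for p, m in zip(prec, means)) / total
    return mean, 1.0 / total

# Two posteriors with equal precision: the consensus mean is their average
# and the consensus variance is halved.
m, v = fuse_gaussian_posteriors([1.0, 3.0], [1.0, 1.0])
```

Tuning the weights between a "train-only" posterior and a "full-panel" posterior is one simple way to picture the variance versus look-ahead-bias trade-off the abstract describes.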
By:  Stefanos Bennett; Mihai Cucuringu; Gesine Reinert 
Abstract:  In multivariate time series systems, it has been observed that certain groups of variables partially lead the evolution of the system, while other variables follow this evolution with a time delay; the result is a lead-lag structure amongst the time series variables. In this paper, we propose a method for the detection of lead-lag clusters of time series in multivariate systems. We demonstrate that the web of pairwise lead-lag relationships between time series can be helpfully construed as a directed network, for which there exist suitable algorithms for the detection of pairs of lead-lag clusters with high pairwise imbalance. Within our framework, we consider a number of choices for the pairwise lead-lag metric and the directed network clustering components. Our framework is validated on both a synthetic generative model for multivariate lead-lag time series systems and daily real-world US equity price data. We showcase that our method is able to detect statistically significant lead-lag clusters in the US equity market. We study the nature of these clusters in the context of the empirical finance literature on lead-lag relations and demonstrate how they can be used for the construction of predictive financial signals.
Date:  2022–01 
URL:  http://d.repec.org/n?u=RePEc:arx:papers:2201.08283&r= 
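One common pairwise lead-lag metric, shown here as an illustration rather than the authors' exact choice, compares cross-correlations at positive and negative lags; the sign of the score orients the edge of the directed network.

```python
import math

def crosscorr(x, y, lag):
    """Sample correlation between x_t and y_{t+lag}."""
    pairs = [(x[t], y[t + lag]) for t in range(len(x)) if 0 <= t + lag < len(y)]
    n = len(pairs)
    mx = sum(a for a, _ in pairs) / n
    my = sum(b for _, b in pairs) / n
    cov = sum((a - mx) * (b - my) for a, b in pairs) / n
    sx = math.sqrt(sum((a - mx) ** 2 for a, _ in pairs) / n)
    sy = math.sqrt(sum((b - my) ** 2 for _, b in pairs) / n)
    return cov / (sx * sy)

def leadlag_score(x, y, max_lag=3):
    """Positive when x tends to lead y; the sign orients the edge x -> y."""
    return sum(crosscorr(x, y, l) - crosscorr(x, y, -l) for l in range(1, max_lag + 1))

# y is x delayed by one period, so x leads y.
x = [math.sin(0.3 * t) for t in range(60)]
y = [0.0] + x[:-1]
```

Computing this score for every ordered pair yields the weighted directed adjacency matrix on which clustering algorithms for directed networks can then be run.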
By:  Rafael R. S. Guimaraes 
Abstract:  Limited datasets and complex nonlinear relationships are among the challenges that may emerge when applying econometrics to macroeconomic problems. This research proposes deep learning as an approach to transfer learning in the former case and to mapping relationships between variables in the latter case. Although macroeconomists already apply transfer learning when, for example, assuming a given a priori distribution in a Bayesian context, estimating a structural VAR with sign restrictions, or calibrating parameters based on results observed in other models, the innovation we introduce is a more systematic transfer learning strategy in applied macroeconomics. We explore the proposed strategy empirically, showing that data from different but related domains, a type of transfer learning, helps identify business cycle phases when there is no business cycle dating committee and helps to quickly estimate an economics-based output gap. Next, since deep learning methods learn representations formed by composing multiple nonlinear transformations to yield more abstract representations, we apply deep learning to map low-frequency variables from high-frequency variables. The results show the suitability of deep learning models applied to macroeconomic problems. First, models learned to classify United States business cycles correctly. Then, applying transfer learning, they were able to identify the business cycles of out-of-sample Brazilian and European data. Along the same lines, the models learned to estimate the output gap based on U.S. data and performed well when faced with Brazilian data. Additionally, deep learning proved adequate for mapping low-frequency variables from high-frequency data in order to interpolate, distribute, and extrapolate time series by related series.
Date:  2022–01 
URL:  http://d.repec.org/n?u=RePEc:arx:papers:2201.13380&r= 
By:  Driver, Charles C 
Abstract:  The interpretation of cross-effects from vector autoregressive models to infer structure and causality amongst constructs is widespread and sometimes problematic. I first explain how hypothesis testing and regularization are invalidated when processes that are thought to fluctuate continuously in time are, as is typically done, modeled as changing only in discrete steps. I then describe an alternative interpretation of cross-effect parameters that incorporates correlated random changes, for a potentially more realistic view of how processes are temporally coupled. Using an example based on well-being data, I demonstrate how some classical concerns, such as sign flipping and counter-intuitive effect directions, can disappear when using this combined deterministic/stochastic interpretation. Models that treat processes as continuously interacting offer both a resolution to the hypothesis testing problem and the possibility of the combined stochastic/deterministic interpretation.
Date:  2022–01–14 
URL:  http://d.repec.org/n?u=RePEc:osf:osfxxx:xdf72&r= 
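The discrete/continuous mismatch can be made concrete with a small numerical illustration (mine, not the author's): for a continuous-time linear system dx = A x dt, the implied discrete-time autoregressive matrix over a unit interval is the matrix exponential exp(A), and its cross-effect (off-diagonal) entries need not even share the sign of those in A.

```python
import math

def mat_mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

def expm2(A, terms=80):
    """exp(A) for a 2x2 matrix via the truncated power series sum_k A^k / k!."""
    S = [[1.0, 0.0], [0.0, 1.0]]  # running sum, starts at the identity
    P = [[1.0, 0.0], [0.0, 1.0]]  # running power A^k
    for k in range(1, terms):
        P = mat_mul(P, A)
        S = [[S[i][j] + P[i][j] / math.factorial(k) for j in range(2)] for i in range(2)]
    return S

# Continuous-time drift with a positive cross-effect from x1 to x2 ...
w = 1.5 * math.pi
A = [[0.0, -w], [w, 0.0]]
M = expm2(A)  # discrete-time AR matrix over dt = 1
# ... whose discrete-time counterpart M[1][0] is negative: the sign flips.
```

Here A generates a rotation, so exp(A) is the rotation matrix through angle 1.5π and the cross-effect changes sign, which is one stylized version of the interpretive hazard the abstract raises.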
By:  Carlo Drago; Andrea Scozzari 
Abstract:  The modeling and forecasting of dynamically varying covariances have received much attention in the literature. The two most widely used conditional covariance and correlation models are BEKK and DCC. In this paper, we advance a new method for introducing targeting in both models to estimate the matrices associated with financial time series. Our approach is based on specific groups of highly correlated assets in a financial market, whose relationships remain unaltered over time. Based on the estimated parameters, we evaluate our targeting method on simulated series by referring to two well-known loss functions introduced in the literature and to network analysis. We find all the maximal cliques in correlation graphs to evaluate the effectiveness of our method. Results from an empirical case study are encouraging, particularly when the number of assets is not large.
Date:  2022–02 
URL:  http://d.repec.org/n?u=RePEc:arx:papers:2202.02197&r= 
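As an illustration of the network-analysis step, here is a minimal Bron-Kerbosch enumeration of maximal cliques over a toy graph whose edges stand in for above-threshold correlations; the adjacency and threshold here are hypothetical, not from the paper.

```python
def bron_kerbosch(R, P, X, adj, out):
    """Basic Bron-Kerbosch recursion (no pivoting): collect every maximal clique."""
    if not P and not X:
        out.append(sorted(R))
        return
    for v in list(P):
        bron_kerbosch(R | {v}, P & adj[v], X & adj[v], adj, out)
        P.remove(v)
        X.add(v)

def maximal_cliques(adj):
    out = []
    bron_kerbosch(set(), set(adj), set(), adj, out)
    return out

# Toy thresholded-correlation graph: assets 0, 1, 2 mutually correlated,
# asset 3 correlated only with asset 2.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
cliques = sorted(maximal_cliques(adj))  # [[0, 1, 2], [2, 3]]
```

Each maximal clique is a candidate group of assets whose mutual correlations persist, which is the kind of structure the targeting method exploits.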
By:  Verhagen, Mark D. 
Abstract:  `All models are wrong, but some are useful' is an often-used mantra, particularly when a model's ability to capture the full complexities of social life is questioned. However, an appropriate functional form is key to valid statistical inference, and underestimating complexity can lead to biased results. Unfortunately, it is unclear a priori what the appropriate complexity of a functional form should be. I propose to use methods from machine learning to identify the appropriate complexity of the functional form by i) generating an estimate of the fit potential of the outcome given a set of explanatory variables, ii) comparing this potential with the fit of the functional form originally hypothesized by the researcher, and iii) in case a lack of fit is identified, using recent advances in the field of explainable AI to generate insight into the missing complexity. I illustrate the approach with a range of simulation and real-world examples.
Date:  2021–12–01 
URL:  http://d.repec.org/n?u=RePEc:osf:socarx:bka76&r= 
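Steps i) and ii) can be sketched as comparing the in-sample fit of a hypothesized linear form against a flexible benchmark. In this sketch a crude k-nearest-neighbour regressor stands in for the machine-learning model, and the data-generating process y = x² is a made-up example.

```python
def linear_fit_mse(xs, ys):
    """In-sample MSE of an ordinary least-squares line y = a + b*x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    a = my - b * mx
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys)) / n

def knn_fit_mse(xs, ys, k=3):
    """In-sample MSE of a k-nearest-neighbour regressor (a stand-in fit potential)."""
    n = len(xs)
    preds = [sum(ys[j] for j in sorted(range(n), key=lambda j: abs(xs[j] - xs[i]))[:k]) / k
             for i in range(n)]
    return sum((y - p) ** 2 for y, p in zip(ys, preds)) / n

# True relation is quadratic; the linear form leaves a large fit gap.
xs = [t / 10 for t in range(-20, 21)]
ys = [x ** 2 for x in xs]
gap = linear_fit_mse(xs, ys) - knn_fit_mse(xs, ys)  # large positive gap flags missing complexity
```

A large gap between the flexible benchmark and the hypothesized form is the signal that step iii) would then investigate with explainable-AI tools.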