nep-ecm New Economics Papers
on Econometrics
Issue of 2022‒08‒15
twenty-six papers chosen by
Sune Karlsson
Örebro universitet

  1. Unbiased estimation of the OLS covariance matrix when the errors are clustered By Tom Boot; Gianmaria Niccodemi; Tom Wansbeek
  2. Dynamic Co-Quantile Regression By Timo Dimitriadis; Yannick Hoga
  3. On the impact of serial dependence on penalized regression methods By Simone Tonini; Francesca Chiaromonte; Alessandro Giovannelli
  4. Isotonic propensity score matching By Taisuke Otsu; Meghan Xu
  5. A new algorithm for structural restrictions in Bayesian vector autoregressions By Dimitris Korobilis
  6. Threshold spatial autoregressive model By Li, Kunpeng
  7. Multivariate mixed Poisson Generalized Inverse Gaussian INAR(1) regression By Chen, Zezhun; Dassios, Angelos; Tzougas, George
  8. Symmetric generalized Heckman models By Helton Saulo; Roberto Vila; Shayane S. Cordeiro
  9. Efficient Bias Correction for Cross-section and Panel Data By Jinyong Hahn; David W. Hughes; Guido Kuersteiner; Whitney K. Newey
  10. Statistical inference of lead-lag at various timescales between asynchronous time series from p-values of transfer entropy By Christian Bongiorno; Damien Challet
  11. Large Bayesian VARs with Factor Stochastic Volatility: Identification, Order Invariance and Structural Analysis By Joshua Chan; Eric Eisenstat; Xuewen Yu
  12. Testing for unit roots based on sample autocovariances By Chang, Jinyuan; Cheng, Guanghui; Yao, Qiwei
  13. Estimating value at risk: LSTM vs. GARCH By Weronika Ormaniec; Marcin Pitera; Sajad Safarveisi; Thorsten Schmidt
  14. Testing for a Threshold in Models with Endogenous Regressors By Mario P. Rothfelder; Otilia Boldea
  15. Fitting the Cox proportional hazards model to interval-censored data By Danyu Lin
  16. The Impossibility of Testing for Dependence Using Kendall’s τ Under Missing Data of Unknown Form By Oliver R. Cutbill; Rami V. Tabri
  17. Policy Learning under Endogeneity Using Instrumental Variables By Yan Liu
  18. Modeling Multivariate Positive-Valued Time Series Using R-INLA By Chiranjit Dutta; Nalini Ravishanker; Sumanta Basu
  19. The Sine Aggregatio Approach to Applied Macro By Timothy G. Conley; Bill Dupor; Mahdi Ebsim
  20. Data depth and multiple output regression, the distorted M-quantiles approach By Ochoa Arellano, Maicol Jesús; Cascos Fernández, Ignacio
  21. An Expectile Strong Law of Large Numbers By Collin Philipps
  22. A Bayesian Approach to Inference on Probabilistic Surveys By Federico Bassetti; Roberto Casarin; Marco Del Negro
  23. Modeling Randomly Walking Volatility with Chained Gamma Distributions By Di Zhang; Qiang Niu; Youzhou Zhou
  24. Conditionally Elicitable Dynamic Risk Measures for Deep Reinforcement Learning By Anthony Coache; Sebastian Jaimungal; Álvaro Cartea
  25. Most claimed statistical findings in cross-sectional return predictability are likely true By Andrew Y. Chen
  26. The Virtue of Complexity in Return Prediction By Bryan T. Kelly; Semyon Malamud; Kangying Zhou

  1. By: Tom Boot; Gianmaria Niccodemi; Tom Wansbeek
    Abstract: When data are clustered, it has become common practice to run OLS and use an estimator of the covariance matrix of the OLS estimator that comes close to unbiasedness. In this paper we derive an estimator that is unbiased when the random-effects model holds. We do the same for two more general structures. We study the usefulness of these estimators against others by simulation, the size of the t-test being the criterion. Our findings suggest that the choice of estimator hardly matters when the regressor has the same distribution over the clusters. But when the regressor is a cluster-specific treatment variable, the choice does matter and the unbiased estimator we propose for the random-effects model shows excellent performance, even when the clusters are highly unbalanced.
    Date: 2022–06
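    As a point of reference for the abstract above, the following is a minimal sketch of the conventional cluster-robust (Liang-Zeger, CR0) sandwich variance for a single-regressor OLS fit through the origin. It is not the unbiased estimator the authors derive; the function names, the simulated design (a treatment that varies only across clusters plus a cluster random effect), and all parameter values are assumptions made for illustration.

```python
import random
from collections import defaultdict


def ols_slope(x, y):
    """OLS slope for a regression through the origin."""
    return sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)


def cluster_robust_variance(x, y, cluster):
    """CR0 sandwich variance of the slope: scores are summed within clusters, then squared."""
    beta = ols_slope(x, y)
    sxx = sum(a * a for a in x)
    score = defaultdict(float)
    for a, b, g in zip(x, y, cluster):
        score[g] += a * (b - beta * a)  # regressor times OLS residual
    return sum(s * s for s in score.values()) / (sxx * sxx)


def simulate(n_clusters=50, size=10, seed=0):
    """Hypothetical design: cluster-level treatment and a shared cluster random effect."""
    rng = random.Random(seed)
    x, y, cluster = [], [], []
    for g in range(n_clusters):
        treated = float(g % 2)          # treatment varies only across clusters
        effect = rng.gauss(0.0, 1.0)    # random effect common to the cluster
        for _ in range(size):
            x.append(treated)
            y.append(1.0 * treated + effect + rng.gauss(0.0, 1.0))
            cluster.append(g)
    return x, y, cluster


x, y, cluster = simulate()
clustered = cluster_robust_variance(x, y, cluster)
# Treating each observation as its own cluster reduces CR0 to the White (HC0) variance.
unclustered = cluster_robust_variance(x, y, range(len(x)))
print(clustered, unclustered)
```

    In this simulated design the clustered variance is much larger than the unclustered one, which is the setting (a cluster-specific treatment variable) where the abstract reports that the choice of estimator matters.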
  2. By: Timo Dimitriadis; Yannick Hoga
    Abstract: The popular systemic risk measure CoVaR (conditional Value-at-Risk) is widely used in economics and finance. Formally, it is defined as an (extreme) quantile of one variable (e.g., losses in the financial system) conditional on some other variable (e.g., losses in a bank's shares) being in distress and, hence, measures the spillover of risks. In this article, we propose a dynamic "Co-Quantile Regression", which jointly models VaR and CoVaR semiparametrically. We propose a two-step M-estimator drawing on recently proposed bivariate scoring functions for the pair (VaR, CoVaR). Among others, this allows for the estimation of joint dynamic forecasting models for (VaR, CoVaR). We prove the asymptotic normality of the proposed estimator and simulations illustrate its good finite-sample properties. We apply our co-quantile regression to correct the statistical inference in the existing literature on CoVaR, and to generate CoVaR forecasts for real financial data, which are shown to be superior to existing methods.
    Date: 2022–06
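    To make the definition in the abstract concrete, here is a rough sketch of static, purely empirical VaR and CoVaR computed from simulated losses. It assumes the distress event is "the conditioning loss is at or above its own VaR" and uses plain order-statistic quantiles; it is not the dynamic co-quantile regression or the two-step M-estimator proposed in the paper, and the names, levels, and the Gaussian simulation are illustrative assumptions.

```python
import math
import random


def empirical_quantile(values, level):
    """Smallest order statistic with at least `level` of the sample at or below it."""
    ordered = sorted(values)
    index = min(len(ordered) - 1, max(0, math.ceil(level * len(ordered)) - 1))
    return ordered[index]


def var_and_covar(x, y, beta=0.95, alpha=0.95):
    """VaR of x at level beta, and the alpha-quantile of y given x >= VaR of x."""
    var_x = empirical_quantile(x, beta)
    stressed_y = [b for a, b in zip(x, y) if a >= var_x]
    return var_x, empirical_quantile(stressed_y, alpha)


# Hypothetical data: two positively correlated Gaussian loss series.
rng = random.Random(42)
rho = 0.7
x_losses, y_losses = [], []
for _ in range(20000):
    z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    x_losses.append(z1)
    y_losses.append(rho * z1 + math.sqrt(1.0 - rho ** 2) * z2)

var_x, covar = var_and_covar(x_losses, y_losses)
var_y = empirical_quantile(y_losses, 0.95)
print(var_x, var_y, covar)
```

    With positively dependent losses, the CoVaR of the second series exceeds its unconditional VaR, which is the risk spillover the measure is meant to capture.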
  3. By: Simone Tonini; Francesca Chiaromonte; Alessandro Giovannelli
    Abstract: This paper characterizes the impact of serial dependence on the non-asymptotic estimation error bound of penalized regressions (PRs). Focusing on the direct relationship between the degree of cross-correlation of covariates and the estimation error bound of PRs, we show that orthogonal or weakly cross-correlated stationary AR processes can exhibit high spurious cross-correlations caused by serial dependence. In this respect, we study analytically the density of sample cross-correlations in the simplest case of two orthogonal Gaussian AR(1) processes. Simulations show that our results can be extended to the general case of weakly cross-correlated non-Gaussian AR processes of any autoregressive order. To improve the estimation performance of PRs in a time series regime, we propose an approach based on applying PRs to the residuals of ARMA models fit on the observed time series. We show that under mild assumptions the proposed approach allows us both to reduce the estimation error and to develop an effective forecasting strategy. The estimation accuracy of our proposal is numerically evaluated through simulations. To assess the effectiveness of the forecasting strategy, we provide the results of an empirical application to monthly macroeconomic data relative to the Euro Area economy.
    Keywords: Serial dependence; spurious correlation; minimum eigenvalue; penalized regressions; estimation accuracy.
    Date: 2022–07–27
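    The spurious cross-correlation phenomenon described above can be illustrated with a small simulation. The sketch below draws pairs of independent Gaussian AR(1) series and compares the spread of their sample correlations with and without serial dependence. The sample size, the AR coefficient, the number of replications, and the function names are assumptions chosen for illustration; this does not reproduce the authors' analytical density or their ARMA-residual procedure.

```python
import math
import random
import statistics


def ar1_path(n, phi, rng, burn_in=200):
    """Simulate a Gaussian AR(1) path of length n after discarding a burn-in."""
    x, path = 0.0, []
    for t in range(n + burn_in):
        x = phi * x + rng.gauss(0.0, 1.0)
        if t >= burn_in:
            path.append(x)
    return path


def correlation(a, b):
    """Sample Pearson correlation of two equally long series."""
    mean_a, mean_b = statistics.fmean(a), statistics.fmean(b)
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    return cov / math.sqrt(var_a * var_b)


def correlation_spread(phi, n=100, reps=300, seed=1):
    """Standard deviation of the sample correlation between independent AR(1) pairs."""
    rng = random.Random(seed)
    draws = [correlation(ar1_path(n, phi, rng), ar1_path(n, phi, rng))
             for _ in range(reps)]
    return statistics.pstdev(draws)


spread_iid = correlation_spread(0.0)
spread_ar = correlation_spread(0.9)
print(spread_iid, spread_ar)
```

    Although the two series are independent by construction in both cases, the sample correlations are far more dispersed when the series are strongly autocorrelated, so large spurious correlations become common.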
  4. By: Taisuke Otsu; Meghan Xu
    Abstract: We propose a one-to-many matching estimator of the average treatment effect based on propensity scores estimated by isotonic regression. The method relies on the monotonicity assumption on the propensity score function, which can be justified in many applications in economics. We show that the nature of the isotonic estimator can help us to fix many problems of existing matching methods, including efficiency, choice of the number of matches, choice of tuning parameters, robustness to propensity score misspecification, and bootstrap validity. As a by-product, a uniformly consistent isotonic estimator is developed for our proposed matching method.
    Keywords: Matching, Propensity score, Isotonic regression
    JEL: C14
    Date: 2022–07
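    For readers unfamiliar with isotonic regression, the sketch below implements the pool-adjacent-violators algorithm with equal weights. Applied to binary treatment indicators sorted by a scalar covariate, the fitted values can be read as a monotone propensity score estimate. This shows only the isotonic step under those assumptions; the one-to-many matching estimator from the abstract is not implemented, and the function name and toy data are hypothetical.

```python
def isotonic_fit(y):
    """Pool-adjacent-violators: nondecreasing least-squares fit to y (equal weights)."""
    blocks = []  # each block stores [sum of values, number of values]
    for value in y:
        blocks.append([float(value), 1])
        # Merge while the previous block's mean exceeds the last block's mean.
        while len(blocks) > 1 and (
            blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]
        ):
            total, count = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
    fitted = []
    for total, count in blocks:
        fitted.extend([total / count] * count)
    return fitted


# Treatment indicators already sorted by the covariate (hypothetical toy data).
treatment = [0, 1, 0, 1, 1]
print(isotonic_fit(treatment))
```

    The procedure has no bandwidth or other tuning parameter, which is consistent with the abstract's remark about the choice of tuning parameters.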
  5. By: Dimitris Korobilis
    Abstract: A comprehensive methodology for inference in vector autoregressions (VARs) using sign and other structural restrictions is developed. The reduced-form VAR disturbances are driven by a few common factors and structural identification restrictions can be incorporated in their loadings in the form of parametric restrictions. A Gibbs sampler is derived that allows for reduced-form parameters and structural restrictions to be sampled efficiently in one step. A key benefit of the proposed approach is that it allows for treating parameter estimation and structural inference as a joint problem. An additional benefit is that the methodology can scale to large VARs with multiple shocks, and it can be extended to accommodate non-linearities, asymmetries, and numerous other interesting empirical features. The excellent properties of the new algorithm for inference are explored using synthetic data experiments, and by revisiting the role of financial factors in economic fluctuations using identification based on sign restrictions.
    Date: 2022–06
  6. By: Li, Kunpeng
    Abstract: This paper considers the estimation and inferential issues of the threshold spatial autoregressive model, which is a hybrid of the threshold model and the spatial autoregressive model. We consider using the quasi maximum likelihood (QML) method to estimate the model. We prove the tightness and the Hájek-Rényi type inequality for a quadratic form, and establish a full inferential theory of the QML estimator under the setup that the threshold effect shrinks to zero along with an increasing sample size. We consider hypothesis testing on the presence of a threshold effect. Three super-type statistics are proposed to perform this testing. Their asymptotic behaviors are studied under the Pitman local alternatives. A bootstrap procedure is proposed to obtain the asymptotically correct critical value. We also consider hypothesis testing on the threshold value being equal to some prespecified one. We run Monte Carlo simulations to investigate the finite sample performance of the QML estimators and find that the QML estimators have good performance.
    Keywords: Spatial autoregressive models, Spillover effects, Threshold effect, Maximum likelihood estimation, Inferential theory.
    JEL: C12 C31
    Date: 2022–06–27
  7. By: Chen, Zezhun; Dassios, Angelos; Tzougas, George
    Abstract: In this paper, we present a novel family of multivariate mixed Poisson-Generalized Inverse Gaussian INAR(1), MMPGIG-INAR(1), regression models for modelling time series of overdispersed count response variables in a versatile manner. The statistical properties associated with the proposed family of models are discussed and we derive the joint distribution of innovations across all the sequences. Finally, for illustrative purposes different members of the MMPGIG-INAR(1) class are fitted to Local Government Property Insurance Fund data from the state of Wisconsin via maximum likelihood estimation.
    Keywords: count data time series; multivariate INAR(1) regression models; multivariate mixed Poisson-Generalized Inverse Gaussian; correlated time series; maximum likelihood estimation
    JEL: C1
    Date: 2022–07–09
  8. By: Helton Saulo; Roberto Vila; Shayane S. Cordeiro
    Abstract: The sample selection bias problem arises when a variable of interest is correlated with a latent variable, and involves situations in which the response variable had part of its observations censored. Heckman (1976) proposed a sample selection model based on the bivariate normal distribution that fits both the variable of interest and the latent variable. Recently, this assumption of normality has been relaxed by more flexible models such as the Student-t distribution (Marchenko and Genton, 2012; Lachos et al., 2021). The aim of this work is to propose generalized Heckman sample selection models based on symmetric distributions (Fang et al., 1990). This is a new class of sample selection models, in which variables are added to the dispersion and correlation parameters. A Monte Carlo simulation study is performed to assess the behavior of the parameter estimation method. Two real data sets are analyzed to illustrate the proposed approach.
    Date: 2022–06
  9. By: Jinyong Hahn; David W. Hughes; Guido Kuersteiner; Whitney K. Newey
    Abstract: Bias correction can often improve the finite sample performance of estimators. We show that the choice of bias correction method has no effect on the higher-order variance of semiparametrically efficient parametric estimators, so long as the estimate of the bias is asymptotically linear. It is also shown that bootstrap, jackknife, and analytical bias estimates are asymptotically linear for estimators with higher-order expansions of a standard form. In particular, we find that for a variety of estimators the straightforward bootstrap bias correction gives the same higher-order variance as more complicated analytical or jackknife bias corrections. In contrast, bias corrections that do not estimate the bias at the parametric rate, such as the split-sample jackknife, result in larger higher-order variances in the i.i.d. setting we focus on. For both a cross-sectional MLE and a panel model with individual fixed effects, we show that the split-sample jackknife has a higher-order variance term that is twice as large as that of the 'leave-one-out' jackknife.
    Date: 2022–07
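    The two jackknife corrections contrasted in the abstract can be sketched on a textbook case: the Gaussian MLE of a variance, which divides by n and is therefore biased downward. The functions below are illustrative assumptions rather than the authors' code, and the data are made up. For this particular estimator the leave-one-out correction reproduces the usual unbiased sample variance exactly.

```python
import statistics


def mle_variance(x):
    """Gaussian MLE of the variance (divides by n, so it is biased downward)."""
    mean = sum(x) / len(x)
    return sum((v - mean) ** 2 for v in x) / len(x)


def jackknife_corrected(estimator, x):
    """Leave-one-out jackknife: n * theta_hat - (n - 1) * mean of leave-one-out estimates."""
    n = len(x)
    leave_one_out = [estimator(x[:i] + x[i + 1:]) for i in range(n)]
    return n * estimator(x) - (n - 1) * sum(leave_one_out) / n


def split_sample_corrected(estimator, x):
    """Split-sample jackknife: 2 * theta_hat - average of the two half-sample estimates."""
    half = len(x) // 2
    return 2.0 * estimator(x) - 0.5 * (estimator(x[:half]) + estimator(x[half:]))


data = [2.1, 3.4, 1.9, 5.0, 4.2, 2.8, 3.3, 4.7]  # hypothetical sample
raw = mle_variance(data)
loo = jackknife_corrected(mle_variance, data)
split = split_sample_corrected(mle_variance, data)
print(raw, loo, split, statistics.variance(data))
```

    Both corrections remove the first-order bias here; the abstract's point concerns their higher-order variances, which this small example does not attempt to measure.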
  10. By: Christian Bongiorno; Damien Challet
    Abstract: Symbolic transfer entropy is a powerful non-parametric tool to detect lead-lag between time series. Because a closed expression of the distribution of Transfer Entropy is not known for finite-size samples, statistical testing is often performed with bootstraps whose slowness prevents the inference of large lead-lag networks between long time series. On the other hand, the asymptotic distribution of Transfer Entropy between two time series is known. In this work, we derive the asymptotic distribution of the test for one time series having a larger Transfer Entropy than another one on a target time series. We then measure the convergence speed of both tests in the small sample size limits via benchmarks. We then introduce Transfer Entropy between time-shifted time series, which allows one to measure the timescale at which information transfer is maximal and the timescale at which it vanishes. We finally apply these methods to tick-by-tick price changes of several hundreds of stocks, yielding non-trivial statistically validated networks.
    Date: 2022–06
  11. By: Joshua Chan; Eric Eisenstat; Xuewen Yu
    Abstract: Vector autoregressions (VARs) with multivariate stochastic volatility are widely used for structural analysis. Often the structural model identified through economically meaningful restrictions (e.g., sign restrictions) is supposed to be independent of how the dependent variables are ordered. But since the reduced-form model is not order invariant, results from the structural analysis depend on the order of the variables. We consider a VAR based on the factor stochastic volatility that is constructed to be order invariant. We show that the presence of multivariate stochastic volatility allows for statistical identification of the model. We further prove that, with a suitable set of sign restrictions, the corresponding structural model is point-identified. An additional appeal of the proposed approach is that it can easily handle a large number of dependent variables as well as sign restrictions. We demonstrate the methodology through a structural analysis in which we use a 20-variable VAR with sign restrictions to identify 5 structural shocks.
    Date: 2022–07
  12. By: Chang, Jinyuan; Cheng, Guanghui; Yao, Qiwei
    Abstract: We propose a new unit-root test for a stationary null hypothesis H0 against a unit-root alternative H1. Our approach is nonparametric as H0 assumes only that the process concerned is I(0), without specifying any parametric forms. The new test is based on the fact that the sample autocovariance function converges to the finite population autocovariance function for an I(0) process, but diverges to infinity for a process with unit roots. Therefore, the new test rejects H0 for large values of the sample autocovariance function. To address the technical question of how large is large, we split the sample and establish an appropriate normal approximation for the null distribution of the test statistic. The substantial discriminative power of the new test statistic is due to the fact that it takes finite values under H0 and diverges to infinity under H1. This property allows one to truncate the critical values of the test so that it has asymptotic power 1; it also alleviates the loss of power due to the sample-splitting. The test is implemented in R.
    Keywords: autocovariance; integrated processes; normal approximation; power-one test; sample-splitting; EP/V007556/1
    JEL: C1
    Date: 2022–06–01
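    The fact the test builds on, that sample autocovariances settle down for an I(0) process but grow without bound under a unit root, can be seen in a short simulation. The sketch below is only a demonstration of that fact: the AR coefficient, the sample sizes, the seed, and the function names are assumptions, and neither the authors' test statistic nor their sample-splitting scheme is implemented (the abstract notes that the actual test is available in R).

```python
import random


def sample_autocovariance(x, lag):
    """Sample autocovariance at the given lag, normalized by the sample size."""
    n = len(x)
    mean = sum(x) / n
    return sum((x[t] - mean) * (x[t + lag] - mean) for t in range(n - lag)) / n


def simulate(n, phi, seed):
    """AR(1) path with Gaussian innovations; phi = 1 gives a random walk (unit root)."""
    rng = random.Random(seed)
    x, path = 0.0, []
    for _ in range(n):
        x = phi * x + rng.gauss(0.0, 1.0)
        path.append(x)
    return path


for n in (200, 800, 3200):
    stationary = sample_autocovariance(simulate(n, 0.5, seed=7), lag=1)
    unit_root = sample_autocovariance(simulate(n, 1.0, seed=7), lag=1)
    print(n, round(stationary, 2), round(unit_root, 2))
```

    For the stationary series the lag-1 autocovariance stays near its population value as n grows, while for the random walk it is typically orders of magnitude larger at the longest sample size.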
  13. By: Weronika Ormaniec; Marcin Pitera; Sajad Safarveisi; Thorsten Schmidt
    Abstract: Estimating value-at-risk on time series data with possibly heteroscedastic dynamics is a highly challenging task. Typically, we face a small data problem in combination with a high degree of non-linearity, causing difficulties for both classical and machine-learning estimation algorithms. In this paper, we propose a novel value-at-risk estimator using a long short-term memory (LSTM) neural network and compare its performance to benchmark GARCH estimators. Our results indicate that even for a relatively short time series, the LSTM could be used to refine or monitor risk estimation processes and correctly identify the underlying risk dynamics in a non-parametric fashion. We evaluate the estimator on both simulated and market data with a focus on heteroscedasticity, finding that LSTM exhibits a similar performance to GARCH estimators on simulated data, whereas on real market data it is more sensitive towards increasing or decreasing volatility and outperforms all existing estimators of value-at-risk in terms of exception rate and mean quantile score.
    Date: 2022–07
  14. By: Mario P. Rothfelder; Otilia Boldea
    Abstract: We show by simulation that the test for an unknown threshold in models with endogenous regressors - proposed in Caner and Hansen (2004) - can exhibit severe size distortions both in small and in moderately large samples, pertinent to empirical applications. We propose three new tests that rectify these size distortions. The first test is based on GMM estimators. The other two are based on unconventional 2SLS estimators, that use additional information about the linearity (or lack of linearity) of the first stage. Just like the test in Caner and Hansen (2004), our tests are non-pivotal, and we prove their bootstrap validity. The empirical application revisits the question in Ramey and Zubairy (2018) whether government spending multipliers are larger in recessions, but using tests for an unknown threshold. Consistent with Ramey and Zubairy (2018), we do not find strong evidence that these multipliers are larger in recessions.
    Date: 2022–07
  15. By: Danyu Lin (University of North Carolina at Chapel Hill)
    Abstract: Interval-censored data arise frequently in clinical, epidemiological, financial, and sociological studies, where the event or failure of interest is not observed at an exact time point but is rather known to occur within a time interval induced by periodic examinations. We formulate the effects of potentially time-dependent covariates on the failure time through the familiar Cox proportional hazards model, under which the failure time distribution is completely arbitrary. We consider nonparametric maximum-likelihood estimation with an arbitrary number of examination times for each study subject. We present an EM algorithm that involves very simple calculations and converges stably for any dataset, even in the presence of time-dependent covariates. The resulting estimators for the regression parameters are consistent, asymptotically normal, and asymptotically efficient with an easily estimated covariance matrix. In addition, we extend the EM algorithm and the theoretical results to multivariate failure time data, in which there are multiple events per subject or clustering of study subjects. Finally, we provide illustrations with real medical studies.
    Date: 2022–06–25
  16. By: Oliver R. Cutbill; Rami V. Tabri
    Abstract: This paper discusses the statistical inference problem associated with testing for dependence between two continuous random variables using Kendall’s τ in the context of the missing data problem. We prove the worst-case identified set for this measure of association always includes zero. The consequence of this result is that robust inference for dependence using Kendall’s τ, where robustness is with respect to the form of the missingness-generating process, is impossible.
    Keywords: Impossible Inference; Statistical Dependence; Kendall’s τ; Partial Identification; Missing Data
    Date: 2022–02
  17. By: Yan Liu
    Abstract: This paper studies the statistical decision problem of learning an individualized intervention policy when data are obtained from observational studies or randomized experiments with imperfect compliance. Leveraging an instrumental variable, we provide a social welfare criterion that allows the policymaker to account for endogenous treatment selection. To this end, we incorporate the marginal treatment effects (MTE) when identifying treatment effects parameters and consider encouragement rules that affect social welfare through treatment take-up when designing policies. We focus on settings where encouragement rules are binary decisions on whether or not to offer a user-chosen manipulation of the instrument based on observable characteristics. We apply the representation of the social welfare criterion of encouragement rules via the MTE to the Empirical Welfare Maximization (EWM) method and derive convergence rates of the worst-case regret (welfare loss). We illustrate the EWM encouragement rule using data from the Indonesia Family Life Survey.
    Date: 2022–06
  18. By: Chiranjit Dutta; Nalini Ravishanker; Sumanta Basu
    Abstract: In this paper we describe fast Bayesian statistical analysis of vector positive-valued time series, with application to interesting financial data streams. We discuss a flexible level correlated model (LCM) framework for building hierarchical models for vector positive-valued time series. The LCM allows us to combine marginal gamma distributions for the positive-valued component responses, while accounting for association among the components at a latent level. We use integrated nested Laplace approximation (INLA) for fast approximate Bayesian modeling via the R-INLA package, building custom functions to handle this setup. We use the proposed method to model interdependencies between realized volatility measures from several stock indexes.
    Date: 2022–06
  19. By: Timothy G. Conley; Bill Dupor; Mahdi Ebsim
    Abstract: We develop a method to use disaggregate data to conduct causal inference in macroeconomics. The approach permits one to infer the aggregate effect of a macro treatment using regional outcome data and a valid instrument. We estimate a macro effect without (sine) the aggregation (aggregatio) of the outcome variable. We exploit cross-series parameter restrictions to increase precision relative to traditional, aggregate series estimates and provide a method to assess robustness to modest departures from these restrictions. We illustrate our method via estimating the jobs effect of oil price changes using regional manufacturing employment data and an aggregate oil supply shock.
    Keywords: aggregation; macroeconomic causal effect
    JEL: E3
    Date: 2022–07–11
  20. By: Ochoa Arellano, Maicol Jesús; Cascos Fernández, Ignacio
    Abstract: For a univariate distribution, its M-quantiles are obtained as solutions to asymmetric minimization problems dealing with the distance of a random variable to a fixed point. The asymmetry refers to the different weights for the values of the random variable at either side of the fixed point. We focus on M-quantiles whose associated losses are given in terms of a power. In this setting, the classical quantiles are obtained for the first power, while the expectiles correspond to quadratic losses. The M-quantiles considered here are computed over distorted distributions, which allows one to tune the weight awarded to the more central or peripheral parts of the distribution. These distorted M-quantiles are used in the multivariate setting to introduce novel families of central regions and their associated depth functions, which are further extended to the multiple output regression setting in the form of conditional regression regions and conditional depths.
    Keywords: Bivariate Depth Algorithm; Data Depth; Distortion Function; Conditional Regression Region; M-Quantiles
    Date: 2022–07–14
  21. By: Collin Philipps (Department of Economics and Geosciences, US Air Force Academy)
    Abstract: We show that Kolmogorov's classical strong law of large numbers applies to all expectiles uniformly. The expectiles of a random sample converge almost surely (uniformly) to the true expectiles if and only if the true data generating process has a finite first moment. The result holds for expectile functions of scalar and vector-valued random variables and can be reformulated to state that the mean (or any expectile) of a random sample converges almost surely to the true mean (or expectile) if and only if any arbitrary expectile exists and is finite.
    Keywords: Expectile Regression, Quantile Regression, Strong Law of Large Numbers
    JEL: C0 C21 C46
    Date: 2022–07
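    As a concrete companion to the abstract above, a sample expectile can be computed by iterating an asymmetrically weighted mean, with weight tau on observations above the current value and 1 - tau on the rest. The sketch below assumes this standard asymmetric least-squares definition; the function name, tolerance, and data are illustrative and are not taken from the paper.

```python
def expectile(x, tau, tol=1e-12, max_iter=1000):
    """Sample tau-expectile via repeated asymmetrically weighted means."""
    e = sum(x) / len(x)  # start from the sample mean
    for _ in range(max_iter):
        num = den = 0.0
        for v in x:
            w = tau if v > e else 1.0 - tau  # heavier weight above e when tau > 0.5
            num += w * v
            den += w
        new_e = num / den
        if abs(new_e - e) < tol:
            return new_e
        e = new_e
    return e


sample = [0.0, 1.0, 2.0, 3.0, 10.0]  # hypothetical data
print(expectile(sample, 0.1), expectile(sample, 0.5), expectile(sample, 0.9))
```

    At tau = 0.5 all weights are equal, so the expectile coincides with the sample mean, which is why a law of large numbers for expectiles contains the classical result for the mean as a special case.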
  22. By: Federico Bassetti; Roberto Casarin; Marco Del Negro
    Abstract: We propose a nonparametric Bayesian approach for conducting inference on probabilistic surveys. We use this approach to study whether U.S. Survey of Professional Forecasters density projections for output growth and inflation are consistent with the noisy rational expectations hypothesis. We find that in contrast to theory, for horizons close to two years, there is no relationship whatsoever between subjective uncertainty and forecast accuracy for output growth density projections, both across forecasters and over time, and only a mild relationship for inflation projections. As the horizon shortens, the relationship becomes one-to-one, as the theory would predict.
    Keywords: Bayesian inference; Bayesian nonparametric; Survey of Professional Forecasters; noisy rational expectations
    JEL: C11 C13 C15 C32 C58 G12
    Date: 2022–07–01
  23. By: Di Zhang; Qiang Niu; Youzhou Zhou
    Abstract: Volatility clustering is a common phenomenon in financial time series. Typically, linear models are used to describe the temporal autocorrelation of the (logarithmic) variance of returns. Considering the difficulty in estimation of this model, we construct a Dynamic Bayesian Network, which utilizes the conjugate prior relation of normal-gamma and gamma-gamma, so that at each node, its posterior form locally remains unchanged. This makes it possible to quickly find approximate solutions using variational methods. Furthermore, we ensure that the volatility expressed by the model is an independent incremental process after inserting dummy gamma nodes between adjacent time steps. We have found that this model has two advantages: 1) It can be proved that it can express heavier tails than Gaussians, i.e., have positive excess kurtosis, compared to popular linear models. 2) If variational inference (VI) is used for state estimation, it runs much faster than Monte Carlo (MC) methods, since the calculation of the posterior uses only basic arithmetic operations, and its convergence process is deterministic. We tested the model, named Gam-Chain, using recent Crypto, Nasdaq, and Forex records of varying resolutions. The results show that: 1) In the same case of using MC, this model can achieve comparable state estimation results with the regular lognormal chain. 2) In the case of only using VI, this model can obtain accuracy that is slightly worse than MC, but still acceptable in practice. 3) Only using VI, the running time of Gam-Chain, under the most conservative settings, can be reduced to below 20% of that based on the lognormal chain via MC.
    Date: 2022–07
  24. By: Anthony Coache; Sebastian Jaimungal; Álvaro Cartea
    Abstract: We propose a novel framework to solve risk-sensitive reinforcement learning (RL) problems where the agent optimises time-consistent dynamic spectral risk measures. Based on the notion of conditional elicitability, our methodology constructs (strictly consistent) scoring functions that are used as penalizers in the estimation procedure. Our contribution is threefold: we (i) devise an efficient approach to estimate a class of dynamic spectral risk measures with deep neural networks, (ii) prove that these dynamic spectral risk measures may be approximated to any arbitrary accuracy using deep neural networks, and (iii) develop a risk-sensitive actor-critic algorithm that uses full episodes and does not require any additional nested transitions. We compare our conceptually improved reinforcement learning algorithm with the nested simulation approach and illustrate its performance in two settings: statistical arbitrage and portfolio allocation on both simulated and real data.
    Date: 2022–06
  25. By: Andrew Y. Chen
    Abstract: I present two simple bounds for the false discovery rate (FDR) that account for publication bias. The first assumes that the publication process is not worse at finding predictability than atheoretical data-mining. The second conservatively extrapolates by assuming that there are exponentially more file-drawer t-stats than published t-stats. Both methods find that at least 75% of findings in cross-sectional predictability are true. I show that, surprisingly, Harvey, Liu, and Zhu's (2016) estimates imply a similar FDR. I discuss interpretations and relate to the biostatistics literature. My analysis shows that carefully mapping multiple testing statistics to economic interpretations is important.
    Date: 2022–06
  26. By: Bryan T. Kelly; Semyon Malamud; Kangying Zhou
    Abstract: The extant literature predicts market returns with “simple” models that use only a few parameters. Contrary to conventional wisdom, we theoretically prove that simple models severely understate return predictability compared to “complex” models in which the number of parameters exceeds the number of observations. We empirically document the virtue of complexity in US equity market return prediction. Our findings establish the rationale for modeling expected returns through machine learning.
    JEL: C1 C45 G1
    Date: 2022–07

This nep-ecm issue is ©2022 by Sune Karlsson. It is provided as is without any express or implied warranty. It may be freely redistributed in whole or in part for any purpose. If distributed in part, please include this notice.
General information on the NEP project can be found at For comments please write to the director of NEP, Marco Novarese at <>. Put “NEP” in the subject, otherwise your mail may be rejected.
NEP’s infrastructure is sponsored by the School of Economics and Finance of Massey University in New Zealand.