nep-ecm New Economics Papers
on Econometrics
Issue of 2022‒01‒17
23 papers chosen by
Sune Karlsson
Örebro universitet

  1. Inference in a class of optimization problems: Confidence regions and finite sample bounds on errors in coverage probabilities By Joel L. Horowitz; Sokbae (Simon) Lee
  2. Estimating density ratio of marginals to joint: Applications to causal inference By Yukitoshi Matsushita; Taisuke Otsu; Keisuke Takahata
  3. Modelling heterogeneous treatment effects by quantile local polynomial decision tree and forest By Lai Xinglin
  4. Dynamic Factor Models with Sparse VAR Idiosyncratic Components By Jonas Krampe; Luca Margaritella
  5. Testing Instrument Validity with Covariates By Thomas Carr; Toru Kitagawa
  6. Efficient Estimation of Average Derivatives in NPIV Models: Simulation Comparisons of Neural Network Estimators By Jiafeng Chen; Xiaohong Chen; Elie Tamer
  7. IV methods for Tobit models By Andrew Chesher; Dongwoo Kim; Adam Rosen
  8. Uniform Convergence for Local Linear Regression Estimation of the Conditional Distribution By Haitian Xie
  9. Estimation based on nearest neighbor matching: from density ratio to average treatment effect By Zhexiao Lin; Peng Ding; Fang Han
  10. A Nonparametric Approach for Studying Teacher Impacts By Mike Gilraine; Jiaying Gu; Robert McMillan
  11. On the Estimation of Cross-Firm Productivity Spillovers with an Application to FDI By Malikov, Emir; Zhao, Shunan
  12. Production Analysis with Asymmetric Noise By Badunenko, Oleg; Henderson, Daniel J.
  13. Location-Scale and Compensated Effects in Unconditional Quantile Regressions By Martinez-Iriarte, Julian; Montes-Rojas, Gabriel; Sun, Yixiao
  14. Volatility and Dependence Models with Applications to U.S. Equity Markets By Pan, Jingwei
  15. Posterior Cramer-Rao Lower Bound based Adaptive State Estimation for Option Price Forecasting By Kumar Yashaswi
  16. Real-Time Forecasting with a (Standard) Mixed-Frequency VAR During a Pandemic By Frank Schorfheide; Dongho Song
  17. Visual Inference and Graphical Representation in Regression Discontinuity Designs By Christina Korting; Carl Lieberman; Jordan Matsudaira; Zhuan Pei; Yi Shen
  18. Robustness, Heterogeneous Treatment Effects and Covariate Shifts By Pietro Emilio Spini
  19. Estimation of inter-sector asset correlations By Christian Meyer
  20. Reinforcing RCTs with Multiple Priors while Learning about External Validity By Frederico Finan; Demian Pouzo
  21. Identification of misreported beliefs By Elias Tsakas
  22. Lassoed Boosting and Linear Prediction in Equities Market By Xiao Huang
  23. The Virtue of Complexity in Machine Learning Portfolios By Bryan T. Kelly; Semyon Malamud; Kangying Zhou

  1. By: Joel L. Horowitz (Institute for Fiscal Studies and Northwestern University); Sokbae (Simon) Lee (Institute for Fiscal Studies and Columbia University)
    Abstract: This paper describes three methods for carrying out non-asymptotic inference on partially identified parameters that are solutions to a class of optimization problems. Applications in which the optimization problems arise include estimation under shape restrictions, estimation of models of discrete games, and estimation based on grouped data. The partially identified parameters are characterized by restrictions that involve the unknown population means of observed random variables in addition to the structural parameters of interest. Inference consists of finding confidence intervals for the structural parameters. Our theory provides finite-sample lower bounds on the coverage probabilities of the confidence intervals under three sets of assumptions of increasing strength. With the moderate sample sizes found in most economics applications, the bounds become tighter as the assumptions strengthen. We discuss estimation of population parameters that the bounds depend on and contrast our methods with alternative methods for obtaining confidence intervals for partially identified parameters. The results of Monte Carlo experiments and empirical examples illustrate the usefulness of our method.
    Date: 2021–08–02
  2. By: Yukitoshi Matsushita; Taisuke Otsu; Keisuke Takahata
    Abstract: In various fields of data science, researchers often face the problem of estimating the ratio of two probability densities. In the context of causal inference in particular, the ratio of the product of the marginal densities of a treatment variable and the covariates to their joint density typically emerges in the process of constructing causal effect estimators. This paper applies the general least squares density ratio estimation methodology of Kanamori, Hido and Sugiyama (2009) to this marginal-to-joint density ratio and demonstrates its usefulness, particularly for causal inference on continuous treatment effects and dose-response curves. The proposed method is illustrated by a simulation study and an empirical example investigating the treatment effect of political advertisements using U.S. presidential campaign data.
    Keywords: density ratio, causal inference, nonparametric estimation
    JEL: C14
    Date: 2022–01
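The closed-form least-squares density-ratio estimator of Kanamori, Hido and Sugiyama (2009) that this paper builds on can be sketched as follows. This is a minimal one-dimensional illustration; the function name, Gaussian kernel basis, number of centers, and regularization defaults are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

def lsif_density_ratio(x_num, x_den, centers=None, sigma=1.0, lam=1e-3):
    """Least-squares density-ratio estimate r(x) ~ p_num(x)/p_den(x)
    using a Gaussian kernel basis (uLSIF-style closed form)."""
    if centers is None:
        centers = x_num[:min(100, len(x_num))]  # basis centers from numerator sample
    def phi(x):
        d2 = (x[:, None] - centers[None, :]) ** 2
        return np.exp(-d2 / (2 * sigma ** 2))
    Phi_den = phi(x_den)
    Phi_num = phi(x_num)
    # minimize 0.5*E_den[r^2] - E_num[r] + 0.5*lam*||alpha||^2 in closed form
    H = Phi_den.T @ Phi_den / len(x_den)
    h = Phi_num.mean(axis=0)
    alpha = np.linalg.solve(H + lam * np.eye(len(centers)), h)
    return lambda x: phi(np.atleast_1d(x)) @ alpha
```

In the paper's setting one would evaluate a fitted ratio of this kind at observed treatment-covariate pairs to weight a causal effect estimator for continuous treatments.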
  3. By: Lai Xinglin
    Abstract: To further develop statistical inference for heterogeneous treatment effects, this paper builds on Breiman's (2001) random forest tree (RFT) and Wager et al.'s (2018) causal tree. We parameterize the nonparametric problem using the statistical properties of classical OLS together with a division into local linear intervals based on covariate quantile points, while preserving the advantages of random forest trees: constructible confidence intervals and asymptotic normality [Athey and Imbens (2016), Efron (2014), Wager et al. (2014)]. We propose a decision tree using quantile classification according to fixed rules combined with polynomial estimation on local samples, which we call the quantile local linear causal tree (QLPRT) and forest (QLPRF).
    Date: 2021–11
  4. By: Jonas Krampe; Luca Margaritella
    Abstract: We reconcile the two worlds of dense and sparse modeling by exploiting the positive aspects of both. We employ a dynamic factor model and assume the idiosyncratic term follows a sparse vector autoregressive model (VAR), which allows for cross-sectional and time dependence. The estimation is articulated in two steps: first, the factors and their loadings are estimated via principal component analysis and, second, the sparse VAR is estimated by regularized regression on the estimated idiosyncratic components. We prove consistency of the proposed estimation approach as the time and cross-sectional dimensions diverge. In the second step, the estimation error of the first step needs to be accounted for. Here, we do not follow the naive approach of simply plugging in the standard rates derived for the factor estimation. Instead, we derive a more refined expression of the error, which enables us to derive tighter rates. We discuss the implications for forecasting and for semi-parametric estimation of the inverse of the spectral density matrix, and we complement our procedure with a joint information criterion for the VAR lag length and the number of factors. The finite sample performance is illustrated by means of an extensive simulation exercise. Empirically, we assess the performance of the proposed method for macroeconomic forecasting using the FRED-MD dataset.
    Date: 2021–12
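The two-step procedure described above can be sketched with off-the-shelf tools. This is a simplified VAR(1) illustration; the penalty level, factor count, and use of scikit-learn's lasso are illustrative assumptions, not the authors' estimator:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import Lasso

def dfm_sparse_var(X, n_factors=2, alpha=0.1):
    """Two-step sketch: (1) PCA factors and loadings, (2) row-wise lasso
    of each idiosyncratic series on all lagged idiosyncratic series,
    yielding a sparse VAR(1) coefficient matrix A."""
    pca = PCA(n_components=n_factors)
    F = pca.fit_transform(X)            # T x r estimated factors
    common = pca.inverse_transform(F)   # T x N common component
    E = X - common                      # idiosyncratic residuals
    Y, Z = E[1:], E[:-1]                # VAR(1) target / lagged regressors
    N = X.shape[1]
    A = np.zeros((N, N))
    for i in range(N):
        A[i] = Lasso(alpha=alpha, fit_intercept=False).fit(Z, Y[:, i]).coef_
    return F, E, A
```

The sparsity pattern of `A` is what distinguishes this from a dense idiosyncratic VAR: most entries shrink exactly to zero under the l1 penalty.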
  5. By: Thomas Carr; Toru Kitagawa
    Abstract: We develop a novel specification test of the instrumental variable identifying assumptions (instrument validity) for heterogeneous treatment effect models with conditioning covariates. Building on the common empirical settings of local average treatment effect and marginal treatment effect analysis, we assume semiparametric dependence between the potential outcomes and conditioning covariates, and show that this allows us to express the testable implications of instrument validity in terms of equality and inequality restrictions among the subdensities of estimable partial residuals. We propose jointly testing these restrictions. To improve the power of the test, we propose distillation, a process designed to reduce the sample down to the information useful for detecting violations of the instrument validity inequalities. We perform Monte Carlo exercises to demonstrate the gain in power from testing restrictions jointly and from distillation. We apply our test procedure to the college proximity instrument of Card (1993), the same-sex instrument of Angrist and Evans (1998), the school leaving age instrument of Oreopoulos (2006), and the mean land gradient instrument of Dinkelman (2011). We find that the null of instrument validity conditional on covariates cannot be rejected for Card (1993) and Dinkelman (2011), but it can be rejected at the 10% level of significance for Angrist and Evans (1998) for some levels of a tuning parameter, and it is rejected at all conventional levels of significance in the case of Oreopoulos (2006).
    Date: 2021–12
  6. By: Jiafeng Chen (Department of Economics, Harvard University); Xiaohong Chen (Cowles Foundation, Yale University); Elie Tamer (Harvard University)
    Abstract: Artificial Neural Networks (ANNs) can be viewed as nonlinear sieves that can approximate complex functions of high dimensional variables more effectively than linear sieves. We investigate the computational performance of various ANNs in nonparametric instrumental variables (NPIV) models of moderately high dimensional covariates that are relevant to empirical economics. We present two efficient procedures for estimation and inference on a weighted average derivative (WAD): an orthogonalized plug-in with optimally-weighted sieve minimum distance (OP-OSMD) procedure and a sieve efficient score (ES) procedure. Both estimators for WAD use ANN sieves to approximate the unknown NPIV function and are root-n asymptotically normal and first-order equivalent. We provide a detailed practitioner's recipe for implementing both efficient procedures. This involves the choice of tuning parameters for the unknown NPIV function, the conditional expectations, and the optimal weighting function that are present in both procedures, as well as the choice of tuning parameters for the unknown Riesz representer in the ES procedure. We compare their finite-sample performances in various simulation designs that involve smooth NPIV functions of up to 13 continuous covariates, different nonlinearities, and covariate correlations. Some Monte Carlo findings include: 1) tuning and optimization are more delicate in ANN estimation; 2) given proper tuning, both ANN estimators with various architectures can perform well; 3) ANN OP-OSMD estimators are easier to tune than ANN ES estimators; 4) stable inference is more difficult to achieve with ANN (than spline) estimators; 5) there are gaps between current implementations and approximation theories. Finally, we apply ANN NPIV to estimate average partial derivatives in two empirical demand examples with multivariate covariates.
    Keywords: Artificial neural networks, ReLU, Sigmoid, Nonparametric instrumental variables, Weighted average derivatives, Optimal sieve minimum distance, Efficient influence, Semiparametric efficiency, Endogenous demand
    JEL: C14 C22
    Date: 2021–12
  7. By: Andrew Chesher (Institute for Fiscal Studies and University College London); Dongwoo Kim (Institute for Fiscal Studies and Simon Fraser University); Adam Rosen (Institute for Fiscal Studies and Duke University)
    Abstract: This paper studies models of processes generating censored outcomes with endogenous explanatory variables and instrumental variable restrictions. Tobit-type left censoring at zero is the primary focus in the exposition. The models studied here are unrestrictive relative to others widely used in practice, so they are relatively robust to misspecification. The models do not specify the process determining endogenous explanatory variables and they do not embody restrictions justifying control function approaches. The models can be partially or point identifying. Identified sets are characterized and it is shown how inference can be performed on scalar functions of partially identified parameters when exogenous variables have rich support. In an application using data on UK household tobacco expenditures inference is conducted on the coefficient of an endogenous total expenditure variable with and without a Gaussian distributional restriction on the unobservable and compared with the results obtained using a point identifying complete triangular model.
    Date: 2021–06–28
  8. By: Haitian Xie
    Abstract: This paper studies the local linear regression (LLR) estimation of the conditional distribution function $F(y|x)$. We derive three uniform convergence results: the uniform bias expansion, the uniform convergence rate, and the uniform asymptotic linear representation. The uniformity of these results holds not only with respect to $x$ but also to $y$, so they are not covered by current developments in the literature on local polynomial regression. Such uniform convergence results are especially useful when the conditional distribution estimator is the first stage of a semiparametric estimator. We demonstrate the usefulness of these uniform results with an example.
    Date: 2021–12
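A local linear estimator of the conditional distribution function of the kind studied here regresses the indicator 1{Y <= y} on (1, X - x) with kernel weights and reads off the intercept. The following minimal sketch (Gaussian kernel, fixed bandwidth, scalar covariate) is an illustration under those assumptions, not the paper's code:

```python
import numpy as np

def llr_cdf(x, y, x0, y0, h):
    """Local linear estimate of F(y0 | x0): kernel-weighted least
    squares of 1{Y <= y0} on (1, X - x0); the intercept is F-hat."""
    w = np.exp(-0.5 * ((x - x0) / h) ** 2)   # Gaussian kernel weights
    Z = np.column_stack([np.ones_like(x), x - x0])
    t = (y <= y0).astype(float)              # indicator response
    Zw = Z * w[:, None]                      # weighted design (avoids n x n matrix)
    beta = np.linalg.solve(Z.T @ Zw, Zw.T @ t)
    return float(np.clip(beta[0], 0.0, 1.0))
```

Unlike the local constant (Nadaraya-Watson) estimator, the linear term removes the leading boundary bias in $x$, which is one reason LLR is the common choice for first-stage conditional distribution estimates.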
  9. By: Zhexiao Lin; Peng Ding; Fang Han
    Abstract: Nearest neighbor (NN) matching as a tool to align data sampled from different groups is both conceptually natural and practically well-used. In a landmark paper, Abadie and Imbens (2006) provided the first large-sample analysis of NN matching under, however, a crucial assumption that the number of NNs, $M$, is fixed. This manuscript reveals something new out of their study and shows that, once allowing $M$ to diverge with the sample size, an intrinsic statistic in their analysis actually constitutes a consistent estimator of the density ratio. Furthermore, through selecting a suitable $M$, this statistic can attain the minimax lower bound of estimation over a Lipschitz density function class. Consequently, with a diverging $M$, the NN matching provably yields a doubly robust estimator of the average treatment effect and is semiparametrically efficient if the density functions are sufficiently smooth and the outcome model is appropriately specified. It can thus be viewed as a precursor of double machine learning estimators.
    Date: 2021–12
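The M-nearest-neighbor matching estimator of the average treatment effect that this analysis centers on can be sketched as follows. This is a simplified illustration (Euclidean matching, no Abadie-Imbens bias correction), not the authors' implementation:

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def nn_matching_ate(X, y, d, M=5):
    """M-nearest-neighbor matching sketch for the ATE: impute each
    unit's missing potential outcome by the average outcome of its
    M nearest covariate neighbors in the opposite treatment group."""
    X1, y1 = X[d == 1], y[d == 1]
    X0, y0 = X[d == 0], y[d == 0]
    nn1 = NearestNeighbors(n_neighbors=M).fit(X1)
    nn0 = NearestNeighbors(n_neighbors=M).fit(X0)
    # kneighbors returns (distances, indices); we only need the indices
    y1_hat = np.where(d == 1, y, y1[nn1.kneighbors(X)[1]].mean(axis=1))
    y0_hat = np.where(d == 0, y, y0[nn0.kneighbors(X)[1]].mean(axis=1))
    return (y1_hat - y0_hat).mean()
```

The paper's point is that letting `M` grow with the sample size turns the internal matching weights into a consistent density-ratio estimate, which is what delivers double robustness.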
  10. By: Mike Gilraine; Jiaying Gu; Robert McMillan
    Abstract: We propose a nonparametric approach for studying the impacts of teachers, built around the distribution of unobserved teacher value-added. Rather than assuming this distribution is normal (as standard), we show it is nonparametrically identified and can be feasibly estimated. The distribution is central to a new nonparametric estimator for individual teacher value-added that we present, and allows us to compute new metrics for assessing teacher-related policies. Simulations indicate our nonparametric approach performs very well, even in moderately-sized samples. We also show applying our approach in practice can make a significant difference to teacher-relevant policy calculations, compared with widely-used parametric estimates.
    Keywords: Teacher Impacts, Teacher Value-Added, Value-Added Distribution, Nonparametric Estimation, Empirical Bayes, Education Policy, Teacher Release Policy, False Discovery Rate
    JEL: C11 H75 I21 J24
    Date: 2022–01–06
  11. By: Malikov, Emir; Zhao, Shunan
    Abstract: We develop a novel methodology for the proxy variable identification of firm productivity in the presence of productivity-modifying learning and spillovers, which facilitates a unified "internally consistent" analysis of the spillover effects between firms. Contrary to the popular two-step empirical approach, ours does not postulate contradictory assumptions about firm productivity across the estimation steps. Instead, we explicitly accommodate cross-sectional dependence in productivity induced by spillovers, which facilitates simultaneous identification of both the productivity and the spillover effects therein. We apply our model to study cross-firm spillovers in China’s electric machinery manufacturing, with a particular focus on the productivity effects of inbound FDI.
    Keywords: Production Economics, Productivity Analysis
    Date: 2022–01
  12. By: Badunenko, Oleg; Henderson, Daniel J.
    Abstract: Symmetric noise is the prevailing assumption in production analysis, but it is often violated in practice. Not only does asymmetric noise cause least-squares models to be inefficient, it can hide important features of the data which may be useful to the firm or policymaker. Here we outline how to introduce asymmetric noise into a production or cost framework as well as develop a model to introduce inefficiency into said models. We derive closed-form solutions for the convolution of the noise and inefficiency distributions, the log-likelihood function, and inefficiency, and we show how to introduce determinants of heteroskedasticity, efficiency and skewness to allow for heterogeneous results. We perform a Monte Carlo study and profile analysis to examine the finite sample performance of the proposed estimators. We outline R and Stata packages that we have developed, and we apply them in three empirical applications to show how our methods lead to improved fit, explain features of the data hidden by assuming symmetry, and still estimate efficiency scores when the least-squares model exhibits the well-known "wrong skewness" problem in production analysis.
    Keywords: asymmetry, production, cost, efficiency, wrong skewness
    JEL: C13 C21 D24 I21
    Date: 2021–12–01
  13. By: Martinez-Iriarte, Julian; Montes-Rojas, Gabriel; Sun, Yixiao
    Abstract: This paper proposes an extension of the unconditional quantile regression analysis to (i) location-scale shifts, and (ii) compensated shifts. The first case is intended to study a counterfactual policy analysis aimed at increasing not only the mean or location of a covariate but also its dispersion or scale. The compensated shift refers to a situation where a shift in a covariate is compensated at a certain rate by another covariate. Not accounting for these possible scale or compensated effects will result in an incorrect assessment of the potential policy effects on the quantiles of an outcome variable. More general interventions and compensated shifts are also considered. The unconditional policy parameters are estimated with simple semiparametric estimators, for which asymptotic properties are studied. Monte Carlo simulations are implemented to study their finite sample performances, and the proposed approach is applied to a Mincer equation to study the effects of a location scale shift in education on the unconditional quantiles of wages.
    Keywords: Social and Behavioral Sciences, Quantile regression, unconditional policy effect, unconditional regression
    Date: 2022–01–10
  14. By: Pan, Jingwei
    Abstract: The dissertation consists of three studies concerning the evaluation of volatility and correlation forecasts as well as the modeling of tail dependence. Based on theoretical discussions and empirical studies, methods for modeling time-varying volatility and dependence in financial market data are evaluated. The first study evaluates volatility forecasts with the basic generalized autoregressive conditional heteroskedasticity (GARCH) model and its asymmetric extensions. The concepts of loss functions and the model confidence set (MCS) are introduced. Realized volatility is used as the benchmark. The main results of Brownlees et al. (2011) can be confirmed and extended. In particular, the one-step forecasts achieve significantly lower average losses than the multi-step forecasts in times of crisis. The difference between the one-step and the multi-step forecasts in pre-crisis times is relatively small. The evaluation results demonstrate the strong forecasting performance of the asymmetric model variants. The second study evaluates multivariate correlation forecasts. The Baba-Engle-Kraft-Kroner (BEKK) model of Engle and Kroner (1995) is compared with the dynamic conditional correlation (DCC) model of Engle (2002). Using a two-stage estimation method, the DCC model is well suited for large correlation matrices. In contrast, the more flexible BEKK model suffers from the curse of dimensionality. The evaluation is based on the class of asymmetric loss functions proposed by Komunjer and Owyang (2012). The results show that the BEKK model cannot predict the correlations better than the simpler DCC model in the trivariate system. The application of the DCC model therefore appears superior. The third study develops a flexible approach that separates the univariate marginal distributions from the joint distribution. Different copula functions are presented and the corresponding tail dependence is calculated. The empirical analysis compares different copula functions with a non-parametric approach and three time-dependent approaches. The results show noticeable reactions of tail dependence to major financial market events. In addition, lower tail dependence dominates over time. This can be interpreted to mean that joint losses occur more frequently than joint gains.
    Date: 2021
  15. By: Kumar Yashaswi
    Abstract: Bayesian filtering has been widely used in mathematical finance, primarily in stochastic volatility models, where it helps in estimating unobserved latent variables from observed market data. This field has seen large developments in recent years because of increased computational power and increased research on model parameter estimation and implied volatility theory. In this paper, we design a novel method to estimate underlying states (volatility and risk) from option prices using Bayesian filtering theory and the Posterior Cramer-Rao Lower Bound (PCRLB), further using it for option price prediction. Several Bayesian filters, such as the Extended Kalman Filter (EKF), Unscented Kalman Filter (UKF), and Particle Filter (PF), are used for latent state estimation of the Black-Scholes model under GARCH model dynamics. We employ an average- and best-case switching strategy for adaptive state estimation of a non-linear, discrete-time state space model (SSM) like Black-Scholes, using a PCRLB-based performance measure to judge the best filter at each time step [1]. Since estimating a closed-form solution of the PCRLB is non-trivial, we employ a particle-filter-based approximation of the PCRLB based on [2]. We test our proposed framework on option data from the S&P 500, estimating the underlying state from real option prices and using it to estimate the theoretical price of the option and to forecast future prices. Our proposed method performs much better than the individual filters used for estimating the underlying state and substantially improves forecasting capabilities.
    Date: 2021–12
  16. By: Frank Schorfheide; Dongho Song
    Abstract: We resuscitated the mixed-frequency vector autoregression (MF-VAR) developed in Schorfheide and Song (2015, JBES) to generate macroeconomic forecasts for the U.S. during the COVID-19 pandemic in real time. The model combines eleven time series observed at two frequencies: quarterly and monthly. We deliberately did not modify the model specification in view of the COVID-19 outbreak, except for the exclusion of crisis observations from the estimation sample. We compare the MF-VAR forecasts to the median forecast from the Survey of Professional Forecasters (SPF). While the MF-VAR performed poorly during 2020:Q2, subsequent forecasts were on par with the SPF forecasts. We show that excluding a few months of extreme observations is a promising way of handling VAR estimation going forward, as an alternative to sophisticated modeling of outliers.
    JEL: C11 C32 C53
    Date: 2021–12
  17. By: Christina Korting; Carl Lieberman; Jordan Matsudaira; Zhuan Pei; Yi Shen
    Abstract: Despite the widespread use of graphs in empirical research, little is known about readers' ability to process the statistical information they are meant to convey ("visual inference"). We study visual inference within the context of regression discontinuity (RD) designs by measuring how accurately readers identify discontinuities in graphs produced from data generating processes calibrated on 11 published papers from leading economics journals. First, we assess the effects of different graphical representation methods on visual inference using randomized experiments. We find that bin widths and fit lines have the largest impacts on whether participants correctly perceive the presence or absence of a discontinuity. Incorporating the experimental results into two decision theoretical criteria adapted from the recent economics literature, we find that using small bins with no fit lines to construct RD graphs performs well and recommend it as a starting point to practitioners. Second, we compare visual inference with widely used econometric inference procedures. We find that visual inference achieves similar or lower type I error rates and complements econometric inference.
    Date: 2021–12
  18. By: Pietro Emilio Spini
    Abstract: This paper studies the robustness of estimated policy effects to changes in the distribution of covariates. Robustness to covariate shifts is important, for example, when evaluating the external validity of quasi-experimental results, which are often used as a benchmark for evidence-based policy-making. I propose a novel scalar robustness metric. This metric measures the magnitude of the smallest covariate shift needed to invalidate a claim on the policy effect (for example, $ATE \geq 0$) supported by the quasi-experimental evidence. My metric links the heterogeneity of policy effects and robustness in a flexible, nonparametric way and does not require functional form assumptions. I cast the estimation of the robustness metric as a de-biased GMM problem. This approach guarantees a parametric convergence rate for the robustness metric while allowing for machine learning-based estimators of policy effect heterogeneity (for example, lasso, random forest, boosting, neural nets). I apply my procedure to the Oregon Health Insurance experiment. I study the robustness of policy effects estimates of health-care utilization and financial strain outcomes, relative to a shift in the distribution of context-specific covariates. Such covariates are likely to differ across US states, making quantification of robustness an important exercise for adoption of the insurance policy in states other than Oregon. I find that the effect on outpatient visits is the most robust among the metrics of health-care utilization considered.
    Date: 2021–12
  19. By: Christian Meyer
    Abstract: Asset correlations are an intuitive and therefore popular way to incorporate event dependence into event risk, e.g., default risk, modeling. In this paper we study the case of estimation of inter-sector asset correlations by separation of cross-sectional dimension and time dimension.
    Date: 2021–11
  20. By: Frederico Finan; Demian Pouzo
    Abstract: This paper presents a framework for how to incorporate prior sources of information into the design of a sequential experiment. This information can come from many sources, including previous experiments, expert opinions, or the experimenter's own introspection. We formalize this problem using a multi-prior Bayesian approach that maps each source to a Bayesian model. These models are aggregated according to their associated posterior probabilities. We evaluate our framework according to three criteria: whether the experimenter learns the parameters of the payoff distributions, the probability that the experimenter chooses the wrong treatment when deciding to stop the experiment, and the average rewards. We show that our framework exhibits several nice finite sample properties, including robustness to any source that is not externally valid.
    Date: 2021–12
  21. By: Elias Tsakas
    Abstract: It is well-known that subjective beliefs cannot be identified with traditional choice data unless we impose the strong assumption that preferences are state-independent. This is seen as one of the biggest pitfalls of incentivized belief elicitation. The two common approaches are either to exogenously assume that preferences are state-independent, or to use intractable elicitation mechanisms that require a great deal of hard-to-get non-traditional choice data. In this paper we take a third approach, introducing a novel methodology that retains the simplicity of standard elicitation mechanisms without imposing the awkward state-independence assumption. The cost is that instead of insisting on full identification of beliefs, we seek identification of misreporting. That is, we elicit beliefs with a standard simple elicitation mechanism, and then, by means of a single additional observation, we can tell whether the reported beliefs deviate from the actual beliefs and, if so, in which direction.
    Date: 2021–12
  22. By: Xiao Huang
    Abstract: We consider a two-stage estimation method for linear regression that uses the lasso in Tibshirani (1996) to screen variables and re-estimate the coefficients using the least-squares boosting method in Friedman (2001) on every set of selected variables. Based on the large-scale simulation experiment in Hastie et al. (2020), the performance of lassoed boosting is found to be as competitive as the relaxed lasso in Meinshausen (2007) and can yield a sparser model under certain scenarios. An application to predict equity returns also shows that lassoed boosting can give the smallest mean square prediction error among all methods under consideration.
    Date: 2021–12
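The two-stage idea (lasso screening followed by least-squares boosting on the selected variables) can be sketched with standard tools. The tree-based booster below and its hyperparameters are illustrative stand-ins under stated assumptions, not the paper's exact specification:

```python
import numpy as np
from sklearn.linear_model import LassoCV
from sklearn.ensemble import GradientBoostingRegressor

def lassoed_boosting(X, y, **boost_kwargs):
    """Two-stage sketch: (1) screen variables with cross-validated
    lasso, (2) re-fit with least-squares boosting on the selected
    columns only, which can yield a sparser final model."""
    lasso = LassoCV(cv=5).fit(X, y)
    keep = np.flatnonzero(lasso.coef_)       # indices of selected variables
    if keep.size == 0:                       # lasso kept nothing: fall back to all
        keep = np.arange(X.shape[1])
    booster = GradientBoostingRegressor(**boost_kwargs).fit(X[:, keep], y)
    return keep, booster
```

The screening stage removes the noise variables the booster would otherwise split on, which is the mechanism behind the sparser models reported in the simulation comparison.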
  23. By: Bryan T. Kelly (Yale SOM; AQR Capital Management, LLC; National Bureau of Economic Research (NBER)); Semyon Malamud (Ecole Polytechnique Federale de Lausanne; Centre for Economic Policy Research (CEPR); Swiss Finance Institute); Kangying Zhou (Yale School of Management)
    Abstract: We theoretically characterize the behavior of machine learning portfolios in the high-complexity regime, i.e., when the number of parameters exceeds the number of observations. We demonstrate a surprising "virtue of complexity": Sharpe ratios of machine learning portfolios generally increase with model parameterization, even with minimal regularization. Empirically, we document the virtue of complexity in US equity market timing strategies. High-complexity models deliver economically large and statistically significant out-of-sample portfolio gains relative to simpler models, due in large part to their remarkable ability to predict recessions.
    Keywords: Portfolio choice, machine learning, random matrix theory, benign overfit, overparameterization
    JEL: C3 C58 C61 G11 G12 G14
    Date: 2021–12

This nep-ecm issue is ©2022 by Sune Karlsson. It is provided as is without any express or implied warranty. It may be freely redistributed in whole or in part for any purpose. If distributed in part, please include this notice.
General information on the NEP project can be found at For comments please write to the director of NEP, Marco Novarese at <>. Put “NEP” in the subject, otherwise your mail may be rejected.
NEP’s infrastructure is sponsored by the School of Economics and Finance of Massey University in New Zealand.