
on Econometrics 
By:  Angus Deaton 
Abstract:  Randomized controlled trials have been used in economics for 50 years, and intensively in economic development for more than 20. There has been a great deal of useful work, but RCTs have no unique advantages or disadvantages over other empirical methods in economics. They do not simplify inference, nor can an RCT establish causality. Many of the difficulties were recognized and explored in economics 30 years ago, but are sometimes forgotten. I review some of the most relevant issues here. The most troubling questions concern ethics, especially when very poor people are experimented on. Finding out what works, even if such a thing is possible, is in itself a deeply inadequate basis for policy. 
JEL:  C01 C93 O22 
Date:  2020–07 
URL:  http://d.repec.org/n?u=RePEc:nbr:nberwo:27600&r=all 
By:  Tien Mai; Patrick Jaillet 
Abstract:  We study the relations between different Markov Decision Process (MDP) frameworks in the machine learning and econometrics literatures, including the standard MDP, the entropy-regularized and general regularized MDP, and the stochastic MDP, where the latter is based on the assumption that the reward function is stochastic and follows a given distribution. We show that the entropy-regularized MDP is equivalent to a stochastic MDP model, and is strictly subsumed by the general regularized MDP. Moreover, we propose a distributional stochastic MDP framework by assuming that the distribution of the reward function is ambiguous. We further show that the distributional stochastic MDP is equivalent to the regularized MDP, in the sense that they always yield the same optimal policies. We also provide a connection between the stochastic/regularized MDP and the constrained MDP. Our work gives a unified view of several important MDP frameworks, which suggests new ways to interpret the (entropy/general) regularized MDP frameworks through the lens of stochastic rewards, and vice versa. Given the recent popularity of regularized MDPs in (deep) reinforcement learning, our work brings new understanding of how such algorithmic schemes work and suggests ideas for developing new ones. 
Date:  2020–08 
URL:  http://d.repec.org/n?u=RePEc:arx:papers:2008.07820&r=all 
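The equivalence discussed above hinges on how entropy regularization changes the Bellman backup: the hard max over actions becomes a temperature-scaled log-sum-exp. A minimal tabular sketch under assumed conventions (the function name, the list-based transition/reward encoding, and the parameter values are illustrative, not from the paper):

```python
import math

def soft_value_iteration(P, R, gamma=0.9, tau=1.0, iters=200):
    """Entropy-regularized value iteration on a tabular MDP.
    P[s][a] is a probability distribution over next states,
    R[s][a] is the reward. The backup uses the soft maximum
    tau * log(sum_a exp(Q(s,a)/tau)) instead of max_a Q(s,a)."""
    n_states = len(R)
    V = [0.0] * n_states
    for _ in range(iters):
        V_new = []
        for s in range(n_states):
            # Q(s,a) = R(s,a) + gamma * E[V(s')]
            q = [R[s][a] + gamma * sum(p * V[sp] for sp, p in enumerate(P[s][a]))
                 for a in range(len(R[s]))]
            # numerically stable soft maximum (log-sum-exp)
            m = max(q)
            V_new.append(m + tau * math.log(sum(math.exp((qi - m) / tau) for qi in q)))
        V = V_new
    return V
```

As tau shrinks toward zero, the soft maximum approaches the standard hard-max backup, which is one way to see the regularized MDP as a smoothed version of the standard one.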
By:  Youssef M. Aboutaleb; Moshe Ben-Akiva; Patrick Jaillet 
Abstract:  This paper introduces a new data-driven methodology for nested logit structure discovery. Nested logit models allow the modeling of positive correlations between the error terms of the utility specifications of the different alternatives in a discrete choice scenario through the specification of a nesting structure. Current nested logit model estimation practices require an a priori specification of a nesting structure by the modeler. In this work, we optimize over all possible specifications of the nested logit model that are consistent with rational utility maximization. We formulate the problem of learning an optimal nesting structure from the data as a mixed-integer nonlinear programming (MINLP) optimization problem and solve it using a variant of the linear outer approximation algorithm. We exploit the tree structure of the problem and utilize the latest advances in integer optimization to bring practical tractability to the optimization problem we introduce. We demonstrate the ability of our algorithm to correctly recover the true nesting structure from synthetic data in a Monte Carlo experiment. In an empirical illustration using a stated preference survey on modes of transportation in the U.S. state of Massachusetts, we use our algorithm to obtain an optimal nesting tree representing the correlations between the unobserved effects of the different travel mode choices. We provide our implementation as a customizable and open-source code base written in the Julia programming language. 
Date:  2020–08 
URL:  http://d.repec.org/n?u=RePEc:arx:papers:2008.08048&r=all 
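As a reference point for the structures the paper searches over, a two-level nested logit's choice probabilities can be sketched as follows. This is a minimal, hypothetical implementation of the standard formula only; the paper's contribution is learning the nest assignments via MINLP, which is not reproduced here:

```python
import math

def nested_logit_probs(utilities, nests, mu):
    """Choice probabilities for a two-level nested logit.
    utilities: dict alternative -> systematic utility
    nests: dict nest -> list of alternatives
    mu: dict nest -> nest scale, with 0 < mu <= 1 required for
    consistency with random utility maximization."""
    # inclusive value (log-sum of scaled utilities) of each nest
    iv = {m: math.log(sum(math.exp(utilities[i] / mu[m]) for i in alts))
          for m, alts in nests.items()}
    denom = sum(math.exp(mu[m] * iv[m]) for m in nests)
    probs = {}
    for m, alts in nests.items():
        p_nest = math.exp(mu[m] * iv[m]) / denom  # marginal P(nest)
        within = sum(math.exp(utilities[i] / mu[m]) for i in alts)
        for i in alts:
            # P(i) = P(nest) * P(i | nest)
            probs[i] = p_nest * math.exp(utilities[i] / mu[m]) / within
    return probs
```

With equal utilities, alternatives sharing a nest with mu < 1 cannibalize each other's probability, which is exactly the correlation pattern the nesting structure encodes.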
By:  Phillip Heiler 
Abstract:  This paper develops an empirical balancing approach for the estimation of treatment effects under two-sided noncompliance using a binary conditionally independent instrumental variable. The method weights both treatment and outcome information with inverse probabilities to produce exact finite-sample balance across instrument-level groups. It is free of functional form assumptions on the outcome or the treatment selection step. By tailoring the loss function for the instrument propensity scores, the resulting treatment effect estimates exhibit both low bias and reduced variance in finite samples compared to conventional inverse probability weighting methods. The estimator is automatically weight-normalized and has bias properties similar to those of conventional two-stage least squares estimation under constant causal effects for the compliers. We provide conditions for asymptotic normality and semiparametric efficiency and demonstrate how to utilize additional information about the treatment selection step for bias reduction in finite samples. The method can easily be combined with regularization or other statistical learning approaches to deal with a high-dimensional number of observed confounding variables. Monte Carlo simulations suggest that the theoretical advantages translate well to finite samples. The method is illustrated in an empirical example. 
Date:  2020–07 
URL:  http://d.repec.org/n?u=RePEc:arx:papers:2007.04346&r=all 
By:  Boyuan Zhang 
Abstract:  In this paper, we estimate and leverage latent constant group structure to generate point, set, and density forecasts for short dynamic panel data. We implement a nonparametric Bayesian approach to simultaneously identify coefficients and group membership in the random effects, which are heterogeneous across groups but fixed within a group. This method allows us to incorporate subjective prior knowledge on the group structure that potentially improves predictive accuracy. In Monte Carlo experiments, we demonstrate that our Bayesian grouped random effects (BGRE) estimators produce accurate estimates and score predictive gains over standard panel data estimators. With a data-driven group structure, the BGRE estimators exhibit clustering accuracy comparable to that of the unsupervised machine learning algorithm k-means and outperform k-means in a two-step procedure. In the empirical analysis, we apply our method to forecast the investment rate across a broad range of firms and illustrate that the estimated latent group structure improves forecasts relative to standard panel data estimators. 
Date:  2020–07 
URL:  http://d.repec.org/n?u=RePEc:arx:papers:2007.02435&r=all 
By:  Lenza, Michele; Primiceri, Giorgio E. 
Abstract:  This paper illustrates how to handle a sequence of extreme observations—such as those recorded during the COVID-19 pandemic—when estimating a Vector Autoregression, which is the most popular time-series model in macroeconomics. Our results show that the ad-hoc strategy of dropping these observations may be acceptable for the purpose of parameter estimation. However, disregarding these recent data is inappropriate for forecasting the future evolution of the economy, because it vastly underestimates uncertainty. JEL Classification: C32, E32, E37, C11 
Keywords:  COVID-19, density forecasts, outliers, volatility 
Date:  2020–08 
URL:  http://d.repec.org/n?u=RePEc:ecb:ecbwps:20202461&r=all 
By:  Xiaoqian Wang; Yanfei Kang; Rob J Hyndman; Feng Li 
Abstract:  Providing forecasts for ultra-long time series plays a vital role in various activities, such as investment decisions, industrial production arrangements, and farm management. This paper develops a novel distributed forecasting framework to tackle the challenges associated with forecasting ultra-long time series by utilizing the industry-standard MapReduce framework. The proposed model combination approach facilitates distributed time series forecasting by combining the local estimators of ARIMA (AutoRegressive Integrated Moving Average) models delivered by worker nodes and minimizing a global loss function. In this way, instead of unrealistically assuming that the data generating process (DGP) of an ultra-long time series stays invariant, we make assumptions only on the DGP of subseries spanning shorter time periods. We investigate the performance of the proposed distributed ARIMA models on an electricity demand dataset. Compared to ARIMA models, our approach results in significantly improved forecasting accuracy and computational efficiency in both point forecasts and prediction intervals, especially for longer forecast horizons. Moreover, we explore some potential factors that may affect the forecasting performance of our approach. 
Keywords:  ultra-long time series, distributed forecasting, ARIMA models, least squares approximation, MapReduce 
Date:  2020 
URL:  http://d.repec.org/n?u=RePEc:msh:ebswps:202029&r=all 
By:  Magne Mogstad; Alexander Torgovitsky; Christopher R. Walters 
Abstract:  Marginal treatment effect methods are widely used for causal inference and policy evaluation with instrumental variables. However, they fundamentally rely on the well-known monotonicity (threshold-crossing) condition on treatment choice behavior. Recent research has shown that this condition cannot hold with multiple instruments unless treatment choice is effectively homogeneous. Based on these findings, we develop a new marginal treatment effect framework under a weaker, partial monotonicity condition. The partial monotonicity condition is implied by standard choice theory and allows for rich heterogeneity even in the presence of multiple instruments. The new framework can be viewed as having multiple different choice models for the same observed treatment variable, all of which must be consistent with the data and with each other. Using this framework, we develop a methodology for partial identification of clearly stated, policy-relevant target parameters while allowing for a wide variety of nonparametric shape restrictions and parametric functional form assumptions. We show how the methodology can be used to combine multiple instruments to yield more informative empirical conclusions than one would obtain by using each instrument separately. The methodology provides a blueprint for extracting and aggregating information about treatment effects from multiple controlled or natural experiments while still allowing for rich heterogeneity in both treatment effects and choice behavior. 
JEL:  C01 C1 C26 C31 
Date:  2020–07 
URL:  http://d.repec.org/n?u=RePEc:nbr:nberwo:27546&r=all 
By:  Marco Battaglini; Forrest W. Crawford; Eleonora Patacchini; Sida Peng 
Abstract:  In this paper, we propose a new approach to the estimation of social networks and apply it to the estimation of productivity spillovers in the U.S. Congress. Social networks such as the social connections among lawmakers are generally not directly observed; they can be recovered only from the observable outcomes that they help determine (such as, for example, the legislators’ effectiveness). Moreover, they are typically stable for relatively short periods of time, thus generating only short panels of observations. Our estimator has three appealing properties that allow it to work in these environments. First, it is constructed for “small” asymptotics, thus requiring only short panels of observations. Second, it requires relatively nonrestrictive sparsity assumptions for identification, and is thus applicable to dense networks with (potentially) star-shaped connections. Third, it allows for heterogeneous common shocks across subnetworks. The application to the U.S. Congress gives us new insights into the nature of social interactions among lawmakers. We estimate a significant decrease over time in the importance of productivity spillovers among individual lawmakers, compensated by an increase over time in the party-level common shock. This suggests that the rise of partisanship affects not only the ideological positions of legislators when they vote, but also, more generally, how lawmakers collaborate in the U.S. Congress. 
JEL:  D7 D72 D85 
Date:  2020–07 
URL:  http://d.repec.org/n?u=RePEc:nbr:nberwo:27557&r=all 
By:  Jacobs, B.J.D.; Fok, D.; Donkers, A.C.D. 
Abstract:  In modern retail contexts, retailers sell products from vast product assortments to a large and heterogeneous customer base. Understanding purchase behavior in such a context is very important. Standard models cannot be used due to the high dimensionality of the data. We propose a new model that creates an efficient dimension reduction through the idea of purchase motivations. We only require customer-level purchase history data, which is ubiquitous in modern retailing. The model handles large-scale data and even works in settings with shopping trips consisting of few purchases. As scalability of the model is essential for practical applicability, we develop a fast, custom-made inference algorithm based on variational inference. Essential features of our model are that it accounts for the product, customer, and time dimensions present in purchase history data; relates the relevance of motivations to customer and shopping-trip characteristics; captures interdependencies between motivations; and achieves superior predictive performance. Estimation results from this comprehensive model provide deep insights into purchase behavior. Such insights can be used by managers to create more intuitive, better informed, and more effective marketing actions. We illustrate the model using purchase history data from a Fortune 500 retailer involving more than 4,000 unique products. 
Keywords:  dynamic purchase behavior, large-scale assortment, purchase history data, topic model, machine learning, variational inference 
Date:  2020–08–01 
URL:  http://d.repec.org/n?u=RePEc:ems:eureri:129674&r=all 
By:  Brantly Callaway 
Abstract:  This paper develops new techniques to bound distributional treatment effect parameters that depend on the joint distribution of potential outcomes, an object not identified by standard identifying assumptions such as selection on observables, or even when treatment is randomly assigned. I show that panel data and an additional assumption on the dependence between untreated potential outcomes for the treated group over time (i) provide more identifying power for distributional treatment effect parameters than existing bounds and (ii) rest on a more plausible set of conditions than existing methods that obtain point identification. I apply these bounds to study heterogeneity in the effect of job displacement during the Great Recession. Using standard techniques, I find that workers who were displaced during the Great Recession lost on average 34% of their earnings relative to their counterfactual earnings had they not been displaced. Using the methods developed in the current paper, I also show that this average effect masks substantial heterogeneity across workers. 
Date:  2020–08 
URL:  http://d.repec.org/n?u=RePEc:arx:papers:2008.08117&r=all 
By:  Neil Shephard 
Abstract:  Estimating linear regression using least squares and reporting robust standard errors is very common in financial economics, and indeed, much of the social sciences and elsewhere. For thick-tailed predictors under heteroskedasticity, this recipe for inference performs poorly, sometimes dramatically so. Here, we develop an alternative approach that delivers an unbiased, consistent, and asymptotically normal estimator so long as the means of the outcome and predictors are finite. The new method has standard errors under heteroskedasticity that are easy to estimate reliably and tests that are close to their nominal size. The procedure works well in simulations and in an empirical exercise. An extension is given to quantile regression. 
Date:  2020–08 
URL:  http://d.repec.org/n?u=RePEc:arx:papers:2008.06130&r=all 
By:  Jules Sadefo Kamdem (MRE - Montpellier Recherche en Economie - UM - Université de Montpellier); Babel Raïssa Guemdjo Kamdem (IMSP - Institut de Mathématiques et de Sciences Physiques [Bénin] (Université d'Abomey-Calavi (UAC))); Carlos Ougouyandjou 
Abstract:  The main purpose of this work is to contribute to the study of set-valued random variables by providing a kind of Wold decomposition theorem for interval-valued processes. As the set of set-valued random variables is not a vector space, the Wold decomposition theorem as established in 1938 by Herman Wold is not applicable to them. So, a notion of pseudo-vector space is introduced and used to establish a generalization of the Wold decomposition theorem that works for interval-valued covariance-stationary time series processes. Before this, the set-valued autoregressive moving average (SARMA) time series process is defined by taking into account an arithmetical difference between random sets and random real variables. 
Keywords:  Wold decomposition, stationary time series, interval-valued time series processes, ARMA model 
Date:  2020 
URL:  http://d.repec.org/n?u=RePEc:hal:journl:hal02901595&r=all 
By:  JeanYves Pitarakis 
Abstract:  We introduce a new approach for comparing the predictive accuracy of two nested models that bypasses the difficulties caused by the degeneracy of the asymptotic variance of the forecast error loss differentials used in the construction of commonly used predictive comparison statistics. Our approach continues to rely on the out-of-sample MSE loss differentials between the two competing models, leads to nuisance-parameter-free Gaussian asymptotics, and is shown to remain valid under flexible assumptions that can accommodate heteroskedasticity and the presence of mixed predictors (e.g., stationary and local to unit root). A local power analysis also establishes its ability to detect departures from the null in both stationary and persistent settings. Simulations calibrated to common economic and financial applications indicate that our methods have strong power with good size control across commonly encountered sample sizes. 
Date:  2020–08 
URL:  http://d.repec.org/n?u=RePEc:arx:papers:2008.08387&r=all 
By:  Pierre-Olivier Goffard (UCBL - Université Claude Bernard Lyon 1 - Université de Lyon, ISFA - Institut de Science Financière et d'Assurances, LSAF - Laboratoire de Sciences Actuarielles et Financières - EA2429 - ISFA - Institut de Science Financière et d'Assurances); Patrick Laub (UCBL - Université Claude Bernard Lyon 1 - Université de Lyon, ISFA - Institut de Science Financière et d'Assurances, LSAF - Laboratoire de Sciences Actuarielles et Financière - ISFA - Institut de Science Financière et d'Assurances) 
Abstract:  Approximate Bayesian Computation (ABC) is a statistical learning technique to calibrate and select models by comparing observed data to simulated data. This technique bypasses the use of the likelihood and requires only the ability to generate synthetic data from the models of interest. We apply ABC to fit and compare insurance loss models using aggregated data. Along the way, we show how to use ABC for the more common claim counts and claim sizes data. A state-of-the-art ABC implementation in Python is proposed. It uses sequential Monte Carlo to sample from the posterior distribution and the Wasserstein distance to compare the observed and synthetic data. MSC 2010: 60G55, 60G40, 12E10. 
Keywords:  Bayesian statistics, approximate Bayesian computation, likelihood-free inference, risk management 
Date:  2020–07–06 
URL:  http://d.repec.org/n?u=RePEc:hal:wpaper:hal02891046&r=all 
By:  Aknouche, Abdelhakim; Francq, Christian 
Abstract:  A general Markov-switching autoregressive conditional mean model, valued in the set of nonnegative numbers, is considered. The conditional distribution of this model is a finite mixture of nonnegative distributions whose conditional mean follows a GARCH-like dynamics with parameters depending on the state of a Markov chain. Three different variants of the model are examined, depending on how the lagged values of the mixing variable are integrated into the conditional mean equation. The model includes, in particular, Markov mixture versions of various well-known nonnegative time series models such as the autoregressive conditional duration (ACD) model, the integer-valued GARCH (INGARCH) model, and the Beta observation-driven model. Under contraction-in-mean conditions, it is shown that the three variants of the model are stationary and ergodic when the stochastic order and the mean order of the mixing distributions are equal. The proposed conditions match those already known for Markov-switching GARCH models. We also give conditions for finite marginal moments. Applications to various mixture and Markov mixture count, duration, and proportion models are provided. 
Keywords:  Autoregressive Conditional Duration, count time series models, finite mixture models, ergodicity, integer-valued GARCH, Markov mixture models. 
JEL:  C10 C18 C22 C25 
Date:  2020–08–18 
URL:  http://d.repec.org/n?u=RePEc:pra:mprapa:102503&r=all 
By:  Eric Benhamou (LAMSADE - Laboratoire d'analyse et modélisation de systèmes pour l'aide à la décision - Université Paris Dauphine-PSL - CNRS - Centre National de la Recherche Scientifique); David Saltiel (ULCO - Université du Littoral Côte d'Opale); Beatrice Guez; Nicolas Paris (CRIL - Centre de Recherche en Informatique de Lens - UA - Université d'Artois - CNRS - Centre National de la Recherche Scientifique) 
Abstract:  The Sharpe ratio (sometimes also referred to as the information ratio) is widely used in asset management to compare and benchmark funds and asset managers. It computes the ratio of the (excess) net return to the strategy's standard deviation. However, the elements needed to compute the Sharpe ratio, namely the expected returns and the volatilities, are unknown and must be estimated statistically. This means that the Sharpe ratio used by funds is likely to be error-prone because of statistical estimation errors. In this paper, we provide various tests to measure the quality of Sharpe ratios. By quality, we mean measuring whether a manager was indeed lucky or skillful. The test assesses this through the statistical significance of the Sharpe ratio. We not only look at the traditional Sharpe ratio but also compute a modified Sharpe ratio that is insensitive to the capital used. We provide various statistical tests that can be used to precisely quantify whether the Sharpe ratio is statistically significant. In particular, we illustrate the number of trades that provides statistical significance for a given Sharpe level, as well as the impact of autocorrelation, by providing reference tables giving the minimum required Sharpe ratio for a given time period and correlation. We also provide, for Sharpe ratios of 0.5, 1.0, 1.5, and 2.0, the skill percentage given the autocorrelation level. JEL classification: C12, G11. 
Keywords:  Sharpe ratio, Student distribution, compounding effect on Sharpe, Wald test, t-test, chi-square test 
Date:  2020–07–01 
URL:  http://d.repec.org/n?u=RePEc:hal:wpaper:hal02886500&r=all 
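The basic significance test behind this line of work can be sketched as follows. This is the simplified i.i.d.-returns version only (the paper also handles autocorrelation and a capital-insensitive modification, which are not reproduced here), and the function names are illustrative:

```python
import math

def sharpe_ratio(returns, rf=0.0):
    """Per-period Sharpe ratio: mean excess return over the sample
    standard deviation (with the n-1 denominator)."""
    n = len(returns)
    excess = [r - rf for r in returns]
    mu = sum(excess) / n
    sd = math.sqrt(sum((r - mu) ** 2 for r in excess) / (n - 1))
    return mu / sd

def sharpe_t_stat(returns, rf=0.0):
    """t-statistic for H0: Sharpe = 0, using the approximation
    t ≈ SR * sqrt(n). Under H0 with i.i.d. returns this is
    approximately Student-t with n-1 degrees of freedom."""
    return sharpe_ratio(returns, rf) * math.sqrt(len(returns))
```

A quick consequence of t ≈ SR·√n: a per-period Sharpe of 0.1 needs roughly (1.96/0.1)² ≈ 384 periods to be significant at the 5% level, which is the flavor of the reference tables the abstract describes.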
By:  Andres Sagner 
Abstract:  In this paper, I develop a method that extends quantile regressions to high-dimensional factor analysis. In this context, the quantile function of a panel of variables with cross-section and time-series dimensions N and T is endowed with a factor structure. Thus, both factors and factor loadings are allowed to be quantile-specific. I provide a set of conditions under which these objects are identified, and I propose a simple two-step iterative procedure called Quantile Principal Components (QPC) to estimate them. Uniform consistency of the estimators is established under general assumptions when N, T → ∞ jointly. Lastly, under certain additional assumptions related to the density of the observations around the quantile of interest, and the relationship between N and T, I show that the QPC estimators are asymptotically normal with convergence rates similar to the ones derived in the traditional factor analysis literature. Monte Carlo simulations confirm the good performance of the QPC procedure, especially in nonlinear environments or when the factors affect higher moments of the observable variables, and suggest that the proposed theory provides a good approximation to the finite-sample distribution of the QPC estimators. 
Date:  2020–08 
URL:  http://d.repec.org/n?u=RePEc:chb:bcchwp:886&r=all 
By:  Mynbayev, Kairat; Darkenbayeva, Gulsim 
Abstract:  Central limit theorems deal with convergence in distribution of sums of random variables. The usual approach is to normalize the sums to have variance equal to 1. As a result, the limit distribution has variance one. In most papers, existence of the limit of the normalizing factor is postulated and the limit itself is not studied. Here we review some results which focus on the study of the normalizing factor. Applications are indicated. 
Keywords:  Central limit theorems, convergence in distribution, limit distribution, variance 
JEL:  C02 C10 C40 
Date:  2019 
URL:  http://d.repec.org/n?u=RePEc:pra:mprapa:101685&r=all 
By:  Long Feng; Tiefeng Jiang; Binghui Liu; Wei Xiong 
Abstract:  We consider a testing problem for cross-sectional dependence in high-dimensional panel data, where the number of cross-sectional units is potentially much larger than the number of observations. The cross-sectional dependence is described through a linear regression model. We study three tests, named the sum test, the max test, and the max-sum test, where the latter two are new. The sum test was initially proposed by Breusch and Pagan (1980). We design the max and sum tests for sparse and non-sparse residuals in the linear regressions, respectively, and the max-sum test is devised to compromise between both situations. Indeed, our simulations show that the max-sum test outperforms the other two tests. This makes the max-sum test very useful in practice, where it is usually unclear whether a dataset is sparse. In the theoretical analysis of the three tests, we settle two conjectures regarding the sum of squares of sample correlation coefficients posed by Pesaran (2004, 2008). In addition, we establish the asymptotic theory for the maxima of sample correlation coefficients appearing in the linear regression model for panel data, which to our knowledge is the first successful attempt to do so. To study the max-sum test, we create a novel method to show asymptotic independence between maxima and sums of dependent random variables. We expect the method itself to be useful for other problems of this nature. Finally, an extensive simulation study as well as a case study are carried out. They demonstrate the advantages of our proposed methods in terms of both empirical power and robustness, regardless of residual sparsity. 
Date:  2020–07 
URL:  http://d.repec.org/n?u=RePEc:arx:papers:2007.03911&r=all 
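The two building blocks the abstract contrasts can be sketched directly: the sum-type Breusch-Pagan LM statistic aggregates all squared pairwise residual correlations, while the max-type statistic looks only at the largest one. The max-sum combination and its asymptotics are the paper's contribution and are not shown; this minimal implementation just computes the two raw statistics from per-unit residual series:

```python
def corr(a, b):
    """Pearson sample correlation of two equal-length series."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5

def cd_statistics(residuals):
    """Sum- and max-type cross-sectional dependence statistics.
    residuals: list of N per-unit series, each of length T.
    Returns (LM, max_abs) where LM = T * sum_{i<j} r_ij^2 is the
    Breusch-Pagan statistic and max_abs = max_{i<j} |r_ij|."""
    N, T = len(residuals), len(residuals[0])
    rs = [corr(residuals[i], residuals[j])
          for i in range(N) for j in range(i + 1, N)]
    return T * sum(r * r for r in rs), max(abs(r) for r in rs)
```

Under the null of no dependence, the LM statistic is asymptotically chi-square with N(N-1)/2 degrees of freedom, which is why it struggles when N is much larger than T; the max statistic targets sparse alternatives instead.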
By:  Georges Sfeir; Maya AbouZeid; Filipe Rodrigues; Francisco Camara Pereira; Isam Kaysi 
Abstract:  This study presents a semi-nonparametric Latent Class Choice Model (LCCM) with a flexible class membership component. The proposed model formulates the latent classes using mixture models as an alternative approach to the traditional random utility specification, with the aim of comparing the two approaches on various measures, including prediction accuracy and representation of heterogeneity in the choice process. Mixture models are parametric model-based clustering techniques that have been widely used in areas such as machine learning, data mining, and pattern recognition for clustering and classification problems. An Expectation-Maximization (EM) algorithm is derived for the estimation of the proposed model. Using two different case studies on travel mode choice behavior, the proposed model is compared to traditional discrete choice models on the basis of parameter estimates' signs, value of time, statistical goodness-of-fit measures, and cross-validation tests. Results show that mixture models improve the overall performance of latent class choice models by providing better out-of-sample prediction accuracy in addition to better representations of heterogeneity, without weakening the behavioral and economic interpretability of the choice models. 
Date:  2020–07 
URL:  http://d.repec.org/n?u=RePEc:arx:papers:2007.02739&r=all 
By:  Andrés Ramírez-Hassan; Raquel Vargas-Correa; Gustavo García; Daniel Londoño 
Abstract:  We propose a simple approach to optimally selecting the number of control units in the k nearest neighbors (k-NN) algorithm, focusing on minimizing the mean squared error of the average treatment effects. Our approach is nonparametric, and confidence intervals for the treatment effects are calculated using asymptotic results with bias correction. Simulation exercises show that our approach achieves relatively small mean squared errors and a balance between confidence interval length and type I error. We analyze the average treatment effect on the treated (ATET) of participation in 401(k) plans on accumulated net financial assets, confirming significant effects on both the amount of net assets and the probability of positive net assets. Our optimal k selection produces significantly narrower ATET confidence intervals compared with the common practice of using k=1. 
Date:  2020–08 
URL:  http://d.repec.org/n?u=RePEc:arx:papers:2008.06564&r=all 
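The k-NN matching estimator that the paper tunes can be sketched as follows for a scalar covariate. This is only the basic estimator with a fixed k; the paper's contribution, selecting k to minimize the estimated MSE, is not reproduced, and the function name and data layout are illustrative:

```python
def knn_atet(treated, controls, k):
    """Matching estimator of the ATET: for each treated unit (x, y),
    find the k nearest control units by covariate distance and compare
    y with the average matched control outcome. Units are (x, y) pairs
    with a scalar covariate x."""
    effects = []
    for x_t, y_t in treated:
        nearest = sorted(controls, key=lambda c: abs(c[0] - x_t))[:k]
        y_hat = sum(y for _, y in nearest) / k
        effects.append(y_t - y_hat)
    return sum(effects) / len(effects)
```

The bias-variance trade-off the paper formalizes is visible here: larger k averages over more (and more distant) controls, lowering variance but raising bias, which is why a data-driven choice can beat the common default of k=1.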
By:  Jie Fang; Jianwu Lin; Shutao Xia; Yong Jiang; Zhikang Xia; Xiang Liu 
Abstract:  Instead of conducting manual factor construction based on traditional and behavioural finance analysis, academic researchers and quantitative investment managers have in recent years leveraged Genetic Programming (GP) as an automatic feature construction tool, which builds reverse Polish mathematical expressions from trading data into new factors. However, with the development of deep learning, more powerful feature extraction tools are available. This paper proposes Neural Network-based Automatic Factor Construction (NNAFC), a tailored neural network framework that can automatically construct diversified financial factors based on financial domain knowledge and a variety of neural network structures. The experimental results show that NNAFC can construct more informative and diversified factors than GP, effectively enriching the current factor pool. For the current market, both fully connected and recurrent neural network structures are better at extracting information from financial time series than convolutional neural network structures. Moreover, new factors constructed by NNAFC can always improve the return, Sharpe ratio, and maximum drawdown of a multi-factor quantitative investment strategy, because they introduce more information and diversification to the existing factor pool. 
Date:  2020–08 
URL:  http://d.repec.org/n?u=RePEc:arx:papers:2008.06225&r=all 
By:  Sookyo Jeong; Hongseok Namkoong 
Abstract:  We propose the worst-case treatment effect (WTE) across all subpopulations of a given size, a conservative notion of top-line treatment effect. Compared to the average treatment effect (ATE), which relies solely on the covariate distribution of the collected data, the WTE is robust to unanticipated covariate shifts and ensures that positive findings guarantee uniformly valid treatment effects over underrepresented minority groups. We develop a semiparametrically efficient estimator for the WTE, leveraging machine learning-based estimates of heterogeneous treatment effects and propensity scores. By virtue of satisfying a key (Neyman) orthogonality property, our estimator enjoys central limit behavior at oracle rates (as if the true nuisance parameters were known), even when the estimates of the nuisance parameters converge at slower rates. For both observational and randomized studies, we prove that our estimator achieves the optimal asymptotic variance by establishing a semiparametric efficiency lower bound. On real datasets where robustness to covariate shift is of core concern, we illustrate the non-robustness of the ATE under even mild distributional shift and demonstrate that the WTE guards against brittle findings that are invalidated by unanticipated covariate shifts. 
Date:  2020–07 
URL:  http://d.repec.org/n?u=RePEc:arx:papers:2007.02411&r=all 
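The population functional behind the WTE can be illustrated with a naive plug-in: the worst average effect over any subpopulation containing a fraction alpha of the units is the mean of the alpha-fraction of smallest individual effects (a CVaR-type quantity). This sketch assumes unit-level effect estimates are already in hand; the paper's actual estimator is the semiparametrically efficient one built on estimated nuisance functions, not this plug-in:

```python
def worst_case_effect(unit_effects, alpha):
    """Worst-case average effect over any subpopulation of fraction
    alpha: the mean of the floor(alpha * n) smallest estimated unit
    effects (at least one unit). A CVaR-type lower tail average."""
    k = max(1, int(alpha * len(unit_effects)))
    worst = sorted(unit_effects)[:k]
    return sum(worst) / k
```

Because the WTE averages only the worst-off subpopulation, it is always weakly below the ATE, which is exactly the conservatism the abstract argues guards against brittle findings under covariate shift.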
By:  Philippe Goulet Coulombe 
Abstract:  It is notoriously hard to build a bad Random Forest (RF). Concurrently, RF is perhaps the only standard ML algorithm that blatantly overfits in-sample without any consequence out-of-sample. Standard arguments cannot rationalize this paradox. I propose a new explanation: bootstrap aggregation and model perturbation as implemented by RF automatically prune a (latent) true underlying tree. More generally, there is no need to tune the stopping point of a properly randomized ensemble of greedily optimized base learners; thus, Boosting and MARS are also eligible. I demonstrate this property empirically with simulations and real data, reporting that these new ensembles yield performance equivalent to that of their tuned counterparts. 
Date:  2020–08 
URL:  http://d.repec.org/n?u=RePEc:arx:papers:2008.07063&r=all 