nep-cmp New Economics Papers
on Computational Economics
Issue of 2021‒03‒08
twenty-one papers chosen by

  1. Surrogate Models for Optimization of Dynamical Systems By Khowaja, Kainat; Shcherbatyy, Mykhaylo; Härdle, Wolfgang Karl
  2. Will urban air mobility fly? The efficiency and distributional impacts of UAM in different urban spatial structures By Anna Straubinger; Erik T. Verhoef; Henri L.F. de Groot
  3. Assessment of AIR Quality Index for Delhi region: A comparison between odd-even policy 2019 and Lock Down Period. By Dhingra, Chesta
  4. History-Augmented Collaborative Filtering for Financial Recommendations By Baptiste Barreau; Laurent Carlier
  5. New predictor-corrector interior-point algorithm for symmetric cone horizontal linear complementarity problems By Darvay, Zsolt; Rigó, Petra Renáta
  6. Gender Distribution across Topics in Top 5 Economics Journals: A Machine Learning Approach By J. Ignacio Conde-Ruiz; Juan-José Ganuza; Manu García; Luis A. Puch
  7. Detecting possibly frequent change-points: Wild Binary Segmentation 2 and steepest-drop model selection By Fryzlewicz, Piotr
  8. Estimation of Heuristic Switching in Behavioral Macroeconomic Models By Kukacka, Jiri; Sacht, Stephen
  9. Monte-Carlo-Evaluation von Instrumentenvariablenschätzern By Auer, Benjamin R.; Rottmann, Horst
  10. Can Machine Learning Catch the COVID-19 Recession? By Philippe Goulet Coulombe; Massimiliano Marcellino; Dalibor Stevanovic
  11. Universal Independence Income. A EUROMOD Utopian Simulation in the UK By Bonomi Bezzo, Franco
  12. Dissimilarity effects on house prices: What is the value of similar neighbours? By Bonakdar, Said Benjamin; Roos, Michael W. M.
  13. Investor Confidence and Forecastability of US Stock Market Realized Volatility : Evidence from Machine Learning By Rangan Gupta; Jacobus Nel; Christian Pierdzioch
  14. Machine Learning and Oil Price Point and Density Forecasting By Alexandre Bonnet R. Costa; Pedro Cavalcanti G. Ferreira; Wagner P. Gaglianone; Osmani Teixeira C. Guillén; João Victor Issler; Yihao Lin
  15. Do gender wage differences within households influence women's empowerment and welfare?: Evidence from Ghana By Michael Danquah; Abdul Malik Iddrisu; Ernest Owusu Boakye; Solomon Owusu
  16. Do Words Hurt More Than Actions? The Impact of Trade Tensions on Financial Markets By Massimo Ferrari; Frederik Kurcz; Maria Sole Pagliari
  17. The Gender Pay Gap Revisited with Big Data: Do Methodological Choices Matter? By Strittmatter, Anthony; Wunsch, Conny
  18. Using machine learning for measuring democracy: An update By Gründler, Klaus; Krieger, Tommy
  19. Machine Learning and Credit Risk: Empirical Evidence from SMEs By Alessandro Bitetto; Paola Cerchiello; Stefano Filomeni; Alessandra Tanda; Barbara Tarantino
  20. Risk & Returns around Fomc Press Conferences: A Novel Perspective from Computer Vision By Alexis Marchal
  21. Cross-Fitting and Averaging for Machine Learning Estimation of Heterogeneous Treatment Effects By Jacob, Daniel

  1. By: Khowaja, Kainat; Shcherbatyy, Mykhaylo; Härdle, Wolfgang Karl
    Abstract: Driven by the increased complexity of dynamical systems, the solution of systems of differential equations through numerical simulation in optimization problems has become computationally expensive. This paper provides a smart data-driven mechanism to construct low-dimensional surrogate models. These surrogate models reduce the computational time for the solution of complex optimization problems by using training instances derived from evaluations of the true objective functions. The surrogate models are constructed using a combination of proper orthogonal decomposition and radial basis functions and provide system responses by simple matrix multiplication. Using relative maximum absolute error as the measure of approximation accuracy, it is shown that surrogate models with Latin hypercube sampling and spline radial basis functions dominate variable order methods in computational time of optimization while preserving accuracy. These surrogate models also show robustness in the presence of model non-linearities. Therefore, these computationally efficient predictive surrogate models are applicable in various fields, specifically to solve inverse problems and optimal control problems, some examples of which are demonstrated in this paper.
    Keywords: Proper Orthogonal Decomposition,SVD,Radial Basis Functions,Optimization,Surrogate Models,Smart Data Analytics,Parameter Estimation
    JEL: C00
    Date: 2021
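The abstract of item 1 mentions Latin hypercube sampling as the way training instances are drawn. As a rough illustration only (not the authors' implementation), the sketch below draws such a design on the unit cube; the function name `latin_hypercube` and the sample sizes are hypothetical choices made for this example.

```python
import random

def latin_hypercube(n_samples, n_dims, rng):
    """Return n_samples points in [0, 1)^n_dims, one per stratum in every dimension."""
    columns = []
    for _ in range(n_dims):
        # Split [0, 1) into n_samples equal strata and draw one value inside each.
        strata = [(k + rng.random()) / n_samples for k in range(n_samples)]
        # Shuffle so that strata are paired at random across dimensions.
        rng.shuffle(strata)
        columns.append(strata)
    # Transpose the per-dimension columns into a list of points.
    return [tuple(col[i] for col in columns) for i in range(n_samples)]

rng = random.Random(0)                 # fixed seed, for reproducibility only
points = latin_hypercube(5, 2, rng)    # 5 design points in 2 dimensions
```

Each of the five equal-width bins along either axis then contains exactly one point, which is the property that distinguishes this design from plain uniform sampling.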
  2. By: Anna Straubinger (Vrije Universiteit Amsterdam); Erik T. Verhoef (Vrije Universiteit Amsterdam); Henri L.F. de Groot (Vrije Universiteit Amsterdam)
    Abstract: Recent technological developments open up possibilities for introducing a vast number of novel mobility concepts in urban environments. One of these new concepts is urban air mobility (UAM). It makes use of passenger drones for on-demand transport in urban settings, promising high travel speeds for those willing and able to pay. This research aims to answer the question of how benefits from UAM will be distributed, taking into account the spatial dimension and the differential impacts on low- and high-skilled households. We develop a framework that can more generally be used to assess the welfare impacts resulting from the introduction of novel transport modes. The development of an urban spatial computable general equilibrium model building on the polycentric modelling tradition developed by Anas and co-authors allows for an analysis of mutually dependent effects on the land, labour and product markets, triggered by changes on the transport market. Allowing for an endogenous spatial structure through the introduction of agglomeration effects and an amenity-based approach, the framework investigates the relevance of the initial spatial structure for the impact of the introduction of UAM. Incorporating different skill levels of households allows us to assess location choice and travel behaviour for households with different characteristics. A numerical simulation of the model shows that the different initial spatial structures impose comparable welfare changes. Variations in UAM features like marginal cost, prices, land demand for infrastructure, vertical travel speed and access and egress times have a (much) more decisive impact on modal choice and welfare effects than the initial urban structure. Simulations show that considering households of different skill levels brings additional insights, as welfare effects of UAM introduction strongly differ between groups and sometimes even go in opposing directions.
    Keywords: Urban air mobility, spatial equilibrium, welfare effects, agglomeration effects
    JEL: R13 R41 C68
    Date: 2021–02–24
  3. By: Dhingra, Chesta
    Abstract: This research analyses the impact of the odd-even policy and the lockdown on the air quality index of Delhi through a case study of four regions: Ashok Vihar, Anand Vihar, Dwarka and R.K. Puram. The data were collected from DPCC, and the main parameters examined are PM10 and PM2.5. We find that the highest levels of PM10 and PM2.5 were observed during the implementation of the odd-even policy in 2019 (4 November 2019 to 15 November 2019), whereas a large decline in pollutant levels was detected during the lockdown period (23 March 2020 to 31 August 2020). We further relate these findings to the spatial variations of the Delhi region and find that meteorological parameters (ambient temperature, relative humidity, wind speed and solar radiation), together with seasonal variations, have a major influence on PM10 and PM2.5 levels. During the winter season both PM10 and PM2.5 peak because of the impact of three major meteorological parameters (ambient temperature, wind speed and solar radiation), and during the monsoon season air quality conditions are quite favourable because of ambient temperature and wind speed. Finally, we use ensemble machine learning algorithms, Random Forest and Extra Trees regressors, to estimate PM2.5 levels for all four regions of Delhi, and find that these ensemble techniques perform better than other machine learning algorithms such as neural networks, linear regression and SVMs. The Random Forest and Extra Trees regressor models give R2 values of 0.75 and 0.78 respectively for the estimation of PM2.5; the R2 value is a statistical measure of the share of the variance of the dependent variable that is explained by the independent variables of a regression model.
    Date: 2021–02–18
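The R2 value quoted in item 3 is conventionally computed as one minus the ratio of the residual sum of squares to the total sum of squares. A minimal sketch of that calculation follows; the readings and predictions are made-up numbers for illustration and are not taken from the paper.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))   # unexplained variation
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)              # total variation
    return 1.0 - ss_res / ss_tot

observed = [40.0, 55.0, 70.0, 85.0]     # hypothetical PM2.5 readings
predicted = [45.0, 50.0, 75.0, 80.0]    # hypothetical model estimates
score = r_squared(observed, predicted)
```

A perfect prediction gives 1, and always predicting the sample mean gives 0, so values such as 0.75 or 0.78 sit between those two reference points.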
  4. By: Baptiste Barreau; Laurent Carlier
    Abstract: In many businesses, and particularly in finance, the behavior of a client might drastically change over time. It is consequently crucial for recommender systems used in such environments to be able to adapt to these changes. In this study, we propose a novel collaborative filtering algorithm that captures the temporal context of a user-item interaction through the users' and items' recent interaction histories to provide dynamic recommendations. The algorithm, designed with issues specific to the financial world in mind, uses a custom neural network architecture that tackles the non-stationarity of users' and items' behaviors. The performance and properties of the algorithm are monitored in a series of experiments on a G10 bond request for quotation proprietary database from BNP Paribas Corporate and Institutional Banking.
    Date: 2021–02
  5. By: Darvay, Zsolt; Rigó, Petra Renáta
    Abstract: In this paper we propose a new predictor-corrector interior-point algorithm for solving P_* (κ) horizontal linear complementarity problems defined on a Cartesian product of symmetric cones, which is not based on a usual barrier function. We generalize the predictor-corrector algorithm introduced in [13] to P_* (κ)-linear horizontal complementarity problems on a Cartesian product of symmetric cones. We apply the algebraic equivalent transformation technique proposed by Darvay [9] and we use the function φ(t)=t-√t in order to determine the new search directions. In each iteration the proposed algorithm performs one predictor and one corrector step. We prove that the predictor-corrector interior-point algorithm has the same complexity bound as the best known interior-point algorithms for solving these types of problems. Furthermore, we provide a condition related to the proximity and update parameters for which the introduced predictor-corrector algorithm is well defined.
    Keywords: Horizontal linear complementarity problem, Cartesian product of symmetric cones, Predictor-corrector interior-point algorithm, Euclidean Jordan algebra, Algebraic equivalent transformation technique
    JEL: C61
    Date: 2021–03–01
  6. By: J. Ignacio Conde-Ruiz; Juan-José Ganuza; Manu García; Luis A. Puch
    Abstract: We analyze all the articles published in Top 5 economics journals between 2002 and 2019 in order to find gender differences in their research approach. Using an unsupervised machine learning algorithm (Structural Topic Model) developed by Roberts et al. (2019), we jointly characterize the set of latent topics that best fits our data (the set of abstracts) and how the documents/abstracts are allocated to each latent topic. These latent topics are mixtures over words, where each word has a probability of belonging to a topic after controlling for year and journal. These latent topics may capture research fields but also other, more subtle characteristics related to the way in which the articles are written. Using only data-driven methods, we find that females are unevenly distributed across these latent topics. The gender differences in research approach found in this paper are "automatically" generated from the research articles, without an arbitrary allocation to particular categories (such as JEL codes or research areas).
    Keywords: machine learning, structural topic model, gender, research fields
    JEL: I20 J16
    Date: 2021–03
  7. By: Fryzlewicz, Piotr
    Abstract: Many existing procedures for detecting multiple change-points in data sequences fail in frequent-change-point scenarios. This article proposes a new change-point detection methodology designed to work well in both infrequent and frequent change-point settings. It is made up of two ingredients: one is “Wild Binary Segmentation 2” (WBS2), a recursive algorithm for producing what we call a ‘complete’ solution path to the change-point detection problem, i.e. a sequence of estimated nested models containing 0, …, T-1 change-points, where T is the data length. The other ingredient is a new model selection procedure, referred to as “Steepest Drop to Low Levels” (SDLL). The SDLL criterion acts on the WBS2 solution path, and, unlike many existing model selection procedures for change-point problems, it is not penalty-based, and only uses thresholding as a certain discrete secondary check. The resulting WBS2.SDLL procedure, combining both ingredients, is shown to be consistent, and to significantly outperform the competition in the frequent change-point scenarios tested. WBS2.SDLL is fast, easy to code and does not require the choice of a window or span parameter.
    Keywords: segmentation; break detection; jump detection; randomized algorithms; adaptive algorithms; multiscale methods; EP/ L014246/1
    JEL: C1
    Date: 2020–12–01
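Item 7 builds on binary segmentation. Methods in that family typically score a candidate split with a CUSUM contrast between the data on either side of it. The sketch below shows only that generic building block on a toy sequence; it is not WBS2 or SDLL, and the function names `cusum` and `best_split` are assumptions made for this example.

```python
import math

def cusum(x, k):
    """CUSUM contrast between x[:k] and x[k:], for 1 <= k < len(x)."""
    n = len(x)
    left, right = sum(x[:k]), sum(x[k:])
    return (math.sqrt((n - k) / (n * k)) * left
            - math.sqrt(k / (n * (n - k))) * right)

def best_split(x):
    """Return (k, |CUSUM|) for the split that maximises the absolute contrast."""
    return max(((k, abs(cusum(x, k))) for k in range(1, len(x))),
               key=lambda pair: pair[1])

signal = [0.0] * 10 + [5.0] * 10   # noiseless toy sequence, mean shifts after 10 points
k_hat, strength = best_split(signal)
```

On this toy sequence the contrast is largest when the split falls exactly at the mean shift. A full procedure would apply such a statistic recursively over many sub-intervals and then select how many of the detected splits to keep.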
  8. By: Kukacka, Jiri; Sacht, Stephen
    Abstract: This paper offers a simulation-based method for the estimation of heuristic switching in nonlinear macroeconomic models. Heuristic switching is an important feature of modeling strategy since it uses simple decision rules of boundedly rational heterogeneous agents. The simulation study shows that the proposed simulated maximum likelihood method identifies the behavioral effects that stay hidden for standard econometric approaches. In the empirical application, we estimate the structural and behavioral parameters of the US economy. We are especially able to reliably identify the intensity of choice that governs the models' nonlinear dynamics.
    Keywords: Behavioral Heuristics,Heuristic Switching Model,Intensity of Choice,Simulated Maximum Likelihood
    JEL: C53 D83 E12 E32
    Date: 2021
  9. By: Auer, Benjamin R.; Rottmann, Horst
    Abstract: Using Monte Carlo simulation, this article illustrates the properties of the OLS and IV estimators when the explanatory variable in the simple linear regression model is endogenous, i.e. correlated with the model's error term. In particular, it demonstrates the bias of the OLS estimator and the consistency of the IV estimator, and illustrates the influence of weak instruments.
    Keywords: Monte Carlo simulation,OLS estimation,IV estimation,endogeneity,weak instruments
    Date: 2020
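The kind of experiment summarised in item 9 can be sketched in a few lines. The data-generating process below, including the values of `beta`, `rho`, `pi`, the sample size and the number of replications, is entirely hypothetical and is not taken from the paper. It only illustrates the general idea: the regressor is correlated with the error term, so OLS is biased, while a valid instrument recovers the slope unless it is weak.

```python
import random
import statistics

def cov(a, b):
    """Sample covariance (divisor n) of two equal-length sequences."""
    ma, mb = statistics.fmean(a), statistics.fmean(b)
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / len(a)

def simulate(pi, n=500, reps=200, beta=1.0, rho=0.8, seed=0):
    """Draw `reps` samples and return lists of OLS and IV slope estimates."""
    rng = random.Random(seed)
    ols, iv = [], []
    for _ in range(reps):
        z = [rng.gauss(0, 1) for _ in range(n)]   # instrument
        u = [rng.gauss(0, 1) for _ in range(n)]   # structural error term
        e = [rng.gauss(0, 1) for _ in range(n)]   # first-stage noise
        # x depends on u through rho, which makes it endogenous.
        x = [pi * zi + rho * ui + ei for zi, ui, ei in zip(z, u, e)]
        y = [beta * xi + ui for xi, ui in zip(x, u)]
        ols.append(cov(x, y) / cov(x, x))         # OLS slope
        iv.append(cov(z, y) / cov(z, x))          # simple IV (Wald) slope
    return ols, iv

def iqr(values):
    """Interquartile range, a spread measure that is robust to extreme draws."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    return q3 - q1

ols_strong, iv_strong = simulate(pi=1.0)    # strong instrument
ols_weak, iv_weak = simulate(pi=0.05)       # weak instrument
```

Comparing `statistics.median(ols_strong)` and `statistics.median(iv_strong)` with the true slope of 1, and `iqr(iv_weak)` with `iqr(iv_strong)`, shows the three effects the abstract names. Medians and interquartile ranges are used here because the just-identified IV estimator can produce extreme draws.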
  10. By: Philippe Goulet Coulombe; Massimiliano Marcellino; Dalibor Stevanovic
    Abstract: Based on evidence gathered from a newly built large macroeconomic data set for the UK, labeled UK-MD and comparable to similar datasets for the US and Canada, it seems the most promising avenue for forecasting during the pandemic is to allow for general forms of nonlinearity by using machine learning (ML) methods. But not all nonlinear ML methods are alike. For instance, some cannot extrapolate (like regular trees and forests) and some can (when complemented with linear dynamic components). This and other crucial aspects of ML-based forecasting in unprecedented times are studied in an extensive pseudo-out-of-sample exercise.
    Keywords: Machine Learning,Big Data,Forecasting,COVID-19
    JEL: C53 C55 E37
    Date: 2021–03–02
  11. By: Bonomi Bezzo, Franco
    Abstract: In this paper we want to provide a utopian attempt to tackle inequality and to tackle, most specifically, what we consider the cultural and ethical origin of inequality: paid work. We believe that a globalised world, structured around the asymmetry between an increasingly small number of employers and an increasing, almost unlimited, supply of always available employees, leads to increasing inequalities. From our perspective, in the post-industrialised economies of all major developed countries, paid work can no longer be seen as an instrument of self-determination (Marx, 1844) but becomes the main generator of exploitation and poverty. For this reason, we try to develop a benefit with strong disincentives to paid work attached, which should provide people with an exit strategy and higher bargaining power. After presenting the main typologies of income benefits that are normally in use or discussed, we provide a theoretical explanation of the Universal Independence Income (UII) benefit we want to introduce. We simulate the introduction of our preferred version of UII, two variations of UII and five forms of Universal Basic Income (UBI) to be compared with the tax and benefit system currently in place in the UK. Our main findings suggest that UII has a positive effect on inequality, an almost null effect on poverty and strong positive effects on work disincentives.
    Date: 2021–02–15
  12. By: Bonakdar, Said Benjamin; Roos, Michael W. M.
    Abstract: Residential choice does not only depend on properties of the dwelling, neighbourhood amenities and affordability, but is also affected by the population composition within a neighbourhood. All these attributes are capitalised in the house price. Empirically, it is not easy to disentangle the effect of the neighbourhood on house prices from the effects of the dwelling attributes. We implement an agent-based model of an urban housing market that allows us to analyse the interaction between residential choice, population composition in a neighbourhood and house prices. Agents differ in terms of education, income and group affiliation (majority vs. minority). The results show that the "wrong" neighbourhood can lead to an average house price depreciation of up to 13,500 monetary units or 7.1 percent. Whereas rich agents can afford to move to preferred places, roughly 13.01% of poor minorities and 8.02% of poor majority agents are locked in their current neighbourhood. By introducing a policy that provides agents more access to credit, we find that all population groups show higher satisfaction levels. Poor agents show the largest improvements. The general satisfaction level across all population groups increases. However, the extra credit accessibility also drives up house prices and leads to higher wealth inequality within the city. If agents have a preference for status rather than for similarity, the effect of the overall inequality is smaller, since agents become more satisfied living in areas with less similar agents.
    Keywords: Agent-based modelling,residential choice,housing demand,neighbourhood characteristics,house prices
    JEL: C63 R21 R23 R32
    Date: 2021
  13. By: Rangan Gupta (Department of Economics, University of Pretoria, Private Bag X20, Hatfield, 0028, South Africa); Jacobus Nel (Department of Economics, University of Pretoria, Private Bag X20, Hatfield, 0028, South Africa); Christian Pierdzioch (Department of Economics, Helmut Schmidt University, Holstenhofweg 85, P.O.B. 700822, 22008 Hamburg, Germany)
    Abstract: Using a machine-learning technique known as random forests, we analyze the role of investor confidence in forecasting monthly aggregate realized stock-market volatility of the United States (US), over and above a wide array of macroeconomic and financial variables. We estimate random forests on data for a period from 2001 to 2020, and study horizons up to one year by computing forecasts for a recursive and a rolling estimation window. We find that investor confidence, and especially investor confidence uncertainty, has out-of-sample predictive value for overall realized volatility, as well as its “good” and “bad” variants. Our results have important implications for investors and policymakers.
    Keywords: Investor Confidence, Realized Volatility, Macroeconomic and Financial Predictors, Forecasting, Machine Learning
    JEL: C22 C53 G10 G17
  14. By: Alexandre Bonnet R. Costa; Pedro Cavalcanti G. Ferreira; Wagner P. Gaglianone; Osmani Teixeira C. Guillén; João Victor Issler; Yihao Lin
    Abstract: The purpose of this paper is to explore machine learning techniques to forecast the oil price. In the era of big data, we investigate whether new automated tools can improve over traditional approaches in terms of forecast accuracy. Oil price point and density forecasts are built from 22 methods, including regression trees (random forest, quantile regression forest, xgboost), regularization procedures (elastic net, lasso, ridge), standard econometric models and forecast combinations, besides the structural factor model of Schwartz and Smith (2000). The database contains 315 macroeconomic and financial variables, used to build high-dimensional models. To evaluate the predictive power of each method, an extensive pseudo out-of-sample forecasting exercise is built, in monthly and quarterly frequencies, with horizons from one month up to five years. Overall, the results indicate a good performance of the machine learning methods in the short run. Up to six months, the lasso-based models, oil future prices, and the Schwartz-Smith model provide the best forecasts. At longer horizons, forecast combinations also become relevant. In several cases, the accuracy gains with respect to the random walk forecast are statistically significant and reach two-digit figures, in percentage terms, using the R2 out-of-sample statistic; a notable achievement compared to the previous literature.
    Date: 2021–02
  15. By: Michael Danquah; Abdul Malik Iddrisu; Ernest Owusu Boakye; Solomon Owusu
    Abstract: Using household data from the latest wave of the Ghana Living Standards Survey, this paper utilizes machine learning techniques to examine the effect of gender wage differences within households on women's empowerment and welfare in Ghana. The structural parameters of the post-double selection LASSO estimations show that a reduction in the household gender wage gap significantly enhances women's empowerment. A decline in the household gender wage gap also meaningfully improves household welfare.
    Keywords: Gender wage gap, Households, Women's empowerment, Welfare, Machine learning, Ghana
    Date: 2021
  16. By: Massimo Ferrari; Frederik Kurcz; Maria Sole Pagliari
    Abstract: In this paper, we apply textual analysis and machine learning algorithms to construct an index capturing trade tensions between US and China. Our indicator matches well-known events in the US-China trade dispute and is exogenous to the developments on global financial markets. By means of local projection methods, we show that US markets are largely unaffected by rising trade tensions, with the exception of those firms that are more exposed to China, while the same shock negatively affects stock market indices in EMEs and China. Higher trade tensions also entail: i) an appreciation of the US dollar; ii) a depreciation of EMEs currencies; iii) muted changes in safe haven currencies; iv) portfolio re-balancing between stocks and bonds in the EMEs. We also show that trade tensions account for around 15% of the variance of Chinese stocks while their contribution is muted for US markets. These findings suggest that the US-China trade tensions are interpreted as a negative demand shock for the Chinese economy rather than as a global risk shock.
    Keywords: Trade Shocks; Machine Learning; Stock Indexes; Exchange Rates.
    JEL: D53 E44 F13 F14 C55
    Date: 2021
  17. By: Strittmatter, Anthony (University of St. Gallen); Wunsch, Conny (University of Basel)
    Abstract: The vast majority of existing studies that estimate the average unexplained gender pay gap use unnecessarily restrictive linear versions of the Blinder-Oaxaca decomposition. Using a notably rich and large data set of 1.7 million employees in Switzerland, we investigate how the methodological improvements made possible by such big data affect estimates of the unexplained gender pay gap. We study the sensitivity of the estimates with regard to i) the availability of observationally comparable men and women, ii) model flexibility when controlling for wage determinants, and iii) the choice of different parametric and semi-parametric estimators, including variants that make use of machine learning methods. We find that these three factors matter greatly. Blinder-Oaxaca estimates of the unexplained gender pay gap decline by up to 39% when we enforce comparability between men and women and use a more flexible specification of the wage equation. Semi-parametric matching yields estimates that, when compared with the Blinder-Oaxaca estimates, are up to 50% smaller and also less sensitive to the way wage determinants are included.
    Keywords: gender inequality, gender pay gap, common support, model specification, matching estimator, machine learning
    JEL: J31 C21
    Date: 2021–02
  18. By: Gründler, Klaus; Krieger, Tommy
    Abstract: We provide a comprehensive overview of the literature on the measurement of democracy and present an extensive update of the Machine Learning indicator of Gründler and Krieger (2016, European Journal of Political Economy). Four improvements are particularly notable: First, we produce a continuous and a dichotomous version of the Machine Learning democracy indicator. Second, we calculate intervals that reflect the degree of measurement uncertainty. Third, we refine the conceptualization of the Machine Learning Index. Finally, we largely expand the data coverage by providing democracy indicators for 186 countries in the period from 1919 to 2019.
    Keywords: Data aggregation,Democracy indicators,Machine Learning,Measurement Issues,Regime Classifications,Support Vector Machines
    JEL: C38 C43 C82 E02 P16
    Date: 2021
  19. By: Alessandro Bitetto (University of Pavia); Paola Cerchiello (University of Pavia); Stefano Filomeni (University of Essex); Alessandra Tanda (University of Pavia); Barbara Tarantino (University of Pavia)
    Abstract: In this paper we assess credit risk of SMEs by testing and comparing a classic parametric approach fitting an ordered probit model with a non-parametric one calibrating a machine learning historical random forest (HRF) model. We do so by exploiting a unique and proprietary dataset comprising granular firm-level quarterly data collected from a large European bank and an international insurance company on a sample of 810 Italian small- and medium-sized enterprises (SMEs) over the time period 2015-2017. Our results provide novel evidence that a dynamic Historical Random Forest (HRF) approach outperforms the traditional ordered probit model, highlighting how advanced estimation methodologies that use machine learning techniques can be successfully implemented to predict SME credit risk. Moreover, by using Shapley values for the first time, we are able to assess the relevance of each variable in predicting SME credit risk. Traditionally, credit risk evaluation of informationally-opaque SMEs has relied on soft information-intensive relationship banking. However, the advent of large banking conglomerates and the limits to successfully "harden" and transmit soft information across large banking organizations, challenge the traditional role of relationship banking, urging the need to evaluate SME credit risk by implementing alternative methodologies mostly based on hard information.
    Keywords: Credit Rating, SME, Historical Random Forest, Machine Learning, Relationship Banking, Soft Information
    JEL: C52 C53 D82 D83 G21 G22
    Date: 2021–02
  20. By: Alexis Marchal (EPFL; SFI)
    Abstract: I propose a new tool to characterize the resolution of uncertainty around FOMC press conferences. It relies on the construction of a measure capturing the level of discussion complexity between the Fed Chair and reporters during the Q&A sessions. I show that complex discussions are associated with higher equity returns and a drop in realized volatility. The method creates an attention score by quantifying how much the Chair needs to rely on reading internal documents to be able to answer a question. This is accomplished by building a novel dataset of video images of the press conferences and leveraging recent deep learning algorithms from computer vision. This alternative data provides new information on nonverbal communication that cannot be extracted from the widely analyzed FOMC transcripts. This paper can be seen as a proof of concept that certain videos contain valuable information for the study of financial markets.
    Keywords: FOMC, Machine learning, Computer vision, Alternative data, Asset pricing, Equity premium.
    JEL: C45 C55 C80 E58 G12 G14
    Date: 2021–03
  21. By: Jacob, Daniel
    Abstract: We investigate the finite sample performance of sample splitting, cross-fitting and averaging for the estimation of the conditional average treatment effect. Recently proposed methods, so-called meta-learners, make use of machine learning to estimate different nuisance functions and hence allow for fewer restrictions on the underlying structure of the data. To limit a potential overfitting bias that may result when using machine learning methods, cross-fitting estimators have been proposed. This includes the splitting of the data in different folds to reduce bias and averaging over folds to restore efficiency. To the best of our knowledge, it is not yet clear how exactly the data should be split and averaged. We employ a Monte Carlo study with different data generation processes and consider twelve different estimators that vary in sample-splitting, cross-fitting and averaging procedures. We investigate the performance of each estimator independently on four different meta-learners: the doubly-robust-learner, R-learner, T-learner and X-learner. We find that the performance of all meta-learners heavily depends on the procedure of splitting and averaging. The best performance in terms of mean squared error (MSE) among the sample split estimators can be achieved when applying cross-fitting plus taking the median over multiple different sample-splitting iterations. Some meta-learners exhibit a high variance when the lasso is included in the ML methods. Excluding the lasso decreases the variance and leads to robust and at least competitive results.
    Keywords: causal inference,sample splitting,cross-fitting,sample averaging,machine learning,simulation study
    JEL: C01 C14 C31 C63
    Date: 2020
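To make the procedure named in item 21 concrete, the sketch below combines K-fold cross-fitting with a median over repeated random splits, using a T-learner whose two outcome models are plain one-regressor least-squares lines. Those lines stand in for the machine learning methods of the paper, and the simulated data, fold count and number of splits are hypothetical choices for this example. It is an illustration of the idea, not the author's estimator.

```python
import random
import statistics

def fit_line(xs, ys):
    """Least-squares intercept and slope for a single regressor."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - slope * mx, slope

def cross_fit_cate(data, k, rng):
    """One K-fold cross-fitted T-learner pass; returns a CATE estimate per unit."""
    idx = list(range(len(data)))
    rng.shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    tau = [0.0] * len(data)
    for held_out in folds:
        held = set(held_out)
        train = [data[i] for i in idx if i not in held]
        # Nuisance functions (one outcome model per arm) use the other folds only.
        a1, b1 = fit_line(*zip(*[(x, y) for x, d, y in train if d == 1]))
        a0, b0 = fit_line(*zip(*[(x, y) for x, d, y in train if d == 0]))
        for i in held_out:
            x = data[i][0]
            tau[i] = (a1 + b1 * x) - (a0 + b0 * x)   # predicted treated minus control
    return tau

def median_over_splits(data, k=5, n_splits=11, seed=0):
    """Repeat the cross-fitting with fresh random splits; take the median per unit."""
    rng = random.Random(seed)
    runs = [cross_fit_cate(data, k, rng) for _ in range(n_splits)]
    return [statistics.median(unit) for unit in zip(*runs)]

# Hypothetical data: the true CATE is 2 + x, so the average effect is about 2.
gen = random.Random(1)
data = []
for _ in range(400):
    x = gen.gauss(0, 1)
    d = 1 if gen.random() < 0.5 else 0
    y = 1.0 + 0.5 * x + d * (2.0 + x) + gen.gauss(0, 1)
    data.append((x, d, y))

tau_hat = median_over_splits(data)
ate_hat = statistics.fmean(tau_hat)
```

Each unit's effect is predicted only by models that never saw that unit, which is what limits the overfitting bias. Taking the median across splits reduces the dependence of the result on any single random partition.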

For comments please write to the director of NEP, Marco Novarese. Put “NEP” in the subject; otherwise your mail may be rejected.
NEP’s infrastructure is sponsored by the School of Economics and Finance of Massey University in New Zealand.