nep-big New Economics Papers
on Big Data
Issue of 2022‒02‒28
fourteen papers chosen by
Tom Coupé
University of Canterbury

  1. Gender Differences in Reference Letters: Evidence from the Economics Job Market By Eberhardt, Markus; Facchini, Giovanni; Rueda, Valeria
  2. Big data forecasting of South African inflation By Byron Botha; Rulof Burger; Kevin Kotz; Neil Rankin; Daan Steenkamp
  3. Artificial intelligence, ethics, and diffused pivotality By Klockmann, Victor; von Schenk, Alicia; Villeval, Marie-Claire
  4. Structural Breaks in Carbon Emissions: A Machine Learning Analysis By Jiaxiong Yao; Mr. Yunhui Zhao
  5. The Transfer Performance of Economic Models By Isaiah Andrews; Drew Fudenberg; Annie Liang; Chaofeng Wu
  6. Machine Learning Time Series Regressions With an Application to Nowcasting By Babii, Andrii; Ghysels, Eric; Striaukas, Jonas
  7. Enterprises Providing ICT Training in Europe By Laureti, Lucio; Costantiello, Alberto; Matarrese, Marco Maria; Leogrande, Angelo
  8. Model-based recursive partitioning to estimate unfair health inequalities in the United Kingdom Household Longitudinal Study By Brunori, Paolo; Davillas, Apostolos; Jones, Andrew M.; Scarchilli, Giovanna
  9. We study the economics and finance scholars’ reaction to the 2008 financial crisis using machine learning language analyses methods of Latent Dirichlet Allocation and dynamic topic modelling algorithms, to analyze the texts of 14,270 NBER working papers covering the 1999–2016 period. We find that academic scholars as a group were insufficiently engaged in crises’ studies before 2008. As the crisis unraveled, however, they switched their focus to studying the crisis, its causes, and consequences. Thus, the scholars were “slow-to-see,” but they were “fast-to-act.” Their initial response to the ongoing Covid-19 crisis is consistent with these conclusions. By Daniel Levy; Tamir Mayer; Alon Raviv
  10. Black-box Bayesian inference for economic agent-based models By Joel Dyer; Patrick Cannon; J. Doyne Farmer; Sebastian Schmon
  11. Artificial intelligence, ethics, and intergenerational responsibility By Klockmann, Victor; von Schenk, Alicia; Villeval, Marie-Claire
  12. Automation and Related Technologies: A Mapping of the New Knowledge Base By Santarelli, Enrico; Staccioli, Jacopo; Vivarelli, Marco
  13. Benign-Overfitting in Conditional Average Treatment Effect Prediction with Linear Regression By Masahiro Kato; Masaaki Imaizumi
  14. Profitable Strategy Design by Using Deep Reinforcement Learning for Trades on Cryptocurrency Markets By Mohsen Asgari; Seyed Hossein Khasteh

  1. By: Eberhardt, Markus (University of Oxford); Facchini, Giovanni (University of Nottingham); Rueda, Valeria (University of Nottingham)
    Abstract: Academia, and economics in particular, faces increased scrutiny because of gender imbalance. This paper studies the job market for entry-level faculty positions. We employ machine learning methods to analyze gendered patterns in the text of 9,000 reference letters written in support of 2,800 candidates. Using both supervised and unsupervised techniques, we document widespread differences in the attributes emphasized. Women are systematically more likely to be described using "grindstone" terms and at times less likely to be praised for their ability. Given the time and effort letter writers devote to supporting their students, this gender stereotyping is likely due to unconscious biases.
    Keywords: gender, natural language processing, stereotyping, diversity
    JEL: J16 A11
    Date: 2022–01
  2. By: Byron Botha; Rulof Burger; Kevin Kotz; Neil Rankin; Daan Steenkamp
    Abstract: Big data forecasting of South African inflation
    Date: 2022–02–22
  3. By: Klockmann, Victor; von Schenk, Alicia; Villeval, Marie-Claire
    Abstract: With Big Data, decisions made by machine learning algorithms depend on training data generated by many individuals. In an experiment, we identify the effect of varying individual responsibility for the moral choices of an artificially intelligent algorithm. Across treatments, we manipulated the sources of training data and thus the impact of each individual's decisions on the algorithm. Diffusing such individual pivotality for algorithmic choices increased the share of selfish decisions and weakened revealed prosocial preferences. This does not result from a change in the structure of incentives. Rather, our results show that Big Data offers an excuse for selfish behavior through lower responsibility for one's and others' fate.
    Keywords: Artificial Intelligence, Big Data, Pivotality, Ethics, Experiment
    JEL: C49 C91 D10 D63 D64 O33
    Date: 2022
  4. By: Jiaxiong Yao; Mr. Yunhui Zhao
    Abstract: To reach the global net-zero goal, the level of carbon emissions has to fall substantially at a speed rarely seen in history, highlighting the need to identify structural breaks in carbon emission patterns and understand forces that could bring about such breaks. In this paper, we identify and analyze structural breaks using machine learning methodologies. We find that downward trend shifts in carbon emissions since 1965 are rare, and most trend shifts are associated with non-climate structural factors (such as a change in the economic structure) rather than with climate policies. While we do not explicitly analyze the optimal mix between climate and non-climate policies, our findings highlight the importance of non-climate policies in reducing carbon emissions. On the methodology front, our paper contributes to the climate toolbox by identifying country-specific structural breaks in emissions for the top 20 emitters based on a user-friendly machine-learning tool and interpreting the results using a decomposition of carbon emissions (Kaya Identity).
    Keywords: Climate Policies, Carbon Emissions, Machine Learning, Structural Break, Kaya Identity
    Date: 2022–01–21
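
The Kaya Identity named in the abstract above factors total emissions into population, GDP per capita, energy intensity of GDP, and carbon intensity of energy. A minimal sketch of the resulting additive log-change decomposition follows. The function names and all input numbers are hypothetical and chosen only for illustration; none of them come from the paper.

```python
import math

def kaya_factors(population, gdp, energy, co2):
    """Return the four Kaya factors whose product equals total CO2 emissions."""
    return {
        "population": population,
        "gdp_per_capita": gdp / population,
        "energy_intensity": energy / gdp,
        "carbon_intensity": co2 / energy,
    }

def kaya_log_decomposition(start, end):
    """Split the log change in emissions into one additive term per factor."""
    f0 = kaya_factors(**start)
    f1 = kaya_factors(**end)
    return {name: math.log(f1[name] / f0[name]) for name in f0}

# Hypothetical inputs in arbitrary units (NOT data from the paper).
year_a = {"population": 100.0, "gdp": 2000.0, "energy": 500.0, "co2": 1200.0}
year_b = {"population": 105.0, "gdp": 2400.0, "energy": 540.0, "co2": 1188.0}

contributions = kaya_log_decomposition(year_a, year_b)
total_log_change = math.log(year_b["co2"] / year_a["co2"])

for name, value in contributions.items():
    print(f"{name}: {value:+.4f}")
print(f"total: {total_log_change:+.4f}")
```

Because the identity is multiplicative, the four log changes sum to the log change in emissions, which makes it possible to read off which factor drives a given shift.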
  5. By: Isaiah Andrews; Drew Fudenberg; Annie Liang; Chaofeng Wu
    Abstract: Whether a model's performance on a given domain can be extrapolated to other settings depends on whether it has learned generalizable structure. We formulate this as the problem of theory transfer, and provide a tractable way to measure a theory's transferability. We derive confidence intervals for transferability that ensure coverage in finite samples, and apply our approach to evaluate the transferability of predictions of certainty equivalents across different subject pools. We find that models motivated by economic theory perform more reliably than black-box machine learning methods at this transfer prediction task.
    Date: 2022–02
  6. By: Babii, Andrii; Ghysels, Eric (Université catholique de Louvain, LIDAM/CORE, Belgium); Striaukas, Jonas
    Abstract: This paper introduces structured machine learning regressions for high-dimensional time series data potentially sampled at different frequencies. The sparse-group LASSO estimator can take advantage of such time series data structures and outperforms the unstructured LASSO. We establish oracle inequalities for the sparse-group LASSO estimator within a framework that allows for the mixing processes and recognizes that the financial and the macroeconomic data may have heavier than exponential tails. An empirical application to nowcasting US GDP growth indicates that the estimator performs favorably compared to other alternatives and that text data can be a useful addition to more traditional numerical data. Our methodology is implemented in the R package midasml, available from CRAN.
    Keywords: high-dimensional time series, fat tails, tau-mixing, sparse-group LASSO, mixed frequency data, textual news data
    Date: 2021–01–01
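
For readers unfamiliar with the sparse-group LASSO mentioned in the abstract above, its penalty mixes an L1 term (sparsity within groups) with a sum of per-group L2 norms (sparsity across groups). The sketch below only evaluates that penalty; it is not the midasml implementation, and the function name, the grouping (for example, one group per covariate's lags), and the coefficient values are assumptions made for illustration.

```python
import math

def sparse_group_lasso_penalty(beta, groups, lam, alpha):
    """Compute lam * (alpha * ||beta||_1 + (1 - alpha) * sum_g ||beta_g||_2).

    beta   : list of coefficients
    groups : list of index lists, one per group (hypothetical grouping)
    lam    : overall penalty strength
    alpha  : weight in [0, 1]; alpha=1 is plain LASSO, alpha=0 is group LASSO
    """
    l1_term = sum(abs(b) for b in beta)
    group_term = sum(math.sqrt(sum(beta[i] ** 2 for i in idx)) for idx in groups)
    return lam * (alpha * l1_term + (1.0 - alpha) * group_term)

# Illustrative coefficients: three groups, the second one entirely zero.
beta = [3.0, 4.0, 0.0, 0.0, 1.0]
groups = [[0, 1], [2, 3], [4]]

lasso_only = sparse_group_lasso_penalty(beta, groups, lam=1.0, alpha=1.0)
group_only = sparse_group_lasso_penalty(beta, groups, lam=1.0, alpha=0.0)
mixed = sparse_group_lasso_penalty(beta, groups, lam=1.0, alpha=0.5)
print(lasso_only, group_only, mixed)
```

A group whose coefficients are all zero contributes nothing to either term, which is what lets the estimator drop whole groups while still selecting individual coefficients inside the groups it keeps.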
  7. By: Laureti, Lucio; Costantiello, Alberto; Matarrese, Marco Maria; Leogrande, Angelo
    Abstract: The determinants of enterprises providing ICT training in Europe are analyzed in this article. Data are collected from the European Innovation Scoreboard-EIS of the European Commission for 36 European countries in the period 2000-2019. Data are analyzed with Panel Data with Fixed Effects, Panel Data with Random Effects, Dynamic Panel, WLS and Pooled OLS. Results show that the number of enterprises providing ICT training in Europe is positively associated with “Innovation Index”, “Innovators”, “New Doctorate Graduates”, “Tertiary Education” and negatively associated with “Government Procurement of Advanced Technology Products”, “Human Resources”, and “Marketing or Organisational Innovators”. In addition, a cluster analysis is performed using the k-Means algorithm optimized with the Silhouette Coefficient, and we find four clusters. Finally, we use eight different machine learning algorithms to predict the value of the enterprises providing ICT training in Europe. We find that the Simple Tree Regression is the best predictor and that the number of enterprises providing ICT training in Europe is expected to grow by 5.02%.
    Keywords: Innovation and Invention: Processes and Incentives; Management of Technological Innovation; Technological Change: Choices and Consequences; Intellectual Property and Intellectual Capital.
    JEL: O30 O31 O32 O33 O34
    Date: 2022–01–29
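
The abstract above describes choosing the number of k-Means clusters with the Silhouette Coefficient. The following is a toy sketch of that idea on one-dimensional data, written with the standard library only. The data, the deterministic initialization, and the function names are assumptions for illustration and are unrelated to the paper's dataset or software.

```python
def kmeans_1d(values, k, iterations=50):
    """Lloyd's algorithm on 1-D data with deterministic, evenly spread initial centers."""
    ordered = sorted(values)
    centers = [ordered[(2 * j + 1) * len(ordered) // (2 * k)] for j in range(k)]
    labels = [0] * len(values)
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest center.
        labels = [min(range(k), key=lambda j: abs(v - centers[j])) for v in values]
        # Update step: each center moves to the mean of its members.
        for j in range(k):
            members = [v for v, lab in zip(values, labels) if lab == j]
            if members:
                centers[j] = sum(members) / len(members)
    return labels

def silhouette_score(values, labels):
    """Mean of (b - a) / max(a, b) over points; singleton clusters score 0."""
    clusters = {}
    for v, lab in zip(values, labels):
        clusters.setdefault(lab, []).append(v)
    scores = []
    for v, lab in zip(values, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)
            continue
        # a: mean distance to the other members of the point's own cluster.
        a = sum(abs(v - u) for u in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            sum(abs(v - u) for u in other) / len(other)
            for other_lab, other in clusters.items()
            if other_lab != lab
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)

# Toy data with three visually obvious groups (hypothetical values).
data = [1.0, 1.2, 0.8, 5.0, 5.1, 4.9, 9.0, 9.2, 8.8]
scores = {k: silhouette_score(data, kmeans_1d(data, k)) for k in range(2, 6)}
best_k = max(scores, key=scores.get)
print(best_k, scores)
```

The candidate with the highest mean silhouette is kept; on this toy data that is the three-group solution, whereas the paper reports four clusters on its own data.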
  8. By: Brunori, Paolo; Davillas, Apostolos; Jones, Andrew M.; Scarchilli, Giovanna
    Abstract: We measure unfair health inequality in the UK using a novel data-driven empirical approach. We explain health variability as the result of circumstances beyond individual control and health-related behaviours. We do this using model-based recursive partitioning, a supervised machine learning algorithm. Unlike usual tree-based algorithms, model-based recursive partitioning not only identifies social groups with different expected levels of health but also unveils the heterogeneity of the relationship linking behaviours and health outcomes across groups. The empirical application is conducted using the UK Household Longitudinal Study. We show that unfair inequality is a substantial fraction of the total explained health variability. This finding holds no matter which exact definition of fairness is adopted: using both the fairness gap and direct unfairness measures, each evaluated at different reference values for circumstances or effort.
    Keywords: inequality of opportunity; health equity; machine learning; unhealthy lifestyle behaviours
    JEL: D63
    Date: 2022–01
  9. By: Daniel Levy (Bar-Ilan University); Tamir Mayer; Alon Raviv
    Abstract: We study the economics and finance scholars’ reaction to the 2008 financial crisis using machine learning language analyses methods of Latent Dirichlet Allocation and dynamic topic modelling algorithms, to analyze the texts of 14,270 NBER working papers covering the 1999–2016 period. We find that academic scholars as a group were insufficiently engaged in crises’ studies before 2008. As the crisis unraveled, however, they switched their focus to studying the crisis, its causes, and consequences. Thus, the scholars were “slow-to-see,” but they were “fast-to-act.” Their initial response to the ongoing Covid-19 crisis is consistent with these conclusions.
    Keywords: Financial crisis, Economic Crisis, Great recession, NBER working papers, LDA textual analysis, Topic modeling, Dynamic Topic Modeling, Machine learning
    JEL: E32 E44 E50 F30 G01 G20
    Date: 2022–02
  10. By: Joel Dyer; Patrick Cannon; J. Doyne Farmer; Sebastian Schmon
    Abstract: Simulation models, in particular agent-based models, are gaining popularity in economics. The considerable flexibility they offer, as well as their capacity to reproduce a variety of empirically observed behaviours of complex systems, give them broad appeal, and the increasing availability of cheap computing power has made their use feasible. Yet a widespread adoption in real-world modelling and decision-making scenarios has been hindered by the difficulty of performing parameter estimation for such models. In general, simulation models lack a tractable likelihood function, which precludes a straightforward application of standard statistical inference techniques. Several recent works have sought to address this problem through the application of likelihood-free inference techniques, in which parameter estimates are determined by performing some form of comparison between the observed data and simulation output. However, these approaches are (a) founded on restrictive assumptions, and/or (b) typically require many hundreds of thousands of simulations. These qualities make them unsuitable for large-scale simulations in economics and can cast doubt on the validity of these inference methods in such scenarios. In this paper, we investigate the efficacy of two classes of black-box approximate Bayesian inference methods that have recently drawn significant attention within the probabilistic machine learning community: neural posterior estimation and neural density ratio estimation. We present benchmarking experiments in which we demonstrate that neural network based black-box methods provide state of the art parameter inference for economic simulation models, and crucially are compatible with generic multivariate time-series data. In addition, we suggest appropriate assessment criteria for future benchmarking of approximate Bayesian inference procedures for economic simulation models.
    Date: 2022–02
  11. By: Klockmann, Victor; von Schenk, Alicia; Villeval, Marie-Claire
    Abstract: In more and more situations, artificially intelligent algorithms have to model humans' (social) preferences on whose behalf they increasingly make decisions. They can learn these preferences through the repeated observation of human behavior in social encounters. In such a context, do individuals adjust the selfishness or prosociality of their behavior when it is common knowledge that their actions produce various externalities through the training of an algorithm? In an online experiment, we let participants' choices in dictator games train an algorithm. Thereby, they create an externality on future decision making of an intelligent system that affects future participants. We show that individuals who are aware of the consequences of their training on the payoffs of a future generation behave more prosocially, but only when they bear the risk of being harmed themselves by future algorithmic choices. In that case, the externality of artificial intelligence training induces a significantly higher share of egalitarian decisions in the present.
    Keywords: Artificial Intelligence, Morality, Prosociality, Generations, Externalities
    JEL: C49 C91 D10 D62 D63 O33
    Date: 2022
  12. By: Santarelli, Enrico (University of Bologna); Staccioli, Jacopo (Università Cattolica del Sacro Cuore); Vivarelli, Marco (Università Cattolica del Sacro Cuore)
    Abstract: Using the entire population of USPTO patent applications published between 2002 and 2019, and leveraging on both patent classification and semantic analysis, this paper aims to map the current knowledge base centred on robotics and AI technologies. These technologies are investigated both as a whole and distinguishing core and related innovations, along a 4-level core-periphery architecture. Merging patent applications with the Orbis IP firm-level database allows us to put forward a twofold analysis based on industry of activity and geographic location. In a nutshell, results show that: (i) rather than representing a technological revolution, the new knowledge base is strictly linked to the previous technological paradigm; (ii) the new knowledge base is characterised by a considerable – but not impressively widespread – degree of pervasiveness; (iii) robotics and AI are strictly related, converging (particularly among the related technologies and in more recent times) and jointly shaping a new knowledge base that should be considered as a whole, rather than consisting of two separate GPTs; (iv) the US technological leadership turns out to be confirmed (although declining in relative terms in favour of Asian countries such as South Korea, China and, more recently, India).
    Keywords: robotics, artificial intelligence, general purpose technology, technological paradigm, industry 4.0, patents full-text
    JEL: O33
    Date: 2022–01
  13. By: Masahiro Kato; Masaaki Imaizumi
    Abstract: We study the benign overfitting theory in the prediction of the conditional average treatment effect (CATE) with linear regression models. With the development of machine learning for causal inference, a wide range of large-scale models for causality are gaining attention. One problem is the suspicion that large-scale models are prone to overfitting to observations with sample selection, and hence may not be suitable for causal prediction. In this study, to resolve this suspicion, we investigate the validity of causal inference methods for overparameterized models by applying the recent theory of benign overfitting (Bartlett et al., 2020). Specifically, we consider samples whose distribution switches depending on an assignment rule, and study the prediction of CATE with linear models whose dimension diverges to infinity. We focus on two methods: the T-learner, which is based on a difference between estimators constructed separately for each treatment group, and the inverse probability weight (IPW)-learner, which solves another regression problem approximated by a propensity score. In both methods, the estimator consists of interpolators that fit the samples perfectly. As a result, we show that the T-learner fails to achieve consistency except under random assignment, while the IPW-learner drives the risk to zero if the propensity score is known. This difference stems from the fact that the T-learner is unable to preserve the eigenspaces of the covariances, which is necessary for benign overfitting in the overparameterized setting. Our result provides new insights into the usage of causal inference methods in the overparameterized setting, in particular doubly robust estimators.
    Date: 2022–02
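
To make the two estimators in the abstract above concrete, here is a deliberately simple sketch with one covariate, no intercept, random assignment, and a known propensity score of 0.5. It is a low-dimensional illustration and does not reproduce the paper's overparameterized setting; the data-generating process, the function names, and all numbers are assumptions. The IPW-learner is written here as a regression of the transformed outcome Y(D/e - (1-D)/(1-e)) on X, which is one common formulation and may differ in detail from the paper's.

```python
import random

def slope_through_origin(xs, ys):
    """Least-squares slope for the model y = b * x (no intercept)."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)

def t_learner(xs, ds, ys):
    """Fit treated and control groups separately and take the difference of slopes."""
    treated = [(x, y) for x, d, y in zip(xs, ds, ys) if d == 1]
    control = [(x, y) for x, d, y in zip(xs, ds, ys) if d == 0]
    b1 = slope_through_origin(*zip(*treated))
    b0 = slope_through_origin(*zip(*control))
    return b1 - b0

def ipw_learner(xs, ds, ys, propensity):
    """Regress the inverse-probability-weighted outcome on x using known propensities."""
    zs = [y * (d / e - (1 - d) / (1 - e)) for d, y, e in zip(ds, ys, propensity)]
    return slope_through_origin(xs, zs)

# Simulated data (hypothetical): CATE(x) = 0.5 * x, baseline outcome = 1.0 * x.
rng = random.Random(0)
n = 20000
true_cate_slope = 0.5
xs = [rng.gauss(0, 1) for _ in range(n)]
propensity = [0.5] * n  # random assignment with known propensity score
ds = [1 if rng.random() < e else 0 for e in propensity]
ys = [1.0 * x + d * true_cate_slope * x + rng.gauss(0, 0.1) for x, d in zip(xs, ds)]

t_hat = t_learner(xs, ds, ys)
ipw_hat = ipw_learner(xs, ds, ys, propensity)
print(f"T-learner: {t_hat:.3f}, IPW-learner: {ipw_hat:.3f}, truth: {true_cate_slope}")
```

In this easy case both estimators target the same slope, with the IPW version being noticeably noisier. The paper's contrast between the two concerns high-dimensional interpolating estimators under non-random assignment, which this sketch does not attempt to reproduce.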
  14. By: Mohsen Asgari; Seyed Hossein Khasteh
    Abstract: Deep Reinforcement Learning solutions have been applied to different control problems with strong and promising results. In this research work we have applied Proximal Policy Optimization, Soft Actor-Critic and Generative Adversarial Imitation Learning to the strategy design problem of three cryptocurrency markets. Our input data includes price data and technical indicators. We have implemented a Gym environment based on cryptocurrency markets to be used with the algorithms. Our test results on unseen data show great potential for this approach in helping investors with an expert system to exploit the market and gain profit. Our highest gain for an unseen 66-day span is 4850 US dollars per 10000 US dollars investment. We also discuss how a specific hyperparameter in the environment design can be used to adjust risk in the generated strategies.
    Date: 2022–01

This nep-big issue is ©2022 by Tom Coupé. It is provided as is without any express or implied warranty. It may be freely redistributed in whole or in part for any purpose. If distributed in part, please include this notice.
General information on the NEP project can be found at For comments please write to the director of NEP, Marco Novarese at <>. Put “NEP” in the subject, otherwise your mail may be rejected.
NEP’s infrastructure is sponsored by the School of Economics and Finance of Massey University in New Zealand.