nep-big New Economics Papers
on Big Data
Issue of 2020‒10‒12
fifteen papers chosen by
Tom Coupé
University of Canterbury

  1. The terminator of social welfare? The economic consequences of algorithmic discrimination By Bauer, Kevin; Pfeuffer, Nicolas; Abdel-Karim, Benjamin M.; Hinz, Oliver; Kosfeld, Michael
  2. Effects of Voice-Based Artificial Intelligence (AI) in Customer Service: Evidence from a Natural Experiment By Lingli Wang; Ni Huang; Yili Hong; Luning Liu; Xunhua Guo
  3. Redrawing of a Housing Market: Insurance Payouts and Housing Market Recovery in the Wake of the Christchurch Earthquake of 2011 By Cuong Nguyen; Ilan Noy; Dag Einar Sommervoll; Fang Yao
  4. Machine learning sentiment analysis, Covid-19 news and stock market reactions By Costola, Michele; Nofer, Michael; Hinz, Oliver; Pelizzon, Loriana
  5. Academic Offer of Advanced Digital Skills in 2019-20. International Comparison. Focus on Artificial Intelligence, High Performance Computing, Cybersecurity and Data Science By Riccardo Righi; Montserrat Lopez-Cobo; Georgios Alaveras; Sofia Samoili; Melisande Cardona; Miguel Vazquez-Prada Baillet; Lukasz Ziemba; Giuditta De-Prato
  6. Forecasting impacts of Agricultural Production on Global Maize Price By Rotem Zelingher; David Makowski; Thierry Brunelle
  7. Meta-learning approaches for recovery rate prediction By Gambetti, Paolo; Roccazzella, Francesco; Vrins, Frédéric
  8. Uncertainty and Monetary Policy during Extreme Events By Giovanni Pellegrino; Efrem Castelnuovo; Giovanni Caggiano
  9. Comparison of Variable Selection Methods for Time-to-Event Data in High-Dimensional Settings By J. Gilhodes; Florence Dalenc; Jocelyn Gal; C. Zemmour; Eve Leconte; Jean Marie Boher; Thomas Filleron
  10. The development of AI and its impact on business models, organization and work By Lucrezia Fanti; Dario Guarascio; Massimo Moggi
  11. Forecasting recovery rates on non-performing loans with machine learning By Bellotti, Anthony; Brigo, Damiano; Gambetti, Paolo; Vrins, Frédéric
  12. Optimal and robust combination of forecasts via constrained optimization and shrinkage By Roccazzella, Francesco; Gambetti, Paolo; Vrins, Frédéric
  13. Googling Unemployment During the Pandemic: Inference and Nowcast Using Search Data By Caperna, Giulio; Colagrossi, Marco; Geraci, Andrea; Mazzarella, Gianluca
  14. Machine learning for optimizing complex site-specific management By Saikai, Yuji; Patel, Vivak; Mitchell, Paul
  15. Expanding the Measurement of Culture with a Sample of Two Billion Humans By Obradovich, Nick; Özak, Ömer; Martín, Ignacio; Awad, Edmond; Cebrián, Manuel; Cuevas, Rubén; Desmet, Klaus; Rahwan, Iyad; Cuevas, Ángel

  1. By: Bauer, Kevin; Pfeuffer, Nicolas; Abdel-Karim, Benjamin M.; Hinz, Oliver; Kosfeld, Michael
    Abstract: Using experimental data from a comprehensive field study, we explore the causal effects of algorithmic discrimination on economic efficiency and social welfare. We harness economic, game-theoretic, and state-of-the-art machine learning concepts allowing us to overcome the central challenge of missing counterfactuals, which generally impedes assessing economic downstream consequences of algorithmic discrimination. This way, we are able to precisely quantify downstream efficiency and welfare ramifications, which provides us with a unique opportunity to assess whether the introduction of an AI system is actually desirable. Our results highlight that AI systems' capabilities in enhancing welfare critically depend on the degree of inherent algorithmic biases. While an unbiased system in our setting outperforms humans and creates substantial welfare gains, the positive impact steadily decreases and ultimately reverses the more biased an AI system becomes. We show that this relation is particularly concerning in selective-labels environments, i.e., settings where outcomes are only observed if decision-makers take a particular action so that the data is selectively labeled, because commonly used technical performance metrics like the precision measure are prone to be deceptive. Finally, our results show that continued learning, by creating feedback loops, can remedy algorithmic discrimination and associated negative effects over time.
    Date: 2020
  2. By: Lingli Wang (School of Economics and Management, Tsinghua University, Wei Lun Building, 1 Tsinghua Yuan, Haidian District, 100084, Beijing, China); Ni Huang (C. T. Bauer College of Business, University of Houston, 4750 Calhoun Road, Houston, TX 77204. United States); Yili Hong (C. T. Bauer College of Business, University of Houston, 4750 Calhoun Road, Houston, TX 77204. United States); Luning Liu (School of Economics and Management, Harbin Institute of Technology, 92 Xidazhi St, Nangang, Harbin, Heilongjiang, China); Xunhua Guo (School of Economics and Management, Tsinghua University, Wei Lun Building, 1 Tsinghua Yuan, Haidian District, 100084, Beijing, China)
    Abstract: Voice-based artificial intelligence (AI) systems have been deployed gradually to replace traditional interactive voice response (IVR) systems in call center customer service, but little evidence exists on how the implementation of AI systems impacts customer behavior, as well as AI systems’ effects on call center customer service performance. Leveraging the proprietary data from a natural field experiment, we examine how the introduction of voice-based AI affects call length, customers’ demand for human service, and customer complaints in the call center customer service of a large telecommunication service firm. We find that the implementation of the AI system significantly increases call length and decreases customer complaints. Although the AI-based service system presumably reduces users’ efforts to transfer to human agents, we do not find any significant increase in customers’ demand for human service. Furthermore, our results show interesting heterogeneity in the effectiveness of the AI-based service system. For simple service requests, the AI-based service system reduces customer complaints for both experienced and inexperienced customers. For relatively complex requests, customers learn from prior experience of interacting with the AI system, and this learning effect leads to fewer complaints. Moreover, the AI-based system exerts a significantly larger effect on reducing customer complaints for older and female customers, as well as for customers who are experienced in using the IVR system. Finally, in examining details in customer-AI conversations, we find that speech-recognition failures in customer-AI interactions result in an increase in customers’ demand for human service and customer complaints.
    Keywords: Artificial Intelligence; Customer Service; Natural Field Experiment; Difference-in-Differences
    JEL: M15 L86
    Date: 2020–09
  3. By: Cuong Nguyen; Ilan Noy; Dag Einar Sommervoll; Fang Yao
    Abstract: On the 22nd of February 2011, much of the residential housing stock in the city of Christchurch, New Zealand, was damaged by an unusually destructive earthquake. Almost all of the houses were insured. We ask whether insurance was able to mitigate the damage adequately, or whether the damage from the earthquake, and the associated insurance payments, led to a spatial re-ordering of the housing market in the city. We find a negative correlation between insurance pay-outs and house prices at the local level. We also uncover evidence that suggests that the mechanism behind this result is that in some cases houses were not fixed (i.e., owners having pocketed the payments) - indeed, insurance claims that were actively repaired (rather than paid directly) did not lead to any relative deterioration in prices. We use a genetic machine-learning algorithm, which aims to improve on a standard hedonic model and identify the dynamics of the housing market in the city, and three data sets: all housing market transactions, all earthquake insurance claims submitted to the public insurer, and all of the local authority’s building-consents data. Our results are important not only because the utility of catastrophe insurance is often questioned, but also because understanding what happens to property markets after disasters should be part of the overall assessment of the impact of the disaster itself. Without a quantification of these impacts, it is difficult to design policies that will optimally try to prevent or ameliorate disaster impacts.
    Keywords: house price prediction, machine learning, genetic algorithm, spatial aggregation
    JEL: G22 Q54 R11 R31
    Date: 2020
  4. By: Costola, Michele; Nofer, Michael; Hinz, Oliver; Pelizzon, Loriana
    Abstract: The ability to investigate the impact of news on stock prices has advanced considerably thanks to the recent use of natural language processing (NLP) in finance and economics. In this paper, we investigate COVID-19 news, elaborated with the "Natural Language Toolkit" that uses machine learning models to extract the news' sentiment. We consider the period from January till June 2020 and analyze 203,886 online articles that deal with the pandemic and that were published on three platforms. Our findings show that there is a significant and positive relationship between sentiment score and market returns. This result indicates that an increase (decrease) in the sentiment score implies a rise in positive (negative) news and corresponds to positive (negative) market returns. We also find that the variance of the sentiments and the volume of the news sources for Reuters and MarketWatch, respectively, are negatively associated with market returns, indicating that an increase in the uncertainty of the sentiment and an increase in the arrival of news have an adverse impact on the stock market.
    Keywords: COVID-19 news, Sentiment Analysis, Stock Markets
    JEL: G10 G14 G15
    Date: 2020
  5. By: Riccardo Righi (European Commission - JRC); Montserrat Lopez-Cobo (European Commission - JRC); Georgios Alaveras (European Commission - JRC); Sofia Samoili (European Commission - JRC); Melisande Cardona (European Commission - JRC); Miguel Vazquez-Prada Baillet (European Commission - JRC); Lukasz Ziemba (European Commission - JRC); Giuditta De-Prato (European Commission - JRC)
    Abstract: This work aims at supporting policy initiatives to ensure the availability in the EU27 of an adequate education offer of advanced digital skills in the domains of artificial intelligence (AI), high performance computing (HPC), cybersecurity (CS) and data science (DS). The study investigates the education offer provided in the EU27 and six additional countries: the United Kingdom, Norway, and Switzerland in Europe, Canada and the United States in America, and Australia, with a focus on the characteristics of the detected programmes. It analyses the number of programmes offered in these domains, considering the distinction based on programme’s scope or depth with which education programmes address the technological domain (broad and specialised), programme’s level (bachelor programmes, master programmes and short courses), as well as the education fields in which these programmes are taught (e.g. Information and communication technologies, Engineering, manufacturing and construction, Business, administration and law), and the content areas covered by the programmes. The analysis is conducted for each technological domain separately, first addressing the features of the overall education offer detected in the countries covered by the study, and followed by an in-depth analysis of the situation in the EU27. Among the many results that this work provides, those associated with the most relevant insights can be listed as follows. First of all, the main role in the offer of advanced technological skills is held by the US, which leads in terms of number of programs provided in almost all combinations of technological domain, scope and level. Secondly, another important player is the UK, with a very consistent offer of bachelor and master degree programs (in both cases, the UK’s share is around 25% of the total offer detected).
The consequences of Brexit have, therefore, to be considered and faced also in terms of the education offer of advanced technological skills in the EU27. Thirdly, the role of the EU27 is notable but more varying (depending on the combination of domain, scope and level of programmes) than that of the UK. Regarding more specific aspects related to the EU27 offer, we detect a good amount of programmes offered in the domain of DS. As this domain is found to be strongly associated with the education field of Business, Administration and Law, this is a positive finding suggesting a good supply of competences that are suitable to economic activities of various types. Therefore, what is observed for the EU27 suggests a good alignment between the offer and the demand of DS-related skills. In the EU27 we observe a large share of programmes belonging simultaneously to both DS and AI. Considering the relatively high offer in DS, and the fact that AI is currently a techno-economic domain that is attracting a lot of attention and of private and public resources, a consistent connection between these two domains can be considered as an important key to favour synergies and future economic growth. Additionally, we find DS programmes quite widespread among the fields of education, which may facilitate the role of DS as a vehicle to further introduce AI, HPC and CS in the fields of education barely addressing these technological domains. We also observe a relatively large offer of AI master degree programmes in the EU27, which is an important finding given the role of this education level in the provision of competences for the workforce. Finally, it is important to note that we detect potential elements of weakness in the EU27’s education offer related to CS. These competences are increasingly crucial to prevent and fight cyber-related incidents, concerning both private and public spheres.
Therefore, the detection of a relatively modest CS education offer (in comparison to other geographic areas) is a point that deserves attention. Many other findings are described throughout this report, but what is discussed in this abstract has to be retained as the most relevant content aimed at supporting EU policies.
    Keywords: digital skills, education, artificial intelligence, cybersecurity, high performance computing, digital transformation
    Date: 2020–09
  6. By: Rotem Zelingher (ECO-PUB - Economie Publique - AgroParisTech - Université Paris-Saclay - INRAE - Institut National de Recherche pour l’Agriculture, l’Alimentation et l’Environnement); David Makowski; Thierry Brunelle (CIRED - Centre International de Recherche sur l'Environnement et le Développement - CNRS - Centre National de la Recherche Scientifique - ENPC - École des Ponts ParisTech - EHESS - École des hautes études en sciences sociales - AgroParisTech - Cirad - Centre de Coopération Internationale en Recherche Agronomique pour le Développement)
    Abstract: Agricultural price shocks strongly affect farmers' income and food security. It is therefore important to understand the origin of these shocks and anticipate their occurrence. In this study, we explore the possibility of predicting global prices of one of the world's main agricultural commodities, maize, based on variations in regional production. We examine the performances of several machine-learning (ML) methods and compare them with a powerful time series model (TBATS) trained with 56 years of price data. Our results show that, out of nineteen regions, global maize prices are mostly influenced by Northern America. More specifically, small positive production changes relative to the previous year in Northern America negatively impact the world price, while production in other regions has weak or no influence. We find that TBATS is the most accurate method for a forecast horizon of three months or less. For longer forecasting horizons, ML techniques based on bagging and gradient boosting perform better but require yearly input data on regional maize productions. Our results highlight the interest of ML for predicting global prices of major commodities and reveal the strong sensitivity of global maize price to small variations of maize production in Northern America.
    Keywords: Food-security, Maize, Agricultural commodity prices, Regional production, Machine learning
    Date: 2020–09–22
  7. By: Gambetti, Paolo; Roccazzella, Francesco; Vrins, Frédéric
    Keywords: machine learning ; forecasts combination ; loss given default ; credit risk ; model risk
    Date: 2020–01–01
  8. By: Giovanni Pellegrino; Efrem Castelnuovo; Giovanni Caggiano
    Abstract: How damaging are uncertainty shocks during extreme events such as the great recession and the Covid-19 outbreak? Can monetary policy limit output losses in such situations? We use a nonlinear VAR framework to document the large response of real activity to a financial uncertainty shock during the great recession. We replicate this evidence with an estimated DSGE framework featuring a concept of uncertainty comparable to that in our VAR. We employ the DSGE model to quantify the impact on real activity of an uncertainty shock under different Taylor rules estimated with normal times vs. great recession data (the latter associated with a stronger response to output). We find that the uncertainty shock-induced output loss experienced during the 2007-09 recession could have been twice as large if policymakers had not responded aggressively to the abrupt drop in output in 2008Q3. Finally, we use our estimated DSGE framework to simulate different paths of uncertainty associated with different hypotheses on the evolution of the coronavirus pandemic. We find that: i) Covid-19-induced uncertainty could lead to an output loss twice as large as that of the great recession; ii) aggressive monetary policy moves could reduce such loss by about 50%.
    Date: 2020
  9. By: J. Gilhodes (Institut Claudius Regaud); Florence Dalenc (Institut Claudius Regaud); Jocelyn Gal (UNICANCER/CAL - Centre de Lutte contre le Cancer Antoine Lacassagne [Nice] - UNICANCER - UCA - Université Côte d'Azur); C. Zemmour (Institut Paoli-Calmettes - Fédération nationale des Centres de lutte contre le Cancer (FNCLCC)); Eve Leconte (TSE - Toulouse School of Economics - UT1 - Université Toulouse 1 Capitole - EHESS - École des hautes études en sciences sociales - CNRS - Centre National de la Recherche Scientifique - INRAE - Institut National de Recherche pour l’Agriculture, l’Alimentation et l’Environnement); Jean Marie Boher (Institut Paoli-Calmettes - Fédération nationale des Centres de lutte contre le Cancer (FNCLCC)); Thomas Filleron (Institut Claudius Regaud)
    Abstract: Over the last decades, molecular signatures have become increasingly important in oncology and are opening up a new area of personalized medicine. Nevertheless, biological relevance and statistical tools necessary for the development of these signatures have been called into question in the literature. Here, we investigate six typical selection methods for high-dimensional settings and survival endpoints, including LASSO and some of its extensions, component-wise boosting, and random survival forests (RSF). A resampling algorithm based on data splitting was used on nine high-dimensional simulated datasets to assess selection stability on training sets and the intersection between selection methods. Prognostic performances were evaluated on respective validation sets. Finally, one application on a real breast cancer dataset has been proposed. The false discovery rate (FDR) was high for each selection method, and the intersection between lists of predictors was very poor. RSF selects many more variables than the other methods and thus becomes less efficient on validation sets. Due to the complex correlation structure in genomic data, stability in the selection procedure is generally poor for selected predictors, but can be improved with a higher training sample size. In a very high-dimensional setting, we recommend the LASSO-pcvl method since it outperforms other methods by reducing the number of selected genes and minimizing FDR in most scenarios. Nevertheless, this method still gives a high rate of false positives. Further work is thus necessary to propose new methods to overcome this issue where numerous predictors are present. Pluridisciplinary discussion between clinicians and statisticians is necessary to ensure both statistical and biological relevance of the predictors included in molecular signatures.
    Date: 2020–07
  10. By: Lucrezia Fanti; Dario Guarascio; Massimo Moggi
    Abstract: This project explores the development of Artificial Intelligence (AI) and its impact on business models, market structure, organization and work. By adopting a history-friendly perspective, the present contribution traces a stylized history of the AI technological domain in order to highlight moments in time, places and sectoral domains that fostered its diffusion and transformative potential. Some descriptive analyses are also provided to investigate the diffusion of AI technologies and, at the same time, the underlying industrial and market dynamics.
    Keywords: Artificial intelligence; industrial dynamics; work organization
    Date: 2020–09–24
  11. By: Bellotti, Anthony; Brigo, Damiano; Gambetti, Paolo; Vrins, Frédéric
    Keywords: loss given default ; credit risk ; defaulted loans ; debt collection ; superior set of models
    Date: 2020–01–01
  12. By: Roccazzella, Francesco; Gambetti, Paolo; Vrins, Frédéric
    Keywords: forecasts combination ; robust methods ; optimal combination ; machine learning
    Date: 2020–01–01
  13. By: Caperna, Giulio (European Commission); Colagrossi, Marco (European Commission); Geraci, Andrea (European Commission); Mazzarella, Gianluca (European Commission)
    Abstract: The economic crisis caused by the covid-19 pandemic is unprecedented in recent history. We contribute to a growing literature investigating the economic consequences of covid-19 by showing how unemployment-related online searches across the EU27 reacted to the introduction of lock-downs. We exploit Google Trends topics to retrieve over two thousand search queries related to unemployment in 27 countries. We nowcast the monthly unemployment rate in the EU Member States to assess the relationship between search data and the underlying phenomenon as well as to identify the keywords that improve predictive accuracy. Drawing from this finding, we use the set of best predictors in a Difference-in-Differences framework to document a surge of about 30% in unemployment-related searches in the wake of lock-downs. This effect persists for more than five weeks. We suggest that the effect is most likely due to an increase in unemployment expectations.
    Keywords: Unemployment, nowcast, random forest, covid-19, Google Trends, Difference-in-Differences
    JEL: E24 C21 C53
    Date: 2020–09
  14. By: Saikai, Yuji; Patel, Vivak; Mitchell, Paul
    Abstract: Despite the promise of precision agriculture for increasing productivity by implementing site-specific management, farmers remain skeptical and its utilization rate is lower than expected. A major cause is a lack of concrete approaches to higher profitability. When involving many variables in both controlled management and monitored environment, optimal site-specific management for such high-dimensional cropping systems is considerably more complex than the traditional low-dimensional cases widely studied in the existing literature, calling for a paradigm shift in optimization of site-specific management. We propose an algorithmic approach that enables farmers to efficiently learn their own site-specific management through on-farm experiments. We test its performance in two simulated scenarios: one of medium complexity with 150 management variables and one of high complexity with 864 management variables. Results show that, relative to uniform management, site-specific management learned from 5-year experiments generates $43/ha higher profits with 25 kg/ha less nitrogen fertilizer in the first scenario and $40/ha higher profits with 55 kg/ha less nitrogen fertilizer in the second scenario. Thus, complex site-specific management can be learned efficiently and be more profitable and environmentally sustainable than uniform management.
    Keywords: Research and Development/Tech Change/Emerging Technologies
    Date: 2020–09–16
  15. By: Obradovich, Nick (Max Planck Institute for Human Development); Özak, Ömer (Southern Methodist University); Martín, Ignacio (Universidad Carlos III de Madrid); Awad, Edmond (University of Exeter); Cebrián, Manuel (Max Planck Institute for Human Development); Cuevas, Rubén (Universidad Carlos III de Madrid); Desmet, Klaus (Southern Methodist University); Rahwan, Iyad (Max Planck Institute for Human Development); Cuevas, Ángel (Universidad Carlos III de Madrid)
    Abstract: Culture has played a pivotal role in human evolution. Yet, the ability of social scientists to study culture is limited by the currently available measurement instruments. Scholars of culture must regularly choose between scalable but sparse survey-based methods or restricted but rich ethnographic methods. Here, we demonstrate that massive online social networks can advance the study of human culture by providing quantitative, scalable, and high-resolution measurement of behaviorally revealed cultural values and preferences. We employ publicly available data across nearly 60,000 topic dimensions drawn from two billion Facebook users across 225 countries and territories. We first validate that cultural distances calculated from this measurement instrument correspond to traditional survey-based and objective measures of cross-national cultural differences. We then demonstrate that this expanded measure enables rich insight into the cultural landscape globally at previously impossible resolution. We analyze the importance of national borders in shaping culture, explore unique cultural markers that identify subnational population groups, and compare subnational divisiveness to gender divisiveness across countries. The global collection of massive data on human behavior provides a high-dimensional complement to traditional cultural metrics. Further, the granularity of the measure presents enormous promise to advance scholars' understanding of additional fundamental questions in the social sciences. The measure enables detailed investigation into the geopolitical stability of countries, social cleavages within both small and large-scale human groups, the integration of migrant populations, and the disaffection of certain population groups from the political process, among myriad other potential future applications.
    Keywords: gender differences, regional culture, identity, cultural distance, culture
    JEL: C80 F1 J1 O10 R10 Z10
    Date: 2020–09
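
Reader's note: entries 2 and 13 above both list Difference-in-Differences among their keywords. As a reading aid only, the following minimal sketch shows the basic two-group, two-period version of that estimator. It is not taken from either paper; the function name `did_estimate` and the toy numbers are hypothetical, and the papers themselves presumably use regression-based variants with controls.

```python
from statistics import mean


def did_estimate(rows):
    """Return the simple 2x2 difference-in-differences estimate.

    rows: iterable of (treated, post, outcome) tuples, where treated and
    post are 0 or 1. Every one of the four cells must be non-empty.
    """
    # Collect outcomes by (group, period) cell.
    cells = {(g, t): [] for g in (0, 1) for t in (0, 1)}
    for treated, post, outcome in rows:
        cells[(treated, post)].append(outcome)
    m = {key: mean(values) for key, values in cells.items()}
    # Change in the treated group minus change in the control group.
    return (m[(1, 1)] - m[(1, 0)]) - (m[(0, 1)] - m[(0, 0)])


# Hypothetical toy data: (treated, post, outcome).
toy = [
    (0, 0, 10.0), (0, 0, 12.0),   # control, before
    (0, 1, 11.0), (0, 1, 13.0),   # control, after
    (1, 0, 10.0), (1, 0, 14.0),   # treated, before
    (1, 1, 15.0), (1, 1, 17.0),   # treated, after
]
effect = did_estimate(toy)
print(effect)
```

The estimate is only interpretable as a causal effect under the usual parallel-trends assumption, i.e. that the treated group would have followed the control group's change in the absence of treatment.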

This nep-big issue is ©2020 by Tom Coupé. It is provided as is without any express or implied warranty. It may be freely redistributed in whole or in part for any purpose. If distributed in part, please include this notice.
For comments please write to the director of NEP, Marco Novarese. Put “NEP” in the subject, otherwise your mail may be rejected.
NEP’s infrastructure is sponsored by the School of Economics and Finance of Massey University in New Zealand.