nep-big New Economics Papers
on Big Data
Issue of 2020‒10‒05
nineteen papers chosen by
Tom Coupé
University of Canterbury

  1. Flexible Work Arrangements in Low Wage Jobs: Evidence from Job Vacancy Data By Adams-Prassl, Abigail; Balgova, Maria; Qian, Matthias
  2. Reassessing the Resource Curse using Causal Machine Learning By Hodler, Roland; Lechner, Michael; Raschky, Paul A.
  3. Object Recognition for Economic Development from Daytime Satellite Imagery By Klaus Ackermann; Alexey Chernikov; Nandini Anantharama; Miethy Zaman; Paul A Raschky
  4. The merge of two worlds: Integrating artificial neural networks into agent-based electricity market simulation By Fraunholz, Christoph; Kraft, Emil; Keles, Dogan; Fichtner, Wolf
  5. Machine Learning for Temporal Data in Finance: Challenges and Opportunities By Jason Wittenbach; Brian d'Alessandro; C. Bayan Bruss
  6. A Deep Learning Approach to Estimate Forward Default Intensities By Marc-Aurèle Divernois
  7. An Adaptive Strategy for Connected Eco-Driving under Uncertain Traffic and Signal Conditions By Hao, Peng; Wei, Zhensong; Bai, Zhengwei; Barth, Matthew
  8. Volatility Forecasting with 1-dimensional CNNs via transfer learning By Bernadett Aradi; Gábor Petneházi; József Gáll
  9. Supervised learning for the prediction of firm dynamics By Falco J. Bargagli-Stoffi; Jan Niederreiter; Massimo Riccaboni
  10. Forecasting the Leading Indicator of a Recession: The 10-Year minus 3-Month Treasury Yield Spread By Sudiksha Joshi
  11. Platform Design when Sellers Use Pricing Algorithms By Johnson, Justin Pappas; Rhodes, Andrew; Wildenbeest, Matthij
  12. A Machine Learning Based Regulatory Risk Index for Cryptocurrencies By Xinwen Ni; Wolfgang Karl Härdle; Taojun Xe
  13. Les incidences de l’intelligence artificielle sur la gestion des compétences dans le secteur des services financiers By Sylvie St-Onge; Michel Magnan; Catherine Vincent
  14. Man vs. Machine Learning: The Term Structure of Earnings Expectations and Conditional Biases By Jules H. van Binsbergen; Xiao Han; Alejandro Lopez-Lira
  15. Digital Entrepreneurship Research: A Concise Introduction By Naudé, Wim; Liebregts, Werner
  16. Suitability of index insurance: new insights from satellite data By Stigler, Matthieu M.; Lobell, David
  17. Inteligencia Artificial y los Artistas By Mario S. Moreno
  18. Visualizing Income Distribution in the United States By Humberto Barreto; Sang T. Truong
  19. Expanding the Measurement of Culture with a Sample of Two Billion Humans By Nick Obradovich; Ömer Özak; Ignacio Martín; Ignacio Ortuño-Ortín; Edmond Awad; Manuel Cebrián; Rubén Cuevas; Klaus Desmet; Iyad Rahwan; Ángel Cuevas

  1. By: Adams-Prassl, Abigail (University of Oxford); Balgova, Maria (IZA); Qian, Matthias (University of Oxford)
    Abstract: In this paper, we analyze firm demand for flexible jobs by exploiting the language used to describe work arrangements in job vacancies. We take a supervised machine learning approach to classify the work arrangements described in more than 46 million UK job vacancies. We highlight the existence of very different types of flexibility amongst low and high wage vacancies. Job flexibility at low wages is more likely to be offered alongside a wage-contract that exposes workers to earnings risk, while flexibility at higher wages and in more skilled occupations is more likely to be offered alongside a fixed salary that shields workers from earnings variation. We show that firm demand for flexible work arrangements is partly driven by a desire to reduce labor costs; we find that a large and unexpected change to the minimum wage led to a 7 percentage point increase in the proportion of flexible and non-salaried vacancies at low wages.
    Keywords: flexible jobs, minimum wage, labor demand
    JEL: J23 J31 J80
    Date: 2020–09
  2. By: Hodler, Roland; Lechner, Michael; Raschky, Paul A.
    Abstract: We reassess the effects of natural resources on economic development and conflict, applying a causal forest estimator and data from 3,800 Sub-Saharan African districts. We find that, on average, mining activities and higher world market prices of locally mined minerals both increase economic development and conflict. Consistent with the previous literature, mining activities have more positive effects on economic development and weaker effects on conflict in places with low ethnic diversity and high institutional quality. In contrast, the effects of changes in mineral prices vary little in ethnic diversity and institutional quality, but are non-linear and largest at relatively high prices.
    Keywords: Resource curse, mining, economic development, conflict, causal machine learning, Africa
    JEL: C21 O13 O55 Q34 R12
    Date: 2020–09
  3. By: Klaus Ackermann; Alexey Chernikov; Nandini Anantharama; Miethy Zaman; Paul A Raschky
    Abstract: Reliable data about the stock of physical capital and infrastructure in developing countries is typically very scarce. This is particularly a problem for data at the subnational level, where existing data is often outdated, not consistently measured, or incomplete in coverage. Traditional data collection methods are time- and labor-intensive and costly, which often prohibits developing countries from collecting this type of data. This paper proposes a novel method to extract infrastructure features from high-resolution satellite images. We collected high-resolution satellite images for 5 million 1 km × 1 km grid cells covering 21 African countries. We contribute to the growing body of literature in this area by training our machine learning algorithm on ground-truth data. We show that our approach strongly improves the predictive accuracy. Our methodology can build the foundation to then predict subnational indicators of economic development for areas where this data is either missing or unreliable.
    Date: 2020–09
  4. By: Fraunholz, Christoph; Kraft, Emil; Keles, Dogan; Fichtner, Wolf
    Abstract: Machine learning and agent-based modeling are two popular tools in energy research. In this article, we propose an innovative methodology that combines these methods. For this purpose, we develop an electricity price forecasting technique using artificial neural networks and integrate the novel approach into the established agent-based electricity market simulation model PowerACE. In a case study covering ten interconnected European countries and a time horizon from 2020 until 2050 at hourly resolution, we benchmark the new forecasting approach against a simpler linear regression model as well as a naive forecast. Contrary to most of the related literature, we also evaluate the statistical significance of the superiority of one approach over another by conducting Diebold-Mariano hypothesis tests. Our major results can be summarized as follows. Firstly, in contrast to real-world electricity price forecasts, we find the naive approach to perform very poorly when deployed model-endogenously. Secondly, although the linear regression performs reasonably well, it is outperformed by the neural network approach. Thirdly, the use of an additional classifier for outlier handling substantially improves the forecasting accuracy, particularly for the linear regression approach. Finally, the choice of the model-endogenous forecasting method has a clear impact on simulated electricity prices. This latter finding is particularly crucial since these prices are a major result of electricity market models.
    Keywords: Agent-based simulation,Artificial neural network,Electricity price forecasting,Electricity market
    Date: 2020
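The Diebold-Mariano comparison mentioned in the abstract above can be illustrated with a minimal sketch. It assumes squared-error loss and one-step-ahead forecasts, so no autocorrelation correction is applied to the loss differential; the function name `diebold_mariano` and both simplifications are assumptions made for illustration and are not taken from the paper.

```python
import math


def diebold_mariano(errors_a, errors_b):
    """Return (statistic, two-sided p-value) for equal squared-error accuracy.

    Assumes one-step-ahead forecasts, so the loss differential is treated
    as serially uncorrelated (no long-run variance correction).
    """
    if len(errors_a) != len(errors_b) or len(errors_a) < 2:
        raise ValueError("need two error series of equal length >= 2")
    # Loss differential: positive values mean forecast A has the larger loss.
    d = [ea ** 2 - eb ** 2 for ea, eb in zip(errors_a, errors_b)]
    n = len(d)
    mean_d = sum(d) / n
    var_d = sum((x - mean_d) ** 2 for x in d) / (n - 1)
    if var_d == 0:
        raise ValueError("loss differential has zero variance")
    stat = mean_d / math.sqrt(var_d / n)
    # Two-sided p-value under the asymptotic standard normal distribution.
    p_value = math.erfc(abs(stat) / math.sqrt(2))
    return stat, p_value
```

A positive statistic with a small p-value would suggest that the second forecast is significantly more accurate than the first under these assumptions.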
  5. By: Jason Wittenbach; Brian d'Alessandro; C. Bayan Bruss
    Abstract: Temporal data are ubiquitous in the financial services (FS) industry -- traditional data like economic indicators, operational data such as bank account transactions, and modern data sources like website clickstreams -- all of these occur as a time-indexed sequence. But machine learning efforts in FS often fail to account for the temporal richness of these data, even in cases where domain knowledge suggests that the precise temporal patterns between events should contain valuable information. At best, such data are often treated as uniform time series, where there is a sequence but no sense of exact timing. At worst, rough aggregate features are computed over a pre-selected window so that static sample-based approaches can be applied (e.g. number of open lines of credit in the previous year or maximum credit utilization over the previous month). Such approaches are at odds with the deep learning paradigm which advocates for building models that act directly on raw or lightly processed data and for leveraging modern optimization techniques to discover optimal feature transformations en route to solving the modeling task at hand. Furthermore, a full picture of the entity being modeled (customer, company, etc.) might only be attainable by examining multiple data streams that unfold across potentially vastly different time scales. In this paper, we examine the different types of temporal data found in common FS use cases, review the current machine learning approaches in this area, and finally assess challenges and opportunities for researchers working at the intersection of machine learning for temporal data and applications in FS.
    Date: 2020–09
  6. By: Marc-Aurèle Divernois (EPFL; Swiss Finance Institute)
    Abstract: This paper proposes a machine learning approach to estimate physical forward default intensities. Default probabilities are computed using artificial neural networks to estimate the intensities of the inhomogeneous Poisson processes governing the default process. The major contribution to previous literature is to allow the estimation of non-linear forward intensities by using neural networks instead of classical maximum likelihood estimation. The model specification allows an easy replication of previous literature using a linear assumption and shows the improvement that can be achieved.
    Keywords: Bankruptcy, Credit Risk, Default, Machine Learning, Neural Networks, Doubly Stochastic, Forward Poisson Intensities
    JEL: C22 C23 C53 C58 G33 G34
    Date: 2020–07
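As a rough illustration of how intensities of an inhomogeneous Poisson process translate into a default probability, the sketch below assumes a piecewise-constant intensity and a single exit type (default only). The function name `default_probability` and these simplifications are assumptions for illustration; the paper's actual specification may differ.

```python
import math


def default_probability(intensities, dt):
    """Cumulative default probability over len(intensities) periods.

    intensities: annualized default intensity for each period (non-negative),
                 assumed constant within the period.
    dt: period length in years.
    Survival is exp(-integral of the intensity), so the default probability
    is one minus that quantity.
    """
    if dt <= 0 or any(lam < 0 for lam in intensities):
        raise ValueError("dt must be positive and intensities non-negative")
    cumulative_hazard = sum(intensities) * dt
    return 1.0 - math.exp(-cumulative_hazard)
```

With a constant annualized intensity of 0.02 over twelve monthly periods, the one-year default probability is 1 - exp(-0.02), slightly below 2%.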
  7. By: Hao, Peng; Wei, Zhensong; Bai, Zhengwei; Barth, Matthew
    Abstract: Connected and automated vehicle technology could bring about transformative reductions in traffic congestion, greenhouse gas emissions, air pollution, and energy consumption. Connected and automated vehicles (CAVs) can directly communicate with other vehicles and road infrastructure and use sensing technology and artificial intelligence to respond to traffic conditions and optimize fuel consumption. An eco-approach and departure application for connected and automated vehicles has been widely studied as a means of calculating the most energy-efficient speed profile and guiding a vehicle through signalized intersections without unnecessary stops and starts. Simulations using this application on roads with fixed-timing traffic signals have produced 12% reductions in fuel consumption and greenhouse gas emissions. But real-world traffic conditions are much more complex: uncertainties and the limited sensing range of automated vehicles create challenges for determining the most energy-efficient speed. To account for this uncertainty, researchers from the University of California, Riverside, propose a prediction-based, adaptive connected eco-driving strategy. The proposed strategy analyzes the possible upcoming traffic and signal scenarios based on historical data and live information collected from communication and sensing devices, and then chooses the most energy-efficient speed. This approach can be extended to accommodate different vehicle powertrains and types of roadway infrastructure. This research brief summarizes findings from the research and provides research implications.
    Keywords: Engineering, Autonomous vehicles, Connected vehicles, Ecodriving, Energy consumption, Machine learning, Microsimulation, Signalized intersections, Vehicle mix
    Date: 2020–09–01
  8. By: Bernadett Aradi; Gábor Petneházi; József Gáll
    Abstract: Volatility is a natural risk measure in finance as it quantifies the variation of stock prices. A frequently considered problem in mathematical finance is to forecast different estimates of volatility. What makes deep learning methods promising for the prediction of volatility is the fact that stock price returns satisfy some common properties, referred to as 'stylized facts'. Also, the amount of data used can be large, favoring the application of neural networks. We used 10 years of daily prices for hundreds of frequently traded stocks, and compared different CNN architectures: some networks use only the considered stock, but we also tried a construction which, for training, uses many more series, but not the considered stocks. Essentially, this is an application of transfer learning, and its performance turns out to be much better in terms of prediction error. We also compare our dilated causal CNNs to the classical ARIMA method using an automatic model selection procedure.
    Date: 2020–09
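The building block of the dilated causal CNNs named in the abstract above can be sketched in plain Python. The version below is a single-channel layer without bias or activation, with zero padding on the left; the function name `dilated_causal_conv1d` and these simplifications are assumptions for illustration, not the authors' architecture.

```python
def dilated_causal_conv1d(series, weights, dilation=1):
    """Apply one dilated causal convolution to a 1-D sequence.

    Output at time t is sum_k weights[k] * series[t - k * dilation],
    where terms reaching before the start of the series count as zero.
    Causal means the output at t never uses values after t.
    """
    if dilation < 1:
        raise ValueError("dilation must be a positive integer")
    out = []
    for t in range(len(series)):
        acc = 0.0
        for k, w in enumerate(weights):
            idx = t - k * dilation
            if idx >= 0:  # implicit zero padding on the left
                acc += w * series[idx]
        out.append(acc)
    return out
```

Stacking such layers with dilations 1, 2, 4, and so on lets the receptive field grow exponentially with depth while each output still depends only on past and present inputs.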
  9. By: Falco J. Bargagli-Stoffi; Jan Niederreiter; Massimo Riccaboni
    Abstract: Thanks to the increasing availability of granular, yet high-dimensional, firm-level data, machine learning (ML) algorithms have been successfully applied to address multiple research questions related to firm dynamics. Supervised learning (SL) in particular, the branch of ML dealing with the prediction of labelled outcomes, has been used to better predict firms' performance. In this contribution, we will illustrate a series of SL approaches to be used for prediction tasks, relevant at different stages of the company life cycle. The stages we will focus on are (i) startup and innovation, (ii) growth and performance of companies, and (iii) firms' exit from the market. First, we review SL implementations to predict successful startups and R&D projects. Next, we describe how SL tools can be used to analyze company growth and performance. Finally, we review SL applications to better forecast financial distress and company failure. In the concluding section, we extend the discussion of SL methods in the light of targeted policies, result interpretability, and causality.
    Date: 2020–09
  10. By: Sudiksha Joshi
    Abstract: In this research paper, I have applied various econometric time series models and two machine learning models to forecast the daily data on the yield spread. First, I decomposed the yield curve into its principal components, then simulated various paths of the yield spread using the Vasicek model. After constructing univariate ARIMA models, and multivariate models such as ARIMAX, VAR, and Long Short Term Memory (LSTM), I computed the root mean squared error to measure how far the results deviate from the current values. Through impulse response functions, I measured the impact of various shocks on the differenced yield spread. The results indicate that the parsimonious univariate ARIMA model outperforms the richly parameterized VAR method, and the complex LSTM with multivariate data performs equally well as the simple ARIMA model.
    Date: 2020–09
  11. By: Johnson, Justin Pappas; Rhodes, Andrew; Wildenbeest, Matthij
    Abstract: Using both economic theory and Artificial Intelligence (AI) pricing algorithms, we investigate the ability of a platform to design its marketplace to promote competition, improve consumer surplus, and even raise its own profits. We allow sellers to use Q-learning algorithms (a common reinforcement-learning technique from the computer-science literature) to devise pricing strategies in a setting with repeated interactions, and consider the effect of steering policies that reward firms that cut prices with additional exposure to consumers. Overall, the evidence from our experiments suggests that platform design decisions can meaningfully benefit consumers even when algorithmic collusion might otherwise emerge but that achieving these gains may require more than the simplest steering policies when algorithms value the future highly. We also find that policies that raise consumer surplus can raise the profits of the platform, depending on the platform’s revenue model. Finally, we document several learning challenges faced by the algorithms.
    Date: 2020–09–08
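For readers unfamiliar with the Q-learning technique named in the abstract above, the standard tabular update rule can be sketched as follows. The table layout (a dict of dicts), the function name `q_update`, and the default learning rate and discount factor are assumptions for illustration; the paper's state space, action grid, and parameters are not reproduced here.

```python
def q_update(q, state, action, reward, next_state, alpha=0.1, gamma=0.95):
    """Apply one tabular Q-learning update in place and return the new value.

    q: dict mapping state -> dict mapping action -> estimated value.
    alpha: learning rate; gamma: discount factor (how much the future counts).
    """
    # Value of the best action available in the next state.
    best_next = max(q[next_state].values())
    # Move the current estimate toward reward plus discounted future value.
    td_target = reward + gamma * best_next
    q[state][action] += alpha * (td_target - q[state][action])
    return q[state][action]
```

A discount factor close to one corresponds to algorithms that value the future highly, which is the case the abstract singles out as requiring more than the simplest steering policies.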
  12. By: Xinwen Ni; Wolfgang Karl Härdle; Taojun Xe
    Abstract: Cryptocurrencies' values often respond aggressively to major policy changes, but none of the existing indices informs on the market risks associated with regulatory changes. In this paper, we quantify the risks originating from new regulations on FinTech and cryptocurrencies (CCs), and analyse their impact on market dynamics. Specifically, a Cryptocurrency Regulatory Risk IndeX (CRRIX) is constructed based on policy-related news coverage frequency. The unlabeled news data are collected from the top online CC news platforms and further classified using a Latent Dirichlet Allocation model and Hellinger distance. Our results show that the machine-learning-based CRRIX successfully captures major policy-changing moments. The movements for both the VCRIX, a market volatility index, and the CRRIX are synchronous, meaning that the CRRIX could be helpful for all participants in the cryptocurrency market. The algorithms and Python code are available for research purposes on
    Date: 2020–09
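The Hellinger distance mentioned in the abstract above compares two discrete probability distributions, for example the topic mixtures that a Latent Dirichlet Allocation model assigns to two news articles. The sketch below uses the standard definition; the function name `hellinger` and the example use are assumptions for illustration and do not reproduce the authors' code.

```python
import math


def hellinger(p, q):
    """Hellinger distance between two discrete distributions of equal length.

    Returns a value in [0, 1]: 0 for identical distributions and 1 for
    distributions with disjoint support. Inputs are assumed to be
    non-negative and to sum to one.
    """
    if len(p) != len(q):
        raise ValueError("distributions must have the same length")
    squared = sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q))
    return math.sqrt(squared / 2.0)
```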
  13. By: Sylvie St-Onge; Michel Magnan; Catherine Vincent
    Abstract: The rise of artificial intelligence (AI) offers many opportunities to financial services institutions but also presents a number of challenges, both organizational and societal. On the one hand, AI dramatically changes skills management within these companies. On the other hand, AI has the potential to affect in a major way employment needs and prospects in the financial services sector, and therefore challenges the current practices of governments and higher education institutions. In this text, we provide an overview of recent writings on these issues, interspersed with concrete illustrations. The objective is to present a portrait of the situation to date, to identify lines of thought or action for financial services institutions, educational institutions and governments, and to propose some research avenues.
    Keywords: Artificial Intelligence, Talent Management, Competencies Management, Financial Services, Employment
    Date: 2020–09–17
  14. By: Jules H. van Binsbergen; Xiao Han; Alejandro Lopez-Lira
    Abstract: We use machine learning to construct a statistically optimal and unbiased benchmark for firms' earnings expectations. We show that analyst expectations are on average biased upwards, and that this bias exhibits substantial time-series and cross-sectional variation. On average, the bias increases in the forecast horizon, and analysts revise their expectations downwards as earnings announcement dates approach. We find that analysts' biases are associated with negative cross-sectional return predictability, and the short legs of many anomalies consist of firms for which the analysts' forecasts are excessively optimistic relative to our benchmark. Managers of companies with the greatest upward-biased earnings forecasts are more likely to issue stock.
    JEL: D22 D83 D84 G11 G12 G14 G31
    Date: 2020–09
  15. By: Naudé, Wim (RWTH Aachen University); Liebregts, Werner (Tilburg University)
    Abstract: In the past few decades, technological progress has led to the digitization and digitalization of economies into what one could now call digital economies. The COVID-19 pandemic will accelerate the development of the digital economy. In a digital economy, digital entrepreneurs pursue opportunities to produce and trade in digital artifacts on digital artifact stores or platforms, and/or to create these digital artifact stores or platforms themselves. There is a well-recognized need for more research on digital entrepreneurship. As such, this paper provides an overview of the central research questions currently being pursued in this field. These include questions such as: What is digital entrepreneurship? What is different in the digital economy from an entrepreneurial perspective? What is the impact of digitalization - and big data - on business models and entrepreneurship? How can digital entrepreneurship be supported and regulated? The paper identifies areas of neglect, and makes proposals for future research.
    Keywords: gig economy, digital platforms, network effects, digital artifacts, digital entrepreneurship, digital entrepreneurial ecosystems
    JEL: L26 D21 M13 O33
    Date: 2020–09
  16. By: Stigler, Matthieu M.; Lobell, David
    Keywords: Agricultural Finance, Production Economics, Industrial Organization
    Date: 2020–07
  17. By: Mario S. Moreno
    Abstract: Technology has long attracted the interest of artists. A great many works of art employ Artificial Intelligence (AI) as part of their construction. Is there a space where humans and machines can produce together? Two camps have emerged: one warns that software weakens creativity, while the other maintains that the machine and the artist can not only coexist but also strengthen and/or stimulate each other. This is the focus of our analysis. This paper aims to open a discussion of the impact of technology on art and its subsequent classification.
    Date: 2020–09
  18. By: Humberto Barreto (Department of Economics and Management, DePauw University); Sang T. Truong (Department of Economics and Management, DePauw University)
    Abstract: Visit to see a novel, eye-catching visual display of the income distribution in the United States that conveys fundamental information about the evolution and current level of income inequality to a wide audience. We use IPUMS CPS data to create household income deciles adjusted for price level and household size for each of the 50 states and the District of Columbia from 1976 to 2018. We adjust for state price differences from 2008 to 2018. Plotting these data gives a 3D chart that provides a startling picture of income differences within and across states over time. Those interested in further customization can use our Python visualization toolbox, incomevis, available at
    Keywords: visualization, household income, inequality, microdata, big data, data-mining, Python
    JEL: A10 A20 J10
    Date: 2020–09–12
  19. By: Nick Obradovich; Ömer Özak; Ignacio Martín; Ignacio Ortuño-Ortín; Edmond Awad; Manuel Cebrián; Rubén Cuevas; Klaus Desmet; Iyad Rahwan; Ángel Cuevas
    Abstract: Culture has played a pivotal role in human evolution. Yet, the ability of social scientists to study culture is limited by the currently available measurement instruments. Scholars of culture must regularly choose between scalable but sparse survey-based methods or restricted but rich ethnographic methods. Here, we demonstrate that massive online social networks can advance the study of human culture by providing quantitative, scalable, and high-resolution measurement of behaviorally revealed cultural values and preferences. We employ publicly available data across nearly 60,000 topic dimensions drawn from two billion Facebook users across 225 countries and territories. We first validate that cultural distances calculated from this measurement instrument correspond to traditional survey-based and objective measures of cross-national cultural differences. We then demonstrate that this expanded measure enables rich insight into the cultural landscape globally at previously impossible resolution. We analyze the importance of national borders in shaping culture, explore unique cultural markers that identify subnational population groups, and compare subnational divisiveness to gender divisiveness across countries. The global collection of massive data on human behavior provides a high-dimensional complement to traditional cultural metrics. Further, the granularity of the measure presents enormous promise to advance scholars' understanding of additional fundamental questions in the social sciences. The measure enables detailed investigation into the geopolitical stability of countries, social cleavages within both small and large-scale human groups, the integration of migrant populations, and the disaffection of certain population groups from the political process, among myriad other potential future applications.
    JEL: C80 J10 J16 O10 R10 Z10
    Date: 2020–09

This nep-big issue is ©2020 by Tom Coupé. It is provided as is without any express or implied warranty. It may be freely redistributed in whole or in part for any purpose. If distributed in part, please include this notice.
General information on the NEP project can be found at For comments please write to the director of NEP, Marco Novarese at <>. Put “NEP” in the subject, otherwise your mail may be rejected.
NEP’s infrastructure is sponsored by the School of Economics and Finance of Massey University in New Zealand.