nep-big New Economics Papers
on Big Data
Issue of 2019‒10‒14
nineteen papers chosen by
Tom Coupé
University of Canterbury

  1. Will Artificial Intelligence Replace Computational Economists Any Time Soon? By Maliar, Lilia; Maliar, Serguei; Winant, Pablo
  2. When the U.S. catches a cold, Canada sneezes: a lower-bound tale told by deep learning By Lepetyuk, Vadym; Maliar, Lilia; Maliar, Serguei
  3. Emerging African Economies: Digital Structures, Disruptive Responses and Demographic Implications By Nwaobi, Godwin
  4. Artificial Intelligence, Data, Ethics: An Holistic Approach for Risks and Regulation By Alexis Bogroff; Dominique Guégan
  5. On Fintech and Financial Inclusion By Thomas Philippon
  6. Application of Machine Learning in Forecasting International Trade Trends By Feras Batarseh; Munisamy Gopinath; Ganesh Nalluru; Jayson Beckman
  7. How Incumbents Beat Disruptors? Evidence from Hotels’ Responses to Home-sharing Rivals By Wei Chen; Karen Xie; Jianwei Liu; Yong Liu
  8. Macroeconomic Indicator Forecasting with Deep Neural Networks By Thomas Cook
  9. Local Incentives and National Tax Evasion: Unintended Effects of a Mining Royalties Reform in Colombia By Saavedra, S; Romero, M
  10. Forecasting Formal Employment in Cities By Eduardo Lora
  11. Digital Connectivity in sub-Saharan Africa: A Comparative Perspective By Emre Alper; Michal Miktus
  12. Use of AI and Its Impact on Business: Updated Evidence from a Firm Survey (Japanese) By MORIKAWA Masayuki
  13. Financial Frictions and the Wealth Distribution By Fernández-Villaverde, Jesús; Hurtado, Samuel; Nuño, Galo
  14. Institutions, infrastructures, and data friction – reforming secondary use of health data in Finland By Aula, Ville
  15. Predicting popularity of EV charging infrastructure from GIS data By Milan Straka; Pasquale De Falco; Gabriella Ferruzzi; Daniela Proto; Gijs van der Poel; Shahab Khormali; Ľuboš Buzna
  16. Where is the middle class? Inequality, gender and the shape of the upper tail from 60 million English death and probate records, 1892-2016 By Cummins, Neil
  17. Royal African company networks By Ruderman, Anne Elizabeth; Heller, Mark; Xue, Harry
  18. Boosting High Dimensional Predictive Regressions with Time Varying Parameters By Kashif Yousuf; Serena Ng
  19. Hidden Wealth By Cummins, Neil

  1. By: Maliar, Lilia; Maliar, Serguei; Winant, Pablo
    Abstract: Artificial intelligence (AI) has impressive applications in many fields (speech recognition, computer vision, etc.). This paper demonstrates that AI can also be used to analyze complex and high-dimensional dynamic economic models. We show how to convert three fundamental objects of economic dynamics -- lifetime reward, Bellman equation and Euler equation -- into objective functions suitable for deep learning (DL). We introduce an all-in-one integration technique that makes the stochastic gradient unbiased for the constructed objective functions. We show how to use neural networks to deal with multicollinearity and perform model reduction in Krusell and Smith's (1998) model in which decision functions depend on thousands of state variables -- we literally feed distributions into neural networks! In our examples, the DL method was reliable, accurate and linearly scalable. Our ubiquitous Python code, built with Dolo and Google TensorFlow platforms, is designed to accommodate a variety of models and applications.
    Keywords: artificial intelligence; Bellman equation; deep learning; Dynamic Models; Dynamic programming; Euler Equation; Machine Learning; neural network; stochastic gradient; value function
    Date: 2019–09
  2. By: Lepetyuk, Vadym; Maliar, Lilia; Maliar, Serguei
    Abstract: The Canadian economy was not initially hit by the 2007-2009 Great Recession but ended up having a prolonged episode of the effective lower bound (ELB) on nominal interest rates. To investigate the Canadian ELB experience, we build a "baby" ToTEM model -- a scaled-down version of the Terms of Trade Economic Model (ToTEM) of the Bank of Canada. Our model includes 49 nonlinear equations and 21 state variables. To solve such a high-dimensional model, we develop a projection deep learning algorithm -- a combination of unsupervised and supervised (deep) machine learning techniques. Our findings are as follows: The Canadian ELB episode was contaminated from abroad via large foreign demand shocks. Prolonged ELB episodes are easy to generate in open-economy models, unlike in closed-economy models. Nonlinearities associated with the ELB constraint have virtually no impact on the Canadian economy but other nonlinearities do, in particular, the degree of uncertainty and specific closing condition used to induce the model's stationarity.
    Keywords: central banking; clustering analysis; large-scale model; deep learning; Machine Learning; neural networks; New Keynesian Model; supervised learning; ToTEM; ZLB
    JEL: C61 C63 C68 E31 E52
    Date: 2019–09
  3. By: Nwaobi, Godwin
    Abstract: Indeed, the world economy is a complex system that has undergone many different phases in the past century. Particularly, the African economy is undergoing a series of transformations (transitions) that subject the future to considerable uncertainty, complexity and unpredictability. In fact, some transformations are cyclical while others are longer-term and more structural in nature. Yet, these transitions or emergence interact in shaping the future; making extrapolation from the past an increasingly unreliable source for future predictions. Thus unlike the previous revolutions, the fourth industrial revolution is characterized by the emergence of various technologies such as virtual (augmented) realities, nanotechnologies, 3D printing, machine learning, big data, cloud computing, drones, autonomous vehicles, robotics, artificial intelligence and blockchain technologies. Again, in this digitization era, work is constantly reshaped by technological progress, while firms adopt new ways of production and markets expand. In other words, digital technology brings opportunity, paves the way to create new jobs and increases productivity. Unfortunately, this paper argues that while the digital revolution has forged ahead, its analog complements (regulated entry and competition, new economy skills access and accountable institutions) have not kept pace in Africa. Consequently, African governments should formulate digital development strategies that are much broader than current ICTs strategies. That is, they should create a policy and institutional environment for technology that fosters the greatest benefits to African people of the twenty-first century and beyond.
    Keywords: Africa, Digitization, Industrial Revolution, Technologies, Disruptions, Development, Old Work, Innovation, Automation, ICTs, E-commerce, Robotics, Artificial Intelligence, Block Chain, Cryptology, Fintech, Productivity, New Skills, Human Capital, Institutions, Policies, Emergence, Transformations, Economies, Analog Complements, Unemployment, New Jobs, Social Protection
    JEL: D80 D83 E24 E60 G10 I2 J10 J40 J6 J60 L50 O10 O30 O31 O33 O38
    Date: 2019–10–03
  4. By: Alexis Bogroff (Université Paris1 Panthéon-Sorbonne); Dominique Guégan (Université Paris1 Panthéon-Sorbonne, Centre d'Economie de la Sorbonne, LabEx ReFi and Ca' Foscari University of Venezia)
    Abstract: An extensive list of risks relative to big data frameworks and their use through models of artificial intelligence is provided along with measurements and implementable solutions. Bias, interpretability and ethics are studied in depth, with several interpretations from the point of view of developers, companies and regulators. Reflexions suggest that fragmented frameworks increase the risks of model misspecification, opacity and bias in the result; domain experts and statisticians need to be involved in the whole process as the business objective must drive each decision from the data extraction step to the final activatable prediction. We propose a holistic and original approach to take into account the risks encountered all along the implementation of systems using artificial intelligence, from the choice of the data and the selection of the algorithm to the decision making.
    Keywords: Artificial Intelligence; Bias; Big Data; Ethics; Governance; Interpretability; Regulation; Risk
    JEL: C4 C5 C6 C8 D8 G28 G38 K2
    Date: 2019–06
  5. By: Thomas Philippon
    Abstract: The cost of financial intermediation has declined in recent years thanks to technological progress and increased competition. I document this fact and I analyze two features of new financial technologies that have stirred controversy: returns to scale, and the use of big data and machine learning. I argue that the nature of fixed versus variable costs in robo-advising is likely to democratize access to financial services. Big data is likely to reduce the impact of negative prejudice in the credit market but it could reduce the effectiveness of existing policies aimed at protecting minorities.
    JEL: G11 G2 L1 N2
    Date: 2019–09
  6. By: Feras Batarseh; Munisamy Gopinath; Ganesh Nalluru; Jayson Beckman
    Abstract: International trade policies have recently garnered attention for limiting cross-border exchange of essential goods (e.g. steel, aluminum, soybeans, and beef). Since trade critically affects employment and wages, predicting future patterns of trade is a high priority for policy makers around the world. While traditional economic models aim to be reliable predictors, we consider the possibility that Machine Learning (ML) techniques allow for better predictions to inform policy decisions. Open-government data provide the fuel to power the algorithms that can explain and forecast trade flows to inform policies. Data collected in this article describe international trade transactions and commonly associated economic factors. Machine learning (ML) models deployed include: ARIMA, GBoosting, XGBoosting, and LightGBM for predicting future trade patterns, and K-Means clustering of countries according to economic factors. Unlike short-term and subjective (straight-line) projections and medium-term (aggregated) projections, ML methods provide a range of data-driven and interpretable projections for individual commodities. Models, their results, and policies are introduced and evaluated for prediction quality.
    Date: 2019–10
  7. By: Wei Chen (Eller College of Management, University of Arizona, Tucson, Arizona, 85721); Karen Xie (Daniels College of Business, University of Denver, Denver, Colorado, 80208); Jianwei Liu (School of Management, Harbin Institute of Technology, Harbin, China, 150001); Yong Liu (Eller College of Management, University of Arizona, Tucson, Arizona, 85721)
    Abstract: Growing research attention is paid to the disruption of sharing economy services (Airbnb, Uber, Lending Club, etc.) and how they cut into incumbent firms’ profit. Yet, the literature is silent on how incumbents respond to the rivalry and what the performance outcomes are when they take a defensive stance. In this paper, we investigate incumbent hotels’ responses to home sharing and how different reactions among hotels lead to distinct outcomes in customer satisfaction. Integrating causal inference and machine learning, we analyze large-scale, multidimensional data on hotels and home-sharing services in Beijing from March 2015 to December 2017, and three findings are gleaned. First, we find heterogeneous reactions of hotels, with their management responses to online guest reviews (reviews, hereafter) surging at higher-priced hotels while plunging at lower-priced ones compared with hotels that do not experience home sharing’s entry. The distinct response strategy (active vs passive) is likely due to the different extent of decline in sales at these two types of hotels after home sharing’s entry. Second, hotels that are responsive to reviews experience a significant rise in customer satisfaction while the less responsive hotels do not. We show that this difference can be attributed to distinct response strategies of hotels and not their price segment (higher-priced or lower-priced). Third, utilizing state-of-the-art deep learning algorithms combined with topic modeling, we identify the theme-specific content features (topics and their sentiments) in reviews on both hotels and home sharing. Hotels that are responsive to reviews improve significantly on sentiments of two out of seven topics (i.e., cleanliness and service), which explains their performance gains when facing the disruption. And these two topics are the exact areas where home sharing outperforms hotels based on the review comparison. These results suggest that responding to reviews allows hotel managers to not only bridge the gap between their property and home-sharing rivals but also differentiate from less responsive hotels - an interesting segmentation in the market when disruptors enter the game. This study makes the first attempt to investigate incumbent firms reacting to sharing economy disruptors. Implications are drawn on how different types of hotels can and should react for improved performance.
    Keywords: Incumbent business, sharing economy, management response, difference-in-differences, deep learning, convolutional neural network, topic modeling
    JEL: L8 M31
    Date: 2019–09
  8. By: Thomas Cook (Federal Reserve Bank of Kansas City)
    Abstract: Economic policymaking relies upon accurate forecasts of economic conditions. Current methods for unconditional forecasting are dominated by inherently linear models that exhibit model dependence and have high data demands. We explore deep neural networks as an opportunity to improve upon forecast accuracy with limited data and while remaining agnostic as to functional form. We focus on predicting civilian unemployment using models based on four different neural network architectures. Each of these models outperforms benchmark models at short time horizons. One model, based on an Encoder Decoder architecture, outperforms benchmark models at every forecast horizon (up to four quarters).
    Date: 2019
  9. By: Saavedra, S; Romero, M
    Abstract: Achieving a fair distribution of resources is one of the key goals of fiscal policy. To do this, governments often transfer tax resources from rich to marginalized areas. We study whether lower transfers dampen the incentives of local authorities to curb tax evasion in the context of mining in Colombia. To overcome the challenge of measuring evasion, we use machine learning on satellite images. Using difference-in-differences strategies, we find that a reduction in the share of revenue transferred back to mining municipalities led to an increase in illegal mining. This result illustrates the difficulties of redistributing tax revenues.
    Keywords: Illegal Mining, Machine Learning
    JEL: H26 O13 O17
    Date: 2019–10–02
  10. By: Eduardo Lora (Center for International Development at Harvard University)
    Abstract: Can “full and productive employment for all” be achieved by 2030 as envisaged by the United Nations Sustainable Development Goals? This paper assesses the issue for the largest 62 Colombian cities using social security administrative records between 2008 and 2015, which show that the larger the city, the higher its formal occupation rate. This is explained by the fact that formal employment creation is restricted by the availability of the diverse skills needed in complex sectors. Since skill accumulation is a gradual path-dependent process, future formal employment by city can be forecasted using either ordinary least squares regression results or machine learning algorithms. The results show that the share of working population in formal employment will increase between 13 and nearly 32 percentage points between 2015 and 2030, which is substantial but still insufficient to achieve the goal. Results are broadly consistent across methods for the larger cities, but not the smaller ones. For these, the machine learning method provides nuanced forecasts which may help further explorations into the relation between complexity and formal employment at the city level.
    Keywords: Employment creation
    Date: 2019–07
  11. By: Emre Alper; Michal Miktus
    Abstract: Higher digital connectivity is expected to bring opportunities to leapfrog development in sub-Saharan Africa (SSA). Experience within the region demonstrates that if there is an adequate digital infrastructure and a supportive business environment, new forms of business spring up and create jobs for the educated as well as the less educated. The paper first confirms the global digital divide through the unsupervised machine learning clustering K-means algorithm. Next, it derives a composite digital connectivity index, in the spirit of De Muro-Mazziotta-Pareto, for about 190 economies. Descriptive analysis shows that the majority of SSA countries lag in digital connectivity, specifically in infrastructure, internet usage, and knowledge. Finally, using fractional logit regressions we document that a better business-enabling and regulatory environment, financial access, and urbanization are associated with higher digital connectivity.
    Date: 2019–09–27
  12. By: MORIKAWA Masayuki
    Abstract: This study, based on an original survey of Japanese firms, presents evidence on the use of AI, big data, and robots as well as firms' perception about the impacts of these new technologies on business and employment. The major findings can be summarized as follows. First, the number of firms already using AI and big data is small, but the number of firms interested in using these technologies for their business is large and increasing. Second, the use of AI and big data is positively associated with the share of highly educated employees, but this relationship is weak for the use of robots in the manufacturing industry. Third, the use of the new technologies has a strong positive association with the innovation probability of the firms. Fourth, the majority, and an increasing number, of firms view the impact of these new technologies on their future business positively. Fifth, a relatively large number of firms expect that the use of these new technologies is likely to reduce their number of employees.
    Date: 2019–08
  13. By: Fernández-Villaverde, Jesús; Hurtado, Samuel; Nuño, Galo
    Abstract: This paper investigates how, in a heterogeneous agents model with financial frictions, idiosyncratic individual shocks interact with exogenous aggregate shocks to generate time-varying levels of leverage and endogenous aggregate risk. To do so, we show how such a model can be efficiently computed, despite its substantial nonlinearities, using tools from machine learning. We also illustrate how the model can be structurally estimated with a likelihood function, using tools from inference with diffusions. We document, first, the strong nonlinearities created by financial frictions. Second, we report the existence of multiple stochastic steady states with properties that differ from the deterministic steady state along important dimensions. Third, we illustrate how the generalized impulse response functions of the model are highly state-dependent. In particular, we find that the recovery after a negative aggregate shock is more sluggish when the economy is more leveraged. Fourth, we prove that wealth heterogeneity matters in this economy because of the asymmetric responses of household consumption decisions to aggregate shocks.
    Keywords: Aggregate shocks; continuous-time; Heterogeneous Agents; Machine Learning; structural estimation
    JEL: C45 C63 E32 E44 G01 G11
    Date: 2019–09
  14. By: Aula, Ville
    Abstract: New data-driven ideas of healthcare have increased pressures to reform existing data infrastructures. This paper explores the role of data governing institutions during a reform of both secondary health data infrastructure and related legislation in Finland. The analysis elaborates on recent conceptual work on data journeys and data frictions, connecting them to institutional and regulatory issues. The study employs an interpretative approach, using interview and document data. The results show the stark contrast between the goals of open and big data inspired reforms and the existing institutional realities. The multiple tensions that emerged during the process indicate how data frictions emanate to the institutional level, and how mundane data practices and institutional dynamics are intertwined. The paper argues that in the Finnish case, public institutions acted as safeguards of public interest, preventing more controversial parts from passing. Finally, it argues that initiating regulatory and infrastructural reforms simultaneously was beneficial for solving the tensions of the initiative and analyzing either side separately would have produced misleading accounts of the overall initiative. The results highlight the benefits of analyzing institutional dynamics and data practices as connected issues.
    Keywords: ES/P000622/1
    JEL: J50
    Date: 2019–09–30
  15. By: Milan Straka; Pasquale De Falco; Gabriella Ferruzzi; Daniela Proto; Gijs van der Poel; Shahab Khormali; Ľuboš Buzna
    Abstract: The availability of charging infrastructure is essential for large-scale adoption of electric vehicles (EV). Charging patterns and the utilization of infrastructure have consequences not only for energy demand and the loading of local power grids but also for economic returns, parking policies and further adoption of EVs. We develop a data-driven approach that exploits predictors compiled from GIS data describing the urban context and urban activities near charging infrastructure to explore correlations with a comprehensive set of indicators measuring the performance of charging infrastructure. The best fit was identified for the size of the unique group of visitors (popularity) attracted by the charging infrastructure. Subsequently, charging infrastructure is ranked by popularity. The question of whether or not a given charging spot belongs to the top tier is posed as a binary classification problem, and the predictive performance of logistic regression regularized with an l-1 penalty, random forests and gradient boosted regression trees is evaluated. The results indicate that the collected predictors contain information that can be used to predict the popularity of charging infrastructure. The significance of predictors and how they are linked with popularity are explored as well. The proposed methodology can be used to inform charging infrastructure deployment strategies.
    Date: 2019–10
  16. By: Cummins, Neil
    Abstract: This paper analyses a newly constructed individual level dataset of every English death and probate from 1892-2016. The estimated top wealth shares match closely existing estimates. However, this analysis clearly shows that the 20th century's `Great Equalization' of wealth stalled in mid-century. The probate rate, which captures the proportion of English with any significant wealth at death, rose from 10% in the 1890s to 40% by 1950 and has stagnated to 2016. Despite the large declines in the wealth share of the top 1%, from 73% to 20%, the median English person died with almost nothing throughout. All changes in inequality after 1950 involve a reshuffling of wealth within the top 30%. Further, I find that a log-linear distribution fits the empirical data better than a Pareto power law. Finally, I show that the top wealth shares are increasingly and systematically male as one ascends in wealth, 1892-1992, but this has equalized over the 20th century.
    Keywords: inequality; economic history; big data
    JEL: N00 N33 N34 D31
    Date: 2019–02
  17. By: Ruderman, Anne Elizabeth; Heller, Mark; Xue, Harry
    Abstract: Royal African Company Networks is a pilot project designed to explore the possibilities of using computational text analysis and GIS to investigate the correspondence of the Royal African Company, England’s late seventeenth-century African trade monopoly. Our project maps over 3,000 letters between the company’s main fort, Cape Coast Castle, in modern-day Ghana and the company’s ‘outforts,’ or smaller holdings on the coast. We then combine mapping with computational text analysis to draw out themes in the correspondence. We hope this project demonstrates the potential of bringing an interdisciplinary approach to historical analysis and serves as a stepping-stone for further exploration.
    JEL: N0
    Date: 2019
  18. By: Kashif Yousuf; Serena Ng
    Abstract: High dimensional predictive regressions are useful in a wide range of applications. However, the theory is mainly developed assuming that the model is stationary with time invariant parameters. This is at odds with the prevalent evidence for parameter instability in economic time series, but theories for parameter instability are mainly developed for models with a small number of covariates. In this paper, we present two $L_2$ boosting algorithms for estimating high dimensional models in which the coefficients are modeled as functions evolving smoothly over time and the predictors are locally stationary. The first method uses componentwise local constant estimators as base learner, while the second relies on componentwise local linear estimators. We establish consistency of both methods, and address the practical issues of choosing the bandwidth for the base learners and the number of boosting iterations. In an extensive application to macroeconomic forecasting with many potential predictors, we find that the benefits to modeling time variation are substantial and they increase with the forecast horizon. Furthermore, the timing of the benefits suggests that the Great Moderation is associated with substantial instability in the conditional mean of various economic series.
    Date: 2019–10
  19. By: Cummins, Neil
    Abstract: Sharp declines in wealth-concentration occurred across Europe and the US during the 20th century. But this stylized fact is based on declared wealth. It is possible that today the richest are not less rich but rather that they are hiding much of their wealth. This paper proposes a method to measure this hidden wealth, in any form. In England, 1920-1992, elites are concealing 20-32% of their wealth. Among dynasties, hidden wealth, independent of declared wealth, predicts appearance in the Offshore Leaks Database of 2013-6, house values in 1999, and Oxbridge attendance, 1990-2016. Accounting for hidden wealth eliminates one-third of the observed decline of top 10% wealth-share over the past century.
    Keywords: Big Data; economic history; hidden wealth; inequality; tax evasion
    JEL: D31 H26 N00 N33 N34
    Date: 2019–09

This nep-big issue is ©2019 by Tom Coupé. It is provided as is without any express or implied warranty. It may be freely redistributed in whole or in part for any purpose. If distributed in part, please include this notice.
General information on the NEP project can be found at For comments please write to the director of NEP, Marco Novarese at <>. Put “NEP” in the subject, otherwise your mail may be rejected.
NEP’s infrastructure is sponsored by the School of Economics and Finance of Massey University in New Zealand.