nep-big New Economics Papers
on Big Data
Issue of 2020‒03‒30
fourteen papers chosen by
Tom Coupé
University of Canterbury

  1. Perceptions of Coronavirus Mortality and Contagiousness Weaken Economic Sentiment By Thiemo Fetzer; Lukas Hensel; Johannes Hermle; Christopher Roth
  2. Impacts of Social and Economic Factors on the Transmission of Coronavirus Disease 2019 (COVID-19) in China By Qiu, Yun; Chen, Xi; Shi, Wei
  3. Application of Deep Q-Network in Portfolio Management By Ziming Gao; Yuan Gao; Yi Hu; Zhengyong Jiang; Jionglong Su
  4. Propuestas para la Bancarización e Inclusión Financiera en Argentina (Proposals for Banking Access and Financial Inclusion in Argentina) By Emiliano Delfau
  5. Artificial Intelligence, Historical Materialism, and close enough to a jobless society By Rogerio Silva Mattos
  6. Forecasting in a complex environment: Machine learning sales expectations in a Stock Flow Consistent Agent-Based simulation model By Ermanno Catullo; Mauro Gallegati; Alberto Russo
  7. Traffic, Air Pollution, and Distributional Impacts in Dar es Salaam : A Spatial Analysis with New Satellite Data By Dasgupta,Susmita; Lall,Somik V.; Wheeler,David
  8. Understanding the Exposure at Default Risk of Commercial Real Estate Construction and Land Development Loans By Shan Luo; Anthony Murphy
  9. Double Machine Learning based Program Evaluation under Unconfoundedness By Michael C. Knaus
  10. There is Not One But Many AI: A Network Perspective on Regional Demand in AI Skills By Stephany, Fabian
  11. Matching State Business Registration Records to Census Business Data By J. Daniel Kim; Kristin McCue
  12. Deep Deterministic Portfolio Optimization By Ayman Chaouki; Stephen Hardiman; Christian Schmidt; Emmanuel Sérié; Joachim de Lataillade
  13. Understanding the Geographical Distribution of Stunting in Tanzania : A Geospatial Analysis of the 2015-16 Demographic and Health Survey By Joseph,George; Gething,Peter William; Bhatt,Samir; Ayling,Sophie Charlotte Emi
  14. Le digital labor : humain, trop humain (Digital Labor: Human, All Too Human) By Frédérique Pallez

  1. By: Thiemo Fetzer; Lukas Hensel; Johannes Hermle; Christopher Roth
    Abstract: We provide the first analysis of how fear of the novel coronavirus affects current economic sentiment. First, we collect a global dataset on internet searches indicative of economic anxieties, which serve as a leading indicator of subsequent aggregate demand contractions. We find that the arrival of coronavirus in a country leads to a substantial increase in such internet searches of up to 58 percent. Second, to understand how information about the coronavirus drives economic anxieties, we conduct a survey experiment in a representative sample of the US population. We find that participants vastly overestimate mortality from and contagiousness of the virus. Providing participants with information regarding these statistics substantially lowers participants' expectations about the severity of the crisis and participants' worries regarding the aggregate economy and their personal economic situation. These results suggest that factual public education about the virus will help to contain spreading economic anxiety and improve economic sentiment.
    Date: 2020–03
  2. By: Qiu, Yun; Chen, Xi; Shi, Wei
    Abstract: This paper examines the role of various socioeconomic factors in mediating the local and cross-city transmission of the novel coronavirus 2019 (COVID-19) in China. We implement a machine learning approach to select, from a rich set of exogenous weather characteristics, instrumental variables that strongly predict virus transmission. Our 2SLS estimates show that the stringent quarantine, massive lockdown and other public health measures imposed in late January significantly reduced the transmission rate of COVID-19. By early February, the spread of the virus had been contained. While many socioeconomic factors mediate the spread of the virus, a robust government response from late January onward played a decisive role in containing it. We also demonstrate that actual population flow from the outbreak source poses a higher risk to the destination than other factors such as geographic proximity and similarity in economic conditions. The results have rich implications for ongoing global efforts to contain COVID-19.
    Keywords: 2019 novel coronavirus,transmission
    JEL: I18 I12 C23
    Date: 2020
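    For paper 2, the two-stage least squares (2SLS) idea can be illustrated with a minimal sketch, assuming a single endogenous regressor and a single instrument on simulated data. The variable roles (an unobserved confounder u, and an instrument z standing in for a weather variable) are hypothetical, and the paper's machine-learning step for selecting instruments is not shown. Standard errors from a naive second stage are not valid, so only the point estimate is computed.

```python
import random
from statistics import fmean


def slope(x, y):
    """OLS slope of y on x, with an intercept."""
    mx, my = fmean(x), fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var = sum((a - mx) ** 2 for a in x)
    return cov / var


def two_stage_least_squares(y, x, z):
    """2SLS point estimate with one endogenous regressor x and one instrument z."""
    # First stage: fitted values of x from a regression on the instrument z.
    b1 = slope(z, x)
    a1 = fmean(x) - b1 * fmean(z)
    x_hat = [a1 + b1 * zi for zi in z]
    # Second stage: regress y on the first-stage fitted values.
    return slope(x_hat, y)


# Simulated data (hypothetical): u confounds x and y; z shifts x only.
rng = random.Random(0)
n = 20_000
u = [rng.gauss(0, 1) for _ in range(n)]
z = [rng.gauss(0, 1) for _ in range(n)]
x = [zi + ui + rng.gauss(0, 1) for zi, ui in zip(z, u)]
y = [2 * xi + 3 * ui + rng.gauss(0, 1) for xi, ui in zip(x, u)]  # true effect of x is 2

print("OLS :", round(slope(x, y), 3))                       # biased upward by u
print("2SLS:", round(two_stage_least_squares(y, x, z), 3))  # close to 2
```

    In this design OLS converges to 3 because u raises both x and y, while the instrument-based estimate recovers the effect of x because z is unrelated to u.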
  3. By: Ziming Gao; Yuan Gao; Yi Hu; Zhengyong Jiang; Jionglong Su
    Abstract: Machine learning algorithms and neural networks are widely applied in many different areas, such as stock market prediction, face recognition and population analysis. This paper introduces a strategy for stock-market portfolio management based on the classic deep reinforcement learning algorithm Deep Q-Network (DQN), a deep neural network optimized by Q-learning. To adapt the DQN to financial markets, we first discretize the action space, defined as the portfolio weights across assets, so that portfolio management becomes a problem that a Deep Q-Network can solve. Next, we combine a convolutional neural network with a dueling Q-net to enhance the recognition ability of the algorithm. Experimentally, we chose five weakly correlated American stocks to test the model. The results show that the DQN-based strategy outperforms ten other traditional strategies, earning 30% more profit than the other strategies. Moreover, the Sharpe ratio together with the maximum drawdown shows that the risk of the DQN policy is the lowest.
    Date: 2020–03
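    For paper 3, discretizing the action space can be sketched as enumerating every portfolio whose weights are multiples of a fixed step and sum to one; a DQN then outputs one Q-value per enumerated portfolio and the greedy policy picks the highest. The grid step (1/4), the five-asset setup and long-only weights are illustrative assumptions; the paper's exact discretization and network are not reproduced here.

```python
from fractions import Fraction


def compositions(total, parts):
    """Yield every tuple of `parts` non-negative integers summing to `total`."""
    if parts == 1:
        yield (total,)
        return
    for first in range(total + 1):
        for rest in compositions(total - first, parts - 1):
            yield (first,) + rest


def discrete_action_space(n_assets, n_steps):
    """Long-only portfolios with weights in multiples of 1/n_steps that sum to 1."""
    return [
        tuple(Fraction(k, n_steps) for k in comp)
        for comp in compositions(n_steps, n_assets)
    ]


def greedy_action(q_values):
    """Index of the action with the highest Q-value (ties go to the lowest index)."""
    return max(range(len(q_values)), key=q_values.__getitem__)


actions = discrete_action_space(n_assets=5, n_steps=4)
print(len(actions))  # 70 candidate portfolios
```

    The number of actions is C(n_steps + n_assets - 1, n_assets - 1), so a finer grid or more assets quickly enlarges the Q-network's output layer; this combinatorial growth is the main cost of the discretization approach.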
  4. By: Emiliano Delfau
    Abstract: It is well known that in Argentina, unlike the rest of the region, domestic credit as a share of GDP has historically been low. Looking at the last ten years, the country has consistently fluctuated between 12% and 15% bank penetration relative to GDP (World Bank, 2017). We also find that the informal micro sector of the productive structure reached 49.3% by the end of 2018 (UCA, 2018). This shows that the challenge is not only to bring the unbanked population into the banking system but also to bank part of the population in the informal micro sector. In this way we would be not only pursuing a financial inclusion project but also trying to minimize the use of cash in favor of other means of payment and transactions. Despite the characteristics mentioned above, Argentina does follow global technology trends and therefore has a technological “ecosystem” that would allow it to meet these challenges. Under the premise that “all data is credit data”, the paper proposes creating a 100% digital platform or bank whose main engine is a credit score based on Big Data analysis and Machine Learning techniques. Finally, it lists some success stories of this new business model built on Big Data and Machine Learning techniques.
    Keywords: Financial Inclusion, Big Data, Fintechs, Unstructured Data, Analytics, Scoring, Digital Technology, Digital Banking
    Date: 2020–02
  5. By: Rogerio Silva Mattos (Universidade Federal de Juiz de Fora)
    Abstract: Advancing artificial intelligence draws most of its power from the artificial neural network, a software technique that has successfully replicated some information-processing functions of the human brain and the unconscious mind. Jobs are at risk of disappearing because even the tacit knowledge typically used by humans to perform complex tasks is now amenable to computerization. The paper discusses the implications of this technology for capitalism and jobs, concluding that a very long-run transition to a jobless economy should not be ruled out. Emerging business models and new collaborative schemes provide clues to how things may unfold. A scenario in which society is close to full unemployment is analyzed, and strategic paths to tackle the challenges involved are discussed. The analysis follows an eclectic approach, based on the Marxist theory of historical materialism and the job task model developed by mainstream economists.
    Keywords: artificial intelligence,historical materialism,task model,neural networks,jobless society
    Date: 2019–01–01
  6. By: Ermanno Catullo (Research Department, Link Campus University, Rome, Italy); Mauro Gallegati (Department of Management, Università Politecnica delle Marche, Ancona, Italy); Alberto Russo (Department of Management, Università Politecnica delle Marche, Ancona, Italy and Department of Economics, Universitat Jaume I, Castellón, Spain)
    Abstract: The aim of this paper is to investigate how different degrees of sophistication in agents’ behavioural rules may affect individual and macroeconomic performance. In particular, we analyze the effects of introducing into an agent-based macro model firms that are able to formulate effective sales forecasts using machine learning. These techniques provide predictions that are unbiased and reasonably accurate, especially in the case of a genetic algorithm. We observe that machine learning allows firms to increase profits, though this results in a declining wage share and a lower long-run growth rate. Moreover, the predictive methods formulate expectations that remain unbiased when shocks are not massive, thus providing firms with forecasting capabilities that, to a certain extent, may be consistent with the Lucas Critique.
    Keywords: agent-based model, machine learning, genetic algorithm, forecasting, policy shocks
    JEL: C63 D84 E32 E37
    Date: 2020
  7. By: Dasgupta,Susmita; Lall,Somik V.; Wheeler,David
    Abstract: Air pollution from vehicular traffic is a major source of health damage in urban areas. The problems of urban traffic and pollution are essentially geographic, because their incidence and impacts depend on the spatial distribution of economic activities, households, and transport links. This paper uses satellite images to investigate the spatial dynamics of vehicle traffic, air pollution, and exposure of vulnerable residents in the Dar es Salaam metro region of Tanzania. The results highlight significant impacts of seasonal weather (temperature, humidity, and wind-speed factors) on the spatial distribution and intensity of air pollution from vehicle emissions. These effects on the metro region's air quality vary highly by area. During seasons when weather factors maximize pollution, the worst exposure occurs in areas along the wind path of high-traffic roadways. The research identifies core areas where congestion reduction would yield the greatest exposure reduction for children and the elderly in poor households.
    Keywords: Intelligent Transport Systems,Air Quality&Clean Air,Pollution Management&Control,Brown Issues and Health,Inequality,Health Care Services Industry,Railways Transport
    Date: 2020–03–13
  8. By: Shan Luo (Federal Reserve Bank of Chicago); Anthony Murphy
    Abstract: We study and model the determinants of exposure at default (EAD) for large U.S. construction and land development loans from 2010 to 2017. EAD is an important component of credit risk, and commercial real estate (CRE) construction loans are riskier than income-producing loans. This is the first study modeling the EAD of construction loans. The underlying EAD data come from a large, confidential supervisory dataset used in the U.S. Federal Reserve’s annual Comprehensive Capital Analysis and Review (CCAR) stress tests. EAD reflects the relative bargaining ability and information sets of banks and obligors. We construct OLS and Tobit regression models, as well as several machine-learning models, of EAD conversion measures using a four-quarter horizon. The popular LEQ and CCF conversion measures are unstable, so we focus on the EADF and AUF measures. Property type, the lagged utilization rate and loan size are important drivers of EAD. Changing local and national economic conditions also matter, so EAD is sensitive to macroeconomic conditions. Even though default and EAD risk are negatively correlated, a conservative assumption is that all undrawn construction commitments will be fully drawn in default.
    Keywords: Credit Risk; Commercial Real Estate (CRE); Construction; Exposure at Default; EAD Conversion Measures; Macro-sensitivity; Machine Learning
    JEL: G21 G28
    Date: 2020–03–17
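    For paper 8, the instability of LEQ and CCF is easiest to see from the way these conversion measures are commonly defined in the EAD literature. The sketch below assumes those common definitions, with the drawn amount and the limit observed at a reference date before default; the paper's exact definitions and timing may differ. LEQ divides by the undrawn amount and CCF by the drawn amount, so each explodes when its denominator is near zero, whereas EADF and AUF divide by the limit.

```python
import math


def conversion_measures(ead, drawn, limit):
    """EAD conversion measures under commonly used (assumed) definitions.

    ead   -- exposure at default
    drawn -- drawn balance at the reference date
    limit -- committed limit at the reference date
    """
    undrawn = limit - drawn
    return {
        # Loan equivalent: share of the undrawn amount drawn down by default.
        "LEQ": (ead - drawn) / undrawn if undrawn else math.nan,
        # Credit conversion factor: EAD relative to the drawn balance.
        "CCF": ead / drawn if drawn else math.nan,
        # EAD factor: EAD relative to the limit.
        "EADF": ead / limit,
        # Additional utilization factor: extra drawdown relative to the limit.
        "AUF": (ead - drawn) / limit,
    }


# Nearly fully drawn loan that is partly repaid: LEQ is large and negative.
print(conversion_measures(ead=95.0, drawn=99.0, limit=100.0))
# Barely drawn loan: CCF is huge even though EAD is only half the limit.
print(conversion_measures(ead=50.0, drawn=0.5, limit=100.0))
```

    Under the conservative assumption in the abstract, that all undrawn commitments are fully drawn in default, EAD equals the limit, so both EADF and LEQ equal 1.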
  9. By: Michael C. Knaus
    Abstract: This paper consolidates recent methodological developments based on Double Machine Learning (DML), with a focus on program evaluation under unconfoundedness. DML-based methods leverage flexible prediction methods to control for confounding in the estimation of (i) standard average effects, (ii) different forms of heterogeneous effects, and (iii) optimal treatment assignment rules. We emphasize that these estimators all build on the same doubly robust score, which allows computational synergies to be exploited. An evaluation of multiple programs of the Swiss Active Labor Market Policy shows how DML-based methods enable a comprehensive policy analysis. However, we find evidence that estimates of individualized heterogeneous effects can become unstable.
    Date: 2020–03
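    For paper 9, the shared doubly robust score can be sketched as the augmented inverse probability weighting (AIPW) score with cross-fitting. This is a minimal illustration, not the paper's implementation: the nuisance functions (propensity score e(x) and outcome means mu_d(x)) are estimated by simple cell means over a single discrete covariate instead of flexible machine-learning methods, and the data are simulated. Averaging the scores gives the average effect; averaging them within covariate groups gives group-level heterogeneous effects from the same scores, which illustrates the kind of reuse the abstract refers to.

```python
import random
from collections import defaultdict
from statistics import fmean, stdev


def fit_nuisances(sample):
    """Cell-mean nuisances per covariate value x: (e(x), mu_0(x), mu_1(x)).

    Stand-in for the flexible ML predictors used in DML; assumes every
    (x, d) cell in `sample` is non-empty.
    """
    cells = defaultdict(lambda: {0: [], 1: []})
    for x, d, y in sample:
        cells[x][d].append(y)
    return {
        x: (len(g[1]) / (len(g[0]) + len(g[1])), fmean(g[0]), fmean(g[1]))
        for x, g in cells.items()
    }


def aipw_scores(data, n_folds=2, seed=0):
    """Cross-fitted doubly robust (AIPW) scores, one per observation."""
    rng = random.Random(seed)
    fold = [rng.randrange(n_folds) for _ in data]
    scores = [0.0] * len(data)
    for k in range(n_folds):
        # Fit nuisances on the other folds, then evaluate on fold k.
        nuis = fit_nuisances([obs for obs, f in zip(data, fold) if f != k])
        for i, (x, d, y) in enumerate(data):
            if fold[i] != k:
                continue
            e, mu0, mu1 = nuis[x]
            scores[i] = (mu1 - mu0
                         + d * (y - mu1) / e
                         - (1 - d) * (y - mu0) / (1 - e))
    return scores


# Simulated data (hypothetical): x confounds treatment d and outcome y;
# the true effect is 1 + x, so the true average effect is 2.
rng = random.Random(1)
data = []
for _ in range(30_000):
    x = rng.randrange(3)
    d = int(rng.random() < 0.2 + 0.15 * x)
    y = x + (1 + x) * d + rng.gauss(0, 1)
    data.append((x, d, y))

psi = aipw_scores(data)
ate = fmean(psi)
se = stdev(psi) / len(psi) ** 0.5
# Group average effects reuse the same scores.
gate = {g: fmean([p for (x, _, _), p in zip(data, psi) if x == g]) for g in range(3)}
print(round(ate, 3), round(se, 3), {g: round(v, 3) for g, v in gate.items()})
```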
  10. By: Stephany, Fabian
    Abstract: This work proposes a network perspective to empirically identify the relevant ICT skills related to AI, the extent to which they are systemically related, and how their composition varies across regions. Using 5,227 job openings in Germany advertised as Artificial Intelligence postings, relevant skills are identified and connected as a network: two skills are connected if they are jointly required by the same job advertisement. Regional skill networks are constructed in the same way: job postings are screened by city location, and a skill network is built from each city's postings alone. The resulting networks depict the city-level ecosystems of AI skills currently in demand.
    Date: 2020–03–02
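    For paper 10, the network construction described in the abstract (two skills linked when the same posting requires both, with regional networks built from one city's postings) can be sketched with the standard library. The postings below are hypothetical, and counting co-occurrences as edge weights and summarizing nodes by weighted degree are illustrative choices, not necessarily the paper's.

```python
from collections import Counter
from itertools import combinations


def skill_network(postings, city=None):
    """Weighted skill co-occurrence edges.

    Two skills are linked if the same posting requires both; the edge weight
    counts such postings. If `city` is given, only that city's postings are used.
    """
    edges = Counter()
    for posting in postings:
        if city is not None and posting["city"] != city:
            continue
        # Sort so each undirected edge has one canonical key (a, b) with a < b.
        for a, b in combinations(sorted(set(posting["skills"])), 2):
            edges[(a, b)] += 1
    return edges


def weighted_degree(edges):
    """Sum of edge weights incident to each skill."""
    degree = Counter()
    for (a, b), w in edges.items():
        degree[a] += w
        degree[b] += w
    return degree


# Hypothetical postings for illustration only.
postings = [
    {"city": "Berlin", "skills": ["Python", "TensorFlow", "SQL"]},
    {"city": "Berlin", "skills": ["Python", "SQL"]},
    {"city": "Munich", "skills": ["C++", "Python"]},
]
print(skill_network(postings))
print(skill_network(postings, city="Munich"))
print(weighted_degree(skill_network(postings)).most_common(1))  # [('Python', 4)]
```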
  11. By: J. Daniel Kim; Kristin McCue
    Abstract: We describe our methodology and results from matching state Business Registration Records (BRR) to Census business data. We use data from Massachusetts and California to develop methods and preliminary results that could be used to guide matching data for additional states. We obtain matches to Census business records for 45% of the Massachusetts BRR records and 40% of the California BRR records. We find higher match rates for incorporated businesses and businesses with higher startup-quality scores as assigned in Guzman and Stern (2018). Clerical reviews show that using relatively strict matching on address is important for match accuracy, while results are less sensitive to the strictness of name matching. Among matched BRR records, the modal timing of the first match to the Census Business Register (BR) is the year in which the BRR record was filed. We use two software tools to identify matches: SAS DQ Match and a machine-learning algorithm described in Cuffe and Goldschlag (2018). We find preliminary evidence that while the ML-based method yields more matches, SAS DQ tends to produce higher accuracy rates. To conclude, we provide suggestions on how to proceed with matching other states’ data in light of our findings for these two states.
    Date: 2020–01
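    For paper 11, the finding that strict address matching matters more for accuracy than strict name matching can be illustrated with a toy rule: require normalized addresses to agree exactly, but accept approximate name agreement. This is a hypothetical sketch using Python's difflib, not SAS DQ Match or the Cuffe and Goldschlag (2018) algorithm; the normalization tables, the record fields and the 0.8 name threshold are illustrative assumptions.

```python
import re
from difflib import SequenceMatcher

# Illustrative normalization tables (assumptions, not from the paper).
LEGAL_SUFFIXES = {"inc", "llc", "corp", "co", "ltd"}
STREET_ABBREVIATIONS = {"street": "st", "avenue": "ave", "road": "rd", "suite": "ste"}


def tokens(text):
    """Lowercase alphanumeric tokens, dropping punctuation."""
    return re.findall(r"[a-z0-9]+", text.lower())


def normalize_name(name):
    return " ".join(t for t in tokens(name) if t not in LEGAL_SUFFIXES)


def normalize_address(address):
    return " ".join(STREET_ABBREVIATIONS.get(t, t) for t in tokens(address))


def is_match(brr_record, census_record, name_threshold=0.8):
    """Strict on address (exact after normalization), lenient on name."""
    if normalize_address(brr_record["address"]) != normalize_address(census_record["address"]):
        return False
    similarity = SequenceMatcher(
        None, normalize_name(brr_record["name"]), normalize_name(census_record["name"])
    ).ratio()
    return similarity >= name_threshold


brr = {"name": "Acme Widgets, Inc.", "address": "12 Main Street"}
print(is_match(brr, {"name": "ACME WIDGETS LLC", "address": "12 main st"}))  # True
print(is_match(brr, {"name": "Acme Widgets Inc", "address": "14 Main St"}))  # False: address differs
print(is_match(brr, {"name": "Acme Widget Co", "address": "12 Main St."}))   # True: name close enough
```

    Tightening `name_threshold` trades recall for precision on the name side only; the address check stays exact, mirroring the abstract's observation that results are less sensitive to name-matching strictness.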
  12. By: Ayman Chaouki; Stephen Hardiman; Christian Schmidt; Emmanuel Sérié; Joachim de Lataillade
    Abstract: Can deep reinforcement learning algorithms be exploited as solvers for optimal trading strategies? The aim of this work is to test reinforcement learning algorithms on conceptually simple, but mathematically non-trivial, trading environments. The environments are chosen such that an optimal or close-to-optimal trading strategy is known. We study the deep deterministic policy gradient algorithm and show that such a reinforcement learning agent can successfully recover the essential features of the optimal trading strategies and achieve close-to-optimal rewards.
    Date: 2020–03
  13. By: Joseph,George; Gething,Peter William; Bhatt,Samir; Ayling,Sophie Charlotte Emi
    Abstract: Tanzania is home to the third-largest population of stunted children in Sub-Saharan Africa, with about 2.7 million children under the age of five failing to reach their full growth potential relative to the World Health Organization reference population standards. Several studies have shown that stunted growth during childhood traps children in a vicious circle of recurrent disease, reduced human development, and lower earnings, thus increasing their likelihood of being poor as adults. To reduce stunting, the Government of Tanzania and development partners are introducing a convergence of multisectoral interventions adapted to local needs. However, the existing stunting data are representative only at higher administrative levels, which makes these efforts difficult to implement. The paper uses the 2016 geo-referenced Demographic and Health Survey in conjunction with relevant spatially gridded covariate data, such as nighttime lights, water and sanitation access, vegetation index, and travel time. Geospatial techniques, such as model-based statistics and Bayesian inference implemented using the INLA algorithm, along with appropriate model validation exercises, are employed to develop high-resolution maps of stunting in Tanzania at 1×1-kilometer spatial resolution. The maps show that areas of consistently high stunting rates tend to be more common in rural parts of the country, especially throughout the western and southwestern border areas. Stunting rates are generally low in the urban areas around Dar es Salaam, Arusha, and Dodoma, as well as south of Lake Victoria.
    Keywords: Reproductive Health,Early Child and Children's Health,Health Care Services Industry,Hydrology,Inequality,Nutrition
    Date: 2019–01–03
  14. By: Frédérique Pallez (CGS i3 - Centre de Gestion Scientifique i3 - MINES ParisTech - École nationale supérieure des mines de Paris - PSL - PSL Research University - CNRS - Centre National de la Recherche Scientifique)
    Abstract: A review of Antonio Casilli's essay En attendant les robots. Enquête sur le travail du clic (Waiting for Robots: An Inquiry into Click Work), Le Seuil, « La couleur des idées » series, 2019; forthcoming in Gérer & Comprendre, 2019/4. Is artificial intelligence the future of humanity? In any case, the enthusiasm it arouses masks a reality that Antonio Casilli reveals in a detailed and fascinating inquiry: behind the myth of fully automated activities and the fear of the “disappearance of work” lie new forms of work, very human ones, namely digital labor, which turns the human productive act into a multitude of micro-operations, underpaid or even unpaid, needed to process the data that feed the new information economy we know. This transformation of work is inseparable from the development of platforms. The book is organized in three parts. The first (Quelle automation? / What automation?) analyzes the link between the artificial intelligence program and the platform paradigm; the second dissects, through examples, the forms that digital labor takes; the third seeks to theorize the phenomena of overexploitation of individuals and economic asymmetry it brings to light, in order to offer some avenues for overcoming them. In the first part, the author re-examines the debate on the possible disappearance of human work, showing that automation makes its effects felt on a qualitative rather than a quantitative level: it leads to the standardization and outsourcing of tasks, which are moreover increasingly fragmented. According to the author, billions of little hands, far removed from the computer experts one imagines, intervene every day, sometimes without knowing it (like you and me), to assist, maintain, monitor and train machines that cannot function without them. Platforms, hybrid forms between firm and market, thereby capture the value produced by the multi-sided coordination mechanisms they organize among the various types of users. These users contribute to this value creation in three forms: on-demand work, micro-work, and networked social work. These three forms are analyzed in turn in the second part, which takes their mechanisms apart with examples familiar to everyone. The Uber platform, which illustrates the first type, on-demand work, puts to work not only the drivers (who do much more than drive their vehicles) but also the users, who each engage in intensive qualification work (rating drivers, filling in profiles…) and in producing monetizable data (travel times, for example) that the platform will reuse. Micro-work, for its part, consists of paradoxically simple activities, such as annotating videos, sorting addresses, checking documents…, which machines cannot perform efficiently themselves. Amazon's micro-work platform, Mechanical Turk, is a striking illustration: it connects a “requesting” company, seeking for example to sort thousands of pages of handwritten archives, and
    Date: 2019

This nep-big issue is ©2020 by Tom Coupé. It is provided as is without any express or implied warranty. It may be freely redistributed in whole or in part for any purpose. If distributed in part, please include this notice.
General information on the NEP project can be found at For comments please write to the director of NEP, Marco Novarese at <>. Put “NEP” in the subject, otherwise your mail may be rejected.
NEP’s infrastructure is sponsored by the School of Economics and Finance of Massey University in New Zealand.