nep-big 2017-12-18 papers

on Big Data

Issue of 2017‒12‒18
ten papers chosen by
Tom Coupé
University of Canterbury

Planning Ahead for Better Neighborhoods: Long Run Evidence from Tanzania By Ferdinand Rauch; Guy Michaels; Dzhamilya Nigmatulina; Tanner Regan; Neeraj Baruah; Amanda Dahlstrand-Rudin
The Effect of Positive Mood on Cooperation in Repeated Interaction By Proto, Eugenio; Sgroi, Daniel; Nazneen, Mahnaz
Beyond Early Warning Indicators: High School Dropout and Machine Learning By Dario Sansone
Spatial Patterns of Development: A Meso Approach By Michalopoulos, Stelios; Papaioannou, Elias
Spatial Patterns of Development: A Meso Approach By Stelios Michalopoulos; Elias Papaioannou
Analysing and predicting micro-location patterns of software firms By Kinne, Jan; Resch, Bernd
Aggregating Google Trends: Multivariate Testing and Analysis By Stephen L. France; Yuying Shi
Regulatory Learning: how to supervise machine learning models? An application to credit scoring By Dominique Guegan; Bertrand Hassani
Probabilistic Forecasting of Thunderstorms in the Eastern Alps By Thorsten Simon; Peter Fabsic; Georg J. Mayr; Nikolaus Umlauf; Achim Zeileis
Human Capital/Human Intelligence and Neuromorphic Artificial Intelligence: In pursuit of the relevant intelligence concept (Japanese) By ITO Kazuyori

Planning Ahead for Better Neighborhoods: Long Run Evidence from Tanzania

By:	Ferdinand Rauch; Guy Michaels; Dzhamilya Nigmatulina; Tanner Regan; Neeraj Baruah; Amanda Dahlstrand-Rudin
Abstract:	What are the long run consequences of planning and providing basic infrastructure in neighborhoods, where people build their own homes? We study "Sites and Services" projects implemented in seven Tanzanian cities during the 1970s and 1980s, half of which provided infrastructure in previously unpopulated areas (de novo neighborhoods), while the other half upgraded squatter settlements. Using satellite images and surveys from the 2010s, we find that de novo neighborhoods developed better housing than adjacent residential areas (control areas) that were also initially unpopulated. Specifically, de novo neighborhoods are more orderly and their buildings have larger footprint areas and are more likely to have multiple stories, as well as connections to electricity and water, basic sanitation and access to roads. And though de novo neighborhoods generally attracted better educated residents than control areas, the educational difference is too small to account for the large difference in residential quality that we find. While we have no natural counterfactual for the upgrading areas, descriptive evidence suggests that they are if anything worse than the control areas.
Keywords:	Urban Economics, Economic Development, Slums, Africa.
JEL:	R31 O18 R14
Date:	2017–09–20
URL:	http://d.repec.org/n?u=RePEc:oxf:wpaper:834&r=big

The Effect of Positive Mood on Cooperation in Repeated Interaction

By:	Proto, Eugenio (University of Warwick, CAGE and IZA); Sgroi, Daniel (University of Warwick, CAGE and Nuffield College, University of Oxford); Nazneen, Mahnaz (University of Warwick)
Abstract:	Existing research supports two opposing mechanisms through which positive mood might affect cooperation. Some studies have suggested that positive mood produces more altruistic, open and helpful behavior, fostering cooperation. However, there is contrasting research supporting the idea that positive mood produces more assertiveness and inward-orientation and reduced use of information, hampering cooperation. We find evidence that suggests the second hypothesis dominates when playing the repeated Prisoner’s Dilemma. Players in an induced positive mood tend to cooperate less than players in a neutral mood setting. This holds regardless of uncertainty surrounding the number of repetitions or whether pre-play communication has taken place. This finding is consistent with a text analysis of the pre-play communication between players indicating that subjects in a more positive mood use more inward-oriented, more negative and less positive language. To the best of our knowledge we are the first to use text analysis in pre-play communication.
Date:	2017
URL:	http://d.repec.org/n?u=RePEc:cge:wacage:347&r=big

Beyond Early Warning Indicators: High School Dropout and Machine Learning

By:	Dario Sansone (Department of Economics, Georgetown University)
Abstract:	This paper provides an algorithm to predict which students are going to drop out of high schools relying only on information from 9th grade. It verifies that using a parsimonious early warning system - as implemented in many schools - leads to poor results. It shows that schools can obtain more precise predictions by exploiting the available high-dimensional data jointly with machine learning tools such as Support Vector Machine, Boosted Regression and Post-LASSO. It carefully selects goodness-of-fit criteria based on the context and the underlying theoretical framework: model parameters are calibrated by taking into account policy goals and budget constraints. Finally, it uses unsupervised machine learning to divide students at risk of dropping out into different clusters.
Keywords:	High School Dropout, Machine Learning, Big Data
JEL:	C53 I20
URL:	http://d.repec.org/n?u=RePEc:geo:guwopa:gueconwpa~17-17-09&r=big

Spatial Patterns of Development: A Meso Approach

By:	Michalopoulos, Stelios (Federal Reserve Bank of Minneapolis); Papaioannou, Elias (London Business School)
Abstract:	Over the last two decades, the literature on comparative development has moved from country-level to within-country analyses. The questions asked have expanded, as economists have used satellite images of light density at night and other big spatial data to proxy for development at the desired level. The focus has also shifted from uncovering correlations to identifying causal relations, using elaborate econometric techniques including spatial regression discontinuity designs. In this survey we show how the combination of geographic information systems with insights from disciplines ranging from the earth sciences to linguistics and history has transformed the research landscape on the roots of the spatial patterns of development. We discuss the limitations of the luminosity data and associated econometric techniques and conclude by offering some thoughts on future research.
Keywords:	Development; Language; Ethnicity; History; Borders; Luminosity; Regression discontinuity; Regions
JEL:	N00 N9 O10 O43 O55
Date:	2017–12–11
URL:	http://d.repec.org/n?u=RePEc:fip:fedmoi:0004&r=big

Spatial Patterns of Development: A Meso Approach

By:	Stelios Michalopoulos; Elias Papaioannou
Abstract:	Over the last two decades, the literature on comparative development has moved from country-level to within-country analyses. The questions asked have expanded, as economists have used satellite images of light density at night and other big spatial data to proxy for development at the desired level. The focus has also shifted from uncovering correlations to identifying causal relations, using elaborate econometric techniques including spatial regression discontinuity designs. In this survey we show how the combination of geographic information systems with insights from disciplines ranging from the earth sciences to linguistics and history has transformed the research landscape on the roots of the spatial patterns of development. We discuss the limitations of the luminosity data and associated econometric techniques and conclude by offering some thoughts on future research.
JEL:	D0 N0 O0 Z1
Date:	2017–11
URL:	http://d.repec.org/n?u=RePEc:nbr:nberwo:24088&r=big

Analysing and predicting micro-location patterns of software firms

By:	Kinne, Jan; Resch, Bernd
Abstract:	While the effects of non-geographic aggregation on inference are well studied in economics, research on geographic aggregation is rather scarce. This knowledge gap together with the use of aggregated spatial units in previous firm location studies result in a lack of understanding of firm location determinants at the microgeographic level. Suitable data for microgeographic location analysis has become available only recently through the emergence of Volunteered Geographic Information (VGI), especially the OpenStreetMap (OSM) project, and the increasing availability of official (open) geodata. In this paper, we use a comprehensive dataset of three million street-level geocoded firm observations to explore the location pattern of software firms in an Exploratory Spatial Data Analysis (ESDA). Based on the ESDA results, we develop a software firm location prediction model using Poisson regression and OSM data. Our findings demonstrate that the model yields plausible predictions and OSM data is suitable for microgeographic location analysis. Our results also show that non-aggregated data can be used to detect information on location determinants, which are superimposed when aggregated spatial units are analysed, and that some findings of previous firm location studies are not robust at the microgeographic level. However, we also conclude that the lack of high-resolution geodata on socio-economic population characteristics causes systematic prediction errors, especially in cities with diverse and segregated populations.
Keywords:	Firm Location,Location Factors,Software Industry,Microgeography,OpenStreetMap (OSM),Prediction,Volunteered Geographic Information (VGI)
JEL:	R12 L86 R30
Date:	2017
URL:	http://d.repec.org/n?u=RePEc:zbw:zewdip:17063&r=big

Aggregating Google Trends: Multivariate Testing and Analysis

By:	Stephen L. France; Yuying Shi
Abstract:	Google Trends is a valuable source of economic information. Previous studies have utilized Google Trends data for economic forecasting. We expand this work by providing algorithms to combine and aggregate Trends data and show how time series clustering can be used to analyze competition.
Date:	2017–12
URL:	http://d.repec.org/n?u=RePEc:arx:papers:1712.03152&r=big

Regulatory Learning: how to supervise machine learning models? An application to credit scoring

By:	Dominique Guegan (CES - Centre d'économie de la Sorbonne - UP1 - Université Panthéon-Sorbonne - CNRS - Centre National de la Recherche Scientifique, Labex ReFi - Université Paris1 - Panthéon-Sorbonne); Bertrand Hassani (CES - Centre d'économie de la Sorbonne - UP1 - Université Panthéon-Sorbonne - CNRS - Centre National de la Recherche Scientifique, Labex ReFi - Université Paris1 - Panthéon-Sorbonne)
Abstract:	The arrival of big data strategies is threatening the latest trends in financial regulation related to the simplification of models and the enhancement of the comparability of approaches chosen by financial institutions. Indeed, the intrinsic dynamic philosophy of Big Data strategies is almost incompatible with the current legal and regulatory framework as illustrated in this paper. Besides, as presented in our application to credit scoring, the model selection may also evolve dynamically forcing both practitioners and regulators to develop libraries of models, strategies allowing to switch from one to the other as well as supervising approaches allowing financial institutions to innovate in a risk mitigated environment. The purpose of this paper is therefore to analyse the issues related to the Big Data environment and in particular to machine learning models highlighting the issues present in the current framework confronting the data flows, the model selection process and the necessity to generate appropriate outcomes.
Keywords:	Regulation,AUC,Machine Learning,Big Data,Credit Scoring
Date:	2017–09
URL:	http://d.repec.org/n?u=RePEc:hal:cesptp:halshs-01592168&r=big

Probabilistic Forecasting of Thunderstorms in the Eastern Alps

By:	Thorsten Simon; Peter Fabsic; Georg J. Mayr; Nikolaus Umlauf; Achim Zeileis
Abstract:	A probabilistic forecasting method to predict thunderstorms in the European Eastern Alps is developed. A statistical model links lightning occurrence from the ground-based ALDIS detection network to a large set of direct and derived variables from a numerical weather prediction (NWP) system. The NWP system is the high resolution run (HRES) of the European Centre for Medium-Range Weather Forecasts (ECMWF). The statistical model is a generalized additive model (GAM) framework, which is estimated by Markov chain Monte Carlo (MCMC) simulation. Gradient boosting with stability selection serves as a tool for selecting a stable set of potentially nonlinear terms. Three grids from 64×64 km² to 16×16 km² and five forecast horizons from 5 to 1 day ahead are investigated to predict thunderstorms during afternoons (1200 UTC to 1800 UTC). Frequently selected covariates for the nonlinear terms are variants of convective precipitation, convective potential available energy, relative humidity and temperature in the mid layers of the troposphere, among others. All models, even for a lead time of five days, outperform a forecast based on climatology in an out-of-sample comparison. An example case illustrates that coarse spatial patterns are already successfully forecast five days ahead.
Keywords:	lightning detection data, statistical post-processing, generalized additive models, gradient boosting, stability selection, MCMC
JEL:	C11 C53 Q54
Date:	2017–12
URL:	http://d.repec.org/n?u=RePEc:inn:wpaper:2017-25&r=big

Human Capital/Human Intelligence and Neuromorphic Artificial Intelligence: In pursuit of the relevant intelligence concept (Japanese)

By:	ITO Kazuyori
Abstract:	There are two current types of artificial intelligence (AI): big data-driven AI (BD-AI) which is currently at the height of its influence and neuromorphic AI (NM-AI) which is expected to be quite prosperous but still lacks practicality. The first objective of this paper is to consider, from the perspective of computer science, semiconductor integrated circuits, and neuroscience as well as economics, why we particularly need to pay attention now to the latter NM-AI which is expected to be the core of AI in the mid- and long-term. Moreover, based on such consideration, we try to clarify what is the intelligence embodied by NM-AI and BD-AI, and discuss the complementarity or substitutability between human capital (HC)/human intelligence (HI) and the future completed version of NM-AI beyond the current BD-AI. More concretely, we try to reclassify and subdivide, in a non-behavioristic way, the suitcase-like word of intelligence which is full of ambiguities. Furthermore, based on the discussion, we try to understand what kinds of (non-) inclusion relation exist between both types of intelligence by referring to "response capabilities to change-causing or unusual situations" and their self-evolvability. In doing this, we will especially take up the following three viewpoints: HC/HI as a social network, the role of emotion as the fast perspective-switching device to cope with change-causing or unusual situations, and the role of emotion as the community forming device to create a wide range of cooperation among people with common knowledge/cultures as well as their diverse mutual intentions.
Date:	2017–11
URL:	http://d.repec.org/n?u=RePEc:eti:rpdpjp:17031&r=big

This nep-big issue is ©2017 by Tom Coupé. It is provided as is without any express or implied warranty. It may be freely redistributed in whole or in part for any purpose. If distributed in part, please include this notice.

General information on the NEP project can be found at http://nep.repec.org. For comments please write to the director of NEP, Marco Novarese at <director@nep.repec.org>. Put “NEP” in the subject, otherwise your mail may be rejected.

NEP’s infrastructure is sponsored by the School of Economics and Finance of Massey University in New Zealand.