nep-big 2017-10-22 papers

on Big Data

Issue of 2017–10–22
six papers chosen by
Tom Coupé, University of Canterbury

Planning Ahead for Better Neighborhoods: Long Run Evidence from Tanzania By Michaels, Guy; Nigmatulina, Dzhamilya; Rauch, Ferdinand; Regan, Tanner; Baruah, Neeraj; Dahlstrand-Rudin, Amanda
Machine Learning from Schools about Energy Efficiency By Fiona Burlig; Christopher Knittel; David Rapson; Mar Reguant; Catherine Wolfram
Personal Privacy of HMDA in a World of Big Data By Anthony Yezer
Forecasting Across Time Series Databases using Long Short-Term Memory Networks on Groups of Similar Series By Kasun Bandara; Christoph Bergmeir; Slawek Smyl
Asymptotic Expansion as Prior Knowledge in Deep Learning Method for high dimensional BSDEs By Masaaki Fujii; Akihiko Takahashi; Masayuki Takahashi
Data Governance – Einordnung, Konzepte und aktuelle Herausforderungen By Alrik Brüning; Peter Gluchowski; Andre Kaiser

Planning Ahead for Better Neighborhoods: Long Run Evidence from Tanzania

By:	Michaels, Guy (London School of Economics); Nigmatulina, Dzhamilya (London School of Economics); Rauch, Ferdinand (University of Oxford); Regan, Tanner (London School of Economics); Baruah, Neeraj (London School of Economics); Dahlstrand-Rudin, Amanda (London School of Economics)
Abstract:	What are the long run consequences of planning and providing basic infrastructure in neighborhoods where people build their own homes? We study "Sites and Services" projects implemented in seven Tanzanian cities during the 1970s and 1980s, half of which provided infrastructure in previously unpopulated areas (de novo neighborhoods), while the other half upgraded squatter settlements. Using satellite images and surveys from the 2010s, we find that de novo neighborhoods developed better housing than adjacent residential areas (control areas) that were also initially unpopulated. Specifically, de novo neighborhoods are more orderly and their buildings have larger footprint areas and are more likely to have multiple stories, as well as connections to electricity and water, basic sanitation and access to roads. And though de novo neighborhoods generally attracted better educated residents than control areas, the educational difference is too small to account for the large difference in residential quality that we find. While we have no natural counterfactual for the upgrading areas, descriptive evidence suggests that they are, if anything, worse than the control areas.
Keywords:	urban economics, economic development, slums, Africa
JEL:	R31 O18 R14
Date:	2017–09
URL:	https://d.repec.org/n?u=RePEc:iza:izadps:dp11036

Machine Learning from Schools about Energy Efficiency

By:	Fiona Burlig; Christopher Knittel; David Rapson; Mar Reguant; Catherine Wolfram
Abstract:	In the United States, consumers invest billions of dollars annually in energy efficiency, often on the assumption that these investments will pay for themselves via future energy cost reductions. We study energy efficiency upgrades in K-12 schools in California. We develop and implement a novel machine learning approach for estimating treatment effects using high-frequency panel data, and demonstrate that this method outperforms standard panel fixed effects approaches. We find that energy efficiency upgrades reduce electricity consumption by 3 percent, but that these reductions total only 24 percent of ex ante expected savings. HVAC and lighting upgrades perform better, but still deliver less than half of what was expected. Finally, beyond location, school characteristics that are readily available to policymakers do not appear to predict realization rates across schools, suggesting that improving realization rates via targeting may prove challenging.
JEL:	C14 L9 Q41
Date:	2017–10
URL:	https://d.repec.org/n?u=RePEc:nbr:nberwo:23908

Personal Privacy of HMDA in a World of Big Data

By:	Anthony Yezer (George Washington University)
Abstract:	When the Home Mortgage Disclosure Act was passed in 1975, it required selected depository institutions to report limited data from mortgage applications. This was collected and processed by the Federal Reserve Board in accordance with Regulation C. A subset of the reported information was then disclosed to the public. At the time, it was difficult to determine the identity of individual respondents in HMDA data. Since that time four things have changed. First, reporting requirements have been expanded to an increasing range of lenders. Second, the personal information reported and revealed has expanded. Third, over 30% of home purchases do not involve a HMDA reported mortgage and mortgage lending is increasingly internet based. Fourth, modern computing and big data techniques now allow the HMDA data releases to be matched with the names of individual borrowers in a fashion that violates standards for privacy established by the U.S. Bureau of the Census and appears to violate privacy standards of HMDA itself. Lack of privacy is particularly a problem for minority borrowers for whom the “risk” of re-identification is a virtual certainty.
Date:	2017
URL:	https://d.repec.org/n?u=RePEc:gwi:wpaper:2017-21

Forecasting Across Time Series Databases using Long Short-Term Memory Networks on Groups of Similar Series

By:	Kasun Bandara; Christoph Bergmeir; Slawek Smyl
Abstract:	With the advent of Big Data, databases containing large quantities of similar time series are now available in many applications. Forecasting time series in these domains with traditional univariate forecasting procedures leaves great potential for producing accurate forecasts untapped. Recurrent neural networks, and in particular Long Short-Term Memory (LSTM) networks, have recently proven able to outperform state-of-the-art univariate time series forecasting methods in this context when trained across all available time series. However, if the time series database is heterogeneous, accuracy may degenerate, so that on the way towards fully automatic forecasting methods in this space, a notion of similarity between the time series needs to be built into the methods. To this end, we present a prediction model using LSTMs on subgroups of similar time series, which are identified by time series clustering techniques. The proposed methodology is able to consistently outperform the baseline LSTM model, and it achieves competitive results on benchmarking datasets, in particular outperforming all other methods on the CIF2016 dataset.
Date:	2017–10
URL:	https://d.repec.org/n?u=RePEc:arx:papers:1710.03222

Asymptotic Expansion as Prior Knowledge in Deep Learning Method for high dimensional BSDEs

By:	Masaaki Fujii; Akihiko Takahashi; Masayuki Takahashi
Abstract:	We demonstrate that the use of asymptotic expansion as prior knowledge in the "deep BSDE solver", which is a deep learning method for high dimensional BSDEs proposed by Weinan E, Han & Jentzen (2017), drastically reduces the loss function and accelerates convergence. We illustrate the technique and its implications using Bergman's model with different lending and borrowing rates and a class of quadratic-growth BSDEs.
Date:	2017–10
URL:	https://d.repec.org/n?u=RePEc:arx:papers:1710.07030

Data Governance – Einordnung, Konzepte und aktuelle Herausforderungen

By:	Alrik Brüning (Chemnitz University of Technology, Department of Economics, Professur Wirtschaftsinformatik II – Systementwicklung und Anwendungssysteme); Peter Gluchowski (Chemnitz University of Technology, Department of Economics, Professur Wirtschaftsinformatik II – Systementwicklung und Anwendungssysteme); Andre Kaiser (Chemnitz University of Technology, Department of Economics, Professur Wirtschaftsinformatik II – Systementwicklung und Anwendungssysteme)
Abstract:	A whole range of data governance concepts now exists, with proposals for its design and concrete implementation in companies. The approaches position data governance differently among a company's data-related functions and also provide for differing alternatives in the assignment of tasks and responsibilities. It is important to note that terms such as data governance, IT governance, data management, and data quality management may be related but entail clearly distinct functions. The aim of this paper is therefore first to define data governance conceptually and then to present selected concepts. These can finally be compared with regard to the position of data governance within the company. Because the importance of data in companies has risen sharply and a further increase is expected in the course of digital transformation and Industry 4.0, the paper also examines current challenges for data governance.
Keywords:	Data Governance, Data Management, Digital Transformation
Date:	2017–10
URL:	https://d.repec.org/n?u=RePEc:tch:wpaper:cep015

This nep-big issue is ©2017 by Tom Coupé. It is provided as is without any express or implied warranty. It may be freely redistributed in whole or in part for any purpose. If distributed in part, please include this notice.

General information on the NEP project can be found at https://nep.repec.org. For comments please write to the director of NEP, Marco Novarese at <director@nep.repec.org>. Put “NEP” in the subject, otherwise your mail may be rejected.

NEP’s infrastructure is sponsored by the Griffith Business School of Griffith University in Australia.