nep-big 2017-08-27 papers

on Big Data

Issue of 2017–08–27
ten papers chosen by
Tom Coupé, University of Canterbury

Inside Job or Deep Impact? Using Extramural Citations to Assess Economic Scholarship By Joshua Angrist; Pierre Azoulay; Glenn Ellison; Ryan Hill; Susan Feng Lu
Fake News in Social Networks By Christoph Aymanns; Jakob Foerster; Co-Pierre Georg
Heterogeneous Employment Effects of Job Search Programmes: A Machine Learning Approach By Knaus, Michael C.; Lechner, Michael; Strittmatter, Anthony
Improving the forecasts of European regional banks' profitability with machine learning algorithms By Haskamp, Ulrich
Forecasting day-ahead electricity prices in Europe: the importance of considering market integration By Jesus Lago; Fjo De Ridder; Peter Vrancx; Bart De Schutter
Importance of the long-term seasonal component in day-ahead electricity price forecasting revisited: Neural network models By Grzegorz Marcjasz; Bartosz Uniejewski; Rafal Weron
On the Use of the Lasso for Instrumental Variables Estimation with Some Invalid Instruments By Windmeijer, F.; Farbmacher, H.; Davies, N.; Davey Smith, G.
DGM: A deep learning algorithm for solving partial differential equations By Justin Sirignano; Konstantinos Spiliopoulos
Heterogeneous Employment Effects of Job Search Programmes: A Machine Learning Approach By Knaus, Michael C.; Lechner, Michael; Strittmatter, Anthony
Special Report: The Potential of Wastewater Testing for Public Health and Safety By Aparna Keshaviah; Editor

Inside Job or Deep Impact? Using Extramural Citations to Assess Economic Scholarship

By:	Joshua Angrist; Pierre Azoulay; Glenn Ellison; Ryan Hill; Susan Feng Lu
Abstract:	Does academic economic research produce material of scientific value, or are academic economists writing only for clients and peers? Is economics scholarship uniquely insular? We address these questions by quantifying interactions between economics and other disciplines. Changes in the impact of economic scholarship are measured here by the way other disciplines cite us. We document a clear rise in the extramural influence of economic research, while also showing that economics is increasingly likely to reference other social sciences. A breakdown of extramural citations by economics fields shows broad field impact. Differentiating between theoretical and empirical papers classified using machine learning, we see that much of the rise in economics’ extramural influence reflects growth in citations to empirical work. This parallels a growing share of empirical cites within economics. At the same time, the disciplines of computer science and operations research are mostly influenced by economic theory.
JEL:	A11 A12 A13 A14 B41 C18
Date:	2017–08
URL:	https://d.repec.org/n?u=RePEc:nbr:nberwo:23698

Fake News in Social Networks

By:	Christoph Aymanns; Jakob Foerster; Co-Pierre Georg
Abstract:	We model the spread of news as a social learning game on a network. Agents can either endorse or oppose a claim made in a piece of news, which itself may be either true or false. Agents base their decision on a private signal and their neighbors' past actions. Given these inputs, agents follow strategies derived via multi-agent deep reinforcement learning and receive utility from acting in accordance with the veracity of claims. Our framework yields strategies with agent utility close to a theoretical, Bayes optimal benchmark, while remaining flexible to model re-specification. Optimized strategies allow agents to correctly identify most false claims, when all agents receive unbiased private signals. However, an adversary's attempt to spread fake news by targeting a subset of agents with a biased private signal can be successful. Even more so when the adversary has information about agents' network position or private signal. When agents are aware of the presence of an adversary they re-optimize their strategies in the training stage and the adversary's attack is less effective. Hence, exposing agents to the possibility of fake news can be an effective way to curtail the spread of fake news in social networks. Our results also highlight that information about the users' private beliefs and their social network structure can be extremely valuable to adversaries and should be well protected.
Date:	2017–08
URL:	https://d.repec.org/n?u=RePEc:arx:papers:1708.06233

Heterogeneous Employment Effects of Job Search Programmes: A Machine Learning Approach

By:	Knaus, Michael C.; Lechner, Michael; Strittmatter, Anthony
Abstract:	We systematically investigate the effect heterogeneity of job search programmes for unemployed workers. To investigate possibly heterogeneous employment effects, we combine non-experimental causal empirical models with Lasso-type estimators. The empirical analyses are based on rich administrative data from Swiss social security records. We find considerable heterogeneities only during the first six months after the start of training. Consistent with previous results of the literature, unemployed persons with fewer employment opportunities profit more from participating in these programmes. Furthermore, we also document heterogeneous employment effects by residence status. Finally, we show the potential of easy-to-implement programme participation rules for improving average employment effects of these active labour market programmes.
Keywords:	active labour market policy; conditional average treatment effects; individualized treatment effects; Machine Learning
JEL:	C21 H43 J68
Date:	2017–08
URL:	https://d.repec.org/n?u=RePEc:cpr:ceprdp:12224

Improving the forecasts of European regional banks' profitability with machine learning algorithms

By:	Haskamp, Ulrich
Abstract:	Regional banks as savings and cooperative banks are widespread in continental Europe. In the aftermath of the financial crisis, however, they had problems keeping their profitability which is an important quantitative indicator for the health of a bank and the banking sector overall. We use a large data set of bank-level balance sheet items and regional economic variables to forecast profitability for about 2,000 regional banks. Machine learning algorithms are able to beat traditional estimators as ordinary least squares as well as autoregressive models in forecasting performance.
Keywords:	profitability,regional banking,forecasting,machine learning
JEL:	C53 G21
Date:	2017
URL:	https://d.repec.org/n?u=RePEc:zbw:rwirep:705

Forecasting day-ahead electricity prices in Europe: the importance of considering market integration

By:	Jesus Lago; Fjo De Ridder; Peter Vrancx; Bart De Schutter
Abstract:	Motivated by the increasing integration among electricity markets, in this paper we propose three different methods to incorporate market integration in electricity price forecasting and to improve the predictive performance. First, we propose a deep neural network that considers features from connected markets to improve the predictive accuracy in a local market. To measure the importance of these features, we propose a novel feature selection algorithm that, by using Bayesian optimization and functional analysis of variance, analyzes the effect of the features on the algorithm performance. In addition, using market integration, we propose a second model that, by simultaneously predicting prices from two markets, improves even further the forecasting accuracy. Finally, we present a third model to predict the probability of price spikes; then, we use it as an input in the other two forecasters to detect spikes. As a case study, we consider the electricity market in Belgium and the improvements in forecasting accuracy when using various French electricity features. In detail, we show that the three proposed models lead to improvements that are statistically significant. Particularly, due to market integration, predictive accuracy is improved from 15.7% to 12.5% sMAPE (symmetric mean absolute percentage error). In addition, we also show that the proposed feature selection algorithm is able to perform a correct assessment, i.e. to discard the irrelevant features.
Date:	2017–08
URL:	https://d.repec.org/n?u=RePEc:arx:papers:1708.07061
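
Note on the sMAPE figures quoted in the abstract above: several variants of the symmetric mean absolute percentage error are in use, and the abstract does not say which one the authors apply. The sketch below assumes one common definition, the mean of |y - f| / ((|y| + |f|) / 2) expressed in percent. The function name `smape` and the sample prices are illustrative only and are not taken from the paper.

```python
def smape(actual, forecast):
    """Symmetric mean absolute percentage error, in percent.

    Assumed variant: mean of |y - f| / ((|y| + |f|) / 2) * 100.
    Other definitions (for example without the division by 2) exist.
    """
    if len(actual) != len(forecast) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for y, f in zip(actual, forecast):
        denom = (abs(y) + abs(f)) / 2.0
        if denom == 0.0:
            # Both values are zero: count the term as zero error.
            continue
        total += abs(y - f) / denom
    return 100.0 * total / len(actual)


# Hypothetical day-ahead prices and forecasts (illustrative numbers only).
prices = [100.0, 200.0]
predicted = [110.0, 180.0]
print(round(smape(prices, predicted), 2))
```

Under this definition a lower value means a more accurate forecast, which is why the move from 15.7% to 12.5% reported in the abstract is an improvement.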

Importance of the long-term seasonal component in day-ahead electricity price forecasting revisited: Neural network models

By:	Grzegorz Marcjasz; Bartosz Uniejewski; Rafal Weron
Abstract:	In day-ahead electricity price forecasting the daily and weekly seasonalities are always taken into account, but the long-term seasonal component was believed to add unnecessary complexity and in most studies ignored. The recent introduction of the Seasonal Component AutoRegressive (SCAR) modeling framework has changed this viewpoint. However, the latter is based on linear models estimated using Ordinary Least Squares. Here we show that considering non-linear neural network-type models with the same inputs as the corresponding SCAR model can lead to a yet better performance. While individual Seasonal Component Artificial Neural Network (SCANN) models are generally worse than the corresponding SCAR-type structures, we provide empirical evidence that committee machines of SCANN networks can significantly outperform the latter.
Keywords:	Electricity spot price; Forecasting; Day-ahead market; Long-term seasonal component; Neural network; Committee machine
JEL:	C14 C22 C45 C51 C53 Q47
Date:	2017–07–29
URL:	https://d.repec.org/n?u=RePEc:wuu:wpaper:hsc1703

On the Use of the Lasso for Instrumental Variables Estimation with Some Invalid Instruments

By:	Windmeijer, F.; Farbmacher, H.; Davies, N.; Davey Smith, G.
Abstract:	We investigate the behaviour of the Lasso for selecting invalid instruments in linear instrumental variables models for estimating causal effects of exposures on outcomes, as proposed recently by Kang, Zhang, Cai and Small (2016, Journal of the American Statistical Association). Invalid instruments are such that they fail the exclusion restriction and enter the model as explanatory variables. We show that for this setup, the Lasso may not select all invalid instruments in large samples if they are relatively strong. Consistent selection also depends on the correlation structure of the instruments. We propose a median estimator that is consistent when less than 50% of the instruments are invalid, but its consistency does not depend on the relative strength of the instruments or their correlation structure. This estimator can therefore be used for adaptive Lasso estimation. The methods are applied to a Mendelian randomisation study to estimate the causal effect of BMI on diastolic blood pressure using data on individuals from the UK Biobank, with 96 single nucleotide polymorphisms as potential instruments for BMI.
Keywords:	causal inference; instrumental variables estimation; invalid instruments; Lasso; Mendelian randomisation
Date:	2017–08
URL:	https://d.repec.org/n?u=RePEc:yor:hectdg:17/22
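
Note on the median estimator mentioned in the abstract above: the abstract does not give its construction, so the following is only a sketch of one way a median-type estimator can work, not necessarily the authors' exact method. It assumes that each instrument yields its own ratio estimate (reduced-form coefficient divided by first-stage coefficient) and takes the median of those ratios. The function name `median_iv_estimate` and all coefficient values are hypothetical.

```python
from statistics import median


def median_iv_estimate(reduced_form, first_stage):
    """Median of per-instrument ratio estimates (an assumed construction).

    reduced_form[j]: coefficient of instrument j in the outcome regression.
    first_stage[j]:  coefficient of instrument j in the exposure regression.
    """
    if len(reduced_form) != len(first_stage) or len(reduced_form) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    ratios = []
    for big_gamma, small_gamma in zip(reduced_form, first_stage):
        if small_gamma == 0.0:
            raise ValueError("first-stage coefficient must be non-zero")
        ratios.append(big_gamma / small_gamma)
    return median(ratios)


# Hypothetical coefficients: three valid instruments imply an effect of 0.5,
# while the last two are invalid and imply 2.0 and 3.0.
first_stage = [2.0, 1.0, 4.0, 1.0, 1.0]
reduced_form = [1.0, 0.5, 2.0, 2.0, 3.0]
print(median_iv_estimate(reduced_form, first_stage))
```

In this toy example the invalid instruments are a minority, so the median of the ratios still sits at the value implied by the valid ones, which is the intuition behind the "less than 50%" condition in the abstract.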

DGM: A deep learning algorithm for solving partial differential equations

By:	Justin Sirignano; Konstantinos Spiliopoulos
Abstract:	High-dimensional PDEs have been a longstanding computational challenge. We propose a deep learning algorithm similar in spirit to Galerkin methods, using a deep neural network instead of linear combinations of basis functions. The PDE is approximated with a deep neural network, which is trained on random batches of spatial points to satisfy the differential operator and boundary conditions. The algorithm is mesh-less, which is key since meshes become infeasible in higher dimensions. Instead of forming a mesh, sequences of spatial points are randomly sampled. We implement the approach for American options (a type of free-boundary PDE which is widely used in finance) in up to 100 dimensions. We call the algorithm a "Deep Galerkin Method (DGM)".
Date:	2017–08
URL:	https://d.repec.org/n?u=RePEc:arx:papers:1708.07469

Heterogeneous Employment Effects of Job Search Programmes: A Machine Learning Approach

By:	Knaus, Michael C.; Lechner, Michael; Strittmatter, Anthony
Abstract:	We systematically investigate the effect heterogeneity of job search programmes for unemployed workers. To investigate possibly heterogeneous employment effects, we combine non-experimental causal empirical models with Lasso-type estimators. The empirical analyses are based on rich administrative data from Swiss social security records. We find considerable heterogeneities only during the first six months after the start of training. Consistent with previous results of the literature, unemployed persons with fewer employment opportunities profit more from participating in these programmes. Furthermore, we also document heterogeneous employment effects by residence status. Finally, we show the potential of easy-to-implement programme participation rules for improving average employment effects of these active labour market programmes.
Keywords:	Machine learning, individualized treatment effects, conditional average treatment effects, active labour market policy
JEL:	J68 H43 C21
Date:	2017–08
URL:	https://d.repec.org/n?u=RePEc:usg:econwp:2017:11

Special Report: The Potential of Wastewater Testing for Public Health and Safety

By:	Aparna Keshaviah; Editor
Abstract:	This report synthesizes research and recommendations from Mathematica’s symposium on “The Potential of Wastewater Testing for Public Health and Safety.”
Keywords:	wastewater, testing, substance use, opioids, Arnold Foundation, Advanced analytics, Machine learning, Public health, public safety
JEL:	I
URL:	https://d.repec.org/n?u=RePEc:mpr:mprres:5a867fbc382040a1af74f957b565fd98

This nep-big issue is ©2017 by Tom Coupé. It is provided as is without any express or implied warranty. It may be freely redistributed in whole or in part for any purpose. If distributed in part, please include this notice.

General information on the NEP project can be found at https://nep.repec.org. For comments please write to the director of NEP, Marco Novarese at <director@nep.repec.org>. Put “NEP” in the subject, otherwise your mail may be rejected.

NEP’s infrastructure is sponsored by the Griffith Business School of Griffith University in Australia.