nep-big New Economics Papers
on Big Data
Issue of 2017‒09‒10
ten papers chosen by
Tom Coupé
University of Canterbury

  1. Economic Predictions with Big Data: The Illusion of Sparsity By Giannone, Domenico; Lenza, Michele; Primiceri, Giorgio E
  2. Machine learning at central banks By Chakraborty, Chiranjit; Joseph, Andreas
  3. Machine learning to improve experimental design By Aufenanger, Tobias
  4. Exploring the Potential of Machine Learning for Automatic Slum Identification from VHR Imagery By Duque, Juan Carlos; Patino, Jorge Eduardo; Betancourt, Alejandro
  5. Between disciplines and experience By Paris Chrysos
  6. Reactivity in Economic Science By Bruno S. Frey
  7. Tensor Representation in High-Frequency Financial Data for Price Change Prediction By Dat Tran Thanh; Juho Kanniainen; Moncef Gabbouj; Alexandros Iosifidis
  8. Employee turnover prediction and retention policies design: a case study By Edouard Ribes; Karim Touahri; Benoît Perthame
  9. The Media Economics of Personalised Data and the Facebook Case By Budzinski, Oliver; Grusevaja, Marina
  10. Competition Rules for the Digital Age: The Economics of Personalised Data, Consumer Protection and the 9th Amendment to the German Competition Act (GWB) By Budzinski, Oliver

  1. By: Giannone, Domenico; Lenza, Michele; Primiceri, Giorgio E
    Abstract: We compare sparse and dense representations of predictive models in macroeconomics, microeconomics and finance. To deal with a large number of possible predictors, we specify a "spike-and-slab" prior that allows for both variable selection and shrinkage. The posterior distribution does not typically concentrate on a single sparse or dense model but on a wide set of models. A clearer pattern of sparsity can only emerge when models of very low dimension are strongly favored a priori.
    Date: 2017–08
    URL: http://d.repec.org/n?u=RePEc:cpr:ceprdp:12256&r=big
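    For orientation, a minimal sketch of one common spike-and-slab formulation (our notation; the authors' exact specification may differ):

      \beta_j \mid z_j \;\sim\; z_j\,\mathcal{N}(0,\gamma^2) + (1-z_j)\,\delta_0, \qquad z_j \sim \mathrm{Bernoulli}(q), \quad j = 1,\dots,p

    Here q governs how many predictors enter the model (sparsity) and \gamma^2 how strongly the included coefficients are shrunk; the posterior over the inclusion indicators z_1,...,z_p then describes a distribution over models rather than a single selected model, which is exactly where the "illusion of sparsity" can arise.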
  2. By: Chakraborty, Chiranjit (Bank of England); Joseph, Andreas (Bank of England)
    Abstract: We introduce machine learning in the context of central banking and policy analyses. Our aim is to give an overview broad enough to allow the reader to place machine learning within the wider range of statistical modelling and computational analyses, and to provide an idea of its scope and limitations. We review the underlying technical sources and the nascent literature applying machine learning to economic and policy problems. We present popular modelling approaches, such as artificial neural networks, tree-based models, support vector machines, recommender systems and different clustering techniques. Important concepts like the bias-variance trade-off, optimal model complexity, regularisation and cross-validation are discussed, as they enrich the econometrics toolbox in their own right. We present three case studies relevant to central bank policy, financial regulation and economic modelling more widely. First, we model the detection of alerts on the balance sheets of financial institutions in the context of banking supervision. Second, we perform a projection exercise for UK CPI inflation on a medium-term horizon of two years; here, we introduce a simple training-testing framework for time-series analyses. Third, we investigate the funding patterns of technology start-ups with the aim of detecting potentially disruptive innovators in financial technology. Machine learning models generally outperform traditional modelling approaches in prediction tasks, while open research questions remain with regard to their causal inference properties.
    Keywords: Machine learning; artificial intelligence; big data; econometrics; forecasting; inflation; financial markets; banking supervision; financial technology
    JEL: A12 A33 C14 C38 C44 C45 C51 C52 C53 C54 C61 C63 C87 E37 E58 G17 Y20
    Date: 2017–09–04
    URL: http://d.repec.org/n?u=RePEc:boe:boeewp:0674&r=big
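    As a hedged illustration of the training-testing framework for time series mentioned in the abstract, the Python sketch below scores a forecaster with an expanding-window split, so the model never trains on observations from after the forecast origin. The forecaster, lag structure and series are placeholders of ours, not the paper's setup; `y` is a one-dimensional numpy array such as a monthly inflation series.

      # Sketch: expanding-window out-of-sample evaluation for a time series.
      import numpy as np
      from sklearn.ensemble import RandomForestRegressor

      def expanding_window_rmse(y, n_lags=12, horizon=24, min_train=120):
          """Forecast y `horizon` steps ahead from its last `n_lags` values,
          refitting on the data available at each forecast origin."""
          n = len(y) - n_lags - horizon + 1       # number of (features, target) pairs
          X = np.column_stack([y[i:i + n] for i in range(n_lags)])
          target = y[n_lags + horizon - 1:]       # value to be forecast
          errors = []
          for t in range(min_train, n):
              cut = max(1, t - horizon + 1)       # only targets already realised at origin t
              model = RandomForestRegressor(n_estimators=200, random_state=0)
              model.fit(X[:cut], target[:cut])
              errors.append(target[t] - model.predict(X[t:t + 1])[0])
          return float(np.sqrt(np.mean(np.square(errors))))

    With monthly data, horizon=24 mirrors the two-year medium-term horizon of the paper's inflation exercise; comparing this out-of-sample RMSE across candidate models is the essence of such a training-testing framework.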
  3. By: Aufenanger, Tobias
    Abstract: This paper proposes a way of using observational pretest data in the design of experiments. In particular, it suggests training a random forest on the pretest data and stratifying the allocation of treatments to experimental units on the predicted dependent variable. This approach reduces much of the arbitrariness involved in defining strata directly on the basis of covariates. A simulation on 300 random samples drawn from six data sets shows that this algorithm is extremely effective in increasing power compared to random allocation and to traditional ways of stratification: in more than 80% of all samples, the estimated variance of the treatment estimator is lower and the estimated power higher than for standard designs such as complete randomization, conventional stratification or Mahalanobis matching.
    Keywords: experiment design, treatment allocation
    Date: 2017
    URL: http://d.repec.org/n?u=RePEc:zbw:iwqwdp:162017&r=big
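    Under our reading of the abstract, the allocation idea can be sketched as follows (all names are illustrative, and the authors' algorithm may differ in detail): fit a random forest on the pretest data, sort the experimental units by their predicted outcome, and randomize treatment within consecutive pairs of that ordering.

      # Sketch: stratify treatment allocation on random-forest predictions.
      import numpy as np
      from sklearn.ensemble import RandomForestRegressor

      def forest_stratified_allocation(pretest_X, pretest_y, X, seed=None):
          """Return a 0/1 treatment vector for the units in X (len(X) even):
          neighbours in predicted outcome form strata of size two, and one
          unit per stratum is treated at random."""
          rng = np.random.default_rng(seed)
          forest = RandomForestRegressor(n_estimators=500, random_state=0)
          forest.fit(pretest_X, pretest_y)        # outcome ~ covariates on pretest data
          order = np.argsort(forest.predict(X))   # sort units by predicted outcome
          treat = np.zeros(len(X), dtype=int)
          for pair in order.reshape(-1, 2):       # randomize within each pair
              treat[rng.choice(pair)] = 1
          return treat

    Randomization then happens within, rather than across, strata of units with similar predicted outcomes, which is what drives the variance reduction reported in the abstract.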
  4. By: Duque, Juan Carlos; Patino, Jorge Eduardo; Betancourt, Alejandro
    Abstract: Slum identification in urban settlements is a crucial step in formulating pro-poor policies. However, conventional methods for slum detection, such as field surveys, can be time-consuming and costly. This paper explores the possibility of implementing a low-cost, standardized method for slum detection. We use spectral, texture and structural features extracted from very high spatial resolution imagery as input data and evaluate the capability of three machine learning algorithms (Logistic Regression, Support Vector Machine and Random Forest) to classify urban areas as slum or non-slum. Using data from Buenos Aires (Argentina), Medellin (Colombia), and Recife (Brazil), we find that the Support Vector Machine with a radial basis kernel delivers the best performance (over 0.81). We also find that singularities within cities preclude the use of a unified classification model.
    Keywords: Cities, Urban development, Economics, Equity and social inclusion, Georeferencing, Socioeconomic research, Poverty, Public policy, Public services, Housing
    Date: 2016
    URL: http://d.repec.org/n?u=RePEc:dbl:dblwop:975&r=big
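    As a sketch of the classification step only (upstream feature extraction from the imagery is out of scope here, and all names are placeholders of ours), an RBF-kernel Support Vector Machine on precomputed spectral, texture and structural features might look like this:

      # Sketch: slum / non-slum classification on precomputed image features.
      from sklearn.model_selection import train_test_split
      from sklearn.pipeline import make_pipeline
      from sklearn.preprocessing import StandardScaler
      from sklearn.svm import SVC

      def fit_slum_classifier(features, labels):
          """features: (n_samples, n_features) array; labels: 0 = non-slum,
          1 = slum. Returns the fitted pipeline and its held-out accuracy."""
          X_tr, X_te, y_tr, y_te = train_test_split(
              features, labels, test_size=0.3, stratify=labels, random_state=0)
          clf = make_pipeline(StandardScaler(),
                              SVC(kernel="rbf", C=10.0, gamma="scale"))
          clf.fit(X_tr, y_tr)
          return clf, clf.score(X_te, y_te)

    Given the abstract's finding that city-specific singularities preclude a unified model, such a fit would be run separately per city rather than pooling Buenos Aires, Medellin and Recife.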
  5. By: Paris Chrysos (ISC PARIS)
    Abstract: What do we see when we look at data? This recurrent question, raised whenever we confront Big Data, has largely been answered by two different disciplinary visions that have dominated the debate in recent years, concluding that data are raw and that Big Data is a hubris. While the disciplines each treat only a part of the data, Big Data keeps spreading ever more widely. The notion of “monuments of cyberspace” discussed here helps us understand their peculiar nature and delimit the related issues of method, of wealth and of experience.
    Keywords: Big data, Disciplines, Experience
    Date: 2017–03–23
    URL: http://d.repec.org/n?u=RePEc:hal:journl:halshs-01498296&r=big
  6. By: Bruno S. Frey
    Abstract: There is a fundamental difference between the natural and the social sciences due to reactivity, a difference that remains even in the age of artificially intelligent learning machines and Big Data. Many academic economists take it as a matter of course that economics should become a natural science. Such a characterization misses an essential aspect of a social science, namely reactivity: human beings systematically respond to economic data, and in particular to interventions by economic policy, in a foreseeable way. To illustrate this point, I use three examples from quite different fields: happiness policy, World Heritage policy, and science policy.
    Keywords: Economics; Social; Natural Science; Reactivity; Data; Happiness; Economic Policy
    JEL: A10 B40 C70 C80 D80 Z10
    Date: 2017–08
    URL: http://d.repec.org/n?u=RePEc:cra:wpaper:2017-10&r=big
  7. By: Dat Tran Thanh; Juho Kanniainen; Moncef Gabbouj; Alexandros Iosifidis
    Abstract: Nowadays, with massive amounts of trade data being collected, the dynamics of financial markets pose both a challenge and an opportunity for high-frequency traders. To take advantage of the rapid, subtle movements of assets in High Frequency Trading (HFT), an automatic algorithm that analyzes and detects patterns of price change based on transaction records must be available. The multichannel, time-series representation of financial data naturally suggests tensor-based learning algorithms. In this work, we investigate the effectiveness of two multilinear methods for the mid-price prediction problem against existing methods. Experiments on a large-scale dataset containing more than 4 million limit orders show that, by utilizing the tensor representation, multilinear models outperform vector-based approaches and other competing methods.
    Date: 2017–09
    URL: http://d.repec.org/n?u=RePEc:arx:papers:1709.01268&r=big
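    The tensor representation itself can be pictured with a short sketch (our construction, not necessarily the authors'): stack the last `window` limit-order-book snapshots, each a (levels x features) matrix, into one third-order tensor per prediction instance.

      # Sketch: third-order tensor inputs for mid-price direction prediction.
      # `book` is a placeholder array of LOB snapshots with shape
      # (n_events, levels, feats), feats assumed ordered as
      # (bid price, bid volume, ask price, ask volume) at each level.
      import numpy as np

      def make_tensors(book, window=10, horizon=5):
          """Return X with shape (n, window, levels, feats) and labels in
          {-1, 0, +1}: the direction of the mid-price `horizon` events ahead."""
          mid = (book[:, 0, 0] + book[:, 0, 2]) / 2.0   # best bid/ask midpoint
          X, y = [], []
          for t in range(window, len(book) - horizon):
              X.append(book[t - window:t])              # one (window, levels, feats) tensor
              y.append(np.sign(mid[t + horizon] - mid[t]))
          return np.asarray(X), np.asarray(y)

    A multilinear model then learns separate weights along the time, level and feature modes, whereas the vector-based baselines in the abstract would flatten each tensor into one long vector first.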
  8. By: Edouard Ribes (IRSEM - Institut de recherche stratégique de l'Ecole militaire - Ecole militaire); Karim Touahri (UPD5 - Université Paris Descartes - Paris 5); Benoît Perthame (LJLL - Laboratoire Jacques-Louis Lions - UPMC - Université Pierre et Marie Curie - Paris 6 - UPD7 - Université Paris Diderot - Paris 7 - CNRS - Centre National de la Recherche Scientifique)
    Abstract: This paper illustrates the similarities between the problems of customer churn and employee turnover. An example employee-turnover prediction model leveraging classical machine learning techniques is developed. Model outputs are then discussed in order to design and test employee retention policies. This type of retention discussion is, to our knowledge, novel and constitutes the main contribution of this paper.
    Keywords: Churn prediction, Machine learning techniques, Employee Turnover, Classification, Retention Policy, Workforce Planning
    Date: 2017
    URL: http://d.repec.org/n?u=RePEc:hal:wpaper:hal-01556746&r=big
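    A hedged sketch of the churn-style setup (the column names and the choice of logistic regression are our illustration; the paper's pipeline may differ):

      # Sketch: employee-turnover ("churn") classification on tabular HR data.
      # `hr` is a placeholder pandas DataFrame with a binary `left` column.
      import pandas as pd
      from sklearn.linear_model import LogisticRegression
      from sklearn.model_selection import cross_val_score
      from sklearn.pipeline import make_pipeline
      from sklearn.preprocessing import StandardScaler

      def turnover_auc(hr: pd.DataFrame):
          """Cross-validated AUC for predicting which employees leave."""
          X = pd.get_dummies(hr.drop(columns=["left"]))  # one-hot categoricals
          y = hr["left"]
          model = make_pipeline(StandardScaler(),
                                LogisticRegression(max_iter=1000))
          return cross_val_score(model, X, y, cv=5, scoring="roc_auc")

    As in customer-churn practice, the fitted model's leave probabilities can then rank employees, so that retention policies are targeted at, and tested on, the highest-risk group.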
  9. By: Budzinski, Oliver; Grusevaja, Marina
    Abstract: A business model enjoying considerable popularity on the internet provides users with services or content free of charge (in traditional monetary units) and instead profitably exploits the personal data that users supply, knowingly or unknowingly, in the course of use, be it for targeted advertising, for the personalisation and individualisation of products and services, or for data-based price discrimination. In the context of these innovative business strategies, and where market power is present, novel forms of abuse of that market power at the expense of users can arise. For example, the Bundeskartellamt (the German Federal Cartel Office) is currently investigating the suspicion that the dominant provider of social media services, Facebook, abuses its market power by demanding excessively far-reaching rights of use over personal data from its users. This paper takes this current case as an occasion to summarise recent economic research on the role of personalised data in online markets and to apply it, by way of example, to Facebook, examining the plausibility of possible abuse strategies. It becomes clear that exploitative abuse is possible and conceivable even on markets or platform sides where no money in the sense of legal tender changes hands. This would also be conceivable in the case of Facebook, although no definitive assessment is possible without an empirical analysis of internal data (which are not available to the authors).
    Keywords: media economics, personalised data, big data, social media, competition policy, industrial economics, Facebook, market power, targeted advertising, zero-price economy, internet economics, online markets
    JEL: L40 K21 L82 D43 D42 E42 L86 L41
    Date: 2017
    URL: http://d.repec.org/n?u=RePEc:zbw:tuiedp:107&r=big
  10. By: Budzinski, Oliver
    Abstract: The digitalisation of economic relations confronts competition policy with considerable challenges. To meet them, the 9th amendment is intended to make the German Act against Restraints of Competition (GWB) fit for the digital age. This paper provides an overview of the main changes to the German competition rules and discusses selected amendments concerning the digital economy against the background of the current state of economic theory. It becomes clear that the 9th amendment improves competition policy for digitalised markets in some areas but falls short in others. Finally, the paper sketches three areas of the ongoing digitalisation process that can be expected to confront competition policy with new challenges in the coming years.
    Keywords: digitalisation, competition policy, personalised data, big data, consumer protection, competition economics, Facebook, internet economics, algorithm-based collusion, data-based price discrimination, personal digital assistants, Industry 4.0
    JEL: L40 K21 L86 L82 L81 L10 L15 D80
    Date: 2017
    URL: http://d.repec.org/n?u=RePEc:zbw:tuiedp:108&r=big

This nep-big issue is ©2017 by Tom Coupé. It is provided as is without any express or implied warranty. It may be freely redistributed in whole or in part for any purpose. If distributed in part, please include this notice.
General information on the NEP project can be found at http://nep.repec.org. For comments please write to the director of NEP, Marco Novarese at <director@nep.repec.org>. Put “NEP” in the subject, otherwise your mail may be rejected.
NEP’s infrastructure is sponsored by the School of Economics and Finance of Massey University in New Zealand.