nep-big New Economics Papers
on Big Data
Issue of 2017‒09‒10
ten papers chosen by
Tom Coupé
University of Canterbury

  1. Economic Predictions with Big Data: The Illusion Of Sparsity By Giannone, Domenico; Lenza, Michele; Primiceri, Giorgio E
  2. Machine learning at central banks By Chakraborty, Chiranjit; Joseph, Andreas
  3. Machine learning to improve experimental design By Aufenanger, Tobias
  4. Exploring the Potential of Machine Learning for Automatic Slum Identification from VHR Imagery By Duque, Juan Carlos; Patino, Jorge Eduardo; Betancourt, Alejandro
  5. Between disciplines and experience By Paris Chrysos
  6. Reactivity in Economic Science By Bruno S. Frey
  7. Tensor Representation in High-Frequency Financial Data for Price Change Prediction By Dat Tran Thanh; Juho Kanniainen; Moncef Gabbouj; Alexandros Iosifidis
  8. Employee turnover prediction and retention policies design: a case study By Edouard Ribes; Karim Touahri; Benoît Perthame
  9. Die Medienökonomik personalisierter Daten und der Facebook-Fall By Budzinski, Oliver; Grusevaja, Marina
  10. Wettbewerbsregeln für das Digitale Zeitalter - Die Ökonomik personalisierter Daten, Verbraucherschutz und die 9. GWB-Novelle By Budzinski, Oliver

  1. By: Giannone, Domenico; Lenza, Michele; Primiceri, Giorgio E
    Abstract: We compare sparse and dense representations of predictive models in macroeconomics, microeconomics and finance. To deal with a large number of possible predictors, we specify a "spike-and-slab" prior that allows for both variable selection and shrinkage. The posterior distribution does not typically concentrate on a single sparse or dense model but on a wide set of models. A clearer pattern of sparsity can only emerge when models of very low dimension are strongly favored a priori.
    Date: 2017–08
  2. By: Chakraborty, Chiranjit (Bank of England); Joseph, Andreas (Bank of England)
    Abstract: We introduce machine learning in the context of central banking and policy analyses. Our aim is to give an overview broad enough to allow the reader to place machine learning within the wider range of statistical modelling and computational analyses, and provide an idea of its scope and limitations. We review the underlying technical sources and the nascent literature applying machine learning to economic and policy problems. We present popular modelling approaches, such as artificial neural networks, tree-based models, support vector machines, recommender systems and different clustering techniques. Important concepts like the bias-variance trade-off, optimal model complexity, regularisation and cross-validation are discussed to enrich the econometrics toolbox in their own right. We present three case studies relevant to central bank policy, financial regulation and economic modelling more widely. First, we model the detection of alerts on the balance sheets of financial institutions in the context of banking supervision. Second, we perform a projection exercise for UK CPI inflation on a medium-term horizon of two years. Here, we introduce a simple training-testing framework for time series analyses. Third, we investigate the funding patterns of technology start-ups with the aim of detecting potentially disruptive innovators in financial technology. Machine learning models generally outperform traditional modelling approaches in prediction tasks, while open research questions remain with regard to their causal inference properties.
    Keywords: Machine learning; artificial intelligence; big data; econometrics; forecasting; inflation; financial markets; banking supervision; financial technology
    JEL: A12 A33 C14 C38 C44 C45 C51 C52 C53 C54 C61 C63 C87 E37 E58 G17 Y20
    Date: 2017–09–04
  3. By: Aufenanger, Tobias
    Abstract: This paper proposes a way of using observational pretest data for the design of experiments. In particular, it suggests training a random forest on the pretest data and stratifying the allocation of treatments to experimental units on the predicted dependent variables. This approach reduces much of the arbitrariness involved in defining strata directly on the basis of covariates. A simulation on 300 random samples drawn from six data sets shows that this algorithm is extremely effective in increasing power compared to random allocation and to traditional ways of stratification. In more than 80% of all samples the estimated variance of the treatment estimator is lower and the estimated power is higher than for standard designs such as complete randomization, conventional stratification or Mahalanobis matching.
    Keywords: experiment design, treatment allocation
    Date: 2017
  4. By: Duque, Juan Carlos; Patino, Jorge Eduardo; Betancourt, Alejandro
    Abstract: Slum identification in urban settlements is a crucial step in the formulation of pro-poor policies. However, conventional methods for slum detection, such as field surveys, can be time-consuming and costly. This paper explores the possibility of implementing a low-cost standardized method for slum detection. We use spectral, texture and structural features extracted from very high spatial resolution imagery as input data and evaluate the capability of three machine learning algorithms (Logistic Regression, Support Vector Machine and Random Forest) to classify urban areas as slum or no-slum. Using data from Buenos Aires (Argentina), Medellin (Colombia), and Recife (Brazil), we found that Support Vector Machine with radial basis kernel delivers the best performance (over 0.81). We also found that singularities within cities preclude the use of a unified classification model.
    Keywords: Cities, Urban development, Economics, Equity and social inclusion, Georeferencing, Socioeconomic research, Poverty, Public policy, Public services, Housing
    Date: 2016
  5. By: Paris Chrysos (ISC PARIS)
    Abstract: What do we see when we look at data? This recurrent question, raised whenever we are confronted with Big Data, has largely been answered by two different disciplinary visions that have dominated the debate in recent years, concluding that data are raw and that Big Data is a hubris. While the disciplines still treat only a part of them, we experience an ever wider spread of Big Data. The notion of “monuments of cyberspace” discussed here helps to understand their peculiar nature and to delimit related issues of method, of wealth and of experience.
    Keywords: Big data, Disciplines, Experience
    Date: 2017–03–23
  6. By: Bruno S. Frey
    Abstract: There is a fundamental difference between the natural and the social sciences due to reactivity. This difference remains even in the age of Artificially Intelligent Learning Machines and Big Data. Many academic economists take it as a matter of course that economics should become a natural science. Such a characterization misses an essential aspect of a social science, namely reactivity, i.e. human beings systematically respond to economic data, and in particular to interventions by economic policy, in a foreseeable way. To illustrate this finding, I use three examples from quite different fields: Happiness policy, World Heritage policy, and Science policy.
    Keywords: Economics; Social; Natural Science; Reactivity; Data; Happiness; Economic Policy
    JEL: A10 B40 C70 C80 D80 Z10
    Date: 2017–08
  7. By: Dat Tran Thanh; Juho Kanniainen; Moncef Gabbouj; Alexandros Iosifidis
    Abstract: With massive amounts of trade data now being collected, the dynamics of the financial markets pose both a challenge and an opportunity for high-frequency traders. In order to take advantage of the rapid, subtle movement of assets in High Frequency Trading (HFT), an automatic algorithm to analyze and detect patterns of price change based on transaction records must be available. The multichannel, time-series representation of financial data naturally suggests tensor-based learning algorithms. In this work, we investigate the effectiveness of two multilinear methods for the mid-price prediction problem against other existing methods. Experiments on a large-scale dataset containing more than 4 million limit orders show that, by utilizing tensor representation, multilinear models outperform vector-based approaches and other competing ones.
    Date: 2017–09
  8. By: Edouard Ribes (IRSEM - Institut de recherche stratégique de l'Ecole militaire - Ecole militaire); Karim Touahri (UPD5 - Université Paris Descartes - Paris 5); Benoît Perthame (LJLL - Laboratoire Jacques-Louis Lions - UPMC - Université Pierre et Marie Curie - Paris 6 - UPD7 - Université Paris Diderot - Paris 7 - CNRS - Centre National de la Recherche Scientifique)
    Abstract: This paper illustrates the similarities between the problems of customer churn and employee turnover. An example of an employee turnover prediction model leveraging classical machine learning techniques is developed. Model outputs are then discussed to design and test employee retention policies. This type of retention discussion is, to our knowledge, innovative and constitutes the main value of this paper.
    Keywords: Churn prediction, Machine learning techniques, Employee Turnover, Classification, Retention Policy, Workforce Planning
    Date: 2017
  9. By: Budzinski, Oliver; Grusevaja, Marina
    Abstract: A business model enjoys considerable popularity on the internet in which services or content are provided to users free of charge (in terms of traditional monetary units) and, instead, the personal data that users supply (knowingly or unknowingly) in the course of use are exploited profitably, whether for targeted advertising, the personalization and individualization of products and services, or data-based price discrimination. In the context of these innovative business strategies, novel forms of abuse of market power to the detriment of users can arise where such market power exists. For example, the Bundeskartellamt (the German Federal Cartel Office) is currently investigating the suspicion that the dominant provider of social media services, Facebook, is abusing its market power by demanding overly far-reaching usage rights to personal data from its users. This paper takes this current case as an occasion to summarize, with reference to the case, recent economic research on the role of personalized data in online markets and to apply it to Facebook by way of example. Possible abuse strategies are examined for their plausibility. It becomes clear that exploitative abuse is possible and conceivable even on markets or platform sides where no money in the sense of legal tender changes hands. This would also be conceivable in the Facebook case, although no definitive statement is possible without an empirical analysis of internal data (which are not available to the authors).
    Keywords: media economics, personalized data, big data, social media, competition policy, industrial economics, Facebook, market power, targeted advertising, zero-price economy, internet economics, online markets
    JEL: L40 K21 L82 D43 D42 E42 L86 L41
    Date: 2017
  10. By: Budzinski, Oliver
    Abstract: The digitalization of economic relations poses considerable challenges for competition policy. To meet them, the 9th amendment is intended to make the German Act against Restraints of Competition (Gesetz gegen Wettbewerbsbeschränkungen, GWB) fit for the digital age. This paper provides an overview of the main changes to the German competition rules and discusses selected changes concerning the digital economy against the background of the current state of economic theory. It becomes clear that the 9th amendment enables an improvement of competition policy in digitalized markets in some areas but falls short in others. Finally, three areas of the ongoing digitalization process are outlined that can be expected to pose new challenges for competition policy in the coming years.
    Keywords: digitalization, competition policy, personalized data, big data, consumer protection, competition economics, Facebook, internet economics, algorithm-based collusion, data-based price discrimination, personal digital assistants, Industry 4.0
    JEL: L40 K21 L86 L82 L81 L10 L15 D80
    Date: 2017

This nep-big issue is ©2017 by Tom Coupé. It is provided as is without any express or implied warranty. It may be freely redistributed in whole or in part for any purpose. If distributed in part, please include this notice.
For comments, please write to the director of NEP, Marco Novarese. Put “NEP” in the subject, otherwise your mail may be rejected.
NEP’s infrastructure is sponsored by the School of Economics and Finance of Massey University in New Zealand.