nep-big 2018-05-28 papers

on Big Data

Issue of 2018‒05‒28
thirteen papers chosen by
Tom Coupé
University of Canterbury

Perpetual growth, distribution, and robots By Nomaler, Onder; Verspagen, Bart
Terrorist Attacks and Immigration Rhetoric: A Natural Experiment on British MPs By Daniele Guariso
RNN-based counterfactual time-series prediction By Jason Poulos
Intertopic Distances as Leading Indicators By Melody Y. Huang; Randall R. Rojas; Patrick D. Convery
Toward labor market policy 2.0 : the potential for using online job-portal big data to inform labor market policies in India By Nomura, Shinsaku; Imaizumi, Saori; Areias, Ana Carolina; Yamauchi, Futoshi
The potential of big housing data: an application to the Italian real-estate market By Michele Loberto; Andrea Luciani; Marco Pangallo
Selecting Directors Using Machine Learning By Erel, Isil; Stern, Lea Henny; Tan, Chenhao; Weisbach, Michael S.
The Finite Sample Performance of Treatment Effects Estimators based on the Lasso By Michael Zimmert
The Production of Information in an Online World: Is Copy Right? By Julia Cage; Nicolas Hervé; Marie-Luce Viaud
Cloud computing and big data in the context of industry 4.0 : opportunities and challenges By Thabit Atobishi; Szalay Zsigmond Gábor; Szilard Podruzsik
Measuring the Diffusion of Innovations with Paragraph Vector Topic Models By David Lenz; Peter Winker
Cost-Benefit Analysis of Artificial Intelligence (AI) Fired Robots (AI-Bots) Replacing Educators By Tejendra Kalia
Digitalisation and Jobs in the Real Estate Industry By Daniel Piazolo

Perpetual growth, distribution, and robots

By:	Nomaler, Onder (ECIS, TU Eindhoven); Verspagen, Bart (UNU-MERIT, Maastricht University)
Abstract:	The current literature on the economic effects of machine learning, robotisation and artificial intelligence suggests that there may be an upcoming wave of substitution of human labour by machines (including software). We take this as a reason to rethink the traditional ways in which technological change has been represented in economic models. In doing so, we contribute to the recent literature on so-called perpetual growth, i.e., growth of per capita income without technological progress. When technology embodied in capital goods is sufficiently advanced, per capita growth becomes possible with a non-progressing state of technology. We present a simple Solow-like growth model that incorporates these ideas. The model predicts a rising wage rate but a declining share of wage income in the steady state growth path. We present simulation experiments on several policy options to combat the inequality that results from this, including a universal basic income as well as an option in which workers become owners of "robots".
Keywords:	perpetual economic growth, economic effects of robots, income distribution
JEL:	O15 O41 O33 E25 P17
Date:	2018–05–23
URL:	http://d.repec.org/n?u=RePEc:unm:unumer:2018023&r=big
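
As a rough illustration of growth without technological progress, the sketch below simulates a textbook AK model. It is not the authors' Solow-like model, which the abstract does not specify. The function names and all parameter values (A, s, delta) are assumptions chosen for illustration only.

```python
def simulate_ak(A=0.3, s=0.25, delta=0.05, k0=1.0, periods=50):
    """Simulate per-capita capital and output in a textbook AK model.

    Output per capita is y = A * k, with the productivity level A held
    constant (no technological progress). Capital per capita evolves as
    k_next = k + s * y - delta * k.
    All default values are illustrative assumptions.
    """
    k = k0
    path = []
    for _ in range(periods):
        y = A * k
        path.append((k, y))
        k = k + s * y - delta * k
    return path


def growth_rate(path):
    """Growth rate of per-capita output between the last two periods."""
    (_, y_prev), (_, y_last) = path[-2], path[-1]
    return y_last / y_prev - 1.0


# With s * A > delta, output per capita grows at the constant rate
# s * A - delta even though A never changes.
growing = growth_rate(simulate_ak(A=0.3))

# With s * A < delta, the same economy shrinks instead.
shrinking = growth_rate(simulate_ak(A=0.1))
```

In this stylized setting the long-run growth rate equals s * A - delta, so sustained growth needs only a sufficiently productive capital stock, which is the intuition the abstract describes. The sketch says nothing about wages or the distribution of income.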

Terrorist Attacks and Immigration Rhetoric: A Natural Experiment on British MPs

By:	Daniele Guariso (Department of Economics, University of Sussex, Brighton, UK)
Abstract:	We study the effects of exogenous shocks on the rhetoric of British politicians on social media. In particular, we focus on the impact of terrorist attacks on the issue of immigration. For this purpose, we collect all the immigration-related Tweets from the active Twitter accounts of MPs using Web Scraping and Machine Learning techniques. Looking at the Manchester bombing of 2017 as our main Event Study, we detect a counterintuitive finding: a substantial decrease in the expected number of immigration-related Tweets occurred after the incident. We hypothesize that this “muting effect” results from risk-averse strategic behaviour of politicians during the election campaign. However, the MPs' response shows remarkable heterogeneity according to the socio-economic characteristics of their constituencies.
Keywords:	political behaviour; machine learning; social media; immigration; terrorism
JEL:	C81 D72 Z13
Date:	2018–05
URL:	http://d.repec.org/n?u=RePEc:sus:susewp:1218&r=big

RNN-based counterfactual time-series prediction

By:	Jason Poulos
Abstract:	This paper proposes an alternative to the synthetic control method (SCM) for estimating the effect of a policy intervention on an outcome over time. Recurrent neural networks (RNNs) are used to predict counterfactual time-series of treated unit outcomes using only the outcomes of control units as inputs. Unlike SCM, the proposed method does not rely on pre-intervention covariates, allows for nonconvex combinations of control units, can handle multiple treated units, and can share model parameters across time-steps. RNNs outperform SCM in terms of recovering experimental estimates from a field experiment extended to a time-series observational setting. In placebo tests run on three different benchmark datasets, RNNs are more accurate than SCM in predicting the post-intervention time-series of control units, while yielding a comparable proportion of false positives. The proposed method contributes to a new literature that uses machine learning techniques for data-driven counterfactual prediction.
Date:	2017–12
URL:	http://d.repec.org/n?u=RePEc:arx:papers:1712.03553&r=big

Intertopic Distances as Leading Indicators

By:	Melody Y. Huang; Randall R. Rojas; Patrick D. Convery
Abstract:	We use a topic modeling algorithm and sentiment scoring methods to construct a novel metric to use as a leading indicator in recession prediction models. We hypothesize that due to non-instantaneous information flows, the inclusion of such a sentiment indicator derived purely from unstructured news data will improve our capabilities to forecast future recessions. We go on to show that the use of this proposed metric, even when included with consumer survey data, helps improve model performance significantly. This metric, in combination with consumer survey data, S&P 500 returns, and the yield curve, produces forecasts that significantly outperform models of higher complexity, containing traditional economic indicators.
Date:	2018–05
URL:	http://d.repec.org/n?u=RePEc:arx:papers:1805.04160&r=big

Toward labor market policy 2.0 : the potential for using online job-portal big data to inform labor market policies in India

By:	Nomura, Shinsaku; Imaizumi, Saori; Areias, Ana Carolina; Yamauchi, Futoshi
Abstract:	Economists and other social scientists are increasingly using big data analytics to address longstanding economic questions and complement existing information sources. Big data produced by online platforms can yield a wealth of diverse, highly granular, multidimensional information with a variety of potential applications. This paper examines how online job-portal data can be used as a basis for policy-relevant research in the fields of labor economics and workforce skills development, through an empirical analysis of information generated by Babajob, an online Indian job portal. The analysis highlights five key areas where online job-portal data can contribute to the development of labor market policies and analytical knowledge: (i) labor market monitoring and analysis; (ii) assessing demand for workforce skills; (iii) observing job-search behavior and improving skills matching; (iv) predictive analysis of skills demand; and (v) experimental studies. The unique nature of the data produced by online job-search portals allows for the application of diverse analytical methodologies, including descriptive data analysis, time-series analysis, text analysis, predictive analysis, and transactional data analysis. This paper is intended to contribute to the academic literature and the development of public policies. It contributes to the literature on labor economics through application of big data analytics to real-world data. The analysis also provides a unique case study on labor market data analytics in a developing-country context in South Asia. Finally, the report examines the potential for using big data to improve the design and implementation of labor market policies and promote demand-driven skills development.
Keywords:	Labor Markets, Rural Labor Markets, ICT Applications, Educational Sciences
Date:	2017–02–09
URL:	http://d.repec.org/n?u=RePEc:wbk:wbrwps:7966&r=big

The potential of big housing data: an application to the Italian real-estate market

By:	Michele Loberto (Bank of Italy); Andrea Luciani (Bank of Italy); Marco Pangallo (University of Oxford)
Abstract:	We present a new dataset of housing sales advertisements (ads) taken from Immobiliare.it, a popular online portal for real estate services in Italy. This dataset fills a big gap in Italian housing market statistics, namely the absence of detailed physical characteristics for houses sold. The granularity of online data also makes possible timely analyses at a very detailed geographical level. We first address the main problem of the dataset, i.e. the mismatch between ads and actual housing units - agencies have incentives for posting multiple ads for the same unit. We correct this distortion by using machine learning tools and provide evidence about its quantitative relevance. We then show that the information from this dataset is consistent with existing official statistical sources. Finally, we present some unique applications for these data. For example, we provide first evidence at the Italian level that online interest in a particular area is a leading indicator of prices. Our work is a concrete example of the potential of large user-generated online databases for institutional applications.
Keywords:	big data, machine learning, housing market
JEL:	C44 C81 R31
Date:	2018–04
URL:	http://d.repec.org/n?u=RePEc:bdi:wptemi:td_1171_18&r=big

Selecting Directors Using Machine Learning

By:	Erel, Isil (Ohio State University); Stern, Lea Henny (University of Washington); Tan, Chenhao (University of Colorado); Weisbach, Michael S. (Ohio State University)
Abstract:	Can an algorithm assist firms in their hiring decisions of corporate directors? This paper proposes a method of selecting boards of directors that relies on machine learning. We develop algorithms with the goal of selecting directors that would be preferred by the shareholders of a particular firm. Using shareholder support for individual directors in subsequent elections and firm profitability as performance measures, we construct algorithms to make out-of-sample predictions of these measures of director performance. We then run tests of the quality of these predictions and show that, when compared with a realistic pool of potential candidates, directors predicted to do poorly by our algorithms indeed rank much lower in performance than directors who were predicted to do well. Deviations from the benchmark provided by the algorithms suggest that firm-selected directors are more likely to be male, have previously held more directorships, have fewer qualifications and larger networks. Machine learning holds promise for understanding the process by which existing governance structures are chosen, and has potential to help real world firms improve their governance.
JEL:	G34 M12 M51
Date:	2018–03
URL:	http://d.repec.org/n?u=RePEc:ecl:ohidic:2018-05&r=big

The Finite Sample Performance of Treatment Effects Estimators based on the Lasso

By:	Michael Zimmert
Abstract:	This paper contributes to the literature on treatment effects estimation with machine-learning-inspired methods by studying the performance of different estimators based on the Lasso. Building on recent work in the field of high-dimensional statistics, we use the semiparametric efficient score estimation structure to compare different estimators. Alternative weighting schemes are considered and their suitability for the incorporation of machine learning estimators is assessed using theoretical arguments and various Monte Carlo experiments. Additionally, we propose our own estimator based on doubly robust kernel matching that is argued to be more robust to nuisance parameter misspecification. In the simulation study we verify theory-based intuition and find good finite sample properties of alternative weighting scheme estimators like the one we propose.
Date:	2018–05
URL:	http://d.repec.org/n?u=RePEc:arx:papers:1805.05067&r=big
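
For readers unfamiliar with the efficient score structure mentioned in the abstract, the standard doubly robust (augmented inverse probability weighting) estimator of an average treatment effect is shown below. This is the general form from the literature, not necessarily the estimator or target parameter studied in the paper. The notation is an assumption: D_i is the treatment indicator, Y_i the outcome, X_i the covariates, \hat{\mu}_1 and \hat{\mu}_0 the estimated outcome regressions, and \hat{p} the estimated propensity score.

```latex
\hat{\theta}_{\mathrm{ATE}}
  = \frac{1}{n} \sum_{i=1}^{n} \left[
      \hat{\mu}_1(X_i) - \hat{\mu}_0(X_i)
      + \frac{D_i \,\bigl(Y_i - \hat{\mu}_1(X_i)\bigr)}{\hat{p}(X_i)}
      - \frac{(1 - D_i)\,\bigl(Y_i - \hat{\mu}_0(X_i)\bigr)}{1 - \hat{p}(X_i)}
    \right]
```

The estimator remains consistent if either the outcome regressions or the propensity score is correctly specified, which is why the nuisance functions can be fitted with regularized methods such as the Lasso.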

The Production of Information in an Online World: Is Copy Right?

By:	Julia Cage (Département d'économie); Nicolas Hervé (Institut national de l'audiovisuel (INA)); Marie-Luce Viaud (Institut national de l'audiovisuel (INA))
Abstract:	This paper documents the extent of copying and estimates the returns to originality in online news production. We build a unique dataset combining all the online content produced by the universe of news media (newspaper, television, radio, pure online media, and a news agency) in France during the year 2013 with new micro audience data. We develop a topic detection algorithm that identifies each news event, trace the timeline of each story and study news propagation. We show that one quarter of the news stories are reproduced online in less than 4 minutes. High reactivity comes with verbatim copying. We find that only 32.6% of the online content is original. The negative impact of copying on newsgathering incentives might however be counterbalanced by reputation effects. By using media-level daily audience and article-level Facebook shares, we show that original content represents 57.8% of online news consumption. Reputation mechanisms actually appear to solve about 40% of the copyright violation problem.
Keywords:	Copyright; Facebook; Information spreading; Internet; Investigative journalism; Reputation
JEL:	L11 L15 L82 L86
Date:	2017–05
URL:	http://d.repec.org/n?u=RePEc:spo:wpmain:info:hdl:2441/38tbdqmgvf8f9amamb132hea9b&r=big

Cloud computing and big data in the context of industry 4.0 : opportunities and challenges

By:	Thabit Atobishi (Szent Istvan University); Szalay Zsigmond Gábor (Szent Istvan University); Szilard Podruzsik (Corvinus University of Budapest)
Abstract:	The global industrial systems have changed in the last few years due to great technological advancement in many fields. The Industry 4.0 concept emerged in 2011 in Germany and has since been adopted and investigated by both academics and practitioners in many other advanced countries. Two main new technologies are associated with, and will have a great impact on, Industry 4.0. In this review paper, we shed light on cloud computing and big data. We present the possibilities and challenges associated with these technologies. The review reveals that sharing, efficiency in production and information sharing are the main possibilities of cloud computing, while security and privacy are the main concerns. On the other hand, big data brings many opportunities, such as cost reduction and support for efficient decision making; however, challenges related to large-scale parallel systems and other technical challenges still need to be addressed.
Keywords:	cloud computing, information technology, industry 4.0
JEL:	M15 O32
Date:	2018–04
URL:	http://d.repec.org/n?u=RePEc:sek:iacpro:7508667&r=big

Measuring the Diffusion of Innovations with Paragraph Vector Topic Models

By:	David Lenz (Justus-Liebig-University Giessen); Peter Winker (Justus-Liebig-University Giessen)
Abstract:	Topic modeling has become an intensively researched area in economics, mainly due to the ever-increasing availability of huge amounts of digital text and the improvements in methods to analyze these datasets. In natural language processing, topic modeling describes a set of methods to extract the latent topics from a collection of documents. Several new methods have recently been proposed to improve the topic generation process. However, examination of the generated topics is still mostly based on unsatisfactory practices, for example by looking only at the list of most frequent words for a topic. Our contribution is threefold: 1) We present a topic modeling approach based on neural embeddings and Gaussian mixture modeling, which is shown to generate coherent and meaningful topics. 2) We propose a novel "topic report" based on dimensionality reduction techniques and model-generated document vector features, which helps to easily identify topics and significantly reduces the required mental overhead. 3) Lastly, we demonstrate on a technology-related newsticker corpus how our approach could be used by economists to tackle economic problems, for example to measure the diffusion of innovations.
Date:	2018
URL:	http://d.repec.org/n?u=RePEc:mar:magkse:201815&r=big

Cost-Benefit Analysis of Artificial Intelligence (AI) Fired Robots (AI-Bots) Replacing Educators

By:	Tejendra Kalia (Worcester State University, MA)
Abstract:	In 2016, Buckingham University’s Vice-Chancellor predicted that educators will lose their traditional role in 10 years and effectively become little more than classroom assistants (2017, News.com.au). This is supported by the Georgia Institute of Technology’s Computer Science Professor “Ashok Goyal”, who has been using Jill Watson (AI-Bot) successfully since 2016 as a Teaching Assistant to help online students (2016, Hillary Lipko). The joke-cracking Sophia (AI-Bot) by Hanson Robotics of Hong Kong mimics human beings. She appeared in Austin at the 2016 Interactive Festival and in the same year became a citizen of Saudi Arabia (2016, Sean Martin). Apparently, AI-Bots have already demonstrated superior performance in many areas. This poses a threat to educators of being replaced by AI-Bots. However, AI-Bots are expensive. The cost of the most advanced AI-Bot, “ASIMO” by Honda, was US$ 2.5 million in 2016 (Honda.com). Most people are afraid of being replaced by robots. In 2017, Sophia urged people in India not to fear AI-Bots, but the 2017 Oxford University study estimated that 47% of all U.S. jobs could be replaced by AI-Bots within 20 years. This trend is confirmed by the findings of the Center for Business and Economic Research at Ball State University, which attributed 85% of the 5.6m manufacturing job losses in the USA between 2000 and 2010 to technology.
Keywords:	Artificial Intelligence, AI-Bots, Sophia, ASIMO, Potential Savings
Date:	2018–03
URL:	http://d.repec.org/n?u=RePEc:smo:fpaper:019&r=big

Digitalisation and Jobs in the Real Estate Industry

By:	Daniel Piazolo
Abstract:	Digital technologies and automation of job routines will lead to the replacement of administrative and operative roles in which activities are most repetitive and predictable. Analyses find that Facility Management jobs have a 70 percent chance of being automated. However, novel possibilities through the use of digital tools like artificial intelligence will create new employment possibilities within the various real estate areas like Facility Management and Property Management. This paper will examine the likely losers and winners within the real estate industry due to the digital revolution. A larger share of the real estate sector work force will perform complex, judgement-based problem solving. Higher-skilled and better-paying jobs will emerge. Digitalisation is a job killer and a job engine at the same time.
Keywords:	Automation; Digitalisation; Disruption; Employment; Future of work
JEL:	R3
Date:	2017–07–01
URL:	http://d.repec.org/n?u=RePEc:arz:wpaper:eres2017_63&r=big

This nep-big issue is ©2018 by Tom Coupé. It is provided as is without any express or implied warranty. It may be freely redistributed in whole or in part for any purpose. If distributed in part, please include this notice.

General information on the NEP project can be found at http://nep.repec.org. For comments please write to the director of NEP, Marco Novarese at <director@nep.repec.org>. Put “NEP” in the subject, otherwise your mail may be rejected.

NEP’s infrastructure is sponsored by the School of Economics and Finance of Massey University in New Zealand.