nep-big New Economics Papers
on Big Data
Issue of 2018‒10‒29
fourteen papers chosen by
Tom Coupé
University of Canterbury

  1. Spread the Word: International Spillovers from Central Bank Communication By Armelius, Hanna; Bertsch, Christoph; Hull, Isaiah; Zhang, Xin
  2. Credit Risk Analysis Using Machine and Deep Learning Models By Dominique Guegan; Peter Addo; Bertrand Hassani
  3. Data, measurement and initiatives for inclusive digitalization and future of work By Nofal, María B.; Coremberg, Ariel; Sartorio, Luca
  4. Reinforcement learning in financial markets - a survey By Fischer, Thomas G.
  5. Identifikation von empirischen Unternehmenscharakteristika mittels Machine Learning Verfahren: Gemeinsames Projekt von DATAlovers, Institut der deutschen Wirtschaft und IW Consult By Fritsch, Manuel; Goecke, Henry; Kulpa, Andreas
  6. Breaking Audio Captchas for IRCTC Booking Automization By Nipun Bansal; Mukul Sachdeva; Tanisha Mittal
  7. The quality of spatial variety of urban tourism and hotel room rates By Tony ShunTe Yuo
  8. Organizational culture and public diplomacy in the digital sphere: The case of South Korea By Jeffrey Robertson
  9. The role of government in science and technology legislation to prepare for the era of artificial intelligence By Soyoung Park
  10. IoT measurement and applications By OECD
  11. Different automated valuation modelling techniques evaluated over time. By Michael Mayer; Steven C. Bourassa; Martin Hoesli; Donato Scognamiglio
  12. Privacy concerns in China's smart city campaign: The deficit of China's Cybersecurity Law By Fan Yang and Jian Xu
  13. Nearly exact Bayesian estimation of non-linear no-arbitrage term structure models By Marcello Pericoli; Marco Taboga
  14. Harnessing the opportunities of inclusive technologies in a global economy By Beliz, Gustavo; Basco, Ana Inés; de Azevedo, Belisario

  1. By: Armelius, Hanna (Payments Department); Bertsch, Christoph (Research Department, Central Bank of Sweden); Hull, Isaiah (Research Department, Central Bank of Sweden); Zhang, Xin (Research Department, Central Bank of Sweden)
    Abstract: We use text analysis and a novel dataset to measure the sentiment component of central bank communications in 23 countries over the 2002-2017 period. Our analysis yields three key results. First, using directed networks, we show that comovement in sentiment across central banks is not reducible to trade or financial flow exposure. Second, we find that geographic distance is a robust and economically significant determinant of comovement in central bank sentiment, while shared language and colonial ties are economically significant, but less robust. Third, we use structural VARs to show that sentiment shocks generate cross-country spillovers in sentiment, policy rates, and macroeconomic variables. We also find that the Fed plays a uniquely influential role in generating such sentiment spillovers, while the ECB is primarily influenced by other central banks. Overall, our results suggest that central bank communication contains systematic biases that could lead to suboptimal policy outcomes.
    Keywords: communication; monetary policy; international policy transmission
    JEL: E52 E58 F42
    Date: 2018–09–01
    URL: http://d.repec.org/n?u=RePEc:hhs:rbnkwp:0357&r=big
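    As a rough illustration of how a sentiment component can be extracted from central bank text, the following sketch scores a statement against small positive and negative word lists. The word lists, the example statement and the scoring rule are invented for illustration and are not the authors' dataset or method.

      # Minimal dictionary-based sentiment score for a central bank statement.
      # Word lists and the example text are illustrative only.
      import re

      POSITIVE = {"strengthened", "robust", "improving", "expansion", "confident"}
      NEGATIVE = {"uncertainty", "weakened", "downturn", "risks", "subdued"}

      def sentiment_score(text: str) -> float:
          """Return (positive - negative) counts divided by total tokens, in [-1, 1]."""
          tokens = re.findall(r"[a-z]+", text.lower())
          if not tokens:
              return 0.0
          pos = sum(t in POSITIVE for t in tokens)
          neg = sum(t in NEGATIVE for t in tokens)
          return (pos - neg) / len(tokens)

      statement = ("The Committee judges that economic activity has strengthened, "
                   "although uncertainty around global trade risks remains elevated.")
      print(round(sentiment_score(statement), 4))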
  2. By: Dominique Guegan (UP1 - Université Panthéon-Sorbonne, CES - Centre d'économie de la Sorbonne - CNRS - Centre National de la Recherche Scientifique - UP1 - Université Panthéon-Sorbonne, Labex ReFi - UP1 - Université Panthéon-Sorbonne, IPAG Business School - IPAG BUSINESS SCHOOL PARIS, University of Ca’ Foscari [Venice, Italy]); Peter Addo (AFD - Agence française de développement, Labex ReFi - UP1 - Université Panthéon-Sorbonne); Bertrand Hassani (Labex ReFi - UP1 - Université Panthéon-Sorbonne, CES - Centre d'économie de la Sorbonne - CNRS - Centre National de la Recherche Scientifique - UP1 - Université Panthéon-Sorbonne, Capgemini Consulting [Paris], UCL-CS - Computer science department [University College London] - UCL - University College of London [London])
    Abstract: Due to the advanced technology associated with Big Data, data availability and computing power, most banks and lending institutions are renewing their business models. Credit risk prediction, monitoring, model reliability and effective loan processing are key to decision-making and transparency. In this work, we build binary classifiers based on machine and deep learning models on real data to predict loan default probability. The top 10 most important features from these models are selected and then used in the modeling process to test the stability of the binary classifiers by comparing their performance on separate data. We observe that the tree-based models are more stable than the models based on multilayer artificial neural networks. This raises several questions about the intensive use of deep learning systems in enterprises.
    Keywords: financial regulation,deep learning,Big data,data science,credit risk
    Date: 2018
    URL: http://d.repec.org/n?u=RePEc:hal:cesptp:halshs-01835164&r=big
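    The sketch below illustrates the kind of stability comparison the paper describes: a tree-based classifier and a multilayer neural network are fitted to a default-style classification task, the ten most important features are selected, and both models are refitted on that subset. The data are synthetic and the models are generic scikit-learn defaults, not the authors' specifications.

      # Sketch: compare a tree-based and a neural-network default classifier,
      # then refit both on the 10 most important features (synthetic data only).
      import numpy as np
      from sklearn.datasets import make_classification
      from sklearn.ensemble import RandomForestClassifier
      from sklearn.neural_network import MLPClassifier
      from sklearn.model_selection import train_test_split
      from sklearn.metrics import roc_auc_score

      X, y = make_classification(n_samples=5000, n_features=30, n_informative=12,
                                 weights=[0.9, 0.1], random_state=0)
      X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

      rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
      nn = MLPClassifier(hidden_layer_sizes=(64, 32), max_iter=500,
                         random_state=0).fit(X_tr, y_tr)

      # Select the 10 features the forest ranks as most important and refit.
      top10 = np.argsort(rf.feature_importances_)[-10:]
      rf10 = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr[:, top10], y_tr)
      nn10 = MLPClassifier(hidden_layer_sizes=(64, 32), max_iter=500,
                           random_state=0).fit(X_tr[:, top10], y_tr)

      for name, model, cols in [("RF all", rf, slice(None)), ("NN all", nn, slice(None)),
                                ("RF top10", rf10, top10), ("NN top10", nn10, top10)]:
          auc = roc_auc_score(y_te, model.predict_proba(X_te[:, cols])[:, 1])
          print(f"{name}: AUC = {auc:.3f}")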
  3. By: Nofal, María B.; Coremberg, Ariel; Sartorio, Luca
    Abstract: As the pace of digitalization and automation accelerates globally, and more disruptive innovations in machine learning, artificial intelligence and robotics are expected, new data sources and measurement tools are needed to complement existing valuable statistics and administrative data. This is necessary to better understand the impact of technological change on the labor market and the economy and to better inform policy decisions for inclusive, people-centered growth. In accordance with the G20 Roadmap for Digitalisation (2017), points 10, 5 and 7, the authors propose to: i) track technological developments globally in a multidisciplinary and coordinated fashion; ii) develop new methods of measurement for the digital economy; iii) harmonize occupational taxonomies and develop new sources of data and indicators at the international level; and iv) build international collaborative platforms for digital skills and the digital transformation of SMEs.
    Keywords: globalization,labor markets,employment polarization,labor share,skills,productivity,innovation,technological change,economic growth
    JEL: E01 J23 J24 J31 E25 O33 O4
    Date: 2018
    URL: http://d.repec.org/n?u=RePEc:zbw:ifwedp:201871&r=big
  4. By: Fischer, Thomas G.
    Abstract: The advent of reinforcement learning (RL) in financial markets is driven by several advantages inherent to this field of artificial intelligence. In particular, RL makes it possible to combine the "prediction" and the "portfolio construction" tasks in one integrated step, thereby closely aligning the machine learning problem with the objectives of the investor. At the same time, important constraints, such as transaction costs, market liquidity, and the investor's degree of risk-aversion, can be conveniently taken into account. Over the past two decades, although most attention is still devoted to supervised learning methods, the RL research community has made considerable advances in the finance domain. The present paper draws insights from almost 50 publications and categorizes them into three main approaches: the critic-only, the actor-only, and the actor-critic approach. Within each of these categories, the respective contributions are summarized and reviewed along the representation of the state, the applied reward function, and the action space of the agent. This cross-sectional perspective allows us to identify recurring design decisions as well as potential levers to improve the agent's performance. Finally, the individual strengths and weaknesses of each approach are discussed, and directions for future research are pointed out.
    Keywords: financial markets,reinforcement learning,survey,trading systems,machine learning
    Date: 2018
    URL: http://d.repec.org/n?u=RePEc:zbw:iwqwdp:122018&r=big
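    As a minimal illustration of the critic-only approach discussed in the survey, the sketch below trains a tabular Q-learning agent on a synthetic return series with a one-bit momentum state and a short/flat/long action space. All choices (state, reward, costs) are illustrative and not taken from any surveyed system.

      # Minimal critic-only (tabular Q-learning) trading sketch on a synthetic
      # random-walk return series. State: sign of last return; actions: short/flat/long.
      import numpy as np

      rng = np.random.default_rng(0)
      returns = rng.normal(0, 0.01, size=5000)          # synthetic daily returns
      Q = np.zeros((2, 3))                              # 2 states x 3 actions
      alpha, gamma, eps = 0.1, 0.95, 0.1
      actions = np.array([-1, 0, 1])                    # short, flat, long
      cost = 0.0005                                     # per-unit transaction cost

      pos = 0
      for t in range(1, len(returns) - 1):
          state = int(returns[t - 1] > 0)               # 0: last return down, 1: up
          a = rng.integers(3) if rng.random() < eps else int(Q[state].argmax())
          new_pos = actions[a]
          reward = new_pos * returns[t] - cost * abs(new_pos - pos)
          next_state = int(returns[t] > 0)
          Q[state, a] += alpha * (reward + gamma * Q[next_state].max() - Q[state, a])
          pos = new_pos

      print("Learned Q-values (rows: down/up state; cols: short/flat/long):")
      print(np.round(Q, 5))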
  5. By: Fritsch, Manuel; Goecke, Henry; Kulpa, Andreas
    Abstract: [Introduction] This final project report summarizes the results of the joint project of DATAlovers AG, the Institut der deutschen Wirtschaft and IW Consult. The consortium joined forces to evaluate the extent to which machine learning approaches, in combination with the content of company websites as a primary source of information, can be used for scientific research. The basic task of the project is to clarify whether the combination of new machine learning methods and the texts of websites meets scientific standards when identifying corporate "twins". The aim is to transfer validated information on a comparatively small number of companies to the entire population of companies. If the approach succeeds, results for all German companies could be obtained with an inexpensive method. This "quasi-census" would open up many further applications for a research institute. The division of labor in the project is as follows: the Institut der deutschen Wirtschaft and IW Consult provide the original company information and quantify the results. DATAlovers contributes the website texts of all German companies as original data, trains an algorithm on the full dataset and produces the algorithm's predictions. In general, the task described is a classification problem: with the help of a large amount of data, it must be decided for every German company whether or not it belongs to a specific group. Machine learning methods lend themselves to such questions. Since the target variables (the respective groups) are known, supervised machine learning (as opposed to unsupervised machine learning) is the appropriate class of methods. It includes, for example, logistic regression, random forests, support vector machines and decision trees (see Brownlee, 2016; Provost/Fawcett, 2013, for more detail). These approaches are increasingly used in current research and in official statistics. For example, Feuerhake/Dumpert (2016) use such methods to classify companies into the German crafts statistics, Dumpert et al. (2016) use them to assign companies to the so-called third sector, Finke et al. (2017) assign motherhood status to women, and Dumpert/Beck (2017) use them to classify persons' nationality. In addition, machine learning algorithms are currently used to link datasets (e.g. Schild et al., 2017). In terms of method, this project can therefore be placed among current approaches in official statistics.
    Date: 2018
    URL: http://d.repec.org/n?u=RePEc:zbw:iwkrep:352018&r=big
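    The classification task described in the report, assigning firms to groups from their website texts with supervised learning, can be sketched with a TF-IDF plus logistic regression pipeline as below. The example texts, labels and pipeline are invented and are not the project's data or algorithm.

      # Sketch: supervised classification of firms from website text
      # (TF-IDF features + logistic regression). The texts and labels are invented.
      from sklearn.feature_extraction.text import TfidfVectorizer
      from sklearn.linear_model import LogisticRegression
      from sklearn.pipeline import make_pipeline

      train_texts = [
          "We manufacture precision gears and drive components for industrial clients.",
          "Our bakery offers fresh bread, cakes and regional specialties every day.",
          "We develop embedded software and sensor systems for automotive suppliers.",
          "Family-run restaurant serving seasonal dishes and local wines.",
      ]
      train_labels = ["manufacturing", "food", "manufacturing", "food"]

      clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)),
                          LogisticRegression(max_iter=1000))
      clf.fit(train_texts, train_labels)

      # Predict the group of a previously unseen company website text.
      print(clf.predict(["We supply machined parts and assemblies to engineering firms."]))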
  6. By: Nipun Bansal (Delhi Technological University); Mukul Sachdeva (DTU); Tanisha Mittal (DTU)
    Abstract: CAPTCHAs are computer-generated tests, in the form of images, audio and object-recognition tasks, that humans can solve easily but computer systems cannot. Internet sites present users with captchas to set apart human users from automated computer programs, often referred to as bots. Their purpose is to prevent attackers from performing automatic registration, online polling and other such actions. IRCTC, the website for reserving tickets on Indian Railways, one of the biggest railway networks, has also employed both image and audio captchas for security purposes. However, the audio captchas used on the website are not effective in distinguishing between humans and bots. Most of the visual CAPTCHAs and some audio CAPTCHAs on different websites have been cracked using various machine learning methods, and we propose a similar approach to examine the security of the audio CAPTCHAs on the IRCTC website. In this paper, we show that our bot is able to break the IRCTC audio captchas with success rates of 98%, 96.04% and 80.3% using three different models. Along with breaking the captcha, another Python script written by us was able to automate the process of ticket booking. Combining all of this into a single package could thus result in a system that logs in and reserves tickets with a single click. Travel brokers could easily use such a system for fast booking of tatkal tickets, commercializing this activity to derive large profits from needy travelers.
    Keywords: Audio Captchas, Automatic Speech Recognition, IRCTC, Security, MFCC, Deep Learning
    JEL: L86 C80 D85
    Date: 2018–07
    URL: http://d.repec.org/n?u=RePEc:sek:iacpro:8209601&r=big
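    A common building block for attacking audio captchas is extracting MFCC features from segmented audio and feeding them to a classifier. The sketch below shows this step with librosa and a support vector machine; the file paths, labels and classifier are placeholders, not the authors' models or data.

      # Sketch: MFCC feature extraction for short audio segments plus a simple
      # classifier. File paths and digit labels are placeholders, not the authors' data.
      import numpy as np
      import librosa
      from sklearn.svm import SVC

      def mfcc_features(path: str, n_mfcc: int = 13) -> np.ndarray:
          """Load an audio file and return its time-averaged MFCC vector."""
          y, sr = librosa.load(path, sr=None)
          mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)
          return mfcc.mean(axis=1)

      # Hypothetical training set: segmented digit clips with known labels.
      train_files = ["digit_0_a.wav", "digit_1_a.wav", "digit_2_a.wav"]
      train_labels = [0, 1, 2]

      X = np.vstack([mfcc_features(f) for f in train_files])
      clf = SVC(kernel="rbf").fit(X, train_labels)

      # Classify a new, unlabeled audio segment.
      print(clf.predict(mfcc_features("unknown_digit.wav").reshape(1, -1)))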
  7. By: Tony ShunTe Yuo
    Abstract: Tourism has become one of the major strategies for urban and city authorities to generate competitiveness. The benefits include more incoming tourists, conference and convention business opportunities, foreign direct investment, and the establishment of operational sites or even headquarters. For local citizens, successful tourism can enhance job opportunities and innovation and will normally induce higher-quality infrastructure. Under the current trend of IoT and big data applications, smart tourism is set to be one of the crucial movements of this century. This research established a spatial database of the tourism resources of Taiwan and its major competitors from open data sources to evaluate and reveal the gaps between users' demands and the current supply of tourism data. The research focused on Taiwan and its surrounding competitor cities, and examined the methodologies and a proper open data environment. The results suggest that a more user-oriented open data platform should be considered and that 3D visualized measurements of spatial variety can help users identify and make tourism decisions. The empirical results also reveal some interesting implications for the determinants of hotel room rates.
    Keywords: Agglomeration Economies; Data Mining; Spatial Analysis; Visualizing measurements
    JEL: R3
    Date: 2018–01–01
    URL: http://d.repec.org/n?u=RePEc:arz:wpaper:eres2018_95&r=big
  8. By: Jeffrey Robertson
    Abstract: Digital diplomacy is the latest technological advance to push change in diplomatic practice. It relates to the application of digital technologies, including information and communication technologies, software engineering and big data, and artificial intelligence, to the practice of diplomacy. Positioned in the top ranks of connectivity, internet speed, smartphone ownership, and social media usage, South Korea should be a leader in the use of digital technologies in diplomatic practice. However, South Korea is not a leader; indeed, it has been left behind. I explore digital diplomacy as a “disruptive technology” and look at criteria for organizational adaptation. I then use these criteria to assess South Korea's adaptation and draw from these the specific policy challenges facing South Korea. To conclude, I propose four core criteria to aid digital diplomacy adaptation in South Korea and other similar states.
    Keywords: digital diplomacy, diplomacy, South Korea, foreign policy, foreign ministry
    Date: 2018–10–08
    URL: http://d.repec.org/n?u=RePEc:een:appswp:201848&r=big
  9. By: Soyoung Park (Korea Institute of S&T evaluation and planning)
    Abstract: With the rapid development of AI (artificial intelligence) technology, new types of accidents that have not happened before are occurring: an autonomous vehicle causes a crash, or a guard robot attacks a child. However, most countries have not established a legal framework for coping with these accidents. If this situation continues, legal risk will increase, which will hinder the development and utilization of AI. It is therefore time for governments to consider legislation to prepare for the AI era. To build confidence in AI technology, it is necessary to construct a system that can coexist with existing systems. As part of this preparation, this study analyzes the legal issues raised by AI disputes and identifies the government's tasks in the field of science and technology legislation in preparation for the AI era.
    Keywords: Artificial Intelligence, legislation, liability system
    Date: 2018–07
    URL: http://d.repec.org/n?u=RePEc:sek:ilppro:7909652&r=big
  10. By: OECD
    Abstract: The Cancun Ministerial mandate on the Digital Economy (2016) highlighted the importance of developing Internet of Things (IoT) metrics to assess the effects of the IoT in different policy areas. Accordingly, this report reviews different definitions of IoT in view of establishing an operational definition for the CDEP work, and proposes a taxonomy for IoT measurement. The report also explores potential challenges for communication infrastructures due to the exponential growth of IoT devices through the application of connected and automated vehicles. This IoT application was chosen as the data transmission requirements of fully automated vehicles may have substantial implications for network infrastructure, and therefore may require prioritisation in terms of measurement.
    Date: 2018–10–23
    URL: http://d.repec.org/n?u=RePEc:oec:stiaab:271-en&r=big
  11. By: Michael Mayer; Steven C. Bourassa; Martin Hoesli; Donato Scognamiglio
    Abstract: We use a rich data set consisting of 123,000 houses sold in Switzerland between 2004 and 2017 to investigate different automated valuation techniques in settings where the models are updated regularly. We apply six methods (linear regression, robust regression, mixed effects regression, gradient boosting, random forests, and neural networks) to both moving window and extending window models. With respect to the criteria of appraisal accuracy and stability, the preferred methods are robust regression using moving windows, gradient boosting using extending windows, or mixed effects regression for either strategy.
    Keywords: automated valuation; Machine Learning; Statistics
    JEL: R3
    Date: 2018–01–01
    URL: http://d.repec.org/n?u=RePEc:arz:wpaper:eres2018_40&r=big
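    The moving-window versus extending-window comparison can be sketched as follows with a gradient boosting model on synthetic, time-ordered data; the data-generating process, window sizes and error metric are illustrative and not taken from the paper.

      # Sketch: extending-window vs moving-window evaluation of a gradient-boosting
      # valuation model on synthetic, time-ordered data (not the Swiss transaction data).
      import numpy as np
      from sklearn.ensemble import GradientBoostingRegressor
      from sklearn.metrics import median_absolute_error

      rng = np.random.default_rng(0)
      n = 3000
      X = rng.normal(size=(n, 5))                       # hedonic attributes (synthetic)
      time_trend = np.linspace(0, 0.5, n)               # gentle price trend over time
      y = X @ np.array([0.4, 0.2, -0.3, 0.1, 0.05]) + time_trend + rng.normal(0, 0.1, n)

      step, window = 500, 1000
      for name in ("extending", "moving"):
          errors = []
          for end in range(window, n - step + 1, step):
              start = 0 if name == "extending" else end - window
              model = GradientBoostingRegressor(random_state=0).fit(X[start:end], y[start:end])
              pred = model.predict(X[end:end + step])
              errors.append(median_absolute_error(y[end:end + step], pred))
          print(f"{name} window: mean of median abs. errors = {np.mean(errors):.4f}")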
  12. By: Fan Yang and Jian Xu
    Abstract: Many cities around the world are increasingly embedding technological infrastructure in urban spaces. These infrastructures aim to collect vast amounts of data from citizens with an apparent purpose of improving public services. This article discusses privacy concerns generated by China's nationwide smart city campaign and further investigates why China's latest Cybersecurity Law is not adequate to address the risks to citizens' privacy. We argue that there is no functional privacy law in China that would apply to most data collected by smart city infrastructure; nor is there any law that would protect any personal data collected under this framework. We therefore propose practical suggestions to better protect citizens' data in China's ongoing smart city campaign.
    Keywords: big data, China, Cybersecurity Law, privacy, smart cities
    Date: 2018–10–05
    URL: http://d.repec.org/n?u=RePEc:een:appswp:201839&r=big
  13. By: Marcello Pericoli (Bank of Italy); Marco Taboga (Bank of Italy)
    Abstract: We propose a general method for the Bayesian estimation of nonlinear no-arbitrage term structure models. The main innovations we introduce are: 1) a computationally efficient method, based on deep learning techniques, for approximating no-arbitrage model-implied bond yields to any desired degree of accuracy; and 2) computational graph optimizations for accelerating the MCMC sampling of the model parameters and of the unobservable state variables that drive the short-term interest rate. We apply the proposed techniques for estimating a shadow rate model with a time-varying lower bound, in which the shadow rate can be driven by both spanned unobservable factors and unspanned macroeconomic factors.
    Keywords: yield curve, shadow rate, deep learning, artificial intelligence
    JEL: C32 E43 G12
    Date: 2018–09
    URL: http://d.repec.org/n?u=RePEc:bdi:wptemi:td_1189_18&r=big
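    The general idea of the first innovation, replacing an expensive model-implied yield computation with a fast neural-network surrogate inside an MCMC sampler, can be sketched as follows. The "pricing" function, surrogate architecture and random-walk Metropolis sampler below are toy choices, not the authors' no-arbitrage model or their computational-graph optimizations.

      # Sketch: approximate a slow model-implied yield function with a small neural
      # net, then use the surrogate inside a random-walk Metropolis sampler.
      # The "pricing" function is a toy stand-in, not the paper's model.
      import numpy as np
      from sklearn.neural_network import MLPRegressor

      rng = np.random.default_rng(0)

      def slow_yield(theta: np.ndarray) -> float:
          """Stand-in for an expensive model-implied yield computation."""
          return np.exp(-theta[0]) + 0.5 * np.tanh(theta[1])

      # 1) Train a surrogate on points sampled from the parameter space.
      theta_grid = rng.uniform(-2, 2, size=(2000, 2))
      y_grid = np.array([slow_yield(t) for t in theta_grid])
      surrogate = MLPRegressor(hidden_layer_sizes=(32, 32), max_iter=2000,
                               random_state=0).fit(theta_grid, y_grid)

      # 2) Random-walk Metropolis using the surrogate in the likelihood.
      observed, sigma = 0.9, 0.05
      def log_post(theta):
          fit = surrogate.predict(theta.reshape(1, -1))[0]
          return -0.5 * ((observed - fit) / sigma) ** 2 - 0.5 * theta @ theta  # N(0,1) prior

      theta = np.zeros(2)
      lp = log_post(theta)
      draws = []
      for _ in range(3000):
          prop = theta + 0.2 * rng.normal(size=2)
          lp_prop = log_post(prop)
          if np.log(rng.random()) < lp_prop - lp:
              theta, lp = prop, lp_prop
          draws.append(theta)
      print("Posterior mean:", np.mean(draws, axis=0).round(3))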
  14. By: Beliz, Gustavo; Basco, Ana Inés; de Azevedo, Belisario
    Abstract: In this paper the authors propose that G20 countries endorse and facilitate the creation of a T20 digital platform for "Accelerating the Jobs of the Future". In a world driven by a new wave of technological change, the platform would revalue the role of think tanks, research institutions and knowledge hubs to move the global agenda on an issue of central importance for the future of society: the creation of the jobs of the future. Building on and complementing existing experiences, the T20 platform would be a digital hub for producing knowledge, informing policies and connecting potential partners to accelerate the jobs of the future, within the context of an increasingly integrated global economy. It would also contribute to the development of consensual views among the research community, helping to discard extreme visions of the jobs of the future and dispelling both overly optimistic claims with no evidence base and unwarranted fears.
    Keywords: employment,future,digitalization,technology,industry 4.0,artificial intelligence,skills,inequality,gender gap,G20
    JEL: J01 E24 O30
    Date: 2018
    URL: http://d.repec.org/n?u=RePEc:zbw:ifwedp:201872&r=big

This nep-big issue is ©2018 by Tom Coupé. It is provided as is without any express or implied warranty. It may be freely redistributed in whole or in part for any purpose. If distributed in part, please include this notice.
General information on the NEP project can be found at http://nep.repec.org. For comments please write to the director of NEP, Marco Novarese at <director@nep.repec.org>. Put “NEP” in the subject, otherwise your mail may be rejected.
NEP’s infrastructure is sponsored by the School of Economics and Finance of Massey University in New Zealand.