nep-big New Economics Papers
on Big Data
Issue of 2018‒10‒29
fourteen papers chosen by
Tom Coupé
University of Canterbury

  1. Spread the Word: International Spillovers from Central Bank Communication By Armelius, Hanna; Bertsch, Christoph; Hull, Isaiah; Zhang, Xin
  2. Credit Risk Analysis Using Machine and Deep Learning Models By Dominique Guegan; Peter Addo; Bertrand Hassani
  3. Data, measurement and initiatives for inclusive digitalization and future of work By Nofal, María B.; Coremberg, Ariel; Sartorio, Luca
  4. Reinforcement learning in financial markets - a survey By Fischer, Thomas G.
  5. Identifikation von empirischen Unternehmenscharakteristika mittels Machine Learning Verfahren: Gemeinsames Projekt von DATAlovers, Institut der deutschen Wirtschaft und IW Consult By Fritsch, Manuel; Goecke, Henry; Kulpa, Andreas
  6. Breaking Audio Captchas for IRCTC Booking Automization By Nipun Bansal; Mukul Sachdeva; Tanisha Mittal
  7. The quality of spatial variety of urban tourism and hotel room rates By Tony ShunTe Yuo
  8. Organizational culture and public diplomacy in the digital sphere: The case of South Korea By Jeffrey Robertson
  9. The role of government in science and technology legislation to prepare for the era of artificial intelligence By Soyoung Park
  10. IoT measurement and applications By OECD
  11. Different automated valuation modelling techniques evaluated over time. By Michael Mayer; Steven C. Bourassa; Martin Hoesli; Donato Scognamiglio
  12. Privacy concerns in China's smart city campaign: The deficit of China's Cybersecurity Law By Fan Yang and Jian Xu
  13. Nearly exact Bayesian estimation of non-linear no-arbitrage term structure models By Marcello Pericoli; Marco Taboga
  14. Harnessing the opportunities of inclusive technologies in a global economy By Beliz, Gustavo; Basco, Ana Inés; de Azevedo, Belisario

  1. By: Armelius, Hanna (Payments Department); Bertsch, Christoph (Research Department, Central Bank of Sweden); Hull, Isaiah (Research Department, Central Bank of Sweden); Zhang, Xin (Research Department, Central Bank of Sweden)
    Abstract: We use text analysis and a novel dataset to measure the sentiment component of central bank communications in 23 countries over the 2002-2017 period. Our analysis yields three key results. First, using directed networks, we show that comovement in sentiment across central banks is not reducible to trade or financial flow exposure. Second, we find that geographic distance is a robust and economically significant determinant of comovement in central bank sentiment, while shared language and colonial ties are economically significant, but less robust. Third, we use structural VARs to show that sentiment shocks generate cross-country spillovers in sentiment, policy rates, and macroeconomic variables. We also find that the Fed plays a uniquely influential role in generating such sentiment spillovers, while the ECB is primarily influenced by other central banks. Overall, our results suggest that central bank communication contains systematic biases that could lead to suboptimal policy outcomes.
    Keywords: communication; monetary policy; international policy transmission
    JEL: E52 E58 F42
    Date: 2018–09–01
  2. By: Dominique Guegan (UP1 - Université Panthéon-Sorbonne, CES - Centre d'économie de la Sorbonne - CNRS - Centre National de la Recherche Scientifique - UP1 - Université Panthéon-Sorbonne, Labex ReFi - UP1 - Université Panthéon-Sorbonne, IPAG Business School - IPAG BUSINESS SCHOOL PARIS, University of Ca’ Foscari [Venice, Italy]); Peter Addo (AFD - Agence française de développement, Labex ReFi - UP1 - Université Panthéon-Sorbonne); Bertrand Hassani (Labex ReFi - UP1 - Université Panthéon-Sorbonne, CES - Centre d'économie de la Sorbonne - CNRS - Centre National de la Recherche Scientifique - UP1 - Université Panthéon-Sorbonne, Capgemini Consulting [Paris], UCL-CS - Computer science department [University College London] - UCL - University College of London [London])
    Abstract: Due to the advanced technology associated with Big Data, data availability and computing power, most banks or lending institutions are renewing their business models. Credit risk predictions, monitoring, model reliability and effective loan processing are key to decision-making and transparency. In this work, we build binary classifiers based on machine and deep learning models on real data to predict loan default probability. The top 10 important features from these models are selected and then used in the modeling process to test the stability of binary classifiers by comparing their performance on separate data. We observe that the tree-based models are more stable than the models based on multilayer artificial neural networks. This opens several questions relative to the intensive use of deep learning systems in enterprises.
    Keywords: financial regulation,deep learning,Big data,data science,credit risk
    Date: 2018
  3. By: Nofal, María B.; Coremberg, Ariel; Sartorio, Luca
    Abstract: As the pace of digitalization and automation accelerates globally, and more disruptive innovations in machine learning, artificial intelligence and robotics are expected, new data sources and measurement tools are needed to complement existing valuable statistics and administrative data. This is necessary to better understand the impact of technological change on the labor market and the economy and to better inform policy decisions for inclusive, people-centered growth. In accordance with the G20 Roadmap for Digitalisation (2017), points 10, 5 and 7, the authors propose to: i) track technological developments globally in a multidisciplinary and coordinated fashion; ii) develop new methods of measurement for the digital economy; iii) harmonize occupational taxonomies and develop new sources of data and indicators at the international level; iv) build International Collaborative Platforms for Digital Skills and the Digital Transformation of SMEs.
    Keywords: globalization,labor markets,employment polarization,labor share,skills,productivity,innovation,technological change,economic growth
    JEL: E01 J23 J24 J31 E25 O33 O4
    Date: 2018
  4. By: Fischer, Thomas G.
    Abstract: The advent of reinforcement learning (RL) in financial markets is driven by several advantages inherent to this field of artificial intelligence. In particular, RL makes it possible to combine the "prediction" and the "portfolio construction" task in one integrated step, thereby closely aligning the machine learning problem with the objectives of the investor. At the same time, important constraints, such as transaction costs, market liquidity, and the investor's degree of risk-aversion, can be conveniently taken into account. Over the past two decades, although most attention is still devoted to supervised learning methods, the RL research community has made considerable advances in the finance domain. The present paper draws insights from almost 50 publications, and categorizes them into three main approaches, i.e., critic-only approach, actor-only approach, and actor-critic approach. Within each of these categories, the respective contributions are summarized and reviewed along the representation of the state, the applied reward function, and the action space of the agent. This cross-sectional perspective allows us to identify recurring design decisions as well as potential levers to improve the agent's performance. Finally, the individual strengths and weaknesses of each approach are discussed, and directions for future research are pointed out.
    Keywords: financial markets,reinforcement learning,survey,trading systems,machine learning
    Date: 2018
  5. By: Fritsch, Manuel; Goecke, Henry; Kulpa, Andreas
    Abstract: [Introduction] This final project report summarizes the results of the joint project of DATAlovers AG, the Institut der deutschen Wirtschaft and IW Consult. The consortium came together to evaluate to what extent machine learning approaches, combined with the content of company websites as the primary source of information, can be applied in scientific research. The basic task of the project is to clarify whether the combination of the new machine learning methods and the texts of websites meets scientific standards when identifying company twins. The aim is to transfer validated information on a comparatively small number of companies to the entire population of companies. If the approach succeeds, this would mean that results for all German companies can be obtained with a low-cost method. This “quasi-census” would open up many further applications for a research institute. The division of tasks in this project is as follows: the Institut der deutschen Wirtschaft and IW Consult supply the original information on the companies and quantify the results. DATAlovers contributes, as original data, the website texts of all German companies, trains an algorithm on the entire data set, and determines the algorithm's predictions. In general, the task described is a classification problem: using a large amount of data, it is to be decided for each German company whether or not it belongs to a specific group. Machine learning methods lend themselves to questions of this kind. Since the target variables (the respective groups) are known, the class of supervised machine learning (as opposed to unsupervised machine learning) is to be applied. This class includes, for example, logistic regression, random forest, support vector machine and decision tree methods (for more detail see Brownlee, 2016; Provost/Fawcett, 2013). These approaches are increasingly used in current research and in statistics. For example, Feuerhake/Dumpert (2016) use the methods to classify companies for the German skilled crafts statistics. Dumpert et al. (2016) use these approaches to sort companies into the so-called third enterprise sector, Finke et al. (2017) assign motherhood status to women, and Dumpert/Beck (2017) use these methods to classify the nationality of persons. In addition, machine learning algorithms are currently used to link data sets with one another (e.g. Schild et al., 2017). In terms of method, this project can thus be placed among the current approaches in official statistics.
    Date: 2018
  6. By: Nipun Bansal (Delhi Technological University); Mukul Sachdeva (DTU); Tanisha Mittal (DTU)
    Abstract: CAPTCHAs are computer-generated tests in the form of images, audio and object recognition that humans can solve easily but computer systems cannot. Internet sites present users with CAPTCHAs to set human users apart from automated computer programs, often referred to as bots. Their purpose is to obstruct attackers from performing automatic registration, online polling and other such actions. IRCTC, the website for reserving tickets on Indian Railways, one of the biggest railway networks, also employs both image and audio CAPTCHAs for security purposes. However, the audio CAPTCHAs used on the website are not effective in distinguishing between humans and bots. Most of the visual CAPTCHAs and some audio CAPTCHAs on different websites have been cracked using various machine learning methods, and we propose a similar idea to examine the security of audio CAPTCHAs on the IRCTC website. In this paper, we show that our bot is able to break the IRCTC audio CAPTCHAs with a success rate of 98%, 96.04% and 80.3% using three different models. Along with breaking the CAPTCHA, another Python script written by us was able to automate the process of ticket booking. Thus, combining all of it into a single package could result in a system which would log in and reserve tickets with a single click. Travel brokers can easily use such a system for easy and fast booking of tatkal tickets, which would lead to commercializing this activity for deriving huge profit from needy travelers.
    Keywords: Audio Captchas, Automatic Speech Recognition, IRCTC, Security, MFCC, Deep Learning
    JEL: L86 C80 D85
    Date: 2018–07
  7. By: Tony ShunTe Yuo
    Abstract: Tourism has become one of the major strategies for urban or city authorities to generate competitiveness. The benefits include increasing incoming tourists, conference and convention business opportunities, foreign direct investments, and the establishment of operational sites or even headquarters. For local citizens, successful tourism could enhance job opportunities and innovation and will normally induce higher infrastructure quality. Under the current trend of IoT and big data applications, smart tourism is meant to be one of the crucial movements of this century. This research established a spatial database of tourism resources of Taiwan and its major competitors from open data sources to evaluate and reveal the gaps between users’ demands and current tourism data supply. This research focused on Taiwan and its surrounding competitive cities, and examined the methodologies and the proper open data environment. The results suggest that a more user-oriented open data platform should be considered and that 3D visualized measurements of spatial variety could help users to identify and make tourism decisions. The empirical results also reveal some interesting implications for the determinants of hotel room rates.
    Keywords: Agglomeration Economies; Data Mining; Spatial Analysis; Visualizing measurements
    JEL: R3
    Date: 2018–01–01
  8. By: Jeffrey Robertson
    Abstract: Digital diplomacy is the latest technological advance to push change in diplomatic practice. It relates to the application of digital technologies, including information and communication technologies, software engineering and big data, and artificial intelligence, to the practice of diplomacy. Positioned in the top ranks of connectivity, internet speed, smartphone ownership, and social media usage, South Korea should be a leader in the use of digital technologies in diplomatic practice. However, South Korea is not a leader; indeed, it has been left behind. I explore digital diplomacy as a “disruptive technology” and look at criteria for organizational adaptation. I then use these criteria to assess South Korea's adaptation and draw from these the specific policy challenges facing South Korea. To conclude, I propose four core criteria to aid digital diplomacy adaptation in South Korea and other similar states.
    Keywords: digital diplomacy, diplomacy, South Korea, foreign policy, foreign ministry
    Date: 2018–10–08
  9. By: Soyoung Park (Korea Institute of S&T Evaluation and Planning)
    Abstract: With the rapid development of AI (artificial intelligence) technology, new types of accidents that have not happened before are occurring. An autonomous vehicle causes a crash, and a guard robot attacks a child. However, most countries have not established a legal framework for coping with these accidents. If such a situation continues, the legal risk will increase, which will hinder the development and utilization of AI. So it is time for the government to consider legislation to prepare for the AI era. To build confidence in AI technology, it is necessary to construct a system for coexistence with existing systems. As a part of this preparation, this study analyzes the legal issues of AI disputes and derives the government's task in the field of science and technology legislation in preparation for the AI era.
    Keywords: Artificial Intelligence, legislation, liability system
    Date: 2018–07
  10. By: OECD
    Abstract: The Cancun Ministerial mandate on the Digital Economy (2016) highlighted the importance of developing Internet of Things (IoT) metrics to assess the effects of the IoT in different policy areas. Accordingly, this report reviews different definitions of IoT in view of establishing an operational definition for the CDEP work, and proposes a taxonomy for IoT measurement. The report also explores potential challenges for communication infrastructures due to the exponential growth of IoT devices through the application of connected and automated vehicles. This IoT application was chosen as the data transmission requirements of fully automated vehicles may have substantial implications for network infrastructure, and therefore may require prioritisation in terms of measurement.
    Date: 2018–10–23
  11. By: Michael Mayer; Steven C. Bourassa; Martin Hoesli; Donato Scognamiglio
    Abstract: We use a rich data set consisting of 123,000 houses sold in Switzerland between 2004 and 2017 to investigate different automated valuation techniques in settings where the models are updated regularly. We apply six methods (linear regression, robust regression, mixed effects regression, gradient boosting, random forests, and neural networks) to both moving window and extending window models. With respect to the criteria of appraisal accuracy and stability, the preferred methods are robust regression using moving windows, gradient boosting using extending windows, or mixed effects regression for either strategy.
    Keywords: automated valuation; Machine Learning; Statistics
    JEL: R3
    Date: 2018–01–01
  12. By: Fan Yang and Jian Xu
    Abstract: Many cities around the world are increasingly embedding technological infrastructure in urban spaces. These infrastructures aim to collect vast amounts of data from citizens with an apparent purpose of improving public services. This article discusses privacy concerns generated by China's nationwide smart city campaign and further investigates why China's latest Cybersecurity Law is not adequate to address the risks to citizens' privacy. We argue that there is no functional privacy law in China that would apply to most data collected by smart city infrastructure; nor is there any law that would protect any personal data collected under this framework. We therefore propose practical suggestions to better protect citizens' data in China's ongoing smart city campaign.
    Keywords: big data, China, Cybersecurity Law, privacy, smart cities
    Date: 2018–10–05
  13. By: Marcello Pericoli (Bank of Italy); Marco Taboga (Bank of Italy)
    Abstract: We propose a general method for the Bayesian estimation of nonlinear no-arbitrage term structure models. The main innovations we introduce are: 1) a computationally efficient method, based on deep learning techniques, for approximating no-arbitrage model-implied bond yields to any desired degree of accuracy; and 2) computational graph optimizations for accelerating the MCMC sampling of the model parameters and of the unobservable state variables that drive the short-term interest rate. We apply the proposed techniques for estimating a shadow rate model with a time-varying lower bound, in which the shadow rate can be driven by both spanned unobservable factors and unspanned macroeconomic factors.
    Keywords: yield curve, shadow rate, deep learning, artificial intelligence
    JEL: C32 E43 G12
    Date: 2018–09
  14. By: Beliz, Gustavo; Basco, Ana Inés; de Azevedo, Belisario
    Abstract: In this paper the authors propose that G20 countries endorse and facilitate the creation of a T20 digital platform for "Accelerating the Jobs of the Future". In a world driven by a new wave of technological change, the platform would revalue the role of think tanks, research institutions and knowledge hubs to move the global agenda forward on an issue of central importance for the future of society: the creation of the jobs of the future. Building on and complementing existing experiences, the T20 platform would be a digital hub for producing knowledge, informing policies and connecting potential partners to accelerate the jobs of the future, within the context of an increasingly integrated global economy. It would also contribute to the development of consensual views among the research community, making it possible to discard extreme visions about the jobs of the future, dispelling both overly optimistic visions with no evidence base and unwarranted fears.
    Keywords: employment,future,digitalization,technology,industry 4.0,artificial intelligence,skills,inequality,gender gap,G20
    JEL: J01 E24 O30
    Date: 2018
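
Editor's illustration for item 11: the abstract compares "moving window" and "extending window" strategies for models that are updated regularly. The sketch below shows one plausible reading of that distinction, in which a moving window trains on a fixed number of the most recent periods and an extending window trains on all periods seen so far. The function name window_splits, the three-year window and the use of calendar years as periods are assumptions made for illustration; they are not taken from the paper.

```python
from typing import Iterator, List, Tuple


def window_splits(
    periods: List[int], window: int, extending: bool
) -> Iterator[Tuple[List[int], int]]:
    """Yield (training_periods, test_period) pairs in chronological order.

    Hypothetical helper, not the authors' code.
    Moving window: train on the `window` periods just before the test period.
    Extending window: train on every period before the test period.
    """
    for i in range(window, len(periods)):
        start = 0 if extending else i - window
        yield periods[start:i], periods[i]


# Sale years covered by the data set described in the abstract.
years = list(range(2004, 2018))

moving = list(window_splits(years, window=3, extending=False))
growing = list(window_splits(years, window=3, extending=True))

for train, test in moving[:2]:
    print("moving   ", train, "->", test)
for train, test in growing[:2]:
    print("extending", train, "->", test)
```

Under this reading, the moving-window training set always has the same length, while the extending-window training set grows by one period at each update.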

This nep-big issue is ©2018 by Tom Coupé. It is provided as is without any express or implied warranty. It may be freely redistributed in whole or in part for any purpose. If distributed in part, please include this notice.
General information on the NEP project can be found on the NEP website. For comments, please write to the director of NEP, Marco Novarese. Put “NEP” in the subject; otherwise your mail may be rejected.
NEP’s infrastructure is sponsored by the School of Economics and Finance of Massey University in New Zealand.