nep-big New Economics Papers
on Big Data
Issue of 2018‒07‒16
fifteen papers chosen by
Tom Coupé
University of Canterbury

  1. ECB vs Bundesbank: Diverging Tones and Policy Effectiveness By Peter Tillmann; Andreas Walter
  2. Non-linear Time Series and Artificial Neural Networks of Red Hat Volatility By José Igor Morlanes
  3. Landmines and Spatial Development By Chiovelli, Giorgio; Michalopoulos, Stelios; Papaioannou, Elias
  4. Financial Risk and Returns Prediction with Modular Networked Learning By Carlos Pedro Gonçalves
  5. Analyzing Business Conditions by Quantitative Text Analysis – Time Series Analysis Using Appearance Rate and Principal Component By Nariyasu YAMAZAWA
  6. Machine Learning for Yield Curve Feature Extraction: Application to Illiquid Corporate Bonds (Preliminary Draft) By Greg Kirczenow; Ali Fathi; Matt Davison
  7. A hybrid econometric-machine learning approach for relative importance analysis: Food inflation By Akash Malhotra
  8. Classifying occupations using web-based job advertisements: an application to STEM and creative occupations By Antonio Lima; Hasan Bakhshi
  9. Orthogonal Random Forest for Heterogeneous Treatment Effect Estimation By Miruna Oprescu; Vasilis Syrgkanis; Zhiwei Steven Wu
  10. China's digital transformation. Why is artificial intelligence a priority for Chinese R&D? By Guilhem Fabre
  11. Two Examples of Convex-Programming-Based High-Dimensional Econometric Estimators By Zhan Gao; Zhentao Shi
  12. Secure personal data administration in the social networks: the case of voluntary sharing of personal data on Facebook By Tadas Limba; Aurimas Šidlauskas
  13. Mapping the Geometry of Law using Document Embeddings By Ash, Elliott; Chen, Daniel L.
  14. Mapping the Geometry of Law using Document Embeddings By Ash, Elliott; Chen, Daniel L.
  15. The Nature of Firm Growth By Benjamin W. Pugsley; Peter Sedlacek; Vincent Sterk

  1. By: Peter Tillmann (Justus-Liebig-University Giessen); Andreas Walter (Justus-Liebig-University Giessen)
    Abstract: The present paper studies the consequences of conflicting narratives for the transmission of monetary policy shocks. We focus on conflict between the presidents of the ECB and the Bundesbank, the main protagonists of monetary policy in the euro area, who often disagreed on policy over the past two decades. This conflict received much attention on financial markets. We use over 900 speeches of both institutions' presidents since 1999 and quantify the tone conveyed in speeches and the divergence of tone between the two presidents. We find (i) a drop towards more negative tone in 2009 for both institutions and (ii) a large divergence of tone after 2009. ECB communication becomes persistently more optimistic and less uncertain than the Bundesbank's after 2009, and this gap widens after the SMP, OMT and APP announcements. We show that long-term interest rates respond less strongly to a monetary policy shock if ECB-Bundesbank communication is more cacophonous than on average, in which case the ECB loses its ability to drive the slope of the yield curve. The weaker transmission under high divergence reflects a muted adjustment of the expectations component of long-term rates.
    Keywords: Central bank communication, diverging tones, speeches, text analysis, monetary transmission
    JEL: E52 E43 E32
    Date: 2018
    URL: http://d.repec.org/n?u=RePEc:mar:magkse:201820&r=big
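    A minimal Python sketch of the kind of dictionary-based tone measure this abstract describes; the word lists below are illustrative stand-ins, not the authors' dictionaries:
      import re

      POSITIVE = {"growth", "improve", "strong", "stable", "confidence"}  # hypothetical lexicon
      NEGATIVE = {"risk", "crisis", "weak", "decline", "uncertainty"}     # hypothetical lexicon

      def tone(speech: str) -> float:
          """Net tone: (positive count - negative count) / total words."""
          words = re.findall(r"[a-z]+", speech.lower())
          pos = sum(w in POSITIVE for w in words)
          neg = sum(w in NEGATIVE for w in words)
          return (pos - neg) / max(len(words), 1)

      # Divergence for a given period: absolute gap between the two institutions' tones.
      ecb = tone("growth remains strong and confidence is improving")
      buba = tone("uncertainty and crisis risks weigh on the weak outlook")
      print(abs(ecb - buba))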
  2. By: José Igor Morlanes
    Abstract: We extend the empirical results published in article "Empirical Evidence on Arbitrage by Changing the Stock Exchange" by means of machine learning and advanced econometric methodologies based on Smooth Transition Regression models and Artificial Neural Networks.
    Date: 2018–06
    URL: http://d.repec.org/n?u=RePEc:arx:papers:1806.01070&r=big
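    For reference, a logistic smooth transition regression (LSTR) in its standard form is (the abstract does not state the exact specification used):

      y_t = \phi' x_t + \theta' x_t \, G(s_t; \gamma, c) + \varepsilon_t,
      G(s_t; \gamma, c) = \left( 1 + \exp\{ -\gamma (s_t - c) \} \right)^{-1},

    where s_t is the transition variable, \gamma governs the smoothness of the regime change and c is the threshold; as \gamma \to \infty the model approaches a two-regime threshold regression.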
  3. By: Chiovelli, Giorgio; Michalopoulos, Stelios; Papaioannou, Elias
    Abstract: Landmine contamination affects the lives of millions in many conflict-ridden countries long after the cessation of hostilities. Yet, little research exists on its impact on post-conflict recovery. In this study, we explore the economic consequences of landmine clearance in Mozambique, the only country that has moved from "heavily-contaminated" in 1992 to "mine-free" status in 2015. First, we compile a dataset detailing the evolution of clearance, collecting thousands of reports from the numerous demining actors. Second, we exploit the timing of demining to assess its impact on local economic activity, as reflected in satellite images of light density at night. The analysis reveals a moderate positive association that masks sizeable heterogeneity. Economic activity responds strongly to clearance of the transportation network, trade hubs, and more populous areas, while the demining-development association is weak in rural areas of low population density. Third, recognizing that landmine removal reconfigured the accessibility to the transportation infrastructure, we apply a "market-access" approach to quantify both its direct and indirect effects. The market-access estimates reveal substantial improvements in aggregate economic activity. The market-access benefits of demining are also present in localities without any contamination. Fourth, counterfactual policy simulations project considerable gains had the fragmented process of clearance in Mozambique been centrally coordinated, prioritizing clearance of the colonial transportation routes.
    Keywords: civil war; infrastructure network; landmines; post-conflict recovery; trade
    Date: 2018–06
    URL: http://d.repec.org/n?u=RePEc:cpr:ceprdp:13021&r=big
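    The "market-access" measure commonly used in this literature (e.g. Donaldson and Hornbeck, 2016) takes the form

      MA_i = \sum_{j \neq i} \tau_{ij}^{-\theta} \, Y_j,

    where \tau_{ij} is the travel cost between locations i and j, \theta is the trade elasticity and Y_j measures economic activity at the destination. Demining lowers \tau_{ij}, which raises MA_i even for locations that were never contaminated, consistent with the indirect effects the abstract reports.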
  4. By: Carlos Pedro Gonçalves
    Abstract: An artificial agent for financial risk and returns prediction is built with a modular cognitive system comprised of interconnected recurrent neural networks, such that the agent learns to predict financial returns and learns to predict the squared deviation around these predicted returns. These two expectations are used to build a volatility-sensitive interval prediction for financial returns, which is evaluated on three major financial indices and shown to achieve a success rate above 80% in interval prediction in both training and testing, calling the Efficient Market Hypothesis into question. The agent is introduced as an example of a class of artificially intelligent systems equipped with a Modular Networked Learning cognitive system, defined as an integrated networked system of machine learning modules, where each module constitutes a functional unit that is trained for a given specific task that solves a subproblem of a complex main problem expressed as a network of linked subproblems. In the case of neural networks, these systems function as a form of "artificial brain", where each module is like a specialized brain region comprised of a neural network with a specific architecture.
    Date: 2018–06
    URL: http://d.repec.org/n?u=RePEc:arx:papers:1806.05876&r=big
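    A minimal sketch of the two-module expectation scheme described above, with feed-forward regressors standing in for the paper's recurrent modules and a one-sigma band standing in for its (unspecified) interval rule:
      import numpy as np
      from sklearn.neural_network import MLPRegressor

      rng = np.random.default_rng(0)
      r = rng.standard_normal(1000) * 0.01                   # synthetic daily returns
      X = np.array([r[i:i + 5] for i in range(995)])         # 5-lag feature windows
      y = r[5:]                                              # next-day return

      mean_net = MLPRegressor(hidden_layer_sizes=(16,), max_iter=2000, random_state=0)
      mu = mean_net.fit(X, y).predict(X)                     # module 1: expected return

      var_net = MLPRegressor(hidden_layer_sizes=(16,), max_iter=2000, random_state=0)
      var_net.fit(X, (y - mu) ** 2)                          # module 2: squared deviation
      sigma = np.sqrt(np.clip(var_net.predict(X), 0.0, None))

      covered = np.mean((y >= mu - sigma) & (y <= mu + sigma))
      print(f"in-sample interval coverage: {covered:.2%}")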
  5. By: Nariyasu YAMAZAWA
    Abstract: We present a procedure for analyzing current business conditions and forecasting the GDP growth rate by quantitative text analysis. We use text data from the Economy Watchers Survey conducted by the Cabinet Office. We extract words from 190 thousand sentences and construct time series data by counting each word's appearance rate every month. The analysis consists of four parts: (1) visualizing appearance rates by drawing graphs, (2) correlation analysis, (3) principal component analysis, and (4) forecasting the GDP growth rate. First, we draw graphs of the appearance rates of words that are influenced by business conditions. We find that the graphs clearly show the effect of policy on business conditions. Second, by computing correlation coefficients, we construct lists of words that correlate with business conditions, as well as lists of words that correlate negatively with them. Third, we extract principal components from the 150 most frequent words. We find that the first principal component moves together with business conditions. Finally, we forecast the quarterly real GDP growth rate using the text data. We find that forecast accuracy improves when the text data are added, which shows that text data contain useful information for GDP forecasting.
    Date: 2018–03
    URL: http://d.repec.org/n?u=RePEc:esj:esridp:345&r=big
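    A toy Python sketch of the appearance-rate and principal-component steps; the monthly "survey text" below is a synthetic placeholder for the Economy Watchers Survey:
      import numpy as np
      from sklearn.decomposition import PCA

      docs = {  # hypothetical survey comments pooled by month
          "2017-01": "sales decline decline weak demand",
          "2017-02": "sales recover demand improve",
          "2017-03": "strong sales strong demand improve",
      }
      vocab = ["sales", "decline", "weak", "recover", "improve", "strong", "demand"]

      # Appearance rate: share of a month's tokens accounted for by each word.
      rates = np.array([
          [doc.split().count(w) / len(doc.split()) for w in vocab]
          for doc in docs.values()
      ])

      pc1 = PCA(n_components=1).fit_transform(rates).ravel()
      print(pc1)  # candidate business-conditions indicator for a GDP forecasting equation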
  6. By: Greg Kirczenow; Ali Fathi; Matt Davison
    Abstract: This paper studies the application of machine learning to extracting market-implied features from historical risk-neutral corporate bond yields. We consider the example of a hypothetical illiquid fixed income market. After choosing a surrogate liquid market, we apply the denoising autoencoder algorithm from the field of computer vision and pattern recognition to learn the features of the missing yield parameters from the historically implied data of the instruments traded in the chosen liquid market. Finally, the performance of the trained machine learning algorithm is compared with that of a point-in-time, two-dimensional interpolation algorithm known as the thin plate spline.
    Date: 2018–06
    URL: http://d.repec.org/n?u=RePEc:arx:papers:1806.01731&r=big
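    A minimal sketch of the thin-plate-spline benchmark on synthetic data (the denoising autoencoder comparator is not reproduced here; the rating scale is invented):
      import numpy as np
      from scipy.interpolate import Rbf

      tenor = np.array([1.0, 2.0, 5.0, 10.0, 1.0, 5.0, 10.0])           # years
      rating = np.array([1.0, 1.0, 1.0, 1.0, 3.0, 3.0, 3.0])            # hypothetical rating score
      yld = np.array([0.020, 0.022, 0.027, 0.031, 0.030, 0.036, 0.041]) # observed liquid yields

      tps = Rbf(tenor, rating, yld, function="thin_plate")   # point-in-time 2-D interpolation
      print(tps(7.0, 2.0))  # implied yield at an unobserved (tenor, rating) point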
  7. By: Akash Malhotra
    Abstract: A measure of the relative importance of variables is often desired by researchers when the explanatory aspects of econometric methods are of interest. To this end, the author briefly reviews the limitations of conventional econometrics in constructing a reliable measure of variable importance. The author highlights the relative stature of explanatory and predictive analysis in economics and the emergence of fruitful collaborations between econometrics and computer science. Learning lessons from both, the author proposes a hybrid approach based on conventional econometrics and advanced machine learning (ML) algorithms of the kind otherwise used in predictive analytics. The purpose of this article is two-fold: to propose a hybrid approach to assessing relative importance and demonstrate its applicability in addressing policy priority issues, with the example of food inflation in India; and, more broadly, to introduce the possibility of combining ML and conventional econometrics to an audience of researchers in economics and the social sciences in general.
    Date: 2018–06
    URL: http://d.repec.org/n?u=RePEc:arx:papers:1806.04517&r=big
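    One ML ingredient such a hybrid approach can draw on is permutation importance from a tree ensemble, sketched here on synthetic data standing in for food-inflation drivers:
      import numpy as np
      from sklearn.ensemble import RandomForestRegressor
      from sklearn.inspection import permutation_importance

      rng = np.random.default_rng(1)
      X = rng.standard_normal((300, 3))     # e.g. fuel cost, rainfall, rural wages (hypothetical)
      y = 1.5 * X[:, 0] + 0.3 * X[:, 1] + rng.standard_normal(300) * 0.1

      rf = RandomForestRegressor(n_estimators=200, random_state=1).fit(X, y)
      imp = permutation_importance(rf, X, y, n_repeats=10, random_state=1)
      print(imp.importances_mean)           # relative importance of each driver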
  8. By: Antonio Lima; Hasan Bakhshi
    Abstract: Rapid technological, social and economic change is having significant impacts on the nature of jobs. In fast-changing environments it is crucial that policymakers have a clear and timely picture of the labour market. Policymakers use standardised occupational classifications, such as the Office for National Statistics’ Standard Occupational Classification (SOC) in the UK, to analyse the labour market. These permit the occupational composition of the workforce to be tracked on a consistent and transparent basis over time and across industrial sectors. However, such systems are by their nature costly to maintain, slow to adapt and not very flexible. For that reason, additional tools are needed. At the same time, policymakers around the world are revisiting how active skills development policies can be used to equip workers with the capabilities needed to meet the new labour market realities. There is in parallel a desire for a more granular understanding of which skills combinations are required in occupations, in part so that policymakers are better sighted on how individuals can redeploy these skills as and when employer demands change further. In this paper, we investigate the possibility of complementing traditional occupational classifications with more flexible methods centred around employers’ characterisations of the skills and knowledge requirements of occupations as presented in job advertisements. We use data science methods to classify job advertisements as STEM or non-STEM (Science, Technology, Engineering and Mathematics) and creative or non-creative, based on the content of ads in a database of UK job ads posted online belonging to the Boston-based job market analytics company Burning Glass Technologies. In doing so, we first characterise each SOC code in terms of its skill make-up; this step allows us to describe each SOC skillset as a mathematical object that can be compared with other skillsets. Then we develop a classifier that predicts the SOC code of a job based on its required skills. Finally, we develop two classifiers that decide whether a job vacancy is STEM/non-STEM and creative/non-creative, based again on its skill requirements.
    Keywords: labour demand, occupational classification, online job adverts, big data, machine learning, STEM, STEAM, creative economy
    JEL: C18 J23 J24
    Date: 2018–07
    URL: http://d.repec.org/n?u=RePEc:nsr:escoed:escoe-dp-2018-07&r=big
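    A minimal sketch of a skills-based STEM classifier in the spirit of the paper; the ads and labels are invented examples, not Burning Glass data:
      from sklearn.feature_extraction.text import TfidfVectorizer
      from sklearn.linear_model import LogisticRegression
      from sklearn.pipeline import make_pipeline

      ads = [
          "python machine learning statistics sql",       # STEM
          "laboratory chemistry data analysis",           # STEM
          "copywriting social media brand storytelling",  # non-STEM
          "customer service scheduling telephone",        # non-STEM
      ]
      is_stem = [1, 1, 0, 0]

      clf = make_pipeline(TfidfVectorizer(), LogisticRegression())
      clf.fit(ads, is_stem)
      print(clf.predict(["statistics and sql reporting"]))  # likely [1]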
  9. By: Miruna Oprescu; Vasilis Syrgkanis; Zhiwei Steven Wu
    Abstract: We study the problem of estimating heterogeneous treatment effects from observational data, where the treatment policy on the collected data was determined by potentially many confounding observable variables. We propose the orthogonal random forest, an algorithm that combines orthogonalization, a technique that effectively removes the confounding effect in two-stage estimation, with generalized random forests [Athey et al., 2017], a flexible method for estimating treatment effect heterogeneity. We prove a consistency rate result of our estimator in the partially linear regression model, and en route we provide a consistency analysis for a general framework of performing generalized method of moments (GMM) estimation. We also provide a comprehensive empirical evaluation of our algorithms, and show that they consistently outperform baseline approaches.
    Date: 2018–06
    URL: http://d.repec.org/n?u=RePEc:arx:papers:1806.03467&r=big
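    A sketch of the orthogonalization step the estimator builds on, in its simplest cross-fitted residual-on-residual form (the paper localizes this with forest weights to recover heterogeneous effects):
      import numpy as np
      from sklearn.ensemble import RandomForestRegressor
      from sklearn.model_selection import cross_val_predict

      rng = np.random.default_rng(2)
      W = rng.standard_normal((2000, 5))                  # observed confounders
      T = W[:, 0] + 0.5 * rng.standard_normal(2000)       # treatment driven by W
      Y = 2.0 * T + W[:, 0] + rng.standard_normal(2000)   # true effect = 2

      # Stage 1: partial out the confounders from treatment and outcome.
      T_res = T - cross_val_predict(RandomForestRegressor(random_state=2), W, T, cv=5)
      Y_res = Y - cross_val_predict(RandomForestRegressor(random_state=2), W, Y, cv=5)

      # Stage 2: residual-on-residual regression recovers the effect.
      print((T_res @ Y_res) / (T_res @ T_res))  # approximately 2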
  10. By: Guilhem Fabre (EHESS - Ecole des Hautes Etudes en Sciences Sociales)
    Date: 2018–06–19
    URL: http://d.repec.org/n?u=RePEc:hal:wpaper:halshs-01818508&r=big
  11. By: Zhan Gao; Zhentao Shi
    Abstract: Economists specify high-dimensional models to address heterogeneity in empirical studies with complex big data. Estimation of these models calls for optimization techniques to handle a large number of parameters. Convex problems can be effectively executed in modern statistical programming languages. We complement Koenker and Mizera (2014)'s work on the numerical implementation of convex optimization, with a focus on high-dimensional econometric estimators. In particular, we replicate the simulation exercises in Su, Shi, and Phillips (2016) and Shi (2016) to show the robust performance of convex optimization across platforms. Combining R and the convex solver MOSEK achieves faster speed and accuracy equivalent to the original papers. The convenience and reliability of convex optimization in R make it easy to turn new ideas into prototypes.
    Date: 2018–06
    URL: http://d.repec.org/n?u=RePEc:arx:papers:1806.10423&r=big
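    The papers work in R with MOSEK; the same declarative style carries over to Python via cvxpy, sketched here for a Lasso with more parameters than observations:
      import numpy as np
      import cvxpy as cp

      rng = np.random.default_rng(3)
      n, p = 100, 200                                  # high-dimensional: p > n
      X = rng.standard_normal((n, p))
      beta = np.zeros(p)
      beta[:3] = [2.0, -1.0, 0.5]                      # sparse truth
      y = X @ beta + 0.1 * rng.standard_normal(n)

      b = cp.Variable(p)
      lam = 0.1
      cp.Problem(cp.Minimize(cp.sum_squares(y - X @ b) / n + lam * cp.norm1(b))).solve()
      print(np.round(b.value[:5], 2))                  # nonzero coefficients recovered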
  12. By: Tadas Limba (Mykolas Romeris University); Aurimas Šidlauskas (Mykolas Romeris University)
    Abstract: In view of the changes taking place in society, social progress and the achievements of science and technology, the protection of fundamental rights must be strengthened. The aim of the article is to analyse the principles and peculiarities of the safe management of personal data in social networks. In this scientific article, methods of document analysis, scientific literature review, case study and generalization are used. Consumers themselves decide how much and what kind of information to publicize on the Facebook social network. In order to use third-party applications, users must confirm at the time of authorization that they agree to give access to their personal data; otherwise the service will not be provided. Personal data of the Facebook user comprise his/her public profile, including the user's photo, age, gender, and other public information; a list of friends; e-mail address; time zone records; birthday; photos; hobbies, etc. Which personal data will be requested from the user depends on the third-party application. Analysis of the legal protection of personal data in online social networks reveals that it is limited to the international and European Union legal regulation on the protection of personal data in online social networks. Users who make publicly available a large amount of personal information on the Facebook social network should decide whether they want to share that information with third parties for the use of their services (applications). This article presents a model for user and third-party application interaction, and an analysis of risks and recommendations to ensure the security of the user's personal data.
    Keywords: security of the data, social network, personal data, third-party applications
    Date: 2018–03–30
    URL: http://d.repec.org/n?u=RePEc:hal:journl:hal-01773973&r=big
  13. By: Ash, Elliott; Chen, Daniel L.
    Abstract: Recent work in natural language processing represents language objects (words and documents) as dense vectors that encode the relations between those objects. This paper explores the application of these methods to legal language, with the goal of understanding judicial reasoning and the relations between judges. In an application to federal appellate courts, we show that these vectors encode information that distinguishes courts, time, and legal topics. The vectors do not reveal spatial distinctions in terms of political party or law school attended, but they do highlight generational differences across judges. We conclude the paper by outlining a range of promising future applications of these methods.
    Date: 2018–07
    URL: http://d.repec.org/n?u=RePEc:tse:iastwp:32766&r=big
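    A toy sketch of the document-embedding step using gensim's Doc2Vec (the abstract does not name the embedding model; the two "opinions" are invented strings):
      from gensim.models.doc2vec import Doc2Vec, TaggedDocument

      corpus = [
          TaggedDocument("the statute of limitations bars the claim".split(), ["op1"]),
          TaggedDocument("the sentencing guidelines were misapplied".split(), ["op2"]),
      ]
      model = Doc2Vec(corpus, vector_size=50, min_count=1, epochs=40, seed=0)
      print(model.dv.similarity("op1", "op2"))  # distances like this underpin the court/time/topic comparisons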
  14. By: Ash, Elliott; Chen, Daniel L.
    Abstract: Recent work in natural language processing represents language objects (words and documents) as dense vectors that encode the relations between those objects. This paper explores the application of these methods to legal language, with the goal of understanding judicial reasoning and the relations between judges. In an application to federal appellate courts, we show that these vectors encode information that distinguishes courts, time, and legal topics. The vectors do not reveal spatial distinctions in terms of political party or law school attended, but they do highlight generational differences across judges. We conclude the paper by outlining a range of promising future applications of these methods.
    Date: 2018–07
    URL: http://d.repec.org/n?u=RePEc:tse:wpaper:32764&r=big
  15. By: Benjamin W. Pugsley; Peter Sedlacek; Vincent Sterk
    Abstract: Only half of all startups survive past the age of five, and surviving businesses grow at vastly different speeds. Using micro data on employment in the population of U.S. businesses, we estimate that the lion's share of these differences is driven by ex-ante heterogeneity across firms, rather than by ex-post shocks. We embed such heterogeneity in a firm dynamics model and study how ex-ante differences shape the distribution of firm size, "up-or-out" dynamics, and the associated gains in aggregate output. "Gazelles" - a small subset of startups with particularly high growth potential - emerge as key drivers of these outcomes. Analyzing changes in the distribution of ex-ante firm heterogeneity over time reveals that the birth rate and growth potential of gazelles have declined, creating substantial aggregate losses.
    Keywords: Firm Dynamics, Startups, Macroeconomics, Big Data
    JEL: D22 E23 E24
    Date: 2018–06
    URL: http://d.repec.org/n?u=RePEc:cen:wpaper:18-30&r=big

This nep-big issue is ©2018 by Tom Coupé. It is provided as is without any express or implied warranty. It may be freely redistributed in whole or in part for any purpose. If distributed in part, please include this notice.
General information on the NEP project can be found at http://nep.repec.org. For comments please write to the director of NEP, Marco Novarese at <director@nep.repec.org>. Put “NEP” in the subject, otherwise your mail may be rejected.
NEP’s infrastructure is sponsored by the School of Economics and Finance of Massey University in New Zealand.