nep-big 2018-03-12 papers

on Big Data

Issue of 2018–03–12
eleven papers chosen by
Tom Coupé, University of Canterbury

The Hidden Information Content: Evidence from the Tone of Independent Director Reports By Jiao Ji; Oleksandr Talavera; Shuxing Yin
Nowcasting economic activity with electronic payments data: A predictive modeling approach By Carlos León; Fabio Ortega
FISS - News-Based Indices on Country Fundamentals: Do They Help Explain Sovereign Credit Spread Fluctuations? By András Fülöp; Zalán Kocsis
Public Communication and Collusion in the Airline Industry By Aryal, Gaurab; Ciliberto, Federico; Leyden, Benjamin
A relative measure of urban sprawl for Italian municipalities using satellite Light Images By Bergantino, Angela Stefania; Di Liddo, Giuseppe; Porcelli, Francesco
Record Linkage in the Cape of Good Hope Panel By Rijpma, Auke; Cilliers, Jeanne; Fourie, Johan
Credit Risk Analysis using Machine and Deep learning models By Peter Martey Addo; Dominique Guegan; Bertrand Hassani
The Future of Retail Financial Services: What policy mix for a balanced digital transformation? By Bouyon, Sylvain
The Use of Individual Firm Databases to Respond to the Limits of Spatially Aggregated Databases: The Case of the Estimation of Regional Exports By Moritz Lennert
Big Data, Computational Science, Economics, Finance, Marketing, Management, and Psychology: Connections By Chang, C-L.; McAleer, M.J.; Wong, W.-K.
Pronósticos para la tasa de desempleo en Colombia a partir de Google Trends By Luisa Fernanda CARDONA ROJAS; Javier Andrés ROJAS AGUILERA

The Hidden Information Content: Evidence from the Tone of Independent Director Reports

By:	Jiao Ji (Management School, University of Sheffield); Oleksandr Talavera (School of Management, Swansea University); Shuxing Yin (Management School, University of Sheffield)
Abstract:	The paper investigates the link between the information content of independent directors’ reports (IDRs) and future firm performance. By conducting sentiment analysis of 23,984 IDRs of Chinese listed companies from 2004-2012, we find that the tone of IDRs is positively related to future firm performance. We also posit that the tone of IDRs and its association with firm performance depend on directors’ incentives to monitor. Our results suggest that independent directors with greater career concerns (i.e., young directors and experts in accounting or finance) are more critical in evaluating firm fundamentals and express a more negative tone in their reports. The relationship between the negative tone of IDRs and future firm performance is stronger for firms with greater monitoring needs. Overall, our evidence is consistent with the conjecture that career concerns motivate independent directors to disseminate information to external stakeholders.
Keywords:	Brexit, Text Analysis, Tone, Independent Director Report, Corporate Governance
JEL:	G30
Date:	2018–03–05
URL:	https://d.repec.org/n?u=RePEc:swn:wpaper:2018-28
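The abstract does not spell out how the tone of a report is scored. A common approach in text analysis is a dictionary-based net tone, (positive - negative) / (positive + negative) word counts. The Python sketch below illustrates only that general idea; the word lists `POSITIVE` and `NEGATIVE` are hypothetical placeholders, not the authors' dictionary.

```python
import re

# Hypothetical word lists for illustration only; the paper's actual
# sentiment dictionary is not described in the abstract.
POSITIVE = {"effective", "sound", "adequate", "compliant", "fair"}
NEGATIVE = {"risk", "concern", "deficient", "violation", "loss"}


def net_tone(text: str) -> float:
    """Return (pos - neg) / (pos + neg), or 0.0 if no dictionary words occur."""
    words = re.findall(r"[a-z]+", text.lower())
    pos = sum(word in POSITIVE for word in words)
    neg = sum(word in NEGATIVE for word in words)
    if pos + neg == 0:
        return 0.0
    return (pos - neg) / (pos + neg)


report = ("Internal controls are effective and adequate, "
          "but one related-party transaction caused a loss.")
print(net_tone(report))  # two positive words, one negative word
```

A score near -1 would flag a report dominated by negative terms, the kind of critical tone the abstract associates with directors facing greater career concerns.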

Nowcasting economic activity with electronic payments data: A predictive modeling approach

By:	Carlos León; Fabio Ortega (Banco de la República de Colombia; Banco de la República de Colombia)
Abstract:	Economic activity nowcasting (i.e. making current-period estimates) is convenient because most traditional measures of economic activity come with substantial lags. We aim at nowcasting ISE, a short-term economic activity indicator in Colombia. Inputs are ISE’s lags and a dataset of payments made with electronic transfers and cheques among individuals, firms, and the central government. Under a predictive modeling approach, we employ a nonlinear autoregressive exogenous (NARX) neural network model. Results suggest that our choice of inputs and predictive method enables us to nowcast economic activity with fair accuracy. We also validate that electronic payments data significantly reduce the nowcast error of a benchmark nonlinear autoregressive neural network model. Nowcasting economic activity from electronic payment instruments data not only contributes to agents’ decision making and economic modeling, but also opens new research paths on how to use retail payments data to augment current models.
Keywords:	forecasting, machine learning, neural networks, retail payments, NARX.
JEL:	C45 C53 E27
Date:	2018–02
URL:	https://d.repec.org/n?u=RePEc:bdr:borrec:1037
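A NARX network maps lagged values of the target series, together with current and lagged values of exogenous inputs, to the current value of the target. The Python sketch below shows only the input-construction step; the neural network itself, the lag orders, and the toy series are assumptions for illustration rather than the authors' specification.

```python
from typing import List, Sequence, Tuple


def narx_design(y: Sequence[float], x: Sequence[float],
                y_lags: int, x_terms: int) -> Tuple[List[List[float]], List[float]]:
    """Build NARX-style input vectors and targets.

    Each input row holds y[t-1], ..., y[t-y_lags] followed by
    x[t], x[t-1], ..., x[t-x_terms+1]; the target is y[t].
    Including x[t] reflects a nowcasting setting, where current-period
    payments data are assumed to be available before the indicator is published.
    """
    if len(y) != len(x):
        raise ValueError("y and x must have the same length")
    start = max(y_lags, x_terms - 1)
    inputs, targets = [], []
    for t in range(start, len(y)):
        past_y = [y[t - k] for k in range(1, y_lags + 1)]
        recent_x = [x[t - k] for k in range(x_terms)]
        inputs.append(past_y + recent_x)
        targets.append(y[t])
    return inputs, targets


activity = [1.0, 2.0, 3.0, 4.0, 5.0]       # hypothetical ISE values
payments = [10.0, 20.0, 30.0, 40.0, 50.0]  # hypothetical payments totals
X, target = narx_design(activity, payments, y_lags=2, x_terms=2)
print(X[0], target[0])  # [2.0, 1.0, 30.0, 20.0] 3.0
```

Setting `x_terms=0` drops the payments columns and leaves purely autoregressive inputs, which loosely parallels the benchmark comparison described in the abstract.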

FISS - News-Based Indices on Country Fundamentals: Do They Help Explain Sovereign Credit Spread Fluctuations?

By:	András Fülöp (ESSEC Business School); Zalán Kocsis (Magyar Nemzeti Bank (Central Bank of Hungary))
Abstract:	This paper revisits the discussion about the role that fundamentals play in asset prices using sovereign credit spread data. We augment the standard macroeconomic proxy set by text-based measures of country and global fundamentals from a database of Reuters news articles between 2007 and 2016. We use a novel methodology that matches fundamental topic expressions and directly links them to tonality and geography information within the text. Our approach resolves several problems of extant text mining methods. We verify that our news indices capture fundamental information within news articles and are uncorrelated with measures of liquidity and investor sentiment. These news indices explain a large part of sovereign credit spread changes not captured by traditional fundamental proxies and thus support a significantly larger role for fundamentals. This additional information derives primarily from omitted expectations and concerns about global fundamentals. We also show that a large part of the covariance between the VIX index and sovereign spreads is related to these global fundamentals.
Keywords:	Financial media, textual data, regular expressions, sovereign credit risk.
JEL:	C8 E44 F34 G1 H63
Date:	2018
URL:	https://d.repec.org/n?u=RePEc:mnb:wpaper:2018/1
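The abstract describes matching fundamental topic expressions and linking them directly to tonality and geography within the text. A minimal Python sketch of that kind of regular-expression tagging follows; the country list, topic phrases, and tone patterns are hypothetical and far cruder than the methodology in the paper.

```python
import re
from typing import List, Tuple

# Hypothetical patterns for illustration only.
COUNTRY = re.compile(r"\b(Hungary|Italy|Spain)\b")
TOPIC = re.compile(r"\b(budget deficit|public debt|inflation)\b", re.IGNORECASE)
NEGATIVE = re.compile(r"\b(widen\w*|worsen\w*|rising)\b", re.IGNORECASE)
POSITIVE = re.compile(r"\b(narrow\w*|improv\w*|falling)\b", re.IGNORECASE)


def tag_sentences(text: str) -> List[Tuple[str, str, int]]:
    """Return (country, topic, tone) for sentences naming both a country and a topic.

    Tone is the count of positive matches minus negative matches in the sentence.
    """
    results = []
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        countries = COUNTRY.findall(sentence)
        topics = TOPIC.findall(sentence)
        if not countries or not topics:
            continue  # require both a geography and a fundamental topic
        tone = len(POSITIVE.findall(sentence)) - len(NEGATIVE.findall(sentence))
        for country in countries:
            for topic in topics:
                results.append((country, topic.lower(), tone))
    return results


news = ("Hungary's budget deficit widened sharply. Markets were calm. "
        "Spain reported that inflation is falling.")
print(tag_sentences(news))
# [('Hungary', 'budget deficit', -1), ('Spain', 'inflation', 1)]
```

Aggregating such tagged scores by country and period would yield a news index; how the paper aggregates is not described here.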

Public Communication and Collusion in the Airline Industry

By:	Aryal, Gaurab; Ciliberto, Federico; Leyden, Benjamin
Abstract:	We investigate whether the top management of all legacy U.S. airlines used their quarterly earnings calls as a mode of communication with other airlines to coordinate output reduction (fewer passenger seats) on competitive routes. We build an original and novel dataset on the public communication content of the earnings calls, and use Natural Language Processing techniques from computational linguistics to parse and code the text of airline executives' statements in those calls to measure communication. We then determine whether mentioning terms associated with "capacity discipline" is a way to sustain collusion on capacity. The estimates show that when all legacy carriers in a market communicate "capacity discipline," it leads to a substantial reduction in the number of seats offered in that market. We find that the effect is driven entirely by legacy carriers, and that the reduction is larger in smaller markets. Finally, we leverage our high-dimensional text data to develop novel approaches to implementing falsification tests and checking conditional exogeneity, and confirm that our finding (that legacy airlines use public communication regarding capacity discipline to collude) is not spurious.
Keywords:	Airlines; Capacity Discipline; Collusion; communication; Text Data
JEL:	D22 L12 L41 L68
Date:	2018–02
URL:	https://d.repec.org/n?u=RePEc:cpr:ceprdp:12730
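The market-level condition described in the abstract (all legacy carriers in a market communicating "capacity discipline") can be illustrated with a simple phrase count. The Python sketch below is hypothetical: the carrier labels, transcripts, and exact-phrase rule are assumptions, and the paper's NLP pipeline is more elaborate than a single regular expression.

```python
import re
from typing import Dict, Iterable, Set

# Matches "capacity discipline" with any whitespace between the words.
PHRASE = re.compile(r"\bcapacity\s+discipline\b", re.IGNORECASE)


def mentions(transcript: str) -> int:
    """Count occurrences of the phrase in one earnings-call transcript."""
    return len(PHRASE.findall(transcript))


def all_legacy_communicate(market_carriers: Iterable[str], legacy: Set[str],
                           transcripts: Dict[str, str]) -> int:
    """Return 1 if every legacy carrier serving the market mentioned the phrase,
    else 0 (also 0 if no legacy carrier serves the market)."""
    legacy_here = [c for c in market_carriers if c in legacy]
    if not legacy_here:
        return 0
    return int(all(mentions(transcripts.get(c, "")) > 0 for c in legacy_here))


# Hypothetical carriers and transcript snippets.
legacy = {"A", "B"}
calls = {
    "A": "We remain committed to capacity discipline.",
    "B": "Capacity  discipline matters; capacity discipline is key.",
    "C": "We plan to grow aggressively.",
}
print(all_legacy_communicate(["A", "B", "C"], legacy, calls))  # 1
```

An indicator of this kind, computed per market and quarter, could then enter a regression of seats offered, as the abstract's estimates suggest.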

A relative measure of urban sprawl for Italian municipalities using satellite Light Images

By:	Bergantino, Angela Stefania; Di Liddo, Giuseppe; Porcelli, Francesco
Abstract:	At the local level, the lower the urban density, the higher the per-capita length of collector roads and the area covered by buildings and infrastructure. It follows that the lower the urban density, the higher the municipal luminosity. For this reason, night-time light is often used to evaluate the degree of urbanization and urban sprawl in a given territory by means of specific indicators. However, to the best of our knowledge, these indicators are based on an absolute evaluation of urban sprawl, without taking into account the particular economic and demographic characteristics of urban centres. In this paper we propose a regression-based measure of urban sprawl “relative” to the economic activity and other socio-demographic characteristics of municipalities. We apply this methodology to the Italian context, considering all Italian municipalities in the 15 ordinary-statute regions over the period 2004-2012. The proposed measure therefore also accounts for a time dimension.
Date:	2018
URL:	https://d.repec.org/n?u=RePEc:sit:wpaper:18_3
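One way to read a regression-based "relative" sprawl measure is as the residual from regressing night-time luminosity on municipal characteristics: a municipality brighter than its characteristics predict would rank as relatively more sprawled. The Python sketch below uses a single regressor and made-up values; the paper's actual specification, with several economic and socio-demographic controls, is not reproduced here.

```python
from typing import List, Sequence


def relative_sprawl(light: Sequence[float], income: Sequence[float]) -> List[float]:
    """Residuals from an OLS fit light = alpha + beta * income.

    Positive values: brighter than predicted by economic activity
    (interpreted, under this sketch's assumption, as relatively more sprawl).
    """
    n = len(light)
    mx = sum(income) / n
    my = sum(light) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(income, light))
    sxx = sum((x - mx) ** 2 for x in income)
    if sxx == 0:
        raise ValueError("income must vary across municipalities")
    beta = sxy / sxx
    alpha = my - beta * mx
    return [y - (alpha + beta * x) for x, y in zip(income, light)]


# Hypothetical luminosity and income values for four municipalities.
residuals = relative_sprawl([2.0, 4.0, 7.0, 8.0], [1.0, 2.0, 3.0, 4.0])
print([round(r, 6) for r in residuals])  # [-0.1, -0.2, 0.7, -0.4]
```

Because OLS residuals sum to zero, the measure ranks municipalities against one another rather than on an absolute scale.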

Record Linkage in the Cape of Good Hope Panel

By:	Rijpma, Auke (Utrecht University); Cilliers, Jeanne (Department of Economic History, Lund University); Fourie, Johan (Stellenbosch University)
Abstract:	In this paper we describe the record linkage procedure to create a panel from Cape Colony census returns, or opgaafrolle, for 1787-1828, a dataset of 42,354 household-level observations. Based on a subset of manually linked records, we first evaluate statistical models and deterministic algorithms to best identify and match households over time. By using household-level characteristics in the linking process and near-annual data, we are able to create high-quality links for 84 percent of the dataset. We compare basic analyses on the linked panel dataset to the original cross-sectional data, evaluate the feasibility of the strategy when linking to supplementary sources, and discuss the scalability of our approach to the full Cape panel.
Keywords:	census; machine learning; micro-data; record linkage; panel data; South Africa
JEL:	C81 N01
Date:	2018–02–28
URL:	https://d.repec.org/n?u=RePEc:hhs:luekhi:0172
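The abstract compares statistical models and deterministic algorithms for matching households across census returns. As a rough illustration of the deterministic side, the Python sketch below scores candidate pairs by name similarity (using the standard library's `difflib.SequenceMatcher`) plus agreement on one household characteristic, then links pairs greedily one-to-one above a threshold. The field names, weights, threshold, and records are hypothetical, not the authors' rules or data.

```python
from difflib import SequenceMatcher
from typing import List, Tuple


def pair_score(a: dict, b: dict) -> float:
    """Weighted score: 0.8 * name similarity + 0.2 * (children differ by <= 1)."""
    name_sim = SequenceMatcher(None, a["name"].lower(), b["name"].lower()).ratio()
    kids_agree = 1.0 if abs(a["children"] - b["children"]) <= 1 else 0.0
    return 0.8 * name_sim + 0.2 * kids_agree


def link(earlier: List[dict], later: List[dict],
         threshold: float = 0.85) -> List[Tuple[int, int]]:
    """Greedy one-to-one linking: best-scoring pairs first, each record used once."""
    candidates = sorted(
        ((pair_score(a, b), i, j)
         for i, a in enumerate(earlier)
         for j, b in enumerate(later)),
        reverse=True,
    )
    used_earlier, used_later, links = set(), set(), []
    for score, i, j in candidates:
        if score < threshold:
            break  # remaining pairs score even lower
        if i in used_earlier or j in used_later:
            continue
        used_earlier.add(i)
        used_later.add(j)
        links.append((i, j))
    return links


# Hypothetical household records from two consecutive returns.
year_a = [{"name": "Jan Botha", "children": 2},
          {"name": "Pieter de Villiers", "children": 0}]
year_b = [{"name": "Pieter de Villiers", "children": 1},
          {"name": "Jan Both", "children": 3},
          {"name": "Anna Smit", "children": 2}]
print(link(year_a, year_b))  # [(1, 0), (0, 1)]
```

The spelling variant "Jan Both" still links because the name similarity is high and the number of children changed by only one between returns.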

Credit Risk Analysis using Machine and Deep learning models

By:	Peter Martey Addo (Data Scientist (Lead), Expert Synapses, SNCF Mobilite); Dominique Guegan (Université Paris1 Panthéon-Sorbonne, Centre d'Economie de la Sorbonne, LabEx ReFi and Ca' Foscari University of Venezia); Bertrand Hassani (VP, Chief Data Scientist, Capgemini Consulting and LabEx ReFi)
Abstract:	Owing to the technological advances associated with Big Data, data availability, and computing power, most banks and lending financial institutions are renewing their business models. Credit risk prediction, monitoring, model reliability, and effective loan processing are key to decision making and transparency. In this work, we build binary classifiers based on machine and deep learning models on real data to predict loan default probability. The top 10 most important features from these models are selected and then used in the modelling process to test the stability of the binary classifiers by comparing their performance on separate data. We observe that tree-based models are more stable than models based on multilayer artificial neural networks. This raises several questions about the intensive use of deep learning systems in enterprises.
Keywords:	Credit risk; Financial regulation; Data Science; Bigdata; Deep learning
JEL:	C02 C13 C19 G01 G21 G28 D81 G31
Date:	2018–02
URL:	https://d.repec.org/n?u=RePEc:mse:cesdoc:18003
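The stability check described in the abstract (keep the ten most important features, refit, and compare performance on separate data) can be sketched generically. The Python code below shows only two helper steps: ranking features by an importance score and computing a rank-based AUC on a held-out sample. The importance values, the use of AUC as the performance metric, and the toy data are assumptions; model training is omitted.

```python
from typing import Dict, List, Sequence


def top_k_features(importances: Dict[str, float], k: int = 10) -> List[str]:
    """Names of the k features with the largest importance scores."""
    ranked = sorted(importances.items(), key=lambda kv: kv[1], reverse=True)
    return [name for name, _ in ranked[:k]]


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Rank-based AUC: share of (default, non-default) pairs in which the
    defaulter gets the higher score; ties count as one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one default and one non-default")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical importance scores and held-out predictions.
importances = {"debt_ratio": 0.6, "income": 0.3, "age": 0.1}
print(top_k_features(importances, k=2))         # ['debt_ratio', 'income']
print(auc([1, 0, 1, 0], [0.9, 0.3, 0.4, 0.5]))  # 0.75
```

Comparing the AUC of a model refit on the selected features across two separate samples gives one simple version of a stability comparison; a small gap would indicate a more stable classifier.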

The Future of Retail Financial Services: What policy mix for a balanced digital transformation?

By:	Bouyon, Sylvain
Abstract:	In recent years, the digitalisation of retail financial services – retail payments, current/savings accounts, consumer/housing credit, car insurance, property insurance and health insurance – has accelerated significantly. While policy-makers are gradually creating the necessary conditions to strengthen this digital transformation, there remain numerous policy issues and unanswered questions to resolve. Against this background, CEPS-ECRI formed a Task Force to explore four specific core questions: What type of level playing field is needed to ensure a successful transition to the digital transformation? What are the opportunities and risks related to big (alternative) data and increasingly sophisticated algorithms? What kind of regulatory framework is the most appropriate for pre-contractual information duties in a digital era? How can the regulatory framework for digital authentication be improved?
Date:	2017–02
URL:	https://d.repec.org/n?u=RePEc:eps:cepswp:12265

The Use of Individual Firm Databases to Respond to the Limits of Spatially Aggregated Databases: The Case of the Estimation of Regional Exports

By:	Moritz Lennert
Abstract:	The thesis explores the opportunity of using micro-data, in the form of individual firm data, to overcome the limits imposed by the spatially aggregated data generally used in economic geography. The case study is the estimation of regional exports, including exports to other regions of the same country. Taking Belgium as an example, the thesis presents a new model for estimating these exports that integrates a gravity model estimating flows between places of production and places of consumption with the information contained in national input-output tables. The model's results confirm the initial hypothesis about the importance of consumption that is local or at short distance from the place of production. These results are analysed against the background of past and current debates in economic geography and in regional development policy in Europe. A critical look is taken at the notion of “place-based” policies, which generally focus on supply-side policies. Supported by the model's estimates, the argument is made that local demand plays an important role for regional economies. This argument is reinforced by a review of the debates on the importance of geographical distance in economic relations. The thesis also places strong emphasis on questions of methods and data. It presents in detail the difficulties related to the use of individual data, in particular the problem of geocoding. The use of existing geographic information systems in building the model is demonstrated, with the argument that such systems make life easier for researchers in economic geography whenever they use massive data positioned in real space.
The use of such data is also analysed in the context of the emergence of the “Big Data” movement, which raises questions about the current and future paradigms of research in economic geography.
Keywords:	regional exports; gravity model; micro-data; GIS methods
Date:	2018–02–23
URL:	https://d.repec.org/n?u=RePEc:ulb:ulbeco:2013/267467
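The abstract describes a gravity model of flows between places of production and places of consumption. A minimal production-constrained version allocates each origin's output across destinations in proportion to destination consumption times distance raised to a negative power. The Python sketch below assumes a distance-decay exponent `beta` and toy values; it omits the national input-output tables that the thesis combines with the gravity model.

```python
from typing import List, Sequence


def allocate_flows(production: Sequence[float], consumption: Sequence[float],
                   distance: Sequence[Sequence[float]], beta: float = 2.0
                   ) -> List[List[float]]:
    """Production-constrained gravity allocation (distances must be > 0).

    flows[i][j] = production[i] * w_ij / sum_k w_ik, with
    w_ij = consumption[j] * distance[i][j] ** (-beta).
    Each origin's output is fully distributed across destinations.
    """
    flows = []
    for i, output in enumerate(production):
        weights = [c * distance[i][j] ** (-beta) for j, c in enumerate(consumption)]
        total = sum(weights)
        flows.append([output * w / total for w in weights])
    return flows


def regional_exports(flows: List[List[float]]) -> List[float]:
    """Flows leaving each origin region i, i.e. to all destinations j != i."""
    return [sum(f for j, f in enumerate(row) if j != i) for i, row in enumerate(flows)]


# Hypothetical two-region example; the diagonal holds intra-region distances.
flows = allocate_flows([100.0, 60.0], [50.0, 50.0],
                       [[1.0, 2.0], [2.0, 1.0]], beta=2.0)
print(flows)                    # [[80.0, 20.0], [12.0, 48.0]]
print(regional_exports(flows))  # [20.0, 12.0]
```

With a positive `beta`, most output stays in or near its region of production, which is the pattern of local or short-distance consumption the thesis reports.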

Big Data, Computational Science, Economics, Finance, Marketing, Management, and Psychology: Connections

By:	Chang, C-L.; McAleer, M.J.; Wong, W.-K.
Abstract:	The paper provides a review of the literature that connects Big Data, Computational Science, Economics, Finance, Marketing, Management, and Psychology, and discusses some research that is related to the seven disciplines. Academics could develop theoretical models and subsequent econometric and statistical models to estimate the parameters in the associated models, as well as conduct simulation to examine whether the estimators in their theories on estimation and hypothesis testing have good size and high power. Thereafter, academics and practitioners could apply theory to analyse some interesting issues in the seven disciplines and cognate areas.
Keywords:	Big Data, Computational science, Economics, Finance, Management, Theoretical models, Econometric and statistical models, Applications
JEL:	A10 G00 G31 O32
Date:	2018–01–01
URL:	https://d.repec.org/n?u=RePEc:ems:eureir:104260

Pronósticos para la tasa de desempleo en Colombia a partir de Google Trends

By:	Luisa Fernanda CARDONA ROJAS; Javier Andrés ROJAS AGUILERA
Abstract:	This paper proposes using information from the Google Trends platform to improve short-term forecasts of the unemployment rate in Colombia. To this end, the Google Trends search terms most closely related to the unemployment rate are selected, following a criterion of simple correlation and mean coherence between the series. Once the terms are selected, simple linear regression models, autoregressive integrated moving average (ARIMA) models, and their version augmented with exogenous variables (ARIMAX) are estimated. The volume of queries is found to improve the fit of the models; in particular, searches for the terms “Trabajo” (work), “Ofertas de trabajo” (job offers), and “Busco trabajo” (looking for work) improve forecasts of labour market behaviour.
Keywords:	Unemployment, Google Trends, dynamic regressions, ARIMA
Date:	2017–12–19
URL:	https://d.repec.org/n?u=RePEc:col:000118:016050
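The term-selection step in the abstract relies partly on simple correlation between search volumes and the unemployment rate. The Python sketch below ranks candidate terms by absolute Pearson correlation and keeps those above a threshold; the threshold, term names, and series are hypothetical, and the mean-coherence criterion is not implemented.

```python
from math import sqrt
from typing import Dict, List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Sample Pearson correlation between two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("series must not be constant")
    return sxy / sqrt(sxx * syy)


def select_terms(unemployment: Sequence[float],
                 trends: Dict[str, Sequence[float]],
                 threshold: float = 0.7) -> List[str]:
    """Terms whose |correlation| with unemployment is at least `threshold`,
    strongest first."""
    r = {term: pearson(series, unemployment) for term, series in trends.items()}
    kept = [term for term, value in r.items() if abs(value) >= threshold]
    return sorted(kept, key=lambda term: abs(r[term]), reverse=True)


# Hypothetical monthly series (made-up numbers).
unemployment = [10.0, 11.0, 12.0, 11.5]
trends = {"trabajo": [40, 44, 48, 46], "unrelated_term": [70, 60, 75, 65]}
print(select_terms(unemployment, trends))  # ['trabajo']
```

The selected series would then enter an ARIMAX model as exogenous regressors; fitting that model requires a time-series library and is not shown.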

This nep-big issue is ©2018 by Tom Coupé. It is provided as is without any express or implied warranty. It may be freely redistributed in whole or in part for any purpose. If distributed in part, please include this notice.

General information on the NEP project can be found at https://nep.repec.org. For comments please write to the director of NEP, Marco Novarese at <director@nep.repec.org>. Put “NEP” in the subject, otherwise your mail may be rejected.

NEP’s infrastructure is sponsored by the Griffith Business School of Griffith University in Australia.