nep-big 2020-12-07 papers

on Big Data

Issue of 2020‒12‒07
fourteen papers chosen by
Tom Coupé
University of Canterbury

Interpreting Big Data in the Macro Economy: A Bayesian Mixed Frequency Estimator By David Kohns; Arnab Bhattacharjee
Financial Conditions and Economic Activity: Insights from Machine Learning By Michael T. Kiley
Application of deep quantum neural networks to finance By Takayuki Sakuma
Nowcasting business cycle turning points with stock networks and machine learning By Azqueta-Gavaldon, Andres; Hirschbühl, Dominik; Onorante, Luca; Saiz, Lorena
Deep Neural Networks and Neuro-Fuzzy Networks for Intellectual Analysis of Economic Systems By Alexey Averkin; Sergey Yarushev
Understanding the Distributional Aspects of Microcredit Expansions By Melvyn Weeks; Tobias Gabel Christiansen
Fertile Soil for Intrapreneurship: Impartial Institutions and Human Capital By Ljunge, Martin; Stenkula, Mikael
Predicting county-scale maize yields with publicly available data By Jiang, Zehui; Liu, Chao; Ganapathysubramanian, Baskar; Hayes, Dermot J.; Sarkar, Soumik
Predicting the price of second-hand vehicles using data mining techniques By Jafari Kang, Masood; Zohoori, Sepideh; Abbasi, Elahe; Li, Yueqing; Hamidi, Maryam
New(spaper) Evidence of a Reduction in Suicide Mentions during the 19th‐century US Gold Rush By Kronenberg, Christoph
Text-Based Linkages and Local Risk Spillovers in the Equity Market By Ge, S.
Predicting Disaggregated CPI Inflation Components via Hierarchical Recurrent Neural Networks By Oren Barkan; Itamar Caspi; Allon Hammer; Noam Koenigstein
Giving by taking away: big tech, data colonialism and the reconfiguration of social good By Viera Magalhães, João; Couldry, Nick
“No Central Stage”: Telegram-based activity during the 2019 protests in Hong Kong By Urman, Aleksandra; Ho, Justin Chun-ting; Katz, Stefan

Interpreting Big Data in the Macro Economy: A Bayesian Mixed Frequency Estimator

By:	David Kohns; Arnab Bhattacharjee (Centre for Energy Economics Research and Policy, Heriot-Watt University)
Abstract:	Big Data sources, such as Google Trends, are increasingly being used to augment nowcast models. An often neglected issue in the previous literature, which is especially pertinent to policy environments, is the interpretability of the Big Data source included in the model. We provide a Bayesian modeling framework that handles the usual econometric issues involved in combining Big Data with traditional macroeconomic time series, such as mixed frequency and ragged edges, while remaining computationally simple and allowing for a high degree of interpretability. In our model, we explicitly account for the possibility that the Big Data and macroeconomic data sets included have different degrees of sparsity. We test our methodology by investigating whether Google Trends data in real time increase the nowcast fit of US real GDP growth compared to traditional macroeconomic time series. We find that search terms improve both point forecast accuracy and forecast density calibration, not only before official information is released but also later into GDP reference quarters. Our transparent methodology shows that the increased fit stems from search terms acting as early warning signals of large turning points in GDP.
Keywords:	Big Data; Machine Learning; Interpretability; Illusion of Sparsity; Density Nowcast; Google Search Terms
JEL:	C31 C53
Date:	2019–10
URL:	http://d.repec.org/n?u=RePEc:hwc:wpaper:010&r=all
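
The mixed-frequency and ragged-edge issues mentioned in the abstract above can be illustrated with a minimal sketch. It is not the authors' estimator: it only shows one common way to line up a monthly indicator with a quarterly target, by stacking each quarter's three months into one row and padding the incomplete final quarter. The function name and the search-interest values are hypothetical.

```python
from typing import List, Optional


def stack_monthly_to_quarterly(monthly: List[float]) -> List[List[Optional[float]]]:
    """Turn a monthly series into one row per quarter with three columns.

    The final quarter may be incomplete (the "ragged edge"); its missing
    months are filled with None rather than dropped.
    """
    rows: List[List[Optional[float]]] = []
    for start in range(0, len(monthly), 3):
        chunk: List[Optional[float]] = list(monthly[start:start + 3])
        chunk += [None] * (3 - len(chunk))  # pad the ragged edge
        rows.append(chunk)
    return rows


# Hypothetical search-interest index: seven months of data, so the third
# quarter has only its first month available.
search_index = [50.0, 52.0, 55.0, 53.0, 51.0, 49.0, 60.0]
quarterly_rows = stack_monthly_to_quarterly(search_index)
for row in quarterly_rows:
    print(row)
```

Each row can then sit next to the quarterly GDP growth figure for the same quarter; how the missing months are treated is a modeling choice that this sketch leaves open.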

Financial Conditions and Economic Activity: Insights from Machine Learning

By:	Michael T. Kiley
Abstract:	Machine learning (ML) techniques are used to construct a financial conditions index (FCI). The components of the ML-FCI are selected based on their ability to predict the unemployment rate one-year ahead. Three lessons for macroeconomics and variable selection/dimension reduction with large datasets emerge. First, variable transformations can drive results, emphasizing the need for transparency in selection of transformations and robustness to a range of reasonable choices. Second, there is strong evidence of nonlinearity in the relationship between financial variables and economic activity: tight financial conditions are associated with sharp deteriorations in economic activity and accommodative conditions are associated with only modest improvements in activity. Finally, the ML-FCI places sizable weight on equity prices and term spreads, in contrast to other measures. These lessons yield an ML-FCI showing tightening in financial conditions before the early 1990s and early 2000s recessions, in contrast to the National Financial Conditions Index (NFCI).
Keywords:	Big Data; Recession Prediction; Variable Selection
JEL:	E50 E17 C55 E44
Date:	2020–11–16
URL:	http://d.repec.org/n?u=RePEc:fip:fedgfe:2020-95&r=all

Application of deep quantum neural networks to finance

By:	Takayuki Sakuma
Abstract:	Use of the deep quantum neural network proposed by Beer et al. (2020) could grant new perspectives on solving numerical problems arising in the field of finance. We discuss this potential in the context of simple experiments such as learning implied volatilities and the differential machine learning proposed by Huge and Savine (2020). The deep quantum neural network is considered to be a promising candidate for developing highly powerful methods in finance.
Date:	2020–11
URL:	http://d.repec.org/n?u=RePEc:arx:papers:2011.07319&r=all

Nowcasting business cycle turning points with stock networks and machine learning

By:	Azqueta-Gavaldon, Andres; Hirschbühl, Dominik; Onorante, Luca; Saiz, Lorena
Abstract:	We propose a granular framework that makes use of advanced statistical methods to approximate developments in economy-wide expected corporate earnings. In particular, we evaluate the dynamic network structure of stock returns in the United States as a proxy for the transmission of shocks through the economy and identify node positions (firms) whose connectedness provides a signal for economic growth. The nowcasting exercise, with both the in-sample and the out-of-sample consistent feature selection, highlights which firms are contemporaneously exposed to aggregate downturns and provides a more complete narrative than is usually provided by more aggregate data. The two-state model for predicting periods of negative growth predicts future states remarkably well by using information derived from the node positions of manufacturing, transportation and financial (particularly insurance) firms. The three-state model, which identifies high, low and negative growth, successfully predicts economic regimes by making use of information from the financial, insurance, and retail sectors.
Keywords:	early warning signal, Granger-causality networks, real-time, turning point prediction
JEL:	C45 C51 D85 E32 N1
Date:	2020–11
URL:	http://d.repec.org/n?u=RePEc:ecb:ecbwps:20202494&r=all

Deep Neural Networks and Neuro-Fuzzy Networks for Intellectual Analysis of Economic Systems

By:	Alexey Averkin; Sergey Yarushev
Abstract:	In this paper we consider approaches to time series forecasting based on deep neural networks and neuro-fuzzy nets. We also give a short review of research on forecasting based on various ANFIS models. Deep Learning has proven to be an effective method for making highly accurate predictions from complex data sources. We propose our own Deep Learning and neuro-fuzzy network models for this task and show the possibility of using them for data science tasks. The paper also presents an overview of approaches for incorporating rule-based methodology into deep learning neural networks.
Date:	2020–11
URL:	http://d.repec.org/n?u=RePEc:arx:papers:2011.05588&r=all

Understanding the Distributional Aspects of Microcredit Expansions

By:	Melvyn Weeks; Tobias Gabel Christiansen
Abstract:	Various poverty reduction strategies are being implemented in the pursuit of eliminating extreme poverty. One such strategy is increased access to microcredit in poor areas around the world. Microcredit, typically defined as the supply of small loans to underserved entrepreneurs that originally aimed at displacing expensive local money-lenders, has been both praised and criticized as a development tool (Banerjee et al., 2015b). This paper presents an analysis of heterogeneous impacts from increased access to microcredit using data from three randomised trials. Recognising that the impact of a policy intervention generally varies conditional on an unknown set of factors, we investigate in particular whether heterogeneity presents itself as groups of winners and losers, and whether such subgroups share characteristics across RCTs. We find no evidence of impacts, either average or distributional, from increased access to microcredit on consumption levels. In contrast, the lack of average effects on profits seems to mask heterogeneous impacts. The findings are, however, not robust to the specific machine learning algorithm applied. Switching from the better performing Elastic Net to the worse performing Random Forest leads to a sharp increase in the variance of the estimates. In this context, the methods developed by Chernozhukov et al. (2019) for evaluating the relative performance of machine learning algorithms provide a disciplined way for the analyst to counter the uncertainty as to which algorithm to deploy.
Date:	2020–11
URL:	http://d.repec.org/n?u=RePEc:arx:papers:2011.10509&r=all

Fertile Soil for Intrapreneurship: Impartial Institutions and Human Capital

By:	Ljunge, Martin (Research Institute of Industrial Economics (IFN)); Stenkula, Mikael (Research Institute of Industrial Economics (IFN))
Abstract:	Intrapreneurs, entrepreneurial employees, constitute an important force behind innovations in the economy. Yet which factors promote intrapreneurship at the country level remains an underdeveloped research area. This paper provides a seminal contribution regarding the methodological approach and the broad set of potential explanatory factors studied. Based on machine-learning techniques (LASSO and EBA methods), we investigate the influence of over 60 variables capturing institutional, demographic, cultural, and developmental factors. We find that the quality of government measured as impartiality, i.e., that the political institutions treat the citizens in a non-discriminatory fashion and do not favor some groups or individuals, and the level of human capital, measured as the average years of schooling, are the most important factors predicting the level of intrapreneurship across countries. Instrumental variable results support a causal interpretation. The findings emphasize the importance of policy to establish well-functioning and impartial institutions as well as to promote higher education.
Keywords:	Intrapreneurship; Impartial institutions; Human capital; Machine-learning
JEL:	E02 I20 L26 O17 O30
Date:	2020–11–11
URL:	http://d.repec.org/n?u=RePEc:hhs:iuiwop:1368&r=all

Predicting county-scale maize yields with publicly available data

By:	Jiang, Zehui; Liu, Chao; Ganapathysubramanian, Baskar; Hayes, Dermot J.; Sarkar, Soumik
Abstract:	Maize (corn) is the dominant grain grown in the world. Total maize production in 2018 equaled 1.12 billion tons. Maize is used primarily as an animal feed in the production of eggs, dairy, pork and chicken. The US produces 32% of the world’s maize followed by China at 22% and Brazil at 9% (https://apps.fas.usda.gov/psdonline/app/index.html#/app/home). Accurate national-scale corn yield prediction critically impacts mercantile markets through providing essential information about expected production prior to harvest. Publicly available high-quality corn yield prediction can help address emergent information asymmetry problems and in doing so improve price efficiency in futures markets. We build a deep learning model to predict corn yields, specifically focusing on county-level prediction across 10 states of the Corn-Belt in the United States, and pre-harvest prediction with monthly updates from August. The results show promising predictive power relative to existing survey-based methods and set the foundation for a publicly available county yield prediction effort that complements existing public forecasts.
Date:	2020–09–11
URL:	http://d.repec.org/n?u=RePEc:isu:genstf:202009110700001775&r=all

Predicting the price of second-hand vehicles using data mining techniques

By:	Jafari Kang, Masood; Zohoori, Sepideh; Abbasi, Elahe; Li, Yueqing; Hamidi, Maryam
Abstract:	Electronic commerce, known as “e-commerce”, has grown rapidly in recent years and makes it possible to record information such as price, location, customer reviews, search history, discount options, competitors’ prices, and so on. With access to such a rich source of data, companies can analyze their users’ behavior to improve customer satisfaction as well as revenue. This study aims to estimate the price of used light vehicles on a commercial website, Divar, which is a popular website in Iran for trading second-hand goods. First, highlighted features were extracted from the description column using three methods: Bag of Words (BOW), Latent Dirichlet Allocation (LDA), and Hierarchical Dirichlet Process (HDP). Second, a multiple linear regression model was fit to predict the product price based on its attributes and the highlighted features. The accuracy index of Actuals-Predictions Correlation, the min-max index, and MAPE were used to validate the proposed methods. Results showed that the BOW model is the best model, with an Adjusted R-square of 0.7841.
Keywords:	Text mining, Topic modeling, BOW, LDA, HDP, Linear regression
JEL:	C5 C8 Y10
Date:	2019–11–08
URL:	http://d.repec.org/n?u=RePEc:pra:mprapa:103933&r=all
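
The three validation measures named in the abstract above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the min-max index is taken here to be the mean of min(actual, predicted) / max(actual, predicted), which is one common definition, and the price values are hypothetical.

```python
from math import sqrt
from typing import Sequence


def mape(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute percentage error, returned as a fraction (0.05 = 5%)."""
    return sum(abs(a - p) / abs(a) for a, p in zip(actual, predicted)) / len(actual)


def min_max_accuracy(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean of min/max per observation; 1.0 means perfect predictions.

    Assumes strictly positive values, as is the case for prices.
    """
    return sum(min(a, p) / max(a, p) for a, p in zip(actual, predicted)) / len(actual)


def correlation(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between actual and predicted values."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Hypothetical actual and predicted prices, for illustration only.
actual_prices = [100.0, 200.0, 400.0]
predicted_prices = [110.0, 180.0, 400.0]

print("MAPE:", mape(actual_prices, predicted_prices))
print("Min-max accuracy:", min_max_accuracy(actual_prices, predicted_prices))
print("Correlation:", correlation(actual_prices, predicted_prices))
```

Lower MAPE is better, while values closer to 1 are better for the other two measures.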

New(spaper) Evidence of a Reduction in Suicide Mentions during the 19th‐century US Gold Rush

By:	Kronenberg, Christoph
Abstract:	I analyze the relationship between state‐level economic shocks and suicides using historical US gold discoveries (1840‐1860) as a large, unexpected economic shock of up to 3.5% of GDP. The discoveries provide as good as random variation in the local economy, which I use to estimate the effect of economic changes on suicides. Comprehensive mortality data by state and year do not exist for the US for 1840 to 1860. I thus use web-scraped data from a newspaper archive, with suicide mentions per 100,000 pages as a proxy for suicides. Results show that, overall, gold discoveries are linked with a clear reduction in newspaper suicide mentions. The results indicate that an economic shock changes the suicide rate by one for every $136,659 to $251,145. This estimate implies a higher cost‐effectiveness than previous research but is still seven to fourteen times the size of modern, cost‐effective suicide prevention methods.
Keywords:	Gold Rush, Economic Shock, Suicide, Newspaper
Date:	2020
URL:	http://d.repec.org/n?u=RePEc:ajt:wcinch:73382&r=all
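
The proxy described in the abstract above, suicide mentions per 100,000 newspaper pages, is a simple normalisation of a raw count by archive size. A minimal sketch follows; the function name, the state and year keys, and all counts are hypothetical and are not taken from the paper's data.

```python
from typing import Dict, Tuple


def mentions_per_100k_pages(mentions: int, pages: int) -> float:
    """Scale a raw count of mentions by the number of archived pages."""
    if pages <= 0:
        raise ValueError("pages must be positive")
    return mentions * 100_000 / pages


# Hypothetical (state, year) -> (suicide mentions, pages in the archive).
state_year_counts: Dict[Tuple[str, int], Tuple[int, int]] = {
    ("State A", 1849): (12, 48_000),
    ("State B", 1849): (30, 150_000),
}

rates = {
    key: mentions_per_100k_pages(mentions, pages)
    for key, (mentions, pages) in state_year_counts.items()
}
for key, rate in rates.items():
    print(key, rate)
```

Normalising by pages matters because archive coverage differs across states and years: in this made-up example the second state has more mentions but a lower rate.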

Text-Based Linkages and Local Risk Spillovers in the Equity Market

By:	Ge, S.
Abstract:	This paper uses extensive text data to construct firms' links via which local shocks transmit. Using the novel text-based linkages, I estimate a heterogeneous spatial-temporal model which accommodates the contemporaneous and dynamic spillover effects at the same time. I document a considerable degree of local risk spillovers in the market plus sector hierarchical factor model residuals of S&P 500 stocks. The method is found to outperform various previously studied methods in terms of out-of-sample fit. Network analysis of the spatial-temporal model identifies the major systemic risk contributors and receivers, which are of particular interest to microprudential policies. From a macroprudential perspective, a rolling-window analysis reveals that the strength of local risk spillovers increases during periods of crisis, when, on the other hand, the market factor loses its importance.
Keywords:	Excess co-movement, weak and strong cross-sectional dependence, local risk spillovers, networks, textual analysis, big data, systemic risk, heterogeneous spatial auto-regressive model (HSAR)
JEL:	C33 C58 G10 G12
Date:	2020–11–26
URL:	http://d.repec.org/n?u=RePEc:cam:camdae:20115&r=all

Predicting Disaggregated CPI Inflation Components via Hierarchical Recurrent Neural Networks

By:	Oren Barkan; Itamar Caspi; Allon Hammer; Noam Koenigstein
Abstract:	We present a hierarchical architecture based on Recurrent Neural Networks (RNNs) for predicting disaggregated inflation components of the Consumer Price Index (CPI). While the majority of existing research focuses on predicting headline inflation, many economic and financial entities are more interested in its disaggregated components. To this end, we developed the novel Hierarchical Recurrent Neural Network (HRNN) model, which utilizes information from higher levels in the CPI hierarchy to improve predictions at the more volatile lower levels. Our evaluations, based on a large dataset from the US CPI-U index, indicate that the HRNN model significantly outperforms a vast array of well-known inflation prediction baselines.
Date:	2020–11
URL:	http://d.repec.org/n?u=RePEc:arx:papers:2011.07920&r=all

Giving by taking away: big tech, data colonialism and the reconfiguration of social good

By:	Viera Magalhães, João; Couldry, Nick
Abstract:	Big Tech companies have recently led and financed projects that claim to use datafication for the “social good”. This article explores what kind of social good it is that this sort of datafication engenders. Through the analysis of corporate public communications and patent applications, it finds that these initiatives hinge on the reconfiguration of social good as datafied, probabilistic, and profitable. These features, the article argues, are better understood within the framework of data colonialism. Rethinking “doing good” as a facet of data colonialism illuminates the inherent harm to freedom these projects produce and why, in order to “give”, Big Tech must often take away.
JEL:	R14 J01
Date:	2020–10–31
URL:	http://d.repec.org/n?u=RePEc:ehl:lserod:107516&r=all

“No Central Stage”: Telegram-based activity during the 2019 protests in Hong Kong

By:	Urman, Aleksandra; Ho, Justin Chun-ting; Katz, Stefan
Abstract:	We examine Telegram-based activities related to the 2019 protests in Hong Kong, thus presenting the first study of a large Telegram-aided protest movement. We contribute both to scholarship on Hong Kong protests and to research on social media-based protest mobilization. To do so, we rely on data collected through Telegram’s API and a combination of network analysis and computational text analysis. We find that the Telegram-based network was cohesive, ensuring the efficient spread of protest-related information. Content spread through Telegram predominantly concerned discussions of future actions and protest-related on-site information (e.g., police presence in certain areas). We find that the Telegram network was dominated by different actors in each month of the observation period, suggesting the absence of a single leader. Further, traditional protest leaders (those prominent during the 2014 Umbrella Movement), such as media and civic organisations, were less prominent in the network than local communities. Finally, we observe a cooldown in the level of Telegram activity after the enactment of the harsh National Security Law in July 2020. Further investigation is necessary to assess the persistence of this effect in the long term.
Date:	2020–11–17
URL:	http://d.repec.org/n?u=RePEc:osf:socarx:ueds4&r=all

This nep-big issue is ©2020 by Tom Coupé. It is provided as is without any express or implied warranty. It may be freely redistributed in whole or in part for any purpose. If distributed in part, please include this notice.

General information on the NEP project can be found at http://nep.repec.org. For comments please write to the director of NEP, Marco Novarese at <director@nep.repec.org>. Put “NEP” in the subject, otherwise your mail may be rejected.

NEP’s infrastructure is sponsored by the School of Economics and Finance of Massey University in New Zealand.