nep-big New Economics Papers
on Big Data
Issue of 2020‒12‒07
fourteen papers chosen by
Tom Coupé
University of Canterbury

  1. Interpreting Big Data in the Macro Economy: A Bayesian Mixed Frequency Estimator By David Kohns; Arnab Bhattacharjee
  2. Financial Conditions and Economic Activity: Insights from Machine Learning By Michael T. Kiley
  3. Application of deep quantum neural networks to finance By Takayuki Sakuma
  4. Nowcasting business cycle turning points with stock networks and machine learning By Azqueta-Gavaldon, Andres; Hirschbühl, Dominik; Onorante, Luca; Saiz, Lorena
  5. Deep Neural Networks and Neuro-Fuzzy Networks for Intellectual Analysis of Economic Systems By Alexey Averkin; Sergey Yarushev
  6. Understanding the Distributional Aspects of Microcredit Expansions By Melvyn Weeks; Tobias Gabel Christiansen
  7. Fertile Soil for Intrapreneurship: Impartial Institutions and Human Capital By Ljunge, Martin; Stenkula, Mikael
  8. Predicting county-scale maize yields with publicly available data By Jiang, Zehui; Liu, Chao; Ganapathysubramanian, Baskar; Hayes, Dermot J.; Sarkar, Soumik
  9. Predicting the price of second-hand vehicles using data mining techniques By Jafari Kang, Masood; Zohoori, Sepideh; Abbasi, Elahe; Li, Yueqing; Hamidi, Maryam
  10. New(spaper) Evidence of a Reduction in Suicide Mentions during the 19th‐century US Gold Rush By Kronenberg, Christoph
  11. Text-Based Linkages and Local Risk Spillovers in the Equity Market By Ge, S.
  12. Predicting Disaggregated CPI Inflation Components via Hierarchical Recurrent Neural Networks By Oren Barkan; Itamar Caspi; Allon Hammer; Noam Koenigstein
  13. Giving by taking away: big tech, data colonialism and the reconfiguration of social good By Viera Magalhães, João; Couldry, Nick
  14. “No Central Stage”: Telegram-based activity during the 2019 protests in Hong Kong By Urman, Aleksandra; Ho, Justin Chun-ting; Katz, Stefan

  1. By: David Kohns; Arnab Bhattacharjee (Centre for Energy Economics Research and Policy, Heriot-Watt University)
    Abstract: Big Data sources, such as Google Trends, are increasingly being used to augment nowcast models. An often neglected issue in the previous literature, which is especially pertinent to policy environments, is the interpretability of the Big Data source included in the model. We provide a Bayesian modeling framework which is able to handle all the usual econometric issues involved in combining Big Data with traditional macroeconomic time series, such as mixed frequency and ragged edges, while remaining computationally simple and allowing for a high degree of interpretability. In our model, we explicitly account for the possibility that the Big Data and macroeconomic data sets included have different degrees of sparsity. We test our methodology by investigating whether Google Trends in real time increase the nowcast fit of US real GDP growth compared to traditional macroeconomic time series. We find that search terms improve both point forecast accuracy and forecast density calibration, not only before official information is released but also later into GDP reference quarters. Our transparent methodology shows that the increased fit stems from search terms acting as early warning signals of large turning points in GDP.
    Keywords: Big Data; Machine Learning; Interpretability; Illusion of Sparsity; Density Nowcast; Google Search Terms
    JEL: C31 C53
    Date: 2019–10
  2. By: Michael T. Kiley
    Abstract: Machine learning (ML) techniques are used to construct a financial conditions index (FCI). The components of the ML-FCI are selected based on their ability to predict the unemployment rate one year ahead. Three lessons for macroeconomics and variable selection/dimension reduction with large datasets emerge. First, variable transformations can drive results, emphasizing the need for transparency in selection of transformations and robustness to a range of reasonable choices. Second, there is strong evidence of nonlinearity in the relationship between financial variables and economic activity: tight financial conditions are associated with sharp deteriorations in economic activity, while accommodative conditions are associated with only modest improvements in activity. Finally, the ML-FCI places sizable weight on equity prices and term spreads, in contrast to other measures. These lessons yield an ML-FCI showing tightening in financial conditions before the early 1990s and early 2000s recessions, in contrast to the National Financial Conditions Index (NFCI).
    Keywords: Big Data; Recession Prediction; Variable Selection
    JEL: E50 E17 C55 E44
    Date: 2020–11–16
  3. By: Takayuki Sakuma
    Abstract: Use of the deep quantum neural network proposed by Beer et al. (2020) could grant new perspectives on solving numerical problems arising in the field of finance. We discuss this potential in the context of simple experiments such as learning implied volatilities and the differential machine learning proposed by Huge and Savine (2020). The deep quantum neural network is considered to be a promising candidate for developing highly powerful methods in finance.
    Date: 2020–11
  4. By: Azqueta-Gavaldon, Andres; Hirschbühl, Dominik; Onorante, Luca; Saiz, Lorena
    Abstract: We propose a granular framework that makes use of advanced statistical methods to approximate developments in economy-wide expected corporate earnings. In particular, we evaluate the dynamic network structure of stock returns in the United States as a proxy for the transmission of shocks through the economy and identify node positions (firms) whose connectedness provides a signal for economic growth. The nowcasting exercise, with both in-sample and out-of-sample consistent feature selection, highlights which firms are contemporaneously exposed to aggregate downturns and provides a more complete narrative than is usually provided by more aggregate data. The two-state model for predicting periods of negative growth predicts future states remarkably well by using information derived from the node positions of manufacturing, transportation and financial (particularly insurance) firms. The three-state model, which identifies high, low and negative growth, successfully predicts economic regimes by making use of information from the financial, insurance, and retail sectors.
    Keywords: early warning signal, Granger-causality networks, real-time, turning point prediction
    JEL: C45 C51 D85 E32 N1
    Date: 2020–11
  5. By: Alexey Averkin; Sergey Yarushev
    Abstract: In this paper we consider approaches to time series forecasting based on deep neural networks and neuro-fuzzy networks. We also give a short review of research on forecasting based on various ANFIS models. Deep learning (DL) has proven to be an effective method for making highly accurate predictions from complex data sources. We propose our own DL and neuro-fuzzy network models for this task. Finally, we show the possibility of using these models for data science tasks. This paper also presents an overview of approaches for incorporating rule-based methodology into deep learning neural networks.
    Date: 2020–11
  6. By: Melvyn Weeks; Tobias Gabel Christiansen
    Abstract: Various poverty reduction strategies are being implemented in the pursuit of eliminating extreme poverty. One such strategy is increased access to microcredit in poor areas around the world. Microcredit, typically defined as the supply of small loans to underserved entrepreneurs that originally aimed at displacing expensive local money-lenders, has been both praised and criticized as a development tool (Banerjee et al., 2015b). This paper presents an analysis of heterogeneous impacts from increased access to microcredit using data from three randomised trials. In the spirit of recognising that in general the impact of a policy intervention varies conditional on an unknown set of factors, we investigate in particular whether heterogeneity presents itself as groups of winners and losers, and whether such subgroups share characteristics across RCTs. We find no evidence of impacts, either average or distributional, from increased access to microcredit on consumption levels. In contrast, the lack of average effects on profits seems to mask heterogeneous impacts. The findings are, however, not robust to the specific machine learning algorithm applied. Switching from the better performing Elastic Net to the worse performing Random Forest leads to a sharp increase in the variance of the estimates. In this context, the methods developed by Chernozhukov et al. (2019) to evaluate the relative performance of machine learning algorithms provide a disciplined way for the analyst to counter the uncertainty as to which algorithm to deploy.
    Date: 2020–11
  7. By: Ljunge, Martin (Research Institute of Industrial Economics (IFN)); Stenkula, Mikael (Research Institute of Industrial Economics (IFN))
    Abstract: Intrapreneurs, entrepreneurial employees, constitute an important force behind innovations in the economy. Yet which factors promote intrapreneurship at the country level remains an underdeveloped research area. This paper provides a seminal contribution regarding the methodological approach and the broad set of potential explanatory factors studied. Based on machine-learning techniques (LASSO and EBA methods), we investigate the influence of over 60 variables capturing institutional, demographic, cultural, and developmental factors. We find that the quality of government measured as impartiality, i.e., that the political institutions treat the citizens in a non-discriminatory fashion and do not favor some groups or individuals, and the level of human capital, measured as the average years of schooling, are the most important factors predicting the level of intrapreneurship across countries. Instrumental variable results support a causal interpretation. The findings emphasize the importance of policy to establish well-functioning and impartial institutions as well as to promote higher education.
    Keywords: Intrapreneurship; Impartial institutions; Human capital; Machine-learning
    JEL: E02 I20 L26 O17 O30
    Date: 2020–11–11
  8. By: Jiang, Zehui; Liu, Chao; Ganapathysubramanian, Baskar; Hayes, Dermot J.; Sarkar, Soumik
    Abstract: Maize (corn) is the dominant grain grown in the world. Total maize production in 2018 equaled 1.12 billion tons. Maize is used primarily as an animal feed in the production of eggs, dairy, pork and chicken. The US produces 32% of the world’s maize, followed by China at 22% and Brazil at 9%. Accurate national-scale corn yield prediction critically impacts mercantile markets by providing essential information about expected production prior to harvest. Publicly available high-quality corn yield prediction can help address emergent information asymmetry problems and in doing so improve price efficiency in futures markets. We build a deep learning model to predict corn yields, specifically focusing on county-level prediction across 10 states of the Corn Belt in the United States, and pre-harvest prediction with monthly updates from August. The results show promising predictive power relative to existing survey-based methods and set the foundation for a publicly available county yield prediction effort that complements existing public forecasts.
    Date: 2020–09–11
  9. By: Jafari Kang, Masood; Zohoori, Sepideh; Abbasi, Elahe; Li, Yueqing; Hamidi, Maryam
    Abstract: Electronic commerce (“e-commerce”) has grown rapidly in recent years, making it possible to record information such as price, location, customer reviews, search history, discount options, competitors’ prices, and so on. With access to such a rich source of data, companies can analyze their users’ behavior to improve customer satisfaction as well as revenue. This study aims to estimate the price of used light vehicles on a commercial website, Divar, which is a popular website in Iran for trading second-hand goods. First, highlighted features were extracted from the description column using three methods: Bag of Words (BOW), Latent Dirichlet Allocation (LDA), and Hierarchical Dirichlet Process (HDP). Second, a multiple linear regression model was fit to predict the product price based on its attributes and the highlighted features. The accuracy index of Actuals-Predictions Correlation, the min-max index, and MAPE were used to validate the proposed methods. Results showed that the BOW model is the best model, with an adjusted R-squared of 0.7841.
    Keywords: Text mining, Topic modeling, BOW, LDA, HDP, Linear regression
    JEL: C5 C8 Y10
    Date: 2019–11–08
  10. By: Kronenberg, Christoph
    Abstract: I analyze the relationship between state‐level economic shocks and suicides using historical US gold discoveries (1840‐1860) as a large unexpected economic shock. Gold discoveries were an unexpected and large economic shock of up to 3.5% of GDP. They provide as‐good‐as‐random variation in the local economy, which I use to estimate the effect of economic changes on suicides. Comprehensive mortality data by state and year do not exist for the US for 1840 to 1860. I thus make use of web‐scraped data from a newspaper archive and use suicide mentions per 100,000 pages as a proxy for suicides. Results show that overall gold discoveries are linked with a clear reduction in newspaper suicide mentions. The results indicate that an economic shock changes the suicide rate by one for every $136,659 to $251,145. This estimate implies a higher cost‐effectiveness than previous research but is still seven to fourteen times the size of modern, cost‐effective suicide prevention methods.
    Keywords: Gold Rush, Economic Shock, Suicide, Newspaper
    Date: 2020
  11. By: Ge, S.
    Abstract: This paper uses extensive text data to construct firms' links via which local shocks transmit. Using the novel text-based linkages, I estimate a heterogeneous spatial-temporal model which accommodates the contemporaneous and dynamic spillover effects at the same time. I document a considerable degree of local risk spillovers in the market plus sector hierarchical factor model residuals of S&P 500 stocks. The method is found to outperform various previously studied methods in terms of out-of-sample fit. Network analysis of the spatial-temporal model identifies the major systemic risk contributors and receivers, which are of particular interest to microprudential policies. From a macroprudential perspective, a rolling-window analysis reveals that the strength of local risk spillovers increases during periods of crisis, when, on the other hand, the market factor loses its importance.
    Keywords: Excess co-movement, weak and strong cross-sectional dependence, local risk spillovers, networks, textual analysis, big data, systemic risk, heterogeneous spatial auto-regressive model (HSAR)
    JEL: C33 C58 G10 G12
    Date: 2020–11–26
  12. By: Oren Barkan; Itamar Caspi; Allon Hammer; Noam Koenigstein
    Abstract: We present a hierarchical architecture based on Recurrent Neural Networks (RNNs) for predicting disaggregated inflation components of the Consumer Price Index (CPI). While the majority of existing research focuses on predicting headline inflation, many economic and financial entities are more interested in its disaggregated components. To this end, we developed the novel Hierarchical Recurrent Neural Network (HRNN) model that utilizes information from higher levels in the CPI hierarchy to improve predictions at the more volatile lower levels. Our evaluations, based on a large data set from the US CPI-U index, indicate that the HRNN model significantly outperforms a vast array of well-known inflation prediction baselines.
    Date: 2020–11
  13. By: Viera Magalhães, João; Couldry, Nick
    Abstract: Big Tech companies have recently led and financed projects that claim to use datafication for the “social good”. This article explores what kind of social good it is that this sort of datafication engenders. Through the analysis of corporate public communications and patent applications, it finds that these initiatives hinge on the reconfiguration of social good as datafied, probabilistic, and profitable. These features, the article argues, are better understood within the framework of data colonialism. Rethinking “doing good” as a facet of data colonialism illuminates the inherent harm to freedom these projects produce and why, in order to “give”, Big Tech must often take away.
    Keywords: forthcoming
    JEL: R14 J01
    Date: 2020–10–31
  14. By: Urman, Aleksandra; Ho, Justin Chun-ting; Katz, Stefan
    Abstract: We examine Telegram-based activities related to the 2019 protests in Hong Kong, thus presenting the first study of a large Telegram-aided protest movement. We contribute both to scholarship on Hong Kong protests and to research on social media-based protest mobilization. For that, we rely on data collected through Telegram’s API and a combination of network analysis and computational text analysis. We find that the Telegram-based network was cohesive, ensuring the efficient spread of protest-related information. Content spread through Telegram predominantly concerned discussions of future actions and protest-related on-site information (i.e., police presence in certain areas). We find that the Telegram network was dominated by different actors in each month of the observation period, suggesting the absence of one single leader. Further, traditional protest leaders (those prominent during the 2014 Umbrella Movement), such as media and civic organisations, were less prominent in the network than local communities. Finally, we observe a cooldown in the level of Telegram activity after the enactment of the harsh National Security Law in July 2020. Further investigation is necessary to assess the persistence of this effect in the long term.
    Date: 2020–11–17
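
Note on the validation measures named in entry 9: the abstract lists an actuals-predictions correlation, a min-max index, and MAPE but does not define them. The sketch below uses the common textbook definitions, which is an assumption and may differ from what the authors computed. All function names and the sample prices are hypothetical and serve only as illustration.

```python
from math import sqrt

def mape(actual, predicted):
    """Mean absolute percentage error, in percent (assumes no actual value is 0)."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)

def min_max_accuracy(actual, predicted):
    """Mean of min(a, p) / max(a, p); equals 1.0 for perfect predictions (assumes positive values)."""
    return sum(min(a, p) / max(a, p) for a, p in zip(actual, predicted)) / len(actual)

def correlation(actual, predicted):
    """Pearson correlation between actual and predicted values."""
    n = len(actual)
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / sqrt(var_a * var_p)

# Hypothetical prices, for illustration only (not data from the paper).
actual_prices = [100.0, 200.0, 400.0]
predicted_prices = [110.0, 180.0, 400.0]

print(mape(actual_prices, predicted_prices))
print(min_max_accuracy(actual_prices, predicted_prices))
print(correlation(actual_prices, predicted_prices))
```

Lower MAPE is better, while the min-max index and the correlation are better the closer they are to 1.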

This nep-big issue is ©2020 by Tom Coupé. It is provided as is without any express or implied warranty. It may be freely redistributed in whole or in part for any purpose. If distributed in part, please include this notice.
General information on the NEP project can be found at For comments please write to the director of NEP, Marco Novarese at <>. Put “NEP” in the subject, otherwise your mail may be rejected.
NEP’s infrastructure is sponsored by the School of Economics and Finance of Massey University in New Zealand.