nep-big New Economics Papers
on Big Data
Issue of 2022‒05‒16
fourteen papers chosen by
Tom Coupé
University of Canterbury

  1. Visual Representation and Stereotypes in News Media By Elliott Ash; Ruben Durante; Maria Grebenshchikova; Carlo Schwarz
  2. Cryptocurrency Return Prediction Using Investor Sentiment Extracted by BERT-Based Classifiers from News Articles, Reddit Posts and Tweets By Duygu Ider
  3. New Approaches to Forecasting Growth and Inflation: Big Data and Machine Learning By Sabyasachi Kar; Amaani Bashir; Mayank Jain
  4. Data Production and the coevolving AI trajectories: An attempted evolutionary model. By Andrea Borsato; Andre Lorentz
  5. On Parametric Optimal Execution and Machine Learning Surrogates By Tao Chen; Mike Ludkovski; Moritz Voß
  6. Artificial Intelligence and work: a critical review of recent research from the social sciences By Jean-Philippe Deranty; Thomas Corbin
  7. Can Algorithms Reliably Predict Long-Term Unemployment in Times of Crisis? – Evidence from the COVID-19 Pandemic By Kunaschk, Max; Lang, Julia
  8. Who Increases Emergency Department Use? New Insights from the Oregon Health Insurance Experiment By Augustine Denteh; Helge Liebert
  9. Digitizing Historical Balance Sheet Data: A Practitioner's Guide By Sergio Correia; Stephan Luck
  10. A Dual Generalized Long Memory Modelling for Forecasting Electricity Spot Price: Neural Network and Wavelet Estimate By Souhir Ben Amor; Heni Boubaker; Lotfi Belkacem
  11. Natural Disasters and Economic Dynamics: Evidence from the Kerala Floods. By Beyer, Robert C. M.; Narayanan, Abhinav; Thakur, Gogol Mitra
  12. Variational Heteroscedastic Volatility Model By Zexuan Yin; Paolo Barucca
  13. European enterprise survey on the use of technologies based on artificial intelligence By Snezha SK Kazakova; Allison AD Dunne; Daan DB Bijwaard; Julien Gosse; Charles Hoffreumon; Nicolas van Zeebroeck
  14. Tackling Large Outliers in Macroeconomic Data with Vector Artificial Neural Network Autoregression By Vito Polito; Yunyi Zhang

  1. By: Elliott Ash; Ruben Durante; Maria Grebenshchikova; Carlo Schwarz
    Abstract: We propose a new method for measuring gender and ethnic stereotypes in news reports. By combining computer vision and natural language processing tools, the method allows us to analyze both images and text as well as the interaction between the two. We apply this approach to over 2 million web articles published in the New York Times and Fox News between 2000 and 2020. We find that in both outlets, men and whites are generally over-represented relative to their population share, while women and Hispanics are under-represented. We also document that news content perpetuates common stereotypes such as associating Blacks and Hispanics with low-skill jobs, crime, and poverty, and Asians with high-skill jobs and science. For jobs, we show that the relationship between visual representation and racial stereotypes holds even after controlling for the actual share of a group in a given occupation. Finally, we find that group representation in the news is influenced by the gender and ethnic identity of authors and editors.
    Keywords: stereotypes, gender, race, media, computer vision, text analysis
    JEL: L82 J15 J16 Z10 C45
    Date: 2022
  2. By: Duygu Ider
    Abstract: This paper studies the extent to which investor sentiment contributes to cryptocurrency return prediction. Investor sentiment is extracted from news articles, Reddit posts and Tweets using BERT-based classifiers fine-tuned on this specific text data. As this data is unlabeled, a weak supervision approach by pseudo-labeling using a zero-shot classifier is used. The contribution of sentiment is then examined using a variety of machine learning models. Each model is trained on data with and without sentiment separately. The conclusion is that sentiment leads to higher prediction accuracy and additional investment profit when the models are analyzed collectively, although this does not hold true for every single model.
    Date: 2022–04
  3. By: Sabyasachi Kar; Amaani Bashir; Mayank Jain (Institute of Economic Growth, Delhi)
    Abstract: The use of big data and machine learning techniques is now very common in many spheres and there is growing popularity of these approaches in macroeconomic forecasting as well. Are big data and machine learning really useful in the prediction of macroeconomic outcomes? Are they superior in performance compared to their traditional counterparts? What are the tradeoffs that forecasters need to keep in mind, and what are the steps they need to take to use these resources effectively? We carry out a critical analysis of the existing literature in order to answer these questions. Our analysis suggests that the answers to most of these questions are nuanced, conditional on a number of factors identified in the study.
    Keywords: Forecasting, Big Data, Machine Learning, Supervised Learning, Meta-analysis, Growth, Inflation
    JEL: C14 C45 C52 C53 C55 E17 E37
    Date: 2021–10
  4. By: Andrea Borsato; Andre Lorentz
    Abstract: This paper contributes to the understanding of the relationship between the nature of data and the Artificial Intelligence (AI) technological trajectories. We develop an agent-based model in which firms are data producers that compete on the markets for data and AI. The model is enriched by a public sector that fuels the purchase of data and trains the scientists that will populate firms as workforce. Through several simulation experiments we analyze the determinants of each market structure, the corresponding relationships with innovation attainments, the pattern followed by labour and data productivity, and the quality of data traded in the economy. More precisely, we question the established view in the literature on industrial organization according to which technological imperatives are enough to experience divergent industrial dynamics on both the markets for data and AI blueprints. Although technical change behooves if any industry pattern is to emerge, the actual unfolding is not the outcome of a specific technological trajectory, but the result of the interplay between technology-related factors and the availability of data-complementary inputs such as labour and AI capital, the market size, preferences and public policies.
    Keywords: Artificial Intelligence, Data Markets, Industrial Dynamics, Agent-based Models.
    JEL: L10 L60 O33 O38
    Date: 2022
  5. By: Tao Chen; Mike Ludkovski; Moritz Voß
    Abstract: We investigate optimal execution problems with instantaneous price impact and stochastic resilience. First, in the setting of linear price impact function we derive a closed-form recursion for the optimal strategy, generalizing previous results with deterministic transient price impact. Second, we develop a numerical algorithm for the case of nonlinear price impact. We utilize an actor-critic framework that constructs two neural-network surrogates for the value function and the feedback control. One advantage of such functional approximators is the ability to do parametric learning, i.e. to incorporate some of the model parameters as part of the input space. Precise calibration of price impact, resilience, etc., is known to be extremely challenging and hence it is critical to understand sensitivity of the strategy to these parameters. Our parametric neural network (NN) learner organically scales across 3-6 input dimensions and is shown to accurately approximate optimal strategy across a range of parameter configurations. We provide a fully reproducible Jupyter Notebook with our NN implementation, which is of independent pedagogical interest, demonstrating the ease of use of NN surrogates in (parametric) stochastic control problems.
    Date: 2022–04
  6. By: Jean-Philippe Deranty; Thomas Corbin
    Abstract: This review seeks to present a comprehensive picture of recent discussions in the social sciences of the anticipated impact of AI on the world of work. Issues covered include technological unemployment, algorithmic management, platform work and the politics of AI work. The review identifies the major disciplinary and methodological perspectives on AI's impact on work, and the obstacles they face in making predictions. Two parameters influencing the development and deployment of AI in the economy are highlighted: the capitalist imperative and nationalistic pressures.
    Date: 2022–02
  7. By: Kunaschk, Max (Institute for Employment Research (IAB), Nuremberg, Germany ; University of Groningen); Lang, Julia (Institute for Employment Research (IAB), Nuremberg, Germany ; Univ. Regensburg)
    Abstract: "In this paper, we compare two popular statistical learning techniques, logistic regression and random forest, with respect to their ability to classify jobseekers by their likelihood to become long-term unemployed. We study the performance of the two methods before the COVID-19 pandemic as well as the impact of the pandemic and its associated containment measures on their prediction performance. Our results show that random forest consistently out-performs logistic regression in terms of prediction performance, both, before and after the beginning of the pandemic. During the lockdowns of the first wave, the number of unemployment entries and the fraction of individuals that become long-term unemployed strongly increases, and the prediction performance of both methods declines. Finally, while the composition of the (long-term) unemployed changed at the beginning of the COVID-19 pandemic, we do not find systematic patterns across groups with different levels of labor market attachment or across different sectors of previous employment in terms of declines in prediction performance." (Author's abstract, IAB-Doku)
    Keywords: IAB-Open-Access-Publikation
    Date: 2022–05–09
  8. By: Augustine Denteh; Helge Liebert
    Abstract: We provide new insights regarding the finding that Medicaid increased emergency department (ED) use from the Oregon experiment. We find meaningful heterogeneous impacts of Medicaid on ED use using causal machine learning methods. The treatment effect distribution is widely dispersed, and the average effect is not representative of most individualized treatment effects. A small group—about 14% of participants—in the right tail of the distribution drives the overall effect. We identify priority groups with economically significant increases in ED usage based on demographics and prior utilization. Intensive margin effects are an important driver of increases in ED utilization.
    Keywords: Medicaid, ED use, effect heterogeneity, causal machine learning, optimal policy
    JEL: H75 I13 I38
    Date: 2022
  9. By: Sergio Correia; Stephan Luck
    Abstract: This paper discusses how to successfully digitize large-scale historical micro-data by augmenting optical character recognition (OCR) engines with pre- and post-processing methods. Although OCR software has improved dramatically in recent years due to improvements in machine learning, off-the-shelf OCR applications still present high error rates, which limit their use for accurate extraction of structured information. Complementing OCR with additional methods can, however, dramatically increase its success rate, making it a powerful and cost-efficient tool for economic historians. This paper showcases these methods and explains why they are useful. We apply them to two large balance sheet datasets and introduce "quipucamayoc", a Python package containing these methods in a unified framework.
    Date: 2022–03
  10. By: Souhir Ben Amor; Heni Boubaker; Lotfi Belkacem
    Abstract: In this paper, a dual generalized long memory model is proposed to predict the electricity spot price. First, we focus on modelling the conditional mean of the series, so we adopt a generalized fractional k-factor Gegenbauer process (k-factor GARMA). Second, the residuals from the k-factor GARMA model are used as a proxy for the conditional variance; these residuals are predicted using two different approaches. In the first approach, a local linear wavelet neural network model (LLWNN) is developed to predict the conditional variance using two different learning algorithms, so we estimate the hybrid k-factor GARMA-LLWNN based on the backpropagation (BP) algorithm and on the particle swarm optimization (PSO) algorithm. In the second approach, the Gegenbauer generalized autoregressive conditional heteroscedasticity process (G-GARCH) is adopted, and the parameters of the k-factor GARMA-G-GARCH model are estimated using the wavelet methodology based on the discrete wavelet packet transform (DWPT) approach. To illustrate the usefulness of our methodology, we carry out an empirical application using the hourly returns of electricity prices from the Nord Pool market. The empirical results show that the k-factor GARMA-G-GARCH model has the best prediction accuracy in terms of forecasting criteria, and we find that it is more appropriate for forecasting.
    Date: 2022–04
  11. By: Beyer, Robert C. M. (World Bank); Narayanan, Abhinav (Asian Infrastructure Investment Bank); Thakur, Gogol Mitra (Centre for Development Studies)
    Abstract: Exceptionally high rainfall in the Indian state of Kerala caused major flooding in 2018. This paper estimates the short-run causal impact of the disaster on the economy, using a difference-in-difference approach. Monthly nighttime light intensity, a proxy for aggregate economic activity, suggests that activity declined for three months during the disaster but boomed subsequently. Automated teller machine transactions, a proxy for consumer demand, declined and credit disbursal increased, with households borrowing more for housing and less for consumption. In line with other results, both household income and expenditure declined during the floods. Despite a strong wage recovery after the floods, spending remained lower relative to the unaffected districts. The paper argues that increased labor demand due to reconstruction efforts increased wages after the floods and provides corroborating evidence: (i) rural labor markets tightened, (ii) poorer households benefited more, and (iii) wages increased most where government relief was strongest. The findings confirm the presence of interesting economic dynamics during and right after natural disasters that remain in the shadow when analyzed with annual data.
    Keywords: natural disasters ; aggregate activity ; household behavior ; spatial analysis
    JEL: Q54 R22 D12 O44
    Date: 2022–04
  12. By: Zexuan Yin; Paolo Barucca
    Abstract: We propose Variational Heteroscedastic Volatility Model (VHVM) -- an end-to-end neural network architecture capable of modelling heteroscedastic behaviour in multivariate financial time series. VHVM leverages recent advances in several areas of deep learning, namely sequential modelling and representation learning, to model complex temporal dynamics between different asset returns. At its core, VHVM consists of a variational autoencoder to capture relationships between assets, and a recurrent neural network to model the time-evolution of these dependencies. The outputs of VHVM are time-varying conditional volatilities in the form of covariance matrices. We demonstrate the effectiveness of VHVM against existing methods such as Generalised AutoRegressive Conditional Heteroscedasticity (GARCH) and Stochastic Volatility (SV) models on a wide range of multivariate foreign currency (FX) datasets.
    Date: 2022–04
  13. By: Snezha SK Kazakova; Allison AD Dunne; Daan DB Bijwaard; Julien Gosse; Charles Hoffreumon; Nicolas van Zeebroeck
    Date: 2020–07–28
  14. By: Vito Polito (Department of Economics, University of Sheffield, UK); Yunyi Zhang (Xiamen University, China)
    Abstract: We develop a regime switching vector autoregression where artificial neural networks drive time variation in the coefficients of the conditional mean of the endogenous variables and the variance covariance matrix of the disturbances. The model is equipped with a stability constraint to ensure non-explosive dynamics. As such, it is employable to account for nonlinearity in macroeconomic dynamics not only during typical business cycles but also in a wide range of extreme events, like deep recessions and strong expansions. The methodology is put to the test using aggregate data for the United States that include the abnormal realizations during the recent Covid-19 pandemic. The model delivers plausible and stable structural inference, and accurate out-of-sample forecasts. This performance compares favourably against a number of alternative methodologies recently proposed to deal with large outliers in macroeconomic data caused by the pandemic.
    Keywords: Tax avoidance; Nonlinear time series; Regime switching models; Extreme events; Covid-19; Macroeconomic forecasting
    JEL: C45 C5 E37
    Date: 2022–03

This nep-big issue is ©2022 by Tom Coupé. It is provided as is without any express or implied warranty. It may be freely redistributed in whole or in part for any purpose. If distributed in part, please include this notice.
For comments, please write to the director of NEP, Marco Novarese. Put “NEP” in the subject, otherwise your mail may be rejected.
NEP’s infrastructure is sponsored by the School of Economics and Finance of Massey University in New Zealand.