nep-big New Economics Papers
on Big Data
Issue of 2023‒02‒06
seventeen papers chosen by
Tom Coupé
University of Canterbury

  1. The Impact of Uncertainty on Fan Interest Surrounding Multiple Outcomes in Open European Football Leagues By Pedro Garcia-del-Barrio; J. James Reade
  2. Removing Non-Stationary Knowledge From Pre-Trained Language Models for Entity-Level Sentiment Classification in Finance By Guijin Son; Hanwool Lee; Nahyeon Kang; Moonjeong Hahm
  3. Greenhouse gases emissions: estimating corporate non-reported emissions using interpretable machine learning By Jeremi Assael; Thibaut Heurtebize; Laurent Carlier; François Soupé
  4. Robust machine learning pipelines for trading market-neutral stock portfolios By Thomas Wong; Mauricio Barahona
  5. Do Bishops Matter for Politics? Evidence From Italy By Gianandrea Lanzara; Sara Lazzaroni; Paolo Masella; Mara P. Squicciarini
  6. Central Bank Communication of Uncertainty By Rayane Hanifi; Klodiana Istrefi; Adrian Penalver
  7. Inference on Time Series Nonparametric Conditional Moment Restrictions Using General Sieves By Xiaohong Chen; Yuan Liao; Weichen Wang
  8. Consumer acceptance of the use of artificial intelligence in online shopping: evidence from Hungary By Szabolcs Nagy; Noemi Hajdu
  9. Mean-field neural networks-based algorithms for McKean-Vlasov control problems * By Huyên Pham; Xavier Warin
  10. Stock market forecasting using DRAGAN and feature matching By Fateme Shahabi Nejad; Mohammad Mehdi Ebadzadeh
  11. On the causality-preservation capabilities of generative modelling By Yves-Cédric Bauwelinckx; Jan Dhaene; Tim Verdonck; Milan van den Heuvel
  12. Deep Reinforcement Learning for Asset Allocation: Reward Clipping By Jiwon Kim; Moon-Ju Kang; KangHun Lee; HyungJun Moon; Bo-Kwan Jeon
  13. Predicting Companies' ESG Ratings from News Articles Using Multivariate Timeseries Analysis By Tanja Aue; Adam Jatowt; Michael Färber
  14. "Satellite-Based Vehicle Flow Data to Assess Local Economic Activities" By Eugenia Go; Kentaro Nakajima; Yasuyuki Sawada; Kiyoshi Taniguchi
  15. Measuring Transparency in the Social Sciences: Political Science and International Relations By Scoggins, Bermond; Robertson, Matthew P.
  16. ESG Investing: A Sentiment Analysis Approach By Stéphane Goutte; Viet Hoang Le; Fei Liu; Hans-Jörg von Mettenheim
  17. Deep Learning and Technical Analysis in Cryptocurrency Market By Stéphane Goutte; Viet Hoang Le; Fei Liu; Hans-Jörg von Mettenheim

  1. By: Pedro Garcia-del-Barrio (Department of Economics, Universidad de Navarra); J. James Reade
    Abstract: We introduce a new source of information to evaluate the importance of uncertainty in driving demand for particular types of entertainment events. We use web searches via Google, and consider various dimensions of uncertainty of outcome in sporting events. Most saliently, we consider whether the complete removal of uncertainty surrounding the winner of a competition, something that often happens before European soccer leagues have completed, reduces interest. We find that the decrease in interest is significant, but that it is mitigated by increased interest in secondary prizes in these league competitions: qualification for European competitions, and avoiding relegation. We conclude by affirming that such a diversified structure of competition, replete with an open structure of promotion and relegation, is desirable in the context of league competitions such as those in Europe that do not have a prominent play-off system to conclude the season.
    Keywords: global sports, outcome certainty; Google trends; competitions’ multiple prizes; event analysis
    JEL: J24 J33 J71
    Date: 2023–01–11
  2. By: Guijin Son; Hanwool Lee; Nahyeon Kang; Moonjeong Hahm
    Abstract: Extraction of sentiment signals from news text, stock message boards, and business reports, for stock movement prediction, has been a rising field of interest in finance. Building upon past literature, the most recent works attempt to better capture sentiment from sentences with complex syntactic structures by introducing aspect-level sentiment classification (ASC). Despite the growing interest, however, fine-grained sentiment analysis has not been fully explored in non-English literature due to the shortage of annotated finance-specific data. Accordingly, it is necessary for non-English languages to leverage datasets and pre-trained language models (PLM) of different domains, languages, and tasks to best their performance. To facilitate finance-specific ASC research in the Korean language, we build KorFinASC, a Korean aspect-level sentiment classification dataset for finance consisting of 12,613 human-annotated samples, and explore methods of intermediate transfer learning. Our experiments indicate that past research has ignored the potentially wrong knowledge of financial entities encoded during the training phase, which has led to overestimating the predictive power of PLMs. In our work, we use the term "non-stationary knowledge" to refer to information that was previously correct but is likely to change, and present "TGT-Masking", a novel masking pattern to restrict PLMs from speculating on knowledge of this kind. Finally, through a series of transfer learning steps with TGT-Masking applied, we improve classification accuracy by 22.63% compared to standalone models on KorFinASC.
    Date: 2023–01
  3. By: Jeremi Assael (BNPP CIB GM Lab - BNP Paribas CIB Global Markets Data & AI Lab, MICS - Mathématiques et Informatique pour la Complexité et les Systèmes - CentraleSupélec - Université Paris-Saclay); Thibaut Heurtebize (BNP Paribas Asset Management, Quantitative Research Group, Research Lab); Laurent Carlier (BNPP CIB GM Lab - BNP Paribas CIB Global Markets Data & AI Lab); François Soupé (BNP Paribas Asset Management, Quantitative Research Group, Research Lab)
    Abstract: As of 2022, greenhouse gas (GHG) emissions reporting and auditing are not yet compulsory for all companies and methodologies of measurement and estimation are not unified. We propose a machine learning-based model to estimate scope 1 and scope 2 GHG emissions of companies not reporting them yet. Our model, specifically designed to be transparent and completely adapted to this use case, is able to estimate emissions for a large universe of companies. It shows good out-of-sample global performance as well as good out-of-sample granular performance when evaluated by sector, by country or by revenue bucket. We also compare our results to those of other providers and find our estimates to be more accurate. Thanks to the proposed explainability tools using Shapley values, our model is fully interpretable, the user being able to understand which factors explain the GHG emissions for each particular company.
    Keywords: sustainability, disclosure, greenhouse gas emissions, machine learning, interpretability, carbon emissions, scope 1, scope 2, interpretable machine learning
    Date: 2022–12–18
  4. By: Thomas Wong; Mauricio Barahona
    Abstract: The application of deep learning algorithms to financial data is difficult due to heavy non-stationarities which can lead to over-fitted models that underperform under regime changes. Using the Numerai tournament data set as a motivating example, we propose a machine learning pipeline for trading market-neutral stock portfolios based on tabular data which is robust under changes in market conditions. We evaluate various machine-learning models, including Gradient Boosting Decision Trees (GBDTs) and Neural Networks with and without simple feature engineering, as the building blocks for the pipeline. We find that GBDT models with dropout display high performance, robustness and generalisability with relatively low complexity and reduced computational cost. We then show that online learning techniques can be used in post-prediction processing to enhance the results. In particular, dynamic feature neutralisation, an efficient procedure that requires no retraining of models and can be applied post-prediction to any machine learning model, improves robustness by reducing drawdown in volatile market conditions. Furthermore, we demonstrate that the creation of model ensembles through dynamic model selection based on recent model performance leads to improved performance over baseline by improving the Sharpe and Calmar ratios. We also evaluate the robustness of our pipeline across different data splits and random seeds with good reproducibility of results.
    Date: 2022–12
  5. By: Gianandrea Lanzara; Sara Lazzaroni; Paolo Masella; Mara P. Squicciarini
    Abstract: This paper studies whether and how religious leaders affect politics. Focusing on Italian dioceses in the period from 1948 to 1992, we find that the identity of the bishop in office explains a significant amount of the variation in the vote share for the Christian Democracy party (DC). This result is robust to several exercises that use different samples and time windows. Zooming into the mechanism, we find that two characteristics of bishops matter: (i) his political culture, and (ii) his interaction with the population—the latter being measured using state-of-the-art text-analysis techniques.
    JEL: D72 Z12 D02
    Date: 2023–01
  6. By: Rayane Hanifi; Klodiana Istrefi; Adrian Penalver
    Abstract: In this paper, we examine how the monetary policy setting committees of the Federal Reserve, the Bank of England and the European Central Bank communicate their reaction to incoming data in their policy deliberation process by expressing confidence, surprise or uncertainty with respect to existing narratives. We use text analysis techniques to calculate forward and backward looking measures of relative surprise from the published Minutes of these decision-making bodies. We find many common patterns in this communication. Interestingly, policymakers tend to express more surprise and uncertainty with regard to developments in the real economy, whereas they are more likely to confirm their expectations with regard to inflation and monetary policy. When considering the monetary policy stance, we observe a tendency for policymakers to highlight surprise and uncertainty several meetings in advance of changes, particularly when easing monetary policy. Importantly, we document that a higher proportion of expressions of surprise and uncertainty increases the likelihood of an easier policy stance. By contrast, a higher proportion of expressions of confirmation tends to increase the likelihood of a tighter policy stance.
    Keywords: Central Banks, Monetary Policy, Communication, Minutes, Uncertainty
    JEL: E52 E58 C55
    Date: 2022
  7. By: Xiaohong Chen; Yuan Liao; Weichen Wang
    Abstract: General nonlinear sieve learnings are classes of nonlinear sieves that can approximate nonlinear functions of high dimensional variables much more flexibly than various linear sieves (or series). This paper considers general nonlinear sieve quasi-likelihood ratio (GN-QLR) based inference on expectation functionals of time series data, where the functionals of interest are based on some nonparametric functions that satisfy conditional moment restrictions and are learned using multilayer neural networks. While the asymptotic normality of the estimated functionals depends on some unknown Riesz representer of the functional space, we show that the optimally weighted GN-QLR statistic is asymptotically Chi-square distributed, regardless of whether the expectation functional is regular (root-$n$ estimable) or not. This holds when the data are weakly dependent and satisfy a beta-mixing condition. We apply our method to the off-policy evaluation in reinforcement learning, by formulating the Bellman equation into the conditional moment restriction framework, so that we can make inference about the state-specific value functional using the proposed GN-QLR method with time series data. In addition, estimation of the averaged partial means and averaged partial derivatives of nonparametric instrumental variables and quantile IV models is also presented as a leading example. Finally, a Monte Carlo study shows the finite sample performance of the procedure.
    Date: 2022–12
  8. By: Szabolcs Nagy; Noemi Hajdu
    Abstract: The rapid development of technology has drastically changed the way consumers do their shopping. The volume of global online commerce has been increasing significantly, partly due to the recent COVID-19 crisis that has accelerated the expansion of e-commerce. A growing number of webshops integrate Artificial Intelligence (AI), a state-of-the-art technology, into their stores to improve customer experience, satisfaction and loyalty. However, little research has been done to verify the process of how consumers adopt and use AI-powered webshops. Using the technology acceptance model (TAM) as a theoretical background, this study addresses the question of trust and consumer acceptance of Artificial Intelligence in online retail. An online survey in Hungary was conducted to build a database of 439 respondents for this study. To analyse data, structural equation modelling (SEM) was used. After the respecification of the initial theoretical model, a nested model, which was also based on TAM, was developed and tested. The widely used TAM was found to be a suitable theoretical model for investigating consumer acceptance of the use of Artificial Intelligence in online shopping. Trust was found to be one of the key factors influencing consumer attitudes towards Artificial Intelligence. Perceived usefulness, the other key factor in attitudes and behavioural intention, was found to be more important than perceived ease of use. These findings offer valuable implications for webshop owners to increase customer acceptance.
    Date: 2022–12
  9. By: Huyên Pham (UPD7 - Université Paris Diderot - Paris 7, LPSM (UMR_8001) - Laboratoire de Probabilités, Statistique et Modélisation - UPD7 - Université Paris Diderot - Paris 7 - SU - Sorbonne Université - CNRS - Centre National de la Recherche Scientifique); Xavier Warin (EDF R&D - EDF R&D - EDF - EDF, FiME Lab - Laboratoire de Finance des Marchés d'Energie - Université Paris Dauphine-PSL - PSL - Université Paris sciences et lettres - CREST - EDF R&D - EDF R&D - EDF - EDF)
    Abstract: This paper is devoted to the numerical resolution of McKean-Vlasov control problems via the class of mean-field neural networks introduced in our companion paper [25] in order to learn the solution on the Wasserstein space. We propose several algorithms either based on dynamic programming with control learning by policy or value iteration, or backward SDE from stochastic maximum principle with global or local loss functions. Extensive numerical results on different examples are presented to illustrate the accuracy of each of our eight algorithms. We discuss and compare the pros and cons of all the tested methods.
    Keywords: McKean-Vlasov control, mean-field neural networks, learning on Wasserstein space, dynamic programming, backward SDE, deep learning algorithms
    Date: 2022–12–21
  10. By: Fateme Shahabi Nejad; Mohammad Mehdi Ebadzadeh
    Abstract: Applying machine learning methods to forecast stock prices has been one of the research topics of interest in recent years. Few studies based on generative adversarial networks (GANs) have been reported in this area, but their results are promising. GANs are powerful generative models successfully applied in different areas but suffer from inherent challenges such as training instability and mode collapse. Also, a primary concern is capturing correlations in stock prices. Therefore, our challenges fall into two main categories: capturing correlations and inherent problems of GANs. In this paper, we introduce a novel framework based on DRAGAN and feature matching for stock price forecasting, which improves training stability and alleviates mode collapse. We employ windowing so that the generator can acquire temporal correlations. We also exploit conditioning on discriminator inputs to capture temporal correlations and correlations between prices and features. Experimental results on data from several stocks indicate that our proposed method outperforms long short-term memory (LSTM) as a baseline method, as well as basic GANs and WGAN-GP as two different variants of GANs.
    Date: 2023–01
  11. By: Yves-Cédric Bauwelinckx; Jan Dhaene; Tim Verdonck; Milan van den Heuvel
    Abstract: Modeling lies at the core of both the financial and the insurance industry for a wide variety of tasks. The rise and development of machine learning and deep learning models have created many opportunities to improve our modeling toolbox. Breakthroughs in these fields often come with the requirement of large amounts of data. Such large datasets are often not publicly available in finance and insurance, mainly due to privacy and ethics concerns. This lack of data is currently one of the main hurdles in developing better models. One possible option for alleviating this issue is generative modeling. Generative models are capable of simulating fake but realistic-looking data, also referred to as synthetic data, that can be shared more freely. Generative Adversarial Networks (GANs) are one such class of models that increases our capacity to fit very high-dimensional distributions of data. While research on GANs is an active topic in fields like computer vision, they have found limited adoption within the human sciences, like economics and insurance. The reason for this is that in these fields, most questions are inherently about identification of causal effects, while to this day neural networks, which are at the center of the GAN framework, focus mostly on high-dimensional correlations. In this paper we study the causal preservation capabilities of GANs and whether the produced synthetic data can reliably be used to answer causal questions. This is done by performing causal analyses on the synthetic data, produced by a GAN, with increasingly more lenient assumptions. We consider the cross-sectional case, the time series case and the case with a complete structural model. It is shown that in the simple cross-sectional scenario where correlation equals causation the GAN preserves causality, but that challenges arise for more advanced analyses.
    Date: 2023–01
  12. By: Jiwon Kim; Moon-Ju Kang; KangHun Lee; HyungJun Moon; Bo-Kwan Jeon
    Abstract: Recently, there have been many attempts to apply reinforcement learning to asset allocation in order to earn more stable profits. In this paper, we compare the performance of several reinforcement learning algorithms: actor-only, actor-critic and PPO models. Furthermore, we analyze each model's characteristics and then introduce an advanced algorithm, the so-called Reward Clipping model. It seems that the Reward Clipping model is better than other existing models in the finance domain, especially in portfolio optimization, as it has strength in both bull and bear markets. Finally, we compare the performance of these models with traditional investment strategies during decreasing and increasing markets.
    Date: 2023–01
  13. By: Tanja Aue; Adam Jatowt; Michael Färber
    Abstract: Environmental, social and governance (ESG) engagement of companies has moved into the focus of public attention over recent years. With the requirements of compulsory reporting being implemented and investors incorporating sustainability in their investment decisions, the demand for transparent and reliable ESG ratings is increasing. However, automatic approaches for forecasting ESG ratings have been quite scarce despite the increasing importance of the topic. In this paper, we build a model to predict ESG ratings from news articles using the combination of multivariate timeseries construction and deep learning techniques. A news dataset for about 3,000 US companies together with their ratings is also created and released for training. Through the experimental evaluation we find that our approach provides accurate results outperforming the state-of-the-art, and can be used in practice to support a manual determination or analysis of ESG ratings.
    Date: 2022–11
  14. By: Eugenia Go (World Bank); Kentaro Nakajima (Institute of Innovation Research, Hitotsubashi University); Yasuyuki Sawada (Faculty of Economics, The University of Tokyo); Kiyoshi Taniguchi (Asian Development Bank)
    Abstract: Spatially and seasonally granular measures of local economic activities are increasingly required in a variety of economic analyses. We propose using novel vehicle density data obtained from daytime satellite images to quantify the local economic activity involving human and goods traffic flows. Validation exercises show that vehicle density is a good proxy for local economic levels. We then apply our data to evaluate the impact of a new international airport terminal opening in the Philippines on local economies. The results show that the opening of the new terminal has spatially and seasonally heterogeneous impacts that conventional data cannot capture.
    Date: 2023–01
  15. By: Scoggins, Bermond; Robertson, Matthew P.
    Abstract: The scientific method is predicated on transparency - yet the pace at which transparent research practices are being adopted by the scientific community is slow. The replication crisis in psychology showed that published findings employing statistical inference are threatened by undetected errors, data manipulation, and data falsification. To mitigate these problems and bolster research credibility, open data and preregistration practices have gained traction in the natural and social sciences. However, the extent of their adoption in different disciplines is unknown. We introduce procedures to identify the transparency of a research field using large-scale text analysis and machine learning classifiers. Using political science and international relations as an illustrative case, we examine 93,931 articles across the top 160 political science and international relations journals between 2010 and 2021. We find that approximately 21% of all statistical inference papers have open data and 5% of all experiments are preregistered. Despite this shortfall, the example of leading journals in the field shows that change is feasible and can be effected quickly.
    Date: 2023
  16. By: Stéphane Goutte (Université Paris-Saclay); Viet Hoang Le (Université Paris-Saclay); Fei Liu (IPAG Business School); Hans-Jörg von Mettenheim (IPAG Business School)
    Abstract: We analyze the predictability of news sentiment (both general news and ESG-related news) on the return of European stocks and the potential of applying it as a trading strategy over seven years from 2015 to 2022. We find that sentiment indicators extracted from news supplied by GDELT such as Tone, Polarity, and Activity Density show significant relationships to the return of the stock price. Those relationships can be exploited, even in the most naive way, to create trading strategies that can be profitable and outperform the market. Furthermore, those indicators can be used as inputs for more sophisticated machine learning algorithms to create even better-performing trading strategies. Among the indicators, those extracted from ESG-related news tend to show better performance in both cases: when they are used naively or as inputs for machine learning algorithms.
    Keywords: ESG, Stock Market Prediction, Sentiment Analysis, Machine Learning, Big Data, GDELT
    Date: 2023–01–01
  17. By: Stéphane Goutte (Université Paris-Saclay); Viet Hoang Le (Université Paris-Saclay); Fei Liu (IPAG Business School); Hans-Jörg von Mettenheim (IPAG Business School)
    Abstract: A large number of modern practices in financial forecasting rely on technical analysis, which involves several heuristic techniques for visual pattern recognition in price charts as well as other technical indicators. In this study, we aim to investigate the potential use of this technical information (candlestick information as well as technical indicators) as inputs for machine learning models, especially state-of-the-art deep learning algorithms, to generate trading signals. To properly address this problem, we conduct empirical research that applies several machine learning methods to 5 years of Bitcoin hourly data from 2017 to 2022. From the results of our study, we confirm the potential of trading strategies using machine learning approaches. We also find that among several machine learning models, deep learning models, specifically recurrent neural networks, tend to outperform the others in time-series prediction.
    Keywords: Bitcoin, Technical Analysis, Machine Learning, Deep Learning, Convolutional Neural Networks, Recurrent Neural Network
    Date: 2023–01–01

This nep-big issue is ©2023 by Tom Coupé. It is provided as is without any express or implied warranty. It may be freely redistributed in whole or in part for any purpose. If distributed in part, please include this notice.
General information on the NEP project can be found at For comments please write to the director of NEP, Marco Novarese at <>. Put “NEP” in the subject, otherwise your mail may be rejected.
NEP’s infrastructure is sponsored by the School of Economics and Finance of Massey University in New Zealand.