nep-big New Economics Papers
on Big Data
Issue of 2022‒08‒08
nineteen papers chosen by
Tom Coupé
University of Canterbury

  1. Monetary policy press releases: an international comparison By Mario Gonzalez; Raul Cruz Tadle
  2. Using Artificial Intelligence in the workplace: What are the main ethical risks? By Angelica Salvi del Pero; Peter Wyckoff; Ann Vourc'h
  3. The Exports of Knowledge Intensive Services. A Complex Metric Approach By Leogrande, Angelo; Costantiello, Alberto; Laureti, Lucio
  4. Predicting Stock Price Movement after Disclosure of Corporate Annual Reports: A Case Study of 2021 China CSI 300 Stocks By Fengyu Han; Yue Wang
  5. Narrative-Driven Fluctuations in Sentiment: Evidence Linking Traditional and Social Media By Macaulay, Alistair; Song, Wenting
  6. Deep Reinforcement Learning for Optimal Investment and Saving Strategy Selection in Heterogeneous Profiles: Intelligent Agents working towards retirement By Fatih Ozhamaratli; Paolo Barucca
  7. Dynamics of a Binary Option Market with Exogenous Information and Price Sensitivity By Hannah Gampe; Christopher Griffin
  8. Deep Multiple Instance Learning For Forecasting Stock Trends Using Financial News By Yiqi Deng; Siu Ming Yiu
  9. Robust Knockoffs for Controlling False Discoveries With an Application to Bond Recovery Rates By Konstantin G\"orgen; Abdolreza Nazemi; Melanie Schienle
  10. A Constructive GAN-based Approach to Exact Estimate Treatment Effect without Matching By Boyang You; Kerry Papps
  11. Prediction Machines, Insurance, and Protection: An Alternative Perspective on AI's Role in Production By Ajay K. Agrawal; Joshua S. Gans; Avi Goldfarb
  12. The Prediction of Diabetes By Massaro, Alessandro; Magaletti, Nicola; Cosoli, Gabriele; Giardinelli, Vito O. M.; Leogrande, Angelo
  13. Quantum Monte Carlo for Economics: Stress Testing and Macroeconomic Deep Learning By Vladimir Skavysh; Sofia Priazhkina; Diego Guala; Thomas Bromley
  14. On the universality of the volatility formation process: when machine learning and rough volatility agree By Mathieu Rosenbaum; Jianfei Zhang
  15. Dynamic Early Warning and Action Model By Mueller, H.; Rauh, C.; Ruggieri, A.;
  16. Safe-FinRL: A Low Bias and Variance Deep Reinforcement Learning Implementation for High-Freq Stock Trading By Zitao Song; Xuyang Jin; Chenliang Li
  17. Applications of Reinforcement Learning in Finance -- Trading with a Double Deep Q-Network By Frensi Zejnullahu; Maurice Moser; Joerg Osterrieder
  18. A Data Science Pipeline for Algorithmic Trading: A Comparative Study of Applications for Finance and Cryptoeconomics By Luyao Zhang; Tianyu Wu; Saad Lahrichi; Carlos-Gustavo Salas-Flores; Jiayi Li
  19. Communication, monetary policy, and financial markets in Mexico By Ana Aguilar; Fernando Pérez-Cervantes

  1. By: Mario Gonzalez; Raul Cruz Tadle
    Abstract: Around the world, several countries have adopted inflation targeting as their monetary policy framework. These institutions set their target interest rates in monetary policy meetings. These decisions are then circulated through press releases that explain the policy rationale. The information contained in the press releases includes current policies, economic outlook, and signals about likely future policies. In this paper, using linguistic methods, such as Latent Dirichlet Allocation (LDA) and semi-automated content analysis, we examine the information contained in the monetary press releases of inflation targeting countries. In addition, we build a custom dictionary for analyzing monetary policy press releases. Using Semi-automated Content Analysis, we then develop a measure, which we refer to as the Sentiment Score index, that quantifies the policy tilt implied in the information provided in the press releases. We find that for a significant majority of the inflation targeting countries, the index provides additional information that helps predict monetary policy rate movements.
    Keywords: central bank, financial market, monetary policy, communication
    JEL: E44 E52 E58
    Date: 2022–06
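A dictionary-based policy-tilt score of the kind the abstract describes can be sketched in a few lines. The hawkish/dovish word lists below are illustrative stand-ins, not the custom dictionary built in the paper, and the (hawkish − dovish) / (hawkish + dovish) normalization is one plausible construction of such an index:

```python
# Minimal sketch of a dictionary-based sentiment score for a press
# release. The word lists are illustrative assumptions, not the
# paper's custom monetary-policy dictionary.
HAWKISH = {"tighten", "inflationary", "raise", "overheating", "hike"}
DOVISH = {"ease", "accommodative", "lower", "slack", "cut"}

def sentiment_score(text: str) -> float:
    """Return (hawkish - dovish) / (hawkish + dovish) word counts."""
    words = [w.strip(".,") for w in text.lower().split()]
    h = sum(w in HAWKISH for w in words)
    d = sum(w in DOVISH for w in words)
    return 0.0 if h + d == 0 else (h - d) / (h + d)

release = "The board decided to raise rates to tighten policy amid inflationary pressure."
print(sentiment_score(release))  # positive score -> hawkish tilt
```

A positive score signals a hawkish tilt, a negative score a dovish one; the paper's index is then tested for predictive content over policy rate moves.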
  2. By: Angelica Salvi del Pero (OECD); Peter Wyckoff (OECD); Ann Vourc'h
    Abstract: Artificial Intelligence (AI) systems are changing workplaces. AI systems have the potential to improve workplaces, but ensuring trustworthy use of AI in the workplace means addressing the ethical risks it can raise. This paper reviews possible risks in terms of human rights (privacy, fairness, agency and dignity); transparency and explainability; robustness, safety and security; and accountability. The paper also reviews ongoing policy action to promote trustworthy use of AI in the workplace. Existing legislation to ensure ethical workplaces must be enforced effectively, and serve as the foundation for new policy. Economy- and society-wide initiatives on AI, such as the EU AI Act and standard-setting, can also play a role. New workplace-specific measures and collective agreements can help fill remaining gaps.
    JEL: J01 J08 J2 J7 O3
    Date: 2022–07–08
  3. By: Leogrande, Angelo; Costantiello, Alberto; Laureti, Lucio
    Abstract: In the following article, the value of the "Knowledge Intensive Services Exports in Europe" in 36 European countries is estimated. The data were analyzed through a set of econometric models, namely: Pooled OLS, Dynamic Panel, Panel Data with Fixed Effects, Panel Data with Random Effects, and WLS. The results show that "Knowledge Intensive Services Exports" is negatively associated, among others, with "Buyer Sophistication" and "Government Procurement of Advanced Technology Products", and positively associated with variables such as "Innovation Index", "Sales Impacts" and "Total Entrepreneurial Activity". A clustering with the k-Means algorithm was then performed, with the number of clusters chosen by the Elbow method; the results show the presence of 3 clusters. A network analysis was later carried out, and 4 complex network structures and three structures with simplified networks were detected. To predict the future trend of the variable, a comparison was made among eight different machine learning algorithms. The results show that prediction with Augmented Data (AD) is more efficient than prediction with Original Data (OD), with a reduction of the mean of statistical errors equal to 55.94%.
    Keywords: Innovation, and Invention: Processes and Incentives; Management of Technological Innovation and R&D; Diffusion Processes; Open Innovation.
    JEL: O3 O30 O31 O32 O33 O34
    Date: 2022–06–11
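The k-Means-plus-Elbow step mentioned in the abstract can be illustrated with a tiny pure-Python version on one-dimensional toy data. The data, the deterministic initialization, and the k range are all illustrative assumptions, not the paper's setup:

```python
# Tiny pure-Python k-means with an elbow check on 1-D toy data.
# Rough sketch of the clustering step in the abstract; data and
# initialization are illustrative, not the paper's.

def kmeans(points, k, iters=50):
    # deterministic init: k evenly spaced points from the sorted data
    pts = sorted(points)
    centers = [pts[i * (len(pts) - 1) // max(k - 1, 1)] for i in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in pts:
            j = min(range(k), key=lambda i: (p - centers[i]) ** 2)
            clusters[j].append(p)
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    sse = sum(min((p - c) ** 2 for c in centers) for p in pts)
    return centers, sse

data = [1.0, 1.2, 0.8, 5.0, 5.1, 4.9, 9.0, 9.2, 8.8]
sses = {k: kmeans(data, k)[1] for k in range(1, 5)}
# Elbow: SSE drops sharply up to k=3, then flattens.
print(sses)
```

On this toy data the within-cluster SSE falls steeply until k=3 and barely improves at k=4, so the elbow picks three clusters, mirroring the 3-cluster result the abstract reports.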
  4. By: Fengyu Han; Yue Wang
    Abstract: In the current stock market, computer science and technology are increasingly widely used to analyse stocks. Unlike most related machine-learning work on stock price prediction, this work studies the prediction of the stock price tendency on the second day after the disclosure of a company's annual report. We use a variety of models, including decision trees, logistic regression, random forests, neural networks, and prototypical networks. We conduct experiments with two sets of financial indicators (key and expanded), obtained from the EastMoney website where they are disclosed by companies, and find that these models do not predict the tendency well. In addition, we also filter stocks with ROE greater than 0.15 and net cash ratio greater than 0.9. We conclude that, based on the financial indicators in the company's just-released annual report, the predictability of the stock price movement on the second day after disclosure is weak, with maximum accuracy of about 59.6% and maximum precision of about 0.56 on our test set, achieved by the random forest classifier; the stock filtering does not improve performance. Random forests perform best overall among these models, which is consistent with the findings of some prior work.
    Date: 2022–06
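A single decision stump, one threshold split, can stand in for one tree of the random forest the abstract evaluates. The features (ROE) and next-day up/down labels below are invented toy examples, not the paper's CSI 300 data:

```python
# A one-split decision stump -- a toy stand-in for a single tree in a
# random forest classifier. Samples are made-up (ROE, next-day up=1 /
# down=0) pairs, not the paper's data.

def fit_stump(samples):
    """Pick the ROE threshold that best separates up (1) from down (0)."""
    best = (None, -1.0)
    for thresh in sorted({roe for roe, _ in samples}):
        acc = sum((roe >= thresh) == bool(y) for roe, y in samples) / len(samples)
        if acc > best[1]:
            best = (thresh, acc)
    return best  # (threshold, training accuracy)

toy = [(0.05, 0), (0.08, 0), (0.16, 1), (0.20, 1), (0.12, 0), (0.18, 1)]
thresh, acc = fit_stump(toy)
print(thresh, acc)
```

A random forest repeats this kind of split search on bootstrapped samples and random feature subsets, then votes across trees; the abstract's point is that even this stronger ensemble barely beats chance on the real task.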
  5. By: Macaulay, Alistair; Song, Wenting
    Abstract: This paper studies the role of narratives for macroeconomic fluctuations. Microfounding narratives as directed acyclic graphs, we show how exposure to different narratives can affect expectations in an otherwise-standard macroeconomic framework. We identify such competing narratives in news media reports on the US yield curve inversion in 2019, using techniques in natural language processing. Linking this to data from Twitter, we show that exposure to the narrative of an imminent recession causes consumers to display a more pessimistic sentiment, while exposure to a more neutral narrative implies no such change in sentiment. Applying the same technique to media narratives on inflation, we estimate that a shift to a viral narrative of inflation damaging the real economy in 2021 accounts for 42% of the fall in consumer sentiment in the second half of the year.
    Keywords: economic narratives, sentiment, yield curve, inflation, natural language processing, twitter, social media
    JEL: C8 D8 D84 D91 E3 E31 E32 E4 E43 E44 E5 G1
    Date: 2022–06–29
  6. By: Fatih Ozhamaratli (University College London); Paolo Barucca (University College London)
    Abstract: The transition from defined benefit to defined contribution pension plans shifts the responsibility for saving toward retirement from governments and institutions to individuals. Determining the optimal saving and investment strategy for individuals is paramount for a stable financial position and for avoiding poverty during working life and retirement, and it is a particularly challenging task in a world where the forms of employment and income trajectories experienced by different occupation groups are highly diversified. We introduce a model in which agents learn optimal portfolio allocation and saving strategies suitable for their heterogeneous profiles. We use deep reinforcement learning to train the agents. The environment is calibrated with occupation- and age-dependent income evolution dynamics. The research focuses on heterogeneous income trajectories dependent on agent profiles and incorporates the behavioural parameterisation of agents. The model provides a flexible methodology to estimate lifetime consumption and investment choices for heterogeneous profiles under varying scenarios.
    Date: 2022–06
  7. By: Hannah Gampe; Christopher Griffin
    Abstract: In this paper, we derive and analyze a continuous model of a binary option market with exogenous information. The resulting non-linear system has a discontinuous right-hand side, which can be analyzed using zero-dimensional Filippov surfaces. Under general assumptions on purchasing rules, we show that when exogenous information is constant in the binary asset market, the price always converges. We then investigate market prices in the case of changing information, showing empirically that price sensitivity has a strong effect on the lag of the price behind the information. We conclude with open questions on general $n$-ary option markets. As a by-product of the analysis, we show that these markets are equivalent to a simple recurrent neural network, helping to explain some of the predictive power associated with prediction markets, which are usually designed as $n$-ary option markets.
    Date: 2022–05
  8. By: Yiqi Deng; Siu Ming Yiu
    Abstract: Financial news articles are a major source of information and are correlated with fluctuations in stock trends. In this paper, we investigate the influence of financial news on stock trends from a multi-instance view. The intuition behind this is the uncertainty over the varying intervals at which news occurs and the lack of annotation for each individual news item. Under the Multiple Instance Learning (MIL) scenario, where training instances are arranged in bags and a label is assigned to the entire bag rather than to individual instances, we develop a flexible and adaptive multi-instance learning model and evaluate its ability to forecast the directional movement of the Standard & Poor's 500 index on a financial news dataset. Specifically, we treat each trading day as one bag, with the news items occurring on that trading day as the instances in the bag. Experimental results demonstrate that our proposed multi-instance-based framework achieves outstanding trend-prediction accuracy compared with state-of-the-art approaches and baselines.
    Date: 2022–06
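The bag-of-instances setup the abstract describes can be sketched with a simple pooling rule: score each news item, then label the whole trading day from the pooled scores. Max-pooling and the 0.5 cutoff below are illustrative assumptions; the paper's model learns a more flexible aggregation:

```python
# Sketch of the multiple-instance setup: each trading day is a bag of
# news items; per-item scores are pooled (here by max) into one bag
# prediction. Scores and the 0.5 cutoff are illustrative assumptions,
# not the paper's learned model.

def predict_bag(instance_scores, cutoff=0.5):
    """Label the bag 'up' (1) if any instance is sufficiently bullish."""
    return int(max(instance_scores) >= cutoff)

day_bags = {
    "2022-06-01": [0.1, 0.7, 0.3],   # one strongly bullish article
    "2022-06-02": [0.2, 0.4],        # no article clears the cutoff
}
for day, scores in day_bags.items():
    print(day, predict_bag(scores))
```

The key MIL property is visible even in this sketch: the label attaches to the day, so no per-article annotation is needed.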
  9. By: Konstantin G\"orgen; Abdolreza Nazemi; Melanie Schienle
    Abstract: We address challenges in variable selection with highly correlated data that are frequently present in finance and economics, but also in complex natural systems such as the weather. We develop a robustified version of the knockoff framework, which addresses challenges with high dependence among possibly many influencing factors and strong time correlation. In particular, the repeated subsampling strategy tackles the variability of the knockoffs and the dependency of factors. Simultaneously, we control the proportion of false discoveries over a grid of all possible values, which mitigates the variability of selected factors arising from ad-hoc choices of a specific false discovery level. In the application to corporate bond recovery rates, we identify new important groups of relevant factors on top of the known standard drivers. But we also show that out-of-sample, the resulting sparse model has predictive power similar to state-of-the-art machine learning models that use the entire set of predictors.
    Date: 2022–06
  10. By: Boyang You; Kerry Papps
    Abstract: Matching has become the mainstream approach in counterfactual inference, with which selection bias between sample groups can be significantly reduced. In practice, however, when estimating the average treatment effect on the treated (ATT) via matching, whichever method is used, a trade-off between estimation accuracy and information loss persists. Attempting to replace the matching process entirely, this paper proposes the GAN-ATT estimator, which integrates a generative adversarial network (GAN) into the counterfactual inference framework. Through GAN machine learning, the probability density functions (PDFs) of samples in both the treatment and control groups can be approximated. By differentiating the conditional PDFs of the two groups under identical input conditions, the conditional average treatment effect (CATE) can be estimated, and the ensemble average of the corresponding CATEs over all treatment-group samples is the estimate of the ATT. Using GAN-based infinite sample augmentation, problems arising from insufficient samples or the lack of common support can be easily solved. Theoretically, if the GAN perfectly learns the PDFs, our estimator provides an exact estimate of the ATT. To check the performance of the GAN-ATT estimator, three data sets are used for ATT estimation: two toy data sets with one- or two-dimensional covariate inputs and constant or covariate-dependent treatment effects, on which the GAN-ATT estimates are shown to be close to the ground truth and better than those of traditional matching approaches; and a real firm-level data set with high-dimensional inputs, on which applicability to real data is evaluated by comparison with matching approaches. From the evidence obtained in these three tests, we believe that the GAN-ATT estimator has significant advantages over traditional matching methods in estimating the ATT.
    Date: 2022–06
  11. By: Ajay K. Agrawal; Joshua S. Gans; Avi Goldfarb
    Abstract: Recent advances in AI represent improvements in prediction. We examine how decision-making and risk management strategies change when prediction improves. The adoption of AI may cause substitution away from risk management activities used when rules are applied (rules require always taking the same action), instead allowing for decision-making (choosing actions based on the predicted state). We provide a formal model evaluating the impact of AI and how risk management, stakes, and inter-related tasks affect AI adoption. The broad conclusion is that AI adoption can be stymied by existing processes designed to address uncertainty. In particular, many processes are designed to enable coordinated decision-making among different actors in an organization. AI can make coordination even more challenging. However, when the cost of changing such processes falls, then the returns from AI adoption increase.
    JEL: D81 O32
    Date: 2022–06
  12. By: Massaro, Alessandro; Magaletti, Nicola; Cosoli, Gabriele; Giardinelli, Vito O. M.; Leogrande, Angelo
    Abstract: The following article presents an analysis of the determinants of diabetes using a dataset containing the surveys of 2000 patients from the Frankfurt Hospital in Germany. The data were analyzed using the following models, namely: Tobit, Probit, Logit, Multinomial Logit, OLS, and WLS with heteroskedasticity. The results show that the presence of diabetes is positively associated with "Pregnancies", "Glucose", "BMI", "Diabetes Pedigree Function" and "Age", and negatively associated with "Blood Pressure". A cluster analysis was performed using the fuzzy c-Means algorithm, with the number of clusters chosen by the Elbow method; three clusters were found. Finally, eight different machine learning algorithms were compared to select the best-performing algorithm for predicting the probability of patients developing diabetes.
    Keywords: Machine Learning, Clusterization, Elbow Method, Prediction, Correlation Matrix, Principal Component Analysis, Binary and non-Binary regression models.
    JEL: I10 I11 I12 I13 I14 I15 I18
    Date: 2022–06–13
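The Logit model in the abstract's list can be illustrated with a bare-bones logistic regression fit by gradient descent on a one-feature toy sample. The (standardized glucose, diabetic) pairs below are invented for illustration, not the Frankfurt Hospital data:

```python
import math

# Bare-bones logistic (Logit) regression fit by gradient descent on an
# invented glucose -> diabetes toy sample; one feature plus intercept.
# Data and learning rate are illustrative assumptions.

def fit_logit(xs, ys, lr=0.1, steps=2000):
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        grad_w = grad_b = 0.0
        for x, y in zip(xs, ys):
            p = 1 / (1 + math.exp(-(w * x + b)))  # predicted probability
            grad_w += (p - y) * x
            grad_b += p - y
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b

# standardized glucose values; 1 = diabetic
xs = [-1.5, -1.0, -0.5, 0.5, 1.0, 1.5]
ys = [0, 0, 0, 1, 1, 1]
w, b = fit_logit(xs, ys)
predict = lambda x: 1 / (1 + math.exp(-(w * x + b))) >= 0.5
print(w > 0, [predict(x) for x in xs])
```

A positive fitted slope on glucose corresponds to the positive association the abstract reports; the sign and significance of each coefficient is what the Tobit/Probit/Logit family is used to establish.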
  13. By: Vladimir Skavysh; Sofia Priazhkina; Diego Guala; Thomas Bromley
    Abstract: Computational methods both open the frontiers of economic analysis and serve as a bottleneck in what can be achieved. Using the quantum Monte Carlo (QMC) algorithm, we are the first to study whether quantum computing can improve the run time of economic applications and the challenges in doing so. We identify a large class of economic problems suitable for improvements. We then illustrate how to formulate and encode two applications on a quantum circuit: (a) a bank stress testing model with credit shocks and fire sales and (b) a dynamic stochastic general equilibrium (DSGE) model solved with deep learning, and further demonstrate potential efficiency gains. We also present a few innovations in the QMC algorithm itself and in how to benchmark it against classical MC.
    Keywords: Business fluctuations and cycles; Central bank research; Econometric and statistical methods; Economic models; Financial stability
    Date: 2022–06
  14. By: Mathieu Rosenbaum; Jianfei Zhang
    Abstract: We train an LSTM network on a pooled dataset made of hundreds of liquid stocks, aiming to forecast the next daily realized volatility for all stocks. Showing the consistent outperformance of this universal LSTM relative to asset-specific parametric models, we uncover nonparametric evidence of a universal volatility formation mechanism across assets, relating past market realizations, including daily returns and volatilities, to current volatilities. A parsimonious parametric forecasting device combining the rough fractional stochastic volatility and quadratic rough Heston models with fixed parameters achieves the same level of performance as the universal LSTM, which confirms the universality of the volatility formation process from a parametric perspective.
    Date: 2022–06
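The forecasting target and a naive baseline can be sketched as follows: realized volatility computed from squared returns, forecast as a power-law-weighted average of its own past. The decay exponent is an illustrative assumption that loosely echoes rough-volatility-style long memory; it is not the paper's LSTM or calibrated parametric model:

```python
# Sketch of the target (daily realized volatility) and a naive
# long-memory predictor. The power-law decay exponent is an
# illustrative assumption, not the paper's model.

def realized_vol(returns):
    """Realized volatility: square root of the sum of squared returns."""
    return sum(r * r for r in returns) ** 0.5

def forecast_vol(past_vols, alpha=0.6):
    # weight lag k (0 = most recent) by (k + 1) ** -alpha, then normalize
    weights = [(k + 1) ** -alpha for k in range(len(past_vols))]
    num = sum(w * v for w, v in zip(weights, reversed(past_vols)))
    return num / sum(weights)

daily_returns = [[0.01, -0.02, 0.015], [0.005, 0.007, -0.004], [0.03, -0.01, 0.02]]
vols = [realized_vol(rs) for rs in daily_returns]
print(forecast_vol(vols))
```

Slowly decaying weights on past volatilities are the hand-coded analogue of the long-range dependence that, per the abstract, both the universal LSTM and the rough-volatility models capture.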
  15. By: Mueller, H.; Rauh, C.; Ruggieri, A.;
    Abstract: This document presents the outcome of two modules developed for the UK Foreign, Commonwealth and Development Office (FCDO): 1) a forecast model which uses machine learning and text downloads to predict outbreaks and the intensity of internal armed conflict; 2) a decision-making module that embeds these forecasts into a model of preventing armed conflict damages. The outcome is a quantitative benchmark which should provide a testing ground for internal FCDO debates at both the strategic level (i.e. the process of deciding on country priorities) and the operational level (i.e. the identification of critical periods by country experts). Our method allows the FCDO to simulate policy interventions and changes in its strategic focus. We show, for example, that the FCDO should remain engaged in recently stabilized armed conflicts and re-think its development focus in the countries with the highest risks. The total expected economic benefit of reinforced preventive efforts, as defined in this report, would bring monthly savings in expected costs of 26 billion USD, with a monthly gain to the UK of 630 million USD.
    Keywords: dynamic optimisation, forecasting, internal armed conflict, prevention
    Date: 2022–06–14
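The decision layer that sits on top of the forecasts can be reduced to an expected-loss rule: act preventively where predicted risk times expected damage exceeds the cost of intervening. The probabilities, damages, and costs below are invented numbers, not FCDO figures or the paper's calibration:

```python
# Toy expected-loss decision rule on top of conflict-risk forecasts.
# All numbers are invented for illustration, not the paper's
# calibration or FCDO figures.

def should_intervene(risk, damage, cost):
    """Intervene iff expected damage (risk * damage) exceeds the cost."""
    return risk * damage > cost

countries = {
    "A": (0.30, 100.0, 20.0),  # expected damage 30 > cost 20 -> intervene
    "B": (0.05, 100.0, 20.0),  # expected damage  5 < cost 20 -> wait
}
decisions = {c: should_intervene(*p) for c, p in countries.items()}
print(decisions)
```

The paper's module is richer (dynamic, with persistence of conflict), but this is the basic comparison that turns a risk forecast into a country-priority decision.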
  16. By: Zitao Song; Xuyang Jin; Chenliang Li
    Abstract: In recent years, many practitioners in quantitative finance have attempted to use Deep Reinforcement Learning (DRL) to build better quantitative trading (QT) strategies. Nevertheless, many existing studies fail to address several serious challenges, such as the non-stationary financial environment and the bias-variance trade-off when applying DRL in a real financial market. In this work, we propose Safe-FinRL, a novel DRL-based high-frequency stock trading strategy enhanced by a near-stationary financial environment and low-bias, low-variance estimation. Our main contributions are twofold: firstly, we separate the long financial time series into near-stationary short environments; secondly, we implement Trace-SAC in the near-stationary financial environment by incorporating the general Retrace operator into the Soft Actor-Critic. Extensive experiments on the cryptocurrency market demonstrate that Safe-FinRL provides a stable value estimation and a steady policy improvement, and significantly reduces bias and variance in the near-stationary financial environment.
    Date: 2022–06
  17. By: Frensi Zejnullahu; Maurice Moser; Joerg Osterrieder
    Abstract: This paper presents a Double Deep Q-Network algorithm for trading single assets, namely the E-mini S&P 500 continuous futures contract. We use a proven setup as the foundation for our environment, with multiple extensions. The features of our trading agent are progressively expanded to include additional assets such as commodities, resulting in four models. We also respond to environmental conditions, including costs and crises. Our trading agent is first trained on a specific time period, then tested on new data and compared with the long-and-hold strategy as a benchmark (the market). We analyze the differences between the various models and the in-sample/out-of-sample performance with respect to the environment. The experimental results show that the trading agent follows an appropriate behavior. It can adjust its policy to different circumstances, such as more extensive use of the neutral position when trading costs are present. Furthermore, the net asset value exceeded that of the benchmark, and the agent outperformed the market on the test set. We provide initial insights into the behavior of an agent in a financial domain using a DDQN algorithm. The results of this study can be used for further development.
    Date: 2022–06
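The core double-DQN idea, the online network selects the next action while the target network evaluates it, fits in a few lines. Tabular dicts stand in for the two networks here; the states, actions, rewards, and discount are illustrative, not the paper's trading environment:

```python
# The double-DQN target: online net picks argmax action, target net
# evaluates it. Tabular dicts stand in for the two networks; values
# are illustrative assumptions.

def ddqn_target(reward, next_state, q_online, q_target, gamma=0.99, done=False):
    if done:
        return reward
    best_a = max(q_online[next_state], key=q_online[next_state].get)
    return reward + gamma * q_target[next_state][best_a]

# trading-flavoured toy actions: long / neutral / short
q_online = {"s1": {"long": 1.0, "neutral": 0.5, "short": -0.2}}
q_target = {"s1": {"long": 0.8, "neutral": 0.6, "short": -0.1}}
print(ddqn_target(0.1, "s1", q_online, q_target))  # 0.1 + 0.99 * 0.8
```

Decoupling action selection from action evaluation is what reduces the overestimation bias of vanilla DQN, which matters in noisy financial reward signals.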
  18. By: Luyao Zhang; Tianyu Wu; Saad Lahrichi; Carlos-Gustavo Salas-Flores; Jiayi Li
    Abstract: Recent advances in Artificial Intelligence (AI) have made algorithmic trading play a central role in finance. However, current research and applications are disconnected information islands. We propose a generally applicable pipeline for designing, programming, and evaluating the algorithmic trading of stock and crypto assets. Moreover, we demonstrate how our data science pipeline works with respect to four conventional algorithms: the moving average crossover, volume-weighted average price, sentiment analysis, and statistical arbitrage algorithms. Our study offers a systematic way to program, evaluate, and compare different trading strategies. Furthermore, we implement our algorithms through object-oriented programming in Python3, which serves as open-source software for future academic research and applications.
    Date: 2022–06
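The first of the four benchmark algorithms, the moving average crossover, is simple enough to sketch directly. The window lengths and price series below are illustrative, not the paper's configuration:

```python
# Moving average crossover signal: go long (+1) when the short moving
# average rises above the long one, otherwise stay out (-1). Windows
# and prices are illustrative assumptions.

def sma(prices, window):
    return [sum(prices[i - window + 1:i + 1]) / window
            for i in range(window - 1, len(prices))]

def crossover_signal(prices, short=2, long=3):
    s, l = sma(prices, short), sma(prices, long)
    offset = long - short
    # align the two series on the long window before comparing
    return [1 if s[i + offset] > l[i] else -1 for i in range(len(l))]

prices = [10, 10, 10, 11, 12, 13, 12, 11, 10]
print(crossover_signal(prices))
```

In a pipeline of the kind the paper proposes, this signal generator is one interchangeable component; backtesting, evaluation, and comparison against the other three strategies are built around the same interface.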
  19. By: Ana Aguilar; Fernando Pérez-Cervantes
    Abstract: We examine whether the communication of private banks to their clients with financial interests in Mexico changes after Mexico's central bank communicates its monetary policy decision (MPD) and, two weeks later, publishes the minutes of the monetary policy decision (MMPD), over the period 2011 to 2019. We use unsupervised Natural Language Processing (NLP) techniques to turn the text that private banks send their clients about the Mexican economy into vectors of topics. We find that, before each MMPD, private banks cover a wide diversity of topics and words with no evident consensus, and that after the MMPD the number of terms and topics is almost always reduced, with nearly every bank repeating them, indicating some surprise (a notable exception being the liftoff in December 2015); the topics vary with the date of the MMPD. The fact that private banks discuss the same topics and write to their clients using sentences containing the exact same words indicates that they react to the MMPD, independent of their opinion about the central bank's statements. We also find weak evidence that a measure of the size of the changes in the private banks' communication with their clients is positively correlated with changes in long-term yields but negatively correlated with the size of exchange rate movements.
    Keywords: natural language processing, unsupervised sentence embedding, central bank communication, Mexico
    JEL: C6 E5 E6
    Date: 2022–06
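The convergence of bank communications after the minutes can be measured by comparing term-count vectors with cosine similarity. The vocabulary and counts below are invented; the paper uses unsupervised sentence embeddings rather than raw counts:

```python
import math

# Sketch of a convergence measure: represent each bank's client note
# as a term-count vector and compare banks via cosine similarity
# before and after the minutes. Vocabulary and counts are invented;
# the paper uses unsupervised sentence embeddings.

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

# term counts over a shared vocabulary: ["inflation", "rates", "peso", "growth"]
bank1_before, bank2_before = [3, 1, 0, 2], [0, 2, 3, 0]
bank1_after, bank2_after = [4, 2, 1, 0], [4, 1, 1, 0]

print(cosine(bank1_before, bank2_before), cosine(bank1_after, bank2_after))
```

A jump in between-bank similarity after the minutes, as in this toy example, is the kind of pattern the abstract reads as banks reacting to the MMPD.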

This nep-big issue is ©2022 by Tom Coupé. It is provided as is without any express or implied warranty. It may be freely redistributed in whole or in part for any purpose. If distributed in part, please include this notice.
General information on the NEP project can be found at . For comments please write to the director of NEP, Marco Novarese at <>. Put “NEP” in the subject, otherwise your mail may be rejected.
NEP’s infrastructure is sponsored by the School of Economics and Finance of Massey University in New Zealand.