nep-big New Economics Papers
on Big Data
Issue of 2023‒02‒13
nineteen papers chosen by
Tom Coupé
University of Canterbury

  1. Artificial Intelligence & Machine Learning in Finance: A literature review By Wassima Lakhchini; Rachid Wahabi; Mounime El Kabbouri; Casa Bp; Settat Hassan
  2. An overview of machine learning, deep learning, and artificial intelligence By Gebreel, Alia Youssef
  3. Quant 4.0: Engineering Quantitative Investment with Automated, Explainable and Knowledge-driven Artificial Intelligence By Jian Guo; Saizhuo Wang; Lionel M. Ni; Heung-Yeung Shum
  4. Deep Reinforcement Learning: Emerging Trends in Macroeconomics and Future Prospects By Tohid Atashbar; Rui Aruhan Shi
  5. What makes a satisfying life? Prediction and interpretation with machine-learning algorithms By Clark, Andrew E.; D'Ambrosio, Conchita; Gentile, Niccoló; Tkatchenko, Alexandre
  6. Human wellbeing and machine learning By Oparina, Ekaterina; Kaiser, Caspar; Gentile, Niccoló; Tkatchenko, Alexandre; Clark, Andrew E.; De Neve, Jan-Emmanuel; D'Ambrosio, Conchita
  7. Empirical Asset Pricing via Ensemble Gaussian Process Regression By Damir Filipović; Puneet Pasricha
  8. The Role of Government Effectiveness in the Light of ESG Data at Global Level By Laureti, Lucio; Costantiello, Alberto; Leogrande, Angelo
  9. Measuring the digitalisation of firms: A novel text mining approach By Axenbeck, Janna; Breithaupt, Patrick
  10. Identifying artificial intelligence actors using online data By Hélène Dernis; Flavio Calvino; Laurent Moussiegt; Daisuke Nawa; Lea Samek; Mariagrazia Squicciarini
  11. Autocalibration by balance correction in nonlife insurance pricing By Denuit, Michel; Trufin, Julien
  12. Artificial intelligence and labour market matching By OECD
  13. Deep Reinforcement Learning for Gas Trading By Yuanrong Wang; Yinsen Miao; Alexander CY Wong; Nikita P Granger; Christian Michler
  14. What do we Learn from a Machine Understanding News Content? Stock Market Reaction to News By Brière, Marie; Huynh, Karen; Laudy, Olav; Pouget, Sébastien
  15. Calibrating Agent-based Models to Microdata with Graph Neural Networks By Farmer, J. Doyne; Dyer, Joel; Cannon, Patrick; Schmon, Sebastian
  16. Asynchronous Deep Double Duelling Q-Learning for Trading-Signal Execution in Limit Order Book Markets By Peer Nagy; Jan-Peter Calliess; Stefan Zohren
  17. Nowcasting from Space: Impact of Tropical Cyclones on Fiji’s Agriculture By Noy, Ilan; Blanc, Elodie; Pundit, Madhavi; Uher, Tomas
  18. Bank manager sentiment, loan growth and bank risk By Brückbauer, Frank; Cezanne, Thibault
  19. Nowcasting Stock Implied Volatility with Twitter By Thomas Dierckx; Jesse Davis; Wim Schoutens

  1. By: Wassima Lakhchini (Université Hassan 1er [Settat], ENCGS - Ecole Nationale de Commerce et de Gestion de SETTAT); Rachid Wahabi (Université Hassan 1er [Settat]); Mounime El Kabbouri (Université Hassan 1er [Settat]); Casa Bp; Settat Hassan
    Abstract: In the 2020s, Artificial Intelligence (AI) has increasingly become a dominant technology, and thanks to new computer technologies, Machine Learning (ML) has also experienced remarkable growth in recent years; however, AI still depends on the innovation of data scientists and engineers to evolve. Hence, in this paper, we aim to trace the intellectual development of AI and ML in finance research, adopting a scoping review combined with an embedded review to examine how these concepts have been applied. For the technical literature review, we follow the five stages of the scoping review methodology along with Donthu et al.'s (2021) bibliometric review method. This article highlights the trends in AI and ML applications (from 1989 to 2022) in the financial field of both developed and emerging countries. The main purpose is to detail the several types of research that elucidate the employment of AI and ML in finance. The findings of our study are summarized and developed into seven fields: (1) Portfolio Management and Robo-Advisory, (2) Risk Management and Financial Distress, (3) Financial Fraud Detection and Anti-money Laundering, (4) Sentiment Analysis and Investor Behaviour, (5) Algorithmic Stock Market Prediction and High-frequency Trading, (6) Data Protection and Cybersecurity, and (7) Big Data Analytics, Blockchain, and FinTech. Further, we demonstrate, for each field, how research in AI and ML enhances the current financial sector, as well as its contribution in terms of possibilities and solutions for myriad financial institutions and organizations. We conclude with a global map review of 110 documents across the seven fields of AI and ML application.
    Keywords: Artificial Intelligence, Machine Learning, Finance, Scoping review, Casablanca Exchange Market
    Date: 2022–12–18
  2. By: Gebreel, Alia Youssef
    Abstract: An overview of machine learning, deep learning, and artificial intelligence
    Date: 2023–01–11
  3. By: Jian Guo; Saizhuo Wang; Lionel M. Ni; Heung-Yeung Shum
    Abstract: Quantitative investment ("quant") is an interdisciplinary field combining financial engineering, computer science, mathematics, statistics, etc. Quant has become one of the mainstream investment methodologies over the past decades, and has experienced three generations: Quant 1.0, trading by mathematical modeling to discover mis-priced assets in markets; Quant 2.0, shifting the quant research pipeline from small "strategy workshops" to large "alpha factories"; Quant 3.0, applying deep learning techniques to discover complex nonlinear pricing rules. Despite its advantage in prediction, deep learning relies on extremely large data volumes and labor-intensive tuning of "black-box" neural network models. To address these limitations, in this paper, we introduce Quant 4.0 and provide an engineering perspective for next-generation quant. Quant 4.0 has three key differentiating components. First, automated AI changes the quant pipeline from traditional hand-crafted modeling to state-of-the-art automated modeling, practicing the philosophy of "algorithm produces algorithm, model builds model, and eventually AI creates AI". Second, explainable AI develops new techniques to better understand and interpret investment decisions made by machine learning black boxes, and explains complicated and hidden risk exposures. Third, knowledge-driven AI is a supplement to data-driven AI such as deep learning; it incorporates prior knowledge into modeling to improve investment decisions, in particular for quantitative value investing. Moreover, we discuss how to build a system that practices the Quant 4.0 concept. Finally, we propose ten challenging research problems for quant technology, and discuss potential solutions, research directions, and future trends.
    Date: 2022–12
  4. By: Tohid Atashbar; Rui Aruhan Shi
    Abstract: The application of Deep Reinforcement Learning (DRL) in economics has been an area of active research in recent years. A number of recent works have shown how deep reinforcement learning can be used to study a variety of economic problems, including optimal policy-making, game theory, and bounded rationality. In this paper, after a theoretical introduction to deep reinforcement learning and various DRL algorithms, we provide an overview of the literature on deep reinforcement learning in economics, with a focus on the main applications of deep reinforcement learning in macromodeling. Then, we analyze the potentials and limitations of deep reinforcement learning in macroeconomics and identify a number of issues that need to be addressed in order for deep reinforcement learning to be more widely used in macro modeling.
    Keywords: Reinforcement learning; Deep reinforcement learning; Artificial intelligence, RL; DRL; Learning algorithms; Macro modeling; RL algorithm overview; trust region policy optimization; DRL algorithm; decision process; RL algorithm; Machine learning; Artificial intelligence; Debt relief; General equilibrium models; Global
    Date: 2022–12–16
  5. By: Clark, Andrew E.; D'Ambrosio, Conchita; Gentile, Niccoló; Tkatchenko, Alexandre
    Abstract: Machine Learning (ML) methods are increasingly being used across a variety of fields and have led to the discovery of intricate relationships between variables. We here apply ML methods to predict and interpret life satisfaction using data from the UK British Cohort Study. We first discuss the application of Penalized Linear Models and then one non-linear method, Random Forests. We present two key model-agnostic interpretative tools for the latter method: Permutation Importance and Shapley Values. With a parsimonious set of explanatory variables, neither Penalized Linear Models nor Random Forests produce major improvements over the standard Non-penalized Linear Model. However, once we consider a richer set of controls, these methods do produce a non-negligible improvement in predictive accuracy. Although marital status and emotional health continue to be the most important predictors of life satisfaction, as in the existing literature, gender becomes insignificant in the non-linear analysis.
    Keywords: life satisfaction; well-being; machine learning; British cohort study
    JEL: I31 C63
    Date: 2022–06–07
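
    Editor's sketch: the Permutation Importance tool the abstract mentions is model-agnostic and easy to illustrate. The snippet below is not the authors' code; the synthetic features, the least-squares model standing in for their Random Forest, and all numbers are assumptions for illustration only.

    ```python
    import numpy as np

    def r2_score(y_true, y_pred):
        """Coefficient of determination R^2."""
        ss_res = np.sum((y_true - y_pred) ** 2)
        ss_tot = np.sum((y_true - y_true.mean()) ** 2)
        return 1.0 - ss_res / ss_tot

    def permutation_importance(predict, X, y, rng, n_repeats=20):
        """Average drop in R^2 when each feature column is shuffled."""
        baseline = r2_score(y, predict(X))
        importances = np.zeros(X.shape[1])
        for j in range(X.shape[1]):
            for _ in range(n_repeats):
                Xp = X.copy()
                rng.shuffle(Xp[:, j])  # break the link between feature j and y
                importances[j] += baseline - r2_score(y, predict(Xp))
        return importances / n_repeats

    rng = np.random.default_rng(0)
    X = rng.normal(size=(500, 3))  # three hypothetical predictors
    y = 2.0 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.1, size=500)  # column 2 is pure noise

    # Stand-in model: ordinary least squares (the paper uses a Random Forest).
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    imp = permutation_importance(lambda Z: Z @ beta, X, y, rng)
    print(imp)  # column 0 should dominate; column 2 should be near zero
    ```

    Shapley Values rest on the same model-agnostic idea but average a feature's contribution over all subsets of the other features, which is why they are usually computed with dedicated libraries.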
  6. By: Oparina, Ekaterina; Kaiser, Caspar; Gentile, Niccoló; Tkatchenko, Alexandre; Clark, Andrew E.; De Neve, Jan-Emmanuel; D'Ambrosio, Conchita
    Abstract: There is a vast literature on the determinants of subjective wellbeing. International organisations and statistical offices are now collecting such survey data at scale. However, standard regression models explain surprisingly little of the variation in wellbeing, limiting our ability to predict it. In response, we here assess the potential of Machine Learning (ML) to help us better understand wellbeing. We analyse wellbeing data on over a million respondents from Germany, the UK, and the United States. In terms of predictive power, our ML approaches perform better than traditional models. Although the size of the improvement is small in absolute terms, it is substantial when compared to that of key variables like health. We moreover find that drastically expanding the set of explanatory variables doubles the predictive power of both OLS and the ML approaches on unseen data. The variables identified as important by our ML algorithms - i.e. material conditions, health, and meaningful social relations - are similar to those that have already been identified in the literature. In that sense, our data-driven ML results validate the findings from conventional approaches.
    Keywords: subjective wellbeing; prediction methods; machine learning
    JEL: C63 C53 I31
    Date: 2022–07–20
  7. By: Damir Filipović (Ecole Polytechnique Fédérale de Lausanne; Swiss Finance Institute); Puneet Pasricha (École Polytechnique Fédérale de Lausanne (EPFL))
    Abstract: We introduce an ensemble learning method based on Gaussian Process Regression (GPR) for predicting conditional expected stock returns given stock-level and macro-economic information. Our ensemble learning approach significantly reduces the computational complexity inherent in GPR inference and lends itself to general online learning tasks. We conduct an empirical analysis on a large cross-section of US stocks from 1962 to 2016. We find that our method dominates existing machine learning models statistically and economically in terms of out-of-sample R-squared and Sharpe ratio of prediction-sorted portfolios. Exploiting the Bayesian nature of GPR, we introduce the mean-variance optimal portfolio with respect to the predictive uncertainty distribution of the expected stock returns. It appeals to an uncertainty averse investor and significantly dominates the equal- and value-weighted prediction-sorted portfolios, which outperform the S&P 500.
    Keywords: empirical asset pricing, Gaussian process regression, portfolio selection, ensemble learning, machine learning, firm characteristics
    JEL: C11 C14 C52 C55 G11 G12
    Date: 2022–12
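
    Editor's sketch: the ensemble idea in this entry, fitting exact Gaussian Process Regressions on data partitions and averaging their posterior means, can be shown in a few lines. The RBF kernel, its hyperparameters, the partition count, and the synthetic 1-D task below are illustrative assumptions, not the authors' specification.

    ```python
    import numpy as np

    def rbf(A, B, length=1.0):
        """Squared-exponential kernel matrix between row sets A and B."""
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-0.5 * d2 / length**2)

    def gpr_mean(X_tr, y_tr, X_te, noise=0.1):
        """GP posterior mean: K_* (K + sigma^2 I)^{-1} y."""
        K = rbf(X_tr, X_tr) + noise**2 * np.eye(len(X_tr))
        return rbf(X_te, X_tr) @ np.linalg.solve(K, y_tr)

    rng = np.random.default_rng(1)
    X = rng.uniform(-3, 3, size=(600, 1))
    y = np.sin(X[:, 0]) + rng.normal(scale=0.1, size=600)
    X_test = np.linspace(-3, 3, 50)[:, None]

    # Ensemble: fit an exact GP on each random partition and average the
    # posterior means -- O(n^3) inference cost drops to O(m * (n/m)^3).
    parts = np.array_split(rng.permutation(600), 6)
    pred = np.mean([gpr_mean(X[p], y[p], X_test) for p in parts], axis=0)
    print(np.abs(pred - np.sin(X_test[:, 0])).max())  # small error vs the noiseless target
    ```

    Averaging over partitions is also what makes online updating cheap: a new batch of data only requires refitting one member of the ensemble.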
  8. By: Laureti, Lucio; Costantiello, Alberto; Leogrande, Angelo
    Abstract: In this article we estimate the level of Government Effectiveness-GE in 193 countries in the period 2011-2020 using data from the ESG World Bank Database. Different econometric techniques are used, i.e. Panel Data with Random Effects, Panel Data with Fixed Effects, and Pooled OLS. Results show that GE is positively related, among others, to “Control of Corruption” and “Political Stability and Absence of Violence/Terrorism”, and negatively associated with “Percentage Annual GDP Growth”. We perform a cluster analysis with the k-Means algorithm optimized with the Elbow Method and find the presence of four clusters. Finally, we compare eight machine learning algorithms for the prediction of GE. Results show that Polynomial Regression is the best predictive algorithm. The value of GE is expected to grow on average by 15.97%.
    Keywords: Analysis of Collective Decision-Making, General, Political Processes: Rent-Seeking, Lobbying, Elections, Legislatures, and Voting Behavior, Bureaucracy, Administrative Processes in Public Organizations, Corruption, Positive Analysis of Policy Formulation, and Implementation.
    JEL: D7 D70 D72 D73 D78
    Date: 2023–01–16
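
    Editor's sketch: the k-Means-plus-Elbow procedure used in this entry can be illustrated on synthetic data. Everything below (the 2-D "indicator space", the four planted groups, the k-means++ seeding, and the drop-ratio elbow rule) is an assumption for illustration, not the authors' code.

    ```python
    import numpy as np

    def kmeans_pp(X, k, rng):
        """k-means++ seeding: spread the initial centroids out."""
        C = [X[rng.integers(len(X))]]
        for _ in range(k - 1):
            d2 = ((X[:, None, :] - np.array(C)[None]) ** 2).sum(-1).min(axis=1)
            C.append(X[rng.choice(len(X), p=d2 / d2.sum())])
        return np.array(C)

    def kmeans(X, k, rng, n_iter=30):
        """Lloyd's algorithm; returns the within-cluster sum of squares (inertia)."""
        C = kmeans_pp(X, k, rng)
        for _ in range(n_iter):
            labels = ((X[:, None, :] - C[None]) ** 2).sum(-1).argmin(axis=1)
            C = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else C[j]
                          for j in range(k)])
        return ((X - C[labels]) ** 2).sum()

    rng = np.random.default_rng(2)
    # Four well-separated synthetic "country groups" in a 2-D indicator space.
    centers = np.array([[0.0, 0.0], [5.0, 5.0], [0.0, 5.0], [5.0, 0.0]])
    X = np.vstack([c + rng.normal(scale=0.4, size=(50, 2)) for c in centers])

    # Elbow method: inertia falls steeply until k reaches the true cluster
    # count, then flattens; pick the k with the largest relative drop.
    inertia = {k: kmeans(X, k, rng) for k in range(1, 8)}
    best_k = max(range(2, 8), key=lambda k: inertia[k - 1] / inertia[k])
    print(best_k)  # should recover the four planted clusters
    ```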
  9. By: Axenbeck, Janna; Breithaupt, Patrick
    Abstract: Due to the omnipresence of digital technologies in the economy, measuring firm digitalisation is of high importance. However, current indicators show several shortcomings, e.g., they lack timeliness and regional granularity. In this study, we show that advances in text mining and comprehensive firm website content can be leveraged to generate real-time and large-scale estimates of firm digitalisation. We use a transfer learning approach to capture the latent definition of digitalisation. For this purpose, we train a random forest regression model on labeled German newspaper articles and apply it to firms' website content. The predictions are used as a continuous indicator of firm digitalisation. Plausibility checks confirm the link to established digitalisation indicators at the firm and sectoral level as well as for firm size classes and regions. Lastly, we illustrate the indicator's potential for giving timely answers to pressing economic issues by analysing the link between digitalisation and firm resilience during the Covid-19 shock.
    Keywords: web-mining, text as data, machine learning, digitalisation
    JEL: C53 C81 O30
    Date: 2022
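
    Editor's sketch: the transfer step, fit a text-score model on labeled articles, then apply it to firm websites, can be shown with a toy bag-of-words model. The vocabulary, the made-up labels, and the ridge regression standing in for the paper's random forest are all illustrative assumptions.

    ```python
    import numpy as np

    # Tiny vocabulary standing in for the digitalisation lexicon (illustrative).
    vocab = ["cloud", "software", "ai", "digital", "bakery", "steel"]

    def bow(texts):
        """Bag-of-words counts over the fixed vocabulary."""
        return np.array([[t.split().count(w) for w in vocab] for t in texts], float)

    # Step 1: fit a regression on labeled newspaper articles (scores are made up;
    # the paper uses a random forest, ridge regression keeps the sketch short).
    articles = ["cloud software ai digital", "digital software", "bakery steel", "steel"]
    scores = np.array([0.9, 0.6, 0.1, 0.0])
    X = bow(articles)
    w = np.linalg.solve(X.T @ X + 0.1 * np.eye(len(vocab)), X.T @ scores)  # ridge fit

    # Step 2: transfer -- apply the fitted model to firm website text.
    firms = ["ai cloud digital digital", "bakery bakery steel"]
    pred = bow(firms) @ w
    print(pred)  # the software-heavy firm should score higher than the bakery
    ```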
  10. By: Hélène Dernis; Flavio Calvino; Laurent Moussiegt; Daisuke Nawa; Lea Samek; Mariagrazia Squicciarini
    Abstract: This paper uses information collected and provided by GlassAI to analyse the characteristics and activities of companies and universities in Canada, Germany, the United Kingdom and the United States that mention keywords related to Artificial Intelligence (AI) on their websites. The analysis finds that those companies tend to be young and small, mainly operate in the information and communication sector, have AI at the core of their business, and aim to provide customer solutions. It is noteworthy that the types of AI-related activities reported by them vary across sectors. Additionally, although universities are concentrated in and around large cities, this is not necessarily reflected in the intensity of AI-related activities. Taken together, this novel and timely evidence informs the debate on the most recent stages of digital transformation of the economy.
    Date: 2023–02–03
  11. By: Denuit, Michel (Université catholique de Louvain, LIDAM/ISBA, Belgium); Trufin, Julien (ULB)
    Abstract: By exploiting massive amounts of data, machine learning techniques provide actuaries with predictors exhibiting high correlation with claim frequencies and severities. However, these predictors generally fail to achieve financial equilibrium and thus do not qualify as pure premiums. Autocalibration effectively addresses this issue since it ensures that every group of policyholders paying the same premium is on average self-financing, as demonstrated by Denuit et al. (2021), Ciatto et al. (2022), Lindholm et al. (2022) and Wüthrich (2022). These authors proposed balance correction as a way to make any candidate premium autocalibrated. The present paper further studies the effect of balance correction on resulting pure premiums. It is shown that this method is also beneficial in terms of out-of-sample, or predictive Tweedie deviance, Bregman divergence as well as concentration curves. The paper then derives conditions ensuring that the initial predictor and its balance-corrected version are ordered in Lorenz order. Finally, criteria are proposed to rank the balance-corrected versions of two competing predictors in the convex order.
    Keywords: Tweedie deviance ; Bregman divergence ; financial equilibrium ; convex order ; Lorenz order
    Date: 2022–12–15
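
    Editor's sketch: the balance-correction step itself is simple to illustrate: group policyholders by their candidate premium and replace each premium with the group's average observed claim, so every group is self-financing by construction. The Poisson claim model and the quantile grouping below are illustrative assumptions, not the authors' construction.

    ```python
    import numpy as np

    def balance_correct(premium, claims, n_groups=10):
        """Replace each candidate premium by the mean observed claim of its
        premium group, making every group self-financing on average."""
        # Quantile bins on the candidate premium define the groups.
        edges = np.quantile(premium, np.linspace(0, 1, n_groups + 1)[1:-1])
        group = np.digitize(premium, edges)
        corrected = np.empty_like(premium)
        for g in np.unique(group):
            corrected[group == g] = claims[group == g].mean()
        return corrected

    rng = np.random.default_rng(3)
    risk = rng.uniform(0.5, 2.0, size=5000)    # true expected claim frequency
    claims = rng.poisson(risk).astype(float)   # observed claim counts
    premium = 0.8 * risk                       # a miscalibrated candidate premium

    corrected = balance_correct(premium, claims)
    # Overall financial equilibrium: total corrected premium matches total claims.
    print(premium.mean(), corrected.mean(), claims.mean())
    ```

    Because the corrected premium in each group equals that group's average claim, the portfolio-level balance also holds exactly, which is the financial-equilibrium property the abstract refers to.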
  12. By: OECD
    Abstract: While still in its infancy, Artificial Intelligence (AI) is increasingly used in labour market matching, whether by private recruiters, public and private employment services, or online jobs boards and platforms. Applications range from writing job descriptions, applicant sourcing, analysing CVs, chat bots, interview schedulers, shortlisting tools, all the way to facial and voice analysis during interviews. While many tools promise to bring efficiencies and cost savings, they could also improve the quality of matching and jobseeker experience, and even identify and mitigate human bias. There are nonetheless some barriers to a greater adoption of these tools. Some barriers relate to organisation and people readiness, while others reflect concerns about the technology and how it is used, including: robustness, bias, privacy, transparency and explainability. The present paper reviews the literature and some recent policy developments in this field, while bringing new evidence from interviews held with key stakeholders.
    Keywords: Artificial Intelligence, Employment Services, Human Resources, Matching, Recruitment
    JEL: J01 J20 J60 J70
    Date: 2023–01–30
  13. By: Yuanrong Wang; Yinsen Miao; Alexander CY Wong; Nikita P Granger; Christian Michler
    Abstract: Deep Reinforcement Learning (Deep RL) has been explored for a number of applications in finance and stock trading. In this paper, we present a practical implementation of Deep RL for trading natural gas futures contracts. The Sharpe Ratio obtained exceeds benchmarks given by trend following and mean reversion strategies as well as results reported in literature. Moreover, we propose a simple but effective ensemble learning scheme for trading, which significantly improves performance through enhanced model stability and robustness as well as lower turnover and hence lower transaction cost. We discuss the resulting Deep RL strategy in terms of model explainability, trading frequency and risk measures.
    Date: 2023–01
  14. By: Brière, Marie; Huynh, Karen; Laudy, Olav; Pouget, Sébastien
    Abstract: Using textual data extracted by the Causality Link platform from a large variety of news sources (news stories, call transcripts, broker research, etc.), we build aggregate news signals that take into account the tone, the tense and the prominence of various news statements about a given firm. We test the informational content of these signals and examine how news is incorporated into stock prices. Our sample covers 1,701,789 news-based signals that were built on 4,460 US stocks over the period January 2014 to December 2021. We document large and significant market reactions around the publication of news, with some evidence of return predictability at short horizons. News about the future drives much larger reactions than news about the present or the past. Stock returns also react more to high-coverage news, fresh news and purely financial news. Finally, firms’ size matters: stocks that are not components of the Russell 1000 index experience larger reactions to news compared to those that are Russell 1000 components. Implications of our results for financial analysts and investors are offered and related to the links between news, firms’ market value and investment strategies.
    Keywords: Natural Language Processing; Textual Analysis; Efficient Market Hypothesis; ESG
    Date: 2023–01–19
  15. By: Farmer, J. Doyne; Dyer, Joel; Cannon, Patrick; Schmon, Sebastian
    Abstract: Calibrating agent-based models (ABMs) to data is among the most fundamental requirements to ensure the model fulfils its desired purpose. In recent years, simulation-based inference methods have emerged as powerful tools for performing this task when the model likelihood function is intractable, as is often the case for ABMs. In some real-world use cases of ABMs, both the observed data and the ABM output consist of the agents' states and their interactions over time. In such cases, there is a tension between the desire to make full use of the rich information content of such granular data on the one hand, and the need to reduce the dimensionality of the data to prevent difficulties associated with high-dimensional learning tasks on the other. A possible resolution is to construct lower-dimensional time-series through the use of summary statistics describing the macrostate of the system at each time point. However, a poor choice of summary statistics can result in an unacceptable loss of information from the original dataset, dramatically reducing the quality of the resulting calibration. In this work, we instead propose to learn parameter posteriors associated with granular microdata directly using temporal graph neural networks. We will demonstrate that such an approach offers highly compelling inductive biases for Bayesian inference using the raw ABM microstates as output.
    Date: 2022–06
  16. By: Peer Nagy; Jan-Peter Calliess; Stefan Zohren
    Abstract: We employ deep reinforcement learning (RL) to train an agent to successfully translate a high-frequency trading signal into a trading strategy that places individual limit orders. Based on the ABIDES limit order book simulator, we build a reinforcement learning OpenAI gym environment and utilise it to simulate a realistic trading environment for NASDAQ equities based on historic order book messages. To train a trading agent that learns to maximise its trading return in this environment, we use Deep Duelling Double Q-learning with the APEX (asynchronous prioritised experience replay) architecture. The agent observes the current limit order book state, its recent history, and a short-term directional forecast. To investigate the performance of RL for adaptive trading independently from a concrete forecasting algorithm, we study the performance of our approach utilising synthetic alpha signals obtained by perturbing forward-looking returns with varying levels of noise. Here, we find that the RL agent learns an effective trading strategy for inventory management and order placing that outperforms a heuristic benchmark trading strategy having access to the same signal.
    Date: 2023–01
  17. By: Noy, Ilan (School of Economics and Finance, Victoria University of Wellington); Blanc, Elodie (Motu Economic and Public Policy Research, Wellington); Pundit, Madhavi (Asian Development Bank); Uher, Tomas (School of Economics and Finance, Victoria University of Wellington)
    Abstract: The standard approach to ‘nowcast’ disaster impacts, which relies on risk models, does not typically account for the compounding impact of various hazard phenomena (e.g., wind and rainfall associated with tropical storms). The alternative, traditionally, has been a team of experts sent to the affected areas to conduct a ground survey, but this is time-consuming, difficult, and costly. Satellite imagery may provide an easily available and accurate data source to gauge disasters’ specific impacts, which is cheap and fast and can account for compound and cascading effects. If accurate enough, it can potentially replace components of ground surveys altogether. An approach that has been calibrated with remote sensing imagery can also be used as a component in a nowcasting tool, to assess the impact of a cyclone, based only on its known trajectory, and even before post-event satellite imagery is available. We use one example to investigate the feasibility of this approach for nowcasting, and for post-disaster damage assessment. We focus on Fiji, on its agriculture sector, and on tropical cyclones (TCs). We link remote sensing data with available household surveys and the agricultural census data to obtain an improved assessment of TC impacts. We show that remote sensing data, when combined with pre-event socioeconomic and demographic data, can be used for both nowcasting and post-disaster damage assessments.
    Keywords: satellite; cyclone; damage; impact; disaster; nowcasting
    JEL: C80 Q10 Q54
    Date: 2023–01–20
  18. By: Brückbauer, Frank; Cezanne, Thibault
    Abstract: We build a textual score measuring the tone of bank earnings press release documents. We use this measure to define bank manager sentiment as the variation in the textual tone score which is orthogonal to bank-specific and macroeconomic fundamentals. Using this definition of sentiment, we present evidence on how bank managers' systematic overoptimism affects the amount of credit that they supply to the real sector. Our empirical evidence suggests that decisions on the volume of new loans partially depend on past realizations of economic fundamentals, implying that loan growth and contemporaneous economic fundamentals might be systematically disconnected. Furthermore, we show that overoptimism on the part of bank managers spills over to their equity investors, who seem to perceive banks with high bank manager sentiment as having a lower systemic risk.
    Keywords: sentiment, text data, extrapolation, loan growth, systemic risk
    JEL: G00 G10 G21 G41
    Date: 2022
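
    Editor's sketch: defining sentiment as the part of the tone score orthogonal to fundamentals is an OLS-residual construction. The panel below is synthetic and the variable names are assumptions; only the orthogonalisation step mirrors the abstract's definition.

    ```python
    import numpy as np

    rng = np.random.default_rng(5)
    n = 400
    # Hypothetical bank-quarter panel: fundamentals and a textual tone score.
    fundamentals = rng.normal(size=(n, 3))  # e.g. ROA, NPL ratio, GDP growth
    tone = fundamentals @ np.array([0.5, -0.3, 0.4]) + rng.normal(scale=0.5, size=n)

    # Manager sentiment = the part of the tone score that fundamentals cannot
    # explain, i.e. the residual of an OLS regression of tone on fundamentals.
    Z = np.column_stack([np.ones(n), fundamentals])  # add an intercept
    beta, *_ = np.linalg.lstsq(Z, tone, rcond=None)
    sentiment = tone - Z @ beta

    # By construction the residual is orthogonal to the fundamentals.
    print(np.abs(fundamentals.T @ sentiment).max())
    ```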
  19. By: Thomas Dierckx; Jesse Davis; Wim Schoutens
    Abstract: In this study, we predict next-day movements of stock end-of-day implied volatility using random forests. Through an ablation study, we examine the usefulness of different sources of predictors and expose the value of attention and sentiment features extracted from Twitter. We study the approach on a stock universe comprised of the 165 most liquid US stocks diversified across the 11 traditional market sectors using a sizeable out-of-sample period spanning over six years. In doing so, we uncover that stocks in certain sectors, such as Consumer Discretionary, Technology, Real Estate, and Utilities are easier to predict than others. Further analysis suggests that these discrepancies might be caused by either excess social media attention or low option liquidity. Lastly, we explore how our proposed approach fares throughout time by identifying four underlying market regimes in implied volatility using hidden Markov models. We find that most added value is achieved in regimes associated with lower implied volatility, but optimal regimes vary per market sector.
    Date: 2022–12
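
    Editor's sketch: the regime-identification step can be illustrated with a two-state Gaussian-emission HMM decoded by the Viterbi algorithm. The synthetic volatility series and the hand-set regime parameters below are assumptions; in practice the parameters are fit by EM (e.g. with a dedicated HMM library) and the paper uses four regimes.

    ```python
    import numpy as np

    def viterbi(obs, means, stds, trans, init):
        """Most likely state path for a Gaussian-emission HMM (log domain)."""
        def logpdf(x, m, s):
            return -0.5 * ((x - m) / s) ** 2 - np.log(s * np.sqrt(2 * np.pi))
        n, k = len(obs), len(means)
        delta = np.zeros((n, k))
        back = np.zeros((n, k), dtype=int)
        delta[0] = np.log(init) + logpdf(obs[0], means, stds)
        for t in range(1, n):
            scores = delta[t - 1][:, None] + np.log(trans)  # scores[i, j]: i -> j
            back[t] = scores.argmax(axis=0)
            delta[t] = scores.max(axis=0) + logpdf(obs[t], means, stds)
        path = np.zeros(n, dtype=int)
        path[-1] = delta[-1].argmax()
        for t in range(n - 2, -1, -1):
            path[t] = back[t + 1, path[t + 1]]
        return path

    rng = np.random.default_rng(4)
    # Synthetic implied-volatility series: 100 calm days, 50 stressed, 100 calm.
    true = np.r_[np.zeros(100, int), np.ones(50, int), np.zeros(100, int)]
    iv = np.where(true == 0, rng.normal(0.15, 0.02, 250), rng.normal(0.40, 0.06, 250))

    # Hand-set two-regime parameters with persistent transitions.
    path = viterbi(iv, means=np.array([0.15, 0.40]), stds=np.array([0.02, 0.06]),
                   trans=np.array([[0.95, 0.05], [0.05, 0.95]]), init=np.array([0.5, 0.5]))
    print((path == true).mean())  # decoded regimes should match the planted ones
    ```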

This nep-big issue is ©2023 by Tom Coupé. It is provided as is without any express or implied warranty. It may be freely redistributed in whole or in part for any purpose. If distributed in part, please include this notice.
General information on the NEP project can be found at For comments please write to the director of NEP, Marco Novarese at <>. Put “NEP” in the subject, otherwise your mail may be rejected.
NEP’s infrastructure is sponsored by the School of Economics and Finance of Massey University in New Zealand.