nep-big New Economics Papers
on Big Data
Issue of 2023‒10‒09
thirteen papers chosen by
Tom Coupé, University of Canterbury

  1. Novissi Togo - Harnessing Artificial Intelligence to Deliver Shock-Responsive Social Protection By Lawson, Cina; Koudeka, Morlé; Cardenas Martinez, Ana Lucia; Alberro Encinas, Luis Inaki; Karippacheril, Tina George
  2. Applying Deep Learning to Calibrate Stochastic Volatility Models By Abir Sridi; Paul Bilokon
  3. Media Moments and Corporate Connections: A Deep Learning Approach to Stock Movement Classification By Luke Sanborn; Matthew Sahagun
  4. Introducing the $\sigma$-Cell: Unifying GARCH, Stochastic Fluctuations and Evolving Mechanisms in RNN-based Volatility Forecasting By German Rodikov; Nino Antulov-Fantulin
  5. A Causal Perspective on Loan Pricing: Investigating the Impacts of Selection Bias on Identifying Bid-Response Functions By Christopher Bockel-Rickermann; Sam Verboven; Tim Verdonck; Wouter Verbeke
  6. Can Unbiased Predictive AI Amplify Bias? By Tanvir Ahmed Khan
  7. The effect of green energy, global environmental indexes, and stock markets in predicting oil price crashes: Evidence from explainable machine learning By Sami Ben Jabeur; Rabeh Khalfaoui; Wissal Ben Arfi
  8. What Predicts the Growth of Small Firms? Evidence from Tanzanian Commercial Loan Data By Mia Ellis; Cynthia Kinnan; Margaret S. McMillan; Sarah Shaukat
  9. DeepVol: A Deep Transfer Learning Approach for Universal Asset Volatility Modeling By Chen Liu; Minh-Ngoc Tran; Chao Wang; Richard Gerlach; Robert Kohn
  10. Executive AI Literacy: A Text-Mining Approach to Understand Existing and Demanded AI Skills of Leaders in Unicorn Firms By Pinski, Marc; Hofmann, Thomas; Benlian, Alexander
  11. A Novel Matching Algorithm for Academic Patent Paper Pairs: An Exploratory Study of Japan's national research universities and laboratories. By Van-Thien Nguyen; René Carraz
  12. Forecasting International Financial Stress: The Role of Climate Risks By Santino Del Fava; Rangan Gupta; Christian Pierdzioch; Lavinia Rognone
  13. On the Impact of Feeding Cost Risk in Aquaculture Valuation and Decision Making By Christian Oliver Ewald; Kevin Kamm

  1. By: Lawson, Cina; Koudeka, Morlé; Cardenas Martinez, Ana Lucia; Alberro Encinas, Luis Inaki; Karippacheril, Tina George
    Abstract: This case study, jointly authored by the Government of Togo and the World Bank, documents the innovative features of the NOVISSI program and posits some directions for the way forward. The study examines how Togo leveraged artificial intelligence and machine learning methods to prioritize the rural poor in the absence of a shock-responsive social protection delivery system and a dynamic social registry. It also discusses the main challenges of the model and the risks and implications of implementing such a program.
    Date: 2023–09–01
  2. By: Abir Sridi; Paul Bilokon
    Abstract: Stochastic volatility models, where the volatility is a stochastic process, can capture most of the essential stylized facts of implied volatility surfaces and give more realistic dynamics of the volatility smile or skew. However, they come with the significant issue that they take too long to calibrate. Alternative calibration methods based on Deep Learning (DL) techniques have recently been used to build fast and accurate solutions to the calibration problem. Huge and Savine developed a Differential Deep Learning (DDL) approach, where Machine Learning models are trained on samples of not only features and labels but also differentials of labels with respect to features. The present work aims to apply the DDL technique to price vanilla European options (i.e. the calibration instruments), more specifically puts when the underlying asset follows a Heston model, and then to calibrate the model on the trained network. DDL allows for fast training and accurate pricing. The trained neural network dramatically reduces the computation time of Heston calibration. In this work, we also introduce different regularisation techniques and apply them notably in the case of the DDL. We compare their performance in reducing overfitting and improving the generalisation error. The DDL performance is also compared to that of classical DL (without differentiation) for Feed-Forward Neural Networks. We show that the DDL outperforms the DL.
    Date: 2023–09
  3. By: Luke Sanborn; Matthew Sahagun
    Abstract: The financial industry faces great challenges in risk modeling and profit generation, both of which are intricately tied to the sophisticated prediction of stock movements. A stock forecaster must untangle the randomness and ever-changing behaviors of the stock market. Stock movements are influenced by a myriad of factors, including company history, performance, and economic-industry connections. However, other factors are not traditionally included, such as social media and correlations between stocks. Social platforms such as Reddit, Facebook, and X (Twitter) create opportunities for niche communities to share their sentiment on financial assets. By aggregating these opinions from social media in various forms such as posts, interviews, and news updates, we propose a more holistic approach that includes these "media moments" in stock market movement prediction. We introduce a method that combines financial data, social media, and correlated stock relationships via a graph neural network in a hierarchical temporal fashion. Through numerous trials on current S&P 500 index data, which show an improvement in cumulative returns of 28%, we provide empirical evidence of our tool's applicability for use in investment decisions.
    Date: 2023–09
  4. By: German Rodikov; Nino Antulov-Fantulin
    Abstract: This paper introduces the $\sigma$-Cell, a novel Recurrent Neural Network (RNN) architecture for financial volatility modeling. Bridging traditional econometric approaches like GARCH with deep learning, the $\sigma$-Cell incorporates stochastic layers and time-varying parameters to capture dynamic volatility patterns. Our model serves as a generative network, approximating the conditional distribution of latent variables. We employ a log-likelihood-based loss function and a specialized activation function to enhance performance. Experimental results demonstrate superior forecasting accuracy compared to traditional GARCH and Stochastic Volatility models, taking the next step in integrating domain knowledge with neural networks.
    Date: 2023–09
  5. By: Christopher Bockel-Rickermann; Sam Verboven; Tim Verdonck; Wouter Verbeke
    Abstract: In lending, where prices are specific to both customers and products, having a well-functioning personalized pricing policy in place is essential to effective business. Typically, such a policy must be derived from observational data, which introduces several challenges. While the problem of "endogeneity" is prominently studied in the established pricing literature, the problem of selection bias (or, more precisely, bid selection bias) is not. We take a step towards understanding the effects of selection bias by posing pricing as a problem of causal inference. Specifically, we consider the reaction of a customer to price as a treatment effect. In our experiments, we simulate varying levels of selection bias on a semi-synthetic dataset of mortgage loan applications in Belgium. We investigate the potential of parametric and nonparametric methods for the identification of individual bid-response functions. Our results illustrate how conventional methods such as logistic regression and neural networks suffer adversely from selection bias. In contrast, we implement state-of-the-art methods from causal machine learning and show their capability to overcome selection bias in pricing data.
    Date: 2023–09
  6. By: Tanvir Ahmed Khan
    Abstract: Predictive AI is increasingly used to guide decisions on agents. I show that even a bias-neutral predictive AI can potentially amplify exogenous (human) bias in settings where the predictive AI represents a cost-adjusted precision gain to unbiased predictions, and the final judgments are made by biased human evaluators. In the absence of perfect and instantaneous belief updating, expected victims of bias become less likely to be saved by randomness under more precise predictions. An increase in aggregate discrimination is possible if this effect dominates. Not accounting for this mechanism may result in AI being unduly blamed for creating bias.
    Keywords: artificial intelligence, AI, algorithm, human-machine interactions, discrimination, bias, algorithmic bias, financial institutions
    JEL: O33 J15 G2
    Date: 2023–07
  7. By: Sami Ben Jabeur (ESDES, Lyon Business School, Université Catholique de Lyon (UCLy); UR CONFLUENCE : Sciences et Humanités (EA 1598), UCLy); Rabeh Khalfaoui (ICN Business School); Wissal Ben Arfi (EDC Paris Business School)
    Date: 2021–11
  8. By: Mia Ellis; Cynthia Kinnan; Margaret S. McMillan; Sarah Shaukat
    Abstract: Not all firms have equal capacity to absorb productive credit. Identifying those with higher potential may have large consequences for productivity. We collect detailed survey data on small- and medium-sized Tanzanian firms that borrow from a large commercial bank, which in turn raises funds via international capital markets. Using machine learning methods to identify predictors of loan growth, we document three findings. First, we achieve high predictive power. Second, “soft” information (entrepreneurs’ motivations for entrepreneurship and constraints faced) has predictive power over and above administrative data (sector, age, etc.). Third, there is a different and larger set of predictors for women than for men, consistent with greater barriers to efficient capital allocation among female entrepreneurs.
    JEL: G14 J16 O16
    Date: 2023–08
  9. By: Chen Liu; Minh-Ngoc Tran; Chao Wang; Richard Gerlach; Robert Kohn
    Abstract: This paper introduces DeepVol, a promising new deep learning volatility model that outperforms traditional econometric models in terms of model generality. DeepVol leverages the power of transfer learning to effectively capture and model the volatility dynamics of all financial assets, including previously unseen ones, using a single universal model. This contrasts with the prevailing practice in the econometrics literature, which necessitates training separate models for individual datasets. The introduction of DeepVol opens up new avenues for volatility modeling and forecasting in the finance industry, potentially transforming the way volatility is understood and predicted.
    Date: 2023–09
  10. By: Pinski, Marc; Hofmann, Thomas; Benlian, Alexander
    Date: 2023
  11. By: Van-Thien Nguyen; René Carraz
    Abstract: This paper proposes a new method for matching patents with academic publications to create patent-paper pairs (PPP). These pairs can identify instances where a research result is both applied in a patent and published in a paper. The study focuses on a sample of top research-intensive universities and laboratories in Japan, utilizing a new dataset of patent-to-article citations, together with a machine learning model, as part of the matching process. Expert consultations were conducted to enhance the robustness of the methodology. Focusing on a set of 14 Japanese universities and 3 national research laboratories, and using patent (USPTO) and publication (OpenAlex) data between 1998 and 2018, we built a dataset of 3,177 PPPs out of 7,766 granted patents and 91,213 publications. The results demonstrate that this phenomenon is widespread in academia, and our data show the diversity of the academic disciplines and technical fields involved, highlighting the intricate connections between scientific and technical concepts and communities. On the methodological side, we document in-depth complementary validation techniques to enhance the precision and reliability of our matching algorithm. Because it relies on open-source data, our methodology is adaptable to diverse national contexts and can be readily adopted by other research teams investigating similar topics.
    Keywords: Patent Paper Pair; Methodology; Matching algorithm; Academic patent; Japan.
    Date: 2023
  12. By: Santino Del Fava (Department of Economics, University of Pretoria, Private Bag X20, Hatfield 0028, South Africa); Rangan Gupta (Department of Economics, University of Pretoria, Private Bag X20, Hatfield 0028, South Africa); Christian Pierdzioch (Department of Economics, Helmut Schmidt University, Holstenhofweg 85, P.O.B. 700822, 22008 Hamburg, Germany); Lavinia Rognone (University of Edinburgh Business School, 29 Buccleuch Place, Edinburgh, EH8 9JS, United Kingdom)
    Abstract: We study the predictive value of climate risks for subsequent financial stress using daily data, running from October 2006 to December 2022, for thirteen countries: China, ten European Union (EU) countries, the United Kingdom (UK), and the United States (US). The climate risk indicators are the result of a text-based approach that combines the term frequency-inverse document frequency and cosine-similarity techniques. Given the persistence of financial stress as well as the importance of spillover effects of financial stress from other countries, we use random forests, a machine-learning technique tailored to handle many predictors, to estimate our forecasting models. Our findings show that climate risks tend to have a moderate, albeit in several cases statistically significant, impact on predictive accuracy, and that this impact tends, in our cross-section of countries, to be stronger at a daily than at a weekly or monthly forecast horizon of financial stress. Furthermore, the predictive value of climate risks for financial stress is heterogeneous across the countries in our sample, implying that a univariate forecasting model appears to be better suited than a corresponding multivariate one. Finally, the predictive value of climate risks for financial stress appears to be stronger in several countries at the lower conditional quantiles of financial stress.
    Keywords: Financial stress, Climate risks, Random forests, Forecasting
    JEL: C22 C32 C53 G15 Q54
    Date: 2023–09
  13. By: Christian Oliver Ewald; Kevin Kamm
    Abstract: We study the effect of stochastic feeding costs on animal-based commodities, with particular focus on aquaculture. More specifically, we use soybean futures to infer the stochastic behaviour of salmon feed, which we assume to follow a Schwartz two-factor model. We compare the decision of harvesting salmon using a decision rule that assumes either deterministic or stochastic feeding costs, i.e. including feeding cost risk. We identify cases where accounting for stochastic feeding costs leads to significant improvements, as well as cases where deterministic feeding costs are a good enough proxy. Nevertheless, in all of these cases, the newly derived rules show superior performance, while the additional computational costs are negligible. From a methodological point of view, we demonstrate how to use deep neural networks to infer the decision boundary that determines harvesting or continuation, improving on more classical regression-based and curve-fitting methods. To achieve this, we use a deep classifier, which not only improves on previous results but also scales well to higher-dimensional problems, and in addition mitigates effects due to model uncertainty, which we identify in this article.
    Date: 2023–09
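
Item 2 (Sridi and Bilokon) builds on the Huge and Savine idea of training on labels and on differentials of labels with respect to features. The sketch below illustrates that idea with a linear-regression analogue rather than a neural network, so it is not the authors' DDL method. Assumptions: a hypothetical quadratic basis (basis, basis_deriv), a derivative weight lam of our own naming, and a plain normal-equations solver; z holds the pathwise derivatives dy/dx.

```python
def basis(x):
    # Hypothetical polynomial features phi_k(x) = x**k for k = 0, 1, 2
    return [1.0, x, x * x]


def basis_deriv(x):
    # Derivatives d(phi_k)/dx of the features above
    return [0.0, 1.0, 2.0 * x]


def solve(a, b):
    # Gaussian elimination with partial pivoting for a small dense system a w = b
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    w = [0.0] * n
    for r in range(n - 1, -1, -1):
        w[r] = (m[r][n] - sum(m[r][c] * w[c] for c in range(r + 1, n))) / m[r][r]
    return w


def differential_fit(xs, ys, zs, lam=1.0):
    """Least squares on labels y and on differentials z = dy/dx.

    Minimises sum_i (phi(x_i) w - y_i)^2 + lam * sum_i (phi'(x_i) w - z_i)^2
    by accumulating and solving the normal equations.
    """
    k = len(basis(0.0))
    ata = [[0.0] * k for _ in range(k)]
    atb = [0.0] * k
    for x, y, z in zip(xs, ys, zs):
        # Each sample contributes one value row and one derivative row
        for row, target, weight in ((basis(x), y, 1.0), (basis_deriv(x), z, lam)):
            for i in range(k):
                atb[i] += weight * row[i] * target
                for j in range(k):
                    ata[i][j] += weight * row[i] * row[j]
    return solve(ata, atb)
```

On noise-free data generated from y = 1 + 2x + 3x^2 with z = 2 + 6x, the fit recovers the weights (1, 2, 3). With noisy labels, a larger lam lets the derivative rows act as extra information that regularises the fit, which is the intuition the DDL approach exploits.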
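
Item 4 (Rodikov and Antulov-Fantulin) benchmarks the $\sigma$-Cell against GARCH and trains it with a log-likelihood-based loss. For reference, the sketch below shows the standard GARCH(1,1) variance recursion sigma2[t] = omega + alpha * r[t-1]^2 + beta * sigma2[t-1] and a Gaussian negative log-likelihood for zero-mean returns. It is only the textbook baseline, not the $\sigma$-Cell; starting the recursion at the unconditional variance omega / (1 - alpha - beta) is our assumption.

```python
import math


def garch11_variances(returns, omega, alpha, beta):
    """Conditional variances sigma2[t] of a GARCH(1,1) model for a return series."""
    if not (omega > 0 and alpha >= 0 and beta >= 0 and alpha + beta < 1):
        raise ValueError("need omega > 0, alpha, beta >= 0 and alpha + beta < 1")
    # Assumed initialisation: the unconditional (long-run) variance
    sigma2 = [omega / (1.0 - alpha - beta)]
    for r in returns[:-1]:
        # Tomorrow's variance from today's squared return and today's variance
        sigma2.append(omega + alpha * r * r + beta * sigma2[-1])
    return sigma2


def gaussian_nll(returns, sigma2):
    """Negative Gaussian log-likelihood of zero-mean returns given their variances."""
    return 0.5 * sum(math.log(2 * math.pi) + math.log(s) + r * r / s
                     for r, s in zip(returns, sigma2))
```

Minimising gaussian_nll over (omega, alpha, beta) is the usual maximum-likelihood fit of GARCH(1,1); a neural volatility model trained on a log-likelihood loss optimises the same kind of objective with a more flexible variance recursion.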
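
Item 12 (Del Fava, Gupta, Pierdzioch and Rognone) uses climate risk indicators built by combining TF-IDF with cosine similarity. The abstract does not give the exact construction, so the sketch below rests on assumptions: documents are already tokenised, idf = log(N / df), and a hypothetical climate reference text is compared with each news article, with the similarity score acting as a climate-risk signal for that article.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Sparse TF-IDF vectors (dicts) for tokenised documents, with idf = log(N / df)."""
    n = len(docs)
    # Document frequency: number of documents containing each term
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({t: (c / len(doc)) * math.log(n / df[t]) for t, c in tf.items()})
    return vectors


def cosine(u, v):
    """Cosine similarity of two sparse vectors; 0.0 if either has zero norm."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0
```

For example, with the hypothetical reference "carbon emissions climate policy" and articles "climate policy and carbon tax" and "central bank raises interest rates" (lower-cased and split on whitespace), the first article gets a positive score and the second scores 0.0 because it shares no terms with the reference. A daily index could then aggregate such scores over the day's articles.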
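
Item 13 (Ewald and Kamm) assumes salmon-feed costs follow a Schwartz two-factor model, in which the spot price S and the convenience yield delta evolve as dS = (mu - delta) S dt + sigma1 S dW1 and d(delta) = kappa (alpha - delta) dt + sigma2 dW2, with correlation rho between the two Brownian motions. The sketch below simulates one path with an Euler scheme (log-Euler for S so prices stay positive). The parameter values, the choice of the physical rather than a risk-neutral measure, and the function name are illustrative assumptions, not the paper's calibration to soybean futures.

```python
import math
import random


def simulate_schwartz2(s0, delta0, mu, kappa, alpha, sigma1, sigma2, rho,
                       dt, n_steps, rng=None):
    """Euler path of spot S and convenience yield delta in a Schwartz two-factor model."""
    if rng is None:
        rng = random.Random()
    s, delta = s0, delta0
    path = [(s, delta)]
    sq = math.sqrt(dt)
    for _ in range(n_steps):
        e1 = rng.gauss(0.0, 1.0)
        # Correlated second shock: corr(e1, e2) = rho
        e2 = rho * e1 + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)
        # Log-Euler step for S, using delta from the start of the step
        s *= math.exp((mu - delta - 0.5 * sigma1 ** 2) * dt + sigma1 * sq * e1)
        # Mean-reverting (Ornstein-Uhlenbeck) step for the convenience yield
        delta += kappa * (alpha - delta) * dt + sigma2 * sq * e2
        path.append((s, delta))
    return path
```

With both volatilities set to zero and delta0 = alpha = mu, the path stays flat at (s0, delta0), a quick sanity check. Many simulated paths of this kind are what a harvesting rule, whether regression-based or a deep classifier, would be evaluated against.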

This nep-big issue is ©2023 by Tom Coupé. It is provided as is without any express or implied warranty. It may be freely redistributed in whole or in part for any purpose. If distributed in part, please include this notice.
General information on the NEP project can be found at For comments please write to the director of NEP, Marco Novarese at <>. Put “NEP” in the subject, otherwise your mail may be rejected.
NEP’s infrastructure is sponsored by the School of Economics and Finance of Massey University in New Zealand.