nep-big New Economics Papers
on Big Data
Issue of 2025–04–07
seven papers chosen by
Tom Coupé, University of Canterbury


  1. Machine Learning Methods in Algorithmic Trading: An Experimental Evaluation of Supervised Learning Techniques for Stock Price By Maheronnaghsh, Mohammad Javad; Gheidi, Mohammad Mahdi; Younesi, Abolfazl; Fazli, MohammadAmin
  2. Empowering financial supervision: a SupTech experiment using machine learning in an early warning system By Andrés Alonso-Robisco; Andrés Azqueta-Gavaldón; José Manuel Carbó; José Luis González; Ana Isabel Hernáez; José Luis Herrera; Jorge Quintana; Javier Tarancón
  3. A New Approach to Textual Analysis using Large Language Models: Application to the Analysis of Recent Wage and Price Developments in Japan By Kimihiko Izawa; Ikuo Kamei; Nao Shibata; Yusuke Takahashi; Shunichi Yoneyama
  4. Word2Prices: embedding central bank communications for inflation prediction By Douglas Kiarelly Godoy de Araujo; Nikola Bokan; Fabio Alberto Comazzi; Michele Lenza
  5. Unraveling Financial Fragility of Global Markets Using Machine Learning By Vasilios Plakandaras; Rangan Gupta; Qiang Ji
  6. Beyond six digits: Automated tariff line HS transposition using Natural Language Processing By Bayona, Pamela
  7. Pacific Islands Pacific Observatory, Monitoring Economic Activity by Nighttime Light Data By World Bank

  1. By: Maheronnaghsh, Mohammad Javad; Gheidi, Mohammad Mahdi; Younesi, Abolfazl; Fazli, MohammadAmin
    Abstract: In the dynamic world of financial markets, accurate price predictions are essential for informed decision-making. This research proposal outlines a comprehensive study aimed at forecasting stock and currency prices using state-of-the-art Machine Learning (ML) techniques. By delving into the intricacies of models such as Transformers, LSTM, Simple RNN, NHits, and NBeats, we seek to contribute to the realm of financial forecasting, offering valuable insights for investors, financial analysts, and researchers. This article provides an in-depth overview of our methodology, data collection process, model implementations, evaluation metrics, and potential applications of our research findings. The research indicates that NBeats and NHits models exhibit superior performance in financial forecasting tasks, especially with limited data, while Transformers require more data to reach full potential. Our findings offer insights into the strengths of different ML techniques for financial prediction, highlighting specialized models like NBeats and NHits as top performers, thus informing model selection for real-world applications. To enhance readability, all acronyms used in the paper are defined below: ML: Machine Learning; LSTM: Long Short-Term Memory; RNN: Recurrent Neural Network; NHits: Neural Hierarchical Interpolation for Time Series Forecasting; NBeats: Neural Basis Expansion Analysis for Time Series; ARIMA: Autoregressive Integrated Moving Average; GARCH: Generalized Autoregressive Conditional Heteroskedasticity; SVMs: Support Vector Machines; CNNs: Convolutional Neural Networks; MSE: Mean Squared Error; MAE: Mean Absolute Error; RMSE: Root Mean Squared Error; API: Application Programming Interface; F1-score: F1 Score; GRU: Gated Recurrent Unit; yfinance: Yahoo Finance (a Python library for fetching financial data).
    Date: 2023–09–30
    URL: https://d.repec.org/n?u=RePEc:osf:osfxxx:dzp26_v1
  2. By: Andrés Alonso-Robisco (BANCO DE ESPAÑA); Andrés Azqueta-Gavaldón (BANCO DE ESPAÑA); José Manuel Carbó (BANCO DE ESPAÑA); José Luis González (BANCO DE ESPAÑA); Ana Isabel Hernáez (BANCO DE ESPAÑA); José Luis Herrera (BANCO DE ESPAÑA); Jorge Quintana (BANCO DE ESPAÑA); Javier Tarancón (BANCO DE ESPAÑA)
    Abstract: New technologies have made available a vast amount of new data in the form of text, recording an exponentially increasing share of human and corporate behavior. For financial supervisors, the information encoded in text is a valuable complement to the more traditional balance sheet data typically used to track the soundness of financial institutions. In this study, we exploit several natural language processing (NLP) techniques as well as network analysis to detect anomalies in the Spanish corporate system, identifying both idiosyncratic and systemic risks. We use sentiment analysis at the corporate level to detect sentiment anomalies for specific corporations (idiosyncratic risks), while employing a wide range of network metrics to monitor systemic risks. In the realm of supervisory technology (SupTech), anomaly detection in sentiment analysis serves as a proactive tool for financial authorities. By continuously monitoring sentiment trends, SupTech applications can provide early warnings of potential financial distress or systemic risks.
    Keywords: suptech, natural language processing, machine learning, network analysis, sentiment
    JEL: C63 D81 G21
    Date: 2025–03
    URL: https://d.repec.org/n?u=RePEc:bde:opaper:2504
  3. By: Kimihiko Izawa (Bank of Japan); Ikuo Kamei (Bank of Japan); Nao Shibata (Bank of Japan); Yusuke Takahashi (Bank of Japan); Shunichi Yoneyama (Bank of Japan)
    Abstract: This paper examines whether textual data analysis using Large Language Models (LLMs) can be applied to assessing economic activity and prices in light of the rapid development of LLMs in recent years. LLMs have two advantages: a wide range of models is available for use without large initial costs, and these models, which have already acquired basic knowledge of language, can analyze any topic or text. They are beginning to be used more widely in economic analysis, including at central banks. As an example, this paper uses LLMs to analyze recent wage and price developments in Japan, drawing on comments from the Cabinet Office's Economy Watchers Survey. The results suggest that the cause of increasing selling prices is gradually shifting from raw material costs to labor costs.
    Date: 2025–03–24
    URL: https://d.repec.org/n?u=RePEc:boj:bojrev:rev25e05
  4. By: Douglas Kiarelly Godoy de Araujo; Nikola Bokan; Fabio Alberto Comazzi; Michele Lenza
    Abstract: Word embeddings are vectors of real numbers associated with words, designed to capture semantic and syntactic similarity between the words in a corpus of text. We estimate the word embeddings of the European Central Bank's introductory statements at monetary policy press conferences using a simple natural language processing model (Word2Vec), based only on the information and model parameters available as of each press conference. We show that a measure based on such embeddings helps improve core inflation forecasts multiple quarters ahead. Other common textual analysis techniques, such as dictionary-based metrics or sentiment metrics, do not obtain the same results. The information contained in the embeddings remains valuable for out-of-sample forecasting even after controlling for the central bank inflation forecasts, which are an important input for the introductory statements.
    Keywords: embeddings, inflation, forecasting, central bank texts
    JEL: E31 E37 E58
    Date: 2025–03
    URL: https://d.repec.org/n?u=RePEc:bis:biswps:1253
  5. By: Vasilios Plakandaras (Department of Economics, Democritus University of Thrace, Komotini, Greece); Rangan Gupta (Department of Economics, University of Pretoria, Private Bag X20, Hatfield 0028, South Africa); Qiang Ji (Institutes of Science and Development, Chinese Academy of Sciences, Beijing, China; School of Public Policy and Management, University of Chinese Academy of Sciences, Beijing, 100049, China)
    Abstract: The study investigates systemic financial risk in global markets, attributing it to geopolitical instability, climate risks, and economic uncertainties. Utilizing a state-of-the-art machine learning heterogeneous panel regression framework capable of capturing cross-sectional dependencies and nonlinear patterns, we examine financial stress across multiple economies, including China, the U.S., the U.K., and ten EU nations. Through extensive out-of-sample rolling window analysis, we show that while geopolitical uncertainty enhances short-term predictions, long-term risk forecasting is better achieved using financial and economic data. The study underscores the limitations of conventional regression models in capturing financial risk dynamics and suggests that machine learning-based panel regressions provide a more nuanced and accurate forecasting tool. The findings bear significant policy implications, highlighting the necessity for regulatory bodies to reassess risk frameworks and the role of climate-related disclosures in financial markets.
    Keywords: Systemic financial risk, machine learning, forecasting, climate risk, geopolitical risk
    JEL: C45 C58 G17
    Date: 2025–03
    URL: https://d.repec.org/n?u=RePEc:pre:wpaper:202511
  6. By: Bayona, Pamela
    Abstract: This paper explores the application of Natural Language Processing (NLP) techniques to automate Harmonized System (HS) tariff line transposition, employing a three-stage process: unique 1:1 tariff code matching (Round 1), exact description matching (Round 2), and "smart" description matching (Round 3) using Artificial Intelligence (AI) and lexical similarity methods paired with harmonized 6-digit concordance and cosine similarity. Similarity is calculated using either Term Frequency Inverse Document Frequency (TF-IDF) vectors or Sentence-BERT (SBERT) embeddings, comparing two scenarios: a straightforward case (Economy A) with standardized descriptions, and a complex case (Economy B) with more detailed technical descriptions. Results indicate that automated HS transposition can significantly augment the efficiency of traditionally manual methods, reducing processing time from two to three weeks to approximately half a day (up to 30 times faster). The overall accuracy rate is 99.6% for the simpler scenario and 98.8% for the complex one, for a standard set of approximately 10,000 HS codes. While non-AI techniques cover most of the accurate matches, AI-based Round 3 techniques address cases requiring the most manual effort. SBERT generally outperforms TF-IDF; however, including subheadings tends to reduce its accuracy. In certain cases, particularly for highly technical tariffs, TF-IDF's straightforward approach provides an advantage over SBERT. Overall, NLP techniques hold significant potential for improving HS transposition methods and facilitating the development of richer tariffs and trade datasets to enable more in-depth analyses. Future research should focus on refining these techniques across diverse datasets to optimize their broader application in tariff and trade data analysis.
    Keywords: Harmonized System, tariff line, HS transposition, correlation tables, concordance, natural language processing
    JEL: F10 F13
    Date: 2025
    URL: https://d.repec.org/n?u=RePEc:zbw:wtowps:314422
  7. By: World Bank
    Keywords: Poverty Reduction-Small Area Estimation Poverty Mapping; Industry-Industrial and Market Data and Reporting; Macroeconomics and Economic Growth-Economic Theory & Research; Poverty Reduction-Poverty and Policy
    Date: 2023–05
    URL: https://d.repec.org/n?u=RePEc:wbk:wboper:39810
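
Illustrative note on paper 6: the three-round matching described in that abstract can be sketched roughly as follows. This is not the author's implementation. The tokenizer, the smoothed IDF weighting, the exact rule used for Round 1 (one old line and one new line under the same 6-digit subheading), and the assumption that 6-digit subheadings are unchanged between the two schedules are all choices made here for illustration. The tariff codes and descriptions are invented, and the SBERT variant is not shown.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Hypothetical tokenizer: lowercase alphanumeric runs.
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs):
    # Sparse TF-IDF vectors (dict token -> weight) with a smoothed IDF.
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    df = Counter(t for toks in tokenized for t in set(toks))
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
    return [{t: tf * idf[t] for t, tf in Counter(toks).items()}
            for toks in tokenized]


def cosine(u, v):
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def transpose(old_lines, new_lines):
    """Map each old tariff line to a new one; returns code -> (new_code, round)."""
    new_by_hs6 = {}
    for code in new_lines:
        new_by_hs6.setdefault(code[:6], []).append(code)
    old_count = Counter(code[:6] for code in old_lines)

    result = {}
    for code, desc in old_lines.items():
        cands = new_by_hs6.get(code[:6], [])
        if not cands:
            result[code] = (None, 0)  # left for manual review
            continue
        # Round 1 (assumed rule): unique 1:1 line under the same subheading.
        if len(cands) == 1 and old_count[code[:6]] == 1:
            result[code] = (cands[0], 1)
            continue
        # Round 2: exact description match, ignoring case and outer spaces.
        exact = [c for c in cands
                 if new_lines[c].strip().lower() == desc.strip().lower()]
        if len(exact) == 1:
            result[code] = (exact[0], 2)
            continue
        # Round 3: highest TF-IDF cosine similarity among the candidates.
        vecs = tfidf_vectors([desc] + [new_lines[c] for c in cands])
        scores = [cosine(vecs[0], v) for v in vecs[1:]]
        best = max(range(len(cands)), key=scores.__getitem__)
        result[code] = (cands[best], 3)
    return result


# Invented example schedules (not real tariff data).
old_lines = {
    "01012100": "Pure-bred breeding horses",
    "08051010": "Sweet oranges, fresh",
    "08051020": "Oranges, dried",
    "09012110": "Roasted coffee beans in retail packages",
    "09012190": "Roasted coffee, ground",
}
new_lines = {
    "01012110": "Pure-bred horses for breeding",
    "08051011": "Sweet oranges, fresh",
    "08051019": "Oranges, dried",
    "09012120": "Coffee beans, roasted, packaged for retail sale",
    "09012180": "Ground roasted coffee",
}

result = transpose(old_lines, new_lines)
for old_code, (new_code, round_no) in result.items():
    print(old_code, "->", new_code, "(round", str(round_no) + ")")
```

Restricting Round 3 candidates to the same 6-digit subheading keeps the similarity search small and mirrors the abstract's pairing of similarity with a 6-digit concordance. A real transposition would use an actual concordance table between nomenclature versions rather than assuming identical prefixes.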

This nep-big issue is ©2025 by Tom Coupé. It is provided as is without any express or implied warranty. It may be freely redistributed in whole or in part for any purpose. If distributed in part, please include this notice.
General information on the NEP project can be found at https://nep.repec.org. For comments please write to the director of NEP, Marco Novarese at <director@nep.repec.org>. Put “NEP” in the subject; otherwise your mail may be rejected.
NEP’s infrastructure is sponsored by the School of Economics and Finance of Massey University in New Zealand.