nep-big New Economics Papers
on Big Data
Issue of 2025–07–21
eleven papers chosen by
Tom Coupé, University of Canterbury


  1. Machine Learning for Stock Price Prediction on the Casablanca Stock Exchange: A Comparative Study of ANN and LSTM Approaches By Imad Talhartit; Sanae Ait Jillali; Mounime El Kabbouri
  2. Enhancing inflation nowcasting with online search data: a random forest application for Colombia By Felipe Roldán-Ferrín; Julián A. Parra-Polania
  3. Using DSGE and Machine Learning to Forecast Public Debt for France. By Emmanouil SOFIANOS; Thierry BETTI; Emmanouil Theophilos PAPADIMITRIOU; Amélie BARBIER-GAUCHARD; Periklis GOGAS
  4. The financial instability - Monetary policy nexus: Evidence from the FOMC minutes By Kanelis, Dimitrios; Kranzmann, Lars H.; Siklos, Pierre L.
  5. Trading Costs v. Indicative Liquidity in the Off-the-Run Treasury Market By Oleg Sokolinskiy
  6. Emotion in euro area monetary policy communication and bond yields: The Draghi era By Kanelis, Dimitrios; Siklos, Pierre L.
  7. Gauging the Sentiment of Federal Open Market Committee Communications through the Eyes of the Financial Press By Shantanu Banerjee; Paul Cordova; Michiel De Pooter; Olesya V. Grishchenko
  8. Nutritional Benefits of Fostering: Evidence from Longitudinal Data in South Africa By Dumas, Christelle; Gautrain, Elsa; Gosselin-Pali, Adrien
  9. Federal Reserve Communication and the COVID-19 Pandemic By Jonathan Benchimol; Sophia Kazinnik; Yossi Saadon
  10. Gradient-Based Reinforcement Learning for Dynamic Quantile By Lukas Janasek
  11. FEELING THE HEARTBEAT OF REGIONS: LOCAL NEWS AND ECONOMIC SENTIMENTS By Tom Broekel

  1. By: Imad Talhartit (Université Hassan 1er [Settat], Ecole Nationale de Commerce et Gestion - Settat, Laboratory of Finance, Audit and Organizational Governance Research); Sanae Ait Jillali (Université Hassan 1er [Settat], Ecole Nationale de Commerce et Gestion - Settat); Mounime El Kabbouri
    Abstract: Capital markets play a fundamental role in the economy by facilitating the flow of funds between investors with capital surpluses and those with financing needs. However, these markets' inherent complexity and high volatility, amplified by economic crises and geopolitical events, make decision-making particularly challenging. In this context, artificial intelligence (AI), especially machine learning (ML) and deep learning (DL), has become increasingly relevant for modeling complex financial time series such as stock prices. Among various learning approaches, Long Short-Term Memory (LSTM) networks stand out for their ability to capture long-term dependencies in sequential data. This study compares the predictive performance of LSTM and Artificial Neural Network (ANN) models on ten stocks comprising the MADEX index of the Casablanca Stock Exchange, across three forecasting horizons (10, 20, and 30 days). Results demonstrate that the LSTM model consistently outperforms the ANN model in terms of accuracy and trend detection. For instance, over a 30-day horizon, the LSTM correctly predicted 8 out of 10 stocks, compared to only 4 for the ANN. This work is part of a broader research effort aimed at identifying the most effective model for stock price forecasting. Building on the results of this and previous studies, particularly those involving LSTM models optimized using genetic algorithms, future research will explore other models such as Gated Recurrent Units (GRU) and Support Vector Machines (SVM) to further enhance prediction accuracy and robustness.
    Keywords: Stock price forecasting, Casablanca Stock Exchange, Long Short-Term Memory (LSTM), Artificial Neural Networks (ANN), Prediction accuracy
    Date: 2025–05–09
    URL: https://d.repec.org/n?u=RePEc:hal:journl:hal-05063012
  2. By: Felipe Roldán-Ferrín; Julián A. Parra-Polania
    Abstract: This paper evaluates the predictive capacity of a machine learning model based on Random Forests (RF), combined with Google Trends (GT) data, for nowcasting monthly inflation in Colombia. The proposed RF-GT model is trained using historical inflation data, macroeconomic indicators, and internet search activity. After optimizing the model’s hyperparameters through time series cross-validation, we assess its out-of-sample performance over the period 2023–2024. The results are benchmarked against traditional approaches, including SARIMA, Ridge, and Lasso regressions, as well as professional forecasts from the Banco de la República’s monthly survey of financial analysts (MES). In terms of forecast accuracy, the RF-GT model consistently outperforms the statistical models and performs comparably to the analysts’ median forecast, while offering the additional advantage of producing predictions approximately one and a half weeks earlier. These findings highlight the practical value of integrating alternative data sources and machine learning techniques into the inflation monitoring toolkit of emerging economies.
    Keywords: Inflation, Nowcasting, Forecasting, Random Forest, Google Trends, Machine Learning
    JEL: C14 C53 E17 E31 E37
    Date: 2025–07
    URL: https://d.repec.org/n?u=RePEc:bdr:borrec:1318
  3. By: Emmanouil SOFIANOS; Thierry BETTI; Emmanouil Theophilos PAPADIMITRIOU; Amélie BARBIER-GAUCHARD; Periklis GOGAS
    Abstract: Forecasting public debt is essential for effective policymaking and economic stability, yet traditional approaches face challenges due to data scarcity. While machine learning (ML) has demonstrated success in financial forecasting, its application to macroeconomic forecasting remains underexplored, hindered by short historical time series and low-frequency (e.g., quarterly/annual) data availability. This study proposes a novel hybrid framework integrating Dynamic Stochastic General Equilibrium (DSGE) modeling with ML techniques to address these limitations, focusing on the evolution of France’s public debt. We first generate a large synthetic macroeconomic dataset using an estimated DSGE model for France, which allows for efficient training of ML algorithms. These trained models are then applied to actual historical data for directional debt forecasting. The results show that the best-performing machine learning model is XGBoost, which achieves 90% accuracy. Our results highlight the viability of combining structural economic models with data-driven techniques to improve macroeconomic forecasting.
    Keywords: DSGE, Machine Learning, Public Debt, Forecasting, France.
    JEL: C53 E27 E37 H63 H68
    Date: 2025
    URL: https://d.repec.org/n?u=RePEc:ulp:sbbeta:2025-18
  4. By: Kanelis, Dimitrios; Kranzmann, Lars H.; Siklos, Pierre L.
    Abstract: We analyze how financial stability concerns discussed during Federal Open Market Committee (FOMC) meetings influence the Federal Reserve's monetary policy implementation and communication. Utilizing large language models (LLMs) to analyze FOMC minutes from 1993 to 2022, we measure both mandate-related and financial stability-related sentiment within a unified framework, enabling a nuanced examination of potential links between these two objectives. Our results indicate an increase in financial stability concerns following the Great Financial Crisis, particularly during periods of monetary tightening and the COVID-19 pandemic. Outside the zero lower bound (ZLB), heightened financial stability concerns are associated with a reduction in the federal funds rate, while within the ZLB, they correlate with a tightening of unconventional measures. Methodologically, we introduce a novel labeled dataset that supports a contextualized LLM interpretation of FOMC documents and apply explainable AI techniques to elucidate the model's reasoning.
    Keywords: Explainable Artificial Intelligence, Financial Stability, FOMC Deliberations, Monetary Policy Communication, Natural Language Processing
    JEL: E44 E52 E58
    Date: 2025
    URL: https://d.repec.org/n?u=RePEc:zbw:bubdps:319627
  5. By: Oleg Sokolinskiy
    Abstract: This paper estimates trading costs in the off-the-run Treasury market using comprehensive transactions data and machine learning techniques. The analysis reveals several key findings that enhance the understanding of off-the-run Treasury market liquidity. First, the indicative bid-ask spread is shown to be a biased measure of liquidity, even when not considering transaction volume. Specifically, bid-ask spreads systematically overstate trading costs of more seasoned Treasuries, and the liquidity of benchmark, on-the-run securities affects how off-the-run bid-ask spreads map to trading costs. Second, the paper demonstrates that trading costs may scale non-monotonically with transaction volume, which suggests selective, opportunistic liquidity-taking. Additionally, transaction size has a greater effect on off-the-run securities' trading costs when benchmark, on-the-run liquidity is lower. Finally, indicative bid-ask spreads may notably overstate trading costs for larger orders of relatively less liquid securities. These findings contribute to our understanding of actual liquidity in the off-the-run Treasury market, while highlighting the limitations of a traditional liquidity measure. By providing a more nuanced view of trading costs, this study contributes valuable insights for supporting financial stability and optimal asset allocation.
    Keywords: Liquidity; Treasury market; Off-the-run; Effective bid-ask spread
    JEL: G10 G12
    Date: 2025–07–07
    URL: https://d.repec.org/n?u=RePEc:fip:fedgfe:2025-49
  6. By: Kanelis, Dimitrios; Siklos, Pierre L.
    Abstract: We combine modern methods from Speech Emotion Recognition and Natural Language Processing with high-frequency financial data to precisely analyze how the vocal emotions and language of ECB President Mario Draghi affect the yields and yield spreads of major euro area economies. This novel approach to central bank communication reveals that vocal and verbal emotions significantly impact the yield curve, with effects varying in magnitude and direction. Our results reveal an important asymmetry in yield changes with positive signals raising German, French, and Spanish yields, while negative cues increase Italian yields. Our analysis of bond spreads and equity markets indicates that positive communication influences the risk-free yield component, whereas negative communication affects the risk premium. Additionally, our study contributes by constructing a synchronized dataset for voice and language analysis.
    Keywords: Artificial Intelligence, Asset Prices, Communication, ECB, High-Frequency Data, Speech Emotion Recognition
    JEL: E50 E58 G12 G14
    Date: 2025
    URL: https://d.repec.org/n?u=RePEc:zbw:bubdps:320429
  7. By: Shantanu Banerjee; Paul Cordova; Michiel De Pooter; Olesya V. Grishchenko
    Abstract: We apply natural language processing tools to news articles in the financial press to construct a sentiment index, an index of the perceived semantic orientation of monetary policy communications around scheduled Federal Open Market Committee (FOMC) meetings. To that end, we develop several dictionaries that capture various monetary policy tools: conventional monetary policy, asset purchases, and forward guidance. The surprises in the sentiment index around FOMC meeting announcements explain variation in major asset price classes between May 1999 and November 2022. Sentiment index surprises are important for explaining the variation in asset prices beyond monetary policy surprises.
    Keywords: Textual analysis; Semantic orientation; Sentiment index; Federal Reserve; FOMC; Hawkish; Dovish; Asset prices; Policy expectations; Conventional monetary policy; Asset purchases; Forward guidance; Zero-lower-bound; COVID
    JEL: E00 E40 E58 G12
    Date: 2025–07–07
    URL: https://d.repec.org/n?u=RePEc:fip:fedgfe:2025-48
  8. By: Dumas, Christelle; Gautrain, Elsa; Gosselin-Pali, Adrien
    Abstract: In sub-Saharan Africa, child fostering, a widespread practice in which a child moves out of the household of her biological parents, can have significant implications for a child's overall well-being. Using longitudinal data from South Africa that includes individual tracking, we employ double machine learning techniques to evaluate the impact of fostering on nutrition, addressing biases related to selection into treatment and endogenous attrition, two common challenges in the literature. Our findings reveal that fostering reduces the probability of being stunted by 6.8 percentage points, corresponding to a 37 percent reduction compared to the mean prevalence. This improvement appears to be driven by foster children relocating to smaller, rural households, often including retired individuals, typically grandparents, who receive a pension. Furthermore, we find that fostering not only enhances the nutritional status of foster children but also benefits the nutrition of other children from sending households, suggesting that it can be mutually beneficial for both groups.
    Keywords: Child Fostering, Nutrition, Machine Learning, South Africa
    JEL: I15 J12 J13 O15 C14
    Date: 2025
    URL: https://d.repec.org/n?u=RePEc:zbw:glodps:1628
  9. By: Jonathan Benchimol; Sophia Kazinnik; Yossi Saadon
    Abstract: In this study, we examine the Federal Reserve’s communication strategies during the COVID-19 pandemic, comparing them with communication during previous periods of economic stress. Using specialized dictionaries tailored to COVID-19, unconventional monetary policy (UMP), and financial stability, combined with sentiment analysis and topic modeling techniques, we identify a distinct focus in Fed communication during the pandemic on financial stability, market volatility, social welfare, and UMP, characterized by notable contextual uncertainty. Through comparative analysis, we juxtapose the Fed’s communication during the COVID-19 crisis with its responses during the dot-com and global financial crises, examining content, sentiment, and timing dimensions. Our findings reveal that Fed communication and policy actions were more reactive to the COVID-19 crisis than to previous crises. Additionally, declining sentiment related to financial stability in interest rate announcements and minutes anticipated subsequent accommodative monetary policy decisions. We further document that communicating about UMP has become the “new normal” for the Fed’s Federal Open Market Committee meeting minutes and Chairman’s speeches since the Global Financial Crisis, reflecting an institutional adaptation in communication strategy following periods of economic distress. These findings contribute to our understanding of how central bank communication evolves during crises and how communication strategies adapt to exceptional economic circumstances.
    Keywords: central bank communication, unconventional monetary policy, financial stability, text mining, COVID-19
    JEL: C55 E44 E58 E63
    Date: 2025–07
    URL: https://d.repec.org/n?u=RePEc:een:camaaa:2025-38
  10. By: Lukas Janasek (Institute of Economic Studies, Charles University, Prague, Czech Republic)
    Abstract: This paper develops a novel gradient-based reinforcement learning algorithm for solving dynamic quantile models with uncertainty. Unlike traditional approaches that rely on expected utility maximization, we focus on agents who evaluate outcomes based on specific quantiles of the utility distribution, capturing intratemporal risk attitudes via a quantile level τ ∈ (0, 1). We formulate a recursive quantile value function associated with time-consistent dynamic quantile preferences in a Markov decision process. At each period, the agent aims to maximize the quantile of a distribution composed of instantaneous utility combined with the discounted future value, conditioned on the current state. Next, we adapt the Actor-Critic framework to learn the τ-quantile of the distribution and the policy maximizing the τ-quantile. We demonstrate the accuracy and robustness of the proposed algorithm using a quantile intertemporal consumption model with known analytical solutions. The results confirm the effectiveness of our algorithm in capturing optimal quantile-based behavior and the stability of the algorithm.
    Keywords: Dynamic programming, Quantile preferences, Reinforcement learning
    JEL: C61 C63
    Date: 2025–07
    URL: https://d.repec.org/n?u=RePEc:fau:wpaper:wp2025_12
  11. By: Tom Broekel
    Abstract: Timely and spatially detailed indicators of regional economic activity are limited. This paper introduces the Regional Economic News Sentiments (REGENS) index, a high-frequency measure based on geocoded news headlines from 300+ German-language outlets since 2019. REGENS captures local economic sentiment, aligns with national indicators, and significantly leads regional unemployment by up to four months. While its link to GDP growth is weaker, it consistently reflects regional economic patterns. The study highlights how media signals contribute to understanding economic development and illustrates the potential of text-based indicators to sharpen the spatial and temporal resolution of regional monitoring.
    Keywords: news, media, sentiments, regions, regional development
    JEL: R11 C55 O33
    Date: 2025–07
    URL: https://d.repec.org/n?u=RePEc:egu:wpaper:2519
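
Editorial illustration for item 2: the abstract mentions tuning hyperparameters through time series cross-validation. The sketch below shows one common way such splits can be built, an expanding window in which every training set ends before its test set begins. It is not the authors' procedure; the function name `expanding_window_splits` and the parameters `n_obs`, `n_folds`, and `test_size` are hypothetical choices made for this example.

```python
from typing import Iterator, List, Tuple


def expanding_window_splits(
    n_obs: int, n_folds: int, test_size: int
) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_idx, test_idx) pairs where training data always precede test data.

    Hypothetical helper for illustration only; not taken from the paper.
    """
    # Index at which the first test block starts, so that the last block ends at n_obs.
    first_test_start = n_obs - n_folds * test_size
    if first_test_start <= 0:
        raise ValueError("not enough observations for the requested folds")
    for k in range(n_folds):
        start = first_test_start + k * test_size
        train_idx = list(range(start))                    # everything before the test block
        test_idx = list(range(start, start + test_size))  # the next test_size observations
        yield train_idx, test_idx


# Example: 12 monthly observations, 3 folds, 2 test months per fold.
splits = list(expanding_window_splits(n_obs=12, n_folds=3, test_size=2))
for train_idx, test_idx in splits:
    print(f"train 0..{train_idx[-1]}  test {test_idx}")
```

Unlike shuffled k-fold cross-validation, no fold here is trained on observations dated after its test block, which is the property that matters when the target is a forecast or nowcast.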

This nep-big issue is ©2025 by Tom Coupé. It is provided as is without any express or implied warranty. It may be freely redistributed in whole or in part for any purpose. If distributed in part, please include this notice.
General information on the NEP project can be found at https://nep.repec.org. For comments please write to the director of NEP, Marco Novarese at <director@nep.repec.org>. Put “NEP” in the subject, otherwise your mail may be rejected.
NEP’s infrastructure is sponsored by the School of Economics and Finance of Massey University in New Zealand.