nep-big New Economics Papers
on Big Data
Issue of 2025–10–06
twelve papers chosen by
Tom Coupé, University of Canterbury


  1. Affine Feedforward Stochastic (AFS) Neural Network By Gouriéroux, Christian; Monfort, Alain
  2. Harnessing artificial intelligence for monitoring financial markets By Matteo Aquilina; Douglas Kiarelly Godoy de Araujo; Gaston Gelos; Taejin Park; Fernando Perez-Cruz
  3. Structural changes and statistical causal relationships in agricultural commodities markets: the impact of public news sentiment and institutional announcements By Ioannis Chalkiadakis; Gareth W Peters; Guillaume Bagnarosa; Alexandre Gohin
  4. Systematic risk profiling: A novel approach with applications to Kenya, Rwanda, and Malawi By Mukashov, Askar; Robinson, Sherman; Thurlow, James; Arndt, Channing; Thomas, Timothy S.
  5. What Hinders Electric Vehicle Diffusion? Insights from a Neural Network Approach By Monica Bonacina; Mert Demir; Antonio Sileo; Angela Zanoni
  6. Measuring non-workers’ labor market attachment with machine learning By Nicolás Forteza; Sergio Puente
  7. What can newspaper articles reveal about the euro area economy? By Saiz, Lorena; Magro, Manuel Medina
  8. Optimal placement of wind farms via quantile constraint learning By Feng, Wenxiu; Alcántara Mata, Antonio; Ruiz Mora, Carlos
  9. The look of success: AI-measured face factors and venture financing By Liudmila Alekseeva; Silvia Dalla Fontana; Caroline Genc; Lin Peng
  10. From Job Titles to ISCO Codes: Enhancing Occupational Classification With RAG-based LLMs By Bach, Ruben L.; Klamm, Christopher; Heyne, Stefanie; Kogan, Irena; Kononykhina, Olga; Jarck, Jana
  11. Beyond content: Investors' chatter, interaction and earnings announcement returns By Gaul, Johannes; Schrader, Pascal
  12. Revealing the Power of Market-Based Energy Policy: Evidence from China’s Energy Quota Trading System Using Machine Learning By Yantuan Yu; Ning Zhang

  1. By: Gouriéroux, Christian; Monfort, Alain
    Abstract: The aim of this paper is to link the machine learning method of multilayer perceptron (MLP) neural network with the classical analysis of stochastic state space models. We consider a special class of state space models with multiple layers based on affine conditional Laplace transforms. This new class of Affine Feedforward Stochastic (AFS) neural network provides closed form recursive formulas for recursive filtering of the state variables of different layers. This approach is suitable for online inference by stochastic gradient ascent optimization and for recursive computation of scores such as backpropagation. The approach is extended to recurrent neural networks and identification issues are discussed.
    Date: 2025–05
    URL: https://d.repec.org/n?u=RePEc:tse:wpaper:130941
  2. By: Matteo Aquilina; Douglas Kiarelly Godoy de Araujo; Gaston Gelos; Taejin Park; Fernando Perez-Cruz
    Abstract: Predicting financial market stress has long proven to be a largely elusive goal. Advances in artificial intelligence and machine learning offer new possibilities to tackle this problem, given their ability to handle large datasets and unearth hidden nonlinear patterns. In this paper, we develop a new approach based on a combination of a recurrent neural network (RNN) and a large language model. Focusing on deviations from triangular arbitrage parity (TAP) in the Euro-Yen currency pair, our RNN produces interpretable daily forecasts of market dysfunction 60 business days ahead. To address the "black box" limitations of RNNs, our model assigns data-driven, time-varying weights to the input variables, making its decision process transparent. These weights serve a dual purpose. First, their evolution in and of itself provides early signals of latent changes in market dynamics. Second, when the network forecasts a higher probability of market dysfunction, these variable-specific weights help identify relevant market variables that we use to prompt an LLM to search for relevant information about potential market stress drivers.
    Keywords: market dysfunction, liquidity, arbitrage, artificial intelligence, financial stability
    JEL: G14 G15 G17
    Date: 2025–09
    URL: https://d.repec.org/n?u=RePEc:bis:biswps:1291
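As a rough illustration of the quantity this abstract refers to, a deviation from triangular arbitrage parity can be sketched as the gap between the quoted Euro-Yen rate and the rate implied by going through a third currency. The abstract does not give the paper's exact definition, so the US dollar as vehicle currency, the basis-point scaling, the function name `tap_deviation_bp` and the quotes below are all assumptions made for illustration only.

```python
import math


def tap_deviation_bp(eur_usd: float, usd_jpy: float, eur_jpy: float) -> float:
    """Log gap, in basis points, between quoted EUR/JPY and the rate implied via USD.

    Assumption: USD is the vehicle currency and all inputs are mid quotes.
    """
    implied_eur_jpy = eur_usd * usd_jpy  # EUR -> USD -> JPY
    return 1e4 * math.log(eur_jpy / implied_eur_jpy)


# Hypothetical quotes, chosen only to show a small positive deviation.
dev = tap_deviation_bp(eur_usd=1.10, usd_jpy=150.0, eur_jpy=165.10)
print(f"TAP deviation: {dev:.2f} bp")
```

A value of zero means the three quotes are mutually consistent; the sign shows whether the direct quote is above or below the implied cross rate.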
  3. By: Ioannis Chalkiadakis (ISC-PIF - Institut des Systèmes Complexes - Paris Ile-de-France - ENS Cachan - École normale supérieure - Cachan - UP1 - Université Paris 1 Panthéon-Sorbonne - X - École polytechnique - IP Paris - Institut Polytechnique de Paris - Institut Curie [Paris] - SU - Sorbonne Université - CNRS - Centre National de la Recherche Scientifique); Gareth W Peters (UC Santa Barbara - University of California [Santa Barbara] - UC - University of California); Guillaume Bagnarosa (ESC [Rennes] - ESC Rennes School of Business); Alexandre Gohin (SMART - Structures et Marché Agricoles, Ressources et Territoires - INRAE - Institut National de Recherche pour l’Agriculture, l’Alimentation et l’Environnement - Institut Agro Rennes Angers - Institut Agro - Institut national d'enseignement supérieur pour l'agriculture, l'alimentation et l'environnement)
    Abstract: Novel empirical evidence is studied for the way the agricultural commodities futures markets process information. The significant effect of institutional announcements, such as those of the United States Department of Agriculture (USDA), on the participants in such markets has been well documented in the literature. However, existing studies consider measures of market ‘surprise’ or analysts’ ‘sentiment’ that do not stem directly from unstructured text in official reports or public news. In this work, we aim to verify the structural changes incurred in the corn and wheat markets by the release of the USDA reports while considering higher-order structural information of several market-related processes. Furthermore, we investigate whether there is evidence for statistical causality relationships between the market reaction, in terms of price, volume and volatility, and market participants’ sentiment induced by public news. To address these goals we rely on a recently published efficient algorithm for statistical causality analysis in multivariate time-series based on Gaussian Processes [Zaremba, A.B. and Peters, G.W., Statistical causality for multivariate nonlinear time series via Gaussian process models. Methodol. Comput. Appl. Probab., 2022, 1–46. https://doi.org/10.1007/s11009-022-09928-3.]. Market and public news text signals are jointly modeled as a Gaussian Process, whose properties we leverage to study linear and non-linear causal effects between the different time-series signals. The participants’ sentiment is extracted from public news data via methods developed in the area of statistical machine learning known as Natural Language Processing (NLP). A novel framework for text-to-time-series embedding is employed [Chalkiadakis, I., Zaremba, A., Peters, G.W. and Chantler, M.J., On-chain analytics for sentiment-driven statistical causality in cryptocurrencies. Blockchain: Res Appl., 2022, 3(2), 100063. Available online at: https://www.sciencedirect.com/science/article/pii/S2096720922000033.] to construct a sentiment index from publicly available news articles. The conducted studies offer a more comprehensive perspective of the information that is available to investors and how that is incorporated into the agricultural commodities market.
    Keywords: Text-as-data, Time-series, Agricultural commodities, Natural language processing, Multiple-Output Gaussian process, Statistical causality
    Date: 2025
    URL: https://d.repec.org/n?u=RePEc:hal:journl:hal-05280276
  4. By: Mukashov, Askar; Robinson, Sherman; Thurlow, James; Arndt, Channing; Thomas, Timothy S.
    Abstract: This paper uses machine learning, simulation, and data mining methods to develop Systematic Risk Profiles of three developing economies: Kenya, Rwanda, and Malawi. We focus on three exogenous shocks with implications for economic performance: world market prices, capital flows, and climate-driven sectoral productivity. In these and other developing countries, recent decades have been characterized by increased risks associated with all these factors, and there is a demand for instruments that can help to disentangle them. For each country, we utilize historical data to develop multi-variate distributions of shocks. We then sample from these distributions to obtain a series of shock vectors, which we label economic uncertainty scenarios. These scenarios are then entered into economywide computable general equilibrium (CGE) simulation models for the three countries, which allow us to quantify the impact of increased uncertainty on major economic indicators. Finally, we utilize importance metrics from the random forest machine learning algorithm and relative importance metrics from multiple linear regression models to quantify the importance of country-specific risk factors for country performance. We find that Malawi and Rwanda are more vulnerable to sectoral productivity shocks, and Kenya is more exposed to external risks. These findings suggest that a country’s level of development and integration into the global economy are key driving forces defining their risk profiles. The methodology of Systematic Risk Profiling can be applied to many other countries, delineating country-specific risks and vulnerabilities.
    Keywords: climate; computable general equilibrium models; machine learning; risk; uncertainty; Kenya; Rwanda; Malawi; Africa; Eastern Africa; Sub-Saharan Africa
    Date: 2024–10–25
    URL: https://d.repec.org/n?u=RePEc:fpr:ifprid:158180
  5. By: Monica Bonacina (Fondazione Eni Enrico Mattei, Università degli Studi di Milano); Mert Demir (Fondazione Eni Enrico Mattei); Antonio Sileo (Fondazione Eni Enrico Mattei, GREEN – Università Bocconi); Angela Zanoni (Fondazione Eni Enrico Mattei, Università di Roma La Sapienza, Research Institute for Sustainable Economic Growth – National Research Council)
    Abstract: The transition to a zero-emission vehicle fleet represents a pivotal element of Europe’s decarbonization strategy, with Italy’s participation being particularly significant given the size of its automotive market. This study investigates the potential for battery electric cars (BEVs) to drive decarbonization of Italy’s passenger vehicle fleet, focusing on the feasibility of targets set in the National Integrated Plan for Energy and Climate (PNIEC). Leveraging artificial neural networks, we integrate macroeconomic indicators, market-specific variables, and policy instruments to predict fleet dynamics and identify key factors influencing BEV adoption. We forecast that while BEV registrations will continue growing through 2030, the growth rate is projected to decelerate, presenting challenges for meeting ambitious policy targets. Our feature importance analysis demonstrates that BEV adoption is driven by an interconnected set of economic, infrastructural, and behavioral factors. Specifically, our model highlights that hybrid vehicle registrations and the vehicle purchase index exert the strongest influence on BEV registrations, suggesting that policy interventions should prioritize these areas to maximize impact. By offering data-driven insights and methodological innovations, our findings contribute to more effective policy design for accelerating sustainable mobility adoption while accounting for market realities and consumer behavior.
    Keywords: sustainable mobility, electric vehicle, neural networks, shap interpretation
    JEL: N74 Q55 Q58 R40 C45
    Date: 2025–08
    URL: https://d.repec.org/n?u=RePEc:fem:femwpa:2025.16
  6. By: Nicolás Forteza (BANCO DE ESPAÑA); Sergio Puente (BANCO DE ESPAÑA)
    Abstract: Studying the labor market attachment (LMA) for the non-working population is crucial for several economic outcomes, such as real wages or long-term non-employment. Official statistics rely on self-reported variables and rule-based procedures to assign the labor market status of an individual. However, this classification does not take into account other individual-level characteristics, like variables related to reservation wages or the amount and type of job offers received, implying that estimates of non-worker status could be biased. In this paper, we propose a novel methodology to measure non-workers’ LMA. Using the Spanish Labor Force Survey (LFS), we define two groups (attached vs. non-attached), and estimate a probability distribution for each individual of belonging to such groups. To recover these probability distributions, we rely on unsupervised and supervised machine learning algorithms. We describe the differences between LFS unemployment, other measures of attachment in the literature, and our non-worker classification. We identify the instances in which our proposed methodology has a tighter relationship with measures like salaries, GDP and employment flows.
    Keywords: labor market attachment, unemployment, labor force
    JEL: J21 J82
    Date: 2025–09
    URL: https://d.repec.org/n?u=RePEc:bde:wpaper:2534
  7. By: Saiz, Lorena; Magro, Manuel Medina
    Abstract: This study introduces a novel approach to dictionary-based sentiment analysis that extracts valuable insights from economic newspaper articles in the euro area without requiring article translation. We develop sentiment indices that accurately measure economic, labour, and inflation perceptions in Germany, France, Italy, and Spain using native-language texts. The aggregation of these country-specific sentiments provides a reliable indicator for the euro area as a whole, demonstrating the effectiveness of our approach in several nowcasting and forecasting experiments. This translation-free method significantly reduces resource requirements, facilitates easy replication across various languages, and enables daily updates. By eliminating the translation bottleneck, our approach emerges as one of the most timely and cost-effective economic measures available, offering a powerful tool for monitoring and forecasting business cycles in the multilingual context of the euro area.
    Keywords: forecasting, inflation, output, recession, textual analysis
    JEL: E32 E37 C53 C82
    Date: 2025–09
    URL: https://d.repec.org/n?u=RePEc:ecb:ecbwps:20253122
  8. By: Feng, Wenxiu; Alcántara Mata, Antonio; Ruiz Mora, Carlos
    Abstract: Wind farm placement arranges the size and the location of multiple wind farms within a given region. The power output is highly related to the wind speed on spatial and temporal levels, which can be modeled by advanced data-driven approaches. To this end, we use a probabilistic neural network as a surrogate that accounts for the spatiotemporal correlations of wind speed. This neural network uses ReLU activation functions so that it can be reformulated as a mixed-integer linear set of constraints (constraint learning). We embed these constraints into the placement decision problem, formulated as a two-stage stochastic optimization problem. Specifically, conditional quantiles of the total electricity production are regarded as recursive decisions in the second stage. We use real high-resolution regional data from a northern region in Spain. We validate that the constraint learning approach outperforms the classical bilinear interpolation method. Numerical experiments are implemented on risk-averse investors. The results indicate that risk-averse investors concentrate on dominant sites with strong wind, while exhibiting spatial diversification and sensitive capacity spread in non-dominant sites. Furthermore, we show that if we introduce transmission line costs in the problem, risk-averse investors favor locations closer to the substations. On the contrary, risk-neutral investors are willing to move to further locations to achieve higher expected profits. Our results conclude that the proposed novel approach is able to tackle a portfolio of regional wind farm placements and further provide guidance for risk-averse investors.
    Keywords: Constraint learning; Optimal investment; Quantile neural network; Wind generation; Stochastic optimization
    Date: 2025–09–30
    URL: https://d.repec.org/n?u=RePEc:cte:wsrepe:48103
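The reformulation of a ReLU unit as mixed-integer linear constraints mentioned in this abstract is commonly done with a big-M encoding. The sketch below shows that textbook encoding for a single unit; it is not taken from the paper. The bounds `lower` and `upper` on the pre-activation, the binary indicator `z` and the function names are assumptions introduced for illustration. For a pre-activation a with lower <= a <= upper and lower < 0 < upper, the constraints are y >= 0, y >= a, y <= a - lower*(1 - z) and y <= upper*z.

```python
def feasible_y(a, z, lower, upper):
    """Interval of outputs y allowed by the big-M constraints for a fixed binary z.

    Constraints: y >= 0, y >= a, y <= a - lower*(1 - z), y <= upper*z.
    Returns (lo, hi), or None when the constraints are infeasible for this z.
    """
    lo = max(0.0, a)
    hi = min(a - lower * (1 - z), upper * z)
    return (lo, hi) if lo <= hi else None


def relu_from_constraints(a, lower=-5.0, upper=5.0):
    """Enumerate z in {0, 1} and return the single output the constraints permit."""
    values = set()
    for z in (0, 1):
        interval = feasible_y(a, z, lower, upper)
        if interval is not None:
            lo, hi = interval
            assert lo == hi  # each feasible branch pins y to one value
            values.add(lo)
    assert len(values) == 1
    return values.pop()


for a in (-3.0, 0.0, 2.0):
    print(a, relu_from_constraints(a))
```

Enumerating z stands in for what a mixed-integer solver does internally: for each pre-activation only the branch matching its sign is feasible, so the linear constraints reproduce max(0, a) exactly.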
  9. By: Liudmila Alekseeva; Silvia Dalla Fontana; Caroline Genc; Lin Peng
    Abstract: This study presents the first large-scale analysis of face-based impression factors in the venture capital (VC) industry. Using machine learning to extract key impression factors from founders’ photos, we find that perceived trustworthiness, dominance, and youthfulness significantly predict VCs’ initial funding decisions, with relative importance varying by founder gender, team composition, and industry. These factors also predict the funding amount, follow-on financing, and longer-term outcomes, such as unicorn status and acquisitions. Therefore, even experienced investors rely on facial cues when evaluating founders, and such cues serve as imperfect but informative signals of venture success.
    Keywords: venture capital, investment selection, impressions, facial recognition, trustworthiness, dominance, attractiveness
    Date: 2025–09–24
    URL: https://d.repec.org/n?u=RePEc:ete:msiper:772779
  10. By: Bach, Ruben L.; Klamm, Christopher; Heyne, Stefanie; Kogan, Irena; Kononykhina, Olga; Jarck, Jana
    Abstract: Accurate occupational classification from open-ended survey responses is vital for research in sociology, economics, and political science, yet manual coding remains resource-intensive and difficult to scale. We propose a novel pipeline that leverages large language models (LLMs) augmented with retrieval (RAG) to automate the assignment of International Standard Classification of Occupations (ISCO) codes. Drawing on survey data from a sample of recently arrived Afghan and Syrian refugees in Germany, we preprocess noisy occupational descriptions using LLMs and apply vector-based similarity search to retrieve candidate ISCO codes. The final classification is selected by LLMs, constrained to the retrieved candidates and accompanied by interpretable justifications. We evaluate the system’s performance against expert-coded labels, demonstrating high agreement and robustness across languages. Our findings suggest that RAG-powered LLMs can substantially improve the accuracy, scalability, and accessibility of occupational classification, with particular benefits for multilingual and resource-constrained research settings. In addition, we describe a prototypical pipeline that other researchers can readily adapt for applying LLMs to similar classification tasks, facilitating transparency, reproducibility, and broader adoption.
    Date: 2025–09–24
    URL: https://d.repec.org/n?u=RePEc:osf:socarx:ge56f_v1
  11. By: Gaul, Johannes; Schrader, Pascal
    Abstract: We study the relationship between investors' social media activity and earnings announcement returns. To distinguish between information contained in peer-to-peer interaction and user-posted content, we analyze conversation networks on Reddit using centrality metrics from network science and classify user sentiment with large language models. We show that pre-announcement sentiment is positively associated with short-term cumulative abnormal returns only if it does not spark pre-announcement controversy. If pre-announcement controversy arises, we document a negative association. Our findings present a more nuanced view on the wisdom of crowds hypothesis, highlighting that peer-to-peer interaction on social media exhibits a pattern of normalization, and thus contains informational value beyond content.
    Keywords: Information Processing, Reddit Wallstreet Bets, Wisdom of Crowds, Conversation Networks, Large Language Models, Eigenvector Centrality, High-frequency Data
    JEL: G12 G14
    Date: 2025
    URL: https://d.repec.org/n?u=RePEc:zbw:zewdip:327108
  12. By: Yantuan Yu (Guangdong University of Foreign Studies); Ning Zhang (Yonsei University)
    Abstract: The effect of market-based climate policy instruments on a just transition should not be underestimated, especially for developing economies. In this study, we provide rigorous empirical evidence on how China’s Energy Quota Trading System (EQTS) can drive green technology innovation and support an equitable, low-carbon transition. Specifically, based on a quasi-experimental modeling framework, we use a Double Debiased Machine Learning method to estimate the causal effect of China’s EQTS on energy productivity. Further, we explore the mechanisms of impact and examine heterogeneous effects from regional, resource endowment, and environmental regulation stringency perspectives. The empirical findings show that EQTS significantly improves energy productivity, exhibiting an average marginal effect of 13.2%. Robustness checks confirm the validity of the results after controlling for potential confounders. Green technology innovation and energy transition function as critical pathways through which the policy enhances energy productivity. This study presents empirical evidence on how effective market-based regulatory mechanisms are in the energy sector and offers practical policy recommendations for integrating innovation-driven strategies within national carbon mitigation frameworks.
    Keywords: Energy Quota Trading System; Energy Productivity; Natural-Experiment Modeling; Green Technology Innovation; Energy Transition
    JEL: O13 O47 Q43 R11
    Date: 2025–09
    URL: https://d.repec.org/n?u=RePEc:yon:wpaper:2025rwp-258
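For readers unfamiliar with the Double Debiased Machine Learning method named in this abstract, the sketch below shows its core idea in a partially linear model y = theta*d + g(X) + e: predict y and d from the controls on held-out folds, then regress the outcome residuals on the treatment residuals. It is a minimal sketch under stated assumptions and not the paper's specification. The data are simulated, the treatment is continuous, `true_theta`, `ols_predict` and `dml_plr` are hypothetical names, and plain least squares stands in for the flexible learners (for example random forests) that an applied study would normally use.

```python
import numpy as np


def ols_predict(X_train, y_train, X_test):
    """Fit least squares with an intercept on one fold and predict on another."""
    A = np.column_stack([np.ones(len(X_train)), X_train])
    coef, *_ = np.linalg.lstsq(A, y_train, rcond=None)
    return np.column_stack([np.ones(len(X_test)), X_test]) @ coef


def dml_plr(y, d, X, n_folds=5, seed=0):
    """Cross-fitted residual-on-residual estimate of theta and its standard error."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    y_res = np.empty(len(y))
    d_res = np.empty(len(y))
    for test_idx in folds:
        train = np.ones(len(y), dtype=bool)
        train[test_idx] = False  # nuisance models never see the fold they predict
        y_res[test_idx] = y[test_idx] - ols_predict(X[train], y[train], X[test_idx])
        d_res[test_idx] = d[test_idx] - ols_predict(X[train], d[train], X[test_idx])
    theta = (d_res @ y_res) / (d_res @ d_res)
    psi = d_res * (y_res - theta * d_res)  # orthogonal score at the estimate
    se = np.sqrt(np.mean(psi**2) / len(y)) / np.mean(d_res**2)
    return theta, se


# Simulated data with a known effect (an assumption, not the paper's data).
rng = np.random.default_rng(42)
n, true_theta = 2000, 0.5
X = rng.normal(size=(n, 3))
d = X @ np.array([0.5, -0.2, 0.1]) + rng.normal(size=n)
y = true_theta * d + X @ np.array([1.0, 0.5, -0.3]) + rng.normal(size=n)

theta_hat, se = dml_plr(y, d, X)
print(f"theta_hat = {theta_hat:.3f} (se {se:.3f})")
```

Cross-fitting keeps each observation out of the nuisance models used to residualize it, which is what limits overfitting bias once flexible learners replace the least-squares stand-in.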

This nep-big issue is ©2025 by Tom Coupé. It is provided as is without any express or implied warranty. It may be freely redistributed in whole or in part for any purpose. If distributed in part, please include this notice.
General information on the NEP project can be found at https://nep.repec.org. For comments please write to the director of NEP, Marco Novarese at <director@nep.repec.org>. Put “NEP” in the subject, otherwise your mail may be rejected.
NEP’s infrastructure is sponsored by the School of Economics and Finance of Massey University in New Zealand.