nep-big New Economics Papers
on Big Data
Issue of 2025–12–15
fourteen papers chosen by
Tom Coupé, University of Canterbury


  1. Narratives to Numbers: Large Language Models and Economic Policy Uncertainty By Ethan Hartley
  2. Predicting Price Movements in High-Frequency Financial Data with Spiking Neural Networks By Brian Ezinwoke; Oliver Rhodes
  3. Sentiment Analysis of Financial Text Using Quantum Language Processing QDisCoCirc By Takayuki Sakuma
  4. ReLU-Based and DNN-Based Generalized Maximum Score Estimators By Xiaohong Chen; Wayne Yuan Gao; Likang Wen
  5. Investigating Factors Influencing Dietary Quality in China: Machine Learning Approaches By Feng, Yuan; Liu, Shuang; Zhang, Man; Jin, Yanhong; Yu, Xiaohua
  6. Using Machine Learning Method to Estimate the Heterogeneous Impacts of the Updated Nutrition Facts Panel By Zhang, Yuxiang; Liu, Yizao; Sears, James M.
  7. Standard Occupation Classifier -- A Natural Language Processing Approach By Sidharth Rony; Jack Patman
  8. Modelling the Doughnut of social and planetary boundaries with frugal machine learning By Stefano Vrizzi; Daniel W. O'Neill
  9. GDP Nowcasting Performance of Traditional Econometric Models vs Machine-Learning Algorithms: Simulation and Case Studies By Klakow Akepanidtaworn; Korkrid Akepanidtaworn
  10. Optimizing Information Asset Investment Strategies in the Exploratory Phase of the Oil and Gas Industry: A Reinforcement Learning Approach By Paulo Roberto de Melo Barros Junior; Monica Alexandra Vilar Ribeiro De Meireles; Jose Luis Lima de Jesus Silva
  11. Enhancing the Efficiency of National R&D Programs Using Machine Learning-Based Anomaly Detection By Sang-Kyu Lee
  12. Arbitrage-Free Bond and Yield Curve Forecasting with Neural Filters under HJM Constraints By Xiang Gao; Cody Hyndman
  13. Cryptocurrency Portfolio Management with Reinforcement Learning: Soft Actor-Critic and Deep Deterministic Policy Gradient Algorithms By Kamal Paykan
  14. Financial Text Classification Based On rLoRA Finetuning On Qwen3-8B model By Zhiming Lian

  1. By: Ethan Hartley
    Abstract: This study evaluates large language models as estimable classifiers and clarifies how modeling choices shape downstream measurement error. Revisiting the Economic Policy Uncertainty index, we show that contemporary classifiers substantially outperform dictionary rules, better track human audit assessments, and extend naturally to noisy historical and multilingual news. We use these tools to construct a new nineteenth-century U.S. index from more than 360 million newspaper articles and exploratory cross-country indices with a single multilingual model. Taken together, our results show that LLMs can systematically improve text-derived measures and should be integrated as explicit measurement tools in empirical economics.
    Date: 2025–11
    URL: https://d.repec.org/n?u=RePEc:arx:papers:2511.17866
  2. By: Brian Ezinwoke; Oliver Rhodes
    Abstract: Modern high-frequency trading (HFT) environments are characterized by sudden price spikes that present both risk and opportunity, but conventional financial models often fail to capture the required fine temporal structure. Spiking Neural Networks (SNNs) offer a biologically inspired framework well-suited to these challenges due to their natural ability to process discrete events and preserve millisecond-scale timing. This work investigates the application of SNNs to high-frequency price-spike forecasting, enhancing performance via robust hyperparameter tuning with Bayesian Optimization (BO). This work converts high-frequency stock data into spike trains and evaluates three architectures: an established unsupervised STDP-trained SNN, a novel SNN with explicit inhibitory competition, and a supervised backpropagation network. BO was driven by a novel objective, Penalized Spike Accuracy (PSA), designed to ensure a network's predicted price spike rate aligns with the empirical rate of price events. Simulated trading demonstrated that models optimized with PSA consistently outperformed their Spike Accuracy (SA)-tuned counterparts and baselines. Specifically, the extended SNN model with PSA achieved the highest cumulative return (76.8%) in simple backtesting, significantly surpassing the supervised alternative (42.54% return). These results validate the potential of spiking networks, when robustly tuned with task-specific objectives, for effective price spike forecasting in HFT.
    Date: 2025–12
    URL: https://d.repec.org/n?u=RePEc:arx:papers:2512.05868
  3. By: Takayuki Sakuma
    Abstract: We apply quantum distributional compositional circuit (QDisCoCirc) to 3-class sentiment analysis of financial text. In our classical simulations, we keep the Hilbert-space dimension manageable by decomposing each sentence into short contiguous chunks. Each chunk is mapped to a shallow quantum circuit, and the resulting Bloch vectors are used as a sequence of quantum tokens. Simple averaging of chunk vectors ignores word order and syntactic roles. We therefore add a small Transformer encoder over the raw Bloch-vector sequence and attach a CCG-based type embedding to each chunk. This hybrid design preserves physically interpretable semantic axes of quantum tokens while allowing the classical side to model word order and long-range dependencies. The sequence model improves test macro-F1 over the averaging baseline, and chunk-level attribution further shows that evidential mass concentrates on a small number of chunks and that type embeddings are used more reliably for correctly predicted sentences. For real-world quantum language processing applications in finance, future key challenges include circuit designs that avoid chunking and the design of inter-chunk fusion layers.
    Date: 2025–11
    URL: https://d.repec.org/n?u=RePEc:arx:papers:2511.18804
  4. By: Xiaohong Chen; Wayne Yuan Gao; Likang Wen
    Abstract: We propose a new formulation of the maximum score estimator that uses compositions of rectified linear unit (ReLU) functions, instead of indicator functions as in Manski (1975, 1985), to encode the sign alignment restrictions. Since the ReLU function is Lipschitz, our new ReLU-based maximum score criterion function is substantially easier to optimize using standard gradient-based optimization packages. We also show that our ReLU-based maximum score (RMS) estimator can be generalized to an umbrella framework defined by multi-index single-crossing (MISC) conditions, to which the original maximum score estimator cannot be applied. We establish the $n^{-s/(2s+1)}$ convergence rate and asymptotic normality for the RMS estimator under order-$s$ Hölder smoothness. In addition, we propose an alternative estimator using a further reformulation of RMS as a special layer in a deep neural network (DNN) architecture, which allows the estimation procedure to be implemented via state-of-the-art software and hardware for DNN.
    Date: 2025–11
    URL: https://d.repec.org/n?u=RePEc:arx:papers:2511.19121
  5. By: Feng, Yuan; Liu, Shuang; Zhang, Man; Jin, Yanhong; Yu, Xiaohua
    Keywords: Food Consumption/Nutrition/Food Safety
    Date: 2024
    URL: https://d.repec.org/n?u=RePEc:ags:aaea24:343836
  6. By: Zhang, Yuxiang; Liu, Yizao; Sears, James M.
    Keywords: Food Consumption/Nutrition/Food Safety, Health Economics and Policy, Consumer/Household Economics
    Date: 2024
    URL: https://d.repec.org/n?u=RePEc:ags:aaea24:343727
  7. By: Sidharth Rony; Jack Patman
    Abstract: Standard Occupational Classifiers (SOC) are systems used to categorize and classify different types of jobs and occupations based on their similarities in terms of job duties, skills, and qualifications. Integrating these facets with Big Data from job advertisements offers the prospect of investigating labour demand that is specific to various occupations. This project investigates the use of recent developments in natural language processing to construct a classifier capable of assigning an occupation code to a given job advertisement. We develop various classifiers for both UK ONS SOC and US O*NET SOC, using different Language Models. We find that an ensemble model, which combines Google BERT and a Neural Network classifier while considering job title, description, and skills, achieved the highest prediction accuracy. Specifically, the ensemble model exhibited a classification accuracy of up to 61% for the lower (or fourth) tier of SOC, and 72% for the third tier of SOC. This model could provide up-to-date, accurate information on the evolution of the labour market using job advertisements.
    Date: 2025–11
    URL: https://d.repec.org/n?u=RePEc:arx:papers:2511.23057
  8. By: Stefano Vrizzi; Daniel W. O'Neill
    Abstract: The 'Doughnut' of social and planetary boundaries has emerged as a popular framework for assessing environmental and social sustainability. Here, we provide a proof-of-concept analysis that shows how machine learning (ML) methods can be applied to a simple macroeconomic model of the Doughnut. First, we show how ML methods can be used to find policy parameters that are consistent with 'living within the Doughnut'. Second, we show how a reinforcement learning agent can identify the optimal trajectory towards desired policies in the parameter space. The approaches we test, which include a Random Forest Classifier and $Q$-learning, are frugal ML methods that are able to find policy parameter combinations that achieve both environmental and social sustainability. The next step is the application of these methods to a more complex ecological macroeconomic model.
    Date: 2025–12
    URL: https://d.repec.org/n?u=RePEc:arx:papers:2512.02200
  9. By: Klakow Akepanidtaworn; Korkrid Akepanidtaworn
    Abstract: Are Machine Learning (ML) algorithms superior to traditional econometric models for GDP nowcasting in a time series setting? Based on our evaluation of all models from both classes ever used in nowcasting across simulation and six country cases, traditional econometric models tend to outperform ML algorithms. Among the ML algorithms, the linear ML algorithms (Lasso and Elastic Net) perform best in nowcasting, even surpassing traditional econometric models in cases of long GDP data and rich high-frequency indicators. Among the traditional econometric models, the Bridge and Dynamic Factor models deliver the strongest empirical results, while the Three-Pass Regression Filter performs well in our simulation. Due to the relatively short length of GDP series, complex and non-linear ML algorithms are prone to overfitting, which compromises their out-of-sample performance.
    Keywords: Nowcasting; Machine Learning; Forecast evaluation; Real-time data
    Date: 2025–12–05
    URL: https://d.repec.org/n?u=RePEc:imf:imfwpa:2025/252
  10. By: Paulo Roberto de Melo Barros Junior; Monica Alexandra Vilar Ribeiro De Meireles; Jose Luis Lima de Jesus Silva
    Abstract: Our work investigates the economic efficiency of the prevailing "ladder-step" investment strategy in oil and gas exploration, which advocates for the incremental acquisition of geological information throughout the project lifecycle. By employing a multi-agent Deep Reinforcement Learning (DRL) framework, we model an alternative strategy that prioritizes the early acquisition of high-quality information assets. We simulate the entire upstream value chain, comprising competitive bidding, exploration, and development phases, to evaluate the economic impact of this approach relative to traditional methods. Our results demonstrate that front-loading information investment significantly reduces the costs associated with redundant data acquisition and enhances the precision of reserve valuation. Specifically, we find that the alternative strategy outperforms traditional methods in highly competitive environments by mitigating the "winner's curse" through more accurate bidding. Furthermore, the economic benefits are most pronounced during the development phase, where superior data quality minimizes capital misallocation. These findings suggest that optimal investment timing is structurally dependent on market competition rather than solely on price volatility, offering a new paradigm for capital allocation in extractive industries.
    Date: 2025–11
    URL: https://d.repec.org/n?u=RePEc:arx:papers:2512.00243
  11. By: Sang-Kyu Lee (Korea Institute for Industrial Economics and Trade)
    Abstract: This study is grounded on the premise that, given the transformative advances in artificial intelligence (AI) technologies occurring across the industrial landscape, AI tools should be actively implemented into the design and implementation of industrial policy. We argue that this is especially true for R&D policy, which is central to national competitiveness in science and technology, and which must consider multiple diverse variables, including the global economy, the overall industrial environment, corporate management, and technological capabilities. For this study, I apply machine learning (ML)-based anomaly detection (AD) to analyze high-performing national R&D projects, and specifically assess ML-based AD that considers both input and output variables and analyzes structural patterns. Building on these analytical results, I propose firm-size-specific differentiated policy measures designed to enhance R&D performance. The goal of this study is to establish a policy-decision framework that improves timeliness and precision in the operation and management of national R&D programs and, in the longer term, contributes to the realization of AI-based policy planning and operational management.
    Keywords: machine learning; artificial intelligence; AI; anomaly detection; DEA; SHAP; research and development; R&D; government R&D; industrial policy; South Korea
    JEL: I23 I28 O32 O38
    Date: 2025–10–31
    URL: https://d.repec.org/n?u=RePEc:ris:kieter:021804
  12. By: Xiang Gao; Cody Hyndman
    Abstract: We develop an arbitrage-free deep learning framework for yield curve and bond price forecasting based on the Heath-Jarrow-Morton (HJM) term-structure model and a dynamic Nelson-Siegel parameterization of forward rates. Our approach embeds a no-arbitrage drift restriction into a neural state-space architecture by combining Kalman, extended Kalman, and particle filters with recurrent neural networks (LSTM/CLSTM), and introduces an explicit arbitrage error regularization (AER) term during training. The model is applied to U.S. Treasury and corporate bond data, and its performance is evaluated for both yield-space and price-space predictions at 1-day and 5-day horizons. Empirically, arbitrage regularization leads to its strongest improvements at short maturities, particularly in 5-day-ahead forecasts, increasing market-consistency as measured by bid-ask hit rates and reducing dollar-denominated prediction errors.
    Date: 2025–11
    URL: https://d.repec.org/n?u=RePEc:arx:papers:2511.17892
  13. By: Kamal Paykan (Department of Mathematics, Tafresh University, Tafresh, Iran)
    Abstract: This paper proposes a reinforcement learning-based framework for cryptocurrency portfolio management using the Soft Actor-Critic (SAC) and Deep Deterministic Policy Gradient (DDPG) algorithms. Traditional portfolio optimization methods often struggle to adapt to the highly volatile and nonlinear dynamics of cryptocurrency markets. To address this, we design an agent that learns continuous trading actions directly from historical market data through interaction with a simulated trading environment. The agent optimizes portfolio weights to maximize cumulative returns while minimizing downside risk and transaction costs. Experimental evaluations on multiple cryptocurrencies demonstrate that the SAC and DDPG agents outperform baseline strategies such as equal-weighted and mean-variance portfolios. The SAC algorithm, with its entropy-regularized objective, shows greater stability and robustness in noisy market conditions compared to DDPG. These results highlight the potential of deep reinforcement learning for adaptive and data-driven portfolio management in cryptocurrency markets.
    Date: 2025–11
    URL: https://d.repec.org/n?u=RePEc:arx:papers:2511.20678
  14. By: Zhiming Lian
    Abstract: Financial text classification has increasingly become an important aspect in quantitative trading systems and related tasks, such as financial sentiment analysis and the classification of financial news. In this paper, we assess the performance of the large language model Qwen3-8B on both tasks. Qwen3-8B is a state-of-the-art model that exhibits strong instruction-following and multilingual capabilities, and is distinct from standard models, primarily because it is specifically optimized for efficient fine-tuning and high performance on reasoning-based benchmarks, making it suitable for financial applications. To adapt this model, we apply Noisy Embedding Instruction Finetuning; based on our previous work, this method increases robustness by injecting controlled noise into the embedding layers during supervised adaptation. We improve efficiency further with the Rank-stabilized Low-Rank Adaptation (rLoRA) low-rank optimization approach and FlashAttention, which allow for faster training with lower GPU memory. For both tasks, we benchmark Qwen3-8B against standard classical transformer models, such as T5, BERT, and RoBERTa, and large models at scale, such as LLaMA1-7B, LLaMA2-7B, and Baichuan2-7B. The findings reveal that Qwen3-8B consistently surpasses these baselines by obtaining better classification accuracy and needing fewer training epochs. The synergy of instruction-based fine-tuning and memory-efficient optimization methods suggests Qwen3-8B can potentially serve as a scalable, economical option for real-time financial NLP applications. Qwen3-8B provides a very promising base for advancing dynamic quantitative trading systems in the future.
    Date: 2025–11
    URL: https://d.repec.org/n?u=RePEc:arx:papers:2512.00630

This nep-big issue is ©2025 by Tom Coupé. It is provided as is without any express or implied warranty. It may be freely redistributed in whole or in part for any purpose. If distributed in part, please include this notice.
General information on the NEP project can be found at https://nep.repec.org. For comments please write to the director of NEP, Marco Novarese at <director@nep.repec.org>. Put “NEP” in the subject, otherwise your mail may be rejected.
NEP’s infrastructure is sponsored by the School of Economics and Finance of Massey University in New Zealand.