nep-cmp New Economics Papers
on Computational Economics
Issue of 2026–03–16
sixteen papers chosen by
Stan Miles, Thompson Rivers University


  1. Uncertainty-Aware Deep Hedging By Manan Poddar
  2. Short-Term Stock Price Prediction Based on Single and Stacking Machine Learning Models By Chia Yean Lim
  3. A Bipartite Graph Approach to U.S.-China Cross-Market Return Forecasting By Jing Liu; Maria Grith; Xiaowen Dong; Mihai Cucuringu
  4. Deep Learning for Financial Time Series: A Large-Scale Benchmark of Risk-Adjusted Performance By Adir Saly-Kaufmann; Kieran Wood; Jan Peter-Calliess; Stefan Zohren
  5. Enhancing Implementation Success in Cohesion Policy. A Machine Learning Approach By Mara Giua; Francesca Micocci; Giulia Valeria Sonzogno
  6. RAUI: Uncertainty Indicators Built With Artificial Intelligence By Morteza Ghomi; Samuel Hurtado
  7. Double Machine Learning for Time Series By Milos Ciganovic; Federico D'Amario; Massimiliano Tancioni
  8. Econometric vs. Causal Structure-Learning for Time-Series Policy Decisions: Evidence from the UK COVID-19 Policies By Bruno Petrungaro; Anthony C. Constantinou
  9. DatedGPT: Preventing Lookahead Bias in Large Language Models with Time-Aware Pretraining By Yutong Yan; Raphael Tang; Zhenyu Gao; Wenxi Jiang; Yao Lu
  10. Earning While Learning: How to Run Batched Bandit Experiments By Kemper, Jan; Rostam-Afschar, Davud
  11. TradeFM: A Generative Foundation Model for Trade-flow and Market Microstructure By Maxime Kawawa-Beaudan; Srijan Sood; Kassiani Papasotiriou; Daniel Borrajo; Manuela Veloso
  12. Beyond Polarity: Multi-Dimensional LLM Sentiment Signals for WTI Crude Oil Futures Return Prediction By Dehao Dai; Ding Ma; Dou Liu; Kerui Geng; Yiqing Wang
  13. AI-Powered Skill Classification: Mapping Technology Intensity in the German Labor Market By Grenz, Sabrina; Gregory, Terry; Lehmer, Florian
  14. Who Shirks at Work? An Application of Machine Learning to Time Use Data By Giménez-Nadal, José Ignacio; Molina, José Alberto; Velilla, Jorge
  15. General Social Agents By Benjamin S. Manning; John J. Horton
  16. Generative AI for surveys on payment apps: AI views on privacy and technology By Koji Takahashi; Joon Suk Park

  1. By: Manan Poddar (Department of Mathematics, London School of Economics)
    Abstract: Deep hedging trains neural networks to manage derivative risk under market frictions, but produces hedge ratios with no measure of model confidence -- a significant barrier to deployment. We introduce uncertainty quantification to the deep hedging framework by training a deep ensemble of five independent LSTM networks under Heston stochastic volatility with proportional transaction costs. The ensemble's disagreement at each time step provides a per-time-step confidence measure that is strongly predictive of hedging performance: the learned strategy outperforms the Black-Scholes delta on approximately 80% of paths when model agreement is high, but on fewer than 20% when disagreement is elevated. We propose a CVaR-optimised blending strategy that combines the ensemble's hedge with the classical Black-Scholes delta, weighted by the level of model uncertainty. The blend improves on the Black-Scholes delta by 35-80 basis points in CVaR across several Heston calibrations, and on the theoretically optimal Whalley-Wilmott strategy by 100-250 basis points, with all improvements statistically significant under paired bootstrap tests. The analysis reveals that ensemble uncertainty is driven primarily by option moneyness rather than volatility, and that the uncertainty-performance relationship inverts under weak leverage -- findings with practical implications for the deployment of machine learning in hedging systems.
    Date: 2026–03
    URL: https://d.repec.org/n?u=RePEc:arx:papers:2603.10137
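    The uncertainty-weighted blending idea in this abstract can be sketched in a few lines. This is an illustrative reconstruction, not the paper's code: the linear down-weighting rule and the `disagreement_cap` parameter are assumptions standing in for the paper's CVaR-optimised weighting.

```python
import statistics

def blended_hedge(ensemble_deltas, bs_delta, disagreement_cap):
    """Blend the ensemble's mean hedge ratio with the Black-Scholes delta,
    shifting weight toward Black-Scholes as ensemble disagreement grows.

    ensemble_deltas: hedge ratios proposed by each ensemble member (e.g. 5 LSTMs).
    bs_delta: the classical Black-Scholes delta for the same time step.
    disagreement_cap: disagreement level at which we fall back fully to BS
                      (a hypothetical parameter; the paper optimises CVaR instead).
    """
    mean_delta = statistics.mean(ensemble_deltas)
    disagreement = statistics.stdev(ensemble_deltas)
    # Weight on the learned hedge shrinks linearly with disagreement.
    w = max(0.0, 1.0 - disagreement / disagreement_cap)
    return w * mean_delta + (1.0 - w) * bs_delta
```

    When the five networks agree exactly, the learned hedge is used unchanged; when they disagree strongly, the blend collapses to the Black-Scholes delta.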
  2. By: Chia Yean Lim (School of Computer Sciences, Universiti Sains Malaysia, 11800, Minden, Malaysia); Wenchuan Sun (School of Computer Sciences, Universiti Sains Malaysia, 11800, Minden, Malaysia); Fengqi Guo (CITIC Securities, 150000, Harbin, China); Sau Loong Ang (Department of Computing and Information Technology, Tunku Abdul Rahman University of Management and Technology, Penang Branch, 11200, Tanjung Bungah, Malaysia)
    Abstract: Objective - As the investment environment improves, individuals are increasingly eager to invest their idle funds. Securities companies have become the preferred choice for buying financial products. The current accuracy of stock predictions relies on the comprehensive models used by each securities company, including stock market trading data and stock pricing models. However, securities companies have not adequately explored a single suitable model for stock predictions and have rarely assessed the effectiveness of stacking and ensemble methods in improving these predictions. Methodology - This research first explored and proposed the best single-stock prediction model. Next, it combined four individual prediction models to create a stacking model. Findings - The comparison between the single and stacking models demonstrated that the stacking model's prediction accuracy exceeded that of the single model. Therefore, it is recommended that securities companies adopt a stacking-type prediction model to forecast share prices for their investment customers. Novelty - Using a stacking model could improve the accuracy of stock price predictions for investment managers, help users make better decisions, and ultimately enhance the company's earnings by delivering more accurate investment outcomes. Type of Paper - Empirical
    Keywords: Long short-term memory, random forest model, stacking model, stock prediction, support vector machine, XGBoost model.
    JEL: F17 F47
    Date: 2026–03–31
    URL: https://d.repec.org/n?u=RePEc:gtr:gatrjs:gjbssr674
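    The key mechanical step in any stacking model of the kind described above is generating out-of-fold level-1 predictions so the level-2 learner never sees predictions made on a base model's own training data. A generic sketch, not the authors' implementation (`fit` and `predict` are placeholder callables for any base learner such as LSTM, random forest, SVM, or XGBoost):

```python
def out_of_fold_predictions(fit, predict, X, y, n_folds=2):
    """Produce leakage-free level-1 predictions for one base model.

    fit(X_train, y_train) -> model; predict(model, x) -> forecast.
    Each observation is predicted by a model trained without it,
    which is what makes the predictions safe meta-features.
    """
    n = len(X)
    preds = [None] * n
    fold = n // n_folds
    for k in range(n_folds):
        lo = k * fold
        hi = n if k == n_folds - 1 else (k + 1) * fold
        # Train on everything outside the current fold.
        model = fit(X[:lo] + X[hi:], y[:lo] + y[hi:])
        for i in range(lo, hi):
            preds[i] = predict(model, X[i])
    return preds
```

    The level-2 (meta) model is then fit on these out-of-fold predictions from all four base models, stacked as its feature matrix.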
  3. By: Jing Liu; Maria Grith; Xiaowen Dong; Mihai Cucuringu
    Abstract: This paper studies cross-market return predictability through a machine learning framework that preserves economic structure. Exploiting the non-overlapping trading hours of the U.S. and Chinese equity markets, we construct a directed bipartite graph that captures time-ordered predictive linkages between stocks across markets. Edges are selected via rolling-window hypothesis testing, and the resulting graph serves as a sparse, economically interpretable feature-selection layer for downstream machine learning models. We apply a range of regularized and ensemble methods to forecast open-to-close returns using lagged foreign-market information. Our results reveal a pronounced directional asymmetry: U.S. previous-close-to-close returns contain substantial predictive information for Chinese intraday returns, whereas the reverse effect is limited. This informational asymmetry translates into economically meaningful performance differences and highlights how structured machine learning frameworks can uncover cross-market dependencies while maintaining interpretability.
    Date: 2026–03
    URL: https://d.repec.org/n?u=RePEc:arx:papers:2603.10559
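    The edge-selection step described above, keeping a directed US-to-China edge only when a rolling-window hypothesis test supports it, can be sketched with a correlation t-test. This is a simplified stand-in for the paper's procedure; the test statistic and threshold here are assumptions, not the authors' specification.

```python
import math
import statistics

def significant_edge(us_returns, cn_returns, t_threshold=2.0):
    """Test one directed edge US -> CN over a rolling window: does the US
    previous-close-to-close return correlate with the next Chinese
    intraday return? Keep the edge when the correlation's t-statistic
    exceeds the threshold (a hypothetical cutoff for illustration)."""
    n = len(us_returns)
    mu_u = statistics.mean(us_returns)
    mu_c = statistics.mean(cn_returns)
    cov = sum((u - mu_u) * (c - mu_c) for u, c in zip(us_returns, cn_returns))
    var_u = sum((u - mu_u) ** 2 for u in us_returns)
    var_c = sum((c - mu_c) ** 2 for c in cn_returns)
    r = cov / math.sqrt(var_u * var_c)
    t_stat = r * math.sqrt((n - 2) / (1 - r * r))
    return t_stat > t_threshold
```

    Edges that pass the test form the sparse bipartite graph used as a feature-selection layer for the downstream forecasting models.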
  4. By: Adir Saly-Kaufmann; Kieran Wood; Jan Peter-Calliess; Stefan Zohren
    Abstract: We present a large-scale benchmark of modern deep learning architectures for a financial time series prediction and position sizing task, with a primary focus on Sharpe ratio optimization. Evaluating linear models, recurrent networks, transformer-based architectures, state space models, and recent sequence representation approaches, we assess out-of-sample performance on a daily futures dataset covering commodities, equity indices, bonds, and FX from 2010 to 2025. Our evaluation goes beyond average returns and includes statistical significance, downside and tail risk measures, breakeven transaction cost analysis, robustness to random seed selection, and computational efficiency. We find that models explicitly designed to learn rich temporal representations consistently outperform linear benchmarks and generic deep learning models, which often lead the ranking in standard time series benchmarks. The hybrid VSN with LSTM model, a combination of Variable Selection Networks (VSN) and LSTMs, achieves the highest overall Sharpe ratio, while VSN with xLSTM and LSTM with PatchTST exhibit superior downside-adjusted characteristics. xLSTM demonstrates the largest breakeven transaction cost buffer, indicating improved robustness to trading frictions.
    Date: 2026–03
    URL: https://d.repec.org/n?u=RePEc:arx:papers:2603.01820
  5. By: Mara Giua; Francesca Micocci; Giulia Valeria Sonzogno
    Abstract: We test the hypothesis that Cohesion Policy (CP) underperformance, measured as project delays and failures to meet Output Indicator targets, is systematically driven by project-level features of the CP implementation architecture rather than by contextual conditions alone. Using Italian project-level data (2014-2020) in a Machine Learning approach, we show how governance arrangements in terms of programme type, programmers, activation procedures and beneficiaries combine with underlying contextual conditions in predicting project outcomes. Successful policy configurations that avoid underperformance can be adopted in an evidence-based perspective by combining some of the existing policy tools and accounting for the socio-economic context upstream.
    Keywords: Cohesion Policy, European Union, Policy implementation, Machine Learning
    JEL: O18 R11 R58 C53 C55
    Date: 2026–03
    URL: https://d.repec.org/n?u=RePEc:rtr:wpaper:0289
  6. By: Morteza Ghomi (BANCO DE ESPAÑA); Samuel Hurtado (BANCO DE ESPAÑA)
    Abstract: We present a methodology for generating uncertainty indicators for user-defined topics based on newspaper data. The approach is based on Retrieval-Augmented Generation (RAG) systems commonly used in artificial intelligence applications, which we adapt to construct topic-specific uncertainty measures, referred to as Retrieval-Augmented Uncertainty Indicators (RAUI). The method employs semantic search with an embedding model to select news articles relevant to a given topic, and a large language model (LLM) to quantify the level of uncertainty contained in each of those articles. We construct uncertainty indicators for ten topics using Spanish newspaper data and an aggregate measure that also highlights how each topic contributes to overall uncertainty. We present two practical applications of these indicators: a VAR analysis that shows how different sources of uncertainty have different effects on the Spanish economy, and an estimation that generates time-varying fan charts around the Banco de España GDP growth projections.
    Keywords: uncertainty, artificial intelligence, natural language processing, newspapers
    JEL: C81 E32
    Date: 2026–03
    URL: https://d.repec.org/n?u=RePEc:bde:wpaper:2609
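    The retrieval step of a RAUI-style pipeline, ranking news articles by embedding similarity to a topic before an LLM scores their uncertainty, can be sketched with cosine similarity. This is a generic illustration of semantic search, not the authors' system; the function name and top-k scheme are assumptions.

```python
import math

def retrieve_relevant(topic_vec, article_vecs, top_k):
    """Return the indices of the top_k articles whose embedding vectors
    are most cosine-similar to the topic embedding. In a RAG pipeline,
    these retrieved articles would then be passed to an LLM that
    quantifies the uncertainty expressed in each one."""
    def cosine(u, v):
        dot = sum(a * b for a, b in zip(u, v))
        norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
        return dot / norm

    ranked = sorted(range(len(article_vecs)),
                    key=lambda i: cosine(topic_vec, article_vecs[i]),
                    reverse=True)
    return ranked[:top_k]
```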
  7. By: Milos Ciganovic; Federico D'Amario; Massimiliano Tancioni
    Abstract: We modify the Double Machine Learning estimator to broaden its applicability to macroeconomic time-series settings. A deterministic cross-fitting step, termed Reverse Cross-Fitting, leverages the time-reversibility of stationary series to improve sample utilization and efficiency. We detail and prove the conditions under which the estimator is asymptotically valid. We then demonstrate, through simulations, that its performance remains valid in realistic finite samples and is robust to model misspecification and violations of assumptions, such as heteroskedasticity. In high dimensions, predictive metrics for tuning nuisance learners do not generally minimize bias in the causal score. We propose a calibration rule targeting a "Goldilocks zone", a region of tuning parameters that delivers stable, partialled-out signals and reduced small-sample bias. Finally, we apply our procedure to residualized Local Projections to estimate the dynamic effects of a rise in Tier 1 regulatory capital. The results underscore the usefulness of the methodology for inference in macroeconomic applications.
    Date: 2026–03
    URL: https://d.repec.org/n?u=RePEc:arx:papers:2603.10999
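    The partialling-out logic that the paper builds on can be sketched with standard two-fold cross-fitting: residualize the outcome and the treatment on controls using nuisance learners fit on the opposite fold, then regress residual on residual. Note this shows the conventional scheme, not the paper's Reverse Cross-Fitting variant; `fit_nuisance` is a placeholder for any ML learner.

```python
def dml_theta(y, d, x, fit_nuisance):
    """Two-fold cross-fitted partialling-out estimator of the effect of
    treatment d on outcome y, controlling for x.

    fit_nuisance(x_train, t_train) -> callable mapping x -> E[t | x].
    Each observation's residuals use nuisance models fit on the other fold,
    which is what removes the overfitting bias DML is designed to avoid.
    """
    n = len(y)
    half = n // 2
    folds = [(range(0, half), range(half, n)), (range(half, n), range(0, half))]
    res_y, res_d = [], []
    for est_idx, fit_idx in folds:
        g_hat = fit_nuisance([x[i] for i in fit_idx], [y[i] for i in fit_idx])
        m_hat = fit_nuisance([x[i] for i in fit_idx], [d[i] for i in fit_idx])
        for i in est_idx:
            res_y.append(y[i] - g_hat(x[i]))
            res_d.append(d[i] - m_hat(x[i]))
    # Final-stage OLS of outcome residuals on treatment residuals.
    return sum(a * b for a, b in zip(res_y, res_d)) / sum(b * b for b in res_d)
```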
  8. By: Bruno Petrungaro; Anthony C. Constantinou
    Abstract: Causal machine learning (ML) recovers graphical structures that inform us about potential cause-and-effect relationships. Most progress has focused on cross-sectional data with no explicit time order, whereas recovering causal structures from time series data remains the subject of ongoing research in causal ML. In addition to traditional causal ML, this study assesses econometric methods that some argue can recover causal structures from time series data. The use of these methods can be explained by the significant attention the field of econometrics has given to causality, and specifically to time series, over the years. This presents the possibility of comparing the causal discovery performance between econometric and traditional causal ML algorithms. We seek to understand if there are lessons to be incorporated into causal ML from econometrics, and provide code to translate the results of these econometric methods to the most widely used Bayesian Network R library, bnlearn. We investigate the benefits and challenges that these algorithms present in supporting policy decision-making, using the real-world case of COVID-19 in the UK as an example. Four econometric methods are evaluated in terms of graphical structure, model dimensionality, and their ability to recover causal effects, and these results are compared with those of eleven causal ML algorithms. Amongst our main results, we see that econometric methods provide clear rules for temporal structures, whereas causal-ML algorithms offer broader discovery by exploring a larger space of graph structures that tends to lead to denser graphs that capture more identifiable causal relationships.
    Date: 2026–02
    URL: https://d.repec.org/n?u=RePEc:arx:papers:2603.00041
  9. By: Yutong Yan; Raphael Tang; Zhenyu Gao; Wenxi Jiang; Yao Lu
    Abstract: In financial backtesting, large language models pretrained on internet-scale data risk introducing lookahead bias that undermines their forecasting validity, as they may have already seen the true outcome during training. To address this, we present DatedGPT, a family of twelve 1.3B-parameter language models, each trained from scratch on approximately 100 billion tokens of temporally partitioned data with strict annual cutoffs spanning 2013 to 2024. We further enhance each model with instruction fine-tuning on both general-domain and finance-specific datasets curated to respect the same temporal boundaries. Perplexity-based probing confirms that each model's knowledge is effectively bounded by its data cutoff year, while evaluation on standard benchmarks shows competitive performance with existing models of similar scale. We provide an interactive web demo that allows users to query and compare responses from models across different cutoff years.
    Date: 2026–03
    URL: https://d.repec.org/n?u=RePEc:arx:papers:2603.11838
  10. By: Kemper, Jan (ZEW, University of Mannheim); Rostam-Afschar, Davud (University of Mannheim)
    Abstract: Researchers typically collect experimental data sequentially, allowing early outcome observations and adaptive treatment assignment to reduce exposure to inferior treatments. This article reviews multi-armed-bandit adaptive experimental designs that balance exploration and exploitation. Because experimental data collected adaptively through bandit algorithms violate standard asymptotics, inference is challenging. We implement an estimator that yields valid heteroskedasticity-robust confidence intervals in batched bandit designs and compare coverage in Monte Carlo simulations. We introduce bbandits for Stata, a tool for designing experiments via simulation, running interactive bandit experiments, and implementing and analyzing adaptively collected data. bbandits includes three common assignment algorithms—ε-first, ε-greedy, and Thompson sampling—and supports estimation, inference, and visualization.
    Keywords: randomized controlled trial, causal inference, multi-armed bandits, experimental design, machine learning
    JEL: C1 C11 C12 C13 C15 C18 C8 C87 C88 C9 D83
    Date: 2026–03
    URL: https://d.repec.org/n?u=RePEc:iza:izadps:dp18429
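    One of the three assignment algorithms named above, ε-greedy, is easy to sketch for a single batch: with probability ε assign a random arm (explore), otherwise assign the empirically best arm (exploit). This is a minimal illustration in Python, not the bbandits Stata interface; the function name and arguments are hypothetical.

```python
import random

def epsilon_greedy_batch(arm_means, batch_size, epsilon, rng):
    """Assign one batch of subjects to treatment arms.

    arm_means: current empirical mean outcome of each arm.
    epsilon: exploration probability per subject.
    rng: a random.Random instance, injected so runs are reproducible.
    Returns the list of assigned arm indices for the batch; outcomes
    observed for this batch would update arm_means before the next batch.
    """
    n_arms = len(arm_means)
    assignments = []
    for _ in range(batch_size):
        if rng.random() < epsilon:
            assignments.append(rng.randrange(n_arms))  # explore
        else:
            assignments.append(max(range(n_arms), key=lambda a: arm_means[a]))  # exploit
    return assignments
```

    With ε = 0 the rule is purely greedy; the inference challenge the abstract mentions arises precisely because these assignment probabilities depend on earlier outcomes.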
  11. By: Maxime Kawawa-Beaudan; Srijan Sood; Kassiani Papasotiriou; Daniel Borrajo; Manuela Veloso
    Abstract: Foundation models have transformed domains from language to genomics by learning general-purpose representations from large-scale, heterogeneous data. We introduce TradeFM, a 524M-parameter generative Transformer that brings this paradigm to market microstructure, learning directly from billions of trade events across >9K equities. To enable cross-asset generalization, we develop scale-invariant features and a universal tokenization scheme that map the heterogeneous, multi-modal event stream of order flow into a unified discrete sequence -- eliminating asset-specific calibration. Integrated with a deterministic market simulator, TradeFM-generated rollouts reproduce key stylized facts of financial returns, including heavy tails, volatility clustering, and absence of return autocorrelation. Quantitatively, TradeFM achieves 2-3x lower distributional error than Compound Hawkes baselines and generalizes zero-shot to geographically out-of-distribution APAC markets with moderate perplexity degradation. Together, these results suggest that scale-invariant trade representations capture transferable structure in market microstructure, opening a path toward synthetic data generation, stress testing, and learning-based trading agents.
    Date: 2026–02
    URL: https://d.repec.org/n?u=RePEc:arx:papers:2602.23784
  12. By: Dehao Dai; Ding Ma; Dou Liu; Kerui Geng; Yiqing Wang
    Abstract: Forecasting crude oil prices remains challenging because market-relevant information is embedded in large volumes of unstructured news and is not fully captured by traditional polarity-based sentiment measures. This paper examines whether multi-dimensional sentiment signals extracted by large language models improve the prediction of weekly WTI crude oil futures returns. Using energy-sector news articles from 2020 to 2025, we construct five sentiment dimensions covering relevance, polarity, intensity, uncertainty, and forwardness based on GPT-4o, Llama 3.2-3b, and two benchmark models, FinBERT and AlphaVantage. We aggregate article-level signals to the weekly level and evaluate their predictive performance in a classification framework. The best results are achieved by combining GPT-4o and FinBERT, suggesting that LLM-based and conventional financial sentiment models provide complementary predictive information. SHAP analysis further shows that intensity- and uncertainty-related features are among the most important predictors, indicating that the predictive value of news sentiment extends beyond simple polarity. Overall, the results suggest that multi-dimensional LLM-based sentiment measures can improve commodity return forecasting and support energy-market risk monitoring.
    Date: 2026–03
    URL: https://d.repec.org/n?u=RePEc:arx:papers:2603.11408
  13. By: Grenz, Sabrina (Utrecht University); Gregory, Terry (LISER); Lehmer, Florian (IAB Nuremberg)
    Abstract: The rapid evolution of technology is reshaping labor markets by altering skill demands and job profiles. This paper introduces a novel skill-based measure of occupational technology intensity -- the Occupational Technology Skill Share (OTSS) -- that distinguishes between manual, digital, and frontier technologies. Using natural language processing, generative AI, and supervised machine learning, we develop an AI-powered skill classification that enriches occupation-linked skill labels with standardized GenAI-generated descriptions and structured indicators of technological content, enabling transparent classification by technology intensity. We compute OTSS for all occupations in the German labor market. For the average worker in 2023, manual technologies account for the largest share of skill content (42%), followed by digital (38%) and frontier technologies (20%). Frontier technologies remain concentrated in specialized occupations, while digital technologies are widespread. Linking these measures to administrative data from 2012–2023 shows a broad shift from manual and digital toward frontier skills across occupations, and reveals a U-shaped relationship between changes in frontier skill intensity and employment growth.
    Keywords: artificial intelligence, digitalization, skills, employment growth
    JEL: J21 J24 O33
    Date: 2026–03
    URL: https://d.repec.org/n?u=RePEc:iza:izadps:dp18415
  14. By: Giménez-Nadal, José Ignacio (University of Zaragoza); Molina, José Alberto (University of Zaragoza); Velilla, Jorge (University of Zaragoza)
    Abstract: Worker productivity depends not only on hours worked, but also on how work time is actually used, and time-use evidence shows that non-work at work is non-trivial. This paper provides a data-driven characterization of shirking, and studies which observable characteristics best predict shirking behavior using American Time Use Survey data over 2003–2024. We implement a machine-learning forward selection procedure based on out-of-sample predictive performance. Our results suggest that shirking strongly depends on stochastic or unobserved factors, and that the determinants of the extensive and intensive margins are different. Moreover, the most informative predictors are predominantly job-related and time-allocation variables, whereas macro and labor-market indicators seem less relevant. This suggests that policies or managerial approaches to improve worker efficiency relying on observables face important limitations.
    Keywords: shirking, non-work at work, ATUS data, prediction
    JEL: J22 C53
    Date: 2026–03
    URL: https://d.repec.org/n?u=RePEc:iza:izadps:dp18432
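    The forward selection procedure described above, greedily adding whichever variable most improves out-of-sample predictive performance and stopping when no candidate helps, can be sketched generically. This is an illustration of the technique, not the authors' code; `oos_score` is a placeholder for any cross-validated performance metric (higher is better).

```python
def forward_select(candidates, oos_score, max_vars):
    """Greedy forward selection driven by out-of-sample performance.

    candidates: names of predictor variables to consider.
    oos_score: callable mapping a variable subset to its out-of-sample
               predictive score (e.g. cross-validated accuracy).
    Stops when no remaining variable improves the score or max_vars is hit.
    """
    chosen = []
    best = oos_score([])
    while len(chosen) < max_vars:
        scored = [(oos_score(chosen + [c]), c) for c in candidates if c not in chosen]
        if not scored:
            break
        top_score, top_var = max(scored)
        if top_score <= best:
            break  # no candidate improves out-of-sample performance
        chosen.append(top_var)
        best = top_score
    return chosen
```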
  15. By: Benjamin S. Manning; John J. Horton
    Abstract: Useful social science theories predict behavior across settings. However, applying a theory to make predictions in new settings is challenging: rarely can it be done without ad hoc modifications to account for setting-specific factors. We argue that AI agents put in simulations of those novel settings offer an alternative for applying theory, requiring minimal or no modifications. We present an approach for building such "general" agents that use theory-grounded natural language instructions, existing empirical data, and knowledge acquired by the underlying AI during training. To demonstrate the approach in settings where no data from that data-generating process exists--as is often the case in applied prediction problems--we design a heterogeneous population of 883,320 novel games. AI agents are constructed using human data from a small set of conceptually related but structurally distinct "seed" games. In preregistered experiments, on average, agents predict initial human play in a random sample of 1,500 games from the population better than (i) a cognitive hierarchy model, (ii) game-theoretic equilibria, and (iii) out-of-the-box agents. For a small set of separate novel games, these simulations predict responses from a new sample of human subjects better even than the most plausibly relevant published human data.
    JEL: D01 D03
    Date: 2026–03
    URL: https://d.repec.org/n?u=RePEc:nbr:nberwo:34937
  16. By: Koji Takahashi; Joon Suk Park
    Abstract: This study uses ChatGPT to simulate survey responses about payment apps, focusing on privacy and perceived benefits. By designing prompts that mirror real user characteristics, the generated responses align with findings from a Dutch survey, especially when grouped by privacy concern. Privacy-concerned agents view apps less favorably, while users show more positive attitudes than non-users, even without such traits in the prompt. However, ChatGPT fails to match the real survey's response variability and tends to overstate privacy concerns. These results indicate that generative AI can complement but not replace human surveys for studying perceptions of payment tools.
    Keywords: ChatGPT, generative artificial agents, privacy paradox, Westin index, survey, payment
    JEL: M31 C83 C45 D12 L86
    Date: 2026–03
    URL: https://d.repec.org/n?u=RePEc:bis:biswps:1333

This nep-cmp issue is ©2026 by Stan Miles. It is provided as is without any express or implied warranty. It may be freely redistributed in whole or in part for any purpose. If distributed in part, please include this notice.
General information on the NEP project can be found at https://nep.repec.org. For comments please write to the director of NEP, Marco Novarese at <director@nep.repec.org>. Put “NEP” in the subject, otherwise your mail may be rejected.
NEP’s infrastructure is sponsored by the Griffith Business School of Griffith University in Australia.