| on Computational Economics |
| By: | Wo Long; Wenxin Zeng; Xiaoyu Zhang; Ziyao Zhou |
| Abstract: | This research develops a sentiment-driven quantitative trading system that leverages a large language model, FinGPT, for sentiment analysis, and explores a novel method for signal integration using a reinforcement learning algorithm, Twin Delayed Deep Deterministic Policy Gradient (TD3). We compare the performance of strategies that integrate sentiment and technical signals using both a conventional rule-based approach and a reinforcement learning framework. The results suggest that sentiment signals generated by FinGPT offer value when combined with traditional technical indicators, and that reinforcement learning presents a promising approach for effectively integrating heterogeneous signals in dynamic trading environments. |
| Date: | 2025–10 |
| URL: | https://d.repec.org/n?u=RePEc:arx:papers:2510.10526 |
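Below is a minimal sketch, not the authors' implementation, of how a FinGPT sentiment score and technical indicators might enter a TD3-style actor-critic as part of the state for the entry above; the network sizes, feature layout, and the [-1, 1] position action space are illustrative assumptions.

```python
# Minimal sketch (not the authors' code): combining a FinGPT sentiment score with
# technical indicators in a TD3-style actor-critic. Network sizes, feature layout,
# and the [-1, 1] position action space are illustrative assumptions.
import torch
import torch.nn as nn

class Actor(nn.Module):
    """Maps a state of [sentiment, technicals..., current position] to a target position."""
    def __init__(self, state_dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1), nn.Tanh(),  # position in [-1, 1]
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        return self.net(state)

class Critic(nn.Module):
    """Twin Q-networks, as in TD3, scoring (state, action) pairs."""
    def __init__(self, state_dim: int, hidden: int = 64):
        super().__init__()
        def q_net():
            return nn.Sequential(
                nn.Linear(state_dim + 1, hidden), nn.ReLU(),
                nn.Linear(hidden, hidden), nn.ReLU(),
                nn.Linear(hidden, 1),
            )
        self.q1, self.q2 = q_net(), q_net()

    def forward(self, state, action):
        sa = torch.cat([state, action], dim=-1)
        return self.q1(sa), self.q2(sa)

# Example state: FinGPT sentiment in [-1, 1], two technical indicators, current position.
state = torch.tensor([[0.4, 0.1, -0.3, 0.0]])
actor = Actor(state_dim=4)
print(actor(state))  # proposed position
```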
| By: | Giorgia Rensi; Pietro Rossi; Marco Bianchetti |
| Abstract: | The SABR model is a cornerstone of interest rate volatility modeling, but its practical application relies heavily on the analytical approximation by Hagan et al., whose accuracy deteriorates for high volatility, long maturities, and out-of-the-money options, and which can admit arbitrage. While machine learning approaches have been proposed to overcome these limitations, they have often been limited by simplified SABR dynamics or a lack of systematic validation against the full spectrum of market conditions. We develop a novel SABR DNN, a specialized Artificial Deep Neural Network (DNN) architecture that learns the true SABR stochastic dynamics using an unprecedentedly large training dataset (more than 200 million points) of interest rate Cap/Floor volatility surfaces, including very long maturities (30Y) and extreme strikes, consistent with market quotations. Our dataset is obtained via high-precision unbiased Monte Carlo simulation of a special scaled shifted-SABR stochastic dynamics, which allows dimensional reduction without any loss of generality. Our SABR DNN provides arbitrage-free calibration of real market volatility surfaces and Cap/Floor prices for any maturity and strike with negligible computational effort and without retraining across business dates. Our results fully address the gaps in the previous machine learning SABR literature in a systematic and self-consistent way, and can be extended to cover any interest rate European options in different rate tenors and currencies, thus establishing a comprehensive functional SABR framework that can be adopted for daily trading and risk management activities. |
| Date: | 2025–10 |
| URL: | https://d.repec.org/n?u=RePEc:arx:papers:2510.10343 |
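The following sketch illustrates, under stated assumptions, a small feed-forward surrogate of the kind described in the entry above, mapping SABR-style inputs to an implied volatility; the input layout (forward, strike, maturity, alpha, beta, rho, nu), layer sizes, and scaling are assumptions and do not reproduce the paper's architecture or training data.

```python
# Illustrative sketch only: a small MLP surrogate that maps (shifted-)SABR inputs to an
# implied volatility, in the spirit of the SABR DNN described above. The input layout
# (forward, strike, maturity, alpha, beta, rho, nu) and layer sizes are assumptions.
import torch
import torch.nn as nn

class SabrSurrogate(nn.Module):
    def __init__(self, in_dim: int = 7, hidden: int = 128, depth: int = 4):
        super().__init__()
        layers, d = [], in_dim
        for _ in range(depth):
            layers += [nn.Linear(d, hidden), nn.SiLU()]
            d = hidden
        layers += [nn.Linear(d, 1), nn.Softplus()]  # volatilities are positive
        self.net = nn.Sequential(*layers)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)

model = SabrSurrogate()
# One sample: forward, strike, maturity (years), alpha, beta, rho, nu -- toy values.
x = torch.tensor([[0.03, 0.035, 10.0, 0.02, 0.5, -0.3, 0.4]])
print(model(x))  # surrogate implied volatility (untrained)
```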
| By: | Zofia Bracha; Paweł Sakowski; Jakub Michańków |
| Abstract: | This paper explores the application of deep Q-learning to hedging at-the-money options on the S&P 500 index. We develop an agent based on the Twin Delayed Deep Deterministic Policy Gradient (TD3) algorithm, trained to simulate hedging decisions without making explicit model assumptions on price dynamics. The agent was trained on historical intraday prices of S&P 500 call options across the years 2004–2024, using a single time series of six predictor variables: option price, underlying asset price, moneyness, time to maturity, realized volatility, and current hedge position. A walk-forward procedure was applied for training, which led to nearly 17 years of out-of-sample evaluation. The performance of the deep reinforcement learning (DRL) agent is benchmarked against the Black–Scholes delta-hedging strategy over the same period. We assess both approaches using metrics such as annualized return, volatility, information ratio, and Sharpe ratio. To test the models' adaptability, we performed simulations across varying market conditions and added constraints such as transaction costs and risk-awareness penalties. Our results show that the DRL agent can outperform traditional hedging methods, particularly in volatile or high-cost environments, highlighting its robustness and flexibility in practical trading contexts. While the agent consistently outperforms delta-hedging, its performance deteriorates when the risk-awareness parameter is higher. We also observed that the longer the time interval used for volatility estimation, the more stable the results. |
| Date: | 2025–10 |
| URL: | https://d.repec.org/n?u=RePEc:arx:papers:2510.09247 |
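A minimal sketch of the six-feature hedging state and a transaction-cost and risk-penalized reward of the kind a TD3 hedging agent might use, following the entry above; the cost rate and risk-aversion parameter are illustrative assumptions, not the paper's settings.

```python
# A minimal sketch (not the paper's implementation) of the six-feature state and a
# transaction-cost / risk-penalized reward one might feed a TD3 hedging agent. The cost
# rate and risk-aversion parameter are illustrative assumptions.
import numpy as np

def make_state(option_price, spot, strike, tau, realized_vol, position):
    """State: option price, underlying price, moneyness, time to maturity,
    realized volatility, current hedge position."""
    return np.array([option_price, spot, spot / strike, tau, realized_vol, position])

def hedging_reward(pnl, old_pos, new_pos, spot, cost_rate=5e-4, risk_aversion=0.1):
    """Step P&L minus a proportional transaction cost and a quadratic risk penalty."""
    transaction_cost = cost_rate * abs(new_pos - old_pos) * spot
    return pnl - transaction_cost - risk_aversion * pnl ** 2

state = make_state(option_price=12.3, spot=4500.0, strike=4500.0,
                   tau=30 / 252, realized_vol=0.18, position=0.45)
print(state, hedging_reward(pnl=-0.8, old_pos=0.45, new_pos=0.52, spot=4500.0))
```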
| By: | Marlon Azinovic-Yang; Jan Žemlička |
| Abstract: | We develop a deep learning algorithm for approximating functional rational expectations equilibria of dynamic stochastic economies in the sequence space. We use deep neural networks to parameterize equilibrium objects of the economy as a function of truncated histories of exogenous shocks. We train the neural networks to fulfill all equilibrium conditions along simulated paths of the economy. To illustrate the performance of our method, we solve three economies of increasing complexity: the stochastic growth model, a high-dimensional overlapping generations economy with multiple sources of aggregate risk, and finally an economy where households and firms face uninsurable idiosyncratic risk, shocks to aggregate productivity, and shocks to idiosyncratic and aggregate volatility. Furthermore, we show how to design practical neural policy function architectures that guarantee monotonicity of the predicted policies, facilitating the use of the endogenous grid method to simplify parts of our algorithm. |
| Date: | 2025–09 |
| URL: | https://d.repec.org/n?u=RePEc:arx:papers:2509.13623 |
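One standard way to obtain the monotone policy networks mentioned in the entry above is to constrain layer weights to be non-negative and use monotone activations; the sketch below follows that generic construction and is not the authors' architecture.

```python
# A minimal sketch, not the authors' architecture: one way to build a policy network
# that is monotone (non-decreasing) in its inputs, by constraining weights to be
# non-negative via a softplus reparameterization and using monotone activations.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MonotoneLinear(nn.Module):
    """Linear layer with non-negative weights, so the layer preserves monotonicity."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.raw_weight = nn.Parameter(torch.randn(out_dim, in_dim) * 0.1)
        self.bias = nn.Parameter(torch.zeros(out_dim))

    def forward(self, x):
        return F.linear(x, F.softplus(self.raw_weight), self.bias)

class MonotonePolicy(nn.Module):
    """Non-decreasing map from a truncated shock history to a policy value."""
    def __init__(self, history_len: int, hidden: int = 32):
        super().__init__()
        self.net = nn.Sequential(
            MonotoneLinear(history_len, hidden), nn.Tanh(),   # tanh is monotone
            MonotoneLinear(hidden, 1), nn.Softplus(),         # positive policy output
        )

    def forward(self, shock_history):
        return self.net(shock_history)

policy = MonotonePolicy(history_len=8)
print(policy(torch.zeros(1, 8)))
```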
| By: | Yilie Huang |
| Abstract: | This paper proposes a novel approach for Asset-Liability Management (ALM) by employing continuous-time Reinforcement Learning (RL) with a linear-quadratic (LQ) formulation that incorporates both interim and terminal objectives. We develop a model-free, policy gradient-based soft actor-critic algorithm tailored to ALM for dynamically synchronizing assets and liabilities. To ensure an effective balance between exploration and exploitation with minimal tuning, we introduce adaptive exploration for the actor and scheduled exploration for the critic. Our empirical study evaluates this approach against two enhanced traditional financial strategies, a model-based continuous-time RL method, and three state-of-the-art RL algorithms. Evaluated across 200 randomized market scenarios, our method achieves higher average rewards than all alternative strategies, with rapid initial gains and sustained superior performance. The outperformance stems not from complex neural networks or improved parameter estimation, but from directly learning the optimal ALM strategy without learning the environment. |
| Date: | 2025–09 |
| URL: | https://d.repec.org/n?u=RePEc:arx:papers:2509.23280 |
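A minimal sketch, assuming a Gaussian linear-feedback policy, of adaptive (learned) exploration for the actor and a simple decaying schedule on the critic side, echoing the entry above; the parameterization and decay rule are assumptions rather than the paper's specification.

```python
# Illustrative sketch only: a Gaussian (soft) policy for a linear-quadratic ALM problem,
# with a learnable (adaptive) exploration variance for the actor and a simple decaying
# schedule one might use on the critic side. Parameter names and the decay rule are
# assumptions, not the paper's specification.
import numpy as np

class GaussianLQPolicy:
    """Action = linear feedback on the asset-liability state plus Gaussian exploration."""
    def __init__(self, state_dim, init_log_sigma=0.0):
        self.theta = np.zeros(state_dim)      # linear feedback coefficients
        self.log_sigma = init_log_sigma       # adaptive exploration level (learned)

    def sample(self, state, rng):
        mean = self.theta @ state
        return rng.normal(mean, np.exp(self.log_sigma))

def critic_exploration_schedule(step, sigma0=1.0, decay=1e-3):
    """Scheduled (deterministic) exploration level used on the critic side."""
    return sigma0 / (1.0 + decay * step)

rng = np.random.default_rng(0)
policy = GaussianLQPolicy(state_dim=2)
print(policy.sample(np.array([1.0, -0.5]), rng), critic_exploration_schedule(step=500))
```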
| By: | George Gui; Seungwoo Kim |
| Abstract: | Pre-experiment stratification, or blocking, is a well-established technique for designing more efficient experiments and increasing the precision of experimental estimates. However, when researchers have access to many covariates at the experiment design stage, they often face challenges in effectively selecting or weighting covariates when creating their strata. This paper proposes a Generative Stratification procedure that leverages Large Language Models (LLMs) to synthesize high-dimensional covariate data to improve experimental design. We demonstrate the value of this approach by applying it to a set of experiments, finding that our method would have reduced the variance of the treatment effect estimate by 10%-50% compared to simple randomization in our empirical applications. When combined with other standard stratification methods, it can further improve efficiency. Our results demonstrate that LLM-based simulation is a practical and easy-to-implement way to improve experimental design in covariate-rich settings. |
| Date: | 2025–09 |
| URL: | https://d.repec.org/n?u=RePEc:arx:papers:2509.25709 |
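The sketch below illustrates one plausible instantiation of the procedure in the entry above: a hypothetical LLM call simulates each unit's outcome from its covariates, units are sorted on the simulated outcome, and treatment is randomized within blocks of two; the prompting, stratification rule, and function names are assumptions.

```python
# A minimal sketch, under stated assumptions: `simulate_outcome_with_llm` is a
# hypothetical stand-in for an LLM call that maps a unit's covariate profile to a
# predicted outcome; the paper's actual prompting and stratification details differ.
# Units are stratified on the simulated outcome and treatment is randomized within
# strata (here: paired randomization within sorted blocks of two).
import numpy as np

def simulate_outcome_with_llm(covariates: dict) -> float:
    """Hypothetical placeholder for an LLM-based outcome simulation."""
    return float(covariates["age"] * 0.1 + covariates["past_spend"] * 0.01)

def generative_stratification(units, rng):
    scores = np.array([simulate_outcome_with_llm(u) for u in units])
    order = np.argsort(scores)                    # sort units by simulated outcome
    treatment = np.zeros(len(units), dtype=int)
    for block in order.reshape(-1, 2):            # blocks of two similar units
        treated = rng.choice(block)               # randomize within each block
        treatment[treated] = 1
    return treatment

rng = np.random.default_rng(42)
units = [{"age": a, "past_spend": s} for a, s in zip(range(20, 36, 2), range(0, 800, 100))]
print(generative_stratification(units, rng))
```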
| By: | Yun Zhao; Harry Zheng |
| Abstract: | We propose an approach to applying neural networks to linear parabolic variational inequalities. We use loss functions that directly incorporate the variational inequality on the whole domain, bypassing the need to determine the stopping region in advance, and prove the existence of neural networks whose losses converge to zero. We also prove functional convergence in the Sobolev space. We then apply our approach to solving an optimal investment and stopping problem in finance. By leveraging duality, we convert the nonlinear HJB-type variational inequality of the primal problem into a linear variational inequality of the dual problem and prove the convergence of the primal value function from the dual neural network solution, an outcome made possible by our Sobolev norm analysis. We illustrate the versatility and accuracy of our method with numerical examples for both power and non-HARA utilities as well as high-dimensional American put option pricing. Our results underscore the potential of neural networks for solving variational inequalities in optimal stopping and control problems. |
| Date: | 2025–09 |
| URL: | https://d.repec.org/n?u=RePEc:arx:papers:2509.26535 |
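A minimal sketch of a loss that incorporates a variational inequality on the whole domain through its complementarity residual min(-Lu - f, u - g) = 0, in the spirit of the entry above; the tensors below stand in for autograd-computed quantities and the penalty form is an assumption, not the paper's construction.

```python
# A minimal sketch (assumptions, not the paper's construction): penalizing the
# variational-inequality residual min(-Lu - f, u - g) = 0 over sampled points,
# where the two inputs would come from automatic differentiation of a neural
# approximation u_theta and from the obstacle g, respectively.
import torch

def vi_loss(neg_Lu_minus_f: torch.Tensor, u_minus_g: torch.Tensor) -> torch.Tensor:
    """Squared residual of the complementarity form of the variational inequality:
    min(-Lu - f, u - g) = 0 should hold at every sampled point of the domain."""
    residual = torch.minimum(neg_Lu_minus_f, u_minus_g)
    return (residual ** 2).mean()

# Toy values standing in for autograd-computed quantities at a batch of points.
neg_Lu_minus_f = torch.tensor([0.2, -0.1, 0.0, 0.5])
u_minus_g = torch.tensor([0.0, 0.3, 0.1, -0.2])
print(vi_loss(neg_Lu_minus_f, u_minus_g))
```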
| By: | Lars van der Laan; Nathan Kallus; Aurélien Bibaut |
| Abstract: | Inverse reinforcement learning (IRL) aims to explain observed behavior by uncovering an underlying reward. In the maximum-entropy or Gumbel-shocks-to-reward frameworks, this amounts to fitting a reward function and a soft value function that together satisfy the soft Bellman consistency condition and maximize the likelihood of observed actions. While this perspective has had enormous impact in imitation learning for robotics and understanding dynamic choices in economics, practical learning algorithms often involve delicate inner-loop optimization, repeated dynamic programming, or adversarial training, all of which complicate the use of modern, highly expressive function approximators like neural nets and boosting. We revisit softmax IRL and show that the population maximum-likelihood solution is characterized by a linear fixed-point equation involving the behavior policy. This observation reduces IRL to two off-the-shelf supervised learning problems: probabilistic classification to estimate the behavior policy, and iterative regression to solve the fixed point. The resulting method is simple and modular across function approximation classes and algorithms. We provide a precise characterization of the optimal solution, a generic oracle-based algorithm, finite-sample error bounds, and empirical results showing competitive or superior performance to MaxEnt IRL. |
| Date: | 2025–09 |
| URL: | https://d.repec.org/n?u=RePEc:arx:papers:2509.21172 |
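A minimal sketch of the first supervised subproblem named in the entry above: estimating the behavior policy by probabilistic classification and extracting log pi_b(a|s). The second subproblem, the fixed-point regression, is not reproduced because its exact target comes from the paper's linear fixed-point equation; the data and classifier choice below are illustrative assumptions.

```python
# A minimal sketch of the first supervised subproblem described above: estimating the
# behavior policy by probabilistic classification and extracting log pi_b(a|s), which
# would then feed the iterative fixed-point regression step (not reproduced here).
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(0)
states = rng.normal(size=(500, 4))                                       # observed states
actions = (states[:, 0] + 0.3 * rng.normal(size=500) > 0).astype(int)    # observed actions

clf = GradientBoostingClassifier().fit(states, actions)   # behavior-policy estimate
proba = clf.predict_proba(states)                          # pi_b(a|s) for each action
log_pi_b = np.log(proba[np.arange(len(actions)), actions])
print(log_pi_b[:5])
```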
| By: | Yun Lin; Jiawei Lou; Jinghe Zhang |
| Abstract: | Deep learning offers new tools for portfolio optimization. We present an end-to-end framework that directly learns portfolio weights by combining Long Short-Term Memory (LSTM) networks to model temporal patterns, Graph Attention Networks (GAT) to capture evolving inter-stock relationships, and sentiment analysis of financial news to reflect market psychology. Unlike prior approaches, our model unifies these elements in a single pipeline that produces daily allocations. It avoids the traditional two-step process of forecasting asset returns and then applying mean–variance optimization (MVO), a sequence that can introduce instability. We evaluate the framework on nine U.S. stocks spanning six sectors, chosen to balance sector diversity and news coverage. In this setting, the model delivers higher cumulative returns and Sharpe ratios than equal-weighted and CAPM-based MVO benchmarks. Although the stock universe is limited, the results underscore the value of integrating price, relational, and sentiment signals for portfolio management and suggest promising directions for scaling the approach to larger, more diverse asset sets. |
| Date: | 2025–09 |
| URL: | https://d.repec.org/n?u=RePEc:arx:papers:2509.24144 |
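An illustrative, simplified sketch of an end-to-end allocator in the spirit of the entry above: a shared LSTM encodes each asset's feature history, a single-head attention layer stands in for the Graph Attention Network, and a softmax yields long-only daily weights; the sizes, features, and the attention substitute are assumptions.

```python
# Illustrative sketch only (simplified relative to the paper): an end-to-end network
# that turns per-asset price/sentiment sequences into long-only portfolio weights.
# A shared LSTM encodes each asset's history, a single-head attention layer stands in
# for the paper's Graph Attention Network, and a softmax produces daily weights.
import torch
import torch.nn as nn

class EndToEndAllocator(nn.Module):
    def __init__(self, n_features: int, hidden: int = 32):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, batch_first=True)
        self.attn = nn.MultiheadAttention(hidden, num_heads=1, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, x):                       # x: (n_assets, seq_len, n_features)
        _, (h, _) = self.lstm(x)                # h: (1, n_assets, hidden)
        emb = h                                 # treat assets as a sequence of embeddings
        attn_out, _ = self.attn(emb, emb, emb)  # attention across assets (GAT stand-in)
        scores = self.head(attn_out).squeeze(-1)          # (1, n_assets)
        return torch.softmax(scores, dim=-1).squeeze(0)   # long-only daily weights

model = EndToEndAllocator(n_features=5)
x = torch.randn(9, 60, 5)    # 9 stocks, 60 days of price/technical/sentiment features
print(model(x))              # weights sum to 1 across the 9 stocks
```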
| By: | Mehmet Caner; Qingliang Fan |
| Abstract: | In this review, we provide practical guidance on some of the main machine learning tools used in portfolio weight formation. The list is not exhaustive; we focus on methods that are in active use and have some statistical analysis behind them. All of this research is essentially tied to the precision matrix of excess asset returns. Our main point is that the techniques should be used in conjunction with clearly stated objective functions: the Machine Learning (ML) technique should be analyzed jointly with the candidate portfolio-choice objective functions in terms of test-period Sharpe ratio or returns, and the ML method that performs best on the chosen objective should provide the weights for portfolio formation. Empirically, we analyze five out-of-sample time periods of interest in this century, covering three downturns and two long-term investment spans, and report the performance of several ML and Artificial Intelligence (AI) methods. We find that nodewise regression combined with Global Minimum Variance portfolio weights delivers very good Sharpe ratios and returns across all five periods. |
| Date: | 2025–09 |
| URL: | https://d.repec.org/n?u=RePEc:arx:papers:2509.25456 |
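A minimal sketch of the nodewise-regression construction highlighted in the entry above: Lasso regressions of each asset on the others give an approximate precision matrix, which is plugged into Global Minimum Variance weights w = Theta 1 / (1' Theta 1); the penalty level and the plain residual-variance plug-in are simplifying assumptions, not the survey's exact estimator.

```python
# A minimal sketch, not the survey's exact estimator: nodewise Lasso regressions give an
# approximate precision matrix of excess returns, which is then plugged into the Global
# Minimum Variance weights w = (Theta @ 1) / (1' Theta 1). The penalty level and the use
# of the plain residual variance for tau_j^2 are simplifying assumptions.
import numpy as np
from sklearn.linear_model import Lasso

def nodewise_precision(returns: np.ndarray, alpha: float = 0.01) -> np.ndarray:
    n, p = returns.shape
    theta = np.zeros((p, p))
    for j in range(p):
        others = np.delete(np.arange(p), j)
        fit = Lasso(alpha=alpha).fit(returns[:, others], returns[:, j])
        resid = returns[:, j] - fit.predict(returns[:, others])
        tau2 = resid @ resid / n                      # residual variance for column j
        theta[j, j] = 1.0 / tau2
        theta[j, others] = -fit.coef_ / tau2
    return theta

def gmv_weights(theta: np.ndarray) -> np.ndarray:
    ones = np.ones(theta.shape[0])
    w = theta @ ones
    return w / (ones @ w)

rng = np.random.default_rng(1)
excess_returns = rng.normal(scale=0.01, size=(500, 10))   # toy panel: 500 days, 10 assets
print(gmv_weights(nodewise_precision(excess_returns)))
```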
| By: | Yang Li; Zhi Chen; Steve Y. Yang; Ruixun Zhang |
| Abstract: | Traditional stochastic control methods in finance rely on simplifying assumptions that often fail in real-world markets. While these methods work well in specific, well-defined scenarios, they underperform when market conditions change. We introduce FinFlowRL, a novel framework for financial stochastic control that combines imitation learning with reinforcement learning. The framework first pretrains an adaptive meta-policy by learning from multiple expert strategies, then fine-tunes it through reinforcement learning in the noise space to optimize the generation process. By employing action chunking, that is, generating sequences of actions rather than single decisions, it addresses the non-Markovian nature of financial markets. FinFlowRL consistently outperforms individually optimized experts across diverse market conditions. |
| Date: | 2025–09 |
| URL: | https://d.repec.org/n?u=RePEc:arx:papers:2509.17964 |
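A minimal sketch of action chunking as described in the entry above, not the FinFlowRL implementation: the policy emits a short sequence of actions per decision step instead of a single action; the chunk length and network sizes are illustrative assumptions.

```python
# A minimal sketch of action chunking (not the FinFlowRL implementation): the policy
# emits a short sequence of actions per decision step rather than a single action,
# which helps with non-Markovian dynamics. Chunk length and sizes are assumptions.
import torch
import torch.nn as nn

class ChunkedPolicy(nn.Module):
    def __init__(self, state_dim: int, action_dim: int, chunk_len: int = 8, hidden: int = 64):
        super().__init__()
        self.chunk_len, self.action_dim = chunk_len, action_dim
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, chunk_len * action_dim), nn.Tanh(),
        )

    def forward(self, state):                    # state: (batch, state_dim)
        out = self.net(state)
        return out.view(-1, self.chunk_len, self.action_dim)   # (batch, chunk, action)

policy = ChunkedPolicy(state_dim=10, action_dim=1)
print(policy(torch.zeros(2, 10)).shape)          # torch.Size([2, 8, 1])
```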
| By: | Philipp Bach; Victor Chernozhukov; Carlos Cinelli; Lin Jia; Sven Klaassen; Nils Skotara; Martin Spindler |
| Abstract: | Causal Machine Learning has emerged as a powerful tool for flexibly estimating causal effects from observational data in both industry and academia. However, causal inference from observational data relies on untestable assumptions about the data-generating process, such as the absence of unobserved confounders. When these assumptions are violated, causal effect estimates may become biased, undermining the validity of research findings. In these contexts, sensitivity analysis plays a crucial role, by enabling data scientists to assess the robustness of their findings to plausible violations of unconfoundedness. This paper introduces sensitivity analysis and demonstrates its practical relevance through a (simulated) data example based on a use case at Booking.com. We focus our presentation on a recently proposed method by Chernozhukov et al. (2023), which derives general non-parametric bounds on biases due to omitted variables, and is fully compatible with (though not limited to) modern inferential tools of Causal Machine Learning. By presenting this use case, we aim to raise awareness of sensitivity analysis and highlight its importance in real-world scenarios. |
| Date: | 2025–10 |
| URL: | https://d.repec.org/n?u=RePEc:arx:papers:2510.09109 |
| By: | Saeid Mashhadi; Amirhossein Saghezchi; Vesal Ghassemzadeh Kashani |
| Abstract: | This study develops an interpretable machine learning framework to forecast startup outcomes, including funding, patenting, and exit. A firm-quarter panel for 2010-2023 is constructed from Crunchbase and matched to U.S. Patent and Trademark Office (USPTO) data. Three horizons are evaluated: next funding within 12 months, patent-stock growth within 24 months, and exit through an initial public offering (IPO) or acquisition within 36 months. Preprocessing is fit on a development window (2010-2019) and applied without change to later cohorts to avoid leakage. Class imbalance is addressed using inverse-prevalence weights and the Synthetic Minority Oversampling Technique for Nominal and Continuous features (SMOTE-NC). Logistic regression and tree ensembles, including Random Forest, XGBoost, LightGBM, and CatBoost, are compared using the area under the precision-recall curve (PR-AUC) and the area under the receiver operating characteristic curve (AUROC). Patent, funding, and exit predictions achieve AUROC values of 0.921, 0.817, and 0.872, providing transparent and reproducible rankings for innovation finance. |
| Date: | 2025–10 |
| URL: | https://d.repec.org/n?u=RePEc:arx:papers:2510.09465 |
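An illustrative sketch, on synthetic data rather than the study's Crunchbase/USPTO panel, of the imbalance handling and evaluation described in the entry above: SMOTE-NC on mixed categorical/continuous features, a gradient-boosted classifier, and PR-AUC/AUROC scoring.

```python
# Illustrative sketch only (toy data, not the study's Crunchbase/USPTO panel): handling
# class imbalance with SMOTE-NC on mixed categorical/continuous features, then scoring a
# gradient-boosted classifier with PR-AUC and AUROC, as in the evaluation described above.
import numpy as np
from imblearn.over_sampling import SMOTENC
from xgboost import XGBClassifier
from sklearn.metrics import average_precision_score, roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 2000
X = np.column_stack([
    rng.normal(size=n),                 # continuous feature, e.g. log funding to date
    rng.normal(size=n),                 # continuous feature, e.g. scaled founder count
    rng.integers(0, 5, size=n),         # categorical feature, e.g. sector code
])
y = (rng.random(n) < 0.05).astype(int)  # rare positive class, e.g. exit within 36 months

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, stratify=y, random_state=0)
X_res, y_res = SMOTENC(categorical_features=[2], random_state=0).fit_resample(X_tr, y_tr)

clf = XGBClassifier(n_estimators=200, max_depth=4).fit(X_res, y_res)
p = clf.predict_proba(X_te)[:, 1]
print("PR-AUC:", average_precision_score(y_te, p), "AUROC:", roc_auc_score(y_te, p))
```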
| By: | Songrun He; Linying Lv; Asaf Manela; Jimmy Wu |
| Abstract: | We introduce a family of chronologically consistent, instruction-following large language models to eliminate lookahead bias. Each model is trained only on data available before a clearly defined knowledge-cutoff date, ensuring strict temporal separation from any post-cutoff data. The resulting framework offers (i) a simple, conversational chat interface, (ii) fully open, fixed model weights that guarantee replicability, and (iii) a conservative lower bound on forecast accuracy, isolating the share of predictability that survives once training leakage is removed. Together, these features provide researchers with an easy-to-use generative AI tool useful for a wide range of prediction tasks that is free of lookahead bias. |
| Date: | 2025–10 |
| URL: | https://d.repec.org/n?u=RePEc:arx:papers:2510.11677 |
| By: | Ali Atiah Alzahrani |
| Abstract: | We present a deep BSDE and 2BSDE solver that combines truncated log signatures with a neural rough differential equation backbone for high dimensional, path dependent valuation and control. The design aligns stochastic analysis with sequence to path learning, using a CVaR tilted objective to emphasize left tail risk and an optional second order head for risk sensitive control. Under equal compute and parameter budgets, the method improves accuracy, tail fidelity, and training stability across Asian and barrier option pricing and portfolio control tasks. At 200 dimensions, it achieves CVaR(0.99) = 9.8 percent compared to 12.0-13.1 percent for strong baselines, while attaining low HJB residuals and small RMSE for Z and Gamma. Ablations confirm complementary gains from the sequence to path representation and the second order structure. Overall, the results show that combining stochastic analysis with modern deep learning expands the class of solvable path dependent financial models at scale. |
| Date: | 2025–10 |
| URL: | https://d.repec.org/n?u=RePEc:arx:papers:2510.10728 |
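A minimal sketch of a CVaR-tilted training objective in the Rockafellar-Uryasev form, CVaR_alpha(L) = min_z { z + E[(L - z)^+] / (1 - alpha) }, as referenced in the entry above; the tilt weight and level are assumptions, not the paper's settings.

```python
# A minimal sketch, under assumptions: a CVaR-tilted training objective in the
# Rockafellar-Uryasev form, which can be added to a mean loss to emphasize tail
# outcomes. The tilt weight and alpha below are illustrative, not the paper's settings.
import torch

def cvar(losses: torch.Tensor, alpha: float = 0.99) -> torch.Tensor:
    """Empirical CVaR at level alpha via the Rockafellar-Uryasev representation,
    using the empirical alpha-quantile as the minimizing threshold z."""
    z = torch.quantile(losses, alpha)
    return z + torch.clamp(losses - z, min=0.0).mean() / (1.0 - alpha)

def tilted_objective(losses: torch.Tensor, tilt: float = 0.5, alpha: float = 0.99):
    """Mean loss plus a CVaR tilt to penalize tail outcomes."""
    return losses.mean() + tilt * cvar(losses, alpha)

losses = torch.randn(10_000)      # stand-in for per-path hedging or terminal errors
print(tilted_objective(losses))
```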
| By: | Junyan Ye; Hoi Ying Wong; Kyunghyun Park |
| Abstract: | We propose and analyze a continuous-time robust reinforcement learning framework for optimal stopping problems under ambiguity. In this framework, an agent chooses a stopping rule motivated by two objectives: robust decision-making under ambiguity and learning about the unknown environment. Here, ambiguity refers to considering multiple probability measures dominated by a reference measure, reflecting the agent's awareness that the reference measure representing her learned belief about the environment would be erroneous. Using the $g$-expectation framework, we reformulate an optimal stopping problem under ambiguity as an entropy-regularized optimal control problem under ambiguity, with Bernoulli distributed controls to incorporate exploration into the stopping rules. We then derive the optimal Bernoulli distributed control characterized by backward stochastic differential equations. Moreover, we establish a policy iteration theorem and implement it as a reinforcement learning algorithm. Numerical experiments demonstrate the convergence and robustness of the proposed algorithm across different levels of ambiguity and exploration. |
| Date: | 2025–10 |
| URL: | https://d.repec.org/n?u=RePEc:arx:papers:2510.10260 |
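A minimal sketch of the Bernoulli-distributed, entropy-regularized stopping control mentioned in the entry above: with temperature lam, the exploratory stopping probability takes the softmax form over {stop, continue}; the value inputs below are toy stand-ins, not the paper's BSDE characterization.

```python
# A minimal sketch, under assumptions: with Bernoulli-distributed stopping controls and
# entropy regularization at temperature lam, the exploratory stopping probability has the
# Gibbs/softmax form p(stop | x) = sigmoid((stop_value - continuation_value) / lam).
# The value functions below are toy stand-ins, not the paper's BSDE characterization.
import numpy as np

def exploratory_stop_prob(stop_value, continuation_value, lam=0.1):
    """Entropy-regularized Bernoulli control: softmax over {stop, continue}."""
    return 1.0 / (1.0 + np.exp(-(stop_value - continuation_value) / lam))

# Example: stopping payoff vs. a slightly higher continuation value at two temperatures.
print(exploratory_stop_prob(1.00, 1.05, lam=0.1), exploratory_stop_prob(1.00, 1.05, lam=0.5))
```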
| By: | Ta-Chung Chi; Ting-Han Fan; Raffaele M. Ghigliazza; Domenico Giannone; Zixuan (Kevin) Wang |
| Abstract: | We forecast the full conditional distribution of macroeconomic outcomes by systematically integrating three key principles: using high-dimensional data with appropriate regularization, adopting rigorous out-of-sample validation procedures, and incorporating nonlinearities. By exploiting the rich information embedded in a large set of macroeconomic and financial predictors, we produce accurate predictions of the entire profile of macroeconomic risk in real time. Our findings show that regularization via shrinkage is essential to control model complexity, while introducing nonlinearities yields limited improvements in predictive accuracy. Out-of-sample validation plays a critical role in selecting model architecture and preventing overfitting. |
| Date: | 2025–10 |
| URL: | https://d.repec.org/n?u=RePEc:arx:papers:2510.11008 |
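An illustrative sketch of regularized distributional forecasting in the spirit of the entry above: one L1-penalized quantile regression per quantile level on a high-dimensional predictor set, evaluated on synthetic data; the penalty level and data are assumptions.

```python
# Illustrative sketch only: forecasting the conditional distribution of an outcome by
# fitting L1-penalized (shrinkage) quantile regressions on a large predictor set, one
# regression per quantile, as in the regularized distributional forecasts described
# above. The data below are synthetic and the penalty level is an assumption.
import numpy as np
from sklearn.linear_model import QuantileRegressor

rng = np.random.default_rng(0)
n, p = 300, 100                                  # many predictors relative to sample size
X = rng.normal(size=(n, p))
y = 0.5 * X[:, 0] - 0.3 * X[:, 1] + rng.normal(scale=1.0, size=n)   # sparse true signal

quantiles = [0.05, 0.25, 0.5, 0.75, 0.95]
models = {q: QuantileRegressor(quantile=q, alpha=0.1).fit(X, y) for q in quantiles}

x_new = rng.normal(size=(1, p))                  # one out-of-sample observation
forecast = {q: float(m.predict(x_new)[0]) for q, m in models.items()}
print(forecast)                                  # predicted conditional quantile profile
```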