|
on Financial Markets |
| By: | Kamil Kashif; Robert Ślepaczuk |
| Abstract: | This study develops and evaluates a deep reinforcement learning framework for dynamic portfolio allocation across global equity markets. The Soft Actor-Critic algorithm is used to learn continuous portfolio weights within a Markov Decision Process, incorporating transaction costs, turnover penalties, and diversification constraints into the reward function. Five model configurations are compared, varying in reward formulation, policy structure (flat versus hierarchical Dirichlet), portfolio constraints, and temporal encoder (LSTM versus Transformer), and evaluated via walk-forward optimization across sixteen out-of-sample folds spanning 2003-2026 on the Nasdaq-100, Nikkei 225, and Euro Stoxx 50. Results show that RL strategies achieve competitive risk-adjusted performance primarily in the Euro Stoxx 50, where statistically significant abnormal returns are observed, but the central hypothesis is only partially confirmed: no strategy achieves statistically significant excess returns relative to Buy and Hold under HAC-robust inference across all markets. Regime analysis reveals that RL adds the most value during periods of elevated uncertainty, while ensemble aggregation across markets improves risk-adjusted performance and confirms the benefits of geographic diversification. |
| Date: | 2026–05 |
| URL: | https://d.repec.org/n?u=RePEc:arx:papers:2605.17307 |
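The abstract above says the reward combines returns with transaction costs, turnover penalties, and a diversification term, but it does not give the formula. The following is a minimal sketch of one way such a per-step reward could look. The function name `step_reward`, the three coefficients, and the use of a Herfindahl index as the diversification term are all illustrative assumptions, not the authors' specification.

```python
def step_reward(prev_weights, new_weights, asset_returns,
                cost_rate=0.001, turnover_penalty=0.01,
                concentration_penalty=0.1):
    """Hypothetical one-period reward: gross return minus trading and
    concentration penalties. All coefficients are assumed values."""
    # Gross portfolio return from holding the new weights over the period.
    gross = sum(w * r for w, r in zip(new_weights, asset_returns))
    # Turnover: total absolute change in weights at rebalancing.
    turnover = sum(abs(n - p) for n, p in zip(new_weights, prev_weights))
    # Proportional transaction cost charged on the traded amount.
    cost = cost_rate * turnover
    # Herfindahl index: 1/n for equal weights, 1 for a single-asset portfolio.
    concentration = sum(w * w for w in new_weights)
    return (gross - cost
            - turnover_penalty * turnover
            - concentration_penalty * concentration)


if __name__ == "__main__":
    # Holding an equal-weight portfolio incurs no trading cost.
    print(step_reward([0.5, 0.5], [0.5, 0.5], [0.02, 0.0]))
    # Moving everything into one asset is penalized twice: turnover and concentration.
    print(step_reward([0.5, 0.5], [1.0, 0.0], [0.0, 0.0]))
```

In this sketch the penalties enter additively, so each coefficient can be tuned or switched off independently.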
| By: | Sebastian Bell; Ali Kakhbod; Martin Lettau; Abdolreza Nazemi |
| Abstract: | We propose AlphaGlass, an inherently interpretable machine-learning framework for constructing portfolios that directly optimize investment objectives. AlphaGlass maps stock characteristics into additive signals with sparse interactions and converts these signals into long-short portfolios through a differentiable rank-and-mask layer. This end-to-end design allows the model to optimize objectives such as the Sharpe ratio or mean-variance utility while keeping portfolio weights interpretable and traceable to specific characteristics and interactions. We show theoretically that in-sample objective maximization consistently estimates the population objective and that the differentiable rank-and-mask layer is a faithful smooth proxy for the corresponding conventional long-short quantile portfolio. In U.S. equities, AlphaGlass delivers strong out-of-sample performance and reveals economically interpretable drivers of long and short positions. |
| JEL: | C14 C45 G10 G11 G12 |
| Date: | 2026–05 |
| URL: | https://d.repec.org/n?u=RePEc:nbr:nberwo:35186 |
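For orientation, the conventional long-short quantile portfolio that the abstract above uses as the reference point can be sketched as follows: rank stocks by a signal, go long the top quantile, and short the bottom quantile with equal weights. This is only the hard-ranked baseline; the differentiable rank-and-mask layer itself is not specified in the abstract and is not reproduced here. The function name and the default quantile are assumptions for illustration.

```python
def long_short_quantile_weights(signals, quantile=0.2):
    """Equal-weighted long-short portfolio from a list of signals.

    Long the top `quantile` fraction, short the bottom fraction.
    Each leg sums to +1 / -1, so the portfolio is dollar-neutral.
    """
    n = len(signals)
    k = max(1, int(n * quantile))  # number of stocks in each leg
    if 2 * k > n:
        raise ValueError("long and short legs would overlap")
    # Indices sorted by signal, lowest first.
    order = sorted(range(n), key=lambda i: signals[i])
    weights = [0.0] * n
    for i in order[:k]:       # bottom quantile: short leg
        weights[i] = -1.0 / k
    for i in order[-k:]:      # top quantile: long leg
        weights[i] = 1.0 / k
    return weights


if __name__ == "__main__":
    print(long_short_quantile_weights([0.1, 0.5, -0.3, 0.9, 0.0], quantile=0.4))
```

Because the sort and the hard cutoff are not differentiable in the signals, this baseline cannot be trained end to end by gradient methods, which is the gap a smooth proxy is meant to close.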
| By: | Lin William Cong; Ke Tang; Jingyuan Wang |
| Abstract: | We adapt attention-based neural networks and reinforcement learning to direct portfolio construction, allowing broader portfolio-management objectives (including non-time-additively separable ones) and in a data-driven way, searching over a much richer policy/strategy space than low-dimensional parametric rules or human-specified strategies. As arguably the first non-text-based, “large” GenAI model in Finance, AlphaPortfolio accommodates long- and short-range path dependence in firm and market states (e.g., using Transformer encoder), cross-asset information, flexible (path-dependent) objectives (incl. Sharpe ratio, which is non-additively separable across periods) for end-to-end (rather than step-by-step) optimizations. In U.S. equities, AlphaPortfolio yields superior out-of-sample performance (e.g., Sharpe ratio above two and risk-adjusted alpha over 13% with monthly rebalancing) robust under various market conditions and economic restrictions (e.g., exclusion of small/illiquid stocks) and over time. The gains come from the direct construction, effective sequence modeling, and cross-asset attention network. We further demonstrate AlphaPortfolio's flexibility to incorporate transaction costs, state interactions, and alternative objectives, before developing a polynomial-feature-sensitivity analysis to uncover key drivers of performance, including their rotation and nonlinearity. |
| JEL: | C14 C58 G11 G12 |
| Date: | 2026–05 |
| URL: | https://d.repec.org/n?u=RePEc:nbr:nberwo:35195 |
| By: | Winston Wei Dou; Wei Wang; Wenyu Wang |
| Abstract: | Distressed firms need urgent financing to preserve operations and avoid inefficient liquidation, but they borrow in concentrated markets shaped by existing-creditor blocking power and a small group of specialized lenders. We show that these borrowers pay exceptionally high loan spreads even after removing compensation for credit risk, liquidity risk, and non-risk loan-making costs. To quantify and decompose lender market power, we develop and estimate a dynamic game-theoretic model of distressed lending with latent demand heterogeneity, endogenous lender participation, creditor blocking power, and tacit collusion sustained by repeated syndication. Using granular facility-level data on debtor-in-possession (DIP) loans and highly speculative loans, we find that lender market power explains 533 bps of risk-adjusted spreads in the DIP loan market and 300 bps in the highly speculative loan market, including about 140 bps from tacit collusion in each market. Lender market power is therefore a major source of financial distress costs, reducing survival-critical liquidity by 16–20% and thereby worsening asset-value destruction. |
| JEL: | C11 D43 G12 G18 G2 G21 G23 G28 K21 L13 L4 |
| Date: | 2026–05 |
| URL: | https://d.repec.org/n?u=RePEc:nbr:nberwo:35206 |
| By: | Yosuke Fukunishi (The Graduate School of Economics, The University of Tokyo); Haorong Qiu (Formerly Graduate School of Economics, The University of Tokyo); Akihiko Takahashi (The University of Tokyo) |
| Abstract: | Modeling the probability distribution of stock returns is a fundamental challenge in quantitative finance, with significant implications for risk management, derivative pricing, and portfolio optimization. This paper proposes a diffusion-based generative framework tailored to the statistical characteristics of financial return distributions. By incorporating learned reverse-process variance, velocity parameterization, and a sigmoid noise schedule, the proposed model aims to improve distributional fidelity, particularly in the tails. The framework is further extended to regime-conditional generation, enabling controlled simulation of distinct market states. Empirical evaluations demonstrate that the proposed approach outperforms classical parametric models such as Geometric Brownian Motion and GARCH, deep generative baselines like VAEs, and existing diffusion-based methods across multiple distributional metrics, including higher-order moments and tail behaviors. The results highlight the potential of diffusion models as robust tools for synthetic return generation and scenario analysis in finance. |
| Date: | 2026–05 |
| URL: | https://d.repec.org/n?u=RePEc:tky:fseres:2026cf1273 |
| By: | Domagoj Ćorić (Dr. Franjo Tuđman Defense and Security University, Department of Statistics); Matej Kožnjak (Faculty of science – Department of Mathematics, University of Zagreb); Dražen Smiljanić (Dr. Franjo Tuđman Defense and Security University) |
| Abstract: | This paper examines the long-run cointegration between the German DAX and the US S&P 500 from January 2021 to December 2025, using daily closing prices expressed as natural logarithms. The central argument is that methodological choices prevalent in the existing literature systematically fail to detect genuine long-run equilibrium relationships due to the econometric costs of over-differencing and first-step OLS bias amplification. Drawing on the theoretical contributions of Granger and Newbold (1974), Granger (1981), Granger and Joyeux (1980), Engle and Granger (1987), and Phillips (1988), the paper reconstructs the conditions under which differencing destroys low-frequency spectral dynamics and renders standard cointegration tests unreliable. With this conclusion in mind, the paper tests two models that encompass long-term cointegration: ARDL and ECM models. The empirical analysis confirms that the ARDL model outperforms its competitor, making it the best-fit methodology for modelling cross-Atlantic equity market integration and carrying direct implications for portfolio diversification, financial stability monitoring, and applied econometric practice. |
| Keywords: | cointegration, ARDL, error correction model, DAX, S&P 500, spurious regression, fractional integration, capital market integration |
| JEL: | C22 C51 G17 |
| Date: | 2026–04–14 |
| URL: | https://d.repec.org/n?u=RePEc:zag:wpaper:2603 |
| By: | Xin Li; Yan Ke; Longbing Cao |
| Abstract: | ESG-aware portfolio optimization is increasingly important for sustainable capital allocation, yet most learning-based methods still operationalize ESG by appending static scores to the policy observation or reward. This creates a mismatch for sequential control: ESG scores are noisy, provider-dependent, low-frequency, and temporally misaligned with sequential portfolio decisions, while financial evidence suggests that ESG is better treated as a portfolio preference, risk-exposure, or hedge dimension than as a robust alpha factor. We propose to impose ESG constraints without modifying the financial policy's observation or reward, using a Multimodal Action-Conditioned Constraint Field (MACF) that learns mechanism-specific ESG costs from point-in-time multimodal evidence and contemplated portfolio transitions. We then introduce MACF-X, a family of optimizer-specific adapters that converts MACF costs and uncertainties into native constrained-optimization interfaces through a shared slack- and uncertainty-aware pressure layer. Across multiple constraint-integration interfaces, MACF-X reduces tail ESG budget pressure while maintaining competitive financial performance. Ablations show that this improvement depends on dynamic evidence inputs and three-head decomposition, while static ESG-score proxies are nearly indistinguishable from score-shuffled noise baselines. |
| Date: | 2026–05 |
| URL: | https://d.repec.org/n?u=RePEc:arx:papers:2605.09310 |
| By: | Björn Löfdahl Grelsson |
| Abstract: | Historical Simulation (HS) and its extensions form a popular class of methods for estimating Value-at-Risk for portfolios of financial assets based on historical data. In this note, we seek to unify several ideas and models from throughout the literature into a single modeling framework. By explicitly defining a parametric model form for the asset returns and extracting the realized increments of the driving innovation process from historical data, we are able to reproduce the Historical Simulation, filtered Historical Simulation, and displaced Historical Simulation methods. This shows beyond a doubt that these methods need more underlying assumptions than what is often alluded to. |
| Date: | 2026–05 |
| URL: | https://d.repec.org/n?u=RePEc:arx:papers:2605.10066 |
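As a rough illustration of two of the methods named in the abstract above, the sketch below computes a plain Historical Simulation VaR and a filtered variant. It is not the note's unified framework. The filtered version here swaps in an EWMA volatility filter (decay `lam`, an assumed value) where the literature typically uses a GARCH model, and it seeds the filter with the full-sample second moment, a simplification that uses look-ahead information. Returns are assumed not to be all zero.

```python
import math


def empirical_quantile(values, p):
    """Lower empirical quantile: the ceil(p * n)-th order statistic."""
    ordered = sorted(values)
    idx = max(0, math.ceil(p * len(ordered)) - 1)
    return ordered[idx]


def historical_var(returns, alpha=0.99):
    """Plain HS VaR: the alpha-quantile of historical losses."""
    losses = [-r for r in returns]
    return empirical_quantile(losses, alpha)


def ewma_vols(returns, lam=0.94):
    """EWMA volatility in force on each date, plus the next-day forecast."""
    var = sum(r * r for r in returns) / len(returns)  # seed (uses full sample)
    vols = []
    for r in returns:
        vols.append(math.sqrt(var))
        var = lam * var + (1.0 - lam) * r * r
    return vols, math.sqrt(var)


def filtered_historical_var(returns, alpha=0.99, lam=0.94):
    """Filtered HS VaR: standardize returns by their volatility,
    then rescale the standardized innovations to the current volatility."""
    vols, next_vol = ewma_vols(returns, lam)
    innovations = [r / v for r, v in zip(returns, vols)]
    rescaled = [next_vol * z for z in innovations]
    return historical_var(rescaled, alpha)


if __name__ == "__main__":
    sample = [-0.04, 0.01, 0.02, -0.01]
    print(historical_var(sample, alpha=0.75))
    print(filtered_historical_var(sample, alpha=0.75))
```

Writing both estimators this way makes the modeling assumption explicit: plain HS treats raw returns as the innovations, while the filtered variant treats volatility-standardized returns as the innovations.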
| By: | Kirill Zernikov (New Economic School) |
| Abstract: | This paper studies empirical deep hedging for S&P 500 index options under a local downside-shortfall reward. It moves beyond performance comparison by asking what the learned hedge does, when it fails, and whether it can be made auditable. TD3 agents are compared with a daily-updated Black-Scholes delta hedge on the same option episodes. In walk-forward tests from 2015 to 2023, the agents usually learn a systematic delta haircut relative to Black-Scholes. The correction is explained by spot-implied-volatility co-movement and often improves accumulated reward and terminal downside variance, but it is regime-fragile: 2022 exposes losses in adverse daily states, while 2023 shows that underhedging can raise ordinary variance when option P&L is spot-dominated and the volatility channel is unusually weak. Symbolic regression distills the neural policies into compact formulas that can be traded out of sample; these formulas preserve much of the reward, downside-variance, and CVaR advantage over Black-Scholes, and sometimes sharpen it, but inherit the same fragility in difficult regimes. |
| Date: | 2026–05 |
| URL: | https://d.repec.org/n?u=RePEc:arx:papers:2605.21696 |
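The benchmark in the abstract above is a Black-Scholes delta hedge, and the learned policies are described as applying a haircut to that delta. The sketch below computes the Black-Scholes call delta (no dividends assumed) and a hedge ratio scaled down by a constant fraction. The multiplicative form and the `haircut` value are assumptions for illustration only; the abstract does not state how the learned correction depends on the state.

```python
import math


def norm_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def bs_call_delta(spot, strike, vol, maturity, rate=0.0):
    """Black-Scholes delta of a European call on a non-dividend-paying asset."""
    d1 = (math.log(spot / strike) + (rate + 0.5 * vol ** 2) * maturity) / (
        vol * math.sqrt(maturity)
    )
    return norm_cdf(d1)


def haircut_delta(spot, strike, vol, maturity, rate=0.0, haircut=0.1):
    """Hypothetical hedge ratio: Black-Scholes delta reduced by a constant fraction."""
    return (1.0 - haircut) * bs_call_delta(spot, strike, vol, maturity, rate)


if __name__ == "__main__":
    # At-the-money call, 20% volatility, one year to maturity, zero rate.
    print(bs_call_delta(100.0, 100.0, 0.2, 1.0))
    print(haircut_delta(100.0, 100.0, 0.2, 1.0, haircut=0.1))
```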