on Computational Economics |
| By: | Zuoyou Jiang; Li Zhao; Rui Sun; Ruohan Sun; Zhongjian Li; Jing Li; Daxin Jiang; Zuo Bai; Cheng Hua |
| Abstract: | Signal decay and regime shifts pose recurring challenges for data-driven investment strategies in non-stationary markets. Conventional time-series and machine learning approaches, which rely primarily on historical correlations, often struggle to generalize when the economic environment changes. While large language models (LLMs) offer strong capabilities for processing unstructured information, their potential to support quantitative factor screening through explicit economic reasoning remains underexplored. Existing factor-based methods typically reduce alphas to numerical time series, overlooking the semantic rationale that determines when a factor is economically relevant. We propose Alpha-R1, an 8B-parameter reasoning model trained via reinforcement learning for context-aware alpha screening. Alpha-R1 reasons over factor logic and real-time news to evaluate alpha relevance under changing market conditions, selectively activating or deactivating factors based on contextual consistency. Empirical results across multiple asset pools show that Alpha-R1 consistently outperforms benchmark strategies and exhibits improved robustness to alpha decay. The full implementation and resources are available at https://github.com/FinStep-AI/Alpha-R1. |
| Date: | 2025–12 |
| URL: | https://d.repec.org/n?u=RePEc:arx:papers:2512.23515 |
| By: | Julia Kończal; Michał Balcerek; Krzysztof Burnecki |
| Abstract: | In recent years, the growing frequency and severity of natural disasters have increased the need for effective tools to manage catastrophe risk. Catastrophe (CAT) bonds allow the transfer of part of this risk to investors, offering an alternative to traditional reinsurance. This paper examines the role of climate variability in CAT bond pricing and evaluates the predictive performance of various machine learning models in forecasting CAT bond coupons. We combine features typically used in the literature with a new set of climate indicators, including the Oceanic Niño Index, Arctic Oscillation, North Atlantic Oscillation, Outgoing Longwave Radiation, Pacific-North American pattern, Pacific Decadal Oscillation, Southern Oscillation Index, and sea surface temperatures. We compare the performance of linear regression with several machine learning algorithms, such as random forest, gradient boosting, extremely randomized trees, and extreme gradient boosting. Our results show that including climate-related variables improves predictive accuracy across all models, with extremely randomized trees achieving the lowest root mean squared error (RMSE). These findings suggest that large-scale climate variability has a measurable influence on CAT bond pricing and that machine learning methods can effectively capture these complex relationships. |
| Date: | 2025–12 |
| URL: | https://d.repec.org/n?u=RePEc:arx:papers:2512.22660 |
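As an illustration of the model comparison this abstract describes (not the authors' code), a minimal sketch assuming scikit-learn is available; the features, data-generating process, and hyperparameters are invented stand-ins for bond terms plus climate indices:

```python
# Hypothetical sketch: compare regressors by out-of-sample RMSE, as in the
# CAT bond coupon forecasting setup. All data here is synthetic.
import numpy as np
from sklearn.ensemble import (ExtraTreesRegressor, GradientBoostingRegressor,
                              RandomForestRegressor)
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 500
# Columns stand in for contract features plus climate indices (e.g. ONI, NAO, SOI).
X = rng.normal(size=(n, 8))
y = 2.0 + X[:, 0] - 0.5 * X[:, 3] + 0.3 * X[:, 5] + rng.normal(scale=0.2, size=n)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

models = {
    "linear": LinearRegression(),
    "random_forest": RandomForestRegressor(n_estimators=200, random_state=0),
    "gradient_boosting": GradientBoostingRegressor(random_state=0),
    "extra_trees": ExtraTreesRegressor(n_estimators=200, random_state=0),
}
rmse = {}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    pred = model.predict(X_te)
    rmse[name] = mean_squared_error(y_te, pred) ** 0.5  # root mean squared error
print(rmse)
```

The paper's finding would correspond to `extra_trees` attaining the smallest value in such a table once climate indicators are included.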
| By: | Christophe D. Hounwanou; Yae Ulrich Gaba |
| Abstract: | Synthetic financial data provides a practical solution to the privacy, accessibility, and reproducibility challenges that often constrain empirical research in quantitative finance. This paper investigates the use of deep generative models, specifically Time-series Generative Adversarial Networks (TimeGAN) and Variational Autoencoders (VAEs) to generate realistic synthetic financial return series for portfolio construction and risk modeling applications. Using historical daily returns from the S&P 500 as a benchmark, we generate synthetic datasets under comparable market conditions and evaluate them using statistical similarity metrics, temporal structure tests, and downstream financial tasks. The study shows that TimeGAN produces synthetic data with distributional shapes, volatility patterns, and autocorrelation behaviour that are close to those observed in real returns. When applied to mean–variance portfolio optimization, the resulting synthetic datasets lead to portfolio weights, Sharpe ratios, and risk levels that remain close to those obtained from real data. The VAE provides more stable training but tends to smooth extreme market movements, which affects risk estimation. Finally, the analysis supports the use of synthetic datasets as substitutes for real financial data in portfolio analysis and risk simulation, particularly when models are able to capture temporal dynamics. Synthetic data therefore provides a privacy-preserving, cost-effective, and reproducible tool for financial experimentation and model development. |
| Date: | 2025–12 |
| URL: | https://d.repec.org/n?u=RePEc:arx:papers:2512.21798 |
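A minimal sketch of the kind of statistical-similarity checks mentioned in the abstract, comparing distributional moments and a volatility-clustering proxy (lag-1 autocorrelation of absolute returns). The "synthetic" series below is simulated, not generator output, and the metric names are ours:

```python
# Illustrative similarity report for real vs. synthetic daily returns.
import numpy as np

def lag1_autocorr(x):
    """Lag-1 autocorrelation (biased normalization, fine for diagnostics)."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    return float(np.dot(x[:-1], x[1:]) / np.dot(x, x))

def similarity_report(real, synth):
    return {
        "mean_gap": abs(real.mean() - synth.mean()),
        "std_gap": abs(real.std() - synth.std()),
        # Absolute returns proxy volatility clustering in the temporal structure.
        "acf1_abs_gap": abs(lag1_autocorr(np.abs(real)) - lag1_autocorr(np.abs(synth))),
    }

rng = np.random.default_rng(1)
real = rng.standard_t(df=4, size=2000) * 0.01   # heavy-tailed "real" returns
synth = rng.standard_t(df=4, size=2000) * 0.01  # stand-in for TimeGAN/VAE output
report = similarity_report(real, synth)
print(report)
```

Smaller gaps indicate closer distributional and temporal agreement; the paper additionally uses downstream portfolio tasks as an evaluation.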
| By: | Muço, Arieda |
| Abstract: | Using Brazilian municipal audit reports, I construct an automated corruption index that combines a dictionary of audit irregularities with principal component analysis. The index validates strongly against independent human coders, explaining 71–73% of the variation in hand-coded corruption counts in samples where coders themselves exhibit high agreement, and the results are robust within these validation samples. The index behaves as theory predicts, correlating with municipal characteristics that prior research links to corruption. Supervised learning alternatives yield nearly identical municipal rankings (R² = 0.98), confirming that the dictionary approach captures the same underlying construct. The method scales to the full audit corpus and offers advantages over both manual coding and Large Language Models (LLMs) in transparency, cost, and long-run replicability. |
| Date: | 2025–12–12 |
| URL: | https://d.repec.org/n?u=RePEc:osf:socarx:cftvk_v1 |
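The dictionary-plus-PCA construction can be sketched as follows; this is an illustration under our own assumptions, not the paper's code, and the dictionary terms and reports are invented:

```python
# Hypothetical sketch of a dictionary-plus-PCA corruption index: count
# irregularity terms per audit report, standardize the counts, and take
# the first principal component as the index.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Invented Portuguese-language irregularity terms (accents stripped for matching).
IRREGULARITY_TERMS = ["fraude", "superfaturamento", "desvio", "licitacao irregular"]

reports = [
    "auditoria aponta fraude e desvio de recursos",
    "licitacao irregular e superfaturamento em obras",
    "contas aprovadas sem ressalvas",
]

# Term-count matrix: one row per report, one column per dictionary term.
counts = np.array([[report.count(term) for term in IRREGULARITY_TERMS]
                   for report in reports], dtype=float)

# First principal component of standardized counts; its sign is arbitrary,
# so in practice one orients it so higher scores mean more irregularities.
index = PCA(n_components=1).fit_transform(StandardScaler().fit_transform(counts))
print(index.ravel())
```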
| By: | Greta Polo; Yuan Gao Rollinson; Ms. Yevgeniya Korniyenko; Tongfang Yuan |
| Abstract: | This paper presents a machine learning–based nowcasting framework for estimating quarterly non-oil GDP growth in the Gulf Cooperation Council (GCC) countries. Leveraging machine learning models tailored to each country, the framework integrates a broad range of high-frequency indicators—including real activity, financial conditions, trade, and oil-related variables—to produce timely, sector-specific estimates. Advancing the nowcasting literature for the MENA region, this approach moves beyond single-model methodologies by incorporating a richer set of high-frequency, cross-border indicators. It presents two key innovations: (i) a tailored data integration strategy that broadens and automates the use of high-frequency indicators; and (ii) a novel application of Shapley value decompositions to enhance model interpretability and guide the iterative selection of predictive indicators. The framework’s flexibility allows it to account for the region’s unique economic structures, ongoing reform agendas, and the spillover effects of oil market volatility on non-oil sectors. By enhancing the granularity, responsiveness, and transparency of short-term forecasts, the model enables faster, data-driven policy decisions, strengthening economic surveillance and enhancing policy agility across the GCC amid a rapidly evolving global environment. |
| Keywords: | GCC; Nowcasting; Machine Learning; Non-oil Growth |
| Date: | 2025–12–19 |
| URL: | https://d.repec.org/n?u=RePEc:imf:imfwpa:2025/268 |
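Shapley value decompositions of the kind mentioned above have a simple closed form in the linear case, which is useful intuition even though the paper applies them to richer models. A self-contained sketch (the feature names and data are invented):

```python
# Illustrative Shapley decomposition for a linear nowcasting model: the
# Shapley value of feature j at observation x is coef_j * (x_j - mean_j),
# and the contributions sum to the prediction's deviation from the
# average prediction.
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(200, 4))          # stand-ins for high-frequency indicators
beta = np.array([0.8, -0.3, 0.0, 0.5])
y = X @ beta + rng.normal(scale=0.1, size=200)

# OLS fit via least squares, with an intercept column.
A = np.column_stack([np.ones(len(X)), X])
coef = np.linalg.lstsq(A, y, rcond=None)[0]
intercept, weights = coef[0], coef[1:]

x = X[0]
shapley = weights * (x - X.mean(axis=0))          # per-feature contributions
pred = intercept + x @ weights
avg_pred = intercept + X.mean(axis=0) @ weights
assert np.isclose(shapley.sum(), pred - avg_pred)  # efficiency property
print(dict(zip(["activity", "financial", "trade", "oil"], shapley.round(3))))
```

For tree ensembles the same efficiency property holds, but the values must be computed with a dedicated method such as TreeSHAP rather than this closed form.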
| By: | Jeffrey Allen; Max S. Hatfield |
| Abstract: | We examined the performance of four families of large language models (LLMs) and a variety of common fuzzy matching algorithms in assessing the similarity of names and addresses in a sanctions screening context. On average, across a range of realistic matching thresholds, the LLMs in our study reduced sanctions screening false positives by 92 percent and increased detection rates by 11 percent relative to the best-performing fuzzy matching baseline. Smaller, less computationally intensive models from the same language model families performed comparably, which may support scaling. In terms of computing performance, the LLMs were, on average, over four orders of magnitude slower than the fuzzy methods. To help address this, we propose a model cascade that escalates higher uncertainty screening cases to LLMs, while relying on fuzzy and exact matching for easier cases. The cascade is nearly twice as fast and just as accurate as the pure LLM system. We show even stronger runtime gains and comparable screening accuracy by relying on the fastest language models within the cascade. In the near term, the economic cost of running LLMs, inference latency, and other frictions, including API limits, will likely necessitate using these types of tiered approaches for sanctions screening in high-velocity and high-throughput financial activities, such as payments. Sanctions screening in slower-moving processes, such as customer due diligence for account opening and lending, may be able to rely on LLMs more extensively. |
| Keywords: | Large Language Models; Sanctions Screening; Model cascading |
| Date: | 2025–09–29 |
| URL: | https://d.repec.org/n?u=RePEc:fip:fedgfe:2025-92 |
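The tiered cascade described in the abstract can be sketched in a few lines: exact and fuzzy matching resolve the easy cases, and only an uncertain middle band is escalated to an LLM. This is our illustration, not the authors' system; the thresholds are invented and `llm_judge` is a placeholder:

```python
# Minimal sketch of a sanctions-screening cascade. Cheap matchers handle
# clear matches and clear non-matches; the uncertain band goes to an LLM.
from difflib import SequenceMatcher

CLEAR_MATCH, CLEAR_NONMATCH = 0.90, 0.60   # illustrative thresholds

def fuzzy_score(a: str, b: str) -> float:
    """Similarity in [0, 1] via difflib's ratio (one of many fuzzy choices)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def llm_judge(query: str, candidate: str) -> bool:
    # Placeholder: in production this would call a language model.
    raise NotImplementedError("escalate to an LLM in production")

def screen(query: str, candidate: str) -> tuple[bool, str]:
    """Return (is_match, tier) for a name pair."""
    if query.lower() == candidate.lower():
        return True, "exact"
    score = fuzzy_score(query, candidate)
    if score >= CLEAR_MATCH:
        return True, "fuzzy"
    if score <= CLEAR_NONMATCH:
        return False, "fuzzy"
    return llm_judge(query, candidate), "llm"   # uncertain band only

print(screen("Acme Trading LLC", "ACME TRADING LLC"))
print(screen("Acme Trading LLC", "Global Shipping Co"))
```

The speed claim in the abstract follows directly from this structure: most pairs never reach the slow tier.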
| By: | E. C. Garrido-Merchán; S. Mora-Figueroa; M. Coronado-Vaca |
| Abstract: | DRL agents circumvent the issues of classical models in that they make no assumptions such as financial returns being normally distributed, and they can incorporate any information, such as ESG scores, if configured with a reward that reflects the objective. However, the performance of DRL agents is highly variable and very sensitive to the values of their hyperparameters. Bayesian optimization is a class of methods suited to the optimization of black-box functions, that is, functions whose analytical expression is unknown and which are noisy and expensive to evaluate. The hyperparameter tuning problem of DRL algorithms fits this scenario perfectly. Since training an agent for even a single objective is very expensive, requiring millions of timesteps, instead of optimizing one objective that mixes a risk-performance metric and an ESG metric, we separate the objectives and solve the multi-objective problem, obtaining an optimal Pareto set of portfolios that represents the best trade-offs between the Sharpe ratio and the mean ESG score of the portfolio and leaving the choice of the final portfolio to the investor. We conducted our experiments using environments encoded within OpenAI Gym, adapted from the FinRL platform. The experiments are carried out on the Dow Jones Industrial Average (DJIA) and NASDAQ markets, in terms of the Sharpe ratio achieved by the agent and the mean ESG score of the portfolio. We compare the performance of the obtained Pareto sets in hypervolume terms, illustrating how the portfolios trade off the Sharpe ratio against the mean ESG score. We also show the usefulness of the proposed methodology by comparing the obtained hypervolume with that achieved by random search over the DRL hyperparameter space. |
| Date: | 2025–12 |
| URL: | https://d.repec.org/n?u=RePEc:arx:papers:2512.14992 |
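The Pareto-set and hypervolume evaluation used above is easy to state concretely. A self-contained sketch for the two-objective case (both Sharpe ratio and mean ESG score maximized; the data points and reference point are invented):

```python
# Illustrative Pareto front extraction and 2-D hypervolume for
# (Sharpe ratio, mean ESG score) portfolios, both maximized.
def pareto_front(points):
    """Non-dominated points: nothing else is at least as good in both objectives."""
    front = [p for p in points
             if not any(q[0] >= p[0] and q[1] >= p[1] and q != p for q in points)]
    return sorted(front)

def hypervolume_2d(front, ref):
    """Area dominated by the front relative to a reference point below it."""
    hv, prev_y = 0.0, ref[1]
    for x, y in sorted(front, reverse=True):   # sweep in descending Sharpe
        hv += (x - ref[0]) * (y - prev_y)
        prev_y = y
    return hv

portfolios = [(1.2, 0.5), (0.9, 0.8), (0.8, 0.4), (1.0, 0.3)]  # (Sharpe, ESG)
front = pareto_front(portfolios)
print(front, hypervolume_2d(front, ref=(0.0, 0.0)))
```

A larger hypervolume means the Pareto set dominates more of the objective space, which is how the paper compares Bayesian optimization against random search.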
| By: | Ferraz, Vinícius; Olah, Tamas; Sazedul, Ratin; Schmidt, Robert; Schwieren, Christiane |
| Abstract: | We investigate if Large Language Models (LLMs) exhibit personality-driven strategic behavior in the Ultimatum Game by manipulating Dark Factor of Personality (D-Factor) profiles via standardized prompts. Across 400k decisions from 17 open-source models and 4,166 human benchmarks, we test whether LLMs playing the proposer and responder roles exhibit systematic behavioral shifts across five D-Factor levels (from least to most selfish). The proposer role exhibited strong monotonic declines in fair offers from 91% (D1) to 17% (D5), mirroring human patterns but with 34% steeper gradients, indicating hypersensitivity to personality prompts. Responders diverged sharply: where humans became more punitive at higher D-levels, LLMs maintained high acceptance rates (75-92%) with weak or reversed D-Factor sensitivity, failing to reproduce reciprocity-punishment dynamics. These role-specific patterns align with strong-weak situation accounts—personality matters when incentives are ambiguous (proposers) but is muted when contingent (responders). Cross-model heterogeneity was substantial: models exhibiting the closest alignment with human behavior, according to composite similarity scores (integrating prosocial rates, D-Factor correlations, and odds ratios), were dolphin3, deepseek_1.5b, and llama3.2 (0.74-0.85), while others exhibited extreme or non-variable behavior. Temperature settings (0.2 vs. 0.8) exerted minimal influence. We interpret these patterns as prompt-driven regularities rather than genuine motivational processes, suggesting LLMs can approximate but not fully replicate human strategic behavior in social dilemmas. |
| Date: | 2025–12–16 |
| URL: | https://d.repec.org/n?u=RePEc:awi:wpaper:0768 |
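The odds-ratio comparison underlying the reported D1-to-D5 decline works as follows; the counts here are toy numbers roughly matching the reported 91% and 17% fair-offer rates, not the paper's data:

```python
# Toy sketch: fair-offer rates and the D1-vs-D5 odds ratio for proposers.
fair = {"D1": 910, "D5": 170}   # fair offers out of 1000 decisions per level
total = 1000

rates = {level: count / total for level, count in fair.items()}
odds = {level: count / (total - count) for level, count in fair.items()}
odds_ratio = odds["D1"] / odds["D5"]   # how much likelier a fair offer is at D1
print(rates, round(odds_ratio, 2))
```

A large odds ratio like this quantifies the monotonic decline in fairness as the prompted personality becomes more selfish.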
| By: | Bong-Gyu Jang; Younwoo Jeong; Changeun Kim |
| Abstract: | We introduce the Consensus-Bottleneck Asset Pricing Model (CB-APM), a partially interpretable neural network that replicates the reasoning processes of sell-side analysts by capturing how dispersed investor beliefs are compressed into asset prices through a consensus formation process. By modeling this "bottleneck" to summarize firm- and macro-level information, CB-APM not only predicts future risk premiums of U.S. equities but also links belief aggregation to expected returns in a structurally interpretable manner. The model improves long-horizon return forecasts and outperforms standard deep learning approaches in both predictive accuracy and explanatory power. Comprehensive portfolio analyses show that CB-APM's out-of-sample predictions translate into economically meaningful payoffs, with monotonic return differentials and stable long-short performance across regularization settings. Empirically, CB-APM leverages consensus as a regularizer to amplify long-horizon predictability and yields interpretable consensus-based components that clarify how information is priced in returns. Moreover, regression and GRS-based pricing diagnostics reveal that the learned consensus representations capture priced variation only partially spanned by traditional factor models, demonstrating that CB-APM uncovers belief-driven structure in expected returns beyond the canonical factor space. Overall, CB-APM provides an interpretable and empirically grounded framework for understanding belief-driven return dynamics. |
| Date: | 2025–12 |
| URL: | https://d.repec.org/n?u=RePEc:arx:papers:2512.16251 |
| By: | Nathan Kallus |
| Abstract: | Aligning large language models to preference data is commonly implemented by assuming a known link function between the distribution of observed preferences and the unobserved rewards (e.g., a logistic link as in Bradley-Terry). If the link is wrong, however, inferred rewards can be biased and policies misaligned. We study policy alignment to preferences under an unknown and unrestricted link. We consider an $f$-divergence-constrained reward maximization problem and show that realizability of the solution in a policy class implies a semiparametric single-index binary choice model, where a scalar-valued index determined by a policy captures the dependence on demonstrations and the rest of the preference distribution is an unrestricted function thereof. Rather than estimating identifiable finite-dimensional structural parameters in the index as in econometrics, we focus on policy learning, targeting error relative to the optimal policy and allowing unidentifiable and nonparametric indices. We develop a variety of policy learners based on profiling the link function, orthogonalizing the link function, and using link-agnostic bipartite ranking objectives. We analyze these and provide finite-sample policy error bounds that depend on generic functional complexity measures of the index class. We further consider practical implementations using first-order optimization suited to neural networks and batched data. The resulting methods are robust to unknown preference noise distribution and scale, while preserving the direct optimization of policies without explicitly fitting rewards. |
| Date: | 2025–12 |
| URL: | https://d.repec.org/n?u=RePEc:arx:papers:2512.21917 |
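For intuition, the familiar KL special case shows how a policy induces the scalar index the abstract refers to; the notation below is ours, and the paper works with general $f$-divergences and an unrestricted link:

```latex
% KL-constrained reward maximization over policies \pi relative to a
% reference policy \pi_0:
\max_{\pi}\; \mathbb{E}_{x}\,\mathbb{E}_{a\sim\pi(\cdot\mid x)}\!\left[r(x,a)\right]
\quad \text{s.t.} \quad
D_{\mathrm{KL}}\!\left(\pi \,\|\, \pi_{0}\right) \le \epsilon .
% With Lagrange multiplier \beta > 0, the solution tilts the reference policy:
\pi^{*}(a\mid x) \;\propto\; \pi_{0}(a\mid x)\,\exp\!\big(r(x,a)/\beta\big),
\qquad\Longrightarrow\qquad
r(x,a) \;=\; \beta \log\frac{\pi^{*}(a\mid x)}{\pi_{0}(a\mid x)} \;+\; c(x).
% Preferences then depend on the data only through an index difference,
% with link F left unrestricted (Bradley--Terry would fix F to the logistic):
\Pr\big(a \succ a' \mid x\big)
\;=\; F\!\Big(T_{\pi}(x,a) - T_{\pi}(x,a')\Big),
\qquad
T_{\pi}(x,a) \;:=\; \log\frac{\pi(a\mid x)}{\pi_{0}(a\mid x)} .
```

Assuming a logistic $F$ when the true link differs is precisely the misspecification the paper's link-agnostic learners avoid.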
| By: | Daniel H. Karney; Don Fullerton; Kathy Baylis |
| Abstract: | Computable general equilibrium (CGE) models can evaluate detailed tax reforms, trade restrictions, or environmental policy. These models can capture many complexities, but these complexities can make results difficult to interpret. Analytical general equilibrium (AGE) models provide better intuition and interpretation but cannot capture relevant complexities. We propose a method that employs AGE models to understand CGE models – a “model of the model”. We apply this idea to climate policy and carbon leakage – the increase in emissions elsewhere. Our AGE models identify seven key economic determinants of leakage within any one outcome. We then unpack results from three existing CGE models. |
| Keywords: | analytical general equilibrium (AGE) and computable general equilibrium (CGE) models |
| JEL: | C63 H23 Q58 |
| Date: | 2025 |
| URL: | https://d.repec.org/n?u=RePEc:ces:ceswps:_12332 |
| By: | Frank Tian-Fang Ye (Division of Social Sciences, The HKU SPACE Community College, Hong Kong SAR, PRC); Xiaozi Gao (Department of Early Childhood Education, Education University of Hong Kong, Hong Kong SAR, PRC) |
| Abstract: | China's marriage registrations have declined dramatically, dropping from 13.47 million couples in 2013 to 6.1 million in 2024. Understanding public attitudes toward marriage requires examining not only emotional sentiment but also the moral reasoning underlying these evaluations. This study analyzed 219,358 marriage-related posts from two major Chinese social media platforms (Sina Weibo and Xiaohongshu) using large language model (LLM)-assisted content analysis. Drawing on Shweder's Big Three moral ethics framework, posts were coded for sentiment (positive, negative, neutral) and moral dimensions (Autonomy, Community, Divinity). Results revealed platform differences: Weibo discourse skewed positive, while Xiaohongshu was predominantly neutral. Most posts across both platforms lacked explicit moral framing. However, when moral ethics were invoked, significant associations with sentiment emerged. Posts invoking Autonomy ethics and Community ethics were predominantly negative, whereas Divinity-framed posts tended toward neutral or positive sentiment. These findings suggest that concerns about both personal autonomy constraints and communal obligations drive negative marriage attitudes in contemporary China. The study demonstrates LLMs' utility for scaling qualitative analysis and offers insights for developing culturally informed policies addressing marriage decline in Chinese contexts. |
| Date: | 2025–12 |
| URL: | https://d.repec.org/n?u=RePEc:arx:papers:2512.23609 |