nep-ain 2025-08-18 papers

on Artificial Intelligence

Issue of 2025–08–18
24 papers chosen by
Ben Greiner, Wirtschaftsuniversität Wien

Can Author Manipulation of AI Referees be Welfare Improving? By Joshua S. Gans
AI Agents and the Attention Lemons Problem in Two-Sided Ad Markets By Md Mahadi Hasan
Algorithmic Coercion with Faster Pricing By Zach Y. Brown; Alexander MacKay
What Is Your AI Agent Buying? Evaluation, Implications and Emerging Questions for Agentic E-Commerce By Amine Allouah; Omar Besbes; Josu\'e D Figueroa; Yash Kanoria; Akshit Kumar
Advancing AI Capabilities and Evolving Labor Outcomes By Jacob Dominski; Yong Suk Lee
How Exposed Are UK Jobs to Generative AI? Developing and Applying a Novel Task-Based Index By Golo Henseke; Rhys Davies; Alan Felstead; Duncan Gallie; Francis Green; Ying Zhou
Will Compute Bottlenecks Prevent an Intelligence Explosion? By Parker Whitfill; Cheryl Wu
Beyond pay: AI skills reward more job benefits By Fabian Stephany; Alejandra Mira; Matthew Bone
Valuing Time in Silicon: Can Large Language Model Replicate Human Value of Travel Time By Yingnan Yan; Tianming Liu; Yafeng Yin
The ordinary meaning bot: Simulating human surveys with LLMs By Johannes Kruse
Human Realignment: An Empirical Study of LLMs as Legal Decision-Aids in Moral Dilemmas By Christoph Engel; Yoan Hermstrüwer; Alison Kim
Can large language models assist choice modelling? Insights into prompting strategies and current models capabilities By Georges Sfeir; Gabriel Nova; Stephane Hess; Sander van Cranenburgh
How AI Detects Financial Fraud: A Review of Emerging Deep Learning Methods By Mori, Misato
Financial inclusion and large language models By Ozili, Peterson K; Obiora, Kingsley I; Onuzo, Chinwendu
Evaluating Large Language Models (LLMs) in Financial NLP: A Comparative Study on Financial Report Analysis By Md Talha Mohsin
Your AI, Not Your View: The Bias of LLMs in Investment Analysis By Hoyoung Lee; Junhyuk Seo; Suhwan Park; Junhyeong Lee; Wonbin Ahn; Chanyeol Choi; Alejandro Lopez-Lira; Yongjae Lee
FinSurvival: A Suite of Large Scale Survival Modeling Tasks from Finance By Aaron Green; Zihan Nie; Hanzhen Qin; Oshani Seneviratne; Kristin P. Bennett
FinMarBa: A Market-Informed Dataset for Financial Sentiment Classification By Baptiste Lefort; Eric Benhamou; Beatrice Guez; Jean-Jacques Ohana; Ethan Setrouk; Alban Etienne
AI-Powered Trading, Algorithmic Collusion, and Price Efficiency By Winston Wei Dou; Itay Goldstein; Yan Ji
Aligning Large Language Model Agents with Rational and Moral Preferences: A Supervised Fine-Tuning Approach By Wei Lu; Daniel L. Chen; Christian B. Hansen
Efficacy of AI RAG Tools for Complex Information Extraction and Data Annotation Tasks: A Case Study Using Banks Public Disclosures By Nicholas Botti; Flora Haberkorn; Charlotte Hoopes; Shaun Khan
Defining Current and Expected Financial Constraints Using AI: Reinterpreting the Cash Flow Sensitivity of Cash By Rachel Cho; Christoph Görtz; Danny McGowan; Max Schröder
Financial Regulation and AI: A Faustian Bargain? By Coppola, Antonio; Clayton, Christopher
ContestTrade: A Multi-Agent Trading System Based on Internal Contest Mechanism By Li Zhao; Rui Sun; Zuoyou Jiang; Bo Yang; Yuxiao Bai; Mengting Chen; Xinyang Wang; Jing Li; Zuo Bai

Can Author Manipulation of AI Referees be Welfare Improving?

By:	Joshua S. Gans
Abstract:	This paper examines a new moral hazard in delegated decision-making: authors can embed hidden instructions—known as prompt injections—to bias AI referees in academic peer review, thereby hijacking machine recommendations. Because AI reviews are relatively inexpensive compared to manual assessments, referees would otherwise delegate fully, which undermines quality. The paper shows that moderate detection of manipulation can paradoxically improve welfare. With intermediate detection probabilities, only low-quality authors undertake manipulation, and detection becomes informative about quality, inducing referees to mix between manual and AI reviews. This partially separating equilibrium preserves the value of peer review when AI quality is intermediate. When detection is too low, all bad papers are manipulated and the market unravels; when detection is perfect, referees use only AI and acceptance collapses. Thus, some prompt injection must be tolerated to sustain the market: it disciplines referees and generates information. The results caution against zero-tolerance enforcement and highlight how prompt injection can, counterintuitively, play a welfare-enhancing role when AI reviews are easily produced.
JEL:	D82 D86 O33
Date:	2025–07
URL:	https://d.repec.org/n?u=RePEc:nbr:nberwo:34082

AI Agents and the Attention Lemons Problem in Two-Sided Ad Markets

By:	Md Mahadi Hasan
Abstract:	I develop a theoretical model to examine how the rise of autonomous AI (artificial intelligence) agents disrupts two-sided digital advertising markets. Through this framework, I demonstrate that users' rational, private decisions to delegate browsing to agents create a negative externality, precipitating declines in ad prices, publisher revenues, and overall market efficiency. The model identifies the conditions under which publisher interventions such as blocking AI agents or imposing tolls may mitigate these effects, although they risk fragmenting access and value. I formalize the resulting inefficiency as an ``attention lemons" problem, where synthetic agent traffic dilutes the quality of attention sold to advertisers, generating adverse selection. To address this, I propose a Pigouvian correction mechanism: a per-delegation fee designed to internalize the externality and restore welfare. The model demonstrates that, for an individual publisher, charging AI agents toll fees for access strictly dominates both the `Blocking' and `Null (inaction)' strategies. Finally, I characterize a critical tipping point beyond which unchecked delegation triggers a collapse of the ad-funded digital market.
Date:	2025–07
URL:	https://d.repec.org/n?u=RePEc:arx:papers:2507.22435

Algorithmic Coercion with Faster Pricing

By:	Zach Y. Brown; Alexander MacKay
Abstract:	We examine a model in which one firm uses a pricing algorithm that enables faster pricing and multi-period commitment. We characterize a coercive equilibrium in which the algorithmic firm maximizes its profits subject to the incentive compatibility constraint of its rival. By adopting an algorithm that enables faster pricing and (imperfect) commitment, a firm can unilaterally induce substantially higher equilibrium prices even when its rival maximizes short-run profits and cannot collude. The algorithmic firm can earn profits that exceed its share of collusive profits, and coercive equilibrium outcomes can be worse for consumers than collusive outcomes. In extensions, we incorporate simple learning by the rival, and we explore the implications for platform design.
JEL:	D43 L13 L40 L81 L86
Date:	2025–07
URL:	https://d.repec.org/n?u=RePEc:nbr:nberwo:34070

What Is Your AI Agent Buying? Evaluation, Implications and Emerging Questions for Agentic E-Commerce

By:	Amine Allouah; Omar Besbes; Josu\'e D Figueroa; Yash Kanoria; Akshit Kumar
Abstract:	Online marketplaces will be transformed by autonomous AI agents acting on behalf of consumers. Rather than humans browsing and clicking, vision-language-model (VLM) agents can parse webpages, evaluate products, and transact. This raises a fundamental question: what do AI agents buy, and why? We develop ACES, a sandbox environment that pairs a platform-agnostic VLM agent with a fully programmable mock marketplace to study this question. We first conduct basic rationality checks in the context of simple tasks, and then, by randomizing product positions, prices, ratings, reviews, sponsored tags, and platform endorsements, we obtain causal estimates of how frontier VLMs actually shop. Models show strong but heterogeneous position effects: all favor the top row, yet different models prefer different columns, undermining the assumption of a universal "top" rank. They penalize sponsored tags and reward endorsements. Sensitivities to price, ratings, and reviews are directionally human-like but vary sharply in magnitude across models. Motivated by scenarios where sellers use AI agents to optimize product listings, we show that a seller-side agent that makes minor tweaks to product descriptions, targeting AI buyer preferences, can deliver substantial market-share gains if AI-mediated shopping dominates. We also find that modal product choices can differ across models and, in some cases, demand may concentrate on a few select products, raising competition questions. Together, our results illuminate how AI agents may behave in e-commerce settings and surface concrete seller strategy, platform design, and regulatory questions in an AI-mediated ecosystem.
Date:	2025–08
URL:	https://d.repec.org/n?u=RePEc:arx:papers:2508.02630

Advancing AI Capabilities and Evolving Labor Outcomes

By:	Jacob Dominski; Yong Suk Lee
Abstract:	This study investigates the labor market consequences of AI by analyzing near real-time changes in employment status and work hours across occupations in relation to advances in AI capabilities. We construct a dynamic Occupational AI Exposure Score based on a task-level assessment using state-of-the-art AI models, including ChatGPT 4o and Anthropic Claude 3.5 Sonnet. We introduce a five-stage framework that evaluates how AI's capability to perform tasks in occupations changes as technology advances from traditional machine learning to agentic AI. The Occupational AI Exposure Scores are then linked to the US Current Population Survey, allowing for near real-time analysis of employment, unemployment, work hours, and full-time status. We conduct a first-differenced analysis comparing the period from October 2022 to March 2023 with the period from October 2024 to March 2025. Higher exposure to AI is associated with reduced employment, higher unemployment rates, and shorter work hours. We also observe some evidence of increased secondary job holding and a decrease in full-time employment among certain demographics. These associations are more pronounced among older and younger workers, men, and college-educated individuals. College-educated workers tend to experience smaller declines in employment but are more likely to see changes in work intensity and job structure. In addition, occupations that rely heavily on complex reasoning and problem-solving tend to experience larger declines in full-time work and overall employment in association with rising AI exposure. In contrast, those involving manual physical tasks appear less affected. Overall, the results suggest that AI-driven shifts in labor are occurring along both the extensive margin (unemployment) and the intensive margin (work hours), with varying effects across occupational task content and demographics.
Date:	2025–07
URL:	https://d.repec.org/n?u=RePEc:arx:papers:2507.08244

How Exposed Are UK Jobs to Generative AI? Developing and Applying a Novel Task-Based Index

By:	Golo Henseke; Rhys Davies; Alan Felstead; Duncan Gallie; Francis Green; Ying Zhou
Abstract:	We introduce the Generative AI Susceptibility Index (GAISI), a task-based measure of UK job exposure to large language models (LLMs), such as ChatGPT. GAISI is derived from probabilistic task ratings by LLMs and linked to worker-reported task data from the Skills and Employment Surveys. It reflects the share of job activities where an LLM or LLM-powered system can reduce task completion time by at least 25 per cent beyond existing productivity tools. The index demonstrates high reliability, strong alignment with AI capabilities, and superior predictive power compared to existing exposure measures. By 2023-24, nearly all UK jobs exhibited some exposure, yet only a minority were heavily affected. Aggregate exposure has risen since 2017, primarily due to occupational shifts rather than changes in task profiles. The price premium for AI-exposed tasks declined relative to 2017, measuring approximately 11 per cent lower in 2023-24. Job postings in high-exposure roles also fell by 6.5 per cent following the release of ChatGPT. GAISI offers a robust framework for assessing generative AI's impact on work, providing early evidence that displacement effects may already outweigh productivity gains.
Date:	2025–07
URL:	https://d.repec.org/n?u=RePEc:arx:papers:2507.22748

Will Compute Bottlenecks Prevent an Intelligence Explosion?

By:	Parker Whitfill; Cheryl Wu
Abstract:	The possibility of a rapid, "software-only" intelligence explosion brought on by AI's recursive self-improvement (RSI) is a subject of intense debate within the AI community. This paper presents an economic model and an empirical estimation of the elasticity of substitution between research compute and cognitive labor at frontier AI firms to shed light on the possibility. We construct a novel panel dataset for four leading AI labs (OpenAI, DeepMind, Anthropic, and DeepSeek) from 2014 to 2024 and fit the data to two alternative Constant Elasticity of Substitution (CES) production function models. Our two specifications yield divergent results: a baseline model estimates that compute and labor are substitutes, whereas a 'frontier experiments' model, which accounts for the scale of state-of-the-art models, estimates that they are complements. We conclude by discussing the limitations of our analysis and the implications for forecasting AI progress.
Date:	2025–07
URL:	https://d.repec.org/n?u=RePEc:arx:papers:2507.23181

Beyond pay: AI skills reward more job benefits

By:	Fabian Stephany; Alejandra Mira; Matthew Bone
Abstract:	This study investigates the non-monetary rewards associated with artificial intelligence (AI) skills in the U.S. labour market. Using a dataset of approximately ten million online job vacancies from 2018 to 2024, we identify AI roles-positions requiring at least one AI-related skill-and examine the extent to which these roles offer non-monetary benefits such as tuition assistance, paid leave, health and well-being perks, parental leave, workplace culture enhancements, and remote work options. While previous research has documented substantial wage premiums for AI-related roles due to growing demand and limited talent supply, our study asks whether this demand also translates into enhanced non-monetary compensation. We find that AI roles are significantly more likely to offer such perks, even after controlling for education requirements, industry, and occupation type. It is twice as likely for an AI role to offer parental leave and almost three times more likely to provide remote working options. Moreover, the highest-paying AI roles tend to bundle these benefits, suggesting a compound premium where salary increases coincide with expanded non-monetary rewards. AI roles offering parental leave or health benefits show salaries that are, on average, 12% to 20% higher than AI roles without this benefit. This pattern is particularly pronounced in years and occupations experiencing the highest AI-related demand, pointing to a demand-driven dynamic. Our findings underscore the strong pull of AI talent in the labor market and challenge narratives of technological displacement, highlighting instead how employers compete for scarce talent through both financial and non-financial incentives.
Date:	2025–07
URL:	https://d.repec.org/n?u=RePEc:arx:papers:2507.20410

Valuing Time in Silicon: Can Large Language Model Replicate Human Value of Travel Time

By:	Yingnan Yan; Tianming Liu; Yafeng Yin
Abstract:	As a key advancement in artificial intelligence, large language models (LLMs) are set to transform transportation systems. While LLMs offer the potential to simulate human travelers in future mixed-autonomy transportation systems, their behavioral fidelity in complex scenarios remains largely unconfirmed by existing research. This study addresses this gap by conducting a comprehensive analysis of the value of travel time (VOT) of a popular LLM, GPT-4o. We employ a full factorial experimental design to systematically examine the LLM's sensitivity to various transportation contexts, including the choice setting, travel purpose, income, and socio-demographic factors. Our results reveal a high degree of behavioral similarity between the LLM and humans. The LLM exhibits an aggregate VOT similar to that of humans, and demonstrates human-like sensitivity to travel purpose, income, and the time-cost trade-off ratios of the alternatives. Furthermore, the behavioral patterns of LLM are remarkably consistent across varied contexts. However, we also find that the LLM's context sensitivity is less pronounced than that observed in humans. Overall, this study provides a foundational benchmark for the future development of LLMs as proxies for human travelers, demonstrating their value and robustness while highlighting that their blunted contextual sensitivity requires careful consideration.
Date:	2025–07
URL:	https://d.repec.org/n?u=RePEc:arx:papers:2507.22244

The ordinary meaning bot: Simulating human surveys with LLMs

By:	Johannes Kruse (Max Planck Institute for Research on Collective Goods, Bonn)
Abstract:	This comment shows how large language models (LLMs) can help courts discern the "ordinary meaning" of statutory terms. Instead of relying on expert-heavy corpusâ€‘linguistic techniques (Gries 2025), the author simulates a human survey with GPTâ€‘4o. Demographically realistic AI agents replicate the 2, 835 participants in Tobia's 2020 study on vehicle and yield response distributions with no statistically significant difference from the human data (Kolmogorovâ€“Smirnov p = 0.915). The paper addresses concerns about hallucinations, reproducibility, data leakage, and explainability, and introduces the lockedâ€‘prompt "Ordinary Meaning Bot, " arguing that LLM-based survey simulation is a practical, accurate alternative to dictionaries, intuition, or complex corpus analysis.
Keywords:	ordinary meaning; large language models; prompt engineering; human survey simulation; alignment
JEL:	K1 Z0
Date:	2025–08
URL:	https://d.repec.org/n?u=RePEc:mpg:wpaper:2025_12

Human Realignment: An Empirical Study of LLMs as Legal Decision-Aids in Moral Dilemmas

By:	Christoph Engel (Max Planck Institute for Research on Collective Goods, Bonn); Yoan Hermstrüwer (University of Zurich); Alison Kim (University of Zurich)
Abstract:	Recent advances in AI create possibilities for delegating legal decision-making to machines or enhancing human adjudication through AI assistance. Using classic normative conflicts-the trolley problem and similar moral dilemmas-as a proof of concept, we examine the alignment between AI legal reasoning and human judgment. In our baseline experiment, we find a pronounced mismatch between decisions made by GPT and those of human subjects. This misalignment raises substantive concerns for AI-powered legal decision-aids. We investigate whether explicit normative guidance can address this misalignment, with mixed results. GPT-3.5 is susceptible to such intervention, but frequently refuses to decide when faced with a moral dilemma. GPT-4 is outright utilitarian, and essentially ignores the instruction to decide on deontological grounds. GPT-o3-mini faithfully implements this instruction, but is unwilling to balance deontological and utilitarian concerns if instructed to do so. At least for the time being, explicit normative instructions are not fully able to realign AI advice with the normative convictions of the legislator.
Keywords:	large language models, human-AI alignment, rule of law, moral dilemmas, trolley problems
JEL:	C99 D63 D81 K10 K40 Z13
Date:	2025–04
URL:	https://d.repec.org/n?u=RePEc:mpg:wpaper:2025_03

Can large language models assist choice modelling? Insights into prompting strategies and current models capabilities

By:	Georges Sfeir; Gabriel Nova; Stephane Hess; Sander van Cranenburgh
Abstract:	Large Language Models (LLMs) are widely used to support various workflows across different disciplines, yet their potential in choice modelling remains relatively unexplored. This work examines the potential of LLMs as assistive agents in the specification and, where technically feasible, estimation of Multinomial Logit models. We implement a systematic experimental framework involving thirteen versions of six leading LLMs (ChatGPT, Claude, DeepSeek, Gemini, Gemma, and Llama) evaluated under five experimental configurations. These configurations vary along three dimensions: modelling goal (suggesting vs. suggesting and estimating MNLs); prompting strategy (Zero-Shot vs. Chain-of-Thoughts); and information availability (full dataset vs. data dictionary only). Each LLM-suggested specification is implemented, estimated, and evaluated based on goodness-of-fit metrics, behavioural plausibility, and model complexity. Findings reveal that proprietary LLMs can generate valid and behaviourally sound utility specifications, particularly when guided by structured prompts. Open-weight models such as Llama and Gemma struggled to produce meaningful specifications. Claude 4 Sonnet consistently produced the best-fitting and most complex models, while GPT models suggested models with robust and stable modelling outcomes. Some LLMs performed better when provided with just data dictionary, suggesting that limiting raw data access may enhance internal reasoning capabilities. Among all LLMs, GPT o3 was uniquely capable of correctly estimating its own specifications by executing self-generated code. Overall, the results demonstrate both the promise and current limitations of LLMs as assistive agents in choice modelling, not only for model specification but also for supporting modelling decision and estimation, and provide practical guidance for integrating these tools into choice modellers' workflows.
Date:	2025–07
URL:	https://d.repec.org/n?u=RePEc:arx:papers:2507.21790

How AI Detects Financial Fraud: A Review of Emerging Deep Learning Methods

By:	Mori, Misato
Abstract:	Financial fraud generates persistent risk and capital loss across sectors. This study investigates artificial intelligence (AI) methodologies for financial fraud detection, with emphasis on Retrieval-Augmented Generation (RAG). The review covers supervised classification, unsupervised anomaly detection, and graph-based relational modeling using deep neural networks, transformers, and hybrid architectures. Challenges include class imbalance, concept drift, and decision interpretability. We describe the RAG framework integrating retrievers and generative language models with external knowledge bases. Empirical comparisons on synthetic and real-time fraud datasets show improved F1-score, precision, and contextual reasoning in contrast to fine-tuned transformers and static classifiers. Applications include transaction monitoring, policy violation detection, account takeover analysis, and social engineering prevention. Evaluation highlights retrieval-grounded generation as an effective fraud signal augmentation mechanism. The paper concludes with architectural implications for deploying scalable, compliant, and adaptive fraud detection pipelines in multi-domain financial systems.
Date:	2025–07–16
URL:	https://d.repec.org/n?u=RePEc:osf:osfxxx:5yjm4_v1

Financial inclusion and large language models

By:	Ozili, Peterson K; Obiora, Kingsley I; Onuzo, Chinwendu
Abstract:	Large language models have gained popularity, and it is important to understand their applications in the financial inclusion domain. This study identifies the benefits and risks of using large language models (LLMs) in the financial inclusion domain. We show that LLMs can be used to (i) summarize the key themes in financial inclusion communications, (ii) gain insights from the tone of financial inclusion communications, (iii) bring discipline to financial inclusion communications, (iv) improve financial inclusion decision making, and (v) enhance context-sensitive text analysis and evaluation. However, the use of large language models in the financial inclusion domain poses risks relating to biased interpretations of LLM-generated responses, data privacy risk, misinformation and falsehood risks. We emphasize that LLMs can be used safely in the financial inclusion domain to summarise financial inclusion speeches and communication, but they should not be used in situations where finding the truth is important to make decisions that promote financial inclusion.
Keywords:	financial inclusion, large language models, LLM, algorithm, risk, benefit, communication, speech, artificial intelligence, digital financial inclusion
JEL:	G20 G21 G23
Date:	2025
URL:	https://d.repec.org/n?u=RePEc:pra:mprapa:125562

Evaluating Large Language Models (LLMs) in Financial NLP: A Comparative Study on Financial Report Analysis

By:	Md Talha Mohsin
Abstract:	Large Language Models (LLMs) have demonstrated remarkable capabilities across a wide variety of Financial Natural Language Processing (FinNLP) tasks. However, systematic comparisons among widely used LLMs remain underexplored. Given the rapid advancement and growing influence of LLMs in financial analysis, this study conducts a thorough comparative evaluation of five leading LLMs, GPT, Claude, Perplexity, Gemini and DeepSeek, using 10-K filings from the 'Magnificent Seven' technology companies. We create a set of domain-specific prompts and then use three methodologies to evaluate model performance: human annotation, automated lexical-semantic metrics (ROUGE, Cosine Similarity, Jaccard), and model behavior diagnostics (prompt-level variance and across-model similarity). The results show that GPT gives the most coherent, semantically aligned, and contextually relevant answers; followed by Claude and Perplexity. Gemini and DeepSeek, on the other hand, have more variability and less agreement. Also, the similarity and stability of outputs change from company to company and over time, showing that they are sensitive to how prompts are written and what source material is used.
Date:	2025–07
URL:	https://d.repec.org/n?u=RePEc:arx:papers:2507.22936

Your AI, Not Your View: The Bias of LLMs in Investment Analysis

By:	Hoyoung Lee; Junhyuk Seo; Suhwan Park; Junhyeong Lee; Wonbin Ahn; Chanyeol Choi; Alejandro Lopez-Lira; Yongjae Lee
Abstract:	In finance, Large Language Models (LLMs) face frequent knowledge conflicts due to discrepancies between pre-trained parametric knowledge and real-time market data. These conflicts become particularly problematic when LLMs are deployed in real-world investment services, where misalignment between a model's embedded preferences and those of the financial institution can lead to unreliable recommendations. Yet little research has examined what investment views LLMs actually hold. We propose an experimental framework to investigate such conflicts, offering the first quantitative analysis of confirmation bias in LLM-based investment analysis. Using hypothetical scenarios with balanced and imbalanced arguments, we extract models' latent preferences and measure their persistence. Focusing on sector, size, and momentum, our analysis reveals distinct, model-specific tendencies. In particular, we observe a consistent preference for large-cap stocks and contrarian strategies across most models. These preferences often harden into confirmation bias, with models clinging to initial judgments despite counter-evidence.
Date:	2025–07
URL:	https://d.repec.org/n?u=RePEc:arx:papers:2507.20957

FinSurvival: A Suite of Large Scale Survival Modeling Tasks from Finance

By:	Aaron Green; Zihan Nie; Hanzhen Qin; Oshani Seneviratne; Kristin P. Bennett
Abstract:	Survival modeling predicts the time until an event occurs and is widely used in risk analysis; for example, it's used in medicine to predict the survival of a patient based on censored data. There is a need for large-scale, realistic, and freely available datasets for benchmarking artificial intelligence (AI) survival models. In this paper, we derive a suite of 16 survival modeling tasks from publicly available transaction data generated by lending of cryptocurrencies in Decentralized Finance (DeFi). Each task was constructed using an automated pipeline based on choices of index and outcome events. For example, the model predicts the time from when a user borrows cryptocurrency coins (index event) until their first repayment (outcome event). We formulate a survival benchmark consisting of a suite of 16 survival-time prediction tasks (FinSurvival). We also automatically create 16 corresponding classification problems for each task by thresholding the survival time using the restricted mean survival time. With over 7.5 million records, FinSurvival provides a suite of realistic financial modeling tasks that will spur future AI survival modeling research. Our evaluation indicated that these are challenging tasks that are not well addressed by existing methods. FinSurvival enables the evaluation of AI survival models applicable to traditional finance, industry, medicine, and commerce, which is currently hindered by the lack of large public datasets. Our benchmark demonstrates how AI models could assess opportunities and risks in DeFi. In the future, the FinSurvival benchmark pipeline can be used to create new benchmarks by incorporating more DeFi transactions and protocols as the use of cryptocurrency grows.
Date:	2025–07
URL:	https://d.repec.org/n?u=RePEc:arx:papers:2507.14160

FinMarBa: A Market-Informed Dataset for Financial Sentiment Classification

By:	Baptiste Lefort; Eric Benhamou; Beatrice Guez; Jean-Jacques Ohana; Ethan Setrouk; Alban Etienne
Abstract:	This paper presents a novel hierarchical framework for portfolio optimization, integrating lightweight Large Language Models (LLMs) with Deep Reinforcement Learning (DRL) to combine sentiment signals from financial news with traditional market indicators. Our three-tier architecture employs base RL agents to process hybrid data, meta-agents to aggregate their decisions, and a super-agent to merge decisions based on market data and sentiment analysis. Evaluated on data from 2018 to 2024, after training on 2000-2017, the framework achieves a 26% annualized return and a Sharpe ratio of 1.2, outperforming equal-weighted and S&P 500 benchmarks. Key contributions include scalable cross-modal integration, a hierarchical RL structure for enhanced stability, and open-source reproducibility.
Date:	2025–07
URL:	https://d.repec.org/n?u=RePEc:arx:papers:2507.22932

AI-Powered Trading, Algorithmic Collusion, and Price Efficiency

By:	Winston Wei Dou; Itay Goldstein; Yan Ji
Abstract:	The integration of algorithmic trading with reinforcement learning, termed AI-powered trading, is transforming financial markets. Alongside the benefits, it raises concerns for collusion. This study first develops a model to explore the possibility of collusion among informed speculators in a theoretical environment. We then conduct simulation experiments, replacing the speculators in the model with informed AI speculators who trade based on reinforcement-learning algorithms. We show that they autonomously sustain collusive supra-competitive profits without agreement, communication, or intent. Such collusion undermines competition and market efficiency. We demonstrate that two separate mechanisms are underlying this collusion and characterize when each one arises.
JEL:	D43 G10 G14 L13
Date:	2025–07
URL:	https://d.repec.org/n?u=RePEc:nbr:nberwo:34054

Aligning Large Language Model Agents with Rational and Moral Preferences: A Supervised Fine-Tuning Approach

By:	Wei Lu; Daniel L. Chen; Christian B. Hansen
Abstract:	Understanding how large language model (LLM) agents behave in strategic interactions is essential as these systems increasingly participate autonomously in economically and morally consequential decisions. We evaluate LLM preferences using canonical economic games, finding substantial deviations from human behavior. Models like GPT-4o show excessive cooperation and limited incentive sensitivity, while reasoning models, such as o3-mini, align more consistently with payoff-maximizing strategies. We propose a supervised fine-tuning pipeline that uses synthetic datasets derived from economic reasoning to align LLM agents with economic preferences, focusing on two stylized preference structures. In the first, utility depends only on individual payoffs (homo economicus), while utility also depends on a notion of Kantian universalizability in the second preference structure (homo moralis). We find that fine-tuning based on small datasets shifts LLM agent behavior toward the corresponding economic agent. We further assess the fine-tuned agents' behavior in two applications: Moral dilemmas involving autonomous vehicles and algorithmic pricing in competitive markets. These examples illustrate how different normative objectives embedded via realizations from structured preference structures can influence market and moral outcomes. This work contributes a replicable, cost-efficient, and economically grounded pipeline to align AI preferences using moral-economic principles.
Date:	2025–07
URL:	https://d.repec.org/n?u=RePEc:arx:papers:2507.20796

Efficacy of AI RAG Tools for Complex Information Extraction and Data Annotation Tasks: A Case Study Using Banks Public Disclosures

By:	Nicholas Botti (Federal Reserve Board); Flora Haberkorn (Federal Reserve Board); Charlotte Hoopes (Federal Reserve Board); Shaun Khan (Federal Reserve Board)
Abstract:	We utilize a within-subjects design with randomized task assignments to understand the effectiveness of using an AI retrieval augmented generation (RAG) tool to assist analysts with an information extraction and data annotation task. We replicate an existing, challenging real-world annotation task with complex multi-part criteria on a set of thousands of pages of public disclosure documents from global systemically important banks (GSIBs) with heterogeneous and incomplete information content. We test two treatment conditions. First, a "naive" AI use condition in which annotators use only the tool and must accept the first answer they are given. And second, an "interactive" AI treatment condition where annotators use the tool interactively, and use their judgement to follow-up with additional information if necessary. Compared to the human-only baseline, the use of the AI tool accelerated task execution by up to a factor of 10 and enhanced task accuracy, particularly in the interactive condition. We find that when extrapolated to the full task, these methods could save up to 268 hours compared to the human-only approach. Additionally, our findings suggest that annotator skill, not just with the subject matter domain, but also with AI tools, is a factor in both the accuracy and speed of task performance.
Date:	2025–07
URL:	https://d.repec.org/n?u=RePEc:arx:papers:2507.21360

Defining Current and Expected Financial Constraints Using AI: Reinterpreting the Cash Flow Sensitivity of Cash

By:	Rachel Cho; Christoph Görtz; Danny McGowan; Max Schröder
Abstract:	We propose a new approach to identify firm-level financial constraints by applying artificial intelligence to text of 10-K filings by U.S. public firms from 1993 to 2021. Leveraging transformer-based natural language processing, our model captures contextual and semantic nuances often missed by traditional text classification techniques, enabling more accurate detection of financial constraints. A key contribution is to differentiate between constraints that affect firms presently and those anticipated in the future. These two types of constraints are associated with distinctly different financial profiles: while firms expecting future constraints tend to accumulate cash preemptively, currently constrained firms exhibit reduced liquidity and higher leverage. We show that only firms anticipating financial constraints exhibit significant cash flow sensitivity of cash, whereas currently constrained and unconstrained firms do not. This calls for a narrower interpretation of this widely used cash-based constraints measure, as it may conflate distinct firm types – unconstrained and currently constrained – and fail to capture all financially constrained firms. Our findings underscore the critical role of constraint timing in shaping corporate financial behavior.
Keywords:	financial constraints, artificial intelligence, expectations, cash, cash flow, corporate finance behavior
JEL:	G31 G32 D92
Date:	2025
URL:	https://d.repec.org/n?u=RePEc:ces:ceswps:_12054

Financial Regulation and AI: A Faustian Bargain?

By:	Coppola, Antonio; Clayton, Christopher
Abstract:	We examine whether and how granular, real-time predictive models should be integrated into central banks' macroprudential toolkit. First, we develop a tractable framework that formalizes the tradeoff regulators face when choosing between implementing models that forecast systemic risk accurately but have uncertain causal content and models with the opposite profile. We derive the regulator’s optimal policy in a setting in which private portfolios react endogenously to the regulator's model choice and policy rule. We show that even purely predictive models can generate welfare gains for a regulator, and that predictive precision and knowledge of causal impacts of policy interventions are complementary. Second, we introduce a deep learning architecture tailored to financial holdings data—a graph transformer—and we discuss why it is optimally suited to this problem. The model learns vector embedding representations for both assets and investors by explicitly modeling the relational structure of holdings, and it attains state-of-the-art predictive accuracy in out-of-sample forecasting tasks including trade prediction.
Date:	2025–07–25
URL:	https://d.repec.org/n?u=RePEc:osf:socarx:xwsje_v1

ContestTrade: A Multi-Agent Trading System Based on Internal Contest Mechanism

By:	Li Zhao; Rui Sun; Zuoyou Jiang; Bo Yang; Yuxiao Bai; Mengting Chen; Xinyang Wang; Jing Li; Zuo Bai
Abstract:	In financial trading, large language model (LLM)-based agents demonstrate significant potential. However, the high sensitivity to market noise undermines the performance of LLM-based trading systems. To address this limitation, we propose a novel multi-agent system featuring an internal competitive mechanism inspired by modern corporate management structures. The system consists of two specialized teams: (1) Data Team - responsible for processing and condensing massive market data into diversified text factors, ensuring they fit the model's constrained context. (2) Research Team - tasked with making parallelized multipath trading decisions based on deep research methods. The core innovation lies in implementing a real-time evaluation and ranking mechanism within each team, driven by authentic market feedback. Each agent's performance undergoes continuous scoring and ranking, with only outputs from top-performing agents being adopted. The design enables the system to adaptively adjust to dynamic environment, enhances robustness against market noise and ultimately delivers superior trading performance. Experimental results demonstrate that our proposed system significantly outperforms prevailing multiagent systems and traditional quantitative investment methods across diverse evaluation metrics.
Date:	2025–08
URL:	https://d.repec.org/n?u=RePEc:arx:papers:2508.00554

This nep-ain issue is ©2025 by Ben Greiner. It is provided as is without any express or implied warranty. It may be freely redistributed in whole or in part for any purpose. If distributed in part, please include this notice.

General information on the NEP project can be found at https://nep.repec.org. For comments please write to the director of NEP, Marco Novarese at <director@nep.repec.org>. Put “NEP” in the subject, otherwise your mail may be rejected.

NEP’s infrastructure is sponsored by the School of Economics and Finance of Massey University in New Zealand.