on Artificial Intelligence |
By: | Yuanjun Feng; Vivek Choudhary; Yash Raj Shrestha |
Abstract: | This study examines the understudied role of algorithmic evaluation of human judgment in hybrid decision-making systems, a critical gap in management research. While extant literature focuses on human reluctance to follow algorithmic advice, we reverse the perspective by investigating how AI agents based on large language models (LLMs) assess and integrate human input. Our work addresses a pressing managerial constraint: firms barred from deploying LLMs directly due to privacy concerns can still leverage them as mediating tools (for instance, anonymized outputs or decision pipelines) to guide high-stakes choices like pricing or discounts without exposing proprietary data. Through a controlled prediction task, we analyze how an LLM-based AI agent weights human versus algorithmic predictions. We find that the AI system systematically discounts human advice, penalizing human errors more severely than algorithmic errors--a bias exacerbated when the agent's identity (human vs. AI) is disclosed and the human is positioned second. These results reveal a disconnect between AI-generated trust metrics and the actual influence of human judgment, challenging assumptions about equitable human-AI collaboration. Our findings offer three key contributions. First, we identify a reverse algorithm aversion phenomenon, where AI agents undervalue human input despite comparable error rates. Second, we demonstrate how disclosure and positional bias interact to amplify this effect, with implications for system design. Third, we provide a framework for indirect LLM deployment that balances predictive power with data privacy. For practitioners, this research emphasizes the need to audit AI weighting mechanisms, calibrate trust dynamics, and strategically design decision sequences in human-AI systems. |
Date: | 2025–03 |
URL: | https://d.repec.org/n?u=RePEc:arx:papers:2504.13871 |
By: | Tianshi Mu; Pranjal Rawat; John Rust; Chengjun Zhang; Qixuan Zhong |
Abstract: | We compare the performance of human and artificially intelligent (AI) decision makers in simple binary classification tasks where the optimal decision rule is given by Bayes Rule. We reanalyze choices of human subjects gathered from laboratory experiments conducted by El-Gamal and Grether and by Holt and Smith. We confirm that while, overall, Bayes Rule represents the single best model for predicting human choices, subjects are heterogeneous and a significant share of them make suboptimal choices that reflect judgement biases described by Kahneman and Tversky, including the "representativeness heuristic" (excessive weight on the evidence from the sample relative to the prior) and "conservatism" (excessive weight on the prior relative to the sample). We compare the performance of AI subjects gathered from recent versions of large language models (LLMs), including several versions of ChatGPT. These general-purpose generative AI chatbots are not specifically trained to do well in narrow decision-making tasks, but are trained instead as "language predictors" using a large corpus of textual data from the web. We show that ChatGPT is also subject to biases that result in suboptimal decisions. However, we document a rapid evolution in the performance of ChatGPT, from sub-human performance for early versions (ChatGPT 3.5) to superhuman and nearly perfect Bayesian classifications in the latest versions (ChatGPT 4o). |
Date: | 2025–04 |
URL: | https://d.repec.org/n?u=RePEc:arx:papers:2504.10636 |
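The representativeness and conservatism biases described above are commonly estimated with a weighted-odds generalization of Bayes Rule (the Grether specification on which the El-Gamal and Grether analysis builds); a sketch in our notation:

\[
\frac{P(A \mid s)}{P(B \mid s)} \;=\; \left(\frac{P(s \mid A)}{P(s \mid B)}\right)^{\beta} \left(\frac{P(A)}{P(B)}\right)^{\alpha}
\]

Exact Bayesian updating corresponds to \(\alpha = \beta = 1\); \(\beta > \alpha\) captures the representativeness heuristic (overweighting the sample evidence), and \(\alpha > \beta\) captures conservatism (overweighting the prior).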
By: | Ji Ma |
Abstract: | Large language models (LLMs) increasingly serve as human-like decision-making agents in social science and applied settings. These LLM-agents are typically assigned human-like characters and placed in real-life contexts. However, how these characters and contexts shape an LLM's behavior remains underexplored. This study proposes and tests methods for probing, quantifying, and modifying an LLM's internal representations in a Dictator Game -- a classic behavioral experiment on fairness and prosocial behavior. We extract "vectors of variable variations" (e.g., "male" to "female") from the LLM's internal state. Manipulating these vectors during the model's inference can substantially alter how those variables relate to the model's decision-making. This approach offers a principled way to study and regulate how social concepts can be encoded and engineered within transformer-based models, with implications for alignment, debiasing, and designing AI agents for social simulations in both academic and commercial applications. |
Date: | 2025–04 |
URL: | https://d.repec.org/n?u=RePEc:arx:papers:2504.11671 |
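A minimal sketch of the extract-and-steer idea, using GPT-2 as a stand-in model; the layer choice, prompts, and steering strength are illustrative assumptions, and the paper's exact extraction procedure may differ:

    # Sketch of a "vector of variable variations" extracted and injected at inference.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tok = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2")
    model.eval()

    LAYER = 6  # hypothetical choice of transformer block

    def hidden_at_layer(text):
        """Mean hidden state of `text` at LAYER."""
        ids = tok(text, return_tensors="pt")
        with torch.no_grad():
            out = model(**ids, output_hidden_states=True)
        return out.hidden_states[LAYER].mean(dim=1).squeeze(0)

    # Difference-of-means vector for a persona variable ("male" -> "female").
    v = hidden_at_layer("I am a female decision maker.") \
        - hidden_at_layer("I am a male decision maker.")

    def steer(module, inputs, output):
        # GPT-2 blocks return a tuple; add the steering vector to the hidden states.
        h = output[0] + 4.0 * v  # 4.0 is an illustrative steering strength
        return (h,) + output[1:]

    handle = model.transformer.h[LAYER].register_forward_hook(steer)
    ids = tok("In the dictator game, I give the other player", return_tensors="pt")
    print(tok.decode(model.generate(ids.input_ids, max_new_tokens=20)[0]))
    handle.remove()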
By: | Ning Li; Wenming Deng; Jiatan Chen |
Abstract: | This research addresses the growing need to measure and understand AI literacy in the context of generative AI technologies. Through three sequential studies involving a total of 517 participants, we establish AI literacy as a coherent, measurable construct with significant implications for education, workforce development, and social equity. Study 1 (N=85) revealed a dominant latent factor - termed the "A-factor" - that accounts for 44.16% of variance across diverse AI interaction tasks. Study 2 (N=286) refined the measurement tool by examining four key dimensions of AI literacy: communication effectiveness, creative idea generation, content evaluation, and step-by-step collaboration, resulting in an 18-item assessment battery. Study 3 (N=146) validated this instrument in a controlled laboratory setting, demonstrating its predictive validity for real-world task performance. Results indicate that AI literacy significantly predicts performance on complex, language-based creative tasks but shows domain specificity in its predictive power. Additionally, regression analyses identified several significant predictors of AI literacy, including cognitive abilities (IQ), educational background, prior AI experience, and training history. The multidimensional nature of AI literacy and its distinct factor structure provide evidence that effective human-AI collaboration requires a combination of general and specialized abilities. These findings contribute to theoretical frameworks of human-AI collaboration while offering practical guidance for developing targeted educational interventions to promote equitable access to the benefits of generative AI technologies. |
Date: | 2025–03 |
URL: | https://d.repec.org/n?u=RePEc:arx:papers:2503.16517 |
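To make the "A-factor" concrete: a dominant first component of a participants-by-tasks score matrix can be checked in a few lines. A sketch on synthetic data (the study's actual task battery and estimation method may differ):

    # Extract a dominant latent factor from task-score data (synthetic stand-in).
    import numpy as np
    from sklearn.decomposition import PCA

    rng = np.random.default_rng(0)
    n, k = 85, 12                     # participants x AI-interaction tasks
    g = rng.normal(size=(n, 1))       # latent general ability ("A-factor")
    scores = 0.7 * g + 0.5 * rng.normal(size=(n, k))

    pca = PCA().fit(scores)
    print("variance explained by first component:",
          round(pca.explained_variance_ratio_[0], 4))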
By: | Vermeer, Michael J. D.; Bonds, Tim; Lathrop, Emily; Smith, Gregory |
Abstract: | This working paper assesses the convergence of trends in robotics and frontier artificial intelligence (AI) systems, particularly the national security risk that results from the potential for the proliferation of robotic embodiments of artificial general intelligence (AGI). This paper highlights that while the benefits of advanced robotic capabilities are likely to outweigh the associated risks in many contexts, the combination of AGI with robots featuring high mobility and dexterous manipulation could introduce significant systemic vulnerabilities. Policymakers face challenges in balancing the need for safety and security with economic competitiveness, as there are no straightforward regulatory options that effectively limit risky combinations of capabilities without hindering innovation. We conclude by stressing the urgency of proactively addressing this issue now rather than waiting until the technologies are fully deployed, to ensure responsible governance and risk management in the evolving landscape of robotics and AI. |
Date: | 2025–04–25 |
URL: | https://d.repec.org/n?u=RePEc:osf:socarx:ymvf5_v1 |
By: | Kristina McElheran; Mu-Jeung Yang; Zachary Kroff; Erik Brynjolfsson |
Abstract: | We examine the prevalence and productivity dynamics of artificial intelligence (AI) in American manufacturing. Working with the Census Bureau to collect detailed large-scale data for 2017 and 2021, we focus on AI-related technologies with industrial applications. We find causal evidence of J-curve-shaped returns, where short-term performance losses precede longer-term gains. Consistent with costly adjustment taking place within core production processes, industrial AI use increases work-in-progress inventory, investment in industrial robots, and labor shedding, while harming productivity and profitability in the short run. These losses are unevenly distributed, concentrating among older businesses while being mitigated by growth-oriented business strategies and within-firm spillovers. Dynamics, however, matter: earlier (pre-2017) adopters exhibit stronger growth over time, conditional on survival. Notably, among older establishments, abandonment of structured production-management practices accounts for roughly one-third of these losses, revealing a specific channel through which intangible factors shape AI’s impact. Taken together, these results provide novel evidence on the microfoundations of technology J-curves, identifying mechanisms and illuminating how and why they differ across firm types. These findings extend our understanding of modern General Purpose Technologies, explaining why their economic impact—exemplified here by AI—may initially disappoint, particularly in contexts dominated by older, established firms. |
Keywords: | Artificial Intelligence, General Purpose Technology, Manufacturing, Organizational Change, Productivity, Management Practices |
JEL: | D24 O33 M11 L60 |
Date: | 2025–04 |
URL: | https://d.repec.org/n?u=RePEc:cen:wpaper:25-27 |
By: | Andrey Fradkin |
Abstract: | This paper documents three stylized facts about the demand for Large Language Models (LLMs) using data from OpenRouter, a prominent LLM marketplace. First, new models experience rapid initial adoption that stabilizes within weeks. Second, model releases differ substantially in whether they primarily attract new users or substitute demand from competing models. Third, multihoming (using multiple models simultaneously) is common among apps. These findings suggest significant horizontal and vertical differentiation in the LLM market, implying opportunities for providers to maintain demand and pricing power despite rapid technological advances. |
Date: | 2025–04 |
URL: | https://d.repec.org/n?u=RePEc:arx:papers:2504.15440 |
By: | Venkat Ram Reddy Ganuthula; Krishna Kumar Balaraman |
Abstract: | This perspective paper examines a fundamental paradox in the relationship between professional expertise and artificial intelligence: as domain experts increasingly collaborate with AI systems by externalizing their implicit knowledge, they potentially accelerate the automation of their own expertise. Through analysis of multiple professional contexts, we identify emerging patterns in human-AI collaboration and propose frameworks for professionals to navigate this evolving landscape. Drawing on research in knowledge management, expertise studies, human-computer interaction, and labor economics, we develop a nuanced understanding of how professional value may be preserved and transformed in an era of increasingly capable AI systems. Our analysis suggests that while the externalization of tacit knowledge presents certain risks to traditional professional roles, it also creates opportunities for the evolution of expertise and the emergence of new forms of professional value. We conclude with implications for professional education, organizational design, and policy development that can help ensure the codification of expert knowledge enhances rather than diminishes the value of human expertise. |
Date: | 2025–04 |
URL: | https://d.repec.org/n?u=RePEc:arx:papers:2504.12654 |
By: | Luca Fontanelli (GREDEG - Groupe de Recherche en Droit, Economie et Gestion - UNS - Université Nice Sophia Antipolis (1965 - 2019) - CNRS - Centre National de la Recherche Scientifique - UniCA - Université Côte d'Azur, SSSUP - Scuola Universitaria Superiore Sant'Anna = Sant'Anna School of Advanced Studies [Pisa]); Flavio Calvino (OCDE - Organisation de Coopération et de Développement Economiques = Organisation for Economic Co-operation and Development); Chiara Criscuolo; Lionel Nesta (GREDEG - Groupe de Recherche en Droit, Economie et Gestion - UNS - Université Nice Sophia Antipolis (1965 - 2019) - CNRS - Centre National de la Recherche Scientifique - UniCA - Université Côte d'Azur); Elena Verdolini (CMCC - Centro Euro-Mediterraneo per i Cambiamenti Climatici [Bologna]) |
Abstract: | We leverage a uniquely comprehensive combination of data sources to explore the enabling role of human capital in fostering the adoption of predictive AI systems in French firms. Using a causal estimation approach, we show that ICT engineers play a key role in AI adoption by firms. Our estimates indicate that raising the current average share of ICT engineers in firms not using AI (1.66%) to the level of AI users (6.7%) would increase their probability of adopting AI by 0.81 percentage points, equivalent to an 8.43 percent growth. However, this would imply substantial investments to fill the existing gap in ICT human capital, amounting to around 450,000 additional ICT engineers. We also explore potential mechanisms, showing that the relevance of ICT engineers for predictive AI is driven by the innovative nature of its use, make-versus-buy choices, large availability of data, and ICT and R&D intensity. |
Keywords: | artificial intelligence, human capital, technological diffusion |
Date: | 2024–11 |
URL: | https://d.repec.org/n?u=RePEc:hal:journl:hal-05029748 |
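A quick consistency check on the two effect sizes above (our arithmetic, not a figure from the paper): if 0.81 percentage points corresponds to 8.43 percent growth in the adoption probability, the implied baseline adoption probability among current non-users is

\[
\frac{0.81\ \text{pp}}{0.0843} \approx 9.6\ \text{percentage points.}
\]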
By: | Cong William Lin; Wu Zhu |
Abstract: | Large Language Models (LLMs), such as ChatGPT, are reshaping content creation and academic writing. This study investigates the impact of AI-assisted generative revisions on research manuscripts, focusing on heterogeneous adoption patterns and their influence on writing convergence. Leveraging a dataset of over 627,000 academic papers from arXiv, we develop a novel classification framework by fine-tuning prompt- and discipline-specific large language models to detect the style of ChatGPT-revised texts. Our findings reveal substantial disparities in LLM adoption across academic disciplines, gender, native language status, and career stage, alongside a rapid evolution in scholarly writing styles. Moreover, LLM usage enhances clarity, conciseness, and adherence to formal writing conventions, with improvements varying by revision type. Finally, a difference-in-differences analysis shows that while LLMs drive convergence in academic writing, early adopters, male researchers, non-native speakers, and junior scholars exhibit the most pronounced stylistic shifts, aligning their writing more closely with that of established researchers. |
Date: | 2025–04 |
URL: | https://d.repec.org/n?u=RePEc:arx:papers:2504.13629 |
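The difference-in-differences analysis mentioned above can be written, in generic form (our notation, not necessarily the paper's exact specification):

\[
y_{it} = \alpha_i + \lambda_t + \beta \,(\text{Adopter}_i \times \text{Post}_t) + \varepsilon_{it},
\]

where \(y_{it}\) is a writing-style metric for author \(i\) in period \(t\), and \(\beta\) captures the post-ChatGPT shift in style for LLM adopters relative to non-adopters.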
By: | Naomi Hausman; Oren Rigbi; Sarit Weisburd |
Abstract: | Student use of Artificial Intelligence (AI) in higher education is reshaping learning and redefining the skills of future workers. Using student-course data from a top Israeli university, we examine the impact of generative AI tools on academic performance. Comparisons across more and less AI-compatible courses before and after ChatGPT’s introduction show that AI availability raises grades, especially for lower-performing students, and compresses the grade distribution, eroding the signal value of grades for employers. Evidence suggests gains in AI-specific human capital but possible losses in traditional human capital, highlighting benefits and costs AI may impose on future workforce productivity. |
Keywords: | generative AI, student achievement, worker productivity, higher education, human capital. |
JEL: | I23 J24 O33 |
Date: | 2025 |
URL: | https://d.repec.org/n?u=RePEc:ces:ceswps:_11843 |
By: | Wei Jiang; Junyoung Park; Rachel (Jiqiu) Xiao; Shen Zhang |
Abstract: | This study investigates how occupational AI exposure impacts employment at the intensive margin, i.e., the length of workdays and the allocation of time between work and leisure. Drawing on individual-level time diary data from 2004–2023, we find that higher AI exposure—whether stemming from the ChatGPT shock or broader AI evolution—is associated with longer work hours and reduced leisure time, primarily due to AI complementing human labor rather than replacing it. This effect is particularly pronounced in contexts where AI significantly enhances marginal productivity and monitoring efficiency. It is further amplified in competitive labor and product markets, where workers have limited bargaining power to retain the benefits of productivity gains, which are often captured by consumers or firms instead. The findings question the expectation that technological advancements alleviate human labor burdens, revealing instead a paradox where such progress compromises work-life balance. |
JEL: | G3 J2 O3 |
Date: | 2025–02 |
URL: | https://d.repec.org/n?u=RePEc:nbr:nberwo:33536 |
By: | Eleanor Wiske Dillon; Sonia Jaffe; Nicole Immorlica; Christopher T. Stanton |
Abstract: | We present evidence on how generative AI changes the work patterns of knowledge workers using data from a 6-month-long, cross-industry, randomized field experiment. Half of the 6,000 workers in the study received access to a generative AI tool integrated into the applications they already used for emails, document creation, and meetings. We find that access to the AI tool during the first year of its release primarily impacted behaviors that could be changed independently and not behaviors that required coordination to change: workers who used the tool spent 3 fewer hours, or 25% less time on email each week (intent to treat estimate is 1.4 hours) and seemed to complete documents moderately faster, but did not significantly change time spent in meetings. |
Date: | 2025–04 |
URL: | https://d.repec.org/n?u=RePEc:arx:papers:2504.11436 |
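The two email-time estimates above are linked by standard intent-to-treat logic: with one-sided noncompliance, the ITT equals the effect on users scaled by the usage rate. Under that assumption,

\[
\Pr(\text{use}) \approx \frac{\text{ITT}}{\text{effect on users}} = \frac{1.4}{3} \approx 47\%,
\]

which is of the same order as the roughly 40% regular-usage rate reported in the companion study below.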
By: | Eleanor Wiske Dillon; Sonia Jaffe; Sida Peng; Alexia Cambon |
Abstract: | Advances in generative AI have rapidly expanded the potential of computers to perform or assist in a wide array of tasks traditionally performed by humans. We analyze a large, real-world randomized experiment of over 6,000 workers at 56 firms to present some of the earliest evidence on how these technologies are changing the way knowledge workers do their jobs. We find substantial time savings on common core tasks across a wide range of industries and occupations: workers who make use of this technology spent half an hour less reading email each week and completed documents 12% faster. Despite the newness of the technology, nearly 40% of workers who were given access to the tool used it regularly in their work throughout the 6-month study. |
Date: | 2025–04 |
URL: | https://d.repec.org/n?u=RePEc:arx:papers:2504.11443 |
By: | Ljupcho Eftimov (Ss. Cyril and Methodius University in Skopje, Faculty of Economics – Skopje); Bojan Kitanovikj (Ss. Cyril and Methodius University in Skopje, Faculty of Economics – Skopje) |
Abstract: | As the digital transformation of businesses reshapes jobs to delegate tasks to technology, human resource professionals and managers find themselves at a crossroads when it comes to designing and redesigning jobs, especially under the influence of artificial intelligence (AI). Being an emerging topic, this article aims to synthesize the current state-of-the-art literature regarding the application of AI for job design purposes using a multi-technique bibliometric analysis followed by a literature review in compliance with the rigorous Preferred Reporting Items for Systematic reviews and Meta-Analyses (PRISMA) guidelines. The research presents findings grounded in data from 67 Scopus-indexed publications, analyzed with a combination of descriptive bibliometric analysis, co-authorship, bibliographic coupling, and co-occurrence analysis, helping us identify past scientific directions as well as draft a future research agenda. As one of the first bibliometric analyses in the field, it contributes to the scientific discourse by revealing the core themes of the literature, including job characteristics impacted by AI and data-driven human resource (HR) practices, group-level AI integration in job design, AI-related job skills of the future workforce, human-AI trust and labor relations, and the role of algorithmic human resource management (HRM) in job design. Further, we outline seven distinct pathways for future research. |
Keywords: | Job design, Work design, Artificial intelligence, Bibliometric review |
JEL: | M12 |
Date: | 2024–12–15 |
URL: | https://d.repec.org/n?u=RePEc:aoh:conpro:2024:i:5:p:202-205 |
By: | Benjamin Laufer; Jon Kleinberg; Hoda Heidari |
Abstract: | Recent policy proposals aim to improve the safety of general-purpose AI, but there is little understanding of the efficacy of different regulatory approaches to AI safety. We present a strategic model that explores the interactions between the regulator, the general-purpose AI technology creators, and domain specialists--those who adapt the AI for specific applications. Our analysis examines how different regulatory measures, targeting different parts of the development chain, affect the outcome of the development process. In particular, we assume AI technology is described by two key attributes: safety and performance. The regulator first sets a minimum safety standard that applies to one or both players, with strict penalties for non-compliance. The general-purpose creator then develops the technology, establishing its initial safety and performance levels. Next, domain specialists refine the AI for their specific use cases, and the resulting revenue is distributed between the specialist and generalist through an ex-ante bargaining process. Our analysis of this game reveals two key insights: First, weak safety regulation imposed only on the domain specialists can backfire. While it might seem logical to regulate use cases (as opposed to the general-purpose technology), our analysis shows that weak regulations targeting domain specialists alone can unintentionally reduce safety. This effect persists across a wide range of settings. Second, in sharp contrast to the previous finding, we observe that stronger, well-placed regulation can in fact benefit all players subjected to it. When regulators impose appropriate safety standards on both AI creators and domain specialists, the regulation functions as a commitment mechanism, leading to safety and performance gains, surpassing what is achieved under no regulation or regulating one player only. |
Date: | 2025–03 |
URL: | https://d.repec.org/n?u=RePEc:arx:papers:2503.20848 |
By: | Mfon Akpan; Adeyemi Adebayo |
Abstract: | The rapid development and growth of Artificial Intelligence (AI) has not only transformed industries but also sparked important debates about its impacts on employment, resource allocation, and the ethics involved in decision-making. Understanding these changes within an industry helps clarify how they may influence society at large. Advancing AI technologies create a dual paradox: efficiency gains alongside greater resource consumption and displacement of traditional labor. In this context, we explore the impact of AI on energy consumption, human labor roles, and hybrid roles amid widespread human labor replacement. We used mixed methods involving qualitative and quantitative analyses of data identified from various sources. Findings suggest that AI increases energy consumption and has impacted human labor roles only to a minimal extent, considering that its applicability is limited to some tasks that require human judgment. In this context, the |
Date: | 2025–04 |
URL: | https://d.repec.org/n?u=RePEc:arx:papers:2504.10503 |
By: | Nikhil Kumar |
Abstract: | This paper examines the market for AI models in which firms compete to provide accurate model predictions and consumers exhibit heterogeneous preferences for model accuracy. We develop a consumer-firm duopoly model to analyze how competition affects firms' incentives to improve model accuracy. Each firm aims to minimize its model's error, but this choice can often be suboptimal. Counterintuitively, we find that in a competitive market, firms that improve overall accuracy do not necessarily improve their profits. Rather, each firm's optimal decision is to invest further on the error dimension where it has a competitive advantage. By decomposing model errors into false positive and false negative rates, firms can reduce errors in each dimension through investments. Firms are strictly better off investing on their superior dimension and strictly worse off with investments on their inferior dimension. Profitable investments adversely affect consumers but increase overall welfare. |
Date: | 2025–04 |
URL: | https://d.repec.org/n?u=RePEc:arx:papers:2504.13375 |
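The error decomposition the abstract relies on can be written explicitly: with a base rate \(\pi\) of positive cases, a model's overall error splits into the two investable dimensions,

\[
\text{Err} = \pi \cdot \text{FNR} + (1 - \pi) \cdot \text{FPR},
\]

so two firms with equal overall error can hold comparative advantages on different components, which is what drives each firm to invest in its superior dimension.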
By: | Alejandro Lopez-Lira |
Abstract: | This paper presents a realistic simulated stock market where large language models (LLMs) act as heterogeneous competing trading agents. The open-source framework incorporates a persistent order book with market and limit orders, partial fills, dividends, and equilibrium clearing alongside agents with varied strategies, information sets, and endowments. Agents submit standardized decisions using structured outputs and function calls while expressing their reasoning in natural language. Three findings emerge: First, LLMs demonstrate consistent strategy adherence and can function as value investors, momentum traders, or market makers per their instructions. Second, market dynamics exhibit features of real financial markets, including price discovery, bubbles, underreaction, and strategic liquidity provision. Third, the framework enables analysis of LLMs' responses to varying market conditions, similar to partial dependence plots in machine-learning interpretability. The framework allows simulating financial theories without closed-form solutions, creating experimental designs that would be costly with human participants, and establishing how prompts can generate correlated behaviors affecting market stability. |
Date: | 2025–04 |
URL: | https://d.repec.org/n?u=RePEc:arx:papers:2504.10789 |
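A toy sketch of a persistent order book with limit orders and partial fills of the kind described above (price-time priority); this is an illustration only, not the paper's open-source framework:

    # Minimal limit order book: price-time priority, partial fills.
    import heapq

    class OrderBook:
        def __init__(self):
            self.bids = []  # max-heap via negated price: (-price, seq, qty, agent)
            self.asks = []  # min-heap: (price, seq, qty, agent)
            self.seq = 0

        def limit(self, side, price, qty, agent):
            self.seq += 1
            book, opp = (self.bids, self.asks) if side == "buy" else (self.asks, self.bids)
            # Cross against the opposite side while prices overlap.
            while qty > 0 and opp:
                best = opp[0]
                best_price = -best[0] if side == "sell" else best[0]
                if (side == "buy" and best_price > price) or \
                   (side == "sell" and best_price < price):
                    break
                fill = min(qty, best[2])
                print(f"trade: {fill} @ {best_price} ({agent} vs {best[3]})")
                qty -= fill
                if fill == best[2]:
                    heapq.heappop(opp)
                else:
                    opp[0] = (best[0], best[1], best[2] - fill, best[3])  # partial fill
            if qty > 0:  # rest the remainder on the book
                key = -price if side == "buy" else price
                heapq.heappush(book, (key, self.seq, qty, agent))

    ob = OrderBook()
    ob.limit("sell", 101, 50, "market_maker")
    ob.limit("buy", 102, 80, "momentum_trader")  # fills 50 @ 101, rests 30 @ 102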
By: | Zongxiao Wu; Yizhe Dong; Yaoyiran Li; Baofeng Shi |
Abstract: | This study explores the integration of a representative large language model, ChatGPT, into lending decision-making with a focus on credit default prediction. Specifically, we use ChatGPT to analyse and interpret loan assessments written by loan officers and generate refined versions of these texts. Our comparative analysis reveals significant differences between generative artificial intelligence (AI)-refined and human-written texts in terms of text length, semantic similarity, and linguistic representations. Using deep learning techniques, we show that incorporating unstructured text data, particularly ChatGPT-refined texts, alongside conventional structured data significantly enhances credit default predictions. Furthermore, we demonstrate how the contents of both human-written and ChatGPT-refined assessments contribute to the models' prediction and show that the effect of essential words is highly context-dependent. Moreover, we find that ChatGPT's analysis of borrower delinquency contributes the most to improving predictive accuracy. We also evaluate the business impact of the models based on human-written and ChatGPT-refined texts, and find that, in most cases, the latter yields higher profitability than the former. This study provides valuable insights into the transformative potential of generative AI in financial services. |
Date: | 2025–03 |
URL: | https://d.repec.org/n?u=RePEc:arx:papers:2503.18029 |
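A self-contained, simplified stand-in for the modeling idea above: combine text features from loan assessments with conventional structured data in a default-prediction model. The paper uses deep learning and ChatGPT-refined texts; TF-IDF and logistic regression here are illustrative substitutions, and all data are invented:

    # Toy default-prediction model mixing unstructured text and structured features.
    import numpy as np
    from scipy.sparse import hstack, csr_matrix
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression

    texts = ["borrower has stable income, no delinquency",
             "repeated delinquency, irregular income",
             "stable employment, small prior arrears",
             "missed several payments, high utilization"]
    structured = np.array([[0.2, 720], [0.8, 590], [0.3, 680], [0.9, 560]])  # e.g., DTI, score
    default = np.array([0, 1, 0, 1])

    X_text = TfidfVectorizer().fit_transform(texts)
    X = hstack([X_text, csr_matrix(structured)])   # text + structured features
    clf = LogisticRegression().fit(X, default)
    print(clf.predict_proba(X)[:, 1].round(2))     # predicted default probabilities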
By: | Julian Junyan Wang; Victor Xiaoqi Wang |
Abstract: | This study provides the first comprehensive assessment of consistency and reproducibility in Large Language Model (LLM) outputs in finance and accounting research. We evaluate how consistently LLMs produce outputs given identical inputs through extensive experimentation with 50 independent runs across five common tasks: classification, sentiment analysis, summarization, text generation, and prediction. Using three OpenAI models (GPT-3.5-turbo, GPT-4o-mini, and GPT-4o), we generate over 3.4 million outputs from diverse financial source texts and data, covering MD&As, FOMC statements, finance news articles, earnings call transcripts, and financial statements. Our findings reveal substantial but task-dependent consistency, with binary classification and sentiment analysis achieving near-perfect reproducibility, while complex tasks show greater variability. More advanced models do not consistently demonstrate better consistency and reproducibility, with task-specific patterns emerging. LLMs significantly outperform expert human annotators in consistency and maintain high agreement even where human experts significantly disagree. We further find that simple aggregation strategies across 3-5 runs dramatically improve consistency. Simulation analysis reveals that despite measurable inconsistency in LLM outputs, downstream statistical inferences remain remarkably robust. These findings address concerns about what we term "G-hacking," the selective reporting of favorable outcomes from multiple Generative AI runs, by demonstrating that such risks are relatively low for finance and accounting tasks. |
Date: | 2025–03 |
URL: | https://d.repec.org/n?u=RePEc:arx:papers:2503.16974 |
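The simple aggregation strategy the abstract describes can be as little as a modal label across repeated runs; a minimal sketch (the authors' exact aggregation rules may differ):

    # Majority vote over per-item labels from k independent LLM runs.
    from collections import Counter

    def aggregate(runs):
        """Return the modal label for each item across runs."""
        return [Counter(labels).most_common(1)[0][0] for labels in zip(*runs)]

    run1 = ["pos", "neg", "neg", "pos"]
    run2 = ["pos", "neg", "pos", "pos"]
    run3 = ["pos", "pos", "neg", "pos"]
    print(aggregate([run1, run2, run3]))  # ['pos', 'neg', 'neg', 'pos']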
By: | Youngbin Lee; Yejin Kim; Suin Kim; Yongjae Lee |
Abstract: | Portfolio optimization faces challenges due to the sensitivity of traditional mean-variance models to input estimates. The Black-Litterman model mitigates this by integrating investor views, but defining these views remains difficult. This study explores the integration of large language model (LLM)-generated views into portfolio optimization using the Black-Litterman framework. Our method leverages LLMs to estimate expected stock returns from historical prices and company metadata, incorporating uncertainty through the variance in predictions. We conduct a backtest of the LLM-optimized portfolios from June 2024 to February 2025, rebalancing biweekly using the previous two weeks of price data. As baselines, we compare against the S&P 500, an equal-weighted portfolio, and a traditional mean-variance optimized portfolio constructed using the same set of stocks. Empirical results suggest that different LLMs exhibit varying levels of predictive optimism and confidence stability, which impact portfolio performance. The source code and data are available at https://github.com/youngandbin/LLM-MVO-BLM. |
Date: | 2025–04 |
URL: | https://d.repec.org/n?u=RePEc:arx:papers:2504.14345 |
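For reference, the standard Black-Litterman posterior mean into which such views enter (the paper's own calibration of the view-uncertainty matrix from prediction variance is specific to it):

\[
\mu_{BL} = \left[(\tau\Sigma)^{-1} + P^{\top}\Omega^{-1}P\right]^{-1}\left[(\tau\Sigma)^{-1}\pi + P^{\top}\Omega^{-1}Q\right],
\]

where \(\pi\) holds the equilibrium returns, \(P\) picks out the assets in each view, \(Q\) holds the LLM-estimated expected returns, and \(\Omega\) encodes their uncertainty (here, the variance across LLM predictions).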
By: | Herbert Dawid; Philipp Harting; Hankui Wang; Zhongli Wang; Jiachen Yi |
Abstract: | This paper introduces a methodology based on agentic workflows for economic research that leverages Large Language Models (LLMs) and multimodal AI to enhance research efficiency and reproducibility. Our approach features autonomous and iterative processes covering the entire research lifecycle--from ideation and literature review to economic modeling and data processing, empirical analysis and result interpretation--with strategic human oversight. The workflow architecture comprises specialized agents with clearly defined roles, structured inter-agent communication protocols, systematic error escalation pathways, and adaptive mechanisms that respond to changing research demand. Human-in-the-loop (HITL) checkpoints are strategically integrated to ensure methodological validity and ethical compliance. We demonstrate the practical implementation of our framework using Microsoft's open-source platform, AutoGen, presenting experimental examples that highlight both the current capabilities and future potential of agentic workflows in improving economic research. |
Date: | 2025–04 |
URL: | https://d.repec.org/n?u=RePEc:arx:papers:2504.09736 |
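A minimal sketch of a two-agent AutoGen workflow with a human-in-the-loop checkpoint, in the spirit of the architecture described above; the agent names, prompts, and model configuration are illustrative assumptions, not the authors' implementation:

    # pip install pyautogen
    import autogen

    # Hypothetical model configuration; substitute your own credentials.
    llm_config = {"config_list": [{"model": "gpt-4o", "api_key": "YOUR_KEY"}]}

    # Specialized agent with a clearly defined role.
    researcher = autogen.AssistantAgent(
        name="researcher",
        system_message="Draft a literature-review outline for the given topic.",
        llm_config=llm_config,
    )

    # human_input_mode="ALWAYS" turns every turn into a HITL checkpoint.
    overseer = autogen.UserProxyAgent(
        name="human_overseer",
        human_input_mode="ALWAYS",
        code_execution_config=False,
    )

    overseer.initiate_chat(researcher, message="Topic: agentic workflows in economics.")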
By: | Zhao, Chuqing; Chen, Yisong |
Abstract: | Online platforms such as Reddit have become significant spaces for public discussions on mental health, offering valuable insights into psychological distress and support-seeking behaviors. Large Language Models (LLMs) have emerged as powerful tools for analyzing these discussions, enabling the identification of mental health trends, crisis signals, and potential interventions. This work develops an LLM-based topic modeling framework tailored for domain-specific mental health discourse, uncovering latent themes within user-generated content. Additionally, an interactive and interpretable visualization system is designed to allow users to explore data at various levels of granularity, enhancing the understanding of mental health narratives. This approach aims to bridge the gap between large-scale AI analysis and human-centered interpretability, contributing to more effective and responsible mental health insights on social media. |
Date: | 2025–05–02 |
URL: | https://d.repec.org/n?u=RePEc:osf:socarx:xbpts_v1 |
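As an illustration of such a pipeline's first step, a generic topic-modeling pass over toy posts; the paper develops its own LLM-based framework, for which BERTopic here is only a stand-in, and the example posts are invented:

    # Toy topic-modeling pass over synthetic mental-health posts.
    from bertopic import BERTopic

    docs = ["feeling anxious about work lately and can't sleep",
            "looking for a therapist recommendation, first time seeking help",
            "medication helped my panic attacks, sharing my experience",
            "job stress is overwhelming, how do you cope"] * 25  # toy corpus

    topic_model = BERTopic(min_topic_size=5)
    topics, probs = topic_model.fit_transform(docs)
    print(topic_model.get_topic_info().head())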
By: | Michael J. Yuan; Carlos Campoy; Sydney Lai; James Snewin; Ju Long |
Abstract: | Decentralized AI agent networks, such as Gaia, allow individuals to run customized LLMs on their own computers and then provide services to the public. However, in order to maintain service quality, the network must verify that individual nodes are running their designated LLMs. In this paper, we demonstrate that in a cluster of mostly honest nodes, we can detect nodes that run unauthorized or incorrect LLMs through the social consensus of their peers. We discuss the algorithm and experimental data from the Gaia network. We also discuss the intersubjective validation system, implemented as an EigenLayer AVS, which introduces financial incentives and penalties to encourage honest behavior from LLM nodes. |
Date: | 2025–04 |
URL: | https://d.repec.org/n?u=RePEc:arx:papers:2504.13443 |
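A toy sketch of the peer-consensus idea: send one challenge prompt to every node and flag responses that sit far from the cluster's consensus. The similarity measure and threshold are illustrative assumptions; the Gaia network's actual algorithm is its own:

    # Flag nodes whose responses diverge from the peer consensus.
    from difflib import SequenceMatcher

    responses = {
        "node_a": "The capital of France is Paris.",
        "node_b": "Paris is the capital of France.",
        "node_c": "The capital of France is Paris, of course.",
        "node_d": "As an AI, I think the answer is Lyon.",  # deviant node
    }

    def similarity(x, y):
        return SequenceMatcher(None, x, y).ratio()

    for node, resp in responses.items():
        peers = [similarity(resp, r) for n, r in responses.items() if n != node]
        score = sum(peers) / len(peers)
        flag = "FLAG" if score < 0.5 else "ok"  # illustrative threshold
        print(f"{node}: mean peer similarity {score:.2f} -> {flag}")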
By: | Pierre Jean-Claude Mandon |
Abstract: | This paper examines global disparities in artificial intelligence preparedness, using the 2023 Artificial Intelligence Preparedness Index developed by the International Monetary Fund alongside the multidimensional Economic Complexity Index. The proposed methodology identifies both global and local overperformers by comparing actual artificial intelligence readiness scores to predictions based on economic complexity, offering a comprehensive assessment of national artificial intelligence capabilities. The findings highlight the varying significance of regulation and ethics frameworks, digital infrastructure, as well as human capital and labor market development in driving artificial intelligence overperformance across different income levels. Through case studies, including Singapore, Northern Europe, Malaysia, Kazakhstan, Ghana, Rwanda, and emerging demographic giants like China and India, the analysis illustrates how even resource-constrained nations can achieve substantial artificial intelligence advancements through strategic investments and coherent policies. The study underscores the need for peer learning and knowledge-sharing among countries, offering actionable insights to that end. It concludes with recommendations for improving artificial intelligence preparedness metrics and calls for future research to incorporate cognitive and cultural dimensions into readiness frameworks. |
Date: | 2025–02–24 |
URL: | https://d.repec.org/n?u=RePEc:wbk:wbrwps:11073 |
By: | García-Suaza, Andrés; Sarango Iturralde, Alexander; Caiza-Guamán, Pamela; Gil Díaz, Mateo; Acosta Castillo, Dana |
Abstract: | Rapid advancements in artificial intelligence (AI) have exerted a considerable influence on the labor market, altering the demand for specific skills and the structure of employment. This study aims to evaluate the extent of exposure to AI within the Colombian labor market and its relation with workforce characteristics and available job openings. To this end, we build an AI exposure index for Colombia based on skill demand in job posts. Our findings indicate that 33.8% of workers are highly exposed to AI, with variations observed depending on the measurement method employed. Furthermore, we find a positive and significant correlation between AI exposure and wages: workers highly exposed to AI earn a wage premium of 21.8%. On the demand side, only 2.5% of job openings explicitly mention AI-related skills. These findings imply that international indices may underestimate the wage premium associated with AI exposure in Colombia and underscore the potentially unequal effects on the wage distribution among different demographic groups. |
Keywords: | Artificial intelligence, labor market, job posts, occupations, skills, Colombia |
JEL: | E24 J23 J24 O33 |
Date: | 2025–05 |
URL: | https://d.repec.org/n?u=RePEc:rie:riecdt:113 |
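A back-of-the-envelope sketch of constructing an occupation-level AI exposure index from job-post skill lists, in the spirit of the study; the skill lexicon and data here are invented stand-ins for the Colombian postings:

    # Occupation-level AI exposure as the mean share of AI-related skills per post.
    import pandas as pd

    AI_SKILLS = {"machine learning", "nlp", "python", "data analysis"}

    posts = pd.DataFrame({
        "occupation": ["analyst", "analyst", "cashier", "cashier"],
        "skills": [["python", "excel", "data analysis"], ["sql", "nlp"],
                   ["cash handling"], ["customer service", "excel"]],
    })

    posts["ai_share"] = posts["skills"].apply(
        lambda s: len(AI_SKILLS.intersection(s)) / len(s))
    exposure = posts.groupby("occupation")["ai_share"].mean()
    print(exposure)  # occupation-level AI exposure index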
By: | Sanchaita Hazra; Bodhisattwa Prasad Majumder; Tuhin Chakrabarty |
Abstract: | Current efforts in AI safety prioritize filtering harmful content, preventing manipulation of human behavior, and eliminating existential risks in cybersecurity or biosecurity. While pressing, this narrow focus overlooks critical human-centric considerations that shape the long-term trajectory of a society. In this position paper, we identify the risks of overlooking the impact of AI on the future of work and recommend comprehensive transition support towards the evolution of meaningful labor with human agency. Through the lens of economic theories, we highlight the intertemporal impacts of AI on human livelihood and the structural changes in labor markets that exacerbate income inequality. Additionally, the closed-source approach of major stakeholders in AI development resembles rent-seeking behavior through exploiting resources, breeding mediocrity in creative labor, and monopolizing innovation. To address this, we argue in favor of a robust international copyright anatomy supported by implementing collective licensing that ensures fair compensation mechanisms for using data to train AI models. We strongly recommend a pro-worker framework of global AI governance to enhance shared prosperity and economic justice while reducing technical debt. |
Date: | 2025–04 |
URL: | https://d.repec.org/n?u=RePEc:arx:papers:2504.13959 |
By: | Luyao Zhang |
Abstract: | Large language models (LLMs) are transforming global decision-making and societal systems by processing diverse data at unprecedented scales. However, their potential to homogenize human values poses critical risks, similar to biodiversity loss undermining ecological resilience. Rooted in the ancient Greek concept of ethos, meaning both individual character and the shared moral fabric of communities, EthosGPT draws on a tradition that spans from Aristotle's virtue ethics to Adam Smith's moral sentiments as the ethical foundation of economic cooperation. These traditions underscore the vital role of value diversity in fostering social trust, institutional legitimacy, and long-term prosperity. EthosGPT addresses the challenge of value homogenization by introducing an open-source framework for mapping and evaluating LLMs within a global scale of human values. Using international survey data on cultural indices, prompt-based assessments, and comparative statistical analyses, EthosGPT reveals both the adaptability and biases of LLMs across regions and cultures. It offers actionable insights for developing inclusive LLMs, such as diversifying training data and preserving endangered cultural heritage to ensure representation in AI systems. These contributions align with the United Nations Sustainable Development Goals (SDGs), especially SDG 10 (Reduced Inequalities), SDG 11.4 (Cultural Heritage Preservation), and SDG 16 (Peace, Justice and Strong Institutions). Through interdisciplinary collaboration, EthosGPT promotes AI systems that are both technically robust and ethically inclusive, advancing value plurality as a cornerstone for sustainable and equitable futures. |
Date: | 2025–04 |
URL: | https://d.repec.org/n?u=RePEc:arx:papers:2504.09861 |