nep-ain New Economics Papers
on Artificial Intelligence
Issue of 2025–06–16
sixteen papers chosen by
Ben Greiner, Wirtschaftsuniversität Wien


  1. My Advisor, Her AI and Me: Evidence from a Field Experiment on Human-AI Collaboration and Investment Decisions By Cathy (Liu) Yang; Kevin Bauer; Xitong Li; Oliver Hinz
  2. Twin-2K-500: A dataset for building digital twins of over 2,000 people based on their answers to over 500 questions By Olivier Toubia; George Z. Gui; Tianyi Peng; Daniel J. Merlau; Ang Li; Haozhe Chen
  3. Simulating Macroeconomic Expectations using LLM Agents By Jianhao Lin; Lexuan Sun; Yixin Yan
  4. Document Valuation in LLM Summaries: A Cluster Shapley Approach By Zikun Ye; Hema Yoganarasimhan
  5. Capability Inversion: The Turing Test Meets Information Design By Joshua S. Gans
  6. An AI Capability Threshold for Rent-Funded Universal Basic Income in an AI-Automated Economy By Aran Nayebi
  7. Generative AI and Organizational Structure in the Knowledge Economy By Fasheng Xu; Jing Hou; Wei Chen; Karen Xie
  8. Steering Technological Progress By Anton Korinek; Joseph Stiglitz
  9. Artificial Intelligence and Technological Unemployment By Ping Wang; Tsz-Nga Wong
  10. Driving AI Adoption in the EU: A Quantitative Analysis of Macroeconomic Influences By Drago, Carlo; Costantiello, Alberto; Savorgnan, Marco; Leogrande, Angelo
  11. AI Policies towards the AGI Challenge: An International Assessment By Phoebe Koundouri; Fivos Papadimitriou; Georgios Feretzakis; Theodoros Daglis; Vera Alexandropoulou
  12. A Mathematical Framework for AI-Human Integration in Work By Elisa Celis; Lingxiao Huang; Nisheeth K. Vishnoi
  13. Beyond the Black Box: Interpretability of LLMs in Finance By Hariom Tatsat; Ariye Shater
  14. Words That Unite The World: A Unified Framework for Deciphering Central Bank Communications Globally By Agam Shah; Siddhant Sukhani; Huzaifa Pardawala; Saketh Budideti; Riya Bhadani; Rudra Gopal; Siddhartha Somani; Michael Galarnyk; Soungmin Lee; Arnav Hiray; Akshar Ravichandran; Eric Kim; Pranav Aluru; Joshua Zhang; Sebastian Jaskowski; Veer Guda; Meghaj Tarte; Liqin Ye; Spencer Gosden; Rutwik Routu; Rachel Yuh; Sloka Chava; Sahasra Chava; Dylan Patrick Kelly; Aiden Chiang; Harsit Mittal; Sudheer Chava
  15. Learning to Regulate: A New Event-Level Dataset of Capital Control Measures By Geyue Sun; Xiao Liu; Tomas Williams; Roberto Samaniego
  16. FinHEAR: Human Expertise and Adaptive Risk-Aware Temporal Reasoning for Financial Decision-Making By Jiaxiang Chen; Mingxi Zou; Zhuo Wang; Qifan Wang; Dongning Sun; Chi Zhang; Zenglin Xu

  1. By: Cathy (Liu) Yang; Kevin Bauer; Xitong Li; Oliver Hinz
    Abstract: Amid ongoing policy and managerial debates on keeping humans in the loop of AI decision-making, we investigate whether human involvement in AI-based service production benefits downstream consumers. Partnering with a large savings bank in Europe, we produced pure AI and human-AI collaborative investment advice, passed it to customers, and examined their advice-taking in a field experiment. On the production side, contrary to concerns that humans might inefficiently override AI output, we find that giving a human banker the final say over AI-generated financial advice does not compromise its quality. More importantly, on the consumption side, customers are more likely to follow investment advice from the human-AI collaboration compared to pure AI, especially when facing riskier decisions. In our setting, this increased reliance leads to higher material welfare for consumers. Additional analyses from the field experiment and an online experiment show that the persuasive power of human-AI advice cannot be explained by consumers' beliefs about enhanced advice quality due to human-AI complementarities. Instead, the benefit stems from human involvement acting as a peripheral cue that increases the advice's affective appeal. Our findings suggest that regulations and guidelines should adopt a consumer-centered approach by fostering service environments in which humans and AI systems can collaborate to improve consumer outcomes. These insights are relevant for managers designing AI-based services and for policymakers advocating for human oversight in AI systems.
    Date: 2025–06
    URL: https://d.repec.org/n?u=RePEc:arx:papers:2506.03707
  2. By: Olivier Toubia; George Z. Gui; Tianyi Peng; Daniel J. Merlau; Ang Li; Haozhe Chen
    Abstract: LLM-based digital twin simulation, where large language models are used to emulate individual human behavior, holds great promise for research in AI, social science, and digital experimentation. However, progress in this area has been hindered by the scarcity of real, individual-level datasets that are both large and publicly available. This lack of high-quality ground truth limits both the development and validation of digital twin methodologies. To address this gap, we introduce a large-scale, public dataset designed to capture a rich and holistic view of individual human behavior. We survey a representative sample of $N = 2,058$ participants (average 2.42 hours per person) in the US across four waves with 500 questions in total, covering a comprehensive battery of demographic, psychological, economic, personality, and cognitive measures, as well as replications of behavioral economics experiments and a pricing survey. The final wave repeats tasks from earlier waves to establish a test-retest accuracy baseline. Initial analyses suggest the data are of high quality and show promise for constructing digital twins that predict human behavior well at the individual and aggregate levels. By making the full dataset publicly available, we aim to establish a valuable testbed for the development and benchmarking of LLM-based persona simulations. Beyond LLM applications, due to its unique breadth and scale, the dataset also enables broad social science research, including studies of cross-construct correlations and heterogeneous treatment effects.
    Date: 2025–05
    URL: https://d.repec.org/n?u=RePEc:arx:papers:2505.17479
  3. By: Jianhao Lin; Lexuan Sun; Yixin Yan
    Abstract: We introduce a novel framework for simulating macroeconomic expectation formation using Large Language Model-Empowered Agents (LLM Agents). By constructing thousands of LLM Agents equipped with modules for personal characteristics, prior expectations, and knowledge, we replicate a survey experiment involving households and experts on inflation and unemployment. Our results show that although the expectations and thoughts generated by LLM Agents are more homogeneous than those of human participants, they still effectively capture key heterogeneity across agents and the underlying drivers of expectation formation. Furthermore, a module-ablation exercise highlights the critical role of prior expectations in simulating such heterogeneity. This approach complements traditional survey methods and offers new insights into AI behavioral science in macroeconomic research.
    Date: 2025–05
    URL: https://d.repec.org/n?u=RePEc:arx:papers:2505.17648
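    The module-based agent construction the abstract describes can be sketched in a few lines of Python. This is a hypothetical illustration only: the module names, prompt wording, and ablation flag below are assumptions for exposition, not the authors' implementation.

```python
# Illustrative sketch (not the authors' code) of assembling an LLM agent
# prompt from the three modules named in the abstract: personal
# characteristics, prior expectations, and knowledge.
from dataclasses import dataclass

@dataclass
class HouseholdAgent:
    characteristics: str   # e.g. demographics of the simulated respondent
    prior: str             # prior inflation/unemployment expectations
    knowledge: str         # macro information shown to the agent

    def build_prompt(self, ablate_prior: bool = False) -> str:
        """Assemble the survey prompt; dropping the prior module mimics
        the paper's module-ablation exercise."""
        parts = [f"You are a survey respondent. Profile: {self.characteristics}."]
        if not ablate_prior:
            parts.append(f"Your prior expectations: {self.prior}.")
        parts.append(f"Information available to you: {self.knowledge}.")
        parts.append("Report your expected inflation rate for next year.")
        return " ".join(parts)

agent = HouseholdAgent(
    characteristics="age 42, renter, median income",
    prior="inflation around 4%",
    knowledge="CPI rose 3.1% last year",
)
full = agent.build_prompt()
ablated = agent.build_prompt(ablate_prior=True)
print(full)
```

    Ablating the prior-expectations module simply drops that block from the prompt, which is the kind of intervention the abstract's module-ablation exercise uses to isolate each module's contribution to expectation heterogeneity.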
  4. By: Zikun Ye; Hema Yoganarasimhan
    Abstract: Large Language Models (LLMs) are increasingly used in systems that retrieve and summarize content from multiple sources, such as search engines and AI assistants. While these models enhance user experience by generating coherent summaries, they obscure the contributions of original content creators, raising concerns about credit attribution and compensation. We address the challenge of valuing individual documents used in LLM-generated summaries. We propose using Shapley values, a game-theoretic method that allocates credit based on each document's marginal contribution. Although theoretically appealing, Shapley values are expensive to compute at scale. We therefore propose Cluster Shapley, an efficient approximation algorithm that leverages semantic similarity between documents. By clustering documents using LLM-based embeddings and computing Shapley values at the cluster level, our method significantly reduces computation while maintaining attribution quality. We demonstrate our approach on a summarization task using Amazon product reviews. Cluster Shapley significantly reduces computational complexity while maintaining high accuracy, outperforming baseline methods such as Monte Carlo sampling and Kernel SHAP with a better efficient frontier. Our approach is agnostic to the exact LLM used, the summarization process used, and the evaluation procedure, which makes it broadly applicable to a variety of summarization settings.
    Date: 2025–05
    URL: https://d.repec.org/n?u=RePEc:arx:papers:2505.23842
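    The cluster-level Shapley idea can be illustrated with a toy example: group similar documents, compute exact Shapley values over the (few) clusters, then share each cluster's credit among its members. The clustering, the value function, and the equal within-cluster split below are simplifying assumptions for illustration, not the paper's LLM-based embeddings or summary-quality metric.

```python
# Toy sketch of cluster-level Shapley attribution (not the paper's code).
from itertools import permutations

def shapley(players, value):
    """Exact Shapley values via enumeration of all player orderings."""
    contrib = {p: 0.0 for p in players}
    perms = list(permutations(players))
    for order in perms:
        coalition = set()
        prev = value(frozenset(coalition))
        for p in order:
            coalition.add(p)
            cur = value(frozenset(coalition))
            contrib[p] += cur - prev
            prev = cur
    return {p: c / len(perms) for p, c in contrib.items()}

# Toy setup: 6 documents in 3 semantic clusters; a summary's "value"
# is 1 point per distinct topic covered by the selected clusters.
clusters = {"price": ["d1", "d2"], "quality": ["d3", "d4", "d5"], "shipping": ["d6"]}

def cluster_value(selected_clusters):
    return float(len(selected_clusters))  # one point per topic covered

cluster_phi = shapley(list(clusters), cluster_value)

# Distribute each cluster's credit equally among its documents.
doc_phi = {d: cluster_phi[c] / len(docs) for c, docs in clusters.items() for d in docs}
print(doc_phi)
```

    The computational saving is visible even at this scale: Shapley values are enumerated over three clusters (3! = 6 orderings) rather than six documents (6! = 720 orderings).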
  5. By: Joshua S. Gans
    Abstract: This paper analyzes the design of tests to distinguish human from artificial intelligence through the lens of information design. We identify a fundamental asymmetry: while AI systems can strategically underperform to mimic human limitations, they cannot overperform beyond their capabilities. This leads to our main contribution—the concept of capability inversion domains, where AIs fail detection not through inferior performance, but by performing “suspiciously well” when they overestimate human capabilities. We show that if an AI significantly overestimates human ability in even one domain, it cannot reliably pass an optimally designed test. This insight reverses conventional intuition: effective tests should target not what humans do well, but the specific patterns of human imperfection that AIs systematically misunderstand. We identify structural sources of persistent misperception—including the difficulty of learning about failure from successful examples and fundamental differences in embodied experience—that make certain capability inversions exploitable for detection even as AI systems improve.
    JEL: C72 D82 D83
    Date: 2025–06
    URL: https://d.repec.org/n?u=RePEc:nbr:nberwo:33893
  6. By: Aran Nayebi
    Abstract: We derive the first closed-form condition under which artificial intelligence (AI) capital profits could sustainably finance a universal basic income (UBI) without additional taxes or new job creation. In a Solow-Zeira economy characterized by a continuum of automatable tasks, a constant net saving rate $s$, and task-elasticity $\sigma$.
    Date: 2025–05
    URL: https://d.repec.org/n?u=RePEc:arx:papers:2505.18687
  7. By: Fasheng Xu; Jing Hou; Wei Chen; Karen Xie
    Abstract: The adoption of GenAI is fundamentally reshaping organizations in the knowledge economy. GenAI can significantly enhance workers' problem-solving abilities and productivity, yet it also presents a major reliability challenge: hallucinations, or errors presented as plausible outputs. This study develops a theoretical model to examine GenAI's impact on organizational structure and the role of human-in-the-loop oversight. Our findings indicate that successful GenAI adoption hinges primarily on maintaining hallucination rates below a critical level. After adoption, as GenAI advances in capability or reliability, organizations optimize their workforce by reducing worker knowledge requirements while preserving operational effectiveness through GenAI augmentation, a phenomenon known as deskilling. Unexpectedly, enhanced capability or reliability of GenAI may actually narrow the span of control, increasing the demand for managers rather than flattening organizational hierarchies. To effectively mitigate hallucination risks, many firms implement human-in-the-loop validation, where managers review GenAI-enhanced outputs before implementation. While this validation increases managerial workload, it can, surprisingly, expand the span of control, reducing the number of managers needed. Furthermore, human-in-the-loop validation influences GenAI adoption differently based on validation costs and hallucination rates, deterring adoption in low-error, high-cost scenarios while promoting it in high-error, low-cost cases. Finally, productivity improvements from GenAI yield distinctive organizational shifts: as productivity increases, firms tend to employ fewer but more knowledgeable workers, gradually expanding managerial spans of control.
    Date: 2025–05
    URL: https://d.repec.org/n?u=RePEc:arx:papers:2506.00532
  8. By: Anton Korinek (University of Virginia); Joseph Stiglitz (Columbia University)
    Abstract: Rapid progress in new technologies such as AI has led to widespread anxiety about adverse labor market impacts. This paper asks how to guide innovative efforts so as to increase labor demand and create better-paying jobs while also evaluating the limitations of such an approach. We develop a theoretical framework to identify the properties that make an innovation desirable from the perspective of workers, including its technological complementarity to labor, the factor share of labor in producing the goods involved, and the relative income of the affected workers. Applications include robot taxation, factor-augmenting progress, and task automation. We find that steering technology becomes more desirable the less efficient social safety nets are. If technological progress devalues labor, the desirability of steering is at first increased, but beyond a critical threshold, it becomes less effective, and policy should shift toward greater redistribution. If labor's economic value diminishes in the future, progress should increasingly focus on enhancing human well-being rather than labor productivity.
    Keywords: technological progress, AI, inequality, redistribution
    JEL: E64 D63 O3
    Date: 2025–05–05
    URL: https://d.repec.org/n?u=RePEc:thk:wpaper:inetwp232
  9. By: Ping Wang; Tsz-Nga Wong
    Abstract: How large is the impact of artificial intelligence (AI) on labor productivity and unemployment? This paper introduces a labor-search model of technological unemployment, conceptualizing the generative aspect of AI as a learning-by-using technology. AI capability improves through machine learning from workers and in turn enhances their labor productivity, but eventually displaces workers if wage renegotiation fails. Three distinct equilibria emerge: no AI, some AI with higher unemployment, or unbounded AI with sustained endogenous growth and little impact on employment. By calibrating to U.S. data, our model predicts more than threefold improvements in productivity in the some-AI steady state, alongside a long-run employment loss of 23%, with half this loss occurring over the initial five-year transition. Plausible changes in parameter values could lead to global and local indeterminacy. The mechanism highlights the considerable uncertainty of AI's impacts in the presence of labor-market frictions. In the unbounded-AI equilibrium, technological unemployment would not occur. We further show that equilibria are inefficient despite adherence to the Hosios condition. By improving the job-finding rate and labor productivity, the optimal subsidy to jobs facing the replacement risk of AI can generate a welfare gain from 26.6% in the short run to over 50% in the long run.
    JEL: E2 J2 O30 O40
    Date: 2025–05
    URL: https://d.repec.org/n?u=RePEc:nbr:nberwo:33867
  10. By: Drago, Carlo; Costantiello, Alberto; Savorgnan, Marco; Leogrande, Angelo
    Abstract: This article investigates macroeconomic factors that support the adoption of Artificial Intelligence (AI) technologies by large European Union (EU) enterprises. The analysis combines panel data regression with machine learning to investigate how macroeconomic variables such as health spending, domestic credit, exports, gross capital formation, inflation, and trade openness influence the share of enterprises that adopt at least one type of AI technology (ALOAI). The estimation results, based on fixed and random effects models with 151 observations, show that health spending, inflation, trade, and GDP per capita have significantly positive associations with adoption, while domestic credit, exports, and gross capital formation show significant negative correlations. In addition, machine learning models (KNN, Boosting, Random Forest) are benchmarked using MSE, RMSE, MAE, MAPE, and R² measures, with KNN performing perfectly on all measures, although with some concerns regarding overfitting. Furthermore, cluster analysis (Hierarchical, Density-Based, Neighborhood-Based) identifies hidden EU country groups with comparable macroeconomic variables and comparable ALOAI levels. Notably, countries characterized by high integration in international trade, access to credit, and strong GDP per capita show high ALOAI levels, whereas those with macroeconomic volatility and under-investment in innovation trail behind. These findings suggest that securing AI adoption is not merely a matter of finance and infrastructure but also of policy alignment and institutional preparedness. This work provides evidence-driven policy advice by presenting an integrated, data-driven analytical framework for understanding and managing AI diffusion within EU industry sectors.
    Keywords: Artificial Intelligence Adoption, Macroeconomic Indicators, Panel Data Regression, Machine Learning Models, EU Policy and Innovation.
    JEL: C23 C45 E22 L86 O33
    Date: 2025–06–08
    URL: https://d.repec.org/n?u=RePEc:pra:mprapa:124973
  11. By: Phoebe Koundouri; Fivos Papadimitriou; Georgios Feretzakis; Theodoros Daglis; Vera Alexandropoulou
    Abstract: This work examines AI policies and AI legislation using a mixed research method that combines qualitative and quantitative analyses of official national and international AI policy documents with scientometric analyses of the scientific production. For the former, the research covers countries from all continents (Australia, Canada, China, India, Israel, Japan, Norway, Russia, South Africa, UK, USA) and the EU. For the latter, the scientometric research was carried out at a global scale. According to the results, the countries do not share the same academic interest in this important matter, nor do their formal AI policy documents cover the same AI-related issues with the same emphasis. This analysis leads to the identification of gaps and common elements among national policies (i.e. emphasis on risks, safety) that are of interest to researchers, policymakers, governments, institutions and stakeholders. While there are significant differences among priorities towards AI, the key findings of this research include the following: a) the most important words in the AI policy documents examined are "risks", "safety" and "ethics"; b) the emerging major issue of Artificial General Intelligence is not addressed in any of the official AI documents of the countries mentioned above; c) there are significant differences in the geographical distributions of both the scientific production and the policy-making processes, with a handful of countries leading the way in both AI law and AGI. Yet, it is encouraging that the scientific literature on AI legislation is growing faster than that related to AGI, and so there is hope that countries and international institutions will be able to cope with the rise of AGI in terms of policy-making and legislation.
    Keywords: AI law, AI policy-making, National AI policies, AGI, Content analysis, Scientometric analysis
    Date: 2025–05–26
    URL: https://d.repec.org/n?u=RePEc:aue:wpaper:2535
  12. By: Elisa Celis; Lingxiao Huang; Nisheeth K. Vishnoi
    Abstract: The rapid rise of Generative AI (GenAI) tools has sparked debate over their role in complementing or replacing human workers across job contexts. We present a mathematical framework that models jobs, workers, and worker-job fit, introducing a novel decomposition of skills into decision-level and action-level subskills to reflect the complementary strengths of humans and GenAI. We analyze how changes in subskill abilities affect job success, identifying conditions for sharp transitions in success probability. We also establish sufficient conditions under which combining workers with complementary subskills significantly outperforms relying on a single worker. This explains phenomena such as productivity compression, where GenAI assistance yields larger gains for lower-skilled workers. We demonstrate the framework's practicality using data from O*NET and Big-Bench Lite, aligning real-world data with our model via subskill-division methods. Our results highlight when and how GenAI complements human skills, rather than replacing them.
    Date: 2025–05
    URL: https://d.repec.org/n?u=RePEc:arx:papers:2505.23432
  13. By: Hariom Tatsat (Barclays); Ariye Shater (Barclays)
    Abstract: Large Language Models (LLMs) exhibit remarkable capabilities across a spectrum of tasks in financial services, including report generation, chatbots, sentiment analysis, regulatory compliance, investment advisory, financial knowledge retrieval, and summarization. However, their intrinsic complexity and lack of transparency pose significant challenges, especially in the highly regulated financial sector, where interpretability, fairness, and accountability are critical. As far as we are aware, this paper presents the first application in the finance domain of understanding and utilizing the inner workings of LLMs through mechanistic interpretability, addressing the pressing need for transparency and control in AI systems. Mechanistic interpretability is the most intuitive and transparent way to understand LLM behavior by reverse-engineering their internal workings. By dissecting the activations and circuits within these models, it provides insights into how specific features or components influence predictions, making it possible not only to observe but also to modify model behavior. In this paper, we explore the theoretical aspects of mechanistic interpretability and demonstrate its practical relevance through a range of financial use cases and experiments, including applications in trading strategies, sentiment analysis, bias, and hallucination detection. While not yet widely adopted, mechanistic interpretability is expected to become increasingly vital as adoption of LLMs increases. Advanced interpretability tools can ensure AI systems remain ethical, transparent, and aligned with evolving financial regulations. In this paper, we have put special emphasis on how these techniques can help unlock interpretability requirements for regulatory and compliance purposes, addressing both current needs and anticipating future expectations from financial regulators globally.
    Date: 2025–05
    URL: https://d.repec.org/n?u=RePEc:arx:papers:2505.24650
  14. By: Agam Shah; Siddhant Sukhani; Huzaifa Pardawala; Saketh Budideti; Riya Bhadani; Rudra Gopal; Siddhartha Somani; Michael Galarnyk; Soungmin Lee; Arnav Hiray; Akshar Ravichandran; Eric Kim; Pranav Aluru; Joshua Zhang; Sebastian Jaskowski; Veer Guda; Meghaj Tarte; Liqin Ye; Spencer Gosden; Rutwik Routu; Rachel Yuh; Sloka Chava; Sahasra Chava; Dylan Patrick Kelly; Aiden Chiang; Harsit Mittal; Sudheer Chava
    Abstract: Central banks around the world play a crucial role in maintaining economic stability. Deciphering policy implications in their communications is essential, especially as misinterpretations can disproportionately impact vulnerable populations. To address this, we introduce the World Central Banks (WCB) dataset, the most comprehensive monetary policy corpus to date, comprising over 380k sentences from 25 central banks across diverse geographic regions, spanning 28 years of historical data. After uniformly sampling 1k sentences per bank (25k total) across all available years, we annotate and review each sentence using dual annotators, disagreement resolutions, and secondary expert reviews. We define three tasks: Stance Detection, Temporal Classification, and Uncertainty Estimation, with each sentence annotated for all three. We benchmark seven Pretrained Language Models (PLMs) and nine Large Language Models (LLMs) (Zero-Shot, Few-Shot, and with annotation guide) on these tasks, running 15,075 benchmarking experiments. We find that a model trained on aggregated data across banks significantly surpasses a model trained on an individual bank's data, confirming the principle "the whole is greater than the sum of its parts." Additionally, rigorous human evaluations, error analyses, and predictive tasks validate our framework's economic utility. Our artifacts are accessible through HuggingFace and GitHub under the CC-BY-NC-SA 4.0 license.
    Date: 2025–05
    URL: https://d.repec.org/n?u=RePEc:arx:papers:2505.17048
  15. By: Geyue Sun; Xiao Liu; Tomas Williams; Roberto Samaniego
    Abstract: We construct a novel event-level Capital Control Measures (CCM) dataset covering 196 countries from 1999 to 2023 by leveraging prompt-based large language models (LLMs). The dataset enables event study analysis and cross-country comparisons based on rich policy attributes, including action type, intensity, direction, implementing entity, and other multidimensional characteristics. First, using a two-step prompt framework with GPT-4.1, we extract structured information from the IMF's Annual Report on Exchange Arrangements and Exchange Restrictions (AREAER), resulting in 5,198 capital control events with 27 annotated fields and corresponding model reasoning. Second, to facilitate real-time classification and extension to external sources, we fine-tune an open-source Meta Llama 3.1-8B model, named CCM-Llama, trained on AREAER change logs and final status reports. The model achieves 90.09% accuracy in category classification and 99.55% in status prediction. Finally, we apply the CCM dataset in an empirical application: an event study on China, Australia, and the US. The results show that inward capital control measures significantly reduce fund inflows within one month, and restrictive policies tend to have stronger effects than liberalizing ones, with notable heterogeneity across countries. Our work contributes to the growing literature on the use of LLMs in economics by providing both a novel high-frequency policy dataset and a replicable framework for automated classification of capital control events from diverse and evolving information sources.
    Date: 2025–05
    URL: https://d.repec.org/n?u=RePEc:arx:papers:2505.23025
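    As a rough illustration of the event records the abstract describes, the sketch below defines a minimal event schema and a toy keyword classifier. The field names and keyword rules are hypothetical stand-ins for the paper's 27 annotated fields and fine-tuned CCM-Llama model.

```python
# Hypothetical sketch of a capital-control event record and a toy
# rule-based classifier (standing in for the paper's fine-tuned model).
from dataclasses import dataclass

@dataclass
class CapitalControlEvent:
    country: str
    year: int
    direction: str   # "inflow" or "outflow"
    action: str      # "tighten" or "liberalize"
    entity: str      # implementing entity, e.g. central bank

def classify_action(text: str) -> str:
    """Toy keyword classifier; a real pipeline would use an LLM."""
    tightening = ("prohibit", "restrict", "limit", "suspend")
    return "tighten" if any(k in text.lower() for k in tightening) else "liberalize"

event = CapitalControlEvent(
    country="CN", year=2016, direction="inflow",
    action=classify_action("Purchases of foreign securities by residents are prohibited"),
    entity="central bank",
)
print(event.action)
```

    A fine-tuned classifier, as in the paper, replaces the keyword rule while keeping the same structured output, which is what makes the dataset extensible to new policy text sources.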
  16. By: Jiaxiang Chen; Mingxi Zou; Zhuo Wang; Qifan Wang; Dongning Sun; Chi Zhang; Zenglin Xu
    Abstract: Financial decision-making presents unique challenges for language models, demanding temporal reasoning, adaptive risk assessment, and responsiveness to dynamic events. While large language models (LLMs) show strong general reasoning capabilities, they often fail to capture behavioral patterns central to human financial decisions, such as expert reliance under information asymmetry, loss-averse sensitivity, and feedback-driven temporal adjustment. We propose FinHEAR, a multi-agent framework for Human Expertise and Adaptive Risk-aware reasoning. FinHEAR orchestrates specialized LLM-based agents to analyze historical trends, interpret current events, and retrieve expert-informed precedents within an event-centric pipeline. Grounded in behavioral economics, it incorporates expert-guided retrieval, confidence-adjusted position sizing, and outcome-based refinement to enhance interpretability and robustness. Empirical results on curated financial datasets show that FinHEAR consistently outperforms strong baselines across trend prediction and trading tasks, achieving higher accuracy and better risk-adjusted returns.
    Date: 2025–06
    URL: https://d.repec.org/n?u=RePEc:arx:papers:2506.09080

This nep-ain issue is ©2025 by Ben Greiner. It is provided as is without any express or implied warranty. It may be freely redistributed in whole or in part for any purpose. If distributed in part, please include this notice.
General information on the NEP project can be found at https://nep.repec.org. For comments please write to the director of NEP, Marco Novarese at <director@nep.repec.org>. Put “NEP” in the subject, otherwise your mail may be rejected.
NEP’s infrastructure is sponsored by the School of Economics and Finance of Massey University in New Zealand.