nep-ain New Economics Papers
on Artificial Intelligence
Issue of 2025–06–30
eighteen papers chosen by
Ben Greiner, Wirtschaftsuniversität Wien


  1. When Experimental Economics Meets Large Language Models: Tactics with Evidence By Shu Wang; Zijun Yao; Shuhuai Zhang; Jianuo Gai; Tracy Xiao Liu; Songfa Zhong
  2. Can Generative AI agents behave like humans? Evidence from laboratory market experiments By R. Maria del Rio-Chanona; Marco Pangallo; Cars Hommes
  3. Large Language Models as 'Hidden Persuaders': Fake Product Reviews are Indistinguishable to Humans and Machines By Weiyao Meng; John Harvey; James Goulding; Chris James Carter; Evgeniya Lukinova; Andrew Smith; Paul Frobisher; Mina Forrest; Georgiana Nica-Avram
  4. Growth in AI Knowledge By Joshua S. Gans
  5. AI and Social Media: A Political Economy Perspective By Daron Acemoglu; Asuman Ozdaglar; James Siderius
  6. How different uses of AI shape labor demand: evidence from France By Aghion, Philippe; Bunel, Simon; Jaravel, Xavier; Mikaelsen, Thomas; Roulet, Alexandra; Søgaard, Jakob
  7. Evolving the Productivity Equation: Should Digital Labor Be Considered a New Factor of Production? By Alex Farach; Alexia Cambon; Jared Spataro
  8. Identifying economic narratives in large text corpora – An integrated approach using Large Language Models By Tobias Schmidt; Kai-Robin Lange; Matthias Reccius; Henrik Müller; Michael Roos; Carsten Jentsch
  9. Revealing economic facts: LLMs know more than they say By Marcus Buckmann; Quynh Anh Nguyen; Edward Hill
  10. EconGym: A Scalable AI Testbed with Diverse Economic Tasks By Qirui Mi; Qipeng Yang; Zijun Fan; Wentian Fan; Heyang Ma; Chengdong Ma; Siyu Xia; Bo An; Jun Wang; Haifeng Zhang
  11. Towards Competent AI for Fundamental Analysis in Finance: A Benchmark Dataset and Evaluation By Zonghan Wu; Junlin Wang; Congyuan Zou; Chenhan Wang; Yilei Shao
  12. Can LLM-based Financial Investing Strategies Outperform the Market in Long Run? By Weixian Waylon Li; Hyeonjun Kim; Mihai Cucuringu; Tiejun Ma
  13. Explainable-AI powered stock price prediction using time series transformers: A Case Study on BIST100 By Sukru Selim Calik; Andac Akyuz; Zeynep Hilal Kilimci; Kerem Colak
  14. Financial literacy, robo-advising, and the demand for human financial advice: Evidence from Italy By David Aristei; Manuela Gallo
  15. From Text to Quantified Insights: A Large-Scale LLM Analysis of Central Bank Communication By Thiago Christiano Silva; Kei Moriya; Mr. Romain M Veyrune
  16. Benchmarking Pre-Trained Time Series Models for Electricity Price Forecasting By Timothée Hornek; Amir Sartipi; Igor Tchappi; Gilbert Fridgen
  17. Evaluating Large Language Model Capabilities in Assessing Spatial Econometrics Research By Giuseppe Arbia; Luca Morandini; Vincenzo Nardelli
  18. Improving text classification: logistic regression makes small LLMs strong and explainable ‘tens-of-shot’ classifiers By Buckmann, Marcus; Hill, Ed

  1. By: Shu Wang; Zijun Yao; Shuhuai Zhang; Jianuo Gai; Tracy Xiao Liu; Songfa Zhong
    Abstract: Advancements in large language models (LLMs) have sparked a growing interest in measuring and understanding their behavior through experimental economics. However, there is still a lack of established guidelines for designing economic experiments for LLMs. By combining principles from experimental economics with insights from LLM research in artificial intelligence, we outline and discuss eight practical tactics for conducting experiments with LLMs. We further perform two sets of experiments to demonstrate the significance of these tactics. Our study enhances the design, replicability, and generalizability of LLM experiments, and broadens the scope of experimental economics in the digital age.
    Date: 2025–05
    URL: https://d.repec.org/n?u=RePEc:arx:papers:2505.21371
  2. By: R. Maria del Rio-Chanona; Marco Pangallo; Cars Hommes
    Abstract: We explore the potential of Large Language Models (LLMs) to replicate human behavior in economic market experiments. Compared to previous studies, we focus on dynamic feedback between LLM agents: the decisions of each LLM impact the market price at the current step, and so affect the decisions of the other LLMs at the next step. We compare LLM behavior to market dynamics observed in laboratory settings and assess their alignment with human participants' behavior. Our findings indicate that LLMs do not adhere strictly to rational expectations, displaying instead bounded rationality, similarly to human participants. Providing a minimal context window, i.e. memory of the three previous time steps, combined with a high-variability setting capturing response heterogeneity, allows LLMs to replicate broad trends seen in human experiments, such as the distinction between positive and negative feedback markets. However, differences remain at a granular level: LLMs exhibit less heterogeneity in behavior than humans. These results suggest that LLMs hold promise as tools for simulating realistic human behavior in economic contexts, though further research is needed to refine their accuracy and increase behavioral diversity.
    Date: 2025–05
    URL: https://d.repec.org/n?u=RePEc:arx:papers:2505.07457
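To make the dynamic-feedback design in the entry above concrete, here is a minimal sketch of the simulation loop in Python, with a stub in place of the actual LLM call. The agent count, the three-step memory window, and the price-update rule are illustrative assumptions, not the paper's implementation.

```python
import random

def llm_forecast(history: list[float]) -> float:
    """Stand-in for a chat-model call: a real run would prompt an LLM with the
    recent prices and parse its numeric forecast. Naive trend extrapolation
    plus noise keeps the loop runnable here."""
    trend = history[-1] - history[-2]
    return history[-1] + trend + random.gauss(0, 1)

def market_price(forecasts: list[float], fundamental: float, positive: bool) -> float:
    """Stylized price map: under positive feedback the price chases the mean
    forecast; under negative feedback it moves against deviations from the
    fundamental. Both rules are illustrative, not the paper's equations."""
    mean_f = sum(forecasts) / len(forecasts)
    sign = 1.0 if positive else -1.0
    return fundamental + sign * 0.95 * (mean_f - fundamental)

prices = [60.0, 60.0, 60.0]  # seed history
for t in range(50):
    window = prices[-3:]  # minimal context window: three previous steps
    forecasts = [llm_forecast(window) for _ in range(6)]  # six heterogeneous agents
    prices.append(market_price(forecasts, fundamental=60.0, positive=True))

print([round(p, 2) for p in prices[-5:]])
```

The key design point is the feedback loop itself: each agent's forecast enters the price map, and the resulting price becomes part of every agent's context at the next step.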
  3. By: Weiyao Meng; John Harvey; James Goulding; Chris James Carter; Evgeniya Lukinova; Andrew Smith; Paul Frobisher; Mina Forrest; Georgiana Nica-Avram
    Abstract: Reading and evaluating product reviews is central to how most people decide what to buy and consume online. However, the recent emergence of Large Language Models and Generative Artificial Intelligence now means writing fraudulent or fake reviews is potentially easier than ever. Through three studies we demonstrate that (1) humans are no longer able to distinguish between real and fake product reviews generated by machines, averaging only 50.8% accuracy overall, essentially the same as would be expected by chance alone; (2) that LLMs are likewise unable to distinguish between fake and real reviews and perform equivalently badly or even worse than humans; and (3) that humans and LLMs pursue different strategies for evaluating authenticity which lead to equivalently bad accuracy, but different precision, recall and F1 scores, indicating they perform worse at different aspects of judgment. The results reveal that review systems everywhere are now susceptible to mechanised fraud if they do not depend on trustworthy purchase verification to guarantee the authenticity of reviewers. Furthermore, the results provide insight into the consumer psychology of how humans judge authenticity, demonstrating there is an inherent 'scepticism bias' towards positive reviews and a special vulnerability to misjudge the authenticity of fake negative reviews. Additionally, results provide a first insight into the 'machine psychology' of judging fake reviews, revealing that the strategies LLMs take to evaluate authenticity radically differ from those of humans, in ways that are equally wrong in terms of accuracy, but different in their misjudgments.
    Date: 2025–06
    URL: https://d.repec.org/n?u=RePEc:arx:papers:2506.13313
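The paper's point that equal accuracy can mask very different error profiles is easy to see in a toy example. The label vectors below are invented for illustration and are not the study's data.

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

truth   = [1, 1, 1, 1, 0, 0, 0, 0]    # 1 = fake review, 0 = real (made-up labels)
human   = [1, 1, 1, 0, 1, 1, 1, 0]    # sceptical judge: flags many reviews as fake
machine = [1, 0, 0, 0, 0, 0, 0, 1]    # credulous judge: rarely flags anything

for name, pred in [("human", human), ("machine", machine)]:
    print(name,
          f"acc={accuracy_score(truth, pred):.2f}",
          f"prec={precision_score(truth, pred):.2f}",
          f"rec={recall_score(truth, pred):.2f}",
          f"f1={f1_score(truth, pred):.2f}")
# Both judges score 0.50 accuracy (chance level), but recall differs: 0.75 vs 0.25.
```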
  4. By: Joshua S. Gans
    Abstract: Building on recent advances in the literature on knowledge creation and innovation (notably Carnehl and Schneider (2025)), we propose a novel general equilibrium model that explicitly incorporates artificial intelligence (AI) as a decision-enhancing technology capable of interpolating between known points of knowledge. Our framework formalises the trade-off between AI’s coverage (its ability to span wider knowledge gaps) and its accuracy, and reveals the surprising result that, beyond producing immediate productivity gains, AI fundamentally alters the novelty of research. Specifically, when AI systems offer sufficiently broad coverage, they incentivise exploratory research that taps into novel, distant areas of knowledge and accelerates long-run growth; conversely, limited coverage promotes incremental research that may boost short-term efficiency while dampening the overall advancement of new ideas. Moreover, our analysis uncovers that the type of knowledge (whether novel or dense) plays a critical role in determining both the growth and welfare implications of AI, charting a new path for understanding how knowledge influences research strategies. By also examining the roles of market structure, licensing arrangements, and regulatory frameworks, our work contributes new, policy-relevant insights that reconcile the immediate benefits of AI adoption with the demands of sustainable long-term economic expansion.
    JEL: O30 O31 O40
    Date: 2025–06
    URL: https://d.repec.org/n?u=RePEc:nbr:nberwo:33907
  5. By: Daron Acemoglu; Asuman Ozdaglar; James Siderius
    Abstract: We consider the political consequences of the use of artificial intelligence (AI) by online platforms engaged in social media content dissemination, entertainment, or electronic commerce. We identify two distinct but complementary mechanisms, the social media channel and the digital ads channel, which together and separately contribute to the polarization of voters and consequently the polarization of parties. First, AI-driven recommendations aimed at maximizing user engagement on platforms create echo chambers (or “filter bubbles”) that increase the likelihood that individuals are not confronted with counter-attitudinal content. Consequently, social media engagement makes voters more polarized, and then parties respond by becoming more polarized themselves. Second, we show that party competition can encourage platforms to rely more on targeted digital ads for monetization (as opposed to a subscription-based business model), and such ads in turn make the electorate more polarized, further contributing to the polarization of parties. These effects do not arise when one party is dominant, in which case the profit-maximizing business model of the platform is subscription-based. We discuss the impact regulations can have on the polarizing effects of AI-powered online platforms.
    JEL: L10 M37 P40
    Date: 2025–06
    URL: https://d.repec.org/n?u=RePEc:nbr:nberwo:33892
  6. By: Aghion, Philippe; Bunel, Simon; Jaravel, Xavier; Mikaelsen, Thomas; Roulet, Alexandra; Søgaard, Jakob
    Abstract: Using French firm-level data on AI adoption from 2017-2020, we find that, first, firms adopting AI are larger and more productive and skill intensive. Second, difference-in-difference estimates reveal an increase in firm-level employment and sales after AI adoption, suggesting that the induced productivity gains allow firms to grow and outweigh potential displacement effects. Third, occupations classified in recent work as substitutable with AI expand. Fourth, AI usage is a relevant dimension of heterogeneity in the labor demand response: We find positive employment growth for certain uses (e.g., information and communications technology security) and negative for others (e.g., administrative processes).
    JEL: R14 J01
    Date: 2025–05–31
    URL: https://d.repec.org/n?u=RePEc:ehl:lserod:128375
  7. By: Alex Farach; Alexia Cambon; Jared Spataro
    Abstract: As the digital economy grows increasingly intangible, traditional productivity measures struggle to capture the true economic impact of artificial intelligence (AI). AI systems capable of cognitive work significantly enhance productivity, yet their contributions remain obscured within the residual category of Total Factor Productivity (TFP). This paper explores whether it is time for a conceptual shift to explicitly recognize "digital labor," the autonomous cognitive capability of AI, as a distinct factor of production alongside capital and human labor. We outline the unique economic properties of digital labor, including scalability, intangibility, self-improvement, rapid obsolescence, and elastic substitutability. By integrating digital labor into growth models (such as those by Solow and Romer), we demonstrate strategic implications for business leaders, including new approaches to productivity tracking, resource allocation, investment strategy, and organizational design. Ultimately, treating digital labor as an independent factor offers a clearer view of economic growth and helps organizations manage AI's transformative potential.
    Date: 2025–05
    URL: https://d.repec.org/n?u=RePEc:arx:papers:2505.09408
  8. By: Tobias Schmidt; Kai-Robin Lange; Matthias Reccius; Henrik Müller; Michael Roos; Carsten Jentsch
    Abstract: As interest in economic narratives has grown in recent years, so has the number of pipelines dedicated to extracting such narratives from texts. Pipelines often employ a mix of state-of-the-art natural language processing techniques, such as BERT, to tackle this task. While effective on foundational linguistic operations essential for narrative extraction, such models lack the deeper semantic understanding required to distinguish extracting economic narratives from merely conducting classic tasks like Semantic Role Labeling. Instead of relying on complex model pipelines, we evaluate the benefits of Large Language Models (LLMs) by analyzing a corpus of Wall Street Journal and New York Times newspaper articles about inflation. We apply a rigorous narrative definition and compare GPT-4o outputs to gold-standard narratives produced by expert annotators. Our results suggest that GPT-4o is capable of extracting valid economic narratives in a structured format, but still falls short of expert-level performance when handling complex documents and narratives. Given the novelty of LLMs in economic research, we also provide guidance for future work in economics and the social sciences that employs LLMs to pursue similar objectives.
    Date: 2025–06
    URL: https://d.repec.org/n?u=RePEc:arx:papers:2506.15041
  9. By: Marcus Buckmann; Quynh Anh Nguyen; Edward Hill
    Abstract: We investigate whether the hidden states of large language models (LLMs) can be used to estimate and impute economic and financial statistics. Focusing on county-level (e.g. unemployment) and firm-level (e.g. total assets) variables, we show that a simple linear model trained on the hidden states of open-source LLMs outperforms the models' text outputs. This suggests that hidden states capture richer economic information than the responses of the LLMs reveal directly. A learning curve analysis indicates that only a few dozen labelled examples are sufficient for training. We also propose a transfer learning method that improves estimation accuracy without requiring any labelled data for the target variable. Finally, we demonstrate the practical utility of hidden-state representations in super-resolution and data imputation tasks.
    Date: 2025–05
    URL: https://d.repec.org/n?u=RePEc:arx:papers:2505.08662
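A minimal sketch of the hidden-state probing recipe described above, assuming a particular small open-source model, final-layer last-token pooling, and ridge regression; each of these choices is an illustrative stand-in for the paper's actual setup, and the labelled values are made up.

```python
import numpy as np
import torch
from transformers import AutoModel, AutoTokenizer
from sklearn.linear_model import Ridge

MODEL = "gpt2"  # small stand-in; the paper probes larger open-source LLMs
tok = AutoTokenizer.from_pretrained(MODEL)
model = AutoModel.from_pretrained(MODEL)
model.eval()

def hidden_state(prompt: str) -> np.ndarray:
    """Final-layer hidden state of the last token (one reasonable pooling choice)."""
    inputs = tok(prompt, return_tensors="pt")
    with torch.no_grad():
        out = model(**inputs, output_hidden_states=True)
    return out.hidden_states[-1][0, -1].numpy()

# Toy labelled sample: per the paper, a few dozen labelled examples suffice.
counties = {"Cook County, Illinois": 4.8, "Harris County, Texas": 4.5}  # made-up values
X = np.vstack([hidden_state(f"The unemployment rate in {c} is") for c in counties])
y = np.array(list(counties.values()))

probe = Ridge(alpha=1.0).fit(X, y)  # simple penalised linear read-out
print(probe.predict(X))
```

The contrast the paper draws is between this linear read-out of internal activations and simply parsing the model's generated text, with the former extracting more of what the model "knows".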
  10. By: Qirui Mi; Qipeng Yang; Zijun Fan; Wentian Fan; Heyang Ma; Chengdong Ma; Siyu Xia; Bo An; Jun Wang; Haifeng Zhang
    Abstract: Artificial intelligence (AI) has become a powerful tool for economic research, enabling large-scale simulation and policy optimization. However, applying AI effectively requires simulation platforms for scalable training and evaluation, yet existing environments remain limited to simplified, narrowly scoped tasks, falling short of capturing complex economic challenges such as demographic shifts, multi-government coordination, and large-scale agent interactions. To address this gap, we introduce EconGym, a scalable and modular testbed that connects diverse economic tasks with AI algorithms. Grounded in rigorous economic modeling, EconGym implements 11 heterogeneous role types (e.g., households, firms, banks, governments), their interaction mechanisms, and agent models with well-defined observations, actions, and rewards. Users can flexibly compose economic roles with diverse agent algorithms to simulate rich multi-agent trajectories across 25+ economic tasks for AI-driven policy learning and analysis. Experiments show that EconGym supports diverse and cross-domain tasks, such as coordinating fiscal, pension, and monetary policies, and enables benchmarking across AI, economic methods, and hybrids. Results indicate that richer task composition and algorithm diversity expand the policy space, while AI agents guided by classical economic methods perform best in complex settings. EconGym also scales to 10k agents with high realism and efficiency.
    Date: 2025–06
    URL: https://d.repec.org/n?u=RePEc:arx:papers:2506.12110
  11. By: Zonghan Wu; Junlin Wang; Congyuan Zou; Chenhan Wang; Yilei Shao
    Abstract: Generative AI, particularly large language models (LLMs), is beginning to transform the financial industry by automating tasks and helping to make sense of complex financial information. One especially promising use case is the automatic creation of fundamental analysis reports, which are essential for making informed investment decisions, evaluating credit risks, guiding corporate mergers, etc. While LLMs attempt to generate these reports from a single prompt, the risks of inaccuracy are significant. Poor analysis can lead to misguided investments, regulatory issues, and loss of trust. Existing financial benchmarks mainly evaluate how well LLMs answer financial questions but do not reflect performance in real-world tasks like generating financial analysis reports. In this paper, we propose FinAR-Bench, a solid benchmark dataset focusing on financial statement analysis, a core competence of fundamental analysis. To make the evaluation more precise and reliable, we break this task into three measurable steps: extracting key information, calculating financial indicators, and applying logical reasoning. This structured approach allows us to objectively assess how well LLMs perform each step of the process. Our findings offer a clear understanding of LLMs' current strengths and limitations in fundamental analysis and provide a more practical way to benchmark their performance in real-world financial settings.
    Date: 2025–05
    URL: https://d.repec.org/n?u=RePEc:arx:papers:2506.07315
  12. By: Weixian Waylon Li; Hyeonjun Kim; Mihai Cucuringu; Tiejun Ma
    Abstract: Large Language Models (LLMs) have recently been leveraged for asset pricing tasks and stock trading applications, enabling AI agents to generate investment decisions from unstructured financial data. However, most evaluations of LLM timing-based investing strategies are conducted on narrow timeframes and limited stock universes, overstating effectiveness due to survivorship and data-snooping biases. We critically assess their generalizability and robustness by proposing FINSABER, a backtesting framework evaluating timing-based strategies across longer periods and a larger universe of symbols. Systematic backtests over two decades and 100+ symbols reveal that previously reported LLM advantages deteriorate significantly under a broader cross-section and a longer-term evaluation. Our market regime analysis further demonstrates that LLM strategies are overly conservative in bull markets, underperforming passive benchmarks, and overly aggressive in bear markets, incurring heavy losses. These findings highlight the need to develop LLM strategies that are able to prioritise trend detection and regime-aware risk controls over mere scaling of framework complexity.
    Date: 2025–05
    URL: https://d.repec.org/n?u=RePEc:arx:papers:2505.07078
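The kind of long-horizon backtest the paper advocates can be sketched in a few lines. The moving-average rule below is a toy stand-in for an LLM-generated timing signal, and the synthetic price series stands in for real market data; neither is FINSABER's actual machinery.

```python
import numpy as np

rng = np.random.default_rng(0)
returns = rng.normal(0.0003, 0.01, 5000)           # ~20 years of daily returns
prices = 100 * np.cumprod(1 + returns)

ma = np.convolve(prices, np.ones(200) / 200, mode="valid")  # 200-day average
signal = np.zeros(len(prices))
signal[199:] = prices[199:] > ma                    # in-market above the average

held = np.concatenate(([0.0], signal[:-1]))         # trade on yesterday's signal
strategy = np.cumprod(1 + returns * held)
buy_hold = np.cumprod(1 + returns)
print(f"timing rule: {strategy[-1]:.2f}x   buy-and-hold: {buy_hold[-1]:.2f}x")
```

Lagging the signal by one day avoids look-ahead bias, and running the comparison over the full series against a passive benchmark is exactly the discipline the paper argues short-window LLM evaluations lack.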
  13. By: Sukru Selim Calik; Andac Akyuz; Zeynep Hilal Kilimci; Kerem Colak
    Abstract: Financial literacy is increasingly dependent on the ability to interpret complex financial data and utilize advanced forecasting tools. In this context, this study proposes a novel approach that combines transformer-based time series models with explainable artificial intelligence (XAI) to enhance the interpretability and accuracy of stock price predictions. The analysis focuses on the daily stock prices of the five highest-volume banks listed in the BIST100 index, along with XBANK and XU100 indices, covering the period from January 2015 to March 2025. Models including DLinear, LTSNet, Vanilla Transformer, and Time Series Transformer are employed, with input features enriched by technical indicators. SHAP and LIME techniques are used to provide transparency into the influence of individual features on model outputs. The results demonstrate the strong predictive capabilities of transformer models and highlight the potential of interpretable machine learning to empower individuals in making informed investment decisions and actively engaging in financial markets.
    Date: 2025–06
    URL: https://d.repec.org/n?u=RePEc:arx:papers:2506.06345
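The explainability step generalises beyond transformers. Here is a sketch of the SHAP workflow with a gradient-boosting regressor standing in for the paper's forecasting models, and synthetic lag/indicator features in place of real BIST100 data.

```python
import numpy as np
import pandas as pd
import shap
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(1)
X = pd.DataFrame({
    "lag_1": rng.normal(size=500),          # yesterday's return
    "lag_5": rng.normal(size=500),          # return five days ago
    "rsi_14": rng.uniform(0, 100, 500),     # technical indicator
    "macd": rng.normal(size=500),
})
y = 0.6 * X["lag_1"] + 0.02 * X["rsi_14"] + rng.normal(0, 0.1, 500)  # synthetic target

model = GradientBoostingRegressor().fit(X, y)
explainer = shap.Explainer(model, X)      # picks TreeExplainer for tree models
shap_values = explainer(X.iloc[:50])
print(shap_values.values[0])              # per-feature attributions, first sample
```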
  14. By: David Aristei; Manuela Gallo
    Abstract: This paper investigates the impact of objective financial knowledge, confidence in one's financial skills, and digital financial literacy on individuals' decisions to seek financial advice from robo-advice platforms. Using microdata from the Bank of Italy's survey on adults' "Financial Literacy and Digital Financial Skills in Italy", we find that individuals with greater financial knowledge are less inclined to rely on online services for automated financial advice. By contrast, confidence in one's financial abilities and digital financial literacy enhance the likelihood of using robo-advice services. Trust in financial innovation, the use of digital financial services, and the propensity to take risks and save also emerge as significant predictors of an individual's use of robo-advice. We also provide evidence of a significant complementary relationship between using robo-advisory services and the demand for independent professional human advice. In contrast, a substitution effect is found for non-independent human advice. These findings highlight the importance of hybrid solutions in professional financial consulting, where robo-advisory services complement human financial advice.
    Date: 2025–05
    URL: https://d.repec.org/n?u=RePEc:arx:papers:2505.20527
  15. By: Thiago Christiano Silva; Kei Moriya; Mr. Romain M Veyrune
    Abstract: This paper introduces a classification framework to analyze central bank communications across four dimensions: topic, communication stance, sentiment, and audience. Using a fine-tuned large language model trained on central bank documents, we classify individual sentences to transform policy language into systematic and quantifiable metrics on how central banks convey information to diverse stakeholders. Applied to a multilingual dataset of 74,882 documents from 169 central banks spanning 1884 to 2025, this study delivers the most comprehensive empirical analysis of central bank communication to date. Monetary policy communication changes significantly with inflation targeting, as backward-looking exchange rate discussions give way to forward-looking statements on inflation, interest rates, and economic conditions. We develop a directional communication index that captures signals about future policy rate changes and unconventional measures, including forward guidance and balance sheet operations. This unified signal helps explain future movements in market rates. While tailoring messages to audiences is often asserted, we offer the first systematic quantification of this practice. Audience-specific risk communication has remained stable for decades, suggesting a structural and deliberate tone. Central banks adopt neutral, fact-based language with financial markets, build confidence with the public, and highlight risks to governments. During crises, however, this pattern shifts remarkably: confidence-building rises in communication to the financial sector and government, while risk signaling increases for other audiences. Forward-looking risk communication also predicts future market volatility, demonstrating that central bank language plays a dual role across monetary and financial stability channels. Together, these findings provide novel evidence that communication is an active policy tool for steering expectations and shaping economic and financial conditions.
    Keywords: Central bank communication; large language models; forward guidance; monetary policy; sentiment analysis
    Date: 2025–06–06
    URL: https://d.repec.org/n?u=RePEc:imf:imfwpa:2025/109
  16. By: Timothée Hornek; Amir Sartipi; Igor Tchappi; Gilbert Fridgen
    Abstract: Accurate electricity price forecasting (EPF) is crucial for effective decision-making in power trading on the spot market. While recent advances in generative artificial intelligence (GenAI) and pre-trained large language models (LLMs) have inspired the development of numerous time series foundation models (TSFMs) for time series forecasting, their effectiveness in EPF remains uncertain. To address this gap, we benchmark several state-of-the-art pretrained models (Chronos-Bolt, Chronos-T5, TimesFM, Moirai, Time-MoE, and TimeGPT) against established statistical and machine learning (ML) methods for EPF. Using 2024 day-ahead auction (DAA) electricity prices from Germany, France, the Netherlands, Austria, and Belgium, we generate daily forecasts with a one-day horizon. Chronos-Bolt and Time-MoE emerge as the strongest among the TSFMs, performing on par with traditional models. However, the biseasonal MSTL model, which captures daily and weekly seasonality, stands out for its consistent performance across countries and evaluation metrics, with no TSFM statistically outperforming it.
    Date: 2025–06
    URL: https://d.repec.org/n?u=RePEc:arx:papers:2506.08113
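The biseasonal structure behind the MSTL baseline can be reproduced in outline with statsmodels: hourly day-ahead prices decomposed with daily (24h) and weekly (168h) seasonal components. The synthetic series and the naive reseasonalised forecast below are illustrative, not the authors' exact pipeline.

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.seasonal import MSTL

hours = pd.date_range("2024-01-01", periods=24 * 7 * 8, freq="h")  # eight weeks
t = np.arange(len(hours))
prices = (50
          + 10 * np.sin(2 * np.pi * t / 24)    # daily cycle
          + 5 * np.sin(2 * np.pi * t / 168)    # weekly cycle
          + np.random.default_rng(2).normal(0, 2, len(t)))
series = pd.Series(prices, index=hours)

res = MSTL(series, periods=(24, 168)).fit()    # biseasonal decomposition

# Naive one-day-ahead forecast: last trend level plus the seasonal values
# observed for the same hours one day (and one week) earlier.
forecast = (res.trend.iloc[-1]
            + res.seasonal["seasonal_24"].iloc[-24:].to_numpy()
            + res.seasonal["seasonal_168"].iloc[-168:-144].to_numpy())
print(forecast[:6])
```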
  17. By: Giuseppe Arbia; Luca Morandini; Vincenzo Nardelli
    Abstract: This paper investigates the ability of Large Language Models (LLMs) to assess the economic soundness and theoretical consistency of empirical findings in spatial econometrics. We created original and deliberately altered "counterfactual" summaries from 28 published papers (2005-2024), which were evaluated by a diverse set of LLMs. The LLMs provided qualitative assessments and structured binary classifications on variable choice, coefficient plausibility, and publication suitability. The results indicate that while LLMs can expertly assess the coherence of variable choices (with top models like GPT-4o achieving an overall F1 score of 0.87), their performance varies significantly when evaluating deeper aspects such as coefficient plausibility and overall publication suitability. The results further reveal that the choice of LLM, the specific characteristics of the paper, and the interaction between these two factors significantly influence the accuracy of the assessment, particularly for nuanced judgments. These findings highlight LLMs' current strengths in assisting with initial, more surface-level checks and their limitations in performing comprehensive, deep economic reasoning, suggesting a potential assistive role in peer review that still necessitates robust human oversight.
    Date: 2025–06
    URL: https://d.repec.org/n?u=RePEc:arx:papers:2506.06377
  18. By: Buckmann, Marcus (Bank of England); Hill, Ed (Bank of England)
    Abstract: Text classification tasks such as sentiment analysis are common in economics and finance. We demonstrate that smaller, local generative language models can be effectively used for these tasks. Compared to large commercial models, they offer key advantages in privacy, availability, cost, and explainability. We use 17 sentence classification tasks (each with 2 to 4 classes) to show that penalised logistic regression on embeddings from a small language model often matches or exceeds the performance of a large model, even when trained on just dozens of labelled examples per class – the same amount typically needed to validate a large model’s performance. Moreover, this embedding-based approach yields stable and interpretable explanations for classification decisions.
    Keywords: Text classification; large language models; machine learning; embeddings; explainability
    JEL: C38 C45 C80
    Date: 2025–05–23
    URL: https://d.repec.org/n?u=RePEc:boe:boeewp:1127
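A minimal version of the recipe in item 18, assuming sentence-transformers for the embedding step as a convenient stand-in for the authors' specific choice of small local language model; the toy training set is invented for illustration.

```python
from sentence_transformers import SentenceTransformer
from sklearn.linear_model import LogisticRegression

texts = ["Profits rose sharply this quarter.",
         "The outlook for demand remains bleak.",
         "Margins improved on lower input costs.",
         "Sales collapsed amid weak orders."]           # toy training set
labels = [1, 0, 1, 0]                                    # 1 = positive sentiment

embedder = SentenceTransformer("all-MiniLM-L6-v2")       # small, runs locally
X = embedder.encode(texts)

clf = LogisticRegression(penalty="l2", C=1.0).fit(X, labels)  # penalised logit
print(clf.predict(embedder.encode(["Revenue beat expectations."])))
```

Because the classifier is a linear model over embeddings, its coefficients give stable, inspectable explanations for each decision, which is the explainability advantage the paper emphasises over prompting a large model directly.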

This nep-ain issue is ©2025 by Ben Greiner. It is provided as is without any express or implied warranty. It may be freely redistributed in whole or in part for any purpose. If distributed in part, please include this notice.
General information on the NEP project can be found at https://nep.repec.org. For comments please write to the director of NEP, Marco Novarese at <director@nep.repec.org>. Put “NEP” in the subject, otherwise your mail may be rejected.
NEP’s infrastructure is sponsored by the School of Economics and Finance of Massey University in New Zealand.