| on Artificial Intelligence |
| By: | Pisch, Frank; Rossmann, Vitus; Jussupow, Ekaterina; Ingendahl, Franziska; Undorf, Monika |
| Abstract: | Ever more frequent and intense collaboration with agents based on Large Language Models (LLMs) at work and in daily life raises the question of whether this affects how humans view and treat each other. We conducted a randomized laboratory experiment with 158 participants who collaborated with either a human or an LLM-based assistant to solve a complex language task. Afterwards, we measured whether the type of collaborator influenced participants’ prosocial attitudes (through implicit association tests) and behavior (in dictator games). Interacting with an LLM-based assistant led to a reduction of prosociality, but only for participants who identified as female. A mediation analysis suggests that these findings are due to an erosion of trust in the LLM-based assistant's benevolence in the female subsample. Such spillover effects of collaborating with AI on interactions between humans must feature in the evaluation of the societal consequences of artificial intelligence and warrant further research. |
| Date: | 2025–10–11 |
| URL: | https://d.repec.org/n?u=RePEc:dar:wpaper:158953 |
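The mediation analysis mentioned in the entry above has the structure of a standard single-mediator decomposition. The sketch below shows that generic form only; the variable labels (collaborator type, perceived benevolence, prosocial behavior) are assumptions chosen to match the abstract's narrative, not the authors' actual specification.

```latex
% Generic single-mediator model (illustration only; not the authors' exact specification).
% X = collaborator type (human vs. LLM-based assistant)
% M = hypothesized mediator (e.g., trust in the collaborator's benevolence)
% Y = prosocial outcome (e.g., dictator-game giving)
\begin{align*}
  M &= \alpha_M + a\,X + \varepsilon_M \\
  Y &= \alpha_Y + c'\,X + b\,M + \varepsilon_Y
\end{align*}
% Indirect (mediated) effect: a \cdot b; direct effect: c'; total effect: c = c' + a \cdot b.
```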
| By: | Felipe A. Csaszar; Aticus Peterson; Daniel Wilde |
| Abstract: | Can artificial intelligence outperform humans at strategic foresight -- the capacity to form accurate judgments about uncertain, high-stakes outcomes before they unfold? We address this question through a fully prospective prediction tournament using live Kickstarter crowdfunding projects. Thirty U.S.-based technology ventures, launched after the training cutoffs of all models studied, were evaluated while fundraising remained in progress and outcomes were unknown. A diverse suite of frontier and open-weight large language models (LLMs) completed 870 pairwise comparisons, producing complete rankings of predicted fundraising success. We benchmarked these forecasts against 346 experienced managers recruited via Prolific and three MBA-trained investors working under monitored conditions. The results are striking: human evaluators achieved rank correlations with actual outcomes between 0.04 and 0.45, while several frontier LLMs exceeded 0.60, with the best (Gemini 2.5 Pro) reaching 0.74 -- correctly ordering nearly four of every five venture pairs. These differences persist across multiple performance metrics and robustness checks. Neither wisdom-of-the-crowd ensembles nor human-AI hybrid teams outperformed the best standalone model. |
| Date: | 2026–02 |
| URL: | https://d.repec.org/n?u=RePEc:arx:papers:2602.01684 |
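The link the abstract above draws between a rank correlation and "correctly ordering nearly four of every five venture pairs" can be illustrated with standard rank statistics. The sketch below is generic: the arrays are made-up placeholders, not the paper's data, and the paper's exact correlation measure is not specified here.

```python
# Minimal sketch: relating a predicted ranking to realized outcomes with standard
# rank statistics. The arrays are made-up placeholders, not the paper's data.
import numpy as np
from scipy.stats import spearmanr, kendalltau

predicted_rank = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])  # model's predicted ordering
actual_rank    = np.array([2, 1, 3, 5, 4, 7, 6, 10, 8, 9])  # realized ordering

rho, _ = spearmanr(predicted_rank, actual_rank)   # Spearman rank correlation
tau, _ = kendalltau(predicted_rank, actual_rank)  # Kendall's tau

# For Kendall's tau (no ties), the share of correctly ordered pairs is (tau + 1) / 2.
pairwise_accuracy = (tau + 1) / 2
print(f"Spearman rho = {rho:.2f}, Kendall tau = {tau:.2f}, "
      f"correctly ordered pairs = {pairwise_accuracy:.0%}")
```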
| By: | Adena, Maja; Alabrese, Eleonora; Capozza, Francesco; Leader, Isabelle |
| Abstract: | We test whether AI-generated news images affect outlet demand and trust. In a preregistered experiment with 2,870 UK adults, the same article was paired with a wire-service photo (with/without credit) or a matched AI image (with/without label). Average newsletter demand changes little. Ex-post photo origin recollection is poor, and many believe even the real photo is synthetic. Beliefs drive behavior: thinking the image is AI cuts demand and perceived outlet quality by about 10 p.p., even when the photo is authentic; believing it is real has the opposite effect. Labels modestly reduce penalties but do little to correct mistaken attributions. |
| Keywords: | AI, Demand for News, Trust, Online Experiment |
| JEL: | C81 C93 D83 |
| Date: | 2026 |
| URL: | https://d.repec.org/n?u=RePEc:zbw:wzbiii:336444 |
| By: | Yuhao Fu; Nobuyuki Hanaki |
| Abstract: | This experimental study investigates how people rely on different sources of advice when detecting AI-generated fake news (deepfake news). In a laboratory deepfake detection task, student participants identified the proportion of human-written (non-AI-generated) content in synthetic deepfake news articles and received advice from ChatGPT (GPT-4), human peers, or linguistic experts. The results show that participants rely more on ChatGPT than on human peers when detecting GPT-2-generated deepfake news. Participants also rely more on linguistic experts than on peers, while the relative reliance on experts versus ChatGPT is mixed across experimental waves, potentially reflecting time trends in beliefs about AI-based detection. Moreover, performance improvements reflect the joint role of reliance and advice quality, arising primarily when participants rely on high-quality advice. Overall, relying on AI to detect AI-generated deepfakes can improve detection outcomes, but only when AI-based detection tools are of sufficiently high quality. These findings highlight the dual role of generative AI as both a source of deepfakes and a tool for mitigating related risks. |
| Date: | 2024–03 |
| URL: | https://d.repec.org/n?u=RePEc:dpr:wpaper:1233rr |
| By: | Imke Reimers; Joel Waldfogel |
| Abstract: | With the diffusion of LLMs between 2022 and 2025, new book releases have tripled, raising questions about AI's impact on book quality. We develop a ratings-based usage measure that is comparable across book release vintages, and we find that the vintages from the AI influx period have lower average quality. Yet, the top 1,000 monthly releases per category - albeit not the top 100 - have higher quality than before; and the effect is larger in categories with faster growth in new titles. Authors entering since the LLM influx produce predominantly low-quality work; and the higher-quality output of pre-LLM entrants has risen. A nested logit calibration shows that LLM-enhanced book production could, in steady state, raise the surplus that consumers derive from book markets by a quarter to a half. |
| JEL: | L16 L82 |
| Date: | 2026–01 |
| URL: | https://d.repec.org/n?u=RePEc:nbr:nberwo:34777 |
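The "nested logit calibration" in the entry above refers to a standard discrete-choice framework for valuing consumer surplus. The expression below is the textbook nested-logit log-sum formula, shown only to illustrate the framework; the paper's nesting structure, parameters, and normalizations are not reproduced here.

```latex
% Textbook nested-logit expected consumer surplus (generic illustration, not the
% paper's calibration). \delta_j: mean utility of book j; \lambda_g: within-nest
% parameter for nest g; \alpha: marginal utility of income.
E[CS] \;=\; \frac{1}{\alpha}\,
\ln\!\Bigg(\sum_{g} \Big(\sum_{j \in g} e^{\delta_j/\lambda_g}\Big)^{\lambda_g}\Bigg)
\;+\; \text{const.}
```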
| By: | Joshua S. Gans |
| Abstract: | Machine learning systems embed preferences either in training losses or through post-processing of calibrated predictions. Applying information design methods from Strack and Yang (2024), this paper provides decision-problem-agnostic conditions under which separation (training preference-free and applying preferences ex post) is optimal. Unlike prior work that requires specifying downstream objectives, the welfare results here apply uniformly across decision problems. The key primitive is a diminishing-value-of-information condition: relative to a fixed (normalised) preference-free loss, preference embedding makes informativeness less valuable at the margin, inducing a mean-preserving contraction of learned posteriors. Because the value of information is convex in beliefs, preference-free training weakly dominates for any expected-utility decision problem. This provides theoretical foundations for modular AI pipelines that learn calibrated probabilities and implement asymmetric costs through downstream decision rules. However, separation requires users to implement optimal decision rules. When cognitive constraints bind—as documented in human-AI decision-making—preference embedding can dominate by automating threshold computation. These results provide design guidance: preserve optionality through post-processing when objectives may shift; embed preferences when decision-stage frictions dominate. |
| JEL: | C45 C53 D81 D82 D83 |
| Date: | 2026–01 |
| URL: | https://d.repec.org/n?u=RePEc:nbr:nberwo:34780 |
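The step in the abstract above from "mean-preserving contraction of learned posteriors" to "preference-free training weakly dominates" rests on the standard convexity of the value of information in beliefs. The sketch below states that textbook argument in generic notation; the notation is mine, not the paper's.

```latex
% Standard argument sketch (generic notation, not the paper's).
% For an expected-utility decision problem with actions a and states s,
% the value of holding posterior belief \mu is
V(\mu) \;=\; \max_{a}\; \sum_{s} \mu(s)\, u(a, s),
% which is convex in \mu (a pointwise maximum of linear functions).
% If preference embedding induces a posterior distribution G that is a
% mean-preserving contraction of the preference-free distribution F, then
\mathbb{E}_{\mu \sim F}\big[V(\mu)\big] \;\ge\; \mathbb{E}_{\mu \sim G}\big[V(\mu)\big],
% so preference-free training weakly dominates for every such decision problem.
```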
| By: | Ryan Stevens |
| Abstract: | Generative AI has the potential to transform how firms produce output. Yet, credible evidence on how AI is actually substituting for human labor remains limited. In this paper, we study firm-level substitution between contracted online labor and generative AI using payments data from a large U.S. expense management platform. We track quarterly spending from Q3 2021 to Q3 2025 on online labor marketplaces (such as Upwork and Fiverr) and leading AI model providers. To identify causal effects, we exploit the October 2022 release of ChatGPT as a common adoption shock and estimate a difference-in-differences model. We provide a novel measure of exposure based on the share of spending at online labor marketplaces prior to the shock. Firms with greater exposure to online labor adopt AI earlier and more intensively following the shock, while simultaneously reducing spending on contracted labor. By Q3 2025, firms in the highest exposure quartile increase their share of spending on AI model providers by 0.8 percentage points relative to the lowest exposure quartile, alongside significant declines in labor marketplace spending. Combining these responses yields a direct estimate of substitution: among the most exposed firms, a $1 decline in online labor spending is associated with approximately $0.03 of additional AI spending, implying order-of-magnitude cost savings from replacing outsourced tasks with AI services. These effects are heterogeneous across firms and emerge gradually over time. Taken together, our results provide the first direct, micro-level evidence that generative AI is being used as a partial substitute for human labor in production. |
| Date: | 2026–01 |
| URL: | https://d.repec.org/n?u=RePEc:arx:papers:2602.00139 |
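The design described in the entry above is an exposure-interacted difference-in-differences around a common shock. A generic event-study specification of that form is sketched below; the variable names, omitted period, and absence of controls are illustrative assumptions, not the paper's exact estimating equation.

```latex
% Generic exposure-interacted event-study DiD (illustrative assumptions, not the
% paper's exact specification).
% y_{it}: outcome for firm i in quarter t (e.g., spending share on AI providers)
% Exposure_i: pre-ChatGPT share of spending at online labor marketplaces
y_{it} \;=\; \alpha_i \;+\; \gamma_t \;+\;
\sum_{\tau \neq \text{2022Q3}} \beta_\tau \,\big(\text{Exposure}_i \times \mathbf{1}[t=\tau]\big)
\;+\; \varepsilon_{it}
% \beta_\tau traces the differential response of high- vs. low-exposure firms
% around the October 2022 ChatGPT release.
```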
| By: | Ajay K. Agrawal; John McHale; Alexander Oettl |
| Abstract: | The task-based approach has become the dominant framework for studying the labor-market effects of artificial intelligence (AI), typically emphasizing the replacement of human workers by machines. Motivated by growing empirical evidence that contemporary AI is more often used as a tool that augments workers, this paper develops two related task-based models in which AI enhances worker productivity without automating tasks. Abstracting from capital, both models examine how technological progress in AI that provides new worker-augmenting tools affects aggregate productivity and wage inequality, and both emphasize the role of human capital in intermediating the effects of AI-related technological shocks. In the first model, AI use requires specialized expertise, and technological progress expands the set of tasks for which such expertise is effective. We show that a larger supply of AI expertise amplifies the productivity gains from improvements in AI technology while attenuating its adverse effects on wage inequality. The second model focuses on non-AI skills, allowing AI tools to alter the set of tasks that workers can perform given their skills. In equilibrium, workers allocate across tasks in response to wages, generating an endogenous distribution of skills across the task space. A central result is that aggregate productivity and wage inequality depend on different global properties of this equilibrium distribution: productivity is particularly sensitive to thinly staffed tasks that create bottlenecks, while wage inequality is driven by the concentration of workers in a narrow set of tasks. As a result, improvements in AI tools can induce non-monotonic co-movement between productivity and inequality. By linking these mechanisms to multidimensional human capital, including AI expertise and higher-order non-AI skills, the paper highlights the role of education and training policies in shaping the economic consequences of AI-driven technological change. |
| JEL: | J24 O33 O41 |
| Date: | 2026–01 |
| URL: | https://d.repec.org/n?u=RePEc:nbr:nberwo:34781 |
| By: | Annie Liang; Jay Lu |
| Abstract: | Copyright law focuses on whether a new work is "substantially similar" to an existing one, but generative AI can closely imitate style without copying content, a capability now central to ongoing litigation. We argue that existing definitions of infringement are ill-suited to this setting and propose a new criterion: a generative AI output infringes on an existing work if it could not have been generated without that work in its training corpus. To operationalize this definition, we model generative systems as closure operators mapping a corpus of existing works to an output of new works. AI-generated outputs are permissible if they do not infringe on any existing work according to our criterion. Our results characterize structural properties of permissible generation and reveal a sharp asymptotic dichotomy: when the process of organic creations is light-tailed, dependence on individual works eventually vanishes, so that regulation imposes no limits on AI generation; with heavy-tailed creations, regulation can be persistently constraining. |
| Date: | 2026–02 |
| URL: | https://d.repec.org/n?u=RePEc:arx:papers:2602.12270 |
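The criterion proposed in the entry above has a compact formal reading in the closure-operator language the abstract uses. The statement below is a paraphrase in my own notation, not the authors' exact definitions.

```latex
% Paraphrase of the proposed criterion (notation mine, not the authors' definitions).
% C: corpus of existing works; G: generative system modeled as a closure operator;
% y: an AI-generated output; w: an existing work.
y \text{ infringes on } w \in C
\quad\Longleftrightarrow\quad
y \in G(C) \;\text{ and }\; y \notin G(C \setminus \{w\}),
% i.e., y could not have been generated without w in the training corpus.
% An output y \in G(C) is permissible iff it infringes on no existing work:
y \text{ permissible}
\quad\Longleftrightarrow\quad
y \in G(C \setminus \{w\}) \;\;\text{for every } w \in C.
```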
| By: | Minou Goetze; Sebastian Clajus; Stephan Stricker |
| Abstract: | The present study investigates the psychological and behavioral implications of integrating AI into debt collection practices using data from eleven European countries. Drawing on a large-scale experimental design (n = 3514) comparing human versus AI-mediated communication, we examine effects on consumers' social preferences (fairness, trust, reciprocity, efficiency) and social emotions (stigma, empathy). Participants perceive human interactions as more fair and more likely to elicit reciprocity, while AI-mediated communication is viewed as more efficient; no differences emerge in trust. Human contact elicits greater empathy, but also stronger feelings of stigma. Exploratory analyses reveal notable variation across gender, age groups, and cultural contexts. In general, the findings suggest that AI-mediated communication can improve efficiency and reduce stigma without diminishing trust, but should be used carefully in situations that require high empathy or increased sensitivity to fairness. The study advances our understanding of how AI influences the psychological dynamics in sensitive financial interactions and informs the design of communication strategies that balance technological effectiveness with interpersonal awareness. |
| Date: | 2026–01 |
| URL: | https://d.repec.org/n?u=RePEc:arx:papers:2602.00050 |
| By: | M.Jahangir Alam; Shane Boyle; Huiyu Li; Tatevik Sekhposyan |
| Abstract: | Recent research suggests that generic large language models (LLMs) can match the accuracy of traditional methods when forecasting macroeconomic variables in pseudo out-of-sample settings generated via prompts. This paper assesses the out-of-sample forecasting accuracy of LLMs by eliciting real-time forecasts of U.S. inflation from ChatGPT. We find that out-of-sample predictions are largely inaccurate and stale, even though forecasts generated in pseudo out-of-sample environments are comparable to existing benchmarks. Our results underscore the importance of out-of-sample benchmarking for LLM predictions. |
| Keywords: | large language models; generative AI; inflation forecasting |
| JEL: | C45 E31 E37 |
| Date: | 2026–02–05 |
| URL: | https://d.repec.org/n?u=RePEc:fip:fedfwp:102407 |
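The out-of-sample benchmarking stressed in the entry above amounts to scoring forecasts only against data released after the forecasts were made. The sketch below shows one minimal way to run such a comparison; all series and the benchmark are placeholders of my own, not the paper's data or methodology.

```python
# Minimal sketch of out-of-sample forecast benchmarking: compare the errors of
# real-time forecasts against realized inflation and a simple benchmark.
# All numbers are made-up placeholders, not the paper's data.
import numpy as np

realized     = np.array([3.2, 3.1, 3.4, 2.9])  # realized y/y CPI inflation, %
llm_forecast = np.array([3.7, 3.7, 3.7, 3.7])  # "stale" forecasts repeating old information
naive_bench  = np.array([3.3, 3.2, 3.1, 3.4])  # naive benchmark: last observed value

def rmse(forecast, actual):
    """Root mean squared forecast error."""
    return float(np.sqrt(np.mean((forecast - actual) ** 2)))

print(f"LLM RMSE   = {rmse(llm_forecast, realized):.2f}")
print(f"Naive RMSE = {rmse(naive_bench, realized):.2f}")
```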
| By: | Brandon Yee; Krishna Sharma |
| Abstract: | Behavioral parameters such as loss aversion, herding, and extrapolation are central to asset pricing models but remain difficult to measure reliably. We develop a framework that treats large language models (LLMs) as calibrated measurement instruments for behavioral parameters. Using four models and 24,000 agent-scenario pairs, we document systematic rationality bias in baseline LLM behavior, including attenuated loss aversion, weak herding, and near-zero disposition effects relative to human benchmarks. Profile-based calibration induces large, stable, and theoretically coherent shifts in several parameters, with calibrated loss aversion, herding, extrapolation, and anchoring reaching or exceeding benchmark magnitudes. To assess external validity, we embed calibrated parameters in an agent-based asset pricing model, where calibrated extrapolation generates short-horizon momentum and long-horizon reversal patterns consistent with empirical evidence. Our results establish measurement ranges, calibration functions, and explicit boundaries for eight canonical behavioral biases. |
| Date: | 2026–02 |
| URL: | https://d.repec.org/n?u=RePEc:arx:papers:2602.01022 |
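Loss aversion, the first parameter named in the entry above, is conventionally defined through a kinked prospect-theory value function; the standard textbook form is shown below. The paper's elicitation scenarios and calibration functions are not reproduced, and the benchmark remark in the comment is a general reference point, not a result from the paper.

```latex
% Standard prospect-theory value function used to define loss aversion
% (generic textbook form; not the paper's elicitation or calibration).
v(x) \;=\;
\begin{cases}
  x^{\alpha} & x \ge 0 \\[2pt]
  -\lambda\,(-x)^{\beta} & x < 0
\end{cases}
% \lambda > 1 indicates loss aversion (losses loom larger than equal-sized gains);
% classic human estimates put \lambda in the region of 2, so "attenuated loss
% aversion" corresponds to \lambda closer to 1.
```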
| By: | Keywan Christian Rasekhschaffe |
| Abstract: | We study whether generative AI can automate feature discovery in U.S. equities. Using large language models with retrieval-augmented generation and structured/programmatic prompting, we synthesize economically motivated features from analyst, options, and price-volume data. These features are then used as inputs to a tabular machine-learning model to forecast short-horizon returns. Across multiple datasets, AI-generated features are consistently competitive with baselines, with Sharpe improvements ranging from 14% to 91% depending on dataset and configuration. Retrieval quality is pivotal: better knowledge bases materially improve outcomes. The AI-generated signals are weakly correlated with traditional features, supporting combination. Overall, generative AI can meaningfully augment feature discovery when retrieval quality is controlled, producing interpretable signals while reducing manual engineering effort. |
| Date: | 2026–01 |
| URL: | https://d.repec.org/n?u=RePEc:arx:papers:2602.00196 |
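The "Sharpe improvements ranging from 14% to 91%" reported above are relative changes in the Sharpe ratio of the resulting strategies. The sketch below shows how such a comparison is typically computed; the return moments are placeholders of my own, not results from the paper, and the zero risk-free rate is an assumption.

```python
# Minimal sketch: comparing annualized Sharpe ratios of a baseline strategy and a
# strategy that adds AI-generated features. Numbers are placeholders, not the paper's.
import numpy as np

def annualized_sharpe(mean_daily_return, daily_vol, periods_per_year=252):
    """Annualized Sharpe ratio from daily return moments (risk-free rate assumed zero)."""
    return mean_daily_return / daily_vol * np.sqrt(periods_per_year)

sharpe_baseline  = annualized_sharpe(0.00040, 0.010)  # strategy on baseline features
sharpe_augmented = annualized_sharpe(0.00055, 0.010)  # strategy adding AI-generated features

improvement = sharpe_augmented / sharpe_baseline - 1
print(f"Baseline Sharpe {sharpe_baseline:.2f} -> augmented {sharpe_augmented:.2f} "
      f"({improvement:.0%} improvement)")
```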
| By: | Keil, Samuel; Martin, Pascal; Schiereck, Dirk |
| Abstract: | Announcements of emerging technologies often lead to notable stock market reactions, with Artificial Intelligence standing out due to its transformative potential and growing regulatory attention. Yet, most research on investor responses to AI disclosures focuses on U.S. firms, leaving the distinct European context unexplored. Using a short-term event study of 526 AI-related announcements by STOXX Europe 600 firms between 2015 and 2024, we report a significantly negative average stock return of -0.176% within a three-day window. However, announcements detailing specific AI technologies, involving collaborations with AI specialists, or made after the release of ChatGPT are associated with less negative reactions. In contrast, references to EU regulatory frameworks like the AI Act show no significant effect. Our findings confirm generally negative investor reactions to AI announcements but show that in Europe, strategic factors such as announcement specificity, collaborations, and timing also significantly mitigate these effects. |
| Date: | 2026–01–06 |
| URL: | https://d.repec.org/n?u=RePEc:dar:wpaper:159306 |
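The "-0.176% within a three-day window" reported above is an average cumulative abnormal return from a standard short-term event study. The market-model construction below is the usual textbook version, shown for orientation; the paper's estimation window and benchmark index are not given here.

```latex
% Standard market-model event-study construction (generic sketch; the paper's
% estimation window and benchmark are not reproduced).
% R_{it}: return of firm i on event day t; R_{mt}: market return;
% \hat\alpha_i, \hat\beta_i estimated over a pre-event window.
AR_{it} \;=\; R_{it} - \big(\hat\alpha_i + \hat\beta_i R_{mt}\big), \qquad
CAR_i[-1,+1] \;=\; \sum_{t=-1}^{+1} AR_{it}, \qquad
\overline{CAR} \;=\; \frac{1}{N} \sum_{i=1}^{N} CAR_i[-1,+1].
```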
| By: | J. Ignacio Conde-Ruiz; Clara I. González; Miguel Díaz Salazar |
| Abstract: | This paper combines artificial intelligence with economic modeling to design evaluation committees that are both efficient and fair in the presence of gender differences in economic research orientation. We develop a dynamic framework in which research evaluation depends on the thematic similarity between evaluators and researchers. The model shows that while topic-balanced committees maximize welfare, this gender-neutral allocation is dynamically unstable, leading to the persistent dominance of the group initially overrepresented in evaluation committees. Guided by these predictions, we employ unsupervised machine learning to extract research profiles for male and female researchers from articles published in leading economics journals between 2000 and 2025. We characterize optimal balanced committees within this multidimensional latent topic space and introduce the Gender-Topic Alignment Index (GTAI) to measure the alignment between committee expertise and female-prevalent research areas. Our simulations demonstrate that AI-based committee designs closely approximate the welfare-maximizing benchmark. In contrast, traditional headcount-based quotas often fail to achieve balance and may even disadvantage the groups they intend to support. We conclude that AI-based tools can significantly optimize institutional design for editorial boards, tenure committees, and grant panels. |
| Date: | 2026–02 |
| URL: | https://d.repec.org/n?u=RePEc:fda:fdaddt:2026-01 |
| By: | Walid Siala (SnT, University of Luxembourg, Luxembourg); Ahmed Khanfir (RIADI, ENSI, University of Manouba, Tunisia; SnT, University of Luxembourg, Luxembourg); Mike Papadakis (SnT, University of Luxembourg, Luxembourg) |
| Abstract: | This paper addresses stock price movement prediction by leveraging LLM-based news sentiment analysis. Earlier works have largely focused on proposing and assessing sentiment analysis models and stock movement prediction methods, but typically separately. Although promising results have been achieved, a clear and in-depth understanding of the benefit of news sentiment to this task, as well as a comprehensive assessment of different architecture types in this context, is still lacking. Herein, we conduct an evaluation study that compares three LLMs, namely DeBERTa, RoBERTa and FinBERT, for sentiment-driven stock prediction. Our results suggest that DeBERTa outperforms the other two models with an accuracy of 75% and that an ensemble model combining the three can increase the accuracy to about 80%. Also, we see that news sentiment features can (slightly) benefit some stock market prediction models, namely LSTM-, PatchTST- and tPatchGNN-based classifiers and PatchTST- and TimesNet-based regression models. |
| Date: | 2026–01 |
| URL: | https://d.repec.org/n?u=RePEc:arx:papers:2602.00086 |
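The accuracy gain from combining the three sentiment models in the entry above is the usual benefit of ensembling. The sketch below shows one simple, generic way to combine three classifiers' predicted probabilities (soft voting); the paper's actual ensembling scheme, features, and data are not shown, and every number below is a placeholder.

```python
# Minimal sketch of combining three sentiment classifiers by averaging predicted
# probabilities (soft voting). Generic illustration only; not the paper's method,
# and all numbers are placeholders.
import numpy as np

# Predicted probability of "positive sentiment" for 5 news items from 3 models
# (think DeBERTa-, RoBERTa-, and FinBERT-style classifiers).
probs = np.array([
    [0.80, 0.30, 0.55, 0.20, 0.90],  # model 1
    [0.70, 0.40, 0.60, 0.35, 0.85],  # model 2
    [0.75, 0.25, 0.45, 0.30, 0.95],  # model 3
])

ensemble_prob = probs.mean(axis=0)        # soft vote: average the probabilities
ensemble_label = ensemble_prob >= 0.5     # threshold at 0.5 for a positive call
print(ensemble_prob.round(2), ensemble_label)
```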