|
on Artificial Intelligence |
| By: | Abel, Martin (Bowdoin College); Dawi, Raghad (Bowdoin College); Lenk, Tyler (Bowdoin College); Singer, Aidan (Bowdoin College) |
| Abstract: | How do workers respond when artificial intelligence replaces human judgment in evaluating prosocial work? Partnering with a non-profit addressing food insecurity, we recruit 1,491 U.S. volunteers to write fundraising messages and cross-randomize evaluation by humans versus AI and the presence of performance pay. AI evaluation reduces effort by 11–14 percent among volunteers with low commitment to the cause, while having no effect on those strongly aligned with the mission. Performance pay fails to mitigate these adverse effects. Workers perceive AI as less effective at identifying quality, which appears to be the primary mechanism, and as less fair and transparent than human evaluation. Introducing an AI algorithm that explicitly applies human evaluation criteria does not mitigate these negative effects, suggesting that resistance to AI evaluation reflects deeper skepticism about machines' capacity for subjective judgment. |
| Keywords: | algorithm aversion, algorithmic management, artificial intelligence, intrinsic motivation, worker effort |
| JEL: | J24 M54 |
| Date: | 2026–05 |
| URL: | https://d.repec.org/n?u=RePEc:iza:izadps:dp18678 |
| By: | Kasberger, Bernhard; Martin, Simon; Normann, Hans-Theo; Werner, Tobias |
| Abstract: | Reinforcement learning algorithms play an increasingly important role in economic situations. These situations are often strategic, and the artificial intelligence may or may not be cooperative. We compare human and algorithmic cooperation rates in the infinitely repeated two-player prisoner's dilemma and study which strategies they choose to cooperate and punish deviations. Through a sequence of computational Q-learning and human-player experiments, we find that our Q-learning algorithms tend to cooperate less than humans, particularly when cooperation is risky or not incentive-compatible. Algorithms often use different strategies than humans, leading to distinct on- and off-path behavior. |
| Keywords: | Artificial intelligence, cooperation, Q-learning, repeated prisoner's dilemma |
| JEL: | C72 C73 C92 D83 |
| Date: | 2026 |
| URL: | https://d.repec.org/n?u=RePEc:zbw:dicedp:341427 |
| By: | Pallavi Pal; Anjana Susarla |
| Abstract: | Online advertising platforms host hundreds of thousands of A/B tests, but the platform's delivery algorithm routes each creative to the audience it predicts will engage. Every two-arm test therefore conflates the creative's effect with the algorithm's targeting response, and adjusting for the realized audience is biased because audience is a post-treatment mediator. We propose a three-arm design that adds an arm exposing the algorithm to the treatment metadata while holding the user-facing creative identical to control, point-identifying the natural indirect (algorithmic) and direct (creative) effects without sequential ignorability. In a live Meta campaign with a women-targeted text fragment, the algorithmic channel raises female impression share by +2.07 ppt while the creative channel moves it by -0.68 ppt; roughly three-quarters of the absolute reallocation is algorithmic, and a conventional two-arm test understates the algorithmic channel by a factor of two. The design isolates the contribution of the platform's algorithm to the outcome, which is separable from that of the creative content. |
| Date: | 2026–05 |
| URL: | https://d.repec.org/n?u=RePEc:arx:papers:2605.23706 |
| By: | Masahiro Kato |
| Abstract: | We propose a model-grounded RAG-based AI economist with an agentic framework for economic scenario analysis using large language models (LLMs) and knowledge graphs. While LLMs can generate fluent economic narratives, economists are often required to make economic claims grounded by economic theory and real-world data. Based on this motivation, this study proposes an RAG-based AI economist, which utilizes knowledge graphs including economic data and theory and LLM-based agents to plan the analysis, retrieve relevant evidence, select appropriate models, and generate reports. In our framework, we do not produce quantitative claims directly with the language model alone; instead, we generate narratives grounded in explicit model-based computations and linked to the retrieved evidence via AI agents. We refer to our framework as an AI economist agent. We evaluate the AI economist agent in two applications: economist report generation for U.S. inflation persistence and Federal Reserve policy, and bank stress-test narrative generation for U.S. commercial real estate refinancing stress. The results illustrate how grounding the generated reports improves their economic coherence and traceability. |
| Date: | 2026–06 |
| URL: | https://d.repec.org/n?u=RePEc:arx:papers:2606.20041 |
| By: | Yan Dai; Maryam Farboodi; Negin Golrezaei; Sepehr Shahshahani |
| Abstract: | How can we design a market of human-generated content for use in training AI models that both enables technological progress and preserves individual incentives for high-quality content creation? Existing approaches take polar positions: a "free-for-all" model based on fair use and a "strong intellectual property rights" model. We show that both fail: Free-for-all does not compensate creators, and -- by modeling as a static Stackelberg game -- strong intellectual property rights also underpower creative incentives. We find this especially true for more innovative creators, a phenomenon we term the "originality penalty." Extending this insight to a dynamic model, we find another market failure undermining AI model performance, even for an initially good model: Such a model induces greater reliance by humans on AI-assisted creation, resulting in homogenized content feeding back into training, which degrades the model performance -- a "curse of precision." We further propose a market design with a data intermediary internalizing cross-creator externalities and subsidizing innovative contributions, thereby restoring efficiency. |
| Date: | 2026–06 |
| URL: | https://d.repec.org/n?u=RePEc:arx:papers:2606.12260 |
| By: | Valerio Fedele Addis; Giuseppe Attanasi; Giovanni Di Bartolomeo; Michele Mariella; Valentina Peruzzi |
| Abstract: | We study whether a large language model can reliably evaluate human creativity in constrained, innovation-like tasks. Using expert-generated creative outputs from a validated experiment with workers in cultural and creative industries, we embed ChatGPT as an evaluator and benchmark its assessments against expert human judgments obtained through the Consensual Assessment Technique. In Study 1, we show that AI-based creativity evaluations exhibit internal consistency comparable to that of expert judges across repeated and independent runs, even under conservative scenarios. Replacing a human judge with an AI evaluator does not reduce inter-rater reliability across drawing, mathematical, and verbal tasks. In Study 2, we find that AI evaluations are systematically structured along fluency, flexibility, originality, and elaboration, with task-specific weighting of these dimensions. Overall, the results indicate that AI can serve as a reliable and structured evaluator of creativity in constrained innovation environments. |
| Keywords: | Artificial intelligence, Creativity evaluation, Constrained creativity tasks, Consensual Assessment Technique, Cultural-and-creative-industry professionals, Innovation-like tasks |
| JEL: | O31 D83 M14 C91 |
| Date: | 2026 |
| URL: | https://d.repec.org/n?u=RePEc:ter:wpaper:00197 |
| By: | Shan Huang; Renke Schmacker; Hannes Ullrich |
| Abstract: | AI can raise productivity by extracting information from rich data, yet little is known about how experts weigh AI-generated signals against established decision-support tools. We conduct a nationwide survey experiment with 372 Danish primary care physicians (21.5% of all clinics), who make diagnostic and treatment decisions on urinary tract infection vignettes before and after receiving a diagnostic signal. Holding accuracy constant, we randomize between-subjects whether the signal appears as an AI prediction or a commonly used dipstick test result. Physicians update beliefs 41% less in response to AI than to dipstick signals, consistent with AI skepticism. Roughly one-third of physicians ignore the AI tool; linked administrative data show that these non-adopters resemble adopters on a range of observables, including clinical practice and prescribing measures, except for lower baseline technology use at their clinics. When physicians use the AI tool, they ignore asymmetry in informativeness between positive and negative signals and, when shown both the AI and a redundant signal, exhibit correlation neglect. These frictions in information processing lead to increased antibiotic prescribing with the AI signal. Our findings highlight the importance of training and information design for AI implementation. |
| Keywords: | expert decision-making, artificial intelligence, healthcare, mental models |
| JEL: | I11 D81 D83 J24 O33 |
| Date: | 2026 |
| URL: | https://d.repec.org/n?u=RePEc:diw:diwwpp:dp2168 |
| By: | Lukas Althoff; Hugo Reichardt |
| Abstract: | Artificial intelligence (AI) reshapes workers’ comparative advantage by altering the tasks they perform and the skills those tasks require. We develop a dynamic task-based model to quantify the general-equilibrium effects of task-specific technical change. Workers have multidimensional skills, choose occupations, and accumulate skills on the job; occupations combine tasks, and productivity depends on how workers’ skills match task requirements. We develop a computationally efficient procedure to estimate the model using panel data and a new database of task-level skill requirements. We apply the model to AI, allowing it to augment, automate, and simplify tasks. We find that AI narrows wage inequality and raises average wages across scenarios ranging from slow to rapid AI progress. The key equalizing force is simplification: by lowering tasks’ skill requirements, AI lets lower-skill workers compete for previously inaccessible jobs. Adoption costs, highest for lower-skill workers, dampen but do not eliminate the decline in inequality. |
| JEL: | C6 C8 D2 D58 E20 J20 J3 J6 O3 O4 |
| Date: | 2026–06 |
| URL: | https://d.repec.org/n?u=RePEc:nbr:nberwo:35353 |
| By: | Westby, Samuel (Northeastern University); Modestino, Alicia (Northeastern University); Cheng, Peiran (Northeastern University) |
| Abstract: | Generative AI may change how firms define occupations. We study this process in software development, where large language models overlap with tasks commonly assigned to junior workers. Using the near-universe U.S. online vacancy data from Lightcast, we examine how the public release of ChatGPT changed entry-level software hiring standards. Event-study and difference-in-differences estimates show a 14–15 percent relative decline in junior versus senior software developer vacancies, larger than in related technical occupations and absent in mechanical engineering. A shift-share decomposition shows that rising experience requirements were driven primarily by employers asking for more experience within the same job titles, not by asking for a different composition of titles. Remaining junior vacancies shifted toward problem solving, communication, and attention to detail, not AI-specific skills. The results show how generative AI redefines entry-level work by raising the bar for what counts as a qualified junior hire. |
| Keywords: | generative AI, economics of information systems, labor demand, job vacancies, hiring standards, entry-level work |
| JEL: | J23 O33 J24 D83 M51 L86 |
| Date: | 2026–06 |
| URL: | https://d.repec.org/n?u=RePEc:iza:izadps:dp18723 |
| By: | Xiliu He; Haoxiang Zhao; Mingyi Ma; Edward Wen Chuan Lai; Koei Enomoto; Anni Hu; Jiatong Li; Lingyun Chu; Yuan Lai |
| Abstract: | Generative artificial intelligence (GenAI) is the first automation wave to reach high-cognitive tasks at scale, yet its effects on intra-urban inequality remain largely unknown. Using 5 million job postings from Beijing (2018–2024), we construct a neighborhood-level GenAI Exposure Index by aggregating task-level assessments from five leading large language models. We examine the spatial, structural and causal mechanisms of this shock. We find that GenAI exposure is highly concentrated in the city's core districts, deepening the intra-urban AI divide. Since 2023, high-exposure neighborhoods have experienced wage stagnation even as they continue to attract high-skilled workers -- a "high-skill trap." This wage penalty is driven by task de-skilling and intensified labor-market crowding. A difference-in-differences design centered on ChatGPT's release supports a causal interpretation. These findings challenge the prevailing theory of skill-biased technological change and provide a basis for inclusive AI governance in global technology hubs. |
| Date: | 2026–05 |
| URL: | https://d.repec.org/n?u=RePEc:arx:papers:2605.25505 |
| By: | Lindenlaub, Ilse; Oh, Ryungha; Rodriguez, Maria Alejandra; Veldkamp, Laura |
| Abstract: | We document and explain the gap between measures of AI exposure and measures of AI adoption in the workplace. This leads us to propose a new AI adoption index based on comparative advantage. Using the representative German DiWaBe employee survey linked to worker and establishment information, we compare worker-reported AI use to prominent exposure measures and find that the relationship is weak. Motivated by this gap, we develop a framework in which adoption depends not only on technical feasibility—AI’s absolute advantage measured by exposure—but also on profitability—AI’s comparative (dis)advantage relative to a specific worker—balancing AI productivity against AI user costs and worker productivity against wages. We operationalize this framework at the task level by (i) estimating worker productivity relative to pay, (ii) mapping exposure indices into AI productivity, and (iii) inferring task-specific AI user costs from revealed-preference adoption. The resulting occupation-level index accounts for 60% of the cross-occupation variation in observed AI adoption, compared with 14% for an exposure-only model. The two approaches diverge substantially for approximately 30% of workers, highlighting that comparative advantage—not exposure alone—is crucial for assessing AI’s labor-market impact. |
| Keywords: | Artificial intelligence; Comparative advantage; Technology diffusion; Worker productivity |
| JEL: | E24 D24 J24 O33 |
| Date: | 2026–06 |
| URL: | https://d.repec.org/n?u=RePEc:cpr:ceprdp:21589 |
| By: | Payal Malik (Indian Council for Research on International Economic Relations (ICRIER)); Nikita Jain |
| Abstract: | This paper examines AI–labour interactions in India through a structural and a task-based lens. Using Indian databases such as KLEMS, PLFS, and NCO classifications, the analysis groups the economy into four structural categories and applies the productivity, inclusivity, and entrepreneurship framework to assess sector-wise AI exposure and adjustment capacity. The findings suggest that AI diffusion in India will be uneven. In employment-intensive, low-productivity sectors, displacement risks remain limited for now, but exclusion from AI-enabled productivity gains is a concern. In manufacturing, AI creates opportunities for upgrading through the servicification of production alongside selective automation. In knowledge-intensive services, AI may augment human capabilities, but this may require high-level skilling of labour, shifting the nature of jobs. In public and social sectors, AI may lead to huge productivity gains. This paper argues for an AI strategy that prioritises diffusion and broad-based participation through coordinated investments in skills, infrastructure, and institutions. |
| Keywords: | Artificial intelligence, labour markets, task-based analysis, productivity, inclusivity, entrepreneurship, AI-for-labour, IPCIDE, icrier |
| Date: | 2026–02 |
| URL: | https://d.repec.org/n?u=RePEc:bdc:ppaper:ipcide-10 |
| By: | Dennis Facius; Roberto Iacono |
| Abstract: | Does Generative AI displace early-career workers? We provide population-wide evidence from Norwegian administrative registers, 2015 through March 2025, exploiting the November 2022 release of ChatGPT as an availability shock. Using the within-firm composition difference-in-differences employed in recent work, supplemented with a synthetic difference-in-differences at the occupation level and a firm-level shift-share design, we find no robust evidence of employment displacement among young workers in highly AI-exposed occupations, nor any robust response across other age cohorts or on incumbent labor-market outcomes. While estimated coefficients for young workers are negative, in line with the existing literature, they are small and statistically insignificant. A backdating exercise on the synthetic difference-in-differences yields larger absolute estimates than the actual treatment date across most age bands. This suggests the apparent post-2022 decline reflects, at least in part, pre-existing secular trends rather than a clean AI-period break. |
| Keywords: | generative artificial intelligence, large language models, automation, labor demand |
| JEL: | J23 J24 J31 O33 |
| Date: | 2026 |
| URL: | https://d.repec.org/n?u=RePEc:ces:ceswps:_12752 |
| By: | Krzywdzinski, Martin |
| Abstract: | This paper investigates how software developers perceive the current and future automation of their work in the context of rapidly advancing generative and agentic AI. While existing research has primarily focused on productivity effects of specific AI coding tools in experimental settings, less is known about the broader organization of software-development work, the limits of automation, and developers' own expectations regarding labor-market outcomes. The paper addresses four research questions: the current level of automation across software-development tasks and occupations; expectations regarding future automation and its drivers; structural limits to automation; and perceived implications for job security, employability, and income. The analysis draws on an original survey of 1,731 software developers from eleven countries and six professional subgroups. The findings show that software development is currently characterized by moderate automation across all task domains, with humans still central to planning, coordination, and problem-solving. Respondents expect substantial increases in automation over the next five years, driven primarily by generative and agentic AI. However, the study also identifies important limits to automation: as automation increases, remaining tasks become less standardized, while problem-solving and collaboration demands persist. Finally, most developers remain cautiously optimistic about their labor-market prospects, although workers already operating in highly automated environments express significantly greater concerns about future job security. |
| Keywords: | automation, artificial intelligence, skills, work organization, software development, programming |
| JEL: | J22 J24 J44 L86 O33 |
| Date: | 2026 |
| URL: | https://d.repec.org/n?u=RePEc:zbw:wzbgwp:341630 |
| By: | Christopher Campos; John D. Singleton |
| Abstract: | Although use of generative AI tools has quickly become widespread in education settings, emerging evidence suggests that effects on learning will depend on how that use is supported and guided. This paper reports findings from an original national survey of K-12 school principals designed to measure institutional integration of AI in schools through policies, teacher training, guidance for student use, leadership engagement, and the availability of AI-enabled tools. We find that AI use has spread rapidly across schools, largely as a productivity aid. Students mainly use AI for homework help and writing, while educators primarily use it for lesson planning and administrative tasks. The development of teacher training, guidance, and school policies has lagged adoption. We next document two diffusion gaps across schools: First, lower AI integration is associated with a higher share of disadvantaged students (a one standard deviation increase in disadvantage is associated with a 0.07-0.11 SD lower score on an index of AI integration); Second, private and charter schools score 0.23-0.44 SD lower on the AI integration index than traditional public schools. Although several surveyed school-level factors strongly predict AI integration, they do little to explain these gaps. Differences in district size account for roughly one-third of the disadvantage gap between public schools. These findings suggest that the factors associated with greater AI integration differ from those needed to narrow disparities in how schools support and guide AI use. |
| JEL: | I21 O30 O32 |
| Date: | 2026–06 |
| URL: | https://d.repec.org/n?u=RePEc:nbr:nberwo:35347 |
| By: | Strömberg, David; Lei, Victor; Wu, Yanhui |
| Abstract: | Using 30 months of panel data on 26,811 Chinese students in grades 7–12, we study how generative AI affects homework productivity and learning. The data combine monthly closed-book exams, high-school and college entrance exams, and homework scores and completion time across nine subjects. We exploit staggered AI adoption in a difference-in-differences design. AI adoption raises homework scores by 18% and reduces completion time by 30%, but lowers monthly exam scores by 20% within six months. High-stakes entrance-exam scores fall by 18 and 24%, with the full penalty emerging only after about two years. The losses are largest in social science subjects, followed by STEM and languages, and are especially large for junior students, high-achieving students, and boys. The learning losses are concentrated among roughly 80% of AI users whose behavior is consistent with homework outsourcing, as indicated by exceptionally short homework completion time coupled with high homework scores. AI users who maintain similar homework completion time as non-AI users experience small learning losses. |
| Keywords: | Generative AI; Education; Human capital accumulation; China shock |
| JEL: | O15 I20 O33 |
| Date: | 2026–06 |
| URL: | https://d.repec.org/n?u=RePEc:cpr:ceprdp:21577 |
| By: | Korinek, Anton; McKelvey, Patrick |
| Abstract: | We construct a macroeconomic estimate of total AI production for the United States, combining inference and R&D/training activities and applying quality adjustments based on the evolution of API prices at fixed performance levels and the pace of algorithmic progress. We estimate that nominal AI compute spending grew over 140 percent per year each in 2024 and 2025, raw compute capacity grew over 200 percent per year, and quality-adjusted AI output grew over 2,000 percent per year. These growth rates reflect three compounding forces: expanding data-center capacity, continued improvements in chip efficiency, and rapid algorithmic progress. We then employ our estimates to develop a nascent framework for “AI GDP” that tracks the AI economy as a coherent whole rather than dispersed across standard industry classifications. Quality-adjusted AI GDP grew by more than 2,500 percent each in 2024 and 2025. Our measures complement traditional national accounts by providing visibility into a fast-moving sector whose activity is difficult to isolate in existing statistics, and they may serve as building blocks for satellite accounts that track AI’s growing role in the economy. |
| JEL: | E01 O33 O47 E22 |
| Date: | 2026–06 |
| URL: | https://d.repec.org/n?u=RePEc:cpr:ceprdp:21571 |
| By: | John R. Graham; Campbell R. Harvey; Manish Jha |
| Abstract: | Business sentiment is a closely watched economic signal, but measuring it is slow and costly: surveys reach only a few hundred firms, arrive periodically, and take time to compile. We show that large language models hold the potential to address these shortcomings. We prompt an LLM to role-play as the CFO of a specific company at a specific date and focus on the economic-optimism question on the Duke-Federal Reserve CFO Survey over 2002-2025. We find that the LLM reproduces individual human responses: the predicted optimism score significantly forecasts the CFO's actual answer, surviving firm and year-quarter fixed effects and a control for the most recent prior response. Predictive accuracy increases with the amount of information supplied, as both respondent history and firm characteristics improve fit, and the relationship persists under quarterly aggregation. With appropriate conditioning, LLMs may be able to serve as credible digital twins of executives, offering scalable, high-frequency expectations data for financial research and policy. |
| Date: | 2026–06 |
| URL: | https://d.repec.org/n?u=RePEc:arx:papers:2606.13812 |
| By: | David R. Agrawal; William F. Fox |
| Abstract: | This paper examines how artificial intelligence (AI) reshapes subnational public finance, largely through familiar channels observed from prior technological change. Although some effects are novel, many issues surrounding the taxation of AI-related income and consumption parallel earlier challenges from e-commerce, digitalization, and remote work. AI shifts income from labor toward capital and reallocates tax bases toward consumption and market-based activity, raising questions such as the sales tax treatment of digital services. For governments, AI relaxes long-standing informational and administrative constraints in taxation, enforcement, budgeting, and service delivery, while strengthening scale economies. Cost reductions depend critically on labor-intensive sectors like K-12 education. However, government AI use may advantage larger jurisdictions with greater data access, raising equity and transparency concerns and increasing the value of interstate cooperation to harness scale economies from more data. Overall, AI reinforces—rather than overturns—the classic trade-offs emphasized in the fiscal federalism literature. |
| Keywords: | artificial intelligence, state and local public finance, digital services, economies of scale, federalism |
| JEL: | C55 H71 H72 H77 J45 |
| Date: | 2026 |
| URL: | https://d.repec.org/n?u=RePEc:ces:ceswps:_12716 |
| By: | Yiqing Wang; Dehao Dai; Ding Ma; Kerui Geng |
| Abstract: | We test whether large language models (LLMs) add value in commodity portfolio construction when the information set and implementation rules are held fixed across strategies. A Hawkish Agent (inflation-tightening prior), a Dovish Agent (growth-easing prior), a Debate Agent, and a deterministic z-score Rule Agent each receive identical FRED macro z-scores and route their tilt signals through the same portfolio engine. Across 124 weekly rebalancing dates spanning the 2023 U.S. rate peak and the 2024-2025 soft landing, all three LLM strategies outperform the Rule Agent in Sharpe terms; the Hawkish and Debate Agents record the largest gains (\Delta Sharpe = +0.044 and +0.040, both p |
| Date: | 2026–06 |
| URL: | https://d.repec.org/n?u=RePEc:arx:papers:2606.08283 |
| By: | Junyi Yao; Zihao Zheng |
| Abstract: | Large language models (LLMs) and agentic systems are increasingly proposed for financial trading, yet their reported performance remains difficult to compare because studies vary in data provenance, temporal split discipline, execution timing, turnover treatment, and transaction-cost modeling. This article presents a targeted topical review and reproducibility audit of execution realism in LLM-based trading research. A coded evidence matrix covering 30 trade-relevant primary studies is used to assess point-in-time controls, split transparency, held-out evaluation, cost and turnover treatment, execution semantics, universe definition, and artifact release. Across the audited sample, architecture reporting is generally clearer than the evaluation assumptions needed to judge whether a trading result is economically interpretable or reproducible. A 10-equity worked example is included only as a methodological scaffold to illustrate how explicit friction and timing choices can materially compress active-strategy results. The main conclusion is that the next useful step for LLM trading research is not only better agent design, but also clearer reporting standards for execution realism, reproducibility, and evaluation comparability. |
| Date: | 2026–06 |
| URL: | https://d.repec.org/n?u=RePEc:arx:papers:2606.08285 |
| By: | Ruizhe Zhou; Xiaoyang Liu; Gaoyuan Du; Yi Zheng; Shouxi Ren; Deepayan Chakrabarti; Dengdu Jiang |
| Abstract: | Deploying machine learning in regulated financial environments -- credit risk, fraud detection, and anti-money laundering -- exposes critical vulnerabilities in algorithmic reproducibility. While early financial ML addressed statistical challenges such as backtest overfitting, deep neural networks and Generative AI have introduced mechanical nondeterminism rooted in hardware and architecture. This survey provides a systems perspective on reproducibility failures across three modalities now dominant in financial AI: tabular models (post-hoc explanation variance), graph networks (stochastic sampling and temporal asynchrony), and LLM-based agentic workflows (batch-dependent divergence and trajectory drift). We supplement the literature analysis with first-party experiments on public financial datasets -- quantifying explanation rank instability in credit scoring, prediction flip rates in GNN-based fraud detection, and tensor-parallel-induced output divergence in LLM entity extraction. We propose a layered evaluation framework linking modality-specific metrics (RBO, D_cos, TDI, PSD) to audit readiness, and empirically validate the complementarity of logit-level and semantic-level determinism measures. |
| Date: | 2026–05 |
| URL: | https://d.repec.org/n?u=RePEc:arx:papers:2605.23955 |