nep-big 2026-04-27 papers

on Big Data

Issue of 2026–04–27
twenty-one papers chosen by
Tom Coupé, University of Canterbury

Returns to Education in the United States: A Comparison of OLS and Double Machine Learning Methods By Helal, Al Mansor; Hiraki, Ryotaro; Patrinos, Harry
Forecasting Forced Displacement Flows Using Machine Learning with Text Data By Ramón Talvi Robledo; Christopher Rauh; Ben Seimon; Hannes Mueller; Laura Mayoral
Cross-Stock Predictability via LLM-Augmented Semantic Networks By Yikuan Huang; Zheqi Fan; Kaiqi Hu; Yifan Ye
Machine Spirits: Speculation and Adaptation of LLM Agents in Asset Markets By Maxime Saxena; Marco Pangallo; Fabio Caccioli; R. Maria del Rio-Chanona
Estimating Demand Shocks from Foot Traffic: A Big-Data Approach By Marina Azzimonti; David Wiczer; Yang Xuan
Estimating Government Worker Skills By Kevin Michael Frick; Jonas Gathen
Ideological Bias in LLMs' Economic Causal Reasoning By Donggyu Lee; Hyeok Yun; Jungwon Kim; Junsik Min; Sungwon Park; Sangyoon Park; Jihee Kim
SynPop-DE: Synthetic population of 40 million German households using generative neural networks By Napiontek, Jakob; Pichler, Peter-Paul
Understanding the Mechanism of Altruism in Large Language Models By Shuhuai Zhang; Shu Wang; Zijun Yao; Chuanhao Li; Xiaozhi Wang; Songfa Zhong; Tracy Xiao Liu
Watching Trade from Space: Nowcasting and Spatial Extrapolation of Port-Level Maritime Trade Using Satellite Imagery By Yonggeun Jung
Strategic Reasoning and Sensitivity to Stakes in the Dictator and Ultimatum Games: LLMs vs. Human Proposers By Polachek, Solomon; Romano, Kenneth; Tonguc, Ozlem
Monitoring global trade by products, using Big Data By Graham Pilgrim; Yann Dorville; Annabelle Mourougane
LLM-assisted proposal writing in competitive R&D funding: Evidence from Horizon Europe By Santoleri Pietro; Rentocchini Francesco; Lelli Francesco
Diverging signals from economic uncertainty measures: Uncovering coherence through news narratives By Andrés Azqueta-Gavaldón; Marina Diakonova; Corinna Ghirelli; Javier J. Pérez
Large Language Models Outperform Humans in Fraud Detection and Resistance to Motivated Investor Pressure By Nattavudh Powdthavee
The Inefficient Pricing of News By Antoine Didisheim; Bryan T. Kelly; Mohammad Pourmohammadi; Hanqing Tian
Dissecting AI Trading: Behavioral Finance and Market Bubbles By Shumiao Ouyang; Pengfei Sui
ChatGPT as a Time Capsule: The Limits of Price Discovery By Sebastian Lehner; Alejandro Lopez-Lira
Behavioral Transfer in AI Agents: Evidence and Privacy Implications By Shilei Luo; Zhiqi Zhang; Hengchen Dai; Dennis Zhang
Information Aggregation with AI Agents By Spyros Galanis
Information, Social Media and International Trade: Theory and Evidence Using Twenty Million Online Postings By George Cui; Kailin Gao

Returns to Education in the United States: A Comparison of OLS and Double Machine Learning Methods

By:	Helal, Al Mansor (University of Arkansas); Hiraki, Ryotaro (University of Arkansas); Patrinos, Harry (University of Arkansas, Fayetteville)
Abstract:	This study examines the economic returns to education in the U.S. using 2024 CPS data and compares Ordinary Least Squares (OLS) regression with a Double Machine Learning (DML) framework incorporating models such as random forests, boosted trees, lasso, GAMs, and neural networks (MLP). Results show consistent returns of 8 to 9 percent per additional year of schooling across methods. Simulations reveal that all predictors perform well under linear assumptions if hyperparameters are optimally adjusted, while OLS/Lasso suffer from nonlinearity. Findings suggest that OLS remains robust in low-dimensional, near-linear contexts, offering practical guidance for economists and policymakers balancing model complexity and interpretability in education research.
Keywords:	returns to education, machine learning
JEL:	I20 J31 J24 D62 O15
Date:	2026–04
URL:	https://d.repec.org/n?u=RePEc:iza:izadps:dp18523

Forecasting Forced Displacement Flows Using Machine Learning with Text Data

By:	Ramón Talvi Robledo; Christopher Rauh; Ben Seimon; Hannes Mueller; Laura Mayoral
Abstract:	Forced displacement is an important policy challenge, yet forecasting is hindered by sparse, annually observed flow data and reporting delays. This article proposes a forecasting method for country outflows and dyadic flows tailored to this sparse data setting. We combine slow-moving structural predictors with high-frequency text-based signals, compress high-dimensional news into low-dimensional topic representations via Latent Dirichlet Allocation to mitigate overfitting, and estimate a stacked ensemble of gradient-boosted trees that captures non-linear origin–destination interactions while making optimal use of the available data. We further apply conformal prediction to construct statistically valid prediction intervals for bilateral flows. Analyzing the text component yields that destination-specific search intensity of migration terms is a central predictor of subsequent dyadic displacement flows.
Keywords:	conformal prediction, dyadic, early warning, forced displacement, forecasting, Google trends, machine learning
JEL:	P16 C53 D72
Date:	2026–04
URL:	https://d.repec.org/n?u=RePEc:bge:wpaper:1573

Cross-Stock Predictability via LLM-Augmented Semantic Networks

By:	Yikuan Huang; Zheqi Fan; Kaiqi Hu; Yifan Ye
Abstract:	Text-based financial networks are increasingly used to study cross-stock return predictability. A common approach constructs links from similarities in firms' disclosure embeddings, but such networks often contain spurious edges because textual proximity does not necessarily imply economic connection. We propose a two-stage framework that first builds a sparse candidate graph from 10-K embeddings and then uses a large language model to classify and filter candidate edges according to their economic relations. The refined graph is used to aggregate pair-level mean-reversion signals into stock-level trading signals with relation-aware and distance-based weights. In a backtest on S&P 500 constituents from 2011 to 2019, LLM-based edge filtering improves the long-short Sharpe ratio from 0.742 to 0.820 and reduces maximum drawdown from -10.47% to -7.85%. These results suggest that LLM-based reasoning can improve the economic fidelity of text-derived financial networks and strengthen cross-stock predictability.
Date:	2026–04
URL:	https://d.repec.org/n?u=RePEc:arx:papers:2604.19476

Machine Spirits: Speculation and Adaptation of LLM Agents in Asset Markets

By:	Maxime Saxena; Marco Pangallo; Fabio Caccioli; R. Maria del Rio-Chanona
Abstract:	As Large Language Models (LLMs) become increasingly integrated into financial systems, understanding their behavioural properties is crucial. Do LLMs conform to the rational expectations paradigm, do they exhibit human-like "animal spirits", or do they instead manifest distinct "machine spirits"? We investigate these questions with a simulated financial market, exploring the behaviour of 15 LLMs spanning a range of sizes, capabilities, and providers. Our results show that LLMs exhibit a spectrum of economic behaviours, from stable coordination on the fundamental value to human-like speculative bubbles. These behaviours are generally inconsistent with the rational expectations hypothesis. We also consider an ecology of heterogeneous agents, a more realistic setting compared to markets with identical LLM agents. These mixed markets can produce outcomes which vary substantially across repeated simulations. Even the most advanced models fail to consistently stabilise the market, with price bubbles sometimes forming despite only a minority of agents naturally forming bubbles. Instead, advanced models in mixed markets adapt their forecasting strategies to the behaviour of other agents. This adaptation can allow them to successfully exploit less sophisticated counterparts and achieve higher profits, but can also contribute to increased market volatility. These findings suggest that the introduction of AI agents into financial markets fundamentally reshapes their ecology. In particular, heterogeneous populations of LLMs can generate endogenous instability, while individual-level adaptation may amplify, rather than mitigate, market volatility.
Date:	2026–04
URL:	https://d.repec.org/n?u=RePEc:arx:papers:2604.18602

Estimating Demand Shocks from Foot Traffic: A Big-Data Approach

By:	Marina Azzimonti; David Wiczer; Yang Xuan
Abstract:	This study leverages high-frequency foot-traffic data from SafeGraph to estimate demand shocks in customer-facing establishments across New York City’s retail, service, and health sectors. Recognizing that variations in foot traffic can arise from both unpredictable demand shocks and firm-driven strategies to attract customers, we present a theoretical framework that isolates establishment-level demand fluctuations from firm-level strategic choices. Implementing this empirically, we employ an unsupervised machine learning approach to classify establishments into distinct categories that are largely orthogonal to location and sector. We find important heterogeneity in the persistence of shocks, important heterogeneity in their trends, and that estimation on a pooled sample importantly understates the variance experienced by some establishments.
Keywords:	Consumer-facing; brands; service; retail trade; health; demand dynamics; demand shocks; Foot Traffic
JEL:	E21 L14 L80
Date:	2026–04–01
URL:	https://d.repec.org/n?u=RePEc:fip:fednsr:103026

Estimating Government Worker Skills

By:	Kevin Michael Frick; Jonas Gathen
Abstract:	We propose a new approach to estimate government worker skills, a setting where output is hard to observe and wages may be uninformative about skills. The approach uses wages in comparable jobs in the private sector and machine learning tools to link skills to skill-related observables. We apply the approach to rich Indonesian household-level panel data from 1988-2014, showing two main applications. First, government skills have continuously declined relative to the private sector, driven by the most skilled workers ending up in the private sector. Second, the Indonesian government pays a wage premium of 43% conditional on skills.
Date:	2026–04
URL:	https://d.repec.org/n?u=RePEc:arx:papers:2604.15819

Ideological Bias in LLMs' Economic Causal Reasoning

By:	Donggyu Lee; Hyeok Yun; Jungwon Kim; Junsik Min; Sungwon Park; Sangyoon Park; Jihee Kim
Abstract:	Do large language models (LLMs) exhibit systematic ideological bias when reasoning about economic causal effects? As LLMs are increasingly used in policy analysis and economic reporting, where directionally correct causal judgments are essential, this question has direct practical stakes. We present a systematic evaluation by extending the EconCausal benchmark with ideology-contested cases - instances where intervention-oriented (pro-government) and market-oriented (pro-market) perspectives predict divergent causal signs. From 10,490 causal triplets (treatment-outcome pairs with empirically verified effect directions) derived from top-tier economics and finance journals, we identify 1,056 ideology-contested instances and evaluate 20 state-of-the-art LLMs on their ability to predict empirically supported causal directions. We find that ideology-contested items are consistently harder than non-contested ones, and that across 18 of 20 models, accuracy is systematically higher when the empirically verified causal sign aligns with intervention-oriented expectations than with market-oriented ones. Moreover, when models err, their incorrect predictions disproportionately lean intervention-oriented, and this directional skew is not eliminated by one-shot in-context prompting. These results highlight that LLMs are not only less accurate on ideologically contested economic questions, but systematically less reliable in one ideological direction than the other, underscoring the need for direction-aware evaluation in high-stakes economic and policy settings.
Date:	2026–04
URL:	https://d.repec.org/n?u=RePEc:arx:papers:2604.21334

SynPop-DE: Synthetic population of 40 million German households using generative neural networks

By:	Napiontek, Jakob (Potsdam Institute for Climate Impact Research (PIK)); Pichler, Peter-Paul
Abstract:	Household microdata combining socio-demographic, housing, income and expenditure attributes are a core resource for many studies in quantitative social science, such as modelling the household-level impacts of the energy transition. Yet no such data are openly available for Germany's full population. SynPop-DE provides a synthetic population of 40,235,916 households and their 82,039,613 members in all 400 German districts, calibrated to the 2022 census, with 34 attributes per household. Synthetic households are generated by estimating the joint attribute distribution of the German Household Budget Survey through a two-stage machine learning architecture. While an autoencoder first compresses high-dimensional categorical data into a continuous latent space, a generative adversarial network subsequently learns to sample new records from this representation. These records are then aligned with census marginals for all German districts using iterative proportional updating to ensure spatial representativeness. Validation along three dimensions confirms that the model learns attribute relationships and generates synthetic households that reproduce the statistical properties of the survey data (fidelity), supports downstream analyses with accuracy comparable to the original survey (utility), and prevents disclosure of individual respondents (privacy). The dataset is openly available at https://synpop.de.
Date:	2026–04–12
URL:	https://d.repec.org/n?u=RePEc:osf:socarx:zha8v_v1

Understanding the Mechanism of Altruism in Large Language Models

By:	Shuhuai Zhang; Shu Wang; Zijun Yao; Chuanhao Li; Xiaozhi Wang; Songfa Zhong; Tracy Xiao Liu
Abstract:	Altruism is fundamental to human societies, fostering cooperation and social cohesion. Recent studies suggest that large language models (LLMs) can display human-like prosocial behavior, but the internal computations that produce such behavior remain poorly understood. We investigate the mechanisms underlying LLM altruism using sparse autoencoders (SAEs). In a standard Dictator Game, minimal-pair prompts that differ only in social stance (generous versus selfish) induce large, economically meaningful shifts in allocations. Leveraging this contrast, we identify a set of SAE features (0.024% of all features across the model's layers) whose activations are strongly associated with the behavioral shift. To interpret these features, we use benchmark tasks motivated by dual-process theories to classify a subset as primarily heuristic (System 1) or primarily deliberative (System 2). Causal interventions validate their functional role: activation patching and continuous steering of this feature direction reliably shift allocation distributions, with System 2 features exerting a more proximal influence on the model's final output than System 1 features. The same steering direction generalizes across multiple social-preference games. Together, these results enhance our understanding of artificial cognition by translating altruistic behaviors into identifiable network states and provide a framework for aligning LLM behavior with human values, thereby informing more transparent and value-aligned deployment.
Date:	2026–04
URL:	https://d.repec.org/n?u=RePEc:arx:papers:2604.19260

Watching Trade from Space: Nowcasting and Spatial Extrapolation of Port-Level Maritime Trade Using Satellite Imagery

By:	Yonggeun Jung
Abstract:	Satellite data are increasingly used to measure economic activity, yet port-level trade remains largely unmeasured from space. This paper combines synthetic aperture radar imagery, nighttime lights, and port characteristics to measure monthly port-level maritime trade using only publicly available data. The model achieves strong out-of-sample accuracy for U.S. ports, with satellite signals and port attributes playing complementary roles. While absolute levels are difficult to extrapolate beyond the training domain, percentage changes are reliably recovered, as we confirm through a leave-one-region-out exercise and Monte Carlo simulation. Applying the framework to Russian ports after the 2022 sanctions, we detect shifts consistent with trade reorientation toward the Far East. The approach complements AIS-based methods by remaining robust to strategic signal manipulation.
Date:	2026–04
URL:	https://d.repec.org/n?u=RePEc:arx:papers:2604.15444

Strategic Reasoning and Sensitivity to Stakes in the Dictator and Ultimatum Games: LLMs vs. Human Proposers

By:	Polachek, Solomon (Binghamton University, New York); Romano, Kenneth (State University of New York at Binghamton (Binghamton University)); Tonguc, Ozlem (State University of New York at Binghamton (Binghamton University))
Abstract:	This study examines how large language models (LLMs) respond to varying stake sizes in the Dictator and Ultimatum games using the high-stakes design introduced by Andersen et al. (2011). We test ten leading LLMs chosen for their accessibility, prominence, and differences in reasoning capabilities. Results reveal substantial variation across models: Only 5 of 10 models exhibit strategic behavior by offering more in the Ultimatum Game (UG) than in the Dictator Game (DG). Relative to humans, 4 models are consistently more generous, 2 consistently less, and 4 vary with stake size. Only 1 model shows a monotonic decline in UG offers as stakes increase; the remaining 9 are non-monotonic or stable. Unlike humans, most models reduce UG offers when endowed with wealth. Prompting for "human-like" decisions generally increases generosity in the UG. These findings are important for evaluating whether LLMs can serve as realistic proxies for human subjects in behavioral experiments and highlight key limitations and future directions for model development.
Keywords:	ultimatum game, dictator game, fairness, payoff stakes, artificial intelligence
JEL:	D01 C72 C90
Date:	2026–04
URL:	https://d.repec.org/n?u=RePEc:iza:izadps:dp18545

Monitoring global trade by products, using Big Data

By:	Graham Pilgrim; Yann Dorville; Annabelle Mourougane
Abstract:	This paper develops a novel methodology to derive timely, experimental estimates of trade by commodity with global coverage using messages from the Automatic Identification System (AIS). By transforming high-frequency vessel movements into trade proxies, the approach makes it possible to monitor global cross-border flows in near real time for 23 commodity groups worldwide, covering 97.8% of existing berths across 3534 ports. The methodology improves upon Pilgrim et al. (2024) by exploiting information at the berth level, which increases the accuracy of port delineation and allows, with the use of satellite imagery and a rule-based approach, to get a mapping of commodities. While the resulting trade estimates are experimental and not designed to replace official trade statistics and are surrounded by uncertainties, especially regarding containerised trade, they provide valuable and complementary information on trade dynamics, particularly in periods of heightened uncertainty or rapid change. Their main strength lies in their ability to capture turning points, disruptions and emerging trends well ahead of traditional data releases. The methodology also allows to derive timely estimates of transit trade.
Keywords:	Big data, Maritime trade, Port activity, Port congestion
JEL:	C55 C81 F17
Date:	2026–05–06
URL:	https://d.repec.org/n?u=RePEc:oec:stdaaa:2026/02-en

LLM-assisted proposal writing in competitive R&D funding: Evidence from Horizon Europe

By:	Santoleri Pietro (European Commission - JRC); Rentocchini Francesco (European Commission - JRC); Lelli Francesco
Abstract:	Large language models (LLMs) can lower the cost of producing complex text, potentially reshaping competition for research and development (R&D) funding to private firms. We provide the first evidence on this issue using data covering the universe of firm applications to a major competitive R&D funding program: Horizon Europe. We find that LLM-assisted writing rises sharply following the public release of ChatGPT in late 2022, with around 40% of proposal abstracts exhibiting LLM-modified content by the end of 2024. Adoption is heterogeneous across applicants and is more common among younger, and less innovative firms, as well as among firms located in countries with lower levels of English proficiency, economic development and R&D intensity. In cross-sectional analyses, proposals that rely extensively on LLM-generated text are associated with lower evaluation scores and funding probabilities, whereas partial LLM assistance is only weakly related to such outcomes. However, analyses exploiting repeated submissions of the same proposals do not indicate that adopting LLM-assisted writing causally worsens evaluation results. Overall, the findings suggest that generative AI may reduce barriers to participation in competitive funding without clear evidence that LLM-assisted writing itself alters evaluation decisions.
Date:	2026–03
URL:	https://d.repec.org/n?u=RePEc:ipt:termod:202602

Diverging signals from economic uncertainty measures: Uncovering coherence through news narratives

By:	Andrés Azqueta-Gavaldón (Banco de España); Marina Diakonova (Banco de España); Corinna Ghirelli (Banco de España); Javier J. Pérez (Banco de España)
Abstract:	The proliferation of economic uncertainty indicators —ranging from text-based indices like the Economic Policy Uncertainty (EPU) index to market-based measures such as the VIX and the ECB’s Country-Level Index of Financial Stress (CLIFS)— has enriched the analytical toolkit of economists and policymakers. Yet these indicators often diverge, sending conflicting signals about the state of uncertainty in the economy. This paper argues that such divergence is not a flaw but a feature: each indicator captures a distinct dimension of uncertainty. Using topic modeling techniques applied to national news corpora, we construct a taxonomy of uncertainty narratives across five European countries and classify episodes of divergence between the EPU and CLIFS indicators. Our findings reveal systematic patterns: EPU peaks are predominantly driven by political and institutional developments, CLIFS peaks by financial market stress and joint peaks by systemic crises. These results underscore the multidimensional nature of uncertainty and highlight the need for structured interpretative frameworks. By linking narrative content to indicator behavior, our approach offers a novel lens for understanding uncertainty dynamics and provides practical tools for researchers and policymakers navigating an increasingly complex informational environment.
Keywords:	economic policy uncertainty, financial uncertainty, natural language processing, open access data
JEL:	D8 C43 C55 E32
Date:	2026–04
URL:	https://d.repec.org/n?u=RePEc:bde:wpaper:2614e

Large Language Models Outperform Humans in Fraud Detection and Resistance to Motivated Investor Pressure

By:	Nattavudh Powdthavee
Abstract:	Large language models trained on human feedback may suppress fraud warnings when investors arrive already persuaded of a fraudulent opportunity. We tested this in a preregistered experiment across seven leading LLMs and twelve investment scenarios covering legitimate, high-risk, and objectively fraudulent opportunities, combining 3,360 AI advisory conversations with a 1,201-participant human benchmark. Contrary to predictions, motivated investor framing did not suppress AI fraud warnings; if anything, it marginally increased them. Endorsement reversal occurred in fewer than 3 in 1,000 observations. Human advisors endorsed fraudulent investments at baseline rates of 13-14%, versus 0% across all LLMs, and suppressed warnings under pressure at two to four times the AI rate. AI systems currently provide more consistent fraud warnings than lay humans in an identical advisory role.
Date:	2026–04
URL:	https://d.repec.org/n?u=RePEc:arx:papers:2604.20652

The Inefficient Pricing of News

By:	Antoine Didisheim; Bryan T. Kelly; Mohammad Pourmohammadi; Hanqing Tian
Abstract:	The stock market fails to efficiently process information in news text (Chen et al., 2026). But news itself is highly predictable by prevailing stock characteristics, which complicates inferences about market efficiency. After purging news of its predictable content, the resulting “news shocks” more than double the monthly return predictive power of raw news, and they continue to significantly predict returns up to 18 months ahead. The magnitude and longevity of the news shock anomaly is larger than every anomaly in the Jensen et al. (2022) universe. The news shock anomaly derives from negative-tone and quantitative topics to which investors underreact and from high-attention and ambiguous topics to which investors overreact.
JEL:	C45 C58 G02 G1 G11 G12 G14 G17 G40 G41
Date:	2026–04
URL:	https://d.repec.org/n?u=RePEc:nbr:nberwo:35093

Dissecting AI Trading: Behavioral Finance and Market Bubbles

By:	Shumiao Ouyang; Pengfei Sui
Abstract:	We study how AI agents form expectations and trade in experimental asset markets. Using a simulated open-call auction populated by autonomous Large Language Model (LLM) agents, we document three main findings. First, AI agents exhibit classic behavioral patterns: a pronounced disposition effect and recency-weighted extrapolative beliefs. Second, these individual-level patterns aggregate into equilibrium dynamics that replicate classic experimental findings (Smith et al., 1988), including the predictive power of excess demand for future prices and the positive relationship between disagreement and trading volume. Third, by analyzing the agents' reasoning text through a twenty-mechanism scoring framework, we show that targeted prompt interventions causally amplify or suppress specific behavioral mechanisms, significantly altering the magnitude of market bubbles.
Date:	2026–04
URL:	https://d.repec.org/n?u=RePEc:arx:papers:2604.18373

ChatGPT as a Time Capsule: The Limits of Price Discovery

By:	Sebastian Lehner; Alejandro Lopez-Lira
Abstract:	Frozen large language model (LLM) checkpoints extract information from pre-cutoff public text that is associated with future fundamentals and equity returns beyond standard contemporaneous valuation measures. Because each frozen checkpoint has a fixed knowledge cutoff, it can be interpreted as a compressed representation of publicly available textual information at a given point in time. We treat twelve OpenAI snapshots spanning 2021-2025 as time-stamped summaries of the public textual record and extract a sector-neutral LLM outlook score for roughly 7,000 U.S. equities per cross-section. The outlook score is positively associated with analyst revisions, target-price changes and one-month cross-sectional returns in both Fama-MacBeth regressions and pooled panels with model fixed effects (t = 6.02), after direct controls for market-implied valuation and standard factors. Predictability broadly increases with the return horizon, despite a non-monotonic intermediate dip, and, in the pooled panel, is stronger for firms with high analyst coverage, consistent with the view that the bottleneck is not investor inattention but the cost of aggregating dispersed qualitative information across many documents.
Date:	2026–04
URL:	https://d.repec.org/n?u=RePEc:arx:papers:2604.21433

Behavioral Transfer in AI Agents: Evidence and Privacy Implications

By:	Shilei Luo; Zhiqi Zhang; Hengchen Dai; Dennis Zhang
Abstract:	AI agents powered by large language models are increasingly acting on behalf of humans in social and economic environments. Prior research has focused on their task performance and effects on human outcomes, but less is known about the relationship between agents and the specific individuals who deploy them. We ask whether agents systematically reflect the behavioral characteristics of their human owners, functioning as behavioral extensions rather than producing generic outputs. We study this question using 10,659 matched human-agent pairs from Moltbook, a social media platform where each autonomous agent is publicly linked to its owner's Twitter/X account. By comparing agents' posts on Moltbook with their owners' Twitter/X activity across features spanning topics, values, affect, and linguistic style, we find systematic transfer between agents and their specific owners. This transfer persists among agents without explicit configuration, and pairs that align on one behavioral dimension tend to align on others. These patterns are consistent with transfer emerging through accumulated interaction between owners (or owners' computer environments) and their agents in everyday use. We further show that agents with stronger behavioral transfer are more likely to disclose owner-related personal information in public discourse, suggesting that the same owner-specific context that drives behavioral transfer may also create privacy risk during ordinary use. Taken together, our results indicate that AI agents do not simply generate content, but reflect owner-related context in ways that can propagate human behavioral heterogeneity into digital environments, with implications for privacy, platform design, and the governance of agentic systems.
Date:	2026–04
URL:	https://d.repec.org/n?u=RePEc:arx:papers:2604.19925

Information Aggregation with AI Agents

By:	Spyros Galanis
Abstract:	Can Large Language Models (AI agents) aggregate dispersed private information through trading and reason about the knowledge of others by observing price movements? We conduct a controlled experiment where AI agents trade in a prediction market after receiving private signals, measuring information aggregation by the log error of the last price. We find that although the median market is effective at aggregating information in the easy information structures, increasing the complexity has a significant and negative impact, suggesting that AI agents may suffer from the same limitations as humans when reasoning about others. Consistent with our theoretical predictions, information aggregation remains unaffected by allowing cheap talk communication, changing the duration of the market or initial price, and strategic prompting, thus demonstrating that prediction markets are robust. We establish that "smarter" AI agents perform better at aggregation and they are more profitable. Surprisingly, giving them feedback about past performance makes them worse at aggregation and reduces their profits.
Date:	2026–04
URL:	https://d.repec.org/n?u=RePEc:arx:papers:2604.20050

Information, Social Media and International Trade: Theory and Evidence Using Twenty Million Online Postings

By:	George Cui; Kailin Gao
Abstract:	We employ novel data and theoretical frameworks to investigate how a social media platform facilitates information exchange among firms. Our analysis is based on an extensive dataset comprising over 20 million firm-to-firm online interactions on a prominent social platform where participants share information about international trade. We document four empirical patterns. First, we find that firms’ exports grow significantly after the firm begins using the social media platform. Second, firms located geographically closer exchange more information. Third, firms in sectors that have stronger production network relationships interact more on the platform. Finally, firms in more developed regions are more likely to adopt the social media platform. Motivated by these empirical patterns, we develop a quantitative general equilibrium trade model with information frictions and endogenous learning and information sharing.
Keywords:	Information; Global Value Chains; Online Platforms; social media platform; IMF working papers; information sharing; novel data; information friction; Social networks; Exports; Stocks; Trade balance; Global; Asia and Pacific
Date:	2026–04–03
URL:	https://d.repec.org/n?u=RePEc:imf:imfwpa:2026/064

This nep-big issue is ©2026 by Tom Coupé. It is provided as is without any express or implied warranty. It may be freely redistributed in whole or in part for any purpose. If distributed in part, please include this notice.

General information on the NEP project can be found at https://nep.repec.org. For comments please write to the director of NEP, Marco Novarese at <director@nep.repec.org>. Put “NEP” in the subject, otherwise your mail may be rejected.

NEP’s infrastructure is sponsored by the Griffith Business School of Griffith University in Australia.