on Big Data |
By: | Duane, Jackson; Ren, Alicia; Zhang, Wei |
Abstract: | This paper presents a focused review of recent academic advances in the application of deep learning techniques to algorithmic trading. While traditional machine learning models have long been used in financial forecasting, the last decade has seen a rapid expansion in the use of deep learning architectures due to their ability to model non-linear dependencies, learn hierarchical features, and process high-dimensional sequential data. We categorize and synthesize developments across three primary paradigms: supervised deep learning models for price prediction and signal generation, unsupervised and generative approaches for feature extraction and data augmentation, and reinforcement learning agents for decision-making in trading environments. By analyzing over 30 recent peer-reviewed studies, we highlight how modern models such as attention-based networks, graph neural networks, and deep Q-learning have enhanced the robustness and adaptability of trading algorithms. We also discuss key limitations—including overfitting, data non-stationarity, and lack of interpretability—and summarize efforts to address them. This review serves as a resource for researchers seeking a clear, academically grounded perspective on how deep learning is currently reshaping algorithmic trading systems. |
Date: | 2025–07–23 |
URL: | https://d.repec.org/n?u=RePEc:osf:osfxxx:ctxf9_v1 |
By: | Baptiste Lefort; Eric Benhamou; Beatrice Guez; Jean-Jacques Ohana; Ethan Setrouk; Alban Etienne |
Abstract: | This paper presents a novel hierarchical framework for portfolio optimization, integrating lightweight Large Language Models (LLMs) with Deep Reinforcement Learning (DRL) to combine sentiment signals from financial news with traditional market indicators. Our three-tier architecture employs base RL agents to process hybrid data, meta-agents to aggregate their decisions, and a super-agent to merge decisions based on market data and sentiment analysis. Evaluated on data from 2018 to 2024, after training on data from 2000 to 2017, the framework achieves a 26% annualized return and a Sharpe ratio of 1.2, outperforming equal-weighted and S&P 500 benchmarks. Key contributions include scalable cross-modal integration, a hierarchical RL structure for enhanced stability, and open-source reproducibility. |
Date: | 2025–07 |
URL: | https://d.repec.org/n?u=RePEc:arx:papers:2507.22932 |
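The headline performance figures above (a 26% annualized return and a Sharpe ratio of 1.2) follow from standard definitions. A minimal sketch, using synthetic daily returns rather than the paper's backtest:

```python
# Minimal sketch of the reported performance metrics: annualized return and
# Sharpe ratio. The daily return series is a synthetic placeholder, not the
# framework's 2018-2024 backtest.
import numpy as np

def annualized_return(daily_returns, periods_per_year=252):
    """Geometric annualization of a series of simple daily returns."""
    cumulative = np.prod(1.0 + daily_returns)
    n_years = len(daily_returns) / periods_per_year
    return cumulative ** (1.0 / n_years) - 1.0

def sharpe_ratio(daily_returns, risk_free_daily=0.0, periods_per_year=252):
    """Annualized Sharpe ratio of a series of simple daily returns."""
    excess = daily_returns - risk_free_daily
    return np.sqrt(periods_per_year) * excess.mean() / excess.std(ddof=1)

rng = np.random.default_rng(0)
returns = rng.normal(0.001, 0.01, size=6 * 252)  # synthetic stand-in for six years of daily returns
print(annualized_return(returns), sharpe_ratio(returns))
```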
By: | Md Talha Mohsin |
Abstract: | Large Language Models (LLMs) have demonstrated remarkable capabilities across a wide variety of Financial Natural Language Processing (FinNLP) tasks. However, systematic comparisons among widely used LLMs remain underexplored. Given the rapid advancement and growing influence of LLMs in financial analysis, this study conducts a thorough comparative evaluation of five leading LLMs, GPT, Claude, Perplexity, Gemini and DeepSeek, using 10-K filings from the 'Magnificent Seven' technology companies. We create a set of domain-specific prompts and then use three methodologies to evaluate model performance: human annotation, automated lexical-semantic metrics (ROUGE, Cosine Similarity, Jaccard), and model behavior diagnostics (prompt-level variance and across-model similarity). The results show that GPT gives the most coherent, semantically aligned, and contextually relevant answers, followed by Claude and Perplexity. Gemini and DeepSeek, on the other hand, have more variability and less agreement. Also, the similarity and stability of outputs change from company to company and over time, showing that they are sensitive to how prompts are written and what source material is used. |
Date: | 2025–07 |
URL: | https://d.repec.org/n?u=RePEc:arx:papers:2507.22936 |
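Two of the automated lexical-semantic metrics named above, cosine similarity and Jaccard, are straightforward to reproduce. A minimal sketch on two invented answer strings (ROUGE, which requires a reference-summary library, is omitted):

```python
# Minimal sketch of two metrics from the abstract above: TF-IDF cosine
# similarity and token-level Jaccard similarity. The answer strings are
# invented placeholders, not model outputs from the study.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def jaccard(a: str, b: str) -> float:
    """Jaccard similarity between the token sets of two answers."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0

answer_model_a = "Revenue grew on strong cloud demand while margins stayed flat."
answer_model_b = "Cloud demand drove revenue growth, with margins roughly flat."

tfidf = TfidfVectorizer().fit_transform([answer_model_a, answer_model_b])
print("cosine:", cosine_similarity(tfidf[0], tfidf[1])[0, 0])
print("jaccard:", jaccard(answer_model_a, answer_model_b))
```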
By: | Richiardi, Matteo; Rejoice, Frimpong |
Abstract: | Development of microsimulation models often requires reweighting some input dataset to reflect the characteristics of a different population of interest. In this paper we explore a machine learning approach whereby a variant of decision trees (Gradient Boosted Machine) is used to replicate the joint distribution of target variables observed in a large commercially available but slightly biased dataset, with an additional raking step to remove the bias and ensure consistency of relevant marginal distributions with official statistics. The method is applied to build a regional variant of UKMOD, an open-source static tax-benefit model for the UK belonging to the EUROMOD family, with an application to the Greater Essex region in the UK. |
Date: | 2025–08–11 |
URL: | https://d.repec.org/n?u=RePEc:ese:cempwp:cempa9-25 |
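The raking step described above is iterative proportional fitting of weights toward official marginal totals. A minimal sketch with an invented two-way table, leaving the Gradient Boosted Machine stage aside:

```python
# Minimal sketch of raking (iterative proportional fitting): scale the cells
# of a weight table so that both margins match official totals. The
# categories and target margins are invented for illustration.
import numpy as np

def rake(table, row_targets, col_targets, n_iter=100, tol=1e-9):
    """Iteratively scale rows and columns until margins match the targets."""
    w = table.astype(float).copy()
    for _ in range(n_iter):
        w *= (row_targets / w.sum(axis=1))[:, None]   # match row margins
        w *= (col_targets / w.sum(axis=0))[None, :]   # match column margins
        if np.allclose(w.sum(axis=1), row_targets, atol=tol):
            break
    return w

# cell counts from the (slightly biased) commercial dataset: age group x region
counts = np.array([[120.0, 80.0], [60.0, 140.0]])
official_age = np.array([210.0, 190.0])     # official age-group totals
official_region = np.array([190.0, 210.0])  # official regional totals
print(rake(counts, official_age, official_region))
```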
By: | Tingyu Yuan; Xi Zhang; Xuanjing Chen |
Abstract: | In the face of global economic uncertainty, financial auditing has become essential for regulatory compliance and risk mitigation. Traditional manual auditing methods are increasingly limited by large data volumes, complex business structures, and evolving fraud tactics. This study proposes an AI-driven framework for enterprise financial audits and high-risk identification, leveraging machine learning to improve efficiency and accuracy. Using a dataset from the Big Four accounting firms (EY, PwC, Deloitte, KPMG) from 2020 to 2025, the research examines trends in risk assessment, compliance violations, and fraud detection. The dataset includes key indicators such as audit project counts, high-risk cases, fraud instances, compliance breaches, employee workload, and client satisfaction, capturing both audit behaviors and AI's impact on operations. To build a robust risk prediction model, three algorithms - Support Vector Machine (SVM), Random Forest (RF), and K-Nearest Neighbors (KNN) - are evaluated. SVM uses hyperplane optimization for complex classification, RF combines decision trees to manage high-dimensional, nonlinear data with resistance to overfitting, and KNN applies distance-based learning for flexible performance. Through hierarchical K-fold cross-validation and evaluation using F1-score, accuracy, and recall, Random Forest achieves the best performance, with an F1-score of 0.9012, excelling in identifying fraud and compliance anomalies. Feature importance analysis reveals audit frequency, past violations, employee workload, and client ratings as key predictors. The study recommends adopting Random Forest as a core model, enhancing features via engineering, and implementing real-time risk monitoring. This research contributes valuable insights into using machine learning for intelligent auditing and risk management in modern enterprises. |
Date: | 2025–07 |
URL: | https://d.repec.org/n?u=RePEc:arx:papers:2507.06266 |
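The model comparison above can be reproduced in outline with scikit-learn. A minimal sketch on synthetic imbalanced data, using stratified folds as a common stand-in for the paper's K-fold scheme and reporting cross-validated F1:

```python
# Minimal sketch of the comparison in the abstract above: SVM, Random Forest,
# and KNN evaluated with K-fold cross-validation on the F1 score. The
# synthetic data stand in for the audit-risk dataset.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

X, y = make_classification(n_samples=2000, n_features=20, weights=[0.85, 0.15], random_state=0)
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

models = {
    "SVM": SVC(kernel="rbf"),
    "RandomForest": RandomForestClassifier(n_estimators=300, random_state=0),
    "KNN": KNeighborsClassifier(n_neighbors=7),
}
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=cv, scoring="f1")
    print(f"{name}: mean F1 = {scores.mean():.4f}")
```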
By: | Siyi Wu; Zhaoyang Guan; Leyi Zhao; Xinyuan Song; Xinyu Ying; Hanlin Zhang; Michele Pak; Yangfan He; Yi Xin; Jianhui Wang; Tianyu Shi |
Abstract: | Cryptocurrency trading is a challenging task requiring the integration of heterogeneous data from multiple modalities. Traditional deep learning and reinforcement learning approaches typically demand large training datasets and encode diverse inputs into numerical representations, often at the cost of interpretability. Recent progress in large language model (LLM)-based agents has demonstrated the capacity to process multi-modal data and support complex investment decision-making. Building on these advances, we present MountainLion, a multi-modal, multi-agent system for financial trading that coordinates specialized LLM-based agents to interpret financial data and generate investment strategies. MountainLion processes textual news, candlestick charts, and trading signal charts to produce high-quality financial reports, while also enabling modification of reports and investment recommendations through data-driven user interaction and question answering. A central reflection module analyzes historical trading signals and outcomes to continuously refine decision processes, and the system is capable of real-time report analysis, summarization, and dynamic adjustment of investment strategies. Empirical results confirm that MountainLion systematically enriches technical price triggers with contextual macroeconomic and capital flow signals, providing a more interpretable, robust, and actionable investment framework that improves returns and strengthens investor confidence. |
Date: | 2025–07 |
URL: | https://d.repec.org/n?u=RePEc:arx:papers:2507.20474 |
By: | Ozili, Peterson K; Obiora, Kingsley I; Onuzo, Chinwendu |
Abstract: | Large language models have gained popularity, and it is important to understand their applications in the financial inclusion domain. This study identifies the benefits and risks of using large language models (LLMs) in the financial inclusion domain. We show that LLMs can be used to (i) summarize the key themes in financial inclusion communications, (ii) gain insights from the tone of financial inclusion communications, (iii) bring discipline to financial inclusion communications, (iv) improve financial inclusion decision making, and (v) enhance context-sensitive text analysis and evaluation. However, the use of large language models in the financial inclusion domain poses risks relating to biased interpretation of LLM-generated responses, data privacy, misinformation, and falsehood. We emphasize that LLMs can be used safely in the financial inclusion domain to summarize financial inclusion speeches and communications, but they should not be used in situations where finding the truth is important to make decisions that promote financial inclusion. |
Keywords: | financial inclusion, large language models, LLM, algorithm, risk, benefit, communication, speech, artificial intelligence, digital financial inclusion |
JEL: | G20 G21 G23 |
Date: | 2025 |
URL: | https://d.repec.org/n?u=RePEc:pra:mprapa:125562 |
By: | Tomaso Duso; Joseph E., Jr. Harrington; Carl Kreuzberg; Geza Sapi |
Abstract: | Competition authorities increasingly rely on economic screening tools to identify markets where firms deviate from competitive norms. Traditional screening methods assume that collusion occurs through secret agreements. However, recent research highlights that firms can use public announcements to coordinate decisions, reducing competition while avoiding detection. We propose a novel approach to screening for collusion in public corporate statements. Using natural language processing, we analyze more than 300,000 earnings call transcripts issued worldwide between 2004 and 2022. By identifying expressions commonly associated with collusion, our method provides competition authorities with a tool to detect potentially anticompetitive behavior in public communications. Our approach can extend beyond earnings calls to other sources, such as news articles, trade press, and industry reports. Our method informed the European Commission’s 2024 unannounced inspections in the car tire sector, prompted by concerns over price coordination through public communication. |
Keywords: | communication, collusion, NLP, screening, text analysis |
JEL: | C23 D22 L1 L4 L64 |
Date: | 2025 |
URL: | https://d.repec.org/n?u=RePEc:ces:ceswps:_12029 |
By: | Schultze, Michelle |
Abstract: | Kyrgyzstan serves as a key case study for the broader Central Asia–Russia labor pipeline, which supported an estimated 8 million migrants annually in 2020. Prior to the Russo-Ukraine war, remittances from Russia accounted for approximately 30% of Kyrgyzstan’s GDP, driven by over 10% of its population working in Russia. However, understanding wartime migration dynamics is challenging due to suspected political interference in Russian data, restricted foreign access to this data, and the informality that characterizes Central Asian migration patterns. This study incorporates Yandex Wordstat, Google Trends, XGBoost (which outperforms other machine learning methods), and autoregressive models to "nowcast" missing data. The results reveal a push effect linked to war onset in February 2022 and war intensity. However, all three of the analyzed migration datasets suggest a potential delayed labor substitution effect as Central Asian migrants fill vacancies left by conscripted Russian workers, proxied by casualty data from Mediazona and the BBC. The study also examines remittance trends, which seem to increase along with the labor substitution effect after a two-month lag. These results are robust to Russia- and Kyrgyzstan-side socioeconomic controls such as wage levels and population dynamics. This study provides new insight into the largely opaque Central Asia–Russia labor pipeline, a critical element in development policymaking for both regions. It also introduces a novel methodology for nowcasting migration trends, particularly through Yandex Wordstat, which has been largely overlooked in English-language scholarship. |
Date: | 2025–07–24 |
URL: | https://d.repec.org/n?u=RePEc:osf:socarx:z2wch_v1 |
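The nowcasting setup above combines search-interest covariates with autoregressive terms in a gradient-boosted model. A minimal sketch with synthetic monthly series standing in for the Yandex Wordstat and Google Trends inputs:

```python
# Minimal nowcasting sketch in the spirit of the abstract above: predict a
# missing migration series from search-interest covariates plus a lag of the
# target using XGBoost. All series are synthetic placeholders.
import numpy as np
import pandas as pd
from xgboost import XGBRegressor

rng = np.random.default_rng(0)
n = 120  # monthly observations
df = pd.DataFrame({
    "wordstat_migration": rng.normal(size=n).cumsum(),
    "trends_work_in_russia": rng.normal(size=n).cumsum(),
})
df["migration"] = (0.6 * df["wordstat_migration"]
                   + 0.3 * df["trends_work_in_russia"]
                   + rng.normal(scale=0.5, size=n))
df["migration_lag1"] = df["migration"].shift(1)  # autoregressive term
df = df.dropna()

features = ["wordstat_migration", "trends_work_in_russia", "migration_lag1"]
train, test = df.iloc[:-12], df.iloc[-12:]   # hold out the "missing" final year

model = XGBRegressor(n_estimators=400, max_depth=3, learning_rate=0.05)
model.fit(train[features], train["migration"])
nowcast = model.predict(test[features])
print(nowcast[:3])
```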
By: | Jiayi Guo; Zhiyu Quan; Linfeng Zhang |
Abstract: | The lack of high-quality public cyber incident data limits empirical research and predictive modeling for cyber risk assessment. This challenge persists due to the reluctance of companies to disclose incidents that could damage their reputation or investor confidence. Therefore, from an actuarial perspective, potential resolutions comprise two aspects: the enhancement of existing cyber incident datasets and the implementation of advanced modeling techniques to optimize the use of the available data. A review of existing data-driven methods highlights a significant lack of entity-specific organizational features in publicly available datasets. To address this gap, we propose a novel InsurTech framework that enriches cyber incident data with entity-specific attributes. We develop various machine learning (ML) models: a multilabel classification model to predict the occurrence of cyber incident types (e.g., Privacy Violation, Data Breach, Fraud and Extortion, IT Error, and Others) and a multioutput regression model to estimate their annual frequencies. Classifier and regressor chains are also implemented to explore dependencies among cyber incident types, but no significant correlations are observed in our datasets. In addition, we apply multiple interpretable ML techniques to identify and cross-validate potential risk factors developed by InsurTech across ML models. We find that InsurTech-empowered features enhance the robustness of occurrence prediction and frequency estimation compared to only using conventional risk factors. The framework generates transparent, entity-specific cyber risk profiles, supporting customized underwriting and proactive cyber risk mitigation. It provides insurers and organizations with data-driven insights to support decision-making and compliance planning. |
Date: | 2025–07 |
URL: | https://d.repec.org/n?u=RePEc:arx:papers:2507.08193 |
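The two model families described above map onto standard scikit-learn multioutput wrappers. A minimal sketch on synthetic entity features, with a classifier chain for incident-type occurrence and a multioutput regressor for annual frequencies:

```python
# Minimal sketch of the modeling setup in the abstract above: a multilabel
# classifier for incident-type occurrence and a multioutput regressor for
# annual frequencies, with a classifier chain to probe dependencies among
# incident types. Features and labels are synthetic placeholders.
import numpy as np
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor
from sklearn.multioutput import ClassifierChain, MultiOutputRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 12))                           # entity-specific attributes
Y_occ = (rng.random((500, 5)) < 0.2).astype(int)         # occurrence of 5 incident types
Y_freq = rng.poisson(1.5, size=(500, 5)).astype(float)   # annual frequencies

occurrence_model = ClassifierChain(RandomForestClassifier(n_estimators=200, random_state=0))
occurrence_model.fit(X, Y_occ)

frequency_model = MultiOutputRegressor(RandomForestRegressor(n_estimators=200, random_state=0))
frequency_model.fit(X, Y_freq)

print(occurrence_model.predict(X[:2]))
print(frequency_model.predict(X[:2]))
```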
By: | Mori, Misato |
Abstract: | Financial fraud generates persistent risk and capital loss across sectors. This study investigates artificial intelligence (AI) methodologies for financial fraud detection, with emphasis on Retrieval-Augmented Generation (RAG). The review covers supervised classification, unsupervised anomaly detection, and graph-based relational modeling using deep neural networks, transformers, and hybrid architectures. Challenges include class imbalance, concept drift, and decision interpretability. We describe the RAG framework integrating retrievers and generative language models with external knowledge bases. Empirical comparisons on synthetic and real-time fraud datasets show improved F1-score, precision, and contextual reasoning in contrast to fine-tuned transformers and static classifiers. Applications include transaction monitoring, policy violation detection, account takeover analysis, and social engineering prevention. Evaluation highlights retrieval-grounded generation as an effective fraud signal augmentation mechanism. The paper concludes with architectural implications for deploying scalable, compliant, and adaptive fraud detection pipelines in multi-domain financial systems. |
Date: | 2025–07–16 |
URL: | https://d.repec.org/n?u=RePEc:osf:osfxxx:5yjm4_v1 |
By: | Francis Boabang; Samuel Asante Gyamerah |
Abstract: | In insurance fraud prediction, handling class imbalance remains a critical challenge. This paper presents a novel multistage focal loss function designed to enhance the performance of machine learning models in such imbalanced settings by helping the optimizer escape local minima and converge to a good solution. Building upon the foundation of the standard focal loss, our proposed approach introduces a dynamic, multi-stage convex and nonconvex mechanism that progressively adjusts the focus on hard-to-classify samples across training epochs. This strategic refinement facilitates more stable learning and improved discrimination between fraudulent and legitimate cases. Through extensive experimentation on a real-world insurance dataset, our method achieved better performance than the traditional focal loss, as measured by accuracy, precision, F1-score, recall and Area Under the Curve (AUC) metrics on the auto insurance dataset. These results demonstrate the efficacy of the multistage focal loss in boosting model robustness and predictive accuracy in highly skewed classification tasks, offering significant implications for fraud detection systems in the insurance industry. An explainable model is included to interpret the results. |
Date: | 2025–08 |
URL: | https://d.repec.org/n?u=RePEc:arx:papers:2508.02283 |
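The multistage scheme above builds on the standard focal loss, FL(p_t) = -alpha (1 - p_t)^gamma log(p_t). A minimal sketch in which the focusing parameter gamma is adjusted across training stages; the schedule is an illustrative assumption, not the authors' exact mechanism:

```python
# Minimal sketch of a focal loss with a stage-dependent focusing parameter.
# The gamma schedule below is an assumption for illustration only.
import torch
import torch.nn.functional as F

def focal_loss(logits, targets, gamma=2.0, alpha=0.25):
    """Binary focal loss: -alpha * (1 - p_t)^gamma * log(p_t)."""
    bce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    p_t = torch.exp(-bce)  # p_t = p if y = 1 else 1 - p
    return (alpha * (1.0 - p_t) ** gamma * bce).mean()

def staged_gamma(epoch, stages=((0, 0.0), (10, 1.0), (20, 2.0))):
    """Illustrative schedule: raise gamma as training progresses."""
    gamma = stages[0][1]
    for start, value in stages:
        if epoch >= start:
            gamma = value
    return gamma

logits = torch.randn(8)
targets = (torch.rand(8) < 0.1).float()  # heavily imbalanced labels
for epoch in (0, 10, 20):
    print(epoch, focal_loss(logits, targets, gamma=staged_gamma(epoch)).item())
```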
By: | Golo Henseke; Rhys Davies; Alan Felstead; Duncan Gallie; Francis Green; Ying Zhou |
Abstract: | We introduce the Generative AI Susceptibility Index (GAISI), a task-based measure of UK job exposure to large language models (LLMs), such as ChatGPT. GAISI is derived from probabilistic task ratings by LLMs and linked to worker-reported task data from the Skills and Employment Surveys. It reflects the share of job activities where an LLM or LLM-powered system can reduce task completion time by at least 25 per cent beyond existing productivity tools. The index demonstrates high reliability, strong alignment with AI capabilities, and superior predictive power compared to existing exposure measures. By 2023-24, nearly all UK jobs exhibited some exposure, yet only a minority were heavily affected. Aggregate exposure has risen since 2017, primarily due to occupational shifts rather than changes in task profiles. The price premium for AI-exposed tasks declined relative to 2017, measuring approximately 11 per cent lower in 2023-24. Job postings in high-exposure roles also fell by 6.5 per cent following the release of ChatGPT. GAISI offers a robust framework for assessing generative AI's impact on work, providing early evidence that displacement effects may already outweigh productivity gains. |
Date: | 2025–07 |
URL: | https://d.repec.org/n?u=RePEc:arx:papers:2507.22748 |
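The index definition above, the share of job activities where an LLM can cut completion time by at least 25 per cent, can be illustrated directly. A minimal sketch with invented task ratings and time shares:

```python
# Minimal sketch of a GAISI-style exposure score: the share of a job's task
# time in activities where an LLM is judged to reduce completion time by at
# least 25 per cent. Task ratings and time shares are invented placeholders.
import pandas as pd

THRESHOLD = 0.25  # minimum time saving for a task to count as exposed

tasks = pd.DataFrame({
    "task": ["draft reports", "client meetings", "data entry", "site inspection"],
    "time_share": [0.30, 0.30, 0.20, 0.20],       # share of the job's working time
    "llm_time_saving": [0.45, 0.10, 0.35, 0.00],  # LLM-rated reduction in completion time
})

exposed = tasks["llm_time_saving"] >= THRESHOLD
exposure_score = tasks.loc[exposed, "time_share"].sum()
print(f"exposure score: {exposure_score:.2f}")  # share of job activities exposed
```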
By: | Yingnan Yan; Tianming Liu; Yafeng Yin |
Abstract: | As a key advancement in artificial intelligence, large language models (LLMs) are set to transform transportation systems. While LLMs offer the potential to simulate human travelers in future mixed-autonomy transportation systems, their behavioral fidelity in complex scenarios remains largely unconfirmed by existing research. This study addresses this gap by conducting a comprehensive analysis of the value of travel time (VOT) of a popular LLM, GPT-4o. We employ a full factorial experimental design to systematically examine the LLM's sensitivity to various transportation contexts, including the choice setting, travel purpose, income, and socio-demographic factors. Our results reveal a high degree of behavioral similarity between the LLM and humans. The LLM exhibits an aggregate VOT similar to that of humans, and demonstrates human-like sensitivity to travel purpose, income, and the time-cost trade-off ratios of the alternatives. Furthermore, the behavioral patterns of the LLM are remarkably consistent across varied contexts. However, we also find that the LLM's context sensitivity is less pronounced than that observed in humans. Overall, this study provides a foundational benchmark for the future development of LLMs as proxies for human travelers, demonstrating their value and robustness while highlighting that their blunted contextual sensitivity requires careful consideration. |
Date: | 2025–07 |
URL: | https://d.repec.org/n?u=RePEc:arx:papers:2507.22244 |
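Values of travel time like the one examined above are typically recovered as the ratio of time and cost coefficients in a discrete choice model. A minimal sketch that estimates a binary logit on synthetic route choices and converts the trade-off to a per-hour figure; this is a generic illustration, not the paper's experimental design:

```python
# Minimal sketch of recovering a value of travel time (VOT) from binary route
# choices: fit a logit on time and cost differences, then take the coefficient
# ratio. The choice data are simulated with a known VOT of $0.5/minute.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 2000
dtime = rng.uniform(-30, 30, n)   # minutes: alternative A minus alternative B
dcost = rng.uniform(-10, 10, n)   # dollars: alternative A minus alternative B
true_vot = 0.5                    # $/minute used to simulate choices
utility_diff = -true_vot * dtime - dcost + rng.logistic(size=n)
choose_a = (utility_diff > 0).astype(int)

X = sm.add_constant(np.column_stack([dtime, dcost]))
fit = sm.Logit(choose_a, X).fit(disp=0)
b_time, b_cost = fit.params[1], fit.params[2]
vot_per_hour = 60 * b_time / b_cost  # $/hour implied by the estimated trade-off
print(f"implied VOT: {vot_per_hour:.1f} $/hour")
```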
By: | Junjie Zhao; Chengxi Zhang; Chenkai Wang; Peng Yang |
Abstract: | Reinforcement learning (RL) has successfully automated the complex process of mining formulaic alpha factors for creating interpretable and profitable investment strategies. However, existing methods are hampered by sparse rewards in the underlying Markov Decision Process. This inefficiency limits the exploration of the vast symbolic search space and destabilizes the training process. To address this, Trajectory-level Reward Shaping (TLRS), a novel reward shaping method, is proposed. TLRS provides dense, intermediate rewards by measuring the subsequence-level similarity between partially generated expressions and a set of expert-designed formulas. Furthermore, a reward centering mechanism is introduced to reduce training variance. Extensive experiments on six major Chinese and U.S. stock indices show that TLRS significantly improves the predictive power of mined factors, boosting the Rank Information Coefficient by 9.29% over existing potential-based shaping algorithms. Notably, TLRS achieves a major leap in computational efficiency by reducing its time complexity with respect to the feature dimension from linear to constant, which is a significant improvement over distance-based baselines. |
Date: | 2025–07 |
URL: | https://d.repec.org/n?u=RePEc:arx:papers:2507.20263 |
By: | Zeqi Wu; Meilin Wang; Wei Huang; Zheng Zhang |
Abstract: | Estimation and inference of treatment effects under unconfounded treatment assignments often suffer from bias and the "curse of dimensionality" due to the nonparametric estimation of nuisance parameters for high-dimensional confounders. Although debiased state-of-the-art methods have been proposed for binary treatments under particular treatment models, they can be unstable for small sample sizes. Moreover, directly extending them to general treatment models can lead to computational complexity. We propose a balanced neural network weighting method for general treatment models, which leverages deep neural networks to alleviate the curse of dimensionality while retaining optimal covariate balance through calibration, thereby achieving debiased and robust estimation. Our method accommodates a wide range of treatment models, including average, quantile, distributional, and asymmetric least squares treatment effects, for discrete, continuous, and mixed treatments. Under regularity conditions, we show that our estimator achieves rate double robustness and $\sqrt{N}$-asymptotic normality, and its asymptotic variance achieves the semiparametric efficiency bound. We further develop a statistical inference procedure based on weighted bootstrap, which avoids estimating the efficient influence/score functions. Simulation results reveal that the proposed method consistently outperforms existing alternatives, especially when the sample size is small. Applications to the 401(k) dataset and the Mother's Significant Features dataset further illustrate the practical value of the method for estimating both average and quantile treatment effects under binary and continuous treatments, respectively. |
Date: | 2025–07 |
URL: | https://d.repec.org/n?u=RePEc:arx:papers:2507.04044 |
By: | Rachel Cho; Christoph Görtz; Danny McGowan; Max Schröder |
Abstract: | We propose a new approach to identify firm-level financial constraints by applying artificial intelligence to text of 10-K filings by U.S. public firms from 1993 to 2021. Leveraging transformer-based natural language processing, our model captures contextual and semantic nuances often missed by traditional text classification techniques, enabling more accurate detection of financial constraints. A key contribution is to differentiate between constraints that affect firms presently and those anticipated in the future. These two types of constraints are associated with distinctly different financial profiles: while firms expecting future constraints tend to accumulate cash preemptively, currently constrained firms exhibit reduced liquidity and higher leverage. We show that only firms anticipating financial constraints exhibit significant cash flow sensitivity of cash, whereas currently constrained and unconstrained firms do not. This calls for a narrower interpretation of this widely used cash-based constraints measure, as it may conflate distinct firm types – unconstrained and currently constrained – and fail to capture all financially constrained firms. Our findings underscore the critical role of constraint timing in shaping corporate financial behavior. |
Keywords: | financial constraints, artificial intelligence, expectations, cash, cash flow, corporate finance behavior |
JEL: | G31 G32 D92 |
Date: | 2025 |
URL: | https://d.repec.org/n?u=RePEc:ces:ceswps:_12054 |
By: | Zequn Jin; Gaoqian Xu; Xi Zheng; Yahong Zhou |
Abstract: | This paper develops a robust and efficient method for policy learning from observational data in the presence of unobserved confounding, complementing existing instrumental variable (IV) based approaches. We employ the marginal sensitivity model (MSM) to relax the commonly used yet restrictive unconfoundedness assumption by introducing a sensitivity parameter that captures the extent of selection bias induced by unobserved confounders. Building on this framework, we consider two distributionally robust welfare criteria, defined as the worst-case welfare and policy improvement functions, evaluated over an uncertainty set of counterfactual distributions characterized by the MSM. Closed-form expressions for both welfare criteria are derived. Leveraging these identification results, we construct doubly robust scores and estimate the robust policies by maximizing the proposed criteria. Our approach accommodates flexible machine learning methods for estimating nuisance components, even when these converge at moderately slow rate. We establish asymptotic regret bounds for the resulting policies, providing a robust guarantee against the most adversarial confounding scenario. The proposed method is evaluated through extensive simulation studies and empirical applications to the JTPA study and Head Start program. |
Date: | 2025–07 |
URL: | https://d.repec.org/n?u=RePEc:arx:papers:2507.20550 |
By: | Dhanashekar Kandaswamy; Ashutosh Sahoo; Akshay SP; Gurukiran S; Parag Paul; Girish G N |
Abstract: | As decentralized finance (DeFi) evolves, distinguishing between user behaviors - liquidity provision versus active trading - has become vital for risk modeling and on-chain reputation. We propose a behavioral scoring framework for Uniswap that assigns two complementary scores: a Liquidity Provision Score that assesses strategic liquidity contributions, and a Swap Behavior Score that reflects trading intent, volatility exposure, and discipline. The scores are constructed using rule-based blueprints that decompose behavior into volume, frequency, holding time, and withdrawal patterns. To handle edge cases and learn feature interactions, we introduce a deep residual neural network with densely connected skip blocks inspired by the U-Net architecture. We also incorporate pool-level context such as total value locked (TVL), fee tiers, and pool size, allowing the system to differentiate similar user behaviors across pools with varying characteristics. Our framework enables context-aware and scalable DeFi user scoring, supporting improved risk assessment and incentive design. Experiments on Uniswap v3 data show its usefulness for user segmentation and protocol-aligned reputation systems. Although we refer to our metric as zScore, it is independently developed and methodologically different from the cross-protocol system proposed by Udupi et al. Our focus is on role-specific behavioral modeling within Uniswap using blueprint logic and supervised learning. |
Date: | 2025–07 |
URL: | https://d.repec.org/n?u=RePEc:arx:papers:2507.20494 |
By: | Johannes Kruse (Max Planck Institute for Research on Collective Goods, Bonn) |
Abstract: | This comment shows how large language models (LLMs) can help courts discern the "ordinary meaning" of statutory terms. Instead of relying on expert-heavy corpus‑linguistic techniques (Gries 2025), the author simulates a human survey with GPT‑4o. Demographically realistic AI agents replicate the 2,835 participants in Tobia's 2020 study on "vehicle" and "yield", producing response distributions with no statistically significant difference from the human data (Kolmogorov–Smirnov p = 0.915). The paper addresses concerns about hallucinations, reproducibility, data leakage, and explainability, and introduces the locked‑prompt "Ordinary Meaning Bot, " arguing that LLM-based survey simulation is a practical, accurate alternative to dictionaries, intuition, or complex corpus analysis. |
Keywords: | ordinary meaning; large language models; prompt engineering; human survey simulation; alignment |
JEL: | K1 Z0 |
Date: | 2025–08 |
URL: | https://d.repec.org/n?u=RePEc:mpg:wpaper:2025_12 |
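The distributional comparison reported above rests on a two-sample Kolmogorov–Smirnov test. A minimal sketch with synthetic response vectors in place of the human and simulated-agent data:

```python
# Minimal sketch of the comparison in the abstract above: a two-sample
# Kolmogorov-Smirnov test between human survey responses and responses from
# simulated AI agents. Both vectors below are synthetic placeholders.
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)
human_ratings = rng.integers(1, 8, size=2835).astype(float)  # e.g. 1-7 scale responses
agent_ratings = rng.integers(1, 8, size=2835).astype(float)  # simulated-agent responses

stat, p_value = ks_2samp(human_ratings, agent_ratings)
print(f"KS statistic = {stat:.3f}, p = {p_value:.3f}")  # high p: distributions indistinguishable
```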
By: | Seyed Mohammad Ali Jafari; Ali Mobini Dehkordi; Ehsan Chitsaz; Yadollah Yaghoobzadeh |
Abstract: | Background: Predicting startup success with machine learning is a rapidly growing field, yet findings on key predictors are often fragmented and context-specific. This makes it difficult to discern robust patterns and highlights a need for a systematic synthesis of the evidence. Methods: This study conducts a quantitative meta-analysis to synthesize the literature on predictor importance in AI-based startup evaluation. We performed a systematic review to identify a final sample of 13 empirical studies that report rankable feature importance. From these papers, we extracted and categorized 58 unique predictors, synthesizing their importance using a Weighted Importance Score (WIS) that balances a feature's average rank with its frequency of appearance. We also conducted a moderator analysis to investigate how predictor importance changes with context (e.g., success definition). Results: Our aggregate analysis reveals that the most consistently powerful predictors are a quartet of foundational attributes: Firm Characteristics (e.g., age, location), Investor Structure (e.g., investor quality), Digital and Social Traction (e.g., online momentum), and Funding History. The moderator analysis further reveals that this hierarchy is highly context-dependent. For instance, predicting near-term funding milestones elevates the importance of the deal's immediate context, while predicting long-term exits prioritizes fundamental firm and investor characteristics. Conclusion: The factors that best predict startup success are not universal but are contingent on the startup's goals, stage, and the data used for evaluation. Our findings point to a potential "convenience bias" in the literature, where predictor importance may be tied to data accessibility. We conclude by underscoring the need for standardized reporting practices to enable more robust, cumulative knowledge building in the field. |
Date: | 2025–07 |
URL: | https://d.repec.org/n?u=RePEc:arx:papers:2507.09675 |
By: | Georges Sfeir; Gabriel Nova; Stephane Hess; Sander van Cranenburgh |
Abstract: | Large Language Models (LLMs) are widely used to support various workflows across different disciplines, yet their potential in choice modelling remains relatively unexplored. This work examines the potential of LLMs as assistive agents in the specification and, where technically feasible, estimation of Multinomial Logit models. We implement a systematic experimental framework involving thirteen versions of six leading LLMs (ChatGPT, Claude, DeepSeek, Gemini, Gemma, and Llama) evaluated under five experimental configurations. These configurations vary along three dimensions: modelling goal (suggesting vs. suggesting and estimating MNLs); prompting strategy (Zero-Shot vs. Chain-of-Thoughts); and information availability (full dataset vs. data dictionary only). Each LLM-suggested specification is implemented, estimated, and evaluated based on goodness-of-fit metrics, behavioural plausibility, and model complexity. Findings reveal that proprietary LLMs can generate valid and behaviourally sound utility specifications, particularly when guided by structured prompts. Open-weight models such as Llama and Gemma struggled to produce meaningful specifications. Claude 4 Sonnet consistently produced the best-fitting and most complex models, while GPT models suggested models with robust and stable modelling outcomes. Some LLMs performed better when provided with just the data dictionary, suggesting that limiting raw data access may enhance internal reasoning capabilities. Among all LLMs, GPT o3 was uniquely capable of correctly estimating its own specifications by executing self-generated code. Overall, the results demonstrate both the promise and current limitations of LLMs as assistive agents in choice modelling, not only for model specification but also for supporting modelling decisions and estimation, and provide practical guidance for integrating these tools into choice modellers' workflows. |
Date: | 2025–07 |
URL: | https://d.repec.org/n?u=RePEc:arx:papers:2507.21790 |
By: | Aaron Green; Zihan Nie; Hanzhen Qin; Oshani Seneviratne; Kristin P. Bennett |
Abstract: | Survival modeling predicts the time until an event occurs and is widely used in risk analysis; for example, it's used in medicine to predict the survival of a patient based on censored data. There is a need for large-scale, realistic, and freely available datasets for benchmarking artificial intelligence (AI) survival models. In this paper, we derive a suite of 16 survival modeling tasks from publicly available transaction data generated by lending of cryptocurrencies in Decentralized Finance (DeFi). Each task was constructed using an automated pipeline based on choices of index and outcome events. For example, the model predicts the time from when a user borrows cryptocurrency coins (index event) until their first repayment (outcome event). We formulate a survival benchmark consisting of a suite of 16 survival-time prediction tasks (FinSurvival). We also automatically create 16 corresponding classification problems for each task by thresholding the survival time using the restricted mean survival time. With over 7.5 million records, FinSurvival provides a suite of realistic financial modeling tasks that will spur future AI survival modeling research. Our evaluation indicated that these are challenging tasks that are not well addressed by existing methods. FinSurvival enables the evaluation of AI survival models applicable to traditional finance, industry, medicine, and commerce, which is currently hindered by the lack of large public datasets. Our benchmark demonstrates how AI models could assess opportunities and risks in DeFi. In the future, the FinSurvival benchmark pipeline can be used to create new benchmarks by incorporating more DeFi transactions and protocols as the use of cryptocurrency grows. |
Date: | 2025–07 |
URL: | https://d.repec.org/n?u=RePEc:arx:papers:2507.14160 |
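The classification variants above are built by thresholding survival times at the restricted mean survival time. A minimal sketch with lifelines on synthetic borrow-to-repayment durations:

```python
# Minimal sketch of the thresholding step in the abstract above: fit a
# Kaplan-Meier curve to (possibly censored) times from a borrow event to first
# repayment, compute the restricted mean survival time (RMST), and turn the
# survival task into a classification label. The durations are synthetic
# placeholders, not the DeFi transaction data.
import numpy as np
from lifelines import KaplanMeierFitter
from lifelines.utils import restricted_mean_survival_time

rng = np.random.default_rng(0)
durations = rng.exponential(30.0, size=1000)  # days from borrow to repayment
observed = rng.random(1000) < 0.8             # ~20% censored (never repaid in window)

km = KaplanMeierFitter().fit(durations, event_observed=observed)
horizon = 90.0
rmst = restricted_mean_survival_time(km, t=horizon)
print(f"RMST up to {horizon:.0f} days: {rmst:.1f}")

# classification variant: did repayment occur before the RMST threshold?
labels = (durations <= rmst) & observed
print("positive share:", labels.mean())
```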
By: | Zehao Lin; Ying Liu; Congrong Pan; Lutz Sager |
Abstract: | We estimate the effect of air pollution on sentiment using social media data from a panel of Japanese cities. To address concerns about potential endogeneity from unobserved simultaneous determinants of air pollution and sentiment, as well as measurement error, we instrument for air pollution using plausibly exogenous variation in atmospheric wind patterns. We find that a one-standard-deviation increase in fine (PM2.5) and small (PM10) particle concentrations reduces overall sentiment by 0.79% and 1.64% of a standard deviation, respectively, which is composed of a more pronounced increase in negative sentiment and a smaller decrease in positive sentiment. Our unique dataset allows us to separately estimate effects on negative sentiment categories including anger, anxiety, and sadness. Our results suggest sentiment as one candidate mechanism, besides physiological and cognitive pathways, to explain the increasingly evident non-health damages from air pollution exposure on work productivity, road safety, sleep and crime. |
Keywords: | air pollution, Twitter, sentiment, Japan |
JEL: | I31 Q51 Q53 |
Date: | 2025 |
URL: | https://d.repec.org/n?u=RePEc:ces:ceswps:_12030 |
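The identification strategy above is a two-stage least squares regression with wind patterns instrumenting for particulate concentrations. A minimal sketch with synthetic series in place of the Japanese city panel:

```python
# Minimal sketch of the instrumental-variable design in the abstract above:
# sentiment regressed on PM2.5, instrumented with a wind-pattern variable,
# using two-stage least squares. All series are synthetic placeholders.
import numpy as np
import pandas as pd
from linearmodels.iv import IV2SLS

rng = np.random.default_rng(0)
n = 5000
wind = rng.normal(size=n)        # plausibly exogenous wind pattern
confounder = rng.normal(size=n)  # unobserved joint determinant
pm25 = 0.8 * wind + 0.5 * confounder + rng.normal(size=n)
sentiment = -0.3 * pm25 + 0.5 * confounder + rng.normal(size=n)

df = pd.DataFrame({"sentiment": sentiment, "pm25": pm25, "wind": wind})
df["const"] = 1.0

model = IV2SLS(df["sentiment"], df[["const"]], df["pm25"], df["wind"])
print(model.fit(cov_type="robust").params["pm25"])  # close to the true -0.3
```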
By: | Hoyoung Lee; Junhyuk Seo; Suhwan Park; Junhyeong Lee; Wonbin Ahn; Chanyeol Choi; Alejandro Lopez-Lira; Yongjae Lee |
Abstract: | In finance, Large Language Models (LLMs) face frequent knowledge conflicts due to discrepancies between pre-trained parametric knowledge and real-time market data. These conflicts become particularly problematic when LLMs are deployed in real-world investment services, where misalignment between a model's embedded preferences and those of the financial institution can lead to unreliable recommendations. Yet little research has examined what investment views LLMs actually hold. We propose an experimental framework to investigate such conflicts, offering the first quantitative analysis of confirmation bias in LLM-based investment analysis. Using hypothetical scenarios with balanced and imbalanced arguments, we extract models' latent preferences and measure their persistence. Focusing on sector, size, and momentum, our analysis reveals distinct, model-specific tendencies. In particular, we observe a consistent preference for large-cap stocks and contrarian strategies across most models. These preferences often harden into confirmation bias, with models clinging to initial judgments despite counter-evidence. |
Date: | 2025–07 |
URL: | https://d.repec.org/n?u=RePEc:arx:papers:2507.20957 |
By: | König, Leonard Maximilian |
Abstract: | This study explores the interplay of issue salience and affective signals in Swiss public reactions to government measures on the Reddit forum r/Switzerland during the COVID-19 pandemic and the Russia-Ukraine war. Using an Exploratory Data Analysis approach, this study applied topic modeling (BERTopic) to a large corpus of posts (2019-2022) to identify shifts in online public attention and emotional responses, and transformer-based sentiment and emotion analysis to quantify sentiment and discrete emotions as affective signals. The results reveal a Swiss online public that is highly responsive to events, with attention shifting rapidly, and whose discourse is deeply imbued with emotional content, predominantly negative in the face of restrictive policies or unsettling international developments. These insights underscore the value of computational social science in unpacking the complexities of online public opinion and offer a foundation for future research into the evolving nature of digital democracy and crisis governance. |
Date: | 2025–07–17 |
URL: | https://d.repec.org/n?u=RePEc:osf:socarx:28exs_v2 |
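The two computational steps described above, BERTopic for topic discovery and transformer-based sentiment scoring, can be sketched as follows; the posts and the default sentiment model are illustrative placeholders, not the r/Switzerland corpus or the study's models:

```python
# Minimal sketch of the pipeline in the abstract above: BERTopic for topic
# modelling of forum posts and a transformers pipeline for sentiment. The
# tiny repeated corpus is a placeholder; meaningful topics need real data.
from bertopic import BERTopic
from transformers import pipeline

posts = [
    "New entry rules announced for travellers, testing requirements tightened.",
    "Energy prices keep climbing, worried about heating costs this winter.",
    "Vaccination certificates now required for restaurants in my canton.",
    "Refugee support drive this weekend in Zurich, volunteers welcome.",
] * 50  # BERTopic needs a reasonably sized corpus to form clusters

topic_model = BERTopic(min_topic_size=10)
topics, probs = topic_model.fit_transform(posts)
print(topic_model.get_topic_info().head())

sentiment = pipeline("sentiment-analysis")  # default English sentiment model
print(sentiment(posts[:2]))
```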
By: | Wei Lu; Daniel L. Chen; Christian B. Hansen |
Abstract: | Understanding how large language model (LLM) agents behave in strategic interactions is essential as these systems increasingly participate autonomously in economically and morally consequential decisions. We evaluate LLM preferences using canonical economic games, finding substantial deviations from human behavior. Models like GPT-4o show excessive cooperation and limited incentive sensitivity, while reasoning models, such as o3-mini, align more consistently with payoff-maximizing strategies. We propose a supervised fine-tuning pipeline that uses synthetic datasets derived from economic reasoning to align LLM agents with economic preferences, focusing on two stylized preference structures. In the first, utility depends only on individual payoffs (homo economicus), while utility also depends on a notion of Kantian universalizability in the second preference structure (homo moralis). We find that fine-tuning based on small datasets shifts LLM agent behavior toward the corresponding economic agent. We further assess the fine-tuned agents' behavior in two applications: Moral dilemmas involving autonomous vehicles and algorithmic pricing in competitive markets. These examples illustrate how different normative objectives embedded via realizations from structured preference structures can influence market and moral outcomes. This work contributes a replicable, cost-efficient, and economically grounded pipeline to align AI preferences using moral-economic principles. |
Date: | 2025–07 |
URL: | https://d.repec.org/n?u=RePEc:arx:papers:2507.20796 |
By: | Brochet, S.; Mueller, H.; Rauh, C. |
Abstract: | The correct measurement of economic policy uncertainty (EPU) plays a critical role in many policy settings - in particular where economic policy decisions need to be taken in response to large shocks. One such large shock is armed conflict. But, counterintuitively, the standard text-based EPU index systematically declines during armed conflict periods. Using a global news corpus covering 192 countries and over 5 million articles, we show that this decline is driven not by reduced uncertainty, but by a crowding out of reporting on economics and policy. We show that a combination of topic modeling and two-way fixed effects can be used to adjust the measurement of EPU, providing a new view on political risk during armed conflict. After adjustment, the EPU aligns more closely with firm perceptions, political risk insurance and investment during armed conflict. |
Keywords: | Economic Policy Uncertainty (EPU), Armed Conflict, Media Crowding-Out, Topic Modeling, Latent Dirichlet Allocation (LDA), Measurement Bias, Text-Based Indices, Macroeconomic Uncertainty |
JEL: | C61 C62 G11 G12 D85 |
Date: | 2025–07–25 |
URL: | https://d.repec.org/n?u=RePEc:cam:camjip:2520 |
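The adjustment described above combines topic modeling with a two-way fixed effects regression that purges the crowding-out of economics coverage. A minimal sketch with synthetic articles and a synthetic country-month panel; the aggregation from article-level topics to the panel is only indicated in comments:

```python
# Minimal sketch of the adjustment in the abstract above: estimate article
# topic shares with LDA, then regress the raw EPU measure on the
# economics-topic share with country and month fixed effects and keep the
# residual as a crowding-out-adjusted EPU. All data are synthetic placeholders.
import numpy as np
import pandas as pd
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer
from linearmodels.panel import PanelOLS

articles = [
    "central bank signals uncertainty over fiscal policy and tax reform",
    "fighting intensifies near the border as troops advance on the city",
    "parliament debates new trade tariffs amid policy uncertainty",
    "air strikes reported overnight, humanitarian corridor under discussion",
] * 25

counts = CountVectorizer(stop_words="english").fit_transform(articles)
topic_shares = LatentDirichletAllocation(n_components=2, random_state=0).fit_transform(counts)
# in the full pipeline, these article-level shares would be aggregated by
# country and month; a random stand-in is used in the panel below

rng = np.random.default_rng(0)
panel = pd.DataFrame({
    "country": np.repeat([f"c{i}" for i in range(20)], 24),
    "month": np.tile(pd.date_range("2020-01-01", periods=24, freq="MS"), 20),
    "epu_raw": rng.normal(size=480),
    "econ_topic_share": rng.uniform(0.1, 0.9, size=480),
}).set_index(["country", "month"])

fe = PanelOLS.from_formula(
    "epu_raw ~ econ_topic_share + EntityEffects + TimeEffects", data=panel
).fit()
panel["epu_adjusted"] = fe.resids + panel["epu_raw"].mean()  # EPU purged of crowding-out
print(fe.params)
```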
By: | Christoph Engel (Max Planck Institute for Research on Collective Goods, Bonn); Yoan Hermstrüwer (University of Zurich); Alison Kim (University of Zurich) |
Abstract: | Recent advances in AI create possibilities for delegating legal decision-making to machines or enhancing human adjudication through AI assistance. Using classic normative conflicts – the trolley problem and similar moral dilemmas – as a proof of concept, we examine the alignment between AI legal reasoning and human judgment. In our baseline experiment, we find a pronounced mismatch between decisions made by GPT and those of human subjects. This misalignment raises substantive concerns for AI-powered legal decision-aids. We investigate whether explicit normative guidance can address this misalignment, with mixed results. GPT-3.5 is susceptible to such intervention, but frequently refuses to decide when faced with a moral dilemma. GPT-4 is outright utilitarian, and essentially ignores the instruction to decide on deontological grounds. GPT-o3-mini faithfully implements this instruction, but is unwilling to balance deontological and utilitarian concerns if instructed to do so. At least for the time being, explicit normative instructions are not fully able to realign AI advice with the normative convictions of the legislator. |
Keywords: | large language models, human-AI alignment, rule of law, moral dilemmas, trolley problems |
JEL: | C99 D63 D81 K10 K40 Z13 |
Date: | 2025–04 |
URL: | https://d.repec.org/n?u=RePEc:mpg:wpaper:2025_03 |