on Big Data |
By: | Duane, Jackson; Ren, Alicia; Zhang, Wei |
Abstract: | This paper presents a focused review of recent academic advances in the application of deep learning techniques to algorithmic trading. While traditional machine learning models have long been used in financial forecasting, the last decade has seen a rapid expansion in the use of deep learning architectures due to their ability to model non-linear dependencies, learn hierarchical features, and process high-dimensional sequential data. We categorize and synthesize developments across three primary paradigms: supervised deep learning models for price prediction and signal generation, unsupervised and generative approaches for feature extraction and data augmentation, and reinforcement learning agents for decision-making in trading environments. By analyzing over 30 recent peer-reviewed studies, we highlight how modern models such as attention-based networks, graph neural networks, and deep Q-learning have enhanced the robustness and adaptability of trading algorithms. We also discuss key limitations—including overfitting, data non-stationarity, and lack of interpretability—and summarize efforts to address them. This review serves as a resource for researchers seeking a clear, academically grounded perspective on how deep learning is currently reshaping algorithmic trading systems. |
Date: | 2025–07–23 |
URL: | https://d.repec.org/n?u=RePEc:osf:osfxxx:ctxf9_v1 |
By: | Baptiste Lefort; Eric Benhamou; Beatrice Guez; Jean-Jacques Ohana; Ethan Setrouk; Alban Etienne |
Abstract: | This paper presents a novel hierarchical framework for portfolio optimization, integrating lightweight Large Language Models (LLMs) with Deep Reinforcement Learning (DRL) to combine sentiment signals from financial news with traditional market indicators. Our three-tier architecture employs base RL agents to process hybrid data, meta-agents to aggregate their decisions, and a super-agent to merge decisions based on market data and sentiment analysis. Evaluated on data from 2018 to 2024, after training on data from 2000 to 2017, the framework achieves a 26% annualized return and a Sharpe ratio of 1.2, outperforming equal-weighted and S&P 500 benchmarks. Key contributions include scalable cross-modal integration, a hierarchical RL structure for enhanced stability, and open-source reproducibility. |
Date: | 2025–07 |
URL: | https://d.repec.org/n?u=RePEc:arx:papers:2507.22932 |
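The headline performance figures above (a 26% annualized return and a Sharpe ratio of 1.2) follow from standard definitions. A minimal sketch, using synthetic daily returns rather than the paper's backtest:

```python
# Minimal sketch of the reported performance metrics: annualized return and
# Sharpe ratio. The daily return series is a synthetic placeholder, not the
# framework's 2018-2024 backtest.
import numpy as np

def annualized_return(daily_returns, periods_per_year=252):
    """Geometric annualization of a series of simple daily returns."""
    cumulative = np.prod(1.0 + daily_returns)
    n_years = len(daily_returns) / periods_per_year
    return cumulative ** (1.0 / n_years) - 1.0

def sharpe_ratio(daily_returns, risk_free_daily=0.0, periods_per_year=252):
    """Annualized Sharpe ratio of a series of simple daily returns."""
    excess = daily_returns - risk_free_daily
    return np.sqrt(periods_per_year) * excess.mean() / excess.std(ddof=1)

rng = np.random.default_rng(0)
returns = rng.normal(0.001, 0.01, size=6 * 252)  # synthetic stand-in for six years of daily returns
print(annualized_return(returns), sharpe_ratio(returns))
```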
By: | Md Talha Mohsin |
Abstract: | Large Language Models (LLMs) have demonstrated remarkable capabilities across a wide variety of Financial Natural Language Processing (FinNLP) tasks. However, systematic comparisons among widely used LLMs remain underexplored. Given the rapid advancement and growing influence of LLMs in financial analysis, this study conducts a thorough comparative evaluation of five leading LLMs, GPT, Claude, Perplexity, Gemini and DeepSeek, using 10-K filings from the 'Magnificent Seven' technology companies. We create a set of domain-specific prompts and then use three methodologies to evaluate model performance: human annotation, automated lexical-semantic metrics (ROUGE, Cosine Similarity, Jaccard), and model behavior diagnostics (prompt-level variance and across-model similarity). The results show that GPT gives the most coherent, semantically aligned, and contextually relevant answers, followed by Claude and Perplexity. Gemini and DeepSeek, on the other hand, have more variability and less agreement. Also, the similarity and stability of outputs change from company to company and over time, showing that they are sensitive to how prompts are written and what source material is used. |
Date: | 2025–07 |
URL: | https://d.repec.org/n?u=RePEc:arx:papers:2507.22936 |
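Two of the automated lexical-semantic metrics named above, cosine similarity and Jaccard, are straightforward to reproduce. A minimal sketch on two invented answer strings (ROUGE, which requires a reference-summary library, is omitted):

```python
# Minimal sketch of two metrics from the abstract above: TF-IDF cosine
# similarity and token-level Jaccard similarity. The answer strings are
# invented placeholders, not model outputs from the study.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def jaccard(a: str, b: str) -> float:
    """Jaccard similarity between the token sets of two answers."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0

answer_model_a = "Revenue grew on strong cloud demand while margins stayed flat."
answer_model_b = "Cloud demand drove revenue growth, with margins roughly flat."

tfidf = TfidfVectorizer().fit_transform([answer_model_a, answer_model_b])
print("cosine:", cosine_similarity(tfidf[0], tfidf[1])[0, 0])
print("jaccard:", jaccard(answer_model_a, answer_model_b))
```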
By: | Richiardi, Matteo; Rejoice, Frimpong |
Abstract: | Development of microsimulation models often requires reweighting some input dataset to reflect the characteristics of a different population of interest. In this paper we explore a machine learning approach whereby a variant of decision trees (Gradient Boosted Machine) is used to replicate the joint distribution of target variables observed in a large commercially available but slightly biased dataset, with an additional raking step to remove the bias and ensure consistency of relevant marginal distributions with official statistics. The method is applied to build a regional variant of UKMOD, an open-source static tax-benefit model for the UK belonging to the EUROMOD family, with an application to the Greater Essex region in the UK. |
Date: | 2025–08–11 |
URL: | https://d.repec.org/n?u=RePEc:ese:cempwp:cempa9-25 |
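The raking step described above is iterative proportional fitting of weights toward official marginal totals. A minimal sketch with an invented two-way table, leaving the Gradient Boosted Machine stage aside:

```python
# Minimal sketch of raking (iterative proportional fitting): scale the cells
# of a weight table so that both margins match official totals. The
# categories and target margins are invented for illustration.
import numpy as np

def rake(table, row_targets, col_targets, n_iter=100, tol=1e-9):
    """Iteratively scale rows and columns until margins match the targets."""
    w = table.astype(float).copy()
    for _ in range(n_iter):
        w *= (row_targets / w.sum(axis=1))[:, None]   # match row margins
        w *= (col_targets / w.sum(axis=0))[None, :]   # match column margins
        if np.allclose(w.sum(axis=1), row_targets, atol=tol):
            break
    return w

# cell counts from the (slightly biased) commercial dataset: age group x region
counts = np.array([[120.0, 80.0], [60.0, 140.0]])
official_age = np.array([210.0, 190.0])     # official age-group totals
official_region = np.array([190.0, 210.0])  # official regional totals
print(rake(counts, official_age, official_region))
```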
By: | Tingyu Yuan; Xi Zhang; Xuanjing Chen |
Abstract: | In the face of global economic uncertainty, financial auditing has become essential for regulatory compliance and risk mitigation. Traditional manual auditing methods are increasingly limited by large data volumes, complex business structures, and evolving fraud tactics. This study proposes an AI-driven framework for enterprise financial audits and high-risk identification, leveraging machine learning to improve efficiency and accuracy. Using a dataset from the Big Four accounting firms (EY, PwC, Deloitte, KPMG) from 2020 to 2025, the research examines trends in risk assessment, compliance violations, and fraud detection. The dataset includes key indicators such as audit project counts, high-risk cases, fraud instances, compliance breaches, employee workload, and client satisfaction, capturing both audit behaviors and AI's impact on operations. To build a robust risk prediction model, three algorithms - Support Vector Machine (SVM), Random Forest (RF), and K-Nearest Neighbors (KNN) - are evaluated. SVM uses hyperplane optimization for complex classification, RF combines decision trees to manage high-dimensional, nonlinear data with resistance to overfitting, and KNN applies distance-based learning for flexible performance. Through hierarchical K-fold cross-validation and evaluation using F1-score, accuracy, and recall, Random Forest achieves the best performance, with an F1-score of 0.9012, excelling in identifying fraud and compliance anomalies. Feature importance analysis reveals audit frequency, past violations, employee workload, and client ratings as key predictors. The study recommends adopting Random Forest as a core model, enhancing features via engineering, and implementing real-time risk monitoring. This research contributes valuable insights into using machine learning for intelligent auditing and risk management in modern enterprises. |
Date: | 2025–07 |
URL: | https://d.repec.org/n?u=RePEc:arx:papers:2507.06266 |
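The model comparison above can be reproduced in outline with scikit-learn. A minimal sketch on synthetic imbalanced data, using stratified folds as a common stand-in for the paper's K-fold scheme and reporting cross-validated F1:

```python
# Minimal sketch of the comparison in the abstract above: SVM, Random Forest,
# and KNN evaluated with K-fold cross-validation on the F1 score. The
# synthetic data stand in for the audit-risk dataset.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

X, y = make_classification(n_samples=2000, n_features=20, weights=[0.85, 0.15], random_state=0)
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

models = {
    "SVM": SVC(kernel="rbf"),
    "RandomForest": RandomForestClassifier(n_estimators=300, random_state=0),
    "KNN": KNeighborsClassifier(n_neighbors=7),
}
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=cv, scoring="f1")
    print(f"{name}: mean F1 = {scores.mean():.4f}")
```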
By: | Siyi Wu; Zhaoyang Guan; Leyi Zhao; Xinyuan Song; Xinyu Ying; Hanlin Zhang; Michele Pak; Yangfan He; Yi Xin; Jianhui Wang; Tianyu Shi |
Abstract: | Cryptocurrency trading is a challenging task requiring the integration of heterogeneous data from multiple modalities. Traditional deep learning and reinforcement learning approaches typically demand large training datasets and encode diverse inputs into numerical representations, often at the cost of interpretability. Recent progress in large language model (LLM)-based agents has demonstrated the capacity to process multi-modal data and support complex investment decision-making. Building on these advances, we present MountainLion, a multi-modal, multi-agent system for financial trading that coordinates specialized LLM-based agents to interpret financial data and generate investment strategies. MountainLion processes textual news, candlestick charts, and trading signal charts to produce high-quality financial reports, while also enabling modification of reports and investment recommendations through data-driven user interaction and question answering. A central reflection module analyzes historical trading signals and outcomes to continuously refine decision processes, and the system is capable of real-time report analysis, summarization, and dynamic adjustment of investment strategies. Empirical results confirm that MountainLion systematically enriches technical price triggers with contextual macroeconomic and capital flow signals, providing a more interpretable, robust, and actionable investment framework that improves returns and strengthens investor confidence. |
Date: | 2025–07 |
URL: | https://d.repec.org/n?u=RePEc:arx:papers:2507.20474 |
By: | Ozili, Peterson K; Obiora, Kingsley I; Onuzo, Chinwendu |
Abstract: | Large language models have gained popularity, and it is important to understand their applications in the financial inclusion domain. This study identifies the benefits and risks of using large language models (LLMs) in the financial inclusion domain. We show that LLMs can be used to (i) summarize the key themes in financial inclusion communications, (ii) gain insights from the tone of financial inclusion communications, (iii) bring discipline to financial inclusion communications, (iv) improve financial inclusion decision making, and (v) enhance context-sensitive text analysis and evaluation. However, the use of large language models in the financial inclusion domain poses risks relating to biased interpretation of LLM-generated responses, data privacy, misinformation, and falsehood. We emphasize that LLMs can be used safely in the financial inclusion domain to summarize financial inclusion speeches and communications, but they should not be used in situations where finding the truth is important to make decisions that promote financial inclusion. |
Keywords: | financial inclusion, large language models, LLM, algorithm, risk, benefit, communication, speech, artificial intelligence, digital financial inclusion |
JEL: | G20 G21 G23 |
Date: | 2025 |
URL: | https://d.repec.org/n?u=RePEc:pra:mprapa:125562 |
By: | Tomaso Duso; Joseph E., Jr. Harrington; Carl Kreuzberg; Geza Sapi |
Abstract: | Competition authorities increasingly rely on economic screening tools to identify markets where firms deviate from competitive norms. Traditional screening methods assume that collusion occurs through secret agreements. However, recent research highlights that firms can use public announcements to coordinate decisions, reducing competition while avoiding detection. We propose a novel approach to screening for collusion in public corporate statements. Using natural language processing, we analyze more than 300,000 earnings call transcripts issued worldwide between 2004 and 2022. By identifying expressions commonly associated with collusion, our method provides competition authorities with a tool to detect potentially anticompetitive behavior in public communications. Our approach can extend beyond earnings calls to other sources, such as news articles, trade press, and industry reports. Our method informed the European Commission’s 2024 unannounced inspections in the car tire sector, prompted by concerns over price coordination through public communication. |
Keywords: | communication, collusion, NLP, screening, text analysis |
JEL: | C23 D22 L1 L4 L64 |
Date: | 2025 |
URL: | https://d.repec.org/n?u=RePEc:ces:ceswps:_12029 |
By: | Schultze, Michelle |
Abstract: | Kyrgyzstan serves as a key case study for the broader Central Asia–Russia labor pipeline, which supported an estimated 8 million migrants annually in 2020. Prior to the Russo-Ukraine war, remittances from Russia accounted for approximately 30% of Kyrgyzstan’s GDP, driven by over 10% of its population working in Russia. However, understanding wartime migration dynamics is challenging due to suspected political interference in Russian data, restricted foreign access to this data, and the informality that characterizes Central Asian migration patterns. This study incorporates Yandex Wordstat, Google Trends, XGBoost (which outperforms other machine learning methods), and autoregressive models to "nowcast" missing data. The results reveal a push effect linked to war onset in February 2022 and war intensity. However, all three of the analyzed migration datasets suggest a potential delayed labor substitution effect as Central Asian migrants fill vacancies left by conscripted Russian workers, proxied by casualty data from Mediazona and the BBC. The study also examines remittance trends, which seem to increase along with the labor substitution effect after a two-month lag. These results are robust to Russia- and Kyrgyzstan-side socioeconomic controls such as wage levels and population dynamics. This study provides new insight into the largely opaque Central Asia–Russia labor pipeline, a critical element in development policymaking for both regions. It also introduces a novel methodology for nowcasting migration trends, particularly through Yandex Wordstat, which has been largely overlooked in English-language scholarship. |
Date: | 2025–07–24 |
URL: | https://d.repec.org/n?u=RePEc:osf:socarx:z2wch_v1 |
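The nowcasting setup above combines search-interest covariates with autoregressive terms in a gradient-boosted model. A minimal sketch with synthetic monthly series standing in for the Yandex Wordstat and Google Trends inputs:

```python
# Minimal nowcasting sketch in the spirit of the abstract above: predict a
# missing migration series from search-interest covariates plus a lag of the
# target using XGBoost. All series are synthetic placeholders.
import numpy as np
import pandas as pd
from xgboost import XGBRegressor

rng = np.random.default_rng(0)
n = 120  # monthly observations
df = pd.DataFrame({
    "wordstat_migration": rng.normal(size=n).cumsum(),
    "trends_work_in_russia": rng.normal(size=n).cumsum(),
})
df["migration"] = (0.6 * df["wordstat_migration"]
                   + 0.3 * df["trends_work_in_russia"]
                   + rng.normal(scale=0.5, size=n))
df["migration_lag1"] = df["migration"].shift(1)  # autoregressive term
df = df.dropna()

features = ["wordstat_migration", "trends_work_in_russia", "migration_lag1"]
train, test = df.iloc[:-12], df.iloc[-12:]   # hold out the "missing" final year

model = XGBRegressor(n_estimators=400, max_depth=3, learning_rate=0.05)
model.fit(train[features], train["migration"])
nowcast = model.predict(test[features])
print(nowcast[:3])
```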
By: | Jiayi Guo; Zhiyu Quan; Linfeng Zhang |
Abstract: | The lack of high-quality public cyber incident data limits empirical research and predictive modeling for cyber risk assessment. This challenge persists due to the reluctance of companies to disclose incidents that could damage their reputation or investor confidence. Therefore, from an actuarial perspective, potential resolutions comprise two aspects: the enhancement of existing cyber incident datasets and the implementation of advanced modeling techniques to optimize the use of the available data. A review of existing data-driven methods highlights a significant lack of entity-specific organizational features in publicly available datasets. To address this gap, we propose a novel InsurTech framework that enriches cyber incident data with entity-specific attributes. We develop various machine learning (ML) models: a multilabel classification model to predict the occurrence of cyber incident types (e.g., Privacy Violation, Data Breach, Fraud and Extortion, IT Error, and Others) and a multioutput regression model to estimate their annual frequencies. Classifier and regressor chains are also implemented to explore dependencies among cyber incident types, but no significant correlations are observed in our datasets. In addition, we apply multiple interpretable ML techniques to identify and cross-validate potential risk factors developed by InsurTech across ML models. We find that InsurTech-empowered features enhance the robustness of occurrence prediction and frequency estimation compared to only using conventional risk factors. The framework generates transparent, entity-specific cyber risk profiles, supporting customized underwriting and proactive cyber risk mitigation. It provides insurers and organizations with data-driven insights to support decision-making and compliance planning. |
Date: | 2025–07 |
URL: | https://d.repec.org/n?u=RePEc:arx:papers:2507.08193 |
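The two model families described above map onto standard scikit-learn multioutput wrappers. A minimal sketch on synthetic entity features, with a classifier chain for incident-type occurrence and a multioutput regressor for annual frequencies:

```python
# Minimal sketch of the modeling setup in the abstract above: a multilabel
# classifier for incident-type occurrence and a multioutput regressor for
# annual frequencies, with a classifier chain to probe dependencies among
# incident types. Features and labels are synthetic placeholders.
import numpy as np
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor
from sklearn.multioutput import ClassifierChain, MultiOutputRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 12))                           # entity-specific attributes
Y_occ = (rng.random((500, 5)) < 0.2).astype(int)         # occurrence of 5 incident types
Y_freq = rng.poisson(1.5, size=(500, 5)).astype(float)   # annual frequencies

occurrence_model = ClassifierChain(RandomForestClassifier(n_estimators=200, random_state=0))
occurrence_model.fit(X, Y_occ)

frequency_model = MultiOutputRegressor(RandomForestRegressor(n_estimators=200, random_state=0))
frequency_model.fit(X, Y_freq)

print(occurrence_model.predict(X[:2]))
print(frequency_model.predict(X[:2]))
```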
By: | Mori, Misato |
Abstract: | Financial fraud generates persistent risk and capital loss across sectors. This study investigates artificial intelligence (AI) methodologies for financial fraud detection, with emphasis on Retrieval-Augmented Generation (RAG). The review covers supervised classification, unsupervised anomaly detection, and graph-based relational modeling using deep neural networks, transformers, and hybrid architectures. Challenges include class imbalance, concept drift, and decision interpretability. We describe the RAG framework integrating retrievers and generative language models with external knowledge bases. Empirical comparisons on synthetic and real-time fraud datasets show improved F1-score, precision, and contextual reasoning in contrast to fine-tuned transformers and static classifiers. Applications include transaction monitoring, policy violation detection, account takeover analysis, and social engineering prevention. Evaluation highlights retrieval-grounded generation as an effective fraud signal augmentation mechanism. The paper concludes with architectural implications for deploying scalable, compliant, and adaptive fraud detection pipelines in multi-domain financial systems. |
Date: | 2025–07–16 |
URL: | https://d.repec.org/n?u=RePEc:osf:osfxxx:5yjm4_v1 |
By: | Francis Boabang; Samuel Asante Gyamerah |
Abstract: | In insurance fraud prediction, handling class imbalance remains a critical challenge. This paper presents a novel multistage focal loss function designed to enhance the performance of machine learning models in such imbalanced settings by helping the optimizer escape local minima and converge to a good solution. Building upon the foundation of the standard focal loss, our proposed approach introduces a dynamic, multi-stage convex and nonconvex mechanism that progressively adjusts the focus on hard-to-classify samples across training epochs. This strategic refinement facilitates more stable learning and improved discrimination between fraudulent and legitimate cases. Through extensive experimentation on a real-world insurance dataset, our method achieved better performance than the traditional focal loss, as measured by accuracy, precision, F1-score, recall and Area Under the Curve (AUC) metrics on the auto insurance dataset. These results demonstrate the efficacy of the multistage focal loss in boosting model robustness and predictive accuracy in highly skewed classification tasks, offering significant implications for fraud detection systems in the insurance industry. An explainable model is included to interpret the results. |
Date: | 2025–08 |
URL: | https://d.repec.org/n?u=RePEc:arx:papers:2508.02283 |
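The multistage scheme above builds on the standard focal loss, FL(p_t) = -alpha (1 - p_t)^gamma log(p_t). A minimal sketch in which the focusing parameter gamma is adjusted across training stages; the schedule is an illustrative assumption, not the authors' exact mechanism:

```python
# Minimal sketch of a focal loss with a stage-dependent focusing parameter.
# The gamma schedule below is an assumption for illustration only.
import torch
import torch.nn.functional as F

def focal_loss(logits, targets, gamma=2.0, alpha=0.25):
    """Binary focal loss: -alpha * (1 - p_t)^gamma * log(p_t)."""
    bce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    p_t = torch.exp(-bce)  # p_t = p if y = 1 else 1 - p
    return (alpha * (1.0 - p_t) ** gamma * bce).mean()

def staged_gamma(epoch, stages=((0, 0.0), (10, 1.0), (20, 2.0))):
    """Illustrative schedule: raise gamma as training progresses."""
    gamma = stages[0][1]
    for start, value in stages:
        if epoch >= start:
            gamma = value
    return gamma

logits = torch.randn(8)
targets = (torch.rand(8) < 0.1).float()  # heavily imbalanced labels
for epoch in (0, 10, 20):
    print(epoch, focal_loss(logits, targets, gamma=staged_gamma(epoch)).item())
```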
By: | Golo Henseke; Rhys Davies; Alan Felstead; Duncan Gallie; Francis Green; Ying Zhou |
Abstract: | We introduce the Generative AI Susceptibility Index (GAISI), a task-based measure of UK job exposure to large language models (LLMs), such as ChatGPT. GAISI is derived from probabilistic task ratings by LLMs and linked to worker-reported task data from the Skills and Employment Surveys. It reflects the share of job activities where an LLM or LLM-powered system can reduce task completion time by at least 25 per cent beyond existing productivity tools. The index demonstrates high reliability, strong alignment with AI capabilities, and superior predictive power compared to existing exposure measures. By 2023-24, nearly all UK jobs exhibited some exposure, yet only a minority were heavily affected. Aggregate exposure has risen since 2017, primarily due to occupational shifts rather than changes in task profiles. The price premium for AI-exposed tasks declined relative to 2017, measuring approximately 11 per cent lower in 2023-24. Job postings in high-exposure roles also fell by 6.5 per cent following the release of ChatGPT. GAISI offers a robust framework for assessing generative AI's impact on work, providing early evidence that displacement effects may already outweigh productivity gains. |
Date: | 2025–07 |
URL: | https://d.repec.org/n?u=RePEc:arx:papers:2507.22748 |
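The index definition above, the share of job activities where an LLM can cut completion time by at least 25 per cent, can be illustrated directly. A minimal sketch with invented task ratings and time shares:

```python
# Minimal sketch of a GAISI-style exposure score: the share of a job's task
# time in activities where an LLM is judged to reduce completion time by at
# least 25 per cent. Task ratings and time shares are invented placeholders.
import pandas as pd

THRESHOLD = 0.25  # minimum time saving for a task to count as exposed

tasks = pd.DataFrame({
    "task": ["draft reports", "client meetings", "data entry", "site inspection"],
    "time_share": [0.30, 0.30, 0.20, 0.20],       # share of the job's working time
    "llm_time_saving": [0.45, 0.10, 0.35, 0.00],  # LLM-rated reduction in completion time
})

exposed = tasks["llm_time_saving"] >= THRESHOLD
exposure_score = tasks.loc[exposed, "time_share"].sum()
print(f"exposure score: {exposure_score:.2f}")  # share of job activities exposed
```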
By: | Yingnan Yan; Tianming Liu; Yafeng Yin |
Abstract: | As a key advancement in artificial intelligence, large language models (LLMs) are set to transform transportation systems. While LLMs offer the potential to simulate human travelers in future mixed-autonomy transportation systems, their behavioral fidelity in complex scenarios remains largely unconfirmed by existing research. This study addresses this gap by conducting a comprehensive analysis of the value of travel time (VOT) of a popular LLM, GPT-4o. We employ a full factorial experimental design to systematically examine the LLM's sensitivity to various transportation contexts, including the choice setting, travel purpose, income, and socio-demographic factors. Our results reveal a high degree of behavioral similarity between the LLM and humans. The LLM exhibits an aggregate VOT similar to that of humans, and demonstrates human-like sensitivity to travel purpose, income, and the time-cost trade-off ratios of the alternatives. Furthermore, the behavioral patterns of the LLM are remarkably consistent across varied contexts. However, we also find that the LLM's context sensitivity is less pronounced than that observed in humans. Overall, this study provides a foundational benchmark for the future development of LLMs as proxies for human travelers, demonstrating their value and robustness while highlighting that their blunted contextual sensitivity requires careful consideration. |
Date: | 2025–07 |
URL: | https://d.repec.org/n?u=RePEc:arx:papers:2507.22244 |
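Values of travel time like the one examined above are typically recovered as the ratio of time and cost coefficients in a discrete choice model. A minimal sketch that estimates a binary logit on synthetic route choices and converts the trade-off to a per-hour figure; this is a generic illustration, not the paper's experimental design:

```python
# Minimal sketch of recovering a value of travel time (VOT) from binary route
# choices: fit a logit on time and cost differences, then take the coefficient
# ratio. The choice data are simulated with a known VOT of $0.5/minute.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 2000
dtime = rng.uniform(-30, 30, n)   # minutes: alternative A minus alternative B
dcost = rng.uniform(-10, 10, n)   # dollars: alternative A minus alternative B
true_vot = 0.5                    # $/minute used to simulate choices
utility_diff = -true_vot * dtime - dcost + rng.logistic(size=n)
choose_a = (utility_diff > 0).astype(int)

X = sm.add_constant(np.column_stack([dtime, dcost]))
fit = sm.Logit(choose_a, X).fit(disp=0)
b_time, b_cost = fit.params[1], fit.params[2]
vot_per_hour = 60 * b_time / b_cost  # $/hour implied by the estimated trade-off
print(f"implied VOT: {vot_per_hour:.1f} $/hour")
```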
By: | Junjie Zhao; Chengxi Zhang; Chenkai Wang; Peng Yang |
Abstract: | Reinforcement learning (RL) has successfully automated the complex process of mining formulaic alpha factors for creating interpretable and profitable investment strategies. However, existing methods are hampered by sparse rewards in the underlying Markov Decision Process. This inefficiency limits the exploration of the vast symbolic search space and destabilizes the training process. To address this, Trajectory-level Reward Shaping (TLRS), a novel reward shaping method, is proposed. TLRS provides dense, intermediate rewards by measuring the subsequence-level similarity between partially generated expressions and a set of expert-designed formulas. Furthermore, a reward centering mechanism is introduced to reduce training variance. Extensive experiments on six major Chinese and U.S. stock indices show that TLRS significantly improves the predictive power of mined factors, boosting the Rank Information Coefficient by 9.29% over existing potential-based shaping algorithms. Notably, TLRS achieves a major leap in computational efficiency by reducing its time complexity with respect to the feature dimension from linear to constant, which is a significant improvement over distance-based baselines. |
Date: | 2025–07 |
URL: | https://d.repec.org/n?u=RePEc:arx:papers:2507.20263 |
By: | Zeqi Wu; Meilin Wang; Wei Huang; Zheng Zhang |
Abstract: | Estimation and inference of treatment effects under unconfounded treatment assignments often suffer from bias and the "curse of dimensionality" due to the nonparametric estimation of nuisance parameters for high-dimensional confounders. Although debiased state-of-the-art methods have been proposed for binary treatments under particular treatment models, they can be unstable for small sample sizes. Moreover, directly extending them to general treatment models can lead to computational complexity. We propose a balanced neural network weighting method for general treatment models, which leverages deep neural networks to alleviate the curse of dimensionality while retaining optimal covariate balance through calibration, thereby achieving debiased and robust estimation. Our method accommodates a wide range of treatment models, including average, quantile, distributional, and asymmetric least squares treatment effects, for discrete, continuous, and mixed treatments. Under regularity conditions, we show that our estimator achieves rate double robustness and $\sqrt{N}$-asymptotic normality, and its asymptotic variance achieves the semiparametric efficiency bound. We further develop a statistical inference procedure based on weighted bootstrap, which avoids estimating the efficient influence/score functions. Simulation results reveal that the proposed method consistently outperforms existing alternatives, especially when the sample size is small. Applications to the 401(k) dataset and the Mother's Significant Features dataset further illustrate the practical value of the method for estimating both average and quantile treatment effects under binary and continuous treatments, respectively. |
Date: | 2025–07 |
URL: | https://d.repec.org/n?u=RePEc:arx:papers:2507.04044 |
By: | Rachel Cho; Christoph Görtz; Danny McGowan; Max Schröder |
Abstract: | We propose a new approach to identify firm-level financial constraints by applying artificial intelligence to text of 10-K filings by U.S. public firms from 1993 to 2021. Leveraging transformer-based natural language processing, our model captures contextual and semantic nuances often missed by traditional text classification techniques, enabling more accurate detection of financial constraints. A key contribution is to differentiate between constraints that affect firms presently and those anticipated in the future. These two types of constraints are associated with distinctly different financial profiles: while firms expecting future constraints tend to accumulate cash preemptively, currently constrained firms exhibit reduced liquidity and higher leverage. We show that only firms anticipating financial constraints exhibit significant cash flow sensitivity of cash, whereas currently constrained and unconstrained firms do not. This calls for a narrower interpretation of this widely used cash-based constraints measure, as it may conflate distinct firm types – unconstrained and currently constrained – and fail to capture all financially constrained firms. Our findings underscore the critical role of constraint timing in shaping corporate financial behavior. |
Keywords: | financial constraints, artificial intelligence, expectations, cash, cash flow, corporate finance behavior |
JEL: | G31 G32 D92 |
Date: | 2025 |
URL: | https://d.repec.org/n?u=RePEc:ces:ceswps:_12054 |
By: | Zequn Jin; Gaoqian Xu; Xi Zheng; Yahong Zhou |
Abstract: | This paper develops a robust and efficient method for policy learning from observational data in the presence of unobserved confounding, complementing existing instrumental variable (IV) based approaches. We employ the marginal sensitivity model (MSM) to relax the commonly used yet restrictive unconfoundedness assumption by introducing a sensitivity parameter that captures the extent of selection bias induced by unobserved confounders. Building on this framework, we consider two distributionally robust welfare criteria, defined as the worst-case welfare and policy improvement functions, evaluated over an uncertainty set of counterfactual distributions characterized by the MSM. Closed-form expressions for both welfare criteria are derived. Leveraging these identification results, we construct doubly robust scores and estimate the robust policies by maximizing the proposed criteria. Our approach accommodates flexible machine learning methods for estimating nuisance components, even when these converge at moderately slow rate. We establish asymptotic regret bounds for the resulting policies, providing a robust guarantee against the most adversarial confounding scenario. The proposed method is evaluated through extensive simulation studies and empirical applications to the JTPA study and Head Start program. |
Date: | 2025–07 |
URL: | https://d.repec.org/n?u=RePEc:arx:papers:2507.20550 |
By: | Dhanashekar Kandaswamy; Ashutosh Sahoo; Akshay SP; Gurukiran S; Parag Paul; Girish G N |
Abstract: | As decentralized finance (DeFi) evolves, distinguishing between user behaviors - liquidity provision versus active trading - has become vital for risk modeling and on-chain reputation. We propose a behavioral scoring framework for Uniswap that assigns two complementary scores: a Liquidity Provision Score that assesses strategic liquidity contributions, and a Swap Behavior Score that reflects trading intent, volatility exposure, and discipline. The scores are constructed using rule-based blueprints that decompose behavior into volume, frequency, holding time, and withdrawal patterns. To handle edge cases and learn feature interactions, we introduce a deep residual neural network with densely connected skip blocks inspired by the U-Net architecture. We also incorporate pool-level context such as total value locked (TVL), fee tiers, and pool size, allowing the system to differentiate similar user behaviors across pools with varying characteristics. Our framework enables context-aware and scalable DeFi user scoring, supporting improved risk assessment and incentive design. Experiments on Uniswap v3 data show its usefulness for user segmentation and protocol-aligned reputation systems. Although we refer to our metric as zScore, it is independently developed and methodologically different from the cross-protocol system proposed by Udupi et al. Our focus is on role-specific behavioral modeling within Uniswap using blueprint logic and supervised learning. |
Date: | 2025–07 |
URL: | https://d.repec.org/n?u=RePEc:arx:papers:2507.20494 |
By: | Johannes Kruse (Max Planck Institute for Research on Collective Goods, Bonn) |
Abstract: | This comment shows how large language models (LLMs) can help courts discern the "ordinary meaning" of statutory terms. Instead of relying on expert-heavy corpus‑linguistic techniques (Gries 2025), the author simulates a human survey with GPT‑4o. Demographically realistic AI agents replicate the 2,835 participants in Tobia's 2020 study on "vehicle" and "yield", producing response distributions with no statistically significant difference from the human data (Kolmogorov–Smirnov p = 0.915). The paper addresses concerns about hallucinations, reproducibility, data leakage, and explainability, and introduces the locked‑prompt "Ordinary Meaning Bot, " arguing that LLM-based survey simulation is a practical, accurate alternative to dictionaries, intuition, or complex corpus analysis. |
Keywords: | ordinary meaning; large language models; prompt engineering; human survey simulation; alignment |
JEL: | K1 Z0 |
Date: | 2025–08 |
URL: | https://d.repec.org/n?u=RePEc:mpg:wpaper:2025_12 |
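The distributional comparison reported above rests on a two-sample Kolmogorov–Smirnov test. A minimal sketch with synthetic response vectors in place of the human and simulated-agent data:

```python
# Minimal sketch of the comparison in the abstract above: a two-sample
# Kolmogorov-Smirnov test between human survey responses and responses from
# simulated AI agents. Both vectors below are synthetic placeholders.
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)
human_ratings = rng.integers(1, 8, size=2835).astype(float)  # e.g. 1-7 scale responses
agent_ratings = rng.integers(1, 8, size=2835).astype(float)  # simulated-agent responses

stat, p_value = ks_2samp(human_ratings, agent_ratings)
print(f"KS statistic = {stat:.3f}, p = {p_value:.3f}")  # high p: distributions indistinguishable
```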
By: | Seyed Mohammad Ali Jafari; Ali Mobini Dehkordi; Ehsan Chitsaz; Yadollah Yaghoobzadeh |
Abstract: | Background: Predicting startup success with machine learning is a rapidly growing field, yet findings on key predictors are often fragmented and context-specific. This makes it difficult to discern robust patterns and highlights a need for a systematic synthesis of the evidence. Methods: This study conducts a quantitative meta-analysis to synthesize the literature on predictor importance in AI-based startup evaluation. We performed a systematic review to identify a final sample of 13 empirical studies that report rankable feature importance. From these papers, we extracted and categorized 58 unique predictors, synthesizing their importance using a Weighted Importance Score (WIS) that balances a feature's average rank with its frequency of appearance. We also conducted a moderator analysis to investigate how predictor importance changes with context (e.g., success definition). Results: Our aggregate analysis reveals that the most consistently powerful predictors are a quartet of foundational attributes: Firm Characteristics (e.g., age, location), Investor Structure (e.g., investor quality), Digital and Social Traction (e.g., online momentum), and Funding History. The moderator analysis further reveals that this hierarchy is highly context-dependent. For instance, predicting near-term funding milestones elevates the importance of the deal's immediate context, while predicting long-term exits prioritizes fundamental firm and investor characteristics. Conclusion: The factors that best predict startup success are not universal but are contingent on the startup's goals, stage, and the data used for evaluation. Our findings point to a potential "convenience bias" in the literature, where predictor importance may be tied to data accessibility. We conclude by underscoring the need for standardized reporting practices to enable more robust, cumulative knowledge building in the field. |
Date: | 2025–07 |
URL: | https://d.repec.org/n?u=RePEc:arx:papers:2507.09675 |
By: | Georges Sfeir; Gabriel Nova; Stephane Hess; Sander van Cranenburgh |
Abstract: | Large Language Models (LLMs) are widely used to support various workflows across different disciplines, yet their potential in choice modelling remains relatively unexplored. This work examines the potential of LLMs as assistive agents in the specification and, where technically feasible, estimation of Multinomial Logit models. We implement a systematic experimental framework involving thirteen versions of six leading LLMs (ChatGPT, Claude, DeepSeek, Gemini, Gemma, and Llama) evaluated under five experimental configurations. These configurations vary along three dimensions: modelling goal (suggesting vs. suggesting and estimating MNLs); prompting strategy (Zero-Shot vs. Chain-of-Thoughts); and information availability (full dataset vs. data dictionary only). Each LLM-suggested specification is implemented, estimated, and evaluated based on goodness-of-fit metrics, behavioural plausibility, and model complexity. Findings reveal that proprietary LLMs can generate valid and behaviourally sound utility specifications, particularly when guided by structured prompts. Open-weight models such as Llama and Gemma struggled to produce meaningful specifications. Claude 4 Sonnet consistently produced the best-fitting and most complex models, while GPT models suggested models with robust and stable modelling outcomes. Some LLMs performed better when provided with just the data dictionary, suggesting that limiting raw data access may enhance internal reasoning capabilities. Among all LLMs, GPT o3 was uniquely capable of correctly estimating its own specifications by executing self-generated code. Overall, the results demonstrate both the promise and current limitations of LLMs as assistive agents in choice modelling, not only for model specification but also for supporting modelling decisions and estimation, and provide practical guidance for integrating these tools into choice modellers' workflows. |
Date: | 2025–07 |
URL: | https://d.repec.org/n?u=RePEc:arx:papers:2507.21790 |
By: | Aaron Green; Zihan Nie; Hanzhen Qin; Oshani Seneviratne; Kristin P. Bennett |
Abstract: | Survival modeling predicts the time until an event occurs and is widely used in risk analysis; for example, it's used in medicine to predict the survival of a patient based on censored data. There is a need for large-scale, realistic, and freely available datasets for benchmarking artificial intelligence (AI) survival models. In this paper, we derive a suite of 16 survival modeling tasks from publicly available transaction data generated by lending of cryptocurrencies in Decentralized Finance (DeFi). Each task was constructed using an automated pipeline based on choices of index and outcome events. For example, the model predicts the time from when a user borrows cryptocurrency coins (index event) until their first repayment (outcome event). We formulate a survival benchmark consisting of a suite of 16 survival-time prediction tasks (FinSurvival). We also automatically create 16 corresponding classification problems for each task by thresholding the survival time using the restricted mean survival time. With over 7.5 million records, FinSurvival provides a suite of realistic financial modeling tasks that will spur future AI survival modeling research. Our evaluation indicated that these are challenging tasks that are not well addressed by existing methods. FinSurvival enables the evaluation of AI survival models applicable to traditional finance, industry, medicine, and commerce, which is currently hindered by the lack of large public datasets. Our benchmark demonstrates how AI models could assess opportunities and risks in DeFi. In the future, the FinSurvival benchmark pipeline can be used to create new benchmarks by incorporating more DeFi transactions and protocols as the use of cryptocurrency grows. |
Date: | 2025–07 |
URL: | https://d.repec.org/n?u=RePEc:arx:papers:2507.14160 |
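The classification variants above are built by thresholding survival times at the restricted mean survival time. A minimal sketch with lifelines on synthetic borrow-to-repayment durations:

```python
# Minimal sketch of the thresholding step in the abstract above: fit a
# Kaplan-Meier curve to (possibly censored) times from a borrow event to first
# repayment, compute the restricted mean survival time (RMST), and turn the
# survival task into a classification label. The durations are synthetic
# placeholders, not the DeFi transaction data.
import numpy as np
from lifelines import KaplanMeierFitter
from lifelines.utils import restricted_mean_survival_time

rng = np.random.default_rng(0)
durations = rng.exponential(30.0, size=1000)  # days from borrow to repayment
observed = rng.random(1000) < 0.8             # ~20% censored (never repaid in window)

km = KaplanMeierFitter().fit(durations, event_observed=observed)
horizon = 90.0
rmst = restricted_mean_survival_time(km, t=horizon)
print(f"RMST up to {horizon:.0f} days: {rmst:.1f}")

# classification variant: did repayment occur before the RMST threshold?
labels = (durations <= rmst) & observed
print("positive share:", labels.mean())
```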
By: | Zehao Lin; Ying Liu; Congrong Pan; Lutz Sager |
Abstract: | We estimate the effect of air pollution on sentiment using social media data from a panel of Japanese cities. To address concerns about potential endogeneity from unobserved simultaneous determinants of air pollution and sentiment, as well as measurement error, we instrument for air pollution using plausibly exogenous variation in atmospheric wind patterns. We find that a one-standard-deviation increase in fine (PM2.5) and small (PM10) particle concentrations reduces overall sentiment by 0.79% and 1.64% of a standard deviation, respectively, which is composed of a more pronounced increase in negative sentiment and a smaller decrease in positive sentiment. Our unique dataset allows us to separately estimate effects on negative sentiment categories including anger, anxiety, and sadness. Our results suggest sentiment as one candidate mechanism, besides physiological and cognitive pathways, to explain the increasingly evident non-health damages from air pollution exposure on work productivity, road safety, sleep and crime. |
Keywords: | air pollution, Twitter, sentiment, Japan |
JEL: | I31 Q51 Q53 |
Date: | 2025 |
URL: | https://d.repec.org/n?u=RePEc:ces:ceswps:_12030 |
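The identification strategy above is a two-stage least squares regression with wind patterns instrumenting for particulate concentrations. A minimal sketch with synthetic series in place of the Japanese city panel:

```python
# Minimal sketch of the instrumental-variable design in the abstract above:
# sentiment regressed on PM2.5, instrumented with a wind-pattern variable,
# using two-stage least squares. All series are synthetic placeholders.
import numpy as np
import pandas as pd
from linearmodels.iv import IV2SLS

rng = np.random.default_rng(0)
n = 5000
wind = rng.normal(size=n)        # plausibly exogenous wind pattern
confounder = rng.normal(size=n)  # unobserved joint determinant
pm25 = 0.8 * wind + 0.5 * confounder + rng.normal(size=n)
sentiment = -0.3 * pm25 + 0.5 * confounder + rng.normal(size=n)

df = pd.DataFrame({"sentiment": sentiment, "pm25": pm25, "wind": wind})
df["const"] = 1.0

model = IV2SLS(df["sentiment"], df[["const"]], df["pm25"], df["wind"])
print(model.fit(cov_type="robust").params["pm25"])  # close to the true -0.3
```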
By: | Hoyoung Lee; Junhyuk Seo; Suhwan Park; Junhyeong Lee; Wonbin Ahn; Chanyeol Choi; Alejandro Lopez-Lira; Yongjae Lee |
Abstract: | In finance, Large Language Models (LLMs) face frequent knowledge conflicts due to discrepancies between pre-trained parametric knowledge and real-time market data. These conflicts become particularly problematic when LLMs are deployed in real-world investment services, where misalignment between a model's embedded preferences and those of the financial institution can lead to unreliable recommendations. Yet little research has examined what investment views LLMs actually hold. We propose an experimental framework to investigate such conflicts, offering the first quantitative analysis of confirmation bias in LLM-based investment analysis. Using hypothetical scenarios with balanced and imbalanced arguments, we extract models' latent preferences and measure their persistence. Focusing on sector, size, and momentum, our analysis reveals distinct, model-specific tendencies. In particular, we observe a consistent preference for large-cap stocks and contrarian strategies across most models. These preferences often harden into confirmation bias, with models clinging to initial judgments despite counter-evidence. |
Date: | 2025–07 |
URL: | https://d.repec.org/n?u=RePEc:arx:papers:2507.20957 |
By: | König, Leonard Maximilian |
Abstract: | This study explores the interplay of issue salience and affective signals in Swiss public reactions to government measures on the Reddit forum r/Switzerland during the COVID-19 pandemic and the Russia-Ukraine war. Using an Exploratory Data Analysis approach, this study applied topic modeling (BERTopic) to a large corpus of posts (2019-2022) to identify shifts in online public attention and emotional responses, and transformer-based sentiment and emotion analysis to quantify sentiment and discrete emotions as affective signals. The results reveal a Swiss online public that is highly responsive to events, with attention shifting rapidly, and whose discourse is deeply imbued with emotional content, predominantly negative in the face of restrictive policies or unsettling international developments. These insights underscore the value of computational social science in unpacking the complexities of online public opinion and offer a foundation for future research into the evolving nature of digital democracy and crisis governance. |
Date: | 2025–07–17 |
URL: | https://d.repec.org/n?u=RePEc:osf:socarx:28exs_v2 |
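The two computational steps described above, BERTopic for topic discovery and transformer-based sentiment scoring, can be sketched as follows; the posts and the default sentiment model are illustrative placeholders, not the r/Switzerland corpus or the study's models:

```python
# Minimal sketch of the pipeline in the abstract above: BERTopic for topic
# modelling of forum posts and a transformers pipeline for sentiment. The
# tiny repeated corpus is a placeholder; meaningful topics need real data.
from bertopic import BERTopic
from transformers import pipeline

posts = [
    "New entry rules announced for travellers, testing requirements tightened.",
    "Energy prices keep climbing, worried about heating costs this winter.",
    "Vaccination certificates now required for restaurants in my canton.",
    "Refugee support drive this weekend in Zurich, volunteers welcome.",
] * 50  # BERTopic needs a reasonably sized corpus to form clusters

topic_model = BERTopic(min_topic_size=10)
topics, probs = topic_model.fit_transform(posts)
print(topic_model.get_topic_info().head())

sentiment = pipeline("sentiment-analysis")  # default English sentiment model
print(sentiment(posts[:2]))
```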
By: | Wei Lu; Daniel L. Chen; Christian B. Hansen |
Abstract: | Understanding how large language model (LLM) agents behave in strategic interactions is essential as these systems increasingly participate autonomously in economically and morally consequential decisions. We evaluate LLM preferences using canonical economic games, finding substantial deviations from human behavior. Models like GPT-4o show excessive cooperation and limited incentive sensitivity, while reasoning models, such as o3-mini, align more consistently with payoff-maximizing strategies. We propose a supervised fine-tuning pipeline that uses synthetic datasets derived from economic reasoning to align LLM agents with economic preferences, focusing on two stylized preference structures. In the first, utility depends only on individual payoffs (homo economicus), while utility also depends on a notion of Kantian universalizability in the second preference structure (homo moralis). We find that fine-tuning based on small datasets shifts LLM agent behavior toward the corresponding economic agent. We further assess the fine-tuned agents' behavior in two applications: Moral dilemmas involving autonomous vehicles and algorithmic pricing in competitive markets. These examples illustrate how different normative objectives embedded via realizations from structured preference structures can influence market and moral outcomes. This work contributes a replicable, cost-efficient, and economically grounded pipeline to align AI preferences using moral-economic principles. |
Date: | 2025–07 |
URL: | https://d.repec.org/n?u=RePEc:arx:papers:2507.20796 |
By: | Brochet, S.; Mueller, H.; Rauh, C. |
Abstract: | The correct measurement of economic policy uncertainty (EPU) plays a critical role in many policy settings - in particular where economic policy decisions need to be taken in response to large shocks. One such large shock is armed conflict. But, counterintuitively, the standard text-based EPU index systematically declines during armed conflict periods. Using a global news corpus covering 192 countries and over 5 million articles, we show that this decline is driven not by reduced uncertainty, but by a crowding out of reporting on economics and policy. We show that a combination of topic modeling and two-way fixed effects can be used to adjust the measurement of EPU, providing a new view on political risk during armed conflict. After adjustment, the EPU aligns more closely with firm perceptions, political risk insurance and investment during armed conflict. |
Keywords: | Economic Policy Uncertainty (EPU), Armed Conflict, Media Crowding-Out, Topic Modeling, Latent Dirichlet Allocation (LDA), Measurement Bias, Text-Based Indices, Macroeconomic Uncertainty |
JEL: | C61 C62 G11 G12 D85 |
Date: | 2025–07–25 |
URL: | https://d.repec.org/n?u=RePEc:cam:camjip:2520 |
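The adjustment described above combines topic modeling with a two-way fixed effects regression that purges the crowding-out of economics coverage. A minimal sketch with synthetic articles and a synthetic country-month panel; the aggregation from article-level topics to the panel is only indicated in comments:

```python
# Minimal sketch of the adjustment in the abstract above: estimate article
# topic shares with LDA, then regress the raw EPU measure on the
# economics-topic share with country and month fixed effects and keep the
# residual as a crowding-out-adjusted EPU. All data are synthetic placeholders.
import numpy as np
import pandas as pd
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer
from linearmodels.panel import PanelOLS

articles = [
    "central bank signals uncertainty over fiscal policy and tax reform",
    "fighting intensifies near the border as troops advance on the city",
    "parliament debates new trade tariffs amid policy uncertainty",
    "air strikes reported overnight, humanitarian corridor under discussion",
] * 25

counts = CountVectorizer(stop_words="english").fit_transform(articles)
topic_shares = LatentDirichletAllocation(n_components=2, random_state=0).fit_transform(counts)
# in the full pipeline, these article-level shares would be aggregated by
# country and month; a random stand-in is used in the panel below

rng = np.random.default_rng(0)
panel = pd.DataFrame({
    "country": np.repeat([f"c{i}" for i in range(20)], 24),
    "month": np.tile(pd.date_range("2020-01-01", periods=24, freq="MS"), 20),
    "epu_raw": rng.normal(size=480),
    "econ_topic_share": rng.uniform(0.1, 0.9, size=480),
}).set_index(["country", "month"])

fe = PanelOLS.from_formula(
    "epu_raw ~ econ_topic_share + EntityEffects + TimeEffects", data=panel
).fit()
panel["epu_adjusted"] = fe.resids + panel["epu_raw"].mean()  # EPU purged of crowding-out
print(fe.params)
```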
By: | Christoph Engel (Max Planck Institute for Research on Collective Goods, Bonn); Yoan Hermstrüwer (University of Zurich); Alison Kim (University of Zurich) |
Abstract: | Recent advances in AI create possibilities for delegating legal decision-making to machines or enhancing human adjudication through AI assistance. Using classic normative conflicts – the trolley problem and similar moral dilemmas – as a proof of concept, we examine the alignment between AI legal reasoning and human judgment. In our baseline experiment, we find a pronounced mismatch between decisions made by GPT and those of human subjects. This misalignment raises substantive concerns for AI-powered legal decision-aids. We investigate whether explicit normative guidance can address this misalignment, with mixed results. GPT-3.5 is susceptible to such intervention, but frequently refuses to decide when faced with a moral dilemma. GPT-4 is outright utilitarian, and essentially ignores the instruction to decide on deontological grounds. GPT-o3-mini faithfully implements this instruction, but is unwilling to balance deontological and utilitarian concerns if instructed to do so. At least for the time being, explicit normative instructions are not fully able to realign AI advice with the normative convictions of the legislator. |
Keywords: | large language models, human-AI alignment, rule of law, moral dilemmas, trolley problems |
JEL: | C99 D63 D81 K10 K40 Z13 |
Date: | 2025–04 |
URL: | https://d.repec.org/n?u=RePEc:mpg:wpaper:2025_03 |