nep-big New Economics Papers
on Big Data
Issue of 2025–09–08
seventeen papers chosen by
Tom Coupé, University of Canterbury


  1. Using Machine Learning to Generate, Clarify, and Improve Economic Models By Annie Liang
  2. An AI-powered Tool for Central Bank Business Liaisons: Quantitative Indicators and On-demand Insights from Firms By Nicholas Gray; Finn Lattimore; Kate McLoughlin; Callan Windsor
  3. Enhancing Trading Performance Through Sentiment Analysis with Large Language Models: Evidence from the S&P 500 By Haojie Liu; Zihan Lin; Randall R. Rojas
  4. The Behavioral Signature of GenAI in Scientific Communication By Nikolaos Askitas
  5. A Financial Brain Scan of the LLM By Hui Chen; Antoine Didisheim; Luciano Somoza; Hanqing Tian
  6. Identifying Catalyst Technologies in Clusters with Unsupervised Machine Learning. An application on patent clusters in the UK By Zehra Usta; Martin Andersson; Katarzyna Kopczewska; Maria Kubara
  7. Forecasting Commodity Price Shocks Using Temporal and Semantic Fusion of Prices Signals and Agentic Generative AI Extracted Economic News By Mohammed-Khalil Ghali; Cecil Pang; Oscar Molina; Carlos Gershenson-Garcia; Daehan Won
  8. Federal Reserve Communication and the COVID-19 Pandemic By Jonathan Benchimol; Sophia Kazinnik; Yossi Saadon
  9. A Multi-Task Evaluation of LLMs' Processing of Academic Text Input By Tianyi Li; Yu Qin; Olivia R. Liu Sheng
  10. Is attention truly all we need? An empirical study of asset pricing in pretrained RNN sparse and global attention models By Shanyan Lai
  11. Left Leaning Models: AI Assumptions on Economic Policy By Maxim Chupilkin
  12. Stock Market Performance Prediction: A Comparative Study Between Econometric Models and Artificial Intelligence-Based Models By Manel Labidi; Ying Zhang; Matthieu Petit Guillaume; Aurélien Krauth
  13. What Hinders Electric Vehicle Diffusion? Insights from a Neural Network Approach By Bonacina, Monica; Demir, Mert; Sileo, Antonio; Zanoni, Angela
  14. FinCast: A Foundation Model for Financial Time-Series Forecasting By Zhuohang Zhu; Haodong Chen; Qiang Qu; Vera Chung
  15. Firm-Level Input Price Changes and Their Effects: A Deep Learning Approach By Sudheer Chava; Wendi Du; Indrajit Mitra; Agam Shah; Linghang Zeng
  16. Is All the Information in the Price? LLM Embeddings versus the EMH in Stock Clustering By Bingyang Wang; Grant Johnson; Maria Hybinette; Tucker Balch
  17. Can LLMs Identify Tax Abuse? By Andrew Blair-Stanek; Nils Holzenberger; Benjamin Van Durme

  1. By: Annie Liang
    Abstract: Machine learning algorithms can now outperform classic economic models in predicting quantities ranging from bargaining outcomes, to choice under uncertainty, to an individual's future jobs and wages. Yet this predictive accuracy comes at a cost: most machine learning algorithms function as black boxes, offering little insight into why outcomes occur. This article asks whether machine learning can guide the development of new economic theories. Economic models serve an important purpose beyond prediction: they uncover the general mechanisms behind observed behaviors. A model that identifies the causal pathways of economic development is more valuable than one that merely predicts which countries will escape poverty, because it enables policymakers to encourage that development in countries where it might not have happened otherwise. Similarly, a model that predicts imperfectly across many domains can be more valuable than one that is highly accurate in a specific domain, since the former allows insights and data obtained from one setting to inform decisions and policy in another. Applying machine learning algorithms off-the-shelf is unlikely to yield such models. But recent work shows that, when reconceived with the aims of an economic modeler in mind, machine learning methods can improve both prediction and understanding. These approaches range from adversarially training algorithms to expose the limits of existing models, to imposing economic theory as a constraint on algorithmic search. Advances in large language models complement these strategies and open new research directions.
    Date: 2025–08
    URL: https://d.repec.org/n?u=RePEc:arx:papers:2508.19136
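    To make the final point concrete, here is a minimal Python sketch (not from the article) of imposing economic theory as a constraint on algorithmic search: a demand curve is fitted with its slope restricted to be non-positive. The data and function names are invented for illustration.
      import numpy as np
      from scipy.optimize import curve_fit

      rng = np.random.default_rng(0)
      price = rng.uniform(1, 10, 200)
      quantity = 50 - 3.0 * price + rng.normal(0, 2, 200)   # synthetic data

      def demand(p, intercept, slope_magnitude):
          # theory constraint: quantity demanded never increases in price
          return intercept - slope_magnitude * p

      # bounds keep slope_magnitude >= 0, enforcing a downward-sloping curve
      params, _ = curve_fit(demand, price, quantity,
                            bounds=([0, 0], [np.inf, np.inf]))
      print(f"intercept {params[0]:.2f}, slope -{params[1]:.2f}")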
  2. By: Nicholas Gray (Reserve Bank of Australia); Finn Lattimore (Reserve Bank of Australia); Kate McLoughlin (Reserve Bank of Australia); Callan Windsor (Reserve Bank of Australia)
    Abstract: In a world of high policy uncertainty, central banks are relying more on soft information sources to complement traditional economic statistics and model-based forecasts. One valuable source of soft information comes from intelligence gathered through central bank liaison programs – structured programs in which central bank staff regularly talk with firms to gather insights. This paper introduces a new text analytics and retrieval tool that efficiently processes, organises, and analyses liaison intelligence gathered from firms using modern natural language processing techniques. The textual dataset spans around 25 years, integrates new information as soon as it becomes available, and covers a wide range of business sizes and industries. The tool uses both traditional text analysis techniques and powerful language models to provide analysts and researchers with three key capabilities: (1) quickly querying the entire history of business liaison meeting notes; (2) zooming in on particular topics to examine their frequency (topic exposure) and analysing the associated tone and uncertainty of the discussion; and (3) extracting precise numerical values from the text, such as firms' reported figures for wages and prices growth. We demonstrate how these capabilities are useful for assessing economic conditions by generating text-based indicators of wages growth and incorporating them into a nowcasting model. We find that adding these text-based features to current best-in-class predictive models, combined with the use of machine learning methods designed to handle many predictors, significantly improves the performance of nowcasts for wages growth. Predictive gains are driven by a small number of features, indicating a sparse signal in contrast to other predictive problems in macroeconomics, where the signal is typically dense.
    Keywords: central banking; macroeconomic policy; wages and labour costs; machine learning; econometric modelling; information retrieval systems; firm behaviour
    JEL: C5 C8 D2 E5 E6 J3
    Date: 2025–08
    URL: https://d.repec.org/n?u=RePEc:rba:rbardp:rdp2025-06
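    A hedged sketch of the nowcasting step described above (not the RBA's code): with many text-based predictors and a sparse signal, an L1-penalized regression keeps only a handful of features. All data below are synthetic stand-ins.
      import numpy as np
      from sklearn.linear_model import LassoCV

      rng = np.random.default_rng(1)
      n_quarters, n_text_features = 100, 200
      X = rng.normal(size=(n_quarters, n_text_features))  # topic/tone indicators
      beta = np.zeros(n_text_features)
      beta[:3] = [0.5, -0.3, 0.4]                         # only a few true signals
      wages_growth = X @ beta + rng.normal(0, 0.1, n_quarters)

      model = LassoCV(cv=5).fit(X, wages_growth)
      selected = np.flatnonzero(model.coef_)
      print(f"{selected.size} of {n_text_features} features kept:", selected)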
  3. By: Haojie Liu; Zihan Lin; Randall R. Rojas
    Abstract: This study integrates real-time sentiment analysis of financial news, using GPT-2 and FinBERT, with technical indicators and time-series models such as ARIMA and ETS to optimize S&P 500 trading strategies. Each strategy, which merges sentiment data with momentum- and trend-based metrics, is evaluated against benchmark buy-and-hold and sentiment-only approaches in terms of asset values and returns. Results show that combining sentiment-driven insights with traditional models improves trading performance, offering a more dynamic approach to stock trading that adapts to market changes in volatile environments.
    Date: 2025–07
    URL: https://d.repec.org/n?u=RePEc:arx:papers:2507.09739
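    A minimal sketch of the sentiment-scoring step, assuming the Hugging Face transformers package and the public ProsusAI/finbert checkpoint (the paper's own pipeline is not published here); the headlines are invented.
      from transformers import pipeline

      classifier = pipeline("text-classification", model="ProsusAI/finbert")
      headlines = [
          "S&P 500 rallies as inflation cools more than expected",
          "Tech earnings disappoint, futures slide",
      ]
      for h in headlines:
          result = classifier(h)[0]  # label in {positive, negative, neutral}
          print(f"{result['label']:>8}  {result['score']:.2f}  {h}")
    Scores like these can then be merged with ARIMA/ETS forecasts and momentum indicators as additional trading signals.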
  4. By: Nikolaos Askitas
    Abstract: We examine the uptake and measurable effects of GPT-assisted writing in economics working paper abstracts. Focusing on the IZA discussion paper series, we detect a significant stylistic shift following the public release of ChatGPT-3.5 in March 2023. This shift appears in core textual metrics—including mean word length, type-token ratio, and readability—and reflects growing alignment with machine-generated writing. While the release of ChatGPT constitutes an exogenous technological shock, adoption is endogenous: authors choose whether to incorporate AI assistance. To capture and estimate the magnitude of this behavioral response, we combine stylometric analysis, machine learning classification, and prompt-based similarity testing. Event-study regressions with fixed effects and placebo checks confirm that the observed shift is abrupt, persistent, and not attributable to pre-existing trends. A similarity experiment using OpenAI’s API shows that post-ChatGPT abstracts more closely resemble their GPT-optimised counterparts than do pre-ChatGPT texts. A classifier trained on these variants achieves 97% accuracy and increasingly flags post-March 2023 abstracts as GPT-like. Rather than indicating wholesale substitution, our findings suggest selective human–AI augmentation in professional writing. The framework introduced here generalises to other settings where writing plays a central role—including resumes, job descriptions, legal briefs, research proposals, and software documentation.
    Keywords: GPT adoption, academic writing, text analysis, natural language processing (NLP), machine learning, event study, linguistic metrics, AI-assisted writing, diffusion of technology
    JEL: C55 C88 O33 C81 L86 J24
    Date: 2025
    URL: https://d.repec.org/n?u=RePEc:ces:ceswps:_12069
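    The three stylometric metrics named above are straightforward to compute; a rough Python sketch follows (the syllable counter is a crude vowel-group heuristic, not the paper's implementation).
      import re

      def stylometrics(text: str) -> dict:
          words = re.findall(r"[A-Za-z']+", text.lower())
          sentences = max(1, len(re.findall(r"[.!?]+", text)))
          syllables = sum(max(1, len(re.findall(r"[aeiouy]+", w))) for w in words)
          return {
              "mean_word_length": sum(len(w) for w in words) / len(words),
              "type_token_ratio": len(set(words)) / len(words),
              # standard published Flesch reading-ease formula
              "flesch": 206.835 - 1.015 * len(words) / sentences
                        - 84.6 * syllables / len(words),
          }

      print(stylometrics("We examine the uptake of GPT-assisted writing in abstracts."))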
  5. By: Hui Chen; Antoine Didisheim; Luciano Somoza; Hanqing Tian
    Abstract: Emerging techniques in computer science make it possible to "brain scan" large language models (LLMs), identify the plain-English concepts that guide their reasoning, and steer them while holding other factors constant. We show that this approach can map LLM-generated economic forecasts to concepts such as sentiment, technical analysis, and timing, and compute their relative importance without reducing performance. We also show that models can be steered to be more or less risk-averse, optimistic, or pessimistic, which allows researchers to correct or simulate biases. The method is transparent, lightweight, and replicable for empirical research in the social sciences.
    Date: 2025–08
    URL: https://d.repec.org/n?u=RePEc:arx:papers:2508.21285
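    A conceptual Python sketch of steering, on a small stand-in network rather than a real LLM (the paper's tooling is not public here): a forward hook adds a "concept" direction to one hidden layer while everything else is held fixed.
      import torch
      import torch.nn as nn

      torch.manual_seed(0)
      model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 1))
      direction = torch.randn(32)              # e.g. an "optimism" direction
      direction /= direction.norm()

      def steer(strength):
          def hook(module, inputs, output):
              return output + strength * direction   # shift hidden activations
          return hook

      x = torch.randn(1, 16)
      for strength in (-2.0, 0.0, 2.0):
          handle = model[0].register_forward_hook(steer(strength))
          print(f"steering {strength:+.1f} -> output {model(x).item():+.4f}")
          handle.remove()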
  6. By: Zehra Usta; Martin Andersson; Katarzyna Kopczewska; Maria Kubara
    Abstract: A common proposition is that certain technologies play a catalytic role in regions by paving the way for the emergence of new related technologies, contributing to the development and diversification of technology clusters. This paper employs unsupervised machine learning algorithms with temporally informed association rule mining to identify catalytic patents in clusters in the UK. Using data spanning over 30 years (1980-2015), we show clear asymmetric relationships between patents. Some act as evident catalysts that drive future patent activity in clusters. The results point to a strong empirical relevance of asymmetric relatedness between patents in the development of technology clusters. They also highlight the usefulness of machine learning algorithms for better understanding the long-term evolution of clusters, and show how temporally informed association rule mining can be used to analyse asymmetries in relatedness and to identify catalyst technologies.
    Keywords: clusters, innovation, cluster dynamics, technological relatedness, asymmetric relatedness, innovation catalysts, patents
    JEL: O31 O33 R12
    Date: 2025–08
    URL: https://d.repec.org/n?u=RePEc:egu:wpaper:2528
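    A hedged sketch of temporally informed association rules: technology A is a candidate catalyst for B when B tends to appear in a cluster after A, a relation that is asymmetric by construction. The toy events are invented.
      import itertools
      import pandas as pd

      events = pd.DataFrame({   # first year each technology appears in a cluster
          "cluster": [1, 1, 1, 2, 2, 3, 3, 3],
          "tech":    ["A", "B", "C", "A", "B", "A", "C", "B"],
          "year":    [1985, 1992, 1999, 1988, 1995, 1990, 1991, 2001],
      })

      for a, b in itertools.permutations(events["tech"].unique(), 2):
          year_a = events[events["tech"] == a].set_index("cluster")["year"]
          year_b = events[events["tech"] == b].set_index("cluster")["year"]
          both = year_a.index.intersection(year_b.index)
          # confidence that b arrives after a, among clusters where a appeared
          conf = (year_b[both] > year_a[both]).sum() / len(year_a)
          print(f"{a} -> {b}: confidence {conf:.2f}")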
  7. By: Mohammed-Khalil Ghali; Cecil Pang; Oscar Molina; Carlos Gershenson-Garcia; Daehan Won
    Abstract: Accurate forecasting of commodity price spikes is vital for countries with limited economic buffers, where sudden increases can strain national budgets, disrupt import-reliant sectors, and undermine food and energy security. This paper introduces a hybrid forecasting framework that combines historical commodity price data with semantic signals derived from global economic news, using an agentic generative AI pipeline. The architecture integrates dual-stream Long Short-Term Memory (LSTM) networks with attention mechanisms to fuse structured time-series inputs with semantically embedded, fact-checked news summaries collected from 1960 to 2023. The model is evaluated on a 64-year dataset comprising normalized commodity price series and temporally aligned news embeddings. Results show that the proposed approach achieves a mean AUC of 0.94 and an overall accuracy of 0.91, substantially outperforming traditional baselines such as logistic regression (AUC = 0.34), random forest (AUC = 0.57), and support vector machines (AUC = 0.47). Additional ablation studies reveal that removing attention or dimensionality reduction leads to moderate declines in performance, while eliminating the news component causes a steep drop in AUC to 0.46, underscoring the critical value of incorporating real-world context through unstructured text. These findings demonstrate that integrating agentic generative AI with deep learning can meaningfully improve early detection of commodity price shocks, offering a practical tool for economic planning and risk mitigation in volatile market environments while avoiding the very high cost of operating a full generative AI agent pipeline.
    Date: 2025–07
    URL: https://d.repec.org/n?u=RePEc:arx:papers:2508.06497
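    A minimal PyTorch sketch of the dual-stream idea: one LSTM over price history, one over news embeddings, fused with attention. Dimensions and names are illustrative assumptions, not the authors' architecture.
      import torch
      import torch.nn as nn

      class DualStreamForecaster(nn.Module):
          def __init__(self, price_dim=1, news_dim=64, hidden=32):
              super().__init__()
              self.price_lstm = nn.LSTM(price_dim, hidden, batch_first=True)
              self.news_lstm = nn.LSTM(news_dim, hidden, batch_first=True)
              self.attn = nn.MultiheadAttention(hidden, num_heads=4, batch_first=True)
              self.head = nn.Linear(hidden, 1)   # P(price spike next period)

          def forward(self, prices, news):
              p, _ = self.price_lstm(prices)     # (batch, time, hidden)
              n, _ = self.news_lstm(news)
              fused, _ = self.attn(query=p, key=n, value=n)  # prices attend to news
              return torch.sigmoid(self.head(fused[:, -1]))  # last step -> spike prob

      model = DualStreamForecaster()
      prices = torch.randn(8, 24, 1)             # 8 series, 24 periods
      news = torch.randn(8, 24, 64)              # temporally aligned embeddings
      print(model(prices, news).shape)           # torch.Size([8, 1])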
  8. By: Jonathan Benchimol (Bank of Israel); Sophia Kazinnik (Stanford University); Yossi Saadon (Bank of Israel)
    Abstract: In this study, we examine the Federal Reserve’s communication strategies during the COVID-19 pandemic, comparing them with communication during previous periods of economic stress. Using specialized dictionaries tailored to COVID-19, unconventional monetary policy (UMP), and financial stability, combined with sentiment analysis and topic modeling techniques, we identify a distinct focus in Fed communication during the pandemic on financial stability, market volatility, social welfare, and UMP, characterized by notable contextual uncertainty. Through comparative analysis, we juxtapose the Fed’s communication during the COVID-19 crisis with its responses during the dot-com and global financial crises, examining content, sentiment, and timing dimensions. Our findings reveal that Fed communication and policy actions were more reactive to the COVID-19 crisis than to previous crises. Additionally, declining sentiment related to financial stability in interest rate announcements and minutes anticipated subsequent accommodative monetary policy decisions. We further document that communicating about UMP has become the “new normal” for the Fed’s Federal Open Market Committee meeting minutes and Chairman’s speeches since the Global Financial Crisis, reflecting an institutional adaptation in communication strategy following periods of economic distress. These findings contribute to our understanding of how central bank communication evolves during crises and how communication strategies adapt to exceptional economic circumstances.
    Keywords: Central bank communication, unconventional monetary policy, financial stability, text mining, COVID-19
    JEL: E
    Date: 2025
    URL: https://d.repec.org/n?u=RePEc:inf:wpaper:2025.10
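    A minimal sketch of the dictionary-based scoring described above; the tiny word lists are invented stand-ins for the paper's specialized COVID-19/UMP/financial-stability dictionaries.
      import re

      DICTIONARIES = {
          "ump": {"asset purchases", "balance sheet", "forward guidance"},
          "financial_stability": {"volatility", "liquidity", "stress"},
      }
      POSITIVE = {"resilient", "stability", "improvement"}
      NEGATIVE = {"stress", "strain", "volatility", "deterioration"}

      def score(text: str) -> dict:
          t = text.lower()
          words = set(re.findall(r"[a-z]+", t))
          exposure = {topic: sum(term in t for term in terms)
                      for topic, terms in DICTIONARIES.items()}
          pos, neg = len(words & POSITIVE), len(words & NEGATIVE)
          return {"topic_exposure": exposure,
                  "net_tone": (pos - neg) / max(1, pos + neg)}

      print(score("Asset purchases continued amid volatility; banks remain resilient."))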
  9. By: Tianyi Li; Yu Qin; Olivia R. Liu Sheng
    Abstract: How much large language models (LLMs) can aid scientific discovery, notably in assisting academic peer review, is hotly debated. Their practical application potential lies somewhere between a literature digest and a human-comparable research assistant. We organize individual tasks that computer science studies treat separately into a guided and robust workflow for evaluating LLMs' processing of academic text input. We employ four tasks in the assessment: content reproduction/comparison/scoring/reflection, each demanding a specific role of the LLM (oracle/judgmental arbiter/knowledgeable arbiter/collaborator) in assisting scholarly work, and together testing LLMs with questions that increasingly require intellectual capability toward a solid understanding of scientific texts to yield desirable solutions. We exemplify a rigorous performance evaluation with detailed instructions on the prompts. Adopting first-rate Information Systems articles from three top journals as the input texts and an abundant set of text metrics, we record compromised performance by the leading LLM, Google's Gemini: its summaries and paraphrases of academic text are acceptably reliable; using it to rank texts through pairwise comparison is only faintly scalable; asking it to grade academic texts is prone to poor discrimination; and its qualitative reflection on the text is self-consistent yet hardly insightful enough to inspire meaningful research. This evidence against an endorsement of LLMs' text-processing capabilities is consistent across metric-based internal (linguistic assessment), external (comparison to the ground truth), and human evaluation, and is robust to variations of the prompt. Overall, we do not recommend unchecked use of LLMs in constructing peer reviews.
    Date: 2025–08
    URL: https://d.repec.org/n?u=RePEc:arx:papers:2508.11779
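    A sketch of the pairwise-comparison task, with a hypothetical ask_llm_judge placeholder (simulated here with coin flips) standing in for a real API call; note the quadratic number of calls behind the scalability concern.
      import itertools
      import random
      from collections import Counter

      random.seed(0)
      texts = {"paper_a": "...", "paper_b": "...", "paper_c": "..."}

      def ask_llm_judge(text1: str, text2: str) -> int:
          return random.choice([0, 1])   # placeholder: 0 = first text wins

      wins = Counter()
      for (name1, t1), (name2, t2) in itertools.combinations(texts.items(), 2):
          wins[(name1, name2)[ask_llm_judge(t1, t2)]] += 1

      print(wins.most_common())          # n texts require n*(n-1)/2 judge calls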
  10. By: Shanyan Lai
    Abstract: This study investigates pretrained RNN attention models with mainstream attention mechanisms such as additive attention, Luong's three attentions, global self-attention (Self-att) and sliding-window sparse attention (Sparse-att) for empirical asset pricing research on the top 420 large-cap US stocks. This is the first paper to apply large-scale state-of-the-art (SOTA) attention mechanisms in the asset pricing context. These models overcome limitations of traditional machine learning (ML) based asset pricing, such as mis-captured temporal dependencies and short memory. Moreover, the enforced causal masks in the attention mechanisms address the future-data leakage issue ignored by more advanced attention-based models, such as the classic Transformer. The proposed attention models also account for the temporal sparsity characteristic of asset pricing data and mitigate potential overfitting by deploying simplified model structures, which provides some insights for future empirical economic research. All models are examined over three periods covering pre-COVID-19 (mild uptrend), COVID-19 (steep uptrend with a large drawdown) and one year post-COVID-19 (sideways movement with high fluctuations), to test their stability under extreme market conditions. The study finds that in value-weighted portfolio backtesting, Model Self-att and Model Sparse-att exhibit strong capabilities in generating absolute returns and hedging downside risks, achieving annualized Sortino ratios of 2.0 and 1.80, respectively, in the COVID-19 period. Model Sparse-att also performs more stably than Model Self-att in terms of absolute portfolio returns across stocks of different market capitalizations.
    Date: 2025–08
    URL: https://d.repec.org/n?u=RePEc:arx:papers:2508.19006
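    A small PyTorch sketch of the causal sliding-window (sparse) attention mask the abstract refers to: position i may attend only to positions j with i - w < j <= i, so no future information leaks into the weights. Sizes are arbitrary.
      import torch

      def sparse_causal_mask(seq_len: int, window: int) -> torch.Tensor:
          i = torch.arange(seq_len).unsqueeze(1)
          j = torch.arange(seq_len).unsqueeze(0)
          allowed = (j <= i) & (j > i - window)
          return torch.zeros(seq_len, seq_len).masked_fill(~allowed, float("-inf"))

      torch.manual_seed(0)
      scores = torch.randn(6, 6)                 # raw attention logits
      weights = torch.softmax(scores + sparse_causal_mask(6, window=3), dim=-1)
      print(weights.round(decimals=2))           # zero above the diagonal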
  11. By: Maxim Chupilkin
    Abstract: How does AI think about economic policy? While the use of large language models (LLMs) in economics is growing exponentially, their assumptions on economic issues remain a black box. This paper uses a conjoint experiment to tease out the main factors influencing LLMs' evaluation of economic policy. It finds that LLMs are most sensitive to unemployment, inequality, financial stability, and environmental harm and less sensitive to traditional macroeconomic concerns such as economic growth, inflation, and government debt. The results are remarkably consistent across scenarios and across models.
    Date: 2025–07
    URL: https://d.repec.org/n?u=RePEc:arx:papers:2507.15771
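    A sketch of how such a conjoint experiment can be generated: vignettes with randomized attribute levels, whose LLM ratings would then be regressed on the attributes. Attribute names and levels are invented; the rating call itself is omitted.
      import random

      random.seed(0)
      ATTRIBUTES = {
          "unemployment": ["4%", "8%"],
          "inflation": ["2%", "6%"],
          "inequality": ["falling", "rising"],
          "government debt": ["60% of GDP", "120% of GDP"],
      }

      def make_vignette() -> str:
          profile = {k: random.choice(v) for k, v in ATTRIBUTES.items()}
          return ("Rate this policy outcome from 0 (bad) to 10 (good): "
                  + ", ".join(f"{k} {v}" for k, v in profile.items()))

      for _ in range(3):
          print(make_vignette())   # each prompt would be sent to the LLM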
  12. By: Manel Labidi (LEVIATAN); Ying Zhang (LEVIATAN); Matthieu Petit Guillaume (BH - Beyond Horizon - BH - Beyond Horizon); Aurélien Krauth (LEVIATAN)
    Abstract: In this article, we present a comparative study of the performance of econometric models (the Mundlak model and the GEE-Logit model) and artificial intelligence-based models, such as a stacking model and an ensemble model integrating XGBoost and LightGBM, as well as deep learning models (LSTM, GRU, Transformer-based encoder-decoder, TCN), in a task of classifying listed securities into underperforming and outperforming stocks over a one-year investment horizon. We use annual historical data from 2019 to 2021. The results show that a stacking classification model outperforms the other models and offers a better balance between the true positive rate (70%) and the true negative rate (67%).
    Keywords: Portfolio management, investment decision, eXtreme Gradient Boosting, Light Gradient Boosting, Long Short-Term Memory, Gated Recurrent Unit, Temporal Convolutional Network, stacking model, GEE-Logit model, Mundlak model
    Date: 2025–07–02
    URL: https://d.repec.org/n?u=RePEc:hal:journl:hal-05168124
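    A hedged sketch of a stacking classifier with XGBoost and LightGBM base learners (assuming the xgboost and lightgbm packages); the paper's actual features and configuration are not reproduced here.
      from lightgbm import LGBMClassifier
      from sklearn.datasets import make_classification
      from sklearn.ensemble import StackingClassifier
      from sklearn.linear_model import LogisticRegression
      from sklearn.model_selection import train_test_split
      from xgboost import XGBClassifier

      X, y = make_classification(n_samples=500, n_features=20, random_state=0)
      X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

      stack = StackingClassifier(
          estimators=[("xgb", XGBClassifier(n_estimators=100)),
                      ("lgbm", LGBMClassifier(n_estimators=100))],
          final_estimator=LogisticRegression(),   # meta-learner over base outputs
          cv=5,
      )
      stack.fit(X_train, y_train)
      print(f"out-of-sample accuracy: {stack.score(X_test, y_test):.2f}")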
  13. By: Bonacina, Monica; Demir, Mert; Sileo, Antonio; Zanoni, Angela
    Abstract: The transition to a zero-emission vehicle fleet represents a pivotal element of Europe’s decarbonization strategy, with Italy’s participation being particularly significant given the size of its automotive market. This study investigates the potential for battery electric vehicles (BEVs) to drive decarbonization of Italy’s passenger vehicle fleet, focusing on the feasibility of targets set in the National Integrated Plan for Energy and Climate (PNIEC). Leveraging artificial neural networks, we integrate macroeconomic indicators, market-specific variables, and policy instruments to predict fleet dynamics and identify key factors influencing BEV adoption. We forecast that BEV registrations will continue growing through 2030 but at a decelerating rate, presenting challenges for meeting ambitious policy targets. Our feature importance analysis demonstrates that BEV adoption is driven by an interconnected set of economic, infrastructural, and behavioral factors. Specifically, our model highlights that hybrid vehicle registrations and the vehicle purchase index exert the strongest influence on BEV registrations, suggesting that policy interventions should prioritize these areas to maximize impact. By offering data-driven insights and methodological innovations, our findings contribute to more effective policy design for accelerating sustainable mobility adoption while accounting for market realities and consumer behavior.
    Keywords: Climate Change, Environmental Economics and Policy, Sustainability
    Date: 2025–08–01
    URL: https://d.repec.org/n?u=RePEc:ags:feemwp:369002
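    A sketch of the feature-importance step on synthetic stand-in data: fit a neural network to BEV registrations, then rank drivers by permutation importance (the variable names are illustrative assumptions, not the study's dataset).
      import numpy as np
      from sklearn.inspection import permutation_importance
      from sklearn.neural_network import MLPRegressor

      rng = np.random.default_rng(0)
      n = 300
      features = {"hybrid_registrations": rng.normal(size=n),
                  "purchase_index": rng.normal(size=n),
                  "charging_points": rng.normal(size=n)}
      X = np.column_stack(list(features.values()))
      y = 2.0 * X[:, 0] + 1.5 * X[:, 1] + 0.1 * X[:, 2] + rng.normal(0, 0.1, n)

      model = MLPRegressor(hidden_layer_sizes=(32,), max_iter=2000,
                           random_state=0).fit(X, y)
      result = permutation_importance(model, X, y, n_repeats=10, random_state=0)
      for name, imp in sorted(zip(features, result.importances_mean),
                              key=lambda kv: -kv[1]):
          print(f"{name:>22}: {imp:.3f}")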
  14. By: Zhuohang Zhu; Haodong Chen; Qiang Qu; Vera Chung
    Abstract: Financial time-series forecasting is critical for maintaining economic stability, guiding informed policymaking, and promoting sustainable investment practices. However, it remains challenging due to various underlying pattern shifts. These shifts arise primarily from three sources: temporal non-stationarity (distribution changes over time), multi-domain diversity (distinct patterns across financial domains such as stocks, commodities, and futures), and varying temporal resolutions (patterns differing across per-second, hourly, daily, or weekly indicators). While recent deep learning methods attempt to address these complexities, they frequently suffer from overfitting and typically require extensive domain-specific fine-tuning. To overcome these limitations, we introduce FinCast, the first foundation model specifically designed for financial time-series forecasting, trained on large-scale financial datasets. Remarkably, FinCast exhibits robust zero-shot performance, effectively capturing diverse patterns without domain-specific fine-tuning. Comprehensive empirical and qualitative evaluations demonstrate that FinCast surpasses existing state-of-the-art methods, highlighting its strong generalization capabilities.
    Date: 2025–08
    URL: https://d.repec.org/n?u=RePEc:arx:papers:2508.19609
  15. By: Sudheer Chava; Wendi Du; Indrajit Mitra; Agam Shah; Linghang Zeng
    Abstract: We develop firm-level measures of input and output price changes using textual analysis of earnings calls. We establish five facts: (1) Input prices increase (decrease) at the median firm once every seven (30) months. (2) Input price changes contain an equal blend of aggregate and firm-specific components. (3) A firm's stock price experiences a –1.15 percent return when our input price change measure is in the top tercile of price increases. (4) Our input price change measure predicts future changes in the cost of goods sold. (5) Firms pass through input price changes to output prices in the same quarter with a magnitude of 0.7.
    Keywords: deep learning; input price; cost pass-through
    JEL: D24 E12 E44 L11
    Date: 2025–08–19
    URL: https://d.repec.org/n?u=RePEc:fip:fedawp:101518
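    Fact (5) corresponds to a simple regression slope; a sketch on synthetic firm-quarter data follows (assuming the statsmodels package; the 0.7 below is built into the invented data, not re-estimated from the paper's).
      import numpy as np
      import statsmodels.api as sm

      rng = np.random.default_rng(0)
      input_change = rng.normal(0, 1, 1000)      # firm-level input price changes
      output_change = 0.7 * input_change + rng.normal(0, 0.5, 1000)

      ols = sm.OLS(output_change, sm.add_constant(input_change)).fit()
      print(f"estimated same-quarter pass-through: {ols.params[1]:.2f}")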
  16. By: Bingyang Wang; Grant Johnson; Maria Hybinette; Tucker Balch
    Abstract: This paper investigates whether artificial intelligence can enhance stock clustering compared to traditional methods. We consider this in the context of the semi-strong Efficient Markets Hypothesis (EMH), which posits that prices fully reflect all public information and, accordingly, that clusters based on price information cannot be improved upon. We benchmark three clustering approaches: (i) price-based clusters derived from historical return correlations, (ii) human-informed clusters defined by the Global Industry Classification Standard (GICS), and (iii) AI-driven clusters constructed from large language model (LLM) embeddings of stock-related news headlines. At each date, each method provides a classification in which each stock is assigned to a cluster. To evaluate a clustering, we transform it into a synthetic factor model following the Arbitrage Pricing Theory (APT) framework. This enables consistent evaluation of predictive performance in a roll-forward, out-of-sample test. Using S&P 500 constituents from 2022 through 2024, we find that price-based clustering consistently outperforms both rule-based and AI-based methods, reducing root mean squared error (RMSE) by 15.9% relative to GICS and 14.7% relative to LLM embeddings. Our contributions are threefold: (i) a generalizable methodology that converts any equity grouping (manual, machine, or market-driven) into a real-time factor model for evaluation; (ii) the first direct comparison of price-based, human rule-based, and AI-based clustering under identical conditions; and (iii) empirical evidence reinforcing that short-horizon return information is largely contained in prices. These results support the EMH while offering practitioners a practical diagnostic for monitoring evolving sector structures and academics a framework for testing alternative hypotheses about how quickly markets absorb information.
    Date: 2025–09
    URL: https://d.repec.org/n?u=RePEc:arx:papers:2509.01590
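    A hedged sketch of the evaluation loop on synthetic returns: cluster stocks on return correlations, treat each cluster's cross-sectional mean return as a realized factor, and score the RMSE of the implied fit; a simplified, contemporaneous stand-in for the paper's APT-style procedure.
      import numpy as np
      from sklearn.cluster import KMeans

      rng = np.random.default_rng(0)
      n_stocks, n_days, k = 60, 250, 5
      returns = rng.normal(0, 0.01, (n_days, n_stocks))
      train, test = returns[:200], returns[200:]

      corr = np.corrcoef(train.T)                # price-based similarity
      labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(corr)

      pred = np.zeros_like(test)                 # cluster mean as factor realization
      for c in range(k):
          members = labels == c
          pred[:, members] = test[:, members].mean(axis=1, keepdims=True)

      print(f"synthetic-factor RMSE: {np.sqrt(np.mean((test - pred) ** 2)):.4f}")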
  17. By: Andrew Blair-Stanek; Nils Holzenberger; Benjamin Van Durme
    Abstract: We investigate whether large language models can discover and analyze U.S. tax-minimization strategies. This real-world domain challenges even seasoned human experts, and progress can reduce tax revenue lost from well-advised, wealthy taxpayers. We evaluate the most advanced LLMs on their ability to (1) interpret and verify tax strategies, (2) fill in gaps in partially specified strategies, and (3) generate complete, end-to-end strategies from scratch. This domain should be of particular interest to the LLM reasoning community: unlike synthetic challenge problems or scientific reasoning tasks, U.S. tax law involves navigating hundreds of thousands of pages of statutes, case law, and administrative guidance, all updated regularly. Notably, LLM-based reasoning identified an entirely novel tax strategy, highlighting these models' potential to revolutionize tax agencies' fight against tax abuse.
    Date: 2025–08
    URL: https://d.repec.org/n?u=RePEc:arx:papers:2508.20097

This nep-big issue is ©2025 by Tom Coupé. It is provided as is without any express or implied warranty. It may be freely redistributed in whole or in part for any purpose. If distributed in part, please include this notice.
General information on the NEP project can be found at https://nep.repec.org. For comments please write to the director of NEP, Marco Novarese at <director@nep.repec.org>. Put “NEP” in the subject, otherwise your mail may be rejected.
NEP’s infrastructure is sponsored by the School of Economics and Finance of Massey University in New Zealand.