|
on Big Data |
| By: | Safiye Turgay; Serkan Erdo\u{g}an; \v{Z}eljko Stevi\'c; Orhan Emre Elma; Tevfik Eren; Zhiyuan Wang; Mahmut Bayda\c{s} |
| Abstract: | In the face of increasing financial uncertainty and market complexity, this study presents a novel risk-aware financial forecasting framework that integrates advanced machine learning techniques with intuitionistic fuzzy multi-criteria decision-making (MCDM). Tailored to the BIST 100 index and validated through a case study of a major defense company in T\"urkiye, the framework fuses structured financial data, unstructured text data, and macroeconomic indicators to enhance predictive accuracy and robustness. It incorporates a hybrid suite of models, including extreme gradient boosting (XGBoost), long short-term memory (LSTM) network, graph neural network (GNN), to deliver probabilistic forecasts with quantified uncertainty. The empirical results demonstrate high forecasting accuracy, with a net profit mean absolute percentage error (MAPE) of 3.03% and narrow 95% confidence intervals for key financial indicators. The risk-aware analysis indicates a favorable risk-return profile, with a Sharpe ratio of 1.25 and a higher Sortino ratio of 1.80, suggesting relatively low downside volatility and robust performance under market fluctuations. Sensitivity analysis shows that the key financial indicator predictions are highly sensitive to variations of inflation, interest rates, sentiment, and exchange rates. Additionally, using an intuitionistic fuzzy MCDM approach, combining entropy weighting, evaluation based on distance from the average solution (EDAS), and the measurement of alternatives and ranking according to compromise solution (MARCOS) methods, the tabular data learning network (TabNet) outperforms the other models and is identified as the most suitable candidate for deployment. Overall, the findings of this work highlight the importance of integrating advanced machine learning, risk quantification, and fuzzy MCDM methodologies in financial forecasting, particularly in emerging markets. |
| Date: | 2025–12 |
| URL: | https://d.repec.org/n?u=RePEc:arx:papers:2512.17936 |
| By: | Liying Zhang; Ying Gao |
| Abstract: | The increasing need for rapid recalibration of option pricing models in dynamic markets places stringent computational demands on data generation and valuation algorithms. In this work, we propose a hybrid algorithmic framework that integrates the smooth offset algorithm (SOA) with supervised machine learning models for the fast pricing of multiple path-independent options under exponential L\'evy dynamics. Building upon the SOA-generated dataset, we train neural networks, random forests, and gradient boosted decision trees to construct surrogate pricing operators. Extensive numerical experiments demonstrate that, once trained, these surrogates achieve order-of-magnitude acceleration over direct SOA evaluation. Importantly, the proposed framework overcomes key numerical limitations inherent to fast Fourier transform-based methods, including the consistency of input data and the instability in deep out-of-the-money option pricing. |
| Date: | 2025–12 |
| URL: | https://d.repec.org/n?u=RePEc:arx:papers:2512.16115 |
| By: | Jiawei Du; Yi Hong |
| Abstract: | This study focuses on forecasting the ultimate forward rate (UFR) and developing a UFRbased bond yield prediction model using data from Chinese treasury bonds and macroeconomic variables spanning from December 2009 to December 2024. The de Kort-Vellekooptype methodology is applied to estimate the UFR, incorporating the optimal turning parameter determination technique proposed in this study, which helps mitigate anomalous fluctuations. In addition, both linear and nonlinear machine learning techniques are employed to forecast the UFR and ultra-long-term bond yields. The results indicate that nonlinear machine learning models outperform their linear counterparts in forecasting accuracy. Incorporating macroeconomic variables, particularly price index-related variables, significantly improves the accuracy of predictions. Finally, a novel UFR-based bond yield forecasting model is developed, demonstrating superior performance across different bond maturities. |
| Date: | 2025–12 |
| URL: | https://d.repec.org/n?u=RePEc:arx:papers:2601.00011 |
| By: | Abraham Itzhak Weinberg |
| Abstract: | Financial market prediction is a challenging application of machine learning, where even small improvements in directional accuracy can yield substantial value. Most models struggle to exceed 55--57\% accuracy due to high noise, non-stationarity, and market efficiency. We introduce a hybrid ensemble framework combining quantum sentiment analysis, Decision Transformer architecture, and strategic model selection, achieving 60.14\% directional accuracy on S\&P 500 prediction, a 3.10\% improvement over individual models. Our framework addresses three limitations of prior approaches. First, architecture diversity dominates dataset diversity: combining different learning algorithms (LSTM, Decision Transformer, XGBoost, Random Forest, Logistic Regression) on the same data outperforms training identical architectures on multiple datasets (60.14\% vs.\ 52.80\%), confirmed by correlation analysis ($r>0.6$ among same-architecture models). Second, a 4-qubit variational quantum circuit enhances sentiment analysis, providing +0.8\% to +1.5\% gains per model. Third, smart filtering excludes weak predictors (accuracy $ |
| Date: | 2025–12 |
| URL: | https://d.repec.org/n?u=RePEc:arx:papers:2512.15738 |
| By: | Yan Liu; Ye Luo; Zigan Wang; Xiaowei Zhang |
| Abstract: | Machine learning is central to empirical asset pricing, but portfolio construction still relies on point predictions and largely ignores asset-specific estimation uncertainty. We propose a simple change: sort assets using uncertainty-adjusted prediction bounds instead of point predictions alone. Across a broad set of ML models and a U.S. equity panel, this approach improves portfolio performance relative to point-prediction sorting. These gains persist even when bounds are built from partial or misspecified uncertainty information. They arise mainly from reduced volatility and are strongest for flexible machine learning models. Identification and robustness exercises show that these improvements are driven by asset-level rather than time or aggregate predictive uncertainty. |
| Date: | 2026–01 |
| URL: | https://d.repec.org/n?u=RePEc:arx:papers:2601.00593 |
| By: | Annalivia Polselli |
| Abstract: | The double machine learning (DML) method combines the predictive power of machine learning with statistical estimation to conduct inference about the structural parameter of interest. This paper presents the R package `xtdml`, which implements DML methods for partially linear panel regression models with low-dimensional fixed effects, high-dimensional confounding variables, proposed by Clarke and Polselli (2025). The package provides functionalities to: (a) learn nuisance functions with machine learning algorithms from the `mlr3` ecosystem, (b) handle unobserved individual heterogeneity choosing among first-difference transformation, within-group transformation, and correlated random effects, (c) transform the covariates with min-max normalization and polynomial expansion to improve learning performance. We showcase the use of `xtdml` with both simulated and real longitudinal data. |
| Date: | 2025–12 |
| URL: | https://d.repec.org/n?u=RePEc:arx:papers:2512.15965 |
| By: | Zhenyu Gao; Wenxi Jiang; Yutong Yan |
| Abstract: | We develop a statistical test to detect lookahead bias in economic forecasts generated by large language models (LLMs). Using state-of-the-art pre-training data detection techniques, we estimate the likelihood that a given prompt appeared in an LLM's training corpus, a statistic we term Lookahead Propensity (LAP). We formally show that a positive correlation between LAP and forecast accuracy indicates the presence and magnitude of lookahead bias, and apply the test to two forecasting tasks: news headlines predicting stock returns and earnings call transcripts predicting capital expenditures. Our test provides a cost-efficient, diagnostic tool for assessing the validity and reliability of LLM-generated forecasts. |
| Date: | 2025–12 |
| URL: | https://d.repec.org/n?u=RePEc:arx:papers:2512.23847 |
| By: | Calil, Yuri Clements Daglia |
| Keywords: | Livestock Production/Industries, Marketing, Agricultural Finance |
| Date: | 2024 |
| URL: | https://d.repec.org/n?u=RePEc:ags:aaea24:343928 |
| By: | Yuhan Hou; Tianji Rao; Jeremy Tan; Adler Viton; Xiyue Zhang; David Ye; Abhishek Kodi; Sanjana Dulam; Aditya Paul; Yikai Feng |
| Abstract: | The Federal Open Market Committee (FOMC) sets the federal funds rate, shaping monetary policy and the broader economy. We introduce \emph{FedSight AI}, a multi-agent framework that uses large language models (LLMs) to simulate FOMC deliberations and predict policy outcomes. Member agents analyze structured indicators and unstructured inputs such as the Beige Book, debate options, and vote, replicating committee reasoning. A Chain-of-Draft (CoD) extension further improves efficiency and accuracy by enforcing concise multistage reasoning. Evaluated at 2023-2024 meetings, FedSight CoD achieved accuracy of 93.75\% and stability of 93.33\%, outperforming baselines including MiniFed and Ordinal Random Forest (RF), while offering transparent reasoning aligned with real FOMC communications. |
| Date: | 2025–12 |
| URL: | https://d.repec.org/n?u=RePEc:arx:papers:2512.15728 |
| By: | Alina Voronina; Oleksandr Romanko; Ruiwen Cao; Roy H. Kwon; Rafael Mendoza-Arriaga |
| Abstract: | This paper investigates how Large Language Models (LLMs) from leading providers (OpenAI, Google, Anthropic, DeepSeek, and xAI) can be applied to quantitative sector-based portfolio construction. We use LLMs to identify investable universes of stocks within S&P 500 sector indices and evaluate how their selections perform when combined with classical portfolio optimization methods. Each model was prompted to select and weight 20 stocks per sector, and the resulting portfolios were compared with their respective sector indices across two distinct out-of-sample periods: a stable market phase (January-March 2025) and a volatile phase (April-June 2025). Our results reveal a strong temporal dependence in LLM portfolio performance. During stable market conditions, LLM-weighted portfolios frequently outperformed sector indices on both cumulative return and risk-adjusted (Sharpe ratio) measures. However, during the volatile period, many LLM portfolios underperformed, suggesting that current models may struggle to adapt to regime shifts or high-volatility environments underrepresented in their training data. Importantly, when LLM-based stock selection is combined with traditional optimization techniques, portfolio outcomes improve in both performance and consistency. This study contributes one of the first multi-model, cross-provider evaluations of generative AI algorithms in investment management. It highlights that while LLMs can effectively complement quantitative finance by enhancing stock selection and interpretability, their reliability remains market-dependent. The findings underscore the potential of hybrid AI-quantitative frameworks, integrating LLM reasoning with established optimization techniques, to produce more robust and adaptive investment strategies. |
| Date: | 2025–12 |
| URL: | https://d.repec.org/n?u=RePEc:arx:papers:2512.24526 |
| By: | Filippo Gusella; Eugenio Vicario |
| Abstract: | Results in the Heterogeneous Agent Model (HAM) literature determine the proportion of fundamentalists and trend followers in the financial market. This proportion varies according to the periods analyzed. In this paper, we use a large language model (LLM) to construct a generative agent (GA) that determines the probability of adopting one of the two strategies based on current information. The probabilities of strategy adoption are compared with those in the HAM literature for the S&P 500 index between 1990 and 2020. Our findings suggest that the resulting artificial intelligence (AI) expectations align with those reported in the HAM literature. At the same time, extending the analysis to artificial market data helps us to filter the decision-making process of the AI agent. In the artificial market, results confirm the heterogeneity in expectations but reveal systematic asymmetry toward the fundamentalist behavior. |
| Keywords: | Heterogeneous Expectations, Large Language Model, Generative Agent, Funda mentalists, Trend Followers |
| JEL: | E30 E70 D84 |
| Date: | 2025 |
| URL: | https://d.repec.org/n?u=RePEc:frz:wpaper:wp2025_18.rdf |
| By: | Zihan Zhang; Lianyan Fu; Dehui Wang |
| Abstract: | Estimating causal effects from observational network data faces dual challenges of network interference and unmeasured confounding. To address this, we propose a general Difference-in-Differences framework that integrates double negative controls (DNC) and graph neural networks (GNNs). Based on the modified parallel trends assumption and DNC, semiparametric identification of direct and indirect causal effects is established. We then propose doubly robust estimators. Specifically, an approach combining GNNs with the generalized method of moments is developed to estimate the functions of high-dimensional covariates and network structure. Furthermore, we derive the estimator's asymptotic normality under the $\psi$-network dependence and approximate neighborhood interference. Simulations show the finite-sample performance of our estimators. Finally, we apply our method to analyze the impact of China's green credit policy on corporate green innovation. |
| Date: | 2026–01 |
| URL: | https://d.repec.org/n?u=RePEc:arx:papers:2601.00603 |
| By: | Sergio Alvarez-Telena; Marta Diez-Fernandez |
| Abstract: | In light of the recent convergence between Agentic AI and our field of Algorithmization, this paper seeks to restore conceptual clarity and provide a structured analytical framework for an increasingly fragmented discourse. First, (a) it examines the contemporary landscape and proposes precise definitions for the key notions involved, ranging from intelligence to Agentic AI. Second, (b) it reviews our prior body of work to contextualize the evolution of methodologies and technological advances developed over the past decade, highlighting their interdependencies and cumulative trajectory. Third, (c) by distinguishing Machine and Learning efforts within the field of Machine Learning (d) it introduces the first Machine in Machine Learning (M1) as the underlying platform enabling today's LLM-based Agentic AI, conceptualized as an extension of B2C information-retrieval user experiences now being repurposed for B2B transformation. Building on this distinction, (e) the white paper develops the notion of the second Machine in Machine Learning (M2) as the architectural prerequisite for holistic, production-grade B2B transformation, characterizing it as Strategies-based Agentic AI and grounding its definition in the structural barriers-to-entry that such systems must overcome to be operationally viable. Further, (f) it offers conceptual and technical insight into what appears to be the first fully realized implementation of an M2. Finally, drawing on the demonstrated accuracy of the two previous decades of professional and academic experience in developing the foundational architectures of Algorithmization, (g) it outlines a forward-looking research and transformation agenda for the coming two decades. |
| Date: | 2025–12 |
| URL: | https://d.repec.org/n?u=RePEc:arx:papers:2512.24856 |
| By: | Tim J. Boonen; Xinyue Fan; Zixiao Quan |
| Abstract: | Machine learning improves predictive accuracy in insurance pricing but exacerbates trade-offs between competing fairness criteria across different discrimination measures, challenging regulators and insurers to reconcile profitability with equitable outcomes. While existing fairness-aware models offer partial solutions under GLM and XGBoost estimation methods, they remain constrained by single-objective optimization, failing to holistically navigate a conflicting landscape of accuracy, group fairness, individual fairness, and counterfactual fairness. To address this, we propose a novel multi-objective optimization framework that jointly optimizes all four criteria via the Non-dominated Sorting Genetic Algorithm II (NSGA-II), generating a diverse Pareto front of trade-off solutions. We use a specific selection mechanism to extract a premium on this front. Our results show that XGBoost outperforms GLM in accuracy but amplifies fairness disparities; the Orthogonal model excels in group fairness, while Synthetic Control leads in individual and counterfactual fairness. Our method consistently achieves a balanced compromise, outperforming single-model approaches. |
| Date: | 2025–12 |
| URL: | https://d.repec.org/n?u=RePEc:arx:papers:2512.24747 |
| By: | Jianping Mei; Michael Moses; Jan Waelty; Yucheng Yang |
| Abstract: | We study how deep learning can improve valuation in the art market by incorporating the visual content of artworks into predictive models. Using a large repeated-sales dataset from major auction houses, we benchmark classical hedonic regressions and tree-based methods against modern deep architectures, including multi-modal models that fuse tabular and image data. We find that while artist identity and prior transaction history dominate overall predictive power, visual embeddings provide a distinct and economically meaningful contribution for fresh-to-market works where historical anchors are absent. Interpretability analyses using Grad-CAM and embedding visualizations show that models attend to compositional and stylistic cues. Our findings demonstrate that multi-modal deep learning delivers significant value precisely when valuation is hardest, namely first-time sales, and thus offers new insights for both academic research and practice in art market valuation. |
| Date: | 2025–12 |
| URL: | https://d.repec.org/n?u=RePEc:arx:papers:2512.23078 |
| By: | Linuk Perera |
| Abstract: | This research introduces a novel quantitative methodology tailored for quantitative finance applications, enabling banks, stockbrokers, and investors to predict economic regimes and market signals in emerging markets, specifically Sri Lankan stock indices (S&P SL20 and ASPI) by integrating Environmental, Social, and Governance (ESG) sentiment analysis with macroeconomic indicators and advanced time-series forecasting. Designed to leverage quantitative techniques for enhanced risk assessment, portfolio optimization, and trading strategies in volatile environments, the architecture employs FinBERT, a transformer-based NLP model, to extract sentiment from ESG texts, followed by unsupervised clustering (UMAP/HDBSCAN) to identify 5 latent ESG regimes, validated via PCA. These regimes are mapped to economic conditions using a dense neural network and gradient boosting classifier, achieving 84.04% training and 82.0% validation accuracy. Concurrently, time-series models (SRNN, MLP, LSTM, GRU) forecast daily closing prices, with GRU attaining an R-squared of 0.801 and LSTM delivering 52.78% directional accuracy on intraday data. A strong correlation between S&P SL20 and S&P 500, observed through moving average and volatility trend plots, further bolsters forecasting precision. A rule-based fusion logic merges ESG and time-series outputs for final market signals. By addressing literature gaps that overlook emerging markets and holistic integration, this quant-driven framework combines global correlations and local sentiment analysis to offer scalable, accurate tools for quantitative finance professionals navigating complex markets like Sri Lanka. |
| Date: | 2025–12 |
| URL: | https://d.repec.org/n?u=RePEc:arx:papers:2512.20216 |