on Big Data |
By: | Nicolas Apfel; Holger Breinlich; Nick Green; Dennis Novy; J. M. C. Santos Silva; Tom Zylkin |
Abstract: | Gravity equations are often used to evaluate counterfactual trade policy scenarios, such as the effect of regional trade agreements on trade flows. In this paper, we argue that the suitability of gravity equations for this purpose crucially depends on their out-of-sample predictive power. We propose a methodology that compares different versions of the gravity equation, both among themselves and with machine learning-based forecast methods such as random forests and neural networks. We find that the 3-way gravity model is difficult to beat in terms of out-of-sample average predictive performance, further justifying its place as the predominant tool for applied trade policy analysis. However, when the goal is to predict individual bilateral trade flows, the 3-way model can be outperformed by an ensemble machine learning method. |
Date: | 2025–09 |
URL: | https://d.repec.org/n?u=RePEc:arx:papers:2509.11271 |
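The comparison described in the abstract above can be illustrated with a minimal sketch: a PPML estimator with exporter-time and importer-time dummies (a pared-down stand-in for the full 3-way gravity model, which adds pair fixed effects) versus a random forest, scored out of sample. The data-generating process, dimensions, and coefficients below are all invented for illustration.

```python
# Hedged sketch: PPML-with-dummies vs. random forest on synthetic trade flows.
import numpy as np
from sklearn.linear_model import PoissonRegressor
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 2000
exporter = rng.integers(0, 20, n)
importer = rng.integers(0, 20, n)
year = rng.integers(0, 5, n)
log_dist = rng.normal(8, 1, n)

# Synthetic flows with multiplicative country effects and a distance elasticity.
mu = np.exp(0.1 * exporter + 0.05 * importer - 0.8 * (log_dist - 8))
trade = rng.poisson(mu).astype(float)

# Exporter-time and importer-time fixed effects enter as dummy columns.
ex_t = exporter * 5 + year
im_t = importer * 5 + year
D_ex = (ex_t[:, None] == np.arange(100)).astype(float)
D_im = (im_t[:, None] == np.arange(100)).astype(float)
X = np.column_stack([D_ex, D_im, log_dist])

Xtr, Xte, ytr, yte = train_test_split(X, trade, random_state=0)
ppml = PoissonRegressor(alpha=1e-3, max_iter=500).fit(Xtr, ytr)
rf = RandomForestRegressor(n_estimators=100, random_state=0).fit(Xtr, ytr)

mae_ppml = mean_absolute_error(yte, ppml.predict(Xte))
mae_rf = mean_absolute_error(yte, rf.predict(Xte))
print("PPML MAE:", mae_ppml, "| RF MAE:", mae_rf)
```

The held-out mean absolute error is one simple way to compare the two model families on equal footing, in the spirit of the paper's out-of-sample evaluation.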
By: | Thackway, William; Soundararaj, Balamurugan; Pettit, Christopher |
Abstract: | Despite housing supply shortages in financialised housing markets and acknowledgement of planning application (PA) assessment times as a supply-side constraint, reliable and accessible information on PA assessment timeframes is limited. In this context, we built a model to predict and explain PA assessment timeframes in New South Wales, Australia. We constructed a dataset of 17,000 PAs (submitted over 3 years) comprising PA attributes, environmental and zoning restrictions, and features derived from PA descriptions using natural language processing techniques. Quantile regression was applied within a machine learning framework to predict probabilistic intervals for assessment timeframes. We then employed an advanced model explanation tool to analyse feature contributions on an overall and individual PA basis. The best performing model, an extreme gradient boosted machine (XGB), achieved an R² of 0.431, predicting 60.9% of assessment times within one month of actual values. While performance is moderate, the model significantly improves upon previous studies and on the current best practice in NSW, which simply provides average assessment-time estimates by council area. The paper concludes with suggestions for further improving model performance and outlines the benefits of a predictive tool for planners. |
Date: | 2025–09–18 |
URL: | https://d.repec.org/n?u=RePEc:osf:socarx:prm25_v1 |
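The quantile-regression step above can be sketched with scikit-learn's gradient boosting, which supports a pinball (quantile) loss; the features and the skewed "assessment days" target below are invented stand-ins for the paper's PA attributes.

```python
# Hedged sketch: probabilistic intervals for assessment times via quantile
# gradient boosting. All data are synthetic.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(1)
n = 1500
X = rng.normal(size=(n, 4))          # stand-ins for PA attributes / zoning features
days = np.exp(3.5 + 0.4 * X[:, 0] + rng.normal(0, 0.5, n))  # skewed timeframes

# One model per quantile: the pinball ("quantile") loss targets that percentile.
models = {}
for q in (0.1, 0.5, 0.9):
    models[q] = GradientBoostingRegressor(
        loss="quantile", alpha=q, n_estimators=200, max_depth=3, random_state=0
    ).fit(X, days)

q10, q90 = models[0.1].predict(X), models[0.9].predict(X)
print("80% interval width (mean days):", (q90 - q10).mean())
```

Reporting a lower and upper quantile instead of a point estimate is what turns the prediction into the "probabilistic interval" the abstract describes.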
By: | Matilda Baret (University of Orléans); Yannick Lucotte (University of Orléans); Sessi Tokpavi (University of Orléans) |
Abstract: | The 21st century faces the challenge of climate change, which has rapidly intensified and impacts global systems in various ways. Mitigating climate change requires deep, fast, and sustainable reductions in greenhouse gas (GHG) emissions. Efforts must work through several channels, including Scope 3 emissions, which encompass indirect emissions from a company’s entire value chain. However, accurately estimating Scope 3 emissions at the company level remains challenging due to data scarcity and reliability issues. This paper presents a new empirical methodology for estimating Scope 3 emissions at the company level, taking into account the dynamics of value chains and company-specific factors. Using input-output tables and sectoral emissions data, we reconstruct company value chains and calculate emissions from upstream and downstream sectors. We address the challenge of missing data by using parametric and machine learning techniques to predict both reported and unreported emissions. Our model, applied to French companies’ data, shows that company-specific characteristics play a key role in Scope 3 emissions and that the emissions of sectors across the value chain as a whole significantly influence them. The results suggest that machine learning models, particularly Random Forest, outperform traditional models in predicting Scope 3 emissions. The study also highlights the importance of expanding data reporting and designing comprehensive climate policies to better manage emissions across all sectors. |
Keywords: | Climate change, climate policy, Scope 3 Emissions, value chain, machine learning, estimation, prediction |
JEL: | C Q |
Date: | 2025 |
URL: | https://d.repec.org/n?u=RePEc:inf:wpaper:2025.12 |
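The input-output step the abstract describes can be sketched with the standard Leontief construction: upstream emissions embodied in a company's purchases are the sectoral emission intensities propagated through the total-requirements matrix. The coefficients and purchase vector below are made up for illustration.

```python
# Hedged sketch: upstream (Scope 3) emissions embodied in purchases,
# via the Leontief inverse. All numbers are illustrative.
import numpy as np

# Technical coefficients A[i, j]: input from sector i per unit output of sector j.
A = np.array([[0.10, 0.20, 0.05],
              [0.15, 0.05, 0.10],
              [0.05, 0.10, 0.15]])
f = np.array([0.9, 0.3, 0.5])        # direct emission intensity per sector (tCO2e/unit)

L = np.linalg.inv(np.eye(3) - A)     # Leontief inverse: total requirements
upstream_intensity = f @ L           # emissions embodied per unit of final purchase

purchases = np.array([10.0, 0.0, 5.0])   # a company's purchases by sector
scope3_upstream = float(upstream_intensity @ purchases)
print(round(scope3_upstream, 2))
```

Because the Leontief inverse sums the direct and all indirect input requirements, the embodied intensity of each sector is never below its direct intensity, which is why the upstream total exceeds a naive direct-emissions calculation.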
By: | Hao Wang; Jingshu Peng; Yanyan Shen; Xujia Li; Lei Chen |
Abstract: | Stock recommendation is critical in Fintech applications, which use price series and alternative information to estimate future stock performance. Although deep learning models are prevalent in stock recommendation systems, traditional time-series forecasting training often fails to capture stock trends and rankings simultaneously, both essential considerations for investors. To tackle this issue, we introduce a Multi-Task Learning (MTL) framework for stock recommendation: Momentum-integrated Multi-task Stock Recommendation with Converge-based Optimization (MiM-StocR). To improve the model's ability to capture short-term trends, we incorporate a momentum line indicator into model training. To prioritize top-performing stocks and optimize investment allocation, we propose a list-wise ranking loss function called Adaptive-k ApproxNDCG. Moreover, due to the volatility and uncertainty of the stock market, existing MTL frameworks face overfitting issues when applied to stock time series. To mitigate this issue, we introduce the Converge-based Quad-Balancing (CQB) method. We conducted extensive experiments on three stock benchmarks: SEE50, CSI 100, and CSI 300. MiM-StocR outperforms state-of-the-art MTL baselines on both ranking and profitability evaluations. |
Date: | 2025–08 |
URL: | https://d.repec.org/n?u=RePEc:arx:papers:2509.10461 |
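The paper's Adaptive-k variant is not spelled out in the abstract; the sketch below shows only the standard ApproxNDCG construction it presumably builds on, in which hard ranks are replaced by differentiable sigmoid-based soft ranks so the metric can be optimized by gradient descent. Scores and relevance labels are invented.

```python
# Hedged sketch: a smooth (ApproxNDCG-style) listwise ranking objective.
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def approx_ndcg(scores, relevance, tau=0.1):
    """Smooth NDCG: replace hard ranks with sigmoid-based soft ranks."""
    s = np.asarray(scores, float)
    diffs = s[None, :] - s[:, None]                 # diffs[i, j] = s_j - s_i
    # Soft rank of item i: 1 + sum_{j != i} sigmoid((s_j - s_i) / tau).
    soft_rank = 1.0 + sigmoid(diffs / tau).sum(axis=1) - 0.5  # drop the j == i term
    dcg = ((2.0 ** relevance - 1) / np.log2(1.0 + soft_rank)).sum()
    ideal = ((2.0 ** np.sort(relevance)[::-1] - 1)
             / np.log2(2.0 + np.arange(len(relevance)))).sum()
    return dcg / ideal

rel = np.array([3.0, 1.0, 0.0, 2.0])
good = approx_ndcg(np.array([9.0, 2.0, 1.0, 5.0]), rel)  # scores match relevance
bad = approx_ndcg(np.array([1.0, 2.0, 9.0, 0.5]), rel)   # scores invert relevance
print(good, bad)
```

With a small temperature `tau`, the soft ranks approach the true integer ranks, so a well-ordered score list gets an approximate NDCG near 1 while a scrambled one scores lower.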
By: | Daniel J. Wilson |
Abstract: | Understanding the effects of weather on macroeconomic data is critically important, but it is hampered by limited time-series observations. Utilizing geographically granular panel data provides many more observations but introduces a “missing intercept” problem: “global” effects (e.g., nationwide spillovers and general equilibrium effects) are absorbed by time fixed effects. Standard solutions are infeasible when the number of global regressors is large. To overcome these problems and estimate granular, global, and total weather effects, we implement a two-step approach utilizing machine learning techniques. We apply this approach to estimate weather effects on U.S. monthly employment growth, obtaining several novel findings: (1) weather, and especially its lags, has substantial explanatory power for local employment growth; (2) shocks to both granular and global weather have significant immediate impacts on a broad set of macroeconomic outcomes; (3) responses to granular shocks are short-lived while those to global shocks are more persistent; (4) favorable weather shocks are often more impactful than unfavorable shocks; and (5) responses of most macroeconomic outcomes to weather shocks have been stable over time, but the consumption response has fallen. |
Keywords: | Weather; macroeconomic fluctuations; employment growth; granular shocks |
JEL: | Q52 Q54 R11 |
Date: | 2025–09–23 |
URL: | https://d.repec.org/n?u=RePEc:fip:fedfwp:101766 |
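The two-step logic above can be illustrated on a toy panel: within-time demeaning identifies the granular coefficient while absorbing anything common to a period, and the estimated time effects, which carry the global component, are then fit on aggregate regressors with an ML model. The data-generating process below is invented, and a single global index stands in for the paper's many global regressors.

```python
# Hedged sketch of the two-step approach: panel step, then ML on time effects.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(2)
R, T = 50, 120                                 # regions x months
local_w = rng.normal(size=(R, T))              # granular weather shocks
global_w = rng.normal(size=T)                  # aggregate weather index
y = 0.5 * local_w + 0.3 * global_w[None, :] + rng.normal(0, 0.1, (R, T))

# Step 1: within-time demeaning removes anything common to a period,
# identifying the granular coefficient but absorbing the global effect.
y_d = y - y.mean(axis=0, keepdims=True)
w_d = local_w - local_w.mean(axis=0, keepdims=True)
beta_local = (w_d * y_d).sum() / (w_d ** 2).sum()

# Step 2: the cross-region means proxy the time effects, which carry the
# global component; fit them on the global regressor(s) with ML.
time_fe = y.mean(axis=0)
rf = RandomForestRegressor(n_estimators=200, random_state=0)
rf.fit(global_w.reshape(-1, 1), time_fe)

print("granular coefficient estimate:", round(beta_local, 3))
```

The granular estimate recovers the true 0.5 despite the global effect, which is exactly the component that would be "missing" if the analysis stopped after step 1.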
By: | Yi Lu; Aifan Ling; Chaoqun Wang; Yaxin Xu |
Abstract: | In recent years, China's bond market has seen a surge in defaults amid regulatory reforms and macroeconomic volatility. Traditional machine learning models struggle to capture the irregularity and temporal dependencies of financial data, while most deep learning models lack interpretability, which is critical for financial decision-making. To tackle these issues, we propose EMDLOT (Explainable Multimodal Deep Learning for Time-series), a novel framework for multi-class bond default prediction. EMDLOT integrates numerical time series (financial/macroeconomic indicators) and unstructured textual data (bond prospectuses), uses a Time-Aware LSTM to handle irregular sequences, and adopts soft clustering and multi-level attention to boost interpretability. Experiments on 1994 Chinese firms (2015-2024) show EMDLOT outperforms traditional (e.g., XGBoost) and deep learning (e.g., LSTM) benchmarks in recall, F1-score, and mAP, especially in identifying defaulted/extended firms. Ablation studies validate the value of each component, and attention analyses reveal economically intuitive default drivers. This work provides a practical tool and a trustworthy framework for transparent financial risk modeling. |
Date: | 2025–09 |
URL: | https://d.repec.org/n?u=RePEc:arx:papers:2509.10802 |
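EMDLOT's exact architecture is not given in the abstract; the sketch below shows only the memory adjustment commonly used by Time-Aware LSTMs (in the style of Baytas et al.'s T-LSTM), where the learned short-term part of the cell state is discounted by the elapsed time between observations. The weights and the decay function are stand-ins.

```python
# Hedged sketch: time-aware discounting of an LSTM cell state for
# irregularly sampled sequences. Weights are random stand-ins.
import numpy as np

rng = np.random.default_rng(3)
H = 8
W_d, b_d = rng.normal(0, 0.1, (H, H)), np.zeros(H)

def decay(dt):
    """Monotone decay of short-term memory with elapsed time (e.g., months)."""
    return 1.0 / np.log(np.e + dt)

def adjust_cell(c_prev, dt):
    c_short = np.tanh(W_d @ c_prev + b_d)      # learned short-term component
    c_long = c_prev - c_short                  # long-term component is untouched
    return c_long + decay(dt) * c_short        # discounted short-term added back

c = rng.normal(size=H)
c_regular = adjust_cell(c, dt=1.0)    # regular monthly gap: mild discount
c_gap = adjust_cell(c, dt=12.0)       # long reporting gap: memory fades more
print(np.linalg.norm(c - c_regular), np.linalg.norm(c - c_gap))
```

The adjusted cell state then feeds a standard LSTM step, so longer gaps between financial disclosures translate into a weaker influence of stale short-term information.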
By: | Yufei Sun (Faculty of Economic Sciences, University of Warsaw) |
Abstract: | Pair trading remains a cornerstone strategy in quantitative finance, having consistently attracted scholarly attention from both economists and computer scientists. Over recent decades, research has expanded beyond traditional linear frameworks—such as regression- and cointegration-based models—to embrace advanced methodologies, including machine learning (ML), deep learning (DL), reinforcement learning (RL), and deep reinforcement learning (DRL). These techniques have demonstrated superior capacity to capture nonlinear dependencies and complex dynamics in financial data, thereby enhancing predictive performance and strategy design. Building on these academic developments, practitioners are increasingly deploying DL models to forecast asset price movements and volatility in equity and foreign exchange markets, leveraging the advantages of artificial intelligence (AI) for trading. In parallel, DRL has gained prominence in algorithmic trading, where agents can autonomously learn optimal trading policies by interacting with market environments, enabling systems that move beyond price prediction to dynamic signal generation and portfolio allocation. This paper provides a comprehensive survey of ML-, DL-, RL-, and DRL-based approaches to pair trading within quantitative finance. By systematically reviewing existing studies and highlighting their methodological contributions, it offers researchers a structured foundation for replication and further development. In addition, the paper outlines promising avenues for future research that extend the application of AI-driven methods in statistical arbitrage and market microstructure analysis. |
Keywords: | Pair Trading, Machine Learning, Deep Learning, Reinforcement Learning, Deep Reinforcement Learning, Artificial Intelligence, Quantitative Trading |
JEL: | C4 C45 C55 C65 G11 |
Date: | 2025 |
URL: | https://d.repec.org/n?u=RePEc:war:wpaper:2025-22 |
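As a baseline for the linear frameworks the survey above starts from, the textbook cointegration-based pair trade can be sketched in a few lines: estimate a hedge ratio by OLS, z-score the spread, and trade on threshold crossings. Prices are simulated and the entry threshold is an arbitrary choice.

```python
# Hedged sketch: cointegration-style pair-trading signal on simulated prices.
import numpy as np

rng = np.random.default_rng(4)
T = 500
common = np.cumsum(rng.normal(0, 1, T))        # shared stochastic trend
a = 100 + common + rng.normal(0, 0.5, T)       # cointegrated price pair
b = 50 + 0.5 * common + rng.normal(0, 0.5, T)

beta = np.polyfit(b, a, 1)[0]                  # OLS hedge ratio: a ~ beta * b
spread = a - beta * b                          # stationary if pair is cointegrated
z = (spread - spread.mean()) / spread.std()

# Short the spread when it is rich, long when it is cheap, flat otherwise.
signal = np.where(z > 2, -1, np.where(z < -2, 1, 0))
print("hedge ratio:", round(beta, 2), "| active signals:", int((signal != 0).sum()))
```

The ML, DL, and (D)RL approaches the survey covers can be read as replacing each of these hand-crafted steps (pair selection, spread model, trading rule) with a learned component.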
By: | Kim, Yunwoo; Hwang, Junhyuk
Abstract: | Existing ESG ratings have limitations such as disclosure delays, inconsistencies, and uneven coverage, particularly in non-English markets. This paper addresses these issues by establishing the first machine learning benchmark for ESG prediction in the Korean market using news-derived time-series features. A standardized dataset of 278 Korean firms was constructed, and monthly sentiment and ESG-relevance features were generated from news using Korean-specific language models. A mask-aware CNN explicitly handles missing data by distinguishing observed months from imputed ones. The model achieved a Mean Absolute Error (MAE) of 17.9, a Root Mean Squared Error (RMSE) of 22.0, an R² of 0.12, and a Spearman’s ρ of 0.38, demonstrating that temporal modeling and explicit handling of missing data are crucial for improving predictive accuracy. |
Date: | 2025–09–12 |
URL: | https://d.repec.org/n?u=RePEc:osf:socarx:v2738_v1 |
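The mask-aware input construction described above can be sketched directly: each monthly feature gets a companion 0/1 channel marking whether the month was observed or imputed, so the CNN can learn to discount filled-in values. The dimensions and imputation scheme below are invented.

```python
# Hedged sketch: mask-aware input tensor for a 1-D CNN over monthly features.
import numpy as np

rng = np.random.default_rng(5)
months, feats = 24, 3
X = rng.normal(size=(months, feats))
observed = rng.random((months, feats)) > 0.3    # some entries are missing

X_imputed = np.where(observed, X, 0.0)          # simple zero-imputation
mask = observed.astype(float)                   # the model sees *that* it imputed
model_input = np.concatenate([X_imputed, mask], axis=1)  # (months, 2 * feats)

print(model_input.shape)
```

Feeding the mask as extra channels, rather than silently imputing, is what lets the network distinguish a genuinely neutral month from a month with no news coverage at all.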
By: | Dalenda Ben Ahmed (RED-ISGG - Recherche, Entreprises et décision - ISGGB - Institut Supérieur de Gestion de Gabès (Université de Gabès)); Jamel Eddine Henchiri (RED-ISGG - Recherche, Entreprises et décision - ISGGB - Institut Supérieur de Gestion de Gabès (Université de Gabès)) |
Abstract: | Purpose: This work is part of the behavioral finance literature; its objective is to evaluate the capacity of investor sentiment to predict the Covid-19 crisis on the French and American stock markets. Design/methodology/approach: Our study covers all companies listed in the CAC40 index for France and the Dow Jones index for the US during the first half of 2020, when the Covid-19 pandemic started. Time-series statistical methods are used to test our hypothesis on the contribution of investor sentiment, the interest rate, and the inflation rate to the development of stock market crises. Findings: The results show that the Covid-19 crisis is positively and significantly explained by investor sentiment, while the interest rate and the inflation rate negatively influence the probability of a stock market crisis. The results also show that including psychological factors improves the explanatory power of our early-warning model and proves effective in predicting stock market crises. Originality: This work can be considered the first to evaluate cointegration between the Covid-19 crisis and investor sentiment. |
Keywords: | Covid-19, stock market crisis, investor sentiment, cointegration |
Date: | 2025–05–07 |
URL: | https://d.repec.org/n?u=RePEc:hal:journl:hal-05252697 |
By: | Adebola K. Ojo; Ifechukwude Jude Okafor |
Abstract: | Investors and stock market analysts face major challenges in predicting stock returns and making wise investment decisions. The predictability of equity stock returns can boost investor confidence, but it remains a difficult task. To address this issue, this study used a Long Short-Term Memory (LSTM) model to predict future stock market movements. The study used a historical dataset from the Nigerian Stock Exchange (NSE), which was cleaned and normalized to design the LSTM model. The model was evaluated using performance metrics and compared with other deep learning models such as Artificial Neural Networks (ANN) and Convolutional Neural Networks (CNN). The experimental results showed that the LSTM model can predict future stock market prices and returns with over 90% accuracy when trained with a reliable dataset. The study concludes that LSTM models can be useful for financial time-series prediction problems if well trained. Future studies should explore combining LSTM models with other deep learning techniques such as CNNs to create hybrid models that mitigate the risks of relying on a single model for future equity stock predictions. |
Date: | 2025–05 |
URL: | https://d.repec.org/n?u=RePEc:arx:papers:2507.01964 |
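The cleaning-and-normalization pipeline the study describes can be sketched as min-max scaling a price series and slicing it into fixed-length lookback windows, the standard input shape for an LSTM. The series below is simulated and the lookback length is an arbitrary choice.

```python
# Hedged sketch: min-max normalisation and windowing for LSTM input.
import numpy as np

def make_windows(series, lookback):
    """Return (X, y): lookback-length input windows and next-step targets."""
    X = np.stack([series[i:i + lookback] for i in range(len(series) - lookback)])
    y = series[lookback:]
    return X[..., None], y          # trailing feature axis for the LSTM

prices = np.cumsum(np.random.default_rng(6).normal(0, 1, 300)) + 100
scaled = (prices - prices.min()) / (prices.max() - prices.min())

X, y = make_windows(scaled, lookback=30)
print(X.shape, y.shape)
```

Each row of `X` is 30 consecutive scaled prices and the matching entry of `y` is the price that follows, so the model learns one-step-ahead prediction; in practice the scaler should be fit on the training split only to avoid look-ahead leakage.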
By: | Yijia Xiao; Edward Sun; Tong Chen; Fang Wu; Di Luo; Wei Wang |
Abstract: | Developing professional, structured reasoning on par with human financial analysts and traders remains a central challenge in AI for finance, where markets demand interpretability and trust. Traditional time-series models lack explainability, while LLMs face challenges in turning natural-language analysis into disciplined, executable trades. Although reasoning LLMs have advanced in step-by-step planning and verification, their application to risk-sensitive financial decisions is underexplored. We present Trading-R1, a financially-aware model that incorporates strategic thinking and planning for comprehensive thesis composition, facts-grounded analysis, and volatility-adjusted decision making. Trading-R1 aligns reasoning with trading principles through supervised fine-tuning and reinforcement learning with a three-stage easy-to-hard curriculum. Training uses Tauric-TR1-DB, a 100k-sample corpus spanning 18 months, 14 equities, and five heterogeneous financial data sources. Evaluated on six major equities and ETFs, Trading-R1 demonstrates improved risk-adjusted returns and lower drawdowns compared to both open-source and proprietary instruction-following models as well as reasoning models. The system generates structured, evidence-based investment theses that support disciplined and interpretable trading decisions. Trading-R1 Terminal will be released at https://github.com/TauricResearch/Trading-R1. |
Date: | 2025–09 |
URL: | https://d.repec.org/n?u=RePEc:arx:papers:2509.11420 |
By: | Sagi Schwartz; Qinling Wang; Fang Fang |
Abstract: | Predicting default is essential for banks to ensure profitability and financial stability. While modern machine learning methods often outperform traditional regression techniques, their lack of transparency limits their use in regulated environments. Explainable artificial intelligence (XAI) has emerged as a solution in domains like credit scoring. However, most XAI research focuses on post-hoc interpretation of black-box models, which does not produce models that are lightweight or transparent enough to meet regulatory requirements, such as those for Internal Ratings-Based (IRB) models. This paper proposes a hybrid approach: post-hoc interpretations of black-box models guide feature selection, followed by training glass-box models that maintain both predictive power and transparency. Using the Lending Club dataset, we demonstrate that this approach achieves performance comparable to a benchmark black-box model while using only 10 features, an 88.5% reduction. In our example, SHapley Additive exPlanations (SHAP) is used for feature selection, eXtreme Gradient Boosting (XGBoost) serves as the benchmark and the base black-box model, and Explainable Boosting Machine (EBM) and Penalized Logistic Tree Regression (PLTR) are the investigated glass-box models. We also show that model refinement using feature interaction analysis, correlation checks, and expert input can further enhance model interpretability and robustness. |
Date: | 2025–09 |
URL: | https://d.repec.org/n?u=RePEc:arx:papers:2509.11389 |