nep-big 2024-09-02 papers

on Big Data

Issue of 2024‒09‒02
28 papers chosen by
Tom Coupé, University of Canterbury

Data-driven Investors By Bonelli, Maxime
Traditional Methods Outperform Generative LLMs at Forecasting Credit Ratings By Felix Drinkall; Janet B. Pierrehumbert; Stefan Zohren
Deep Learning for Economists By Melissa Dell
FinDKG: Dynamic Knowledge Graphs with Large Language Models for Detecting Global Trends in Financial Markets By Xiaohui Victor Li; Francesco Sanna Passino
Design and Optimization of Big Data and Machine Learning-Based Risk Monitoring System in Financial Markets By Liyang Wang; Yu Cheng; Xingxin Gu; Zhizhong Wu
Distilling interpretable causal trees from causal forests By Patrick Rehill
Beyond Trend Following: Deep Learning for Market Trend Prediction By Fernando Berzal; Alberto Garcia
Machine learning in weekly movement prediction By Han Gui
Machine Learning and IRB Capital Requirements: Advantages, Risks, and Recommendations By Hurlin, Christophe; Pérignon, Christophe
Enhancing Black-Scholes Delta Hedging via Deep Learning By Chunhui Qiao; Xiangwei Wan
Calibrating the Heston Model with Deep Differential Networks By Chen Zhang; Giovanni Amici; Marco Morandotti
Hedge Fund Portfolio Construction Using PolyModel Theory and iTransformer By Siqiao Zhao; Zhikang Dong; Zeyu Cao; Raphael Douady
Nowcasting R&D Expenditures: A Machine Learning Approach By Atin Aboutorabi; Gaétan de Rassenfosse
Climate-Driven Doubling of Maize Loss Probability in U.S. Crop Insurance: Spatiotemporal Prediction and Possible Policy Responses By A Samuel Pottinger; Lawson Connor; Brookie Guzder-Williams; Maya Weltman-Fahs; Timothy Bowles
Deep learning for quadratic hedging in incomplete jump market By Nacira Agram; Bernt Øksendal; Jan Rems
NeuralBeta: Estimating Beta Using Deep Learning By Yuxin Liu; Jimin Lin; Achintya Gopal
International Trade Flow Prediction with Bilateral Trade Provisions By Zijie Pan; Stepan Gordeev; Jiahui Zhao; Ziyi Meng; Caiwen Ding; Sandro Steinbach; Dongjin Song
Public Perceptions of Canada’s Investment Climate By Flora Lutz; Yuanchen Yang; Chengyu Huang
Comparative analysis of Mixed-Data Sampling (MIDAS) model compared to Lag-Llama model for inflation nowcasting By Adam Bahelka; Harmen de Weerd
Credit Risk Assessment Model for UAE Commercial Banks: A Machine Learning Approach By Aditya Saxena; Dr Parizad Dungore
The heterogeneous impact of the EU-Canada agreement with causal machine By Lionel Fontagné; Francesca Micocci; Armando Rungi
Deep Reinforcement Learning Strategies in Finance: Insights into Asset Holding, Trading Behavior, and Purchase Diversity By Alireza Mohammadshafie; Akram Mirzaeinia; Haseebullah Jumakhan; Amir Mirzaeinia
Leveraging Natural Language and Item Response Theory Models for ESG Scoring By César Pedrosa Soares
A Reflective LLM-based Agent to Guide Zero-shot Cryptocurrency Trading By Yuan Li; Bingqiao Luo; Qian Wang; Nuo Chen; Xu Liu; Bingsheng He
Explainable AI in Request-for-Quote By Qiqin Zhou
Big data and firm-level productivity: A cross-country comparison By Andres, Raphaela; Niebel, Thomas; Sack, Robin
Quantile Regression using Random Forest Proximities By Mingshu Li; Bhaskarjit Sarmah; Dhruv Desai; Joshua Rosaler; Snigdha Bhagat; Philip Sommer; Dhagash Mehta
Machine Learning-based Relative Valuation of Municipal Bonds By Preetha Saha; Jingrao Lyu; Dhruv Desai; Rishab Chauhan; Jerinsh Jeyapaulraj; Philip Sommer; Dhagash Mehta

Data-driven Investors

By:	Bonelli, Maxime (HEC Paris)
Abstract:	Using data technologies, like machine learning, investors can gain a comparative advantage in forecasting outcomes frequently observed in historical data. I investigate the implications for capital allocation using venture capitalists (VCs) as a laboratory. VCs adopting data technologies tilt their investments towards startups developing businesses similar to those already explored, and become better at avoiding failures within this pool. However, these VCs concurrently become less likely to pick startups achieving rare major success. Plausibly exogenous variations in VCs' screening automation suggest a causal link between the adoption of data technologies and these effects. These findings highlight potential downsides of investors embracing data technologies.
Keywords:	big data; machine learning; artificial intelligence; venture capital; entrepreneurship; innovation; capital allocation
JEL:	G24 L26 O30
Date:	2023–02–22
URL:	https://d.repec.org/n?u=RePEc:ebg:heccah:1470

Traditional Methods Outperform Generative LLMs at Forecasting Credit Ratings

By:	Felix Drinkall; Janet B. Pierrehumbert; Stefan Zohren
Abstract:	Large Language Models (LLMs) have been shown to perform well for many downstream tasks. Transfer learning can enable LLMs to acquire skills that were not targeted during pre-training. In financial contexts, LLMs can sometimes beat well-established benchmarks. This paper investigates how well LLMs perform in the task of forecasting corporate credit ratings. We show that while LLMs are very good at encoding textual information, traditional methods are still very competitive when it comes to encoding numeric and multimodal data. For our task, current LLMs perform worse than a more traditional XGBoost architecture that combines fundamental and macroeconomic data with high-density text-based embedding features.
Date:	2024–07
URL:	https://d.repec.org/n?u=RePEc:arx:papers:2407.17624

Deep Learning for Economists

By:	Melissa Dell
Abstract:	Deep learning provides powerful methods to impute structured information from large-scale, unstructured text and image datasets. For example, economists might wish to detect the presence of economic activity in satellite images, or to measure the topics or entities mentioned in social media, the congressional record, or firm filings. This review introduces deep neural networks, covering methods such as classifiers, regression models, generative AI, and embedding models. Applications include classification, document digitization, record linkage, and methods for data exploration in massive scale text and image corpora. When suitable methods are used, deep learning models can be cheap to tune and can scale affordably to problems involving millions or billions of data points. The review is accompanied by a companion website, EconDL, with user-friendly demo notebooks, software resources, and a knowledge base that provides technical details and additional applications.
Date:	2024–07
URL:	https://d.repec.org/n?u=RePEc:arx:papers:2407.15339

FinDKG: Dynamic Knowledge Graphs with Large Language Models for Detecting Global Trends in Financial Markets

By:	Xiaohui Victor Li; Francesco Sanna Passino
Abstract:	Dynamic knowledge graphs (DKGs) are popular structures to express different types of connections between objects over time. They can also serve as an efficient mathematical tool to represent information extracted from complex unstructured data sources, such as text or images. Within financial applications, DKGs could be used to detect trends for strategic thematic investing, based on information obtained from financial news articles. In this work, we explore the properties of large language models (LLMs) as dynamic knowledge graph generators, proposing a novel open-source fine-tuned LLM for this purpose, called the Integrated Contextual Knowledge Graph Generator (ICKG). We use ICKG to produce a novel open-source DKG from a corpus of financial news articles, called FinDKG, and we propose an attention-based GNN architecture for analysing it, called KGTransformer. We test the performance of the proposed model on benchmark datasets and FinDKG, demonstrating superior performance on link prediction tasks. Additionally, we evaluate the performance of the KGTransformer on FinDKG for thematic investing, showing it can outperform existing thematic ETFs.
Date:	2024–07
URL:	https://d.repec.org/n?u=RePEc:arx:papers:2407.10909

Design and Optimization of Big Data and Machine Learning-Based Risk Monitoring System in Financial Markets

By:	Liyang Wang; Yu Cheng; Xingxin Gu; Zhizhong Wu
Abstract:	With the increasing complexity of financial markets and rapid growth in data volume, traditional risk monitoring methods no longer suffice for modern financial institutions. This paper designs and optimizes a risk monitoring system based on big data and machine learning. By constructing a four-layer architecture, it effectively integrates large-scale financial data and advanced machine learning algorithms. Key technologies employed in the system include Long Short-Term Memory (LSTM) networks, Random Forest, Gradient Boosting Trees, and real-time data processing platform Apache Flink, ensuring the real-time and accurate nature of risk monitoring. Research findings demonstrate that the system significantly enhances efficiency and accuracy in risk management, particularly excelling in identifying and warning against market crash risks.
Date:	2024–07
URL:	https://d.repec.org/n?u=RePEc:arx:papers:2407.19352

Distilling interpretable causal trees from causal forests

By:	Patrick Rehill
Abstract:	Machine learning methods for estimating treatment effect heterogeneity promise greater flexibility than existing methods that test a few pre-specified hypotheses. However, one problem these methods can have is that it can be challenging to extract insights from complicated machine learning models. A high-dimensional distribution of conditional average treatment effects may give accurate, individual-level estimates, but it can be hard to understand the underlying patterns; hard to know what the implications of the analysis are. This paper proposes the Distilled Causal Tree, a method for distilling a single, interpretable causal tree from a causal forest. This compares well to existing methods of extracting a single tree, particularly in noisy data or high-dimensional data where there are many correlated features. Here it even outperforms the base causal forest in most simulations. Its estimates are doubly robust and asymptotically normal just as those of the causal forest are.
Date:	2024–08
URL:	https://d.repec.org/n?u=RePEc:arx:papers:2408.01023

Beyond Trend Following: Deep Learning for Market Trend Prediction

By:	Fernando Berzal; Alberto Garcia
Abstract:	Trend following and momentum investing are common strategies employed by asset managers. Even though they can be helpful in the proper situations, they are limited in the sense that they work just by looking at the past, as if we were driving with our focus on the rearview mirror. In this paper, we advocate for the use of Artificial Intelligence and Machine Learning techniques to predict future market trends. These predictions, when done properly, can improve the performance of asset managers by increasing returns and reducing drawdowns.
Date:	2024–06
URL:	https://d.repec.org/n?u=RePEc:arx:papers:2407.13685

Machine learning in weekly movement prediction

By:	Han Gui
Abstract:	To predict the future movements of stock markets, numerous studies concentrate on daily data and employ various machine learning (ML) models as benchmarks that often vary and lack standardization across different research works. This paper tries to solve the problem from a fresh standpoint by aiming to predict the weekly movements, and introducing a novel benchmark of random traders. This benchmark is independent of any ML model, thus making it more objective and potentially serving as a commonly recognized standard. During the training process, apart from basic features such as technical indicators, scaling laws and directional changes are introduced as additional features. Furthermore, the training datasets are adjusted by assigning varying weights to different samples, which allows the models to emphasize specific samples. In back-testing, several trained models show good performance, with the multi-layer perceptron (MLP) demonstrating stability and robustness across extensive and comprehensive data that include upward, downward and cyclic trends. The unique perspective of this work, which focuses on weekly movements, incorporates new features, and creates an objective benchmark, contributes to the existing literature on stock market prediction.
Date:	2024–07
URL:	https://d.repec.org/n?u=RePEc:arx:papers:2407.09831
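
The idea of a model-independent random-trader benchmark can be illustrated with a minimal sketch. This is not the paper's construction, which the abstract does not detail; it assumes each hypothetical trader guesses "up" (+1) or "down" (-1) with equal probability every week, and the function names and parameters are chosen here for illustration only.

```python
import random


def random_trader_accuracies(actual_moves, n_traders=1000, seed=0):
    """Directional accuracy of each simulated coin-flip trader.

    actual_moves: list of +1 (up week) or -1 (down week).
    Assumption: every trader guesses up or down with probability 0.5.
    """
    rng = random.Random(seed)
    accuracies = []
    for _ in range(n_traders):
        hits = sum(1 for move in actual_moves if rng.choice((1, -1)) == move)
        accuracies.append(hits / len(actual_moves))
    return accuracies


def share_of_traders_beaten(model_accuracy, trader_accuracies):
    """Fraction of random traders whose accuracy is strictly below the model's."""
    beaten = sum(1 for acc in trader_accuracies if acc < model_accuracy)
    return beaten / len(trader_accuracies)


# Usage: 52 synthetic weekly moves and a hypothetical model accuracy of 60%.
weekly_moves = [1, -1, 1, 1, -1, 1, -1, -1, 1, 1, -1, 1, 1] * 4
traders = random_trader_accuracies(weekly_moves)
print(share_of_traders_beaten(0.60, traders))
```

Because the benchmark depends only on the realized moves and a random seed, any forecasting model can be scored against the same population of simulated traders.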

Machine Learning and IRB Capital Requirements: Advantages, Risks, and Recommendations

By:	Hurlin, Christophe (University of Orleans); Pérignon, Christophe (HEC Paris)
Abstract:	This survey proposes a theoretical and practical reflection on the use of machine learning methods in the context of the Internal Ratings Based (IRB) approach to banks' capital requirements. While machine learning is still rarely used in the regulatory domain (IRB, IFRS 9, stress tests), recent discussions initiated by the European Banking Authority suggest that this may change in the near future. While technically complex, this subject is crucial given growing concerns about the potential financial instability caused by the banks' use of opaque internal models. Conversely, for their proponents, machine learning models offer the prospect of better measurement of credit risk and enhancing financial inclusion. This survey yields several conclusions and recommendations regarding (i) the accuracy of risk parameter estimations, (ii) the level of regulatory capital, (iii) the trade-off between performance and interpretability, (iv) international banking competition, and (v) the governance and operational risks of machine learning models.
Keywords:	Banking; Machine Learning; Artificial Intelligence; Internal models; Prudential regulation; Regulatory capital
JEL:	C10 C38 C55 G21 G29
Date:	2023–06–25
URL:	https://d.repec.org/n?u=RePEc:ebg:heccah:1480

Enhancing Black-Scholes Delta Hedging via Deep Learning

By:	Chunhui Qiao; Xiangwei Wan
Abstract:	This paper proposes a deep delta hedging framework for options, utilizing neural networks to learn the residuals between the hedging function and the implied Black-Scholes delta. This approach leverages the smoother properties of these residuals, enhancing deep learning performance. Utilizing ten years of daily S&P 500 index option data, our empirical analysis demonstrates that learning the residuals, using the mean squared one-step hedging error as the loss function, significantly improves hedging performance over directly learning the hedging function, often by more than 100%. Adding input features when learning the residuals enhances hedging performance more for puts than calls, with market sentiment being less crucial. Furthermore, learning the residuals with three years of data matches the hedging performance of directly learning with ten years of data, proving that our method demands less data.
Date:	2024–07
URL:	https://d.repec.org/n?u=RePEc:arx:papers:2407.19367
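
The residual-learning setup described in this abstract can be sketched as follows: the hedge ratio is written as the Black-Scholes delta plus a learned correction, and the correction is scored by the mean squared one-step hedging error. This is a simplified illustration, not the authors' code. It assumes a European call with no dividends, ignores financing of the cash account, and takes the residuals as given numbers where the paper would use a neural network's output.

```python
import math


def norm_cdf(x):
    """Standard normal CDF computed from the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def bs_call_delta(spot, strike, rate, vol, tau):
    """Black-Scholes delta of a European call with no dividends."""
    d1 = (math.log(spot / strike) + (rate + 0.5 * vol ** 2) * tau) / (vol * math.sqrt(tau))
    return norm_cdf(d1)


def one_step_hedging_error(delta, d_spot, d_option):
    """One-step P&L of holding `delta` shares against one short option.

    Assumption: interest on the cash account is ignored.
    """
    return delta * d_spot - d_option


def mse_hedging_loss(bs_deltas, residuals, d_spots, d_options):
    """Mean squared one-step hedging error for hedge ratio = BS delta + residual."""
    errors = [
        one_step_hedging_error(bs + res, ds, dv)
        for bs, res, ds, dv in zip(bs_deltas, residuals, d_spots, d_options)
    ]
    return sum(e * e for e in errors) / len(errors)


# Usage: at-the-money call, 20% volatility, one year to expiry, zero rate.
delta = bs_call_delta(spot=100.0, strike=100.0, rate=0.0, vol=0.2, tau=1.0)
print(round(delta, 4))
```

Setting every residual to zero recovers plain Black-Scholes delta hedging, so the loss of the zero-residual case is a natural baseline for any learned correction.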

Calibrating the Heston Model with Deep Differential Networks

By:	Chen Zhang; Giovanni Amici; Marco Morandotti
Abstract:	We propose a gradient-based deep learning framework to calibrate the Heston option pricing model (Heston, 1993). Our neural network, henceforth deep differential network (DDN), learns both the Heston pricing formula for plain-vanilla options and the partial derivatives with respect to the model parameters. The price sensitivities estimated by the DDN are not subject to the numerical issues that can be encountered in computing the gradient of the Heston pricing function. Thus, our network is an excellent pricing engine for fast gradient-based calibrations. Extensive tests on selected equity markets show that the DDN significantly outperforms non-differential feedforward neural networks in terms of calibration accuracy. In addition, it dramatically reduces the computational time with respect to global optimizers that do not use gradient information.
Date:	2024–07
URL:	https://d.repec.org/n?u=RePEc:arx:papers:2407.15536

Hedge Fund Portfolio Construction Using PolyModel Theory and iTransformer

By:	Siqiao Zhao; Zhikang Dong; Zeyu Cao; Raphael Douady
Abstract:	When constructing portfolios, a key problem is that a lot of financial time series data are sparse, making it challenging to apply machine learning methods. PolyModel theory can solve this issue and demonstrate superiority in portfolio construction from various aspects. To implement the PolyModel theory for constructing a hedge fund portfolio, we begin by identifying an asset pool, utilizing over 10,000 hedge funds for the past 29 years' data. PolyModel theory also involves choosing a wide-ranging set of risk factors, which includes various financial indices, currencies, and commodity prices. This comprehensive selection mirrors the complexities of the real-world environment. Leveraging the PolyModel theory, we create quantitative measures such as Long-term Alpha, Long-term Ratio, and SVaR. We also use more classical measures like the Sharpe ratio or Morningstar's MRAR. To enhance the performance of the constructed portfolio, we also employ the latest deep learning techniques (iTransformer) to capture the upward trend, while efficiently controlling the downside, using all the features. The iTransformer model is specifically designed to address the challenges in high-dimensional time series forecasting and could largely improve our strategies. More precisely, our strategies achieve better Sharpe ratio and annualized return. The above process enables us to create multiple portfolio strategies aiming for high returns and low risks when compared to various benchmarks.
Date:	2024–08
URL:	https://d.repec.org/n?u=RePEc:arx:papers:2408.03320

Nowcasting R&D Expenditures: A Machine Learning Approach

By:	Atin Aboutorabi; Gaétan de Rassenfosse
Abstract:	Macroeconomic data are crucial for monitoring countries' performance and driving policy. However, traditional data acquisition processes are slow, subject to delays, and performed at a low frequency. We address this 'ragged-edge' problem with a two-step framework. The first step is a supervised learning model predicting observed low-frequency figures. We propose a neural-network-based nowcasting model that exploits mixed-frequency, high-dimensional data. The second step uses the elasticities derived from the previous step to interpolate unobserved high-frequency figures. We apply our method to nowcast countries' yearly research and development (R&D) expenditure series. These series are collected through infrequent surveys, making them ideal candidates for this task. We exploit a range of predictors, chiefly Internet search volume data, and document the relevance of these data in improving out-of-sample predictions. Furthermore, we leverage the high frequency of our data to derive monthly estimates of R&D expenditures, which are currently unobserved. We compare our results with those obtained from the classical regression-based and the sparse temporal disaggregation methods. Finally, we validate our results by reporting a strong correlation with monthly R&D employment data.
Date:	2024–07
URL:	https://d.repec.org/n?u=RePEc:arx:papers:2407.11765

Climate-Driven Doubling of Maize Loss Probability in U.S. Crop Insurance: Spatiotemporal Prediction and Possible Policy Responses

By:	A Samuel Pottinger; Lawson Connor; Brookie Guzder-Williams; Maya Weltman-Fahs; Timothy Bowles
Abstract:	Climate change not only threatens agricultural producers but also strains financial institutions. These important food system actors include government entities tasked with both insuring grower livelihoods and supporting response to continued global warming. We use an artificial neural network to predict future maize yields in the U.S. Corn Belt, finding alarming changes to institutional risk exposure within the Federal Crop Insurance Program. Specifically, our machine learning method anticipates more frequent and more severe yield losses that would cause the annual probability of Yield Protection (YP) claims to more than double at mid-century relative to simulations without continued climate change. Furthermore, our dual finding of relatively unchanged average yields paired with decreasing yield stability reveals targeted opportunities to adjust coverage formulas to include variability. This important structural shift may help regulators support grower adaptation to continued climate change by recognizing the value of risk-reducing strategies such as regenerative agriculture. Altogether, paired with open source interactive tools for deeper investigation, our risk profile simulations fill an actionable gap in current understanding, bridging granular historic yield estimation and climate-informed prediction of future insurer-relevant loss.
Date:	2024–08
URL:	https://d.repec.org/n?u=RePEc:arx:papers:2408.02217

Deep learning for quadratic hedging in incomplete jump market

By:	Nacira Agram; Bernt Øksendal; Jan Rems
Abstract:	We propose a deep learning approach to study the minimal variance pricing and hedging problem in an incomplete jump diffusion market. It is based upon a rigorous stochastic calculus derivation of the optimal hedging portfolio, optimal option price, and the corresponding equivalent martingale measure through the means of the Stackelberg game approach. A deep learning algorithm based on the combination of the feedforward and LSTM neural networks is tested on three different market models, two of which are incomplete. In contrast, the complete market Black-Scholes model serves as a benchmark for the algorithm's performance. The results that indicate the algorithm's good performance are presented and discussed. In particular, we apply our results to the special incomplete market model studied by Merton and give a detailed comparison between our results based on the minimal variance principle and the results obtained by Merton based on a different pricing principle. Using deep learning, we find that the minimal variance principle leads to typically higher option prices than those deduced from the Merton principle. On the other hand, the minimal variance principle leads to lower losses than the Merton principle.
Date:	2024–06
URL:	https://d.repec.org/n?u=RePEc:arx:papers:2407.13688

NeuralBeta: Estimating Beta Using Deep Learning

By:	Yuxin Liu; Jimin Lin; Achintya Gopal
Abstract:	Traditional approaches to estimating beta in finance often involve rigid assumptions and fail to adequately capture beta dynamics, limiting their effectiveness in use cases like hedging. To address these limitations, we have developed a novel method using neural networks called NeuralBeta, which is capable of handling both univariate and multivariate scenarios and tracking the dynamic behavior of beta. To address the issue of interpretability, we introduce a new output layer inspired by regularized weighted linear regression, which provides transparency into the model's decision-making process. We conducted extensive experiments on both synthetic and market data, demonstrating NeuralBeta's superior performance compared to benchmark methods across various scenarios, especially instances where beta is highly time-varying, e.g., during regime shifts in the market. This model not only represents an advancement in the field of beta estimation, but also shows potential for applications in other financial contexts that assume linear relationships.
Date:	2024–08
URL:	https://d.repec.org/n?u=RePEc:arx:papers:2408.01387
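
To make the phrase "regularized weighted linear regression" concrete, here is a minimal univariate sketch of a ridge-penalized weighted least-squares beta without an intercept. It is not the NeuralBeta output layer itself. The fixed exponential-decay weights and the function names are assumptions for illustration, and the abstract does not specify how the network forms its weights.

```python
def exponential_weights(n, decay=0.94):
    """Illustrative weights: the most recent observation gets weight 1."""
    return [decay ** (n - 1 - i) for i in range(n)]


def weighted_ridge_beta(asset_returns, market_returns, weights, ridge=0.0):
    """Closed-form beta minimizing sum(w * (y - beta * x)^2) + ridge * beta^2."""
    numerator = sum(w * x * y for w, x, y in zip(weights, market_returns, asset_returns))
    denominator = sum(w * x * x for w, x in zip(weights, market_returns)) + ridge
    return numerator / denominator


# Usage: an asset that moves exactly 1.5 times the market.
market = [0.01, -0.02, 0.03, 0.005]
asset = [1.5 * r for r in market]
w = exponential_weights(len(market))
print(weighted_ridge_beta(asset, market, w))              # close to 1.5
print(weighted_ridge_beta(asset, market, w, ridge=0.01))  # shrunk towards 0
```

Because the estimate is an explicit weighted average of the observations, the weights themselves show which dates drive the beta, which is the kind of transparency the abstract refers to.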

International Trade Flow Prediction with Bilateral Trade Provisions

By:	Zijie Pan; Stepan Gordeev; Jiahui Zhao; Ziyi Meng; Caiwen Ding; Sandro Steinbach; Dongjin Song
Abstract:	This paper presents a novel methodology for predicting international bilateral trade flows, emphasizing the growing importance of Preferential Trade Agreements (PTAs) in the global trade landscape. Acknowledging the limitations of traditional models like the Gravity Model of Trade, this study introduces a two-stage approach combining explainable machine learning and factorization models. The first stage employs SHAP Explainer for effective variable selection, identifying key provisions in PTAs, while the second stage utilizes Factorization Machine models to analyze the pairwise interaction effects of these provisions on trade flows. By analyzing comprehensive datasets, the paper demonstrates the efficacy of this approach. The findings not only enhance the predictive accuracy of trade flow models but also offer deeper insights into the complex dynamics of international trade, influenced by specific bilateral trade provisions.
Date:	2024–06
URL:	https://d.repec.org/n?u=RePEc:arx:papers:2407.13698
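
The pairwise interaction effects mentioned in the second stage are what a second-order factorization machine models: each feature i gets a latent vector v_i, and the interaction weight of features i and j is the dot product of v_i and v_j. The sketch below shows only the standard prediction formula with hand-picked parameters. The feature encoding of trade provisions and all numeric values are hypothetical, and nothing here is fitted to data.

```python
def fm_predict(x, bias, linear_weights, latent_vectors):
    """Second-order factorization machine prediction.

    y = bias + sum_i w_i * x_i + sum_{i<j} <v_i, v_j> * x_i * x_j
    """
    n = len(x)
    linear_term = sum(w * xi for w, xi in zip(linear_weights, x))
    pairwise_term = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            dot = sum(a * b for a, b in zip(latent_vectors[i], latent_vectors[j]))
            pairwise_term += dot * x[i] * x[j]
    return bias + linear_term + pairwise_term


# Usage: three hypothetical provision indicators, two of them active.
x = [1.0, 1.0, 0.0]
prediction = fm_predict(
    x,
    bias=0.5,
    linear_weights=[1.0, 2.0, 3.0],
    latent_vectors=[[1.0, 0.0], [0.5, 2.0], [1.0, 1.0]],
)
print(prediction)  # 0.5 + (1 + 2) + 0.5 = 4.0
```

Factorizing the interaction weights keeps the number of parameters linear in the number of provisions, which matters when many provisions rarely occur together.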

Public Perceptions of Canada’s Investment Climate

By:	Flora Lutz; Yuanchen Yang; Chengyu Huang
Abstract:	Canada’s muted productivity growth during recent years has sparked concerns about the country’s investment climate. In this study, we develop a new natural language processing (NLP) based indicator, mining the richness of Twitter (now X) accounts to measure trends in the public perceptions of Canada’s investment climate. We find that while the Canadian investment climate appears to be generally favorable, there are signs of slippage in some categories in recent periods, such as with respect to governance and infrastructure. This result is confirmed by both survey-based and NLP-based indicators. We also find that our NLP-based indicators would suggest that perceptions of Canada’s investment climate are similar to perceptions of U.S. investment climate, except with respect to governance, where views of U.S. governance are notably more negative. Comparing our novel indicator relative to traditional survey-based indicators, we find that the NLP-based indicators are statistically significant in helping to predict investment flows, similar to survey-based measures. Meanwhile, the new NLP-based indicator offers insights into the nuances of data, allowing us to identify specific grievances. Finally, we construct a similar indicator for the U.S. and compare trends across countries.
Keywords:	Investment Climate; Canada; Machine Learning; Sentiment Analysis
Date:	2024–07–26
URL:	https://d.repec.org/n?u=RePEc:imf:imfwpa:2024/165

Comparative analysis of Mixed-Data Sampling (MIDAS) model compared to Lag-Llama model for inflation nowcasting

By:	Adam Bahelka; Harmen de Weerd
Abstract:	Inflation is one of the most important economic indicators closely watched by both public institutions and private agents. This study compares the performance of a traditional econometric model, Mixed Data Sampling regression, with one of the newest developments from the field of Artificial Intelligence, a foundational time series forecasting model based on a Long short-term memory neural network called Lag-Llama, in their ability to nowcast the Harmonized Index of Consumer Prices in the Euro area. The two models were compared to assess whether Lag-Llama can outperform the MIDAS regression, ensuring that the MIDAS regression is evaluated under the best-case scenario using a dataset spanning from 2010 to 2022. The following metrics were used to evaluate the models: Mean Absolute Error (MAE), Mean Absolute Percentage Error (MAPE), Mean Squared Error (MSE), correlation with the target, R-squared and adjusted R-squared. The results show better performance of the pre-trained Lag-Llama across all metrics.
Date:	2024–07
URL:	https://d.repec.org/n?u=RePEc:arx:papers:2407.08510
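
For readers unfamiliar with the evaluation metrics listed in this abstract, the following sketch computes them from their textbook definitions. The sample numbers are made up for illustration and are not taken from the paper. MAPE is expressed in percent and assumes no actual value is zero.

```python
def mae(actual, predicted):
    """Mean Absolute Error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mse(actual, predicted):
    """Mean Squared Error."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def mape(actual, predicted):
    """Mean Absolute Percentage Error, in percent (actual values must be nonzero)."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


def adjusted_r_squared(r2, n_obs, n_predictors):
    """R-squared penalized for the number of predictors."""
    return 1.0 - (1.0 - r2) * (n_obs - 1) / (n_obs - n_predictors - 1)


# Usage with made-up inflation readings and nowcasts.
actual = [2.0, 4.0, 6.0]
nowcast = [2.5, 3.5, 6.0]
print(mae(actual, nowcast), mse(actual, nowcast), mape(actual, nowcast))
print(r_squared(actual, nowcast))
```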

Credit Risk Assessment Model for UAE Commercial Banks: A Machine Learning Approach

By:	Aditya Saxena; Dr Parizad Dungore
Abstract:	Credit ratings are becoming one of the primary references for financial institutions of the country to assess credit risk in order to accurately predict the likelihood of business failure of an individual or an enterprise. Financial institutions, therefore, depend on credit rating tools and services to help them predict the ability of creditors to meet financial obligations. Conventional credit rating is broadly categorized into two classes, namely good credit and bad credit. This approach lacks adequate precision to perform credit risk analysis in practice. Related studies have shown that data-driven machine learning algorithms outperform many conventional statistical approaches in solving this type of problem, both in terms of accuracy and efficiency. The purpose of this paper is to construct and validate a credit risk assessment model using Linear Discriminant Analysis as a dimensionality reduction technique to discriminate good creditors from bad ones and identify the best classifier for credit assessment of commercial banks based on real-world data. This will help commercial banks to avoid monetary losses and prevent financial crisis.
Date:	2024–07
URL:	https://d.repec.org/n?u=RePEc:arx:papers:2407.12044

The heterogeneous impact of the EU-Canada agreement with causal machine

By:	Lionel Fontagné; Francesca Micocci; Armando Rungi
Abstract:	This paper introduces a causal machine learning approach to investigate the impact of the EU-Canada Comprehensive Economic Trade Agreement (CETA). We propose a matrix completion algorithm on French customs data to obtain multidimensional counterfactuals at the firm, product and destination levels. We find a small but significant positive impact on average at the product-level intensive margin. On the other hand, the extensive margin shows product churning due to the treaty beyond regular entry-exit dynamics: one product in eight that was not previously exported substitutes almost as many that are no longer exported. When we delve into the heterogeneity, we find that the effects of the treaty are higher for products at a comparative advantage. Focusing on multiproduct firms, we find that they adjust their portfolio in Canada by reallocating towards their first and most exported product due to increasing local market competition after trade liberalization. Finally, multidimensional counterfactuals allow us to evaluate the general equilibrium effect of the CETA. Specifically, we observe trade diversion, as exports to other destinations are re-directed to Canada.
Date:	2024–07
URL:	https://d.repec.org/n?u=RePEc:arx:papers:2407.07652

Deep Reinforcement Learning Strategies in Finance: Insights into Asset Holding, Trading Behavior, and Purchase Diversity

By:	Alireza Mohammadshafie; Akram Mirzaeinia; Haseebullah Jumakhan; Amir Mirzaeinia
Abstract:	Recent deep reinforcement learning (DRL) methods in finance show promising outcomes. However, there is limited research examining the behavior of these DRL algorithms. This paper aims to investigate their tendencies towards holding or trading financial assets as well as purchase diversity. By analyzing their trading behaviors, we provide insights into the decision-making processes of DRL models in finance applications. Our findings reveal that each DRL algorithm exhibits unique trading patterns and strategies, with A2C emerging as the top performer in terms of cumulative rewards. While PPO and SAC engage in significant trades with a limited number of stocks, DDPG and TD3 adopt a more balanced approach. Furthermore, SAC and PPO tend to hold positions for shorter durations, whereas DDPG, A2C, and TD3 display a propensity to remain stationary for extended periods.
Date:	2024–06
URL:	https://d.repec.org/n?u=RePEc:arx:papers:2407.09557

Leveraging Natural Language and Item Response Theory Models for ESG Scoring

By:	César Pedrosa Soares
Abstract:	This paper explores an innovative approach to Environmental, Social, and Governance (ESG) scoring by integrating Natural Language Processing (NLP) techniques with Item Response Theory (IRT), specifically the Rasch model. The study utilizes a comprehensive dataset of news articles in Portuguese related to Petrobras, a major oil company in Brazil, collected from 2022 and 2023. The data is filtered and classified for ESG-related sentiments using advanced NLP methods. The Rasch model is then applied to evaluate the psychometric properties of these ESG measures, providing a nuanced assessment of ESG sentiment trends over time. The results demonstrate the efficacy of this methodology in offering a more precise and reliable measurement of ESG factors, highlighting significant periods and trends. This approach may enhance the robustness of ESG metrics and contribute to the broader field of sustainability and finance by offering a deeper understanding of the temporal dynamics in ESG reporting.
Date:	2024–07
URL:	https://d.repec.org/n?u=RePEc:arx:papers:2407.20377

A Reflective LLM-based Agent to Guide Zero-shot Cryptocurrency Trading

By:	Yuan Li; Bingqiao Luo; Qian Wang; Nuo Chen; Xu Liu; Bingsheng He
Abstract:	The utilization of Large Language Models (LLMs) in financial trading has primarily been concentrated within the stock market, aiding in economic and financial decisions. Yet, the unique opportunities presented by the cryptocurrency market, noted for its on-chain data's transparency and the critical influence of off-chain signals like news, remain largely untapped by LLMs. This work aims to bridge the gap by developing an LLM-based trading agent, CryptoTrade, which uniquely combines the analysis of on-chain and off-chain data. This approach leverages the transparency and immutability of on-chain data, as well as the timeliness and influence of off-chain signals, providing a comprehensive overview of the cryptocurrency market. CryptoTrade incorporates a reflective mechanism specifically engineered to refine its daily trading decisions by analyzing the outcomes of prior trading decisions. This research makes two significant contributions. Firstly, it broadens the applicability of LLMs to the domain of cryptocurrency trading. Secondly, it establishes a benchmark for cryptocurrency trading strategies. Through extensive experiments, CryptoTrade has demonstrated superior performance in maximizing returns compared to traditional trading strategies and time-series baselines across various cryptocurrencies and market conditions. Our code and data are available at https://anonymous.4open.science/r/CryptoTrade-Public-92FC/.
Date:	2024–06
URL:	https://d.repec.org/n?u=RePEc:arx:papers:2407.09546

Explainable AI in Request-for-Quote

By:	Qiqin Zhou
Abstract:	In the contemporary financial landscape, accurately predicting the probability of filling a Request-For-Quote (RFQ) is crucial for improving market efficiency for less liquid asset classes. This paper explores the application of explainable AI (XAI) models to forecast the likelihood of RFQ fulfillment. By leveraging advanced algorithms including Logistic Regression, Random Forest, XGBoost and Bayesian Neural Tree, we are able to improve the accuracy of RFQ fill rate predictions and generate the most efficient quote price for market makers. XAI serves as a robust and transparent tool for market participants to navigate the complexities of RFQs with greater precision.
Date:	2024–07
URL:	https://d.repec.org/n?u=RePEc:arx:papers:2407.15038

Big data and firm-level productivity: A cross-country comparison

By:	Andres, Raphaela; Niebel, Thomas; Sack, Robin
Abstract:	Until today, the question of how digitalisation and, in particular, individual digital technologies affect productivity is still the subject of controversial debate. Using administrative firm-level data provided by the Dutch and the German statistical offices, we investigate the economic importance of data, in particular, the effect of the application of big data analytics (BDA) on labour productivity (LP) at the firm level. We find that a simple binary measure indicating the mere usage of BDA fails to capture the effect of BDA on LP. In contrast, measures of BDA intensity clearly show a positive and statistically significant relationship between BDA and LP, even after controlling for a firm's general digitalisation level.
Keywords:	big data analytics, productivity, administrative firm-level data
JEL:	L25 O14 O33
Date:	2024
URL:	https://d.repec.org/n?u=RePEc:zbw:zewdip:300678

Quantile Regression using Random Forest Proximities

By:	Mingshu Li; Bhaskarjit Sarmah; Dhruv Desai; Joshua Rosaler; Snigdha Bhagat; Philip Sommer; Dhagash Mehta
Abstract:	Due to the dynamic nature of financial markets, maintaining models that produce precise predictions over time is difficult. Often the goal isn't just point prediction but determining uncertainty. Quantifying uncertainty, especially the aleatoric uncertainty due to the unpredictable nature of market drivers, helps investors understand varying risk levels. Recently, quantile regression forests (QRF) have emerged as a promising solution: unlike most basic quantile regression methods that need separate models for each quantile, quantile regression forests estimate the entire conditional distribution of the target variable with a single model, while retaining all the salient features of a typical random forest. We introduce a novel approach to compute quantile regressions from random forests that leverages the proximity (i.e., distance metric) learned by the model and infers the conditional distribution of the target variable. We evaluate the proposed methodology using publicly available datasets and then apply it to the problem of forecasting the average daily volume of corporate bonds. We show that quantile regression using Random Forest proximities outperforms the original version of QRF in approximating conditional target distributions and prediction intervals. We also demonstrate that the proposed framework is significantly more computationally efficient than traditional approaches to quantile regressions.
Date:	2024–08
URL:	https://d.repec.org/n?u=RePEc:arx:papers:2408.02355

Machine Learning-based Relative Valuation of Municipal Bonds

By:	Preetha Saha; Jingrao Lyu; Dhruv Desai; Rishab Chauhan; Jerinsh Jeyapaulraj; Philip Sommer; Dhagash Mehta
Abstract:	The trading ecosystem of the Municipal (muni) bond is complex and unique. With nearly 2% of securities from over a million securities outstanding trading daily, determining the value or relative value of a bond among its peers is challenging. Traditionally, relative value calculation has been done using rule-based or heuristics-driven approaches, which may introduce human biases and often fail to account for complex relationships between the bond characteristics. We propose a data-driven model to develop a supervised similarity framework for the muni bond market based on CatBoost algorithm. This algorithm learns from a large-scale dataset to identify bonds that are similar to each other based on their risk profiles. This allows us to evaluate the price of a muni bond relative to a cohort of bonds with a similar risk profile. We propose and deploy a back-testing methodology to compare various benchmarks and the proposed methods and show that the similarity-based method outperforms both rule-based and heuristic-based methods.
Date:	2024–08
URL:	https://d.repec.org/n?u=RePEc:arx:papers:2408.02273

This nep-big issue is ©2024 by Tom Coupé. It is provided as is without any express or implied warranty. It may be freely redistributed in whole or in part for any purpose. If distributed in part, please include this notice.

General information on the NEP project can be found at https://nep.repec.org. For comments please write to the director of NEP, Marco Novarese at <director@nep.repec.org>. Put “NEP” in the subject, otherwise your mail may be rejected.

NEP’s infrastructure is sponsored by the School of Economics and Finance of Massey University in New Zealand.