nep-big 2023-08-28 papers

on Big Data

Issue of 2023‒08‒28
25 papers chosen by
Tom Coupé, University of Canterbury

Economic Growth and Pollution in different Political Regimes By Andreas Kammerlander
Multi-Factor Inception: What to Do with All of These Features? By Tom Liu; Stefan Zohren
Datalism and Data Monopolies in the Era of A.I.: A Research Agenda By Catherine E. A. Mulligan; Phil Godsiff
Predict-AI-bility of how humans balance self-interest with the interest of others By Valerio Capraro; Roberto Di Paolo; Veronica Pizziol
Adversarial Deep Hedging: Learning to Hedge without Price Process Modeling By Masanori Hirano; Kentaro Minami; Kentaro Imajo
Using machine learning to map the European cleantech sector By Ambrois, Matteo; Butticè, Vincenzo; Caviggioli, Federico; Cerulli, Giovanni; Croce, Annalisa; De Marco, Antonio; Giordano, Andrea; Resce, Giuliano; Toschi, Laura; Ughetto, Elisa; Zinilli, Antonio
Modeling Inverse Demand Function with Explainable Dual Neural Networks By Zhiyu Cao; Zihan Chen; Prerna Mishra; Hamed Amini; Zachary Feinstein
Multimodal Document Analytics for Banking Process Automation By Christopher Gerling; Stefan Lessmann
ESG Reputation Risk Matters: An Event Study Based on Social Media Data By Maxime L. D. Nicolas; Adrien Desroziers; Fabio Caccioli; Tomaso Aste
Deep Dynamic Factor Models By Paolo Andreini; Cosimo Izzo; Giovanni Ricco
Including individual Customer Lifetime Value and competing risks in tree-based lapse management strategies By Mathias Valla; Xavier Milhaud; Anani Ayodélé Olympio
Sports Betting: an application of neural networks and modern portfolio theory to the English Premier League By Vélez Jiménez, Román Alberto; Lecuanda Ontiveros, José Manuel; Edgar Possani
Does Unfairness Hurt Women? The Effects of Losing Unfair Competitions By Piasenti, Stefano; Valente, Marica; van Veldhuizen, Roel; Pfeifer, Gregor
Dynamic Large Language Models on Blockchains By Yuanhao Gong
Financial Machine Learning By Bryan T. Kelly; Dacheng Xiu
Gender Stereotypes in User-Generated Content By Anna Kerkhof; Valentin Reich
Optimal Markowitz Portfolio Using Returns Forecasted with Time Series and Machine Learning Models By Damian Ślusarczyk; Robert Ślepaczuk
Nighttime Light Pollution and Economic Activities: A Spatio-Temporal Model with Common Factors for US Counties By Bresson, Georges; Etienne, Jean-Michel; Lacroix, Guy
Nowcasting world trade with machine learning: a three-step approach By Chinn, Menzie D.; Meunier, Baptiste; Stumpner, Sebastian
Contrasting the efficiency of stock price prediction models using various types of LSTM models aided with sentiment analysis By Varun Sangwan; Vishesh Kumar Singh; Bibin Christopher V
FinGPT: Democratizing Internet-scale Data for Financial Large Language Models By Xiao-Yang Liu; Guoxuan Wang; Daochen Zha
Deep Reinforcement Learning for Robust Goal-Based Wealth Management By Tessa Bauman; Bruno Gašperov; Stjepan Begušić; Zvonko Kostanjčar
Machine Learning-powered Pricing of the Multidimensional Passport Option By Josef Teichmann; Hanna Wutte
ENCODING URBAN TRAJECTORY AS A LANGUAGE: DEEP LEARNING INSIGHTS FOR HUMAN MOBILITY PATTERN By Park, Youngjun; Han, Sumin
VolTS: A Volatility-based Trading System to forecast Stock Markets Trend using Statistics and Machine Learning By Ivan Letteri

Economic Growth and Pollution in different Political Regimes

By:	Andreas Kammerlander (Department of International Economic Policy, University of Freiburg)
Abstract:	I examine the association between nighttime light luminosity and ten pollution measures (CO2, CO, NOx, SO2, NMVOC, NH3, BC, OC, PM10 and PM2.5) across different political regimes at a local level. Although the effects of the political system and economic growth on pollution have been widely analyzed at the country level, this is the first study to do so at the grid level. The empirical analysis yields three major insights. First, economic growth is positively associated with a wide array of different pollution measures. Second, there are significant differences in the association between economic growth and air pollution across different political regimes. For example, the association between nighttime light luminosity and air pollution is strictly positive for autocracies. The association between nighttime luminosity and air pollution is substantially smaller but still positive for democracies. Furthermore, among democracies the relationship between nighttime light luminosity and air pollution is concave for nine out of ten pollutants; among autocracies, the relationship is either convex (five out of ten pollutants) or the squared term is insignificant. Third, the differences among political regimes are driven chiefly by pollution emissions in the industry, energy, and transport sectors; there is no difference between autocracies and democracies in terms of the effect of growth on emissions in the agricultural and residential sectors.
Keywords:	local economic growth, air pollution, nighttime lights, geo-data
JEL:	O18 Q53
Date:	2022–10
URL:	http://d.repec.org/n?u=RePEc:fre:wpaper:43&r=big

Multi-Factor Inception: What to Do with All of These Features?

By:	Tom Liu; Stefan Zohren
Abstract:	Cryptocurrency trading represents a nascent field of research, with growing adoption in industry. Aided by its decentralised nature, many metrics describing cryptocurrencies are accessible with a simple Google search and update frequently, usually at least on a daily basis. This presents a promising opportunity for data-driven systematic trading research, where limited historical data can be augmented with additional features, such as hashrate or Google Trends. However, one question naturally arises: how to effectively select and process these features? In this paper, we introduce Multi-Factor Inception Networks (MFIN), an end-to-end framework for systematic trading with multiple assets and factors. MFINs extend Deep Inception Networks (DIN) to operate in a multi-factor context. Similar to DINs, MFIN models automatically learn features from returns data and output position sizes that optimise portfolio Sharpe ratio. Compared to a range of rule-based momentum and reversion strategies, MFINs learn an uncorrelated, higher-Sharpe strategy that is not captured by traditional, hand-crafted factors. In particular, MFIN models continue to achieve consistent returns over the most recent years (2022-2023), where traditional strategies and the wider cryptocurrency market have underperformed.
Date:	2023–07
URL:	http://d.repec.org/n?u=RePEc:arx:papers:2307.13832&r=big
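
The abstract above describes models that output position sizes optimising the portfolio Sharpe ratio. As a rough illustration of that objective only (not the authors' MFIN code), the sketch below computes an annualised Sharpe ratio from per-period portfolio returns. The function name, the zero risk-free rate, and the default of 365 periods per year (chosen because cryptocurrency markets trade every day) are assumptions made for illustration.

```python
from math import sqrt
from statistics import fmean, stdev


def annualised_sharpe(returns, periods_per_year=365):
    """Annualised Sharpe ratio of per-period returns (risk-free rate assumed zero)."""
    if len(returns) < 2:
        raise ValueError("need at least two returns to estimate volatility")
    volatility = stdev(returns)  # sample standard deviation
    if volatility == 0:
        raise ValueError("returns have zero volatility; Sharpe ratio is undefined")
    return fmean(returns) / volatility * sqrt(periods_per_year)


# Hypothetical daily portfolio returns, purely illustrative.
daily_returns = [0.012, -0.004, 0.007, 0.001, -0.009, 0.015]
print(round(annualised_sharpe(daily_returns), 2))
```

A trading model of the kind described would treat this quantity (or a differentiable version of it) as the training objective rather than as an after-the-fact report.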

Datalism and Data Monopolies in the Era of A.I.: A Research Agenda

By:	Catherine E. A. Mulligan; Phil Godsiff
Abstract:	The increasing use of data in various parts of the economic and social systems is creating a new form of monopoly: data monopolies. We illustrate that the companies using these strategies, Datalists, are challenging the existing definitions used within Monopoly Capital Theory (MCT). Datalists are pursuing a different type of monopoly control than traditional multinational corporations. They are pursuing monopolistic control over data to feed their productive processes, increasingly controlled by algorithms and Artificial Intelligence (AI). These productive processes use information about humans and the creative outputs of humans as the inputs but do not classify those humans as employees, so they are not paid or credited for their labour. This paper provides an overview of this evolution and its impact on monopoly theory. It concludes with an outline for a research agenda for economics in this space.
Date:	2023–07
URL:	http://d.repec.org/n?u=RePEc:arx:papers:2307.08049&r=big

Predict-AI-bility of how humans balance self-interest with the interest of others

By:	Valerio Capraro; Roberto Di Paolo; Veronica Pizziol
Abstract:	Generative artificial intelligence holds enormous potential to revolutionize decision-making processes, from everyday to high-stakes scenarios. However, as many decisions carry social implications, for AI to be a reliable assistant for decision-making it is crucial that it is able to capture the balance between self-interest and the interest of others. We investigate the ability of three of the most advanced chatbots to predict dictator game decisions across 78 experiments with human participants from 12 countries. We find that only GPT-4 (but neither Bard nor Bing) correctly captures qualitative behavioral patterns, identifying three major classes of behavior: self-interested, inequity-averse, and fully altruistic. Nonetheless, GPT-4 consistently overestimates other-regarding behavior, inflating the proportion of inequity-averse and fully altruistic participants. This bias has significant implications for AI developers and users.
Date:	2023–07
URL:	http://d.repec.org/n?u=RePEc:arx:papers:2307.12776&r=big

Adversarial Deep Hedging: Learning to Hedge without Price Process Modeling

By:	Masanori Hirano; Kentaro Minami; Kentaro Imajo
Abstract:	Deep hedging is a deep-learning-based framework for derivative hedging in incomplete markets. The advantage of deep hedging lies in its ability to handle various realistic market conditions, such as market frictions, which are challenging to address within the traditional mathematical finance framework. Since deep hedging relies on market simulation, the underlying asset price process model is crucial. However, existing literature on deep hedging often relies on traditional mathematical finance models, e.g., Brownian motion and stochastic volatility models, and discovering effective underlying asset models for deep hedging learning has been a challenge. In this study, we propose a new framework called adversarial deep hedging, inspired by adversarial learning. In this framework, a hedger and a generator, which respectively model the hedging strategy and the underlying asset process, are trained in an adversarial manner. The proposed method makes it possible to learn a robust hedger without explicitly modeling the underlying asset process. Through numerical experiments, we demonstrate that our proposed method achieves performance competitive with models that assume explicit underlying asset processes across various real market data.
Date:	2023–07
URL:	http://d.repec.org/n?u=RePEc:arx:papers:2307.13217&r=big

Using machine learning to map the European cleantech sector

By:	Ambrois, Matteo; Butticè, Vincenzo; Caviggioli, Federico; Cerulli, Giovanni; Croce, Annalisa; De Marco, Antonio; Giordano, Andrea; Resce, Giuliano; Toschi, Laura; Ughetto, Elisa; Zinilli, Antonio
Abstract:	This working paper uses machine learning to identify Cleantech companies in the Orbis database, based on self-declared business descriptions. Identifying Cleantech companies is challenging, as there is no universally accepted definition of what constitutes Cleantech. This novel approach makes it possible to scale up the identification process by training an algorithm to mimic (human) expert assessment in order to identify Cleantech companies in a large dataset containing information on millions of European companies. The resulting dataset is used to construct a mapping of Cleantech companies in Europe and thereby provide a new perspective on the functioning of the EU cleantech sector. The paper serves as an introductory chapter to a series of analyses that will result from the CLEU project, a collaboration between the universities of Politecnico di Torino, Politecnico di Milano and Università degli Studi di Bologna. Notably, the project aims to deepen our understanding of the financing needs of the EU Cleantech sector. It was funded by the EIB's University Research Sponsorship (EIBURS) programme and supervised by the EIF's Research and Market Analysis Division.
Date:	2023
URL:	http://d.repec.org/n?u=RePEc:zbw:eifwps:202391&r=big

Modeling Inverse Demand Function with Explainable Dual Neural Networks

By:	Zhiyu Cao; Zihan Chen; Prerna Mishra; Hamed Amini; Zachary Feinstein
Abstract:	Financial contagion has been widely recognized as a fundamental risk to the financial system. Particularly potent is price-mediated contagion, wherein forced liquidations by firms depress asset prices and propagate financial stress, enabling crises to proliferate across a broad spectrum of seemingly unrelated entities. Price impacts are currently modeled via exogenous inverse demand functions. However, in real-world scenarios, only the initial shocks and the final equilibrium asset prices are typically observable, leaving actual asset liquidations largely obscured. This missing data presents significant limitations to calibrating the existing models. To address these challenges, we introduce a novel dual neural network structure that operates in two sequential stages: the first neural network maps initial shocks to predicted asset liquidations, and the second network utilizes these liquidations to derive resultant equilibrium prices. This data-driven approach can capture both linear and non-linear forms without pre-specifying an analytical structure; furthermore, it functions effectively even in the absence of observable liquidation data. Experiments with simulated datasets demonstrate that our model can accurately predict equilibrium asset prices based solely on initial shocks, while revealing a strong alignment between predicted and true liquidations. Our explainable framework contributes to the understanding and modeling of price-mediated contagion and provides valuable insights for financial authorities to construct effective stress tests and regulatory policies.
Date:	2023–07
URL:	http://d.repec.org/n?u=RePEc:arx:papers:2307.14322&r=big

Multimodal Document Analytics for Banking Process Automation

By:	Christopher Gerling; Stefan Lessmann
Abstract:	In response to growing FinTech competition and the need for improved operational efficiency, this research focuses on understanding the potential of advanced document analytics, particularly using multimodal models, in banking processes. We perform a comprehensive analysis of the diverse banking document landscape, highlighting the opportunities for efficiency gains through automation and advanced analytics techniques in the customer business. Building on the rapidly evolving field of natural language processing (NLP), we illustrate the potential of models such as LayoutXLM, a cross-lingual, multimodal, pre-trained model, for analyzing diverse documents in the banking sector. This model performs a text token classification on German company register extracts with an overall F1 score performance of around 80%. Our empirical evidence confirms the critical role of layout information in improving model performance and further underscores the benefits of integrating image information. Interestingly, our study shows that over 75% F1 score can be achieved with only 30% of the training data, demonstrating the efficiency of LayoutXLM. Through addressing state-of-the-art document analysis frameworks, our study aims to enhance process efficiency and demonstrate the real-world applicability and benefits of multimodal models within banking.
Date:	2023–07
URL:	http://d.repec.org/n?u=RePEc:arx:papers:2307.11845&r=big

ESG Reputation Risk Matters: An Event Study Based on Social Media Data

By:	Maxime L. D. Nicolas; Adrien Desroziers; Fabio Caccioli; Tomaso Aste
Abstract:	We investigate the response of shareholders to Environmental, Social, and Governance-related reputational risk (ESG-risk), focusing exclusively on the impact of social media. Using a dataset of 114 million tweets about firms listed on the S&P100 index between 2016 and 2022, we extract conversations discussing ESG matters. In an event study design, we define events as unusual spikes in message posting activity linked to ESG-risk, and we then examine the corresponding changes in the returns of related assets. By focusing on social media, we gain insight into public opinion and investor sentiment, an aspect not captured through ESG controversies news alone. To the best of our knowledge, our approach is the first to distinctly separate the reputational impact on social media from the physical costs associated with negative ESG controversy news. Our results show that the occurrence of an ESG-risk event leads to a statistically significant average reduction of 0.29% in abnormal returns. Furthermore, our study suggests this effect is predominantly driven by Social and Governance categories, along with the "Environmental Opportunities" subcategory. Our research highlights the considerable impact of social media on financial markets, particularly in shaping shareholders' perception of ESG reputation. We formulate several policy implications based on our findings.
Date:	2023–07
URL:	http://d.repec.org/n?u=RePEc:arx:papers:2307.11571&r=big
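
The event-study design described above compares realised returns around an event with the returns that would have been expected without it. The abstract does not say which benchmark model the authors use; the sketch below assumes the common market model (asset return = alpha + beta × market return, fitted on a pre-event estimation window) purely for illustration. All function names and numbers are hypothetical.

```python
from statistics import fmean


def fit_market_model(asset_returns, market_returns):
    """OLS fit of asset = alpha + beta * market over an estimation window."""
    mean_asset = fmean(asset_returns)
    mean_market = fmean(market_returns)
    covariance = sum(
        (m - mean_market) * (a - mean_asset)
        for a, m in zip(asset_returns, market_returns)
    )
    variance = sum((m - mean_market) ** 2 for m in market_returns)
    if variance == 0:
        raise ValueError("market returns must vary in the estimation window")
    beta = covariance / variance
    alpha = mean_asset - beta * mean_market
    return alpha, beta


def abnormal_returns(asset_returns, market_returns, alpha, beta):
    """Realised return minus the market-model prediction, per period."""
    return [a - (alpha + beta * m) for a, m in zip(asset_returns, market_returns)]


# Hypothetical estimation window (before the event) and one event-day observation.
market_window = [0.010, -0.020, 0.015, 0.000, 0.005]
asset_window = [0.012, -0.026, 0.020, 0.001, 0.004]
alpha, beta = fit_market_model(asset_window, market_window)
event_day = abnormal_returns([-0.010], [0.002], alpha, beta)
print(f"alpha={alpha:.4f} beta={beta:.2f} abnormal return={event_day[0]:.4f}")
```

Averaging such abnormal returns across many events is what produces a figure comparable to the average reduction reported in the abstract.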

Deep Dynamic Factor Models

By:	Paolo Andreini (Independent Researcher); Cosimo Izzo (Independent Researcher); Giovanni Ricco (CREST, Ecole Polytechnique, University of Warwick, OFCE-SciencesPo, CEPR)
Abstract:	A novel deep neural network framework, which we refer to as the Deep Dynamic Factor Model (D2FM), is able to encode the information available from hundreds of macroeconomic and financial time-series into a handful of unobserved latent states. While similar in spirit to traditional dynamic factor models (DFMs), this new class of models, unlike them, allows for nonlinearities between factors and observables due to the autoencoder neural network structure. However, by design, the latent states of the model can still be interpreted as in a standard factor model. Both in a fully real-time out-of-sample nowcasting and forecasting exercise with US data and in a Monte Carlo experiment, the D2FM improves on the performance of a state-of-the-art DFM.
Keywords:	Machine Learning, Deep Learning, Autoencoders, Real-Time data, Time-Series, Forecasting, Nowcasting, Latent Component Models, Factor Models
JEL:	C22 C52 C53 C55
Date:	2023–05–20
URL:	http://d.repec.org/n?u=RePEc:crs:wpaper:2023-08&r=big

Including individual Customer Lifetime Value and competing risks in tree-based lapse management strategies

By:	Mathias Valla (LSAF - Laboratoire de Sciences Actuarielles et Financières [Lyon] - ISFA - Institut de Science Financière et d'Assurances, Faculty of Business and Economics - University of Leuven (KUL)); Xavier Milhaud (I2M - Institut de Mathématiques de Marseille - AMU - Aix Marseille Université - ECM - École Centrale de Marseille - CNRS - Centre National de la Recherche Scientifique); Anani Ayodélé Olympio (SAF - Laboratoire de Sciences Actuarielle et Financière - UCBL - Université Claude Bernard Lyon 1 - Université de Lyon)
Abstract:	A retention strategy based on an enlightened lapse model is a powerful profitability lever for a life insurer. Some machine learning models are excellent at predicting lapse, but from the insurer's perspective, predicting which policyholder is likely to lapse is not enough to design a retention strategy. In our paper, we define a lapse management framework with an appropriate validation metric based on Customer Lifetime Value and profitability. We include the risk of death in the study through competing risks considerations in parametric and tree-based models and show that further individualization of the existing approaches leads to increased performance. We show that survival tree-based models outperform parametric approaches and that the actuarial literature can significantly benefit from them. Then, we compare, on real data, how this framework leads to increased predicted gains for a life insurer and discuss the benefits of our model in terms of commercial and strategic decision-making.
Keywords:	Machine Learning, Life insurance, Customer lifetime value, Lapse, Lapse management strategy, Competing risks, Tree-based models
Date:	2023
URL:	http://d.repec.org/n?u=RePEc:hal:journl:hal-03903047&r=big

Sports Betting: an application of neural networks and modern portfolio theory to the English Premier League

By:	Vélez Jiménez, Román Alberto; Lecuanda Ontiveros, José Manuel; Edgar Possani
Abstract:	This paper presents a novel approach for optimizing betting strategies in sports gambling by integrating Von Neumann-Morgenstern Expected Utility Theory, deep learning techniques, and advanced formulations of the Kelly Criterion. By combining neural network models with portfolio optimization, our method achieved remarkable profits of 135.8% relative to the initial wealth during the latter half of the 20/21 season of the English Premier League. We explore complete and restricted strategies, evaluating their performance, risk management, and diversification. A deep neural network model is developed to forecast match outcomes, addressing challenges such as limited variables. Our research provides valuable insights and practical applications in the field of sports betting and predictive modeling.
Date:	2023–07
URL:	http://d.repec.org/n?u=RePEc:arx:papers:2307.13807&r=big
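
The abstract above refers to advanced formulations of the Kelly Criterion. As background only, the sketch below shows the textbook single-bet Kelly fraction, which is much simpler than the portfolio formulations the paper describes. The function name and the example probability and odds are hypothetical.

```python
def kelly_fraction(p_win: float, decimal_odds: float) -> float:
    """Textbook Kelly stake, as a fraction of bankroll, for one binary bet."""
    if not 0.0 <= p_win <= 1.0:
        raise ValueError("p_win must lie in [0, 1]")
    if decimal_odds <= 1.0:
        raise ValueError("decimal_odds must exceed 1")
    net_odds = decimal_odds - 1.0        # profit per unit staked if the bet wins
    edge = p_win * decimal_odds - 1.0    # expected profit per unit staked
    return max(0.0, edge / net_odds)     # stake nothing on a negative-edge bet


# Hypothetical example: a model gives a 50% win probability at decimal odds of 2.20.
print(f"stake {kelly_fraction(0.50, 2.20):.1%} of bankroll")
```

In a setting like the one described, the win probability would come from the neural network's forecast and the odds from the bookmaker; betting on several matches at once requires a joint optimisation rather than this one-bet formula.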

Does Unfairness Hurt Women? The Effects of Losing Unfair Competitions

By:	Piasenti, Stefano (Humboldt University Berlin); Valente, Marica (University of Innsbruck); van Veldhuizen, Roel (Lund University); Pfeifer, Gregor (University of Sydney)
Abstract:	How do men and women differ in their persistence after experiencing failure in a competitive environment? We tackle this question by combining a large online experiment (N=2,086) with machine learning. We find that when losing is unequivocally due to merit, both men and women exhibit a significant decrease in subsequent tournament entry. However, when the prior tournament is unfair, i.e., a loss is no longer necessarily based on merit, women are more discouraged than men. These results suggest that transparent meritocratic criteria may play a key role in preventing women from falling behind after experiencing a loss.
Keywords:	competitiveness, gender, fairness, machine learning, online experiment
JEL:	C90 D91 J16 C14
Date:	2023–07
URL:	http://d.repec.org/n?u=RePEc:iza:izadps:dp16324&r=big

Dynamic Large Language Models on Blockchains

By:	Yuanhao Gong
Abstract:	Training and deploying large language models requires a large amount of computational resources because the language models contain billions of parameters and the text has thousands of tokens. Another problem is that large language models are static. They are fixed after the training process. To tackle these issues, in this paper, we propose to train and deploy the dynamic large language model on blockchains, which have high computation performance and are distributed across a network of computers. A blockchain is a secure, decentralized, and transparent system that allows for the creation of a tamper-proof ledger for transactions without the need for intermediaries. The dynamic large language models can continuously learn from the user input after the training process. Our method provides a new way to develop large language models and also sheds light on next-generation artificial intelligence systems.
Date:	2023–07
URL:	http://d.repec.org/n?u=RePEc:arx:papers:2307.10549&r=big

Financial Machine Learning

By:	Bryan T. Kelly; Dacheng Xiu
Abstract:	We survey the nascent literature on machine learning in the study of financial markets. We highlight the best examples of what this line of research has to offer and recommend promising directions for future research. This survey is designed for both financial economists interested in grasping machine learning tools, as well as for statisticians and machine learners seeking interesting financial contexts where advanced methods may be deployed.
JEL:	C33 C4 C45 C55 C58 G1 G10 G11 G12 G17
Date:	2023–07
URL:	http://d.repec.org/n?u=RePEc:nbr:nberwo:31502&r=big

Gender Stereotypes in User-Generated Content

By:	Anna Kerkhof; Valentin Reich
Abstract:	Gender stereotypes pose an important hurdle on the way to gender equality. It is difficult to quantify the problem, though, as stereotypical beliefs are often subconscious or not openly expressed. User-generated content (UGC) opens up novel opportunities to overcome such challenges, as the anonymity of users may eliminate social pressures. This paper leverages over a million anonymous comments from a major German online discussion forum to study the prevalence and development of gender stereotypes over almost a decade. To that end, we develop an innovative and widely applicable text analysis procedure that overcomes conceptual challenges that arise whenever two variables in the training data are correlated, and changes in that correlation in the prediction sample are themselves the subject of examination. Here, we apply the procedure to study the correlation between gender (i.e., does a comment discuss women or men) and gender stereotypical topics (e.g., work or family) in our comments, where we interpret a strong correlation as the presence of gender stereotypes. We find that men are indeed discussed relatively more often in the context of stereotypical male topics such as work and money, and that women are discussed relatively more often in the context of stereotypical female topics such as family, home, and physical appearance. While the prevalence of gender stereotypes related to stereotypical male topics diminishes over time, gender stereotypes related to female topics mostly persist.
Keywords:	gender bias, gender stereotypes, natural language processing, machine learning, user-generated content, word embeddings
JEL:	C55 J16
Date:	2023
URL:	http://d.repec.org/n?u=RePEc:ces:ceswps:_10578&r=big

Optimal Markowitz Portfolio Using Returns Forecasted with Time Series and Machine Learning Models

By:	Damian Ślusarczyk (University of Warsaw, Faculty of Economic Sciences); Robert Ślepaczuk (University of Warsaw, Quantitative Finance Research Group, Department of Quantitative Finance, Faculty of Economic Sciences)
Abstract:	We aim to answer the question of whether using forecasted stock returns based on machine learning and time series models in a mean-variance portfolio framework yields better results than relying on historical returns. Although the problem of efficient stock selection has been studied for more than 50 years, the issue of adequately constructing a mean-variance portfolio framework and incorporating return forecasts into it has not yet been solved. Stock return portfolios were created using ’raw’ historical returns and forecasted returns based on ARIMA-GARCH and XGBoost models. Two optimization problems were considered: global maximum information ratio and global minimum variance. The strategies were then compared with two benchmarks – an equally weighted portfolio and buy and hold on the DJIA index. Strategies were tested on Dow Jones Industrial Average stocks in the period from 2007-01-01 to 2022-12-31 using daily data. The main portfolio performance metrics were information ratio* and information ratio**. The results showed that using forecasted returns can enhance portfolio selection based on the Markowitz framework, but it is not a universal solution, and all the parameters and hyperparameters of the selected models have to be controlled.
Keywords:	Algorithmic Investment Strategies, Markowitz framework, portfolio optimization, forecasting, ARIMA, GARCH, XGBoost, minimum variance
JEL:	C4 C14 C45 C53 C58 G13
Date:	2023
URL:	http://d.repec.org/n?u=RePEc:war:wpaper:2023-17&r=big
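
One of the two optimization problems named in the abstract above is the global minimum variance portfolio. In its unconstrained textbook form the weights are proportional to the inverse covariance matrix applied to a vector of ones, w = Σ⁻¹1 / (1ᵀΣ⁻¹1). The sketch below implements that closed form in plain Python. It is an assumption that this matches the paper's setup, which may add constraints such as no short selling; the function names and the covariance matrix are hypothetical.

```python
def solve_linear(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [b] for row, b in zip(matrix, rhs)]  # augmented copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        known = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - known) / aug[i][i]
    return x


def min_variance_weights(cov):
    """Unconstrained global minimum variance weights; short positions are allowed."""
    raw = solve_linear(cov, [1.0] * len(cov))  # inverse covariance times ones
    total = sum(raw)
    return [value / total for value in raw]    # normalise so weights sum to 1


# Hypothetical covariance matrix of three assets' returns.
cov = [
    [0.040, 0.006, 0.004],
    [0.006, 0.090, 0.010],
    [0.004, 0.010, 0.160],
]
print([round(w, 3) for w in min_variance_weights(cov)])
```

Only the covariance matrix enters this problem, so return forecasts matter for the maximum information ratio portfolio rather than for the minimum variance one, unless the forecasts are also used to estimate the covariances.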

Nighttime Light Pollution and Economic Activities: A Spatio-Temporal Model with Common Factors for US Counties

By:	Bresson, Georges (University of Paris 2); Etienne, Jean-Michel (Université Paris-Sud); Lacroix, Guy (Université Laval)
Abstract:	Excessive nighttime light is known to have detrimental effects on health and on the environment (fauna and flora). The paper investigates the link between nighttime light pollution and economic growth, air pollution, and urban density. We propose a county model of consumption which accounts for spatial interactions. The model naturally leads to a dynamic general nesting spatial model with unknown common factors. The model is estimated with data for 3071 continental US counties from 2012–2019 using a quasi-maximum likelihood estimator. Short run and long run county marginal effects emphasize the importance of spillover effects on radiance levels. Counties with high levels of radiance are less sensitive to additional growth than low-level counties. This has implications for policies that have been proposed to curtail nighttime light pollution.
Keywords:	nighttime light pollution, air pollution, GDP, satellite data, space-time panel data model
JEL:	C23 Q53
Date:	2023–07
URL:	http://d.repec.org/n?u=RePEc:iza:izadps:dp16342&r=big

Nowcasting world trade with machine learning: a three-step approach

By:	Chinn, Menzie D.; Meunier, Baptiste; Stumpner, Sebastian
Abstract:	We nowcast world trade using machine learning, distinguishing between tree-based methods (random forest, gradient boosting) and their regression-based counterparts (macroeconomic random forest, linear gradient boosting). While much less used in the literature, the latter are found to outperform not only the tree-based techniques, but also more “traditional” linear and non-linear techniques (OLS, Markov-switching, quantile regression). They do so significantly and consistently across different horizons and real-time datasets. To further improve performances when forecasting with machine learning, we propose a flexible three-step approach composed of (step 1) pre-selection, (step 2) factor extraction and (step 3) machine learning regression. We find that both pre-selection and factor extraction significantly improve the accuracy of machine-learning-based predictions. This three-step approach also outperforms workhorse benchmarks, such as a PCA-OLS model, an elastic net, or a dynamic factor model. Finally, on top of high accuracy, the approach is flexible and can be extended seamlessly beyond world trade.
Keywords:	big data, factor model, forecasting, large dataset, pre-selection
JEL:	C53 C55 E37
Date:	2023–08
URL:	http://d.repec.org/n?u=RePEc:ecb:ecbwps:20232836&r=big

Contrasting the efficiency of stock price prediction models using various types of LSTM models aided with sentiment analysis

By:	Varun Sangwan; Vishesh Kumar Singh; Bibin Christopher V
Abstract:	Our research aims to find the best model that uses companies' projections and sector performances, and how the given company fares accordingly, to correctly predict equity share prices for both short- and long-term goals.
Date:	2023–07
URL:	http://d.repec.org/n?u=RePEc:arx:papers:2307.07868&r=big

FinGPT: Democratizing Internet-scale Data for Financial Large Language Models

By:	Xiao-Yang Liu; Guoxuan Wang; Daochen Zha
Abstract:	Large language models (LLMs) have demonstrated remarkable proficiency in understanding and generating human-like texts, which may potentially revolutionize the finance industry. However, existing LLMs often fall short in the financial field, which is mainly attributed to the disparities between general text data and financial text data. Unfortunately, there is only a limited number of financial text datasets available, all of quite small size, and BloombergGPT, the first financial LLM (FinLLM), is closed-source (only the training logs were released). In light of this, we aim to democratize Internet-scale financial data for LLMs, which is an open challenge due to diverse data sources, low signal-to-noise ratio, and high time-validity. To address the challenges, we introduce an open-sourced and data-centric framework, Financial Generative Pre-trained Transformer (FinGPT), that automates the collection and curation of real-time financial data from >34 diverse sources on the Internet, providing researchers and practitioners with accessible and transparent resources to develop their FinLLMs. Additionally, we propose a simple yet effective strategy for fine-tuning FinLLM using the inherent feedback from the market, dubbed Reinforcement Learning with Stock Prices (RLSP). We also adopt the Low-rank Adaptation (LoRA, QLoRA) method that enables users to customize their own FinLLMs from open-source general-purpose LLMs at a low cost. Finally, we showcase several FinGPT applications, including robo-advisor, sentiment analysis for algorithmic trading, and low-code development. FinGPT aims to democratize FinLLMs, stimulate innovation, and unlock new opportunities in open finance. The code is available at https://github.com/AI4Finance-Foundation/FinGPT and https://github.com/AI4Finance-Foundation/FinNLP
Date:	2023–07
URL:	http://d.repec.org/n?u=RePEc:arx:papers:2307.10485&r=big
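
To give a rough sense of why low-rank adaptation is cheap, the sketch below counts trainable weights for one linear layer under full fine-tuning and under a LoRA update of the form W + B·A. The layer size and rank are hypothetical values chosen for illustration; they are not taken from the FinGPT paper.

```python
def full_finetune_params(d_out: int, d_in: int) -> int:
    """Trainable weights when the whole d_out x d_in matrix is updated."""
    return d_out * d_in


def lora_params(d_out: int, d_in: int, rank: int) -> int:
    """Trainable weights for a LoRA update W + B @ A,
    with B of shape (d_out, rank) and A of shape (rank, d_in)."""
    return rank * (d_out + d_in)


# Hypothetical layer size and rank, chosen only for illustration.
d_out, d_in, rank = 4096, 4096, 8
full = full_finetune_params(d_out, d_in)
lora = lora_params(d_out, d_in, rank)
print(f"full: {full:,}  LoRA: {lora:,}  reduction: {full // lora}x")
```

The frozen base weights still have to be held in memory; the saving shown here concerns only the parameters that receive gradient updates.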

Deep Reinforcement Learning for Robust Goal-Based Wealth Management

By:	Tessa Bauman; Bruno Gašperov; Stjepan Begušić; Zvonko Kostanjčar
Abstract:	Goal-based investing is an approach to wealth management that prioritizes achieving specific financial goals. It is naturally formulated as a sequential decision-making problem as it requires choosing the appropriate investment until a goal is achieved. Consequently, reinforcement learning, a machine learning technique appropriate for sequential decision-making, offers a promising path for optimizing these investment strategies. In this paper, a novel approach for robust goal-based wealth management based on deep reinforcement learning is proposed. The experimental results indicate its superiority over several goal-based wealth management benchmarks on both simulated and historical market data.
Date:	2023–07
URL:	http://d.repec.org/n?u=RePEc:arx:papers:2307.13501&r=big

Machine Learning-powered Pricing of the Multidimensional Passport Option

By:	Josef Teichmann; Hanna Wutte
Abstract:	Introduced in the late 90s, the passport option gives its holder the right to trade in a market and receive any positive gain in the resulting traded account at maturity. Pricing the option amounts to solving a stochastic control problem that for $d>1$ risky assets remains an open problem. Even in a correlated Black-Scholes (BS) market with $d=2$ risky assets, no optimal trading strategy has been derived in closed form. In this paper, we derive a discrete-time solution for multi-dimensional BS markets with uncorrelated assets. Moreover, inspired by the success of deep reinforcement learning in, e.g., board games, we propose two machine learning-powered approaches to pricing general options on a portfolio value in general markets. These approaches prove to be successful for pricing the passport option in one-dimensional and multi-dimensional uncorrelated BS markets.
Date:	2023–07
URL:	http://d.repec.org/n?u=RePEc:arx:papers:2307.14887&r=big
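
The payoff described in the abstract, the positive part of the traded account at maturity, can be illustrated with a small Monte Carlo sketch. This is not the authors' method: it only evaluates one fixed, user-supplied strategy, so the estimate should be read as a lower bound on the option value rather than a price. The single asset, zero interest rate, the position limit |q| <= 1, and all parameter values are assumptions made for illustration.

```python
import math
import random


def passport_payoff_mc(strategy, s0=100.0, sigma=0.2, maturity=1.0,
                       n_steps=50, n_paths=2000, seed=0):
    """Monte Carlo estimate of E[max(X_T, 0)] for a given trading strategy.

    Assumptions (not from the abstract): one risky asset, zero interest
    rate, geometric Brownian motion under the pricing measure, and a
    position limit |q| <= 1. `strategy(t, s, x)` returns the position q.
    """
    rng = random.Random(seed)
    dt = maturity / n_steps
    total = 0.0
    for _ in range(n_paths):
        s, x = s0, 0.0  # asset price and traded-account value
        for step in range(n_steps):
            q = max(-1.0, min(1.0, strategy(step * dt, s, x)))
            z = rng.gauss(0.0, 1.0)
            s_next = s * math.exp(-0.5 * sigma**2 * dt + sigma * math.sqrt(dt) * z)
            x += q * (s_next - s)  # gain or loss on the held position
            s = s_next
        total += max(x, 0.0)  # holder keeps gains, losses are written off
    return total / n_paths


def always_long(t, s, x):
    return 1.0


def contrarian(t, s, x):
    # Hypothetical rule: go short when the account is ahead, long otherwise.
    return -1.0 if x > 0 else 1.0


print("always long:", passport_payoff_mc(always_long))
print("contrarian: ", passport_payoff_mc(contrarian))
```

With the always-long strategy the account equals the price change of the asset, so the estimate approximates an at-the-money call value under the same assumptions, which offers a quick sanity check.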

Encoding Urban Trajectory as a Language: Deep Learning Insights for Human Mobility Pattern

By:	Park, Youngjun; Han, Sumin
Abstract:	Rapid advancements in deep learning technology have shown great promise in helping us better understand the spatio-temporal characteristics of human mobility in urban areas. There exist two main approaches to spatial deep learning models for urban space - a convolutional neural network (CNN) which originated from visual data like satellite images, and a graph convolutional network (GCN) which is based on urban topologies such as road networks and regional boundaries. However, compared to language-based models that have recently achieved notable success, deep learning models for urban space still need further development. In this study, we propose a novel approach that addresses the trajectories of a trip as sentences of a language and adapts techniques like word embedding from natural language processing to gain insights into human mobility patterns in urban areas. Our approach involves processing sequences of spatial units that are generated by a human agent's trajectory, treating them as akin to word sequences in a language. Specifically, we represent individual trajectories as sequences of spatial vector units using 50×50 meter grid cells to divide the urban area. This representation captures the spatio-temporal changes of the trip, and enables us to employ natural language processing techniques, such as word embeddings and attention mechanisms, to analyze the urban trajectory sequences. Additionally, we leverage word embedding models from language processing to acquire compressed representations of the trajectory. These compressed representations contain richer information about the features, while minimizing the computational load.
Date:	2023–06–17
URL:	http://d.repec.org/n?u=RePEc:osf:osfxxx:guf3z&r=big
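
The idea of reading a trip as a sentence can be sketched as follows: each point is mapped to the 50×50 meter grid cell that contains it, and the ordered cell identifiers play the role of words. The token format, the collapsing of consecutive repeats, and the sample coordinates are assumptions made for illustration and are not taken from the paper; the input is assumed to be in projected coordinates measured in meters.

```python
from itertools import groupby

CELL_SIZE_M = 50.0  # grid resolution stated in the abstract


def cell_token(x_m: float, y_m: float, cell_size: float = CELL_SIZE_M) -> str:
    """Map a point in projected meter coordinates to a grid-cell token."""
    col = int(x_m // cell_size)
    row = int(y_m // cell_size)
    return f"c{col}_{row}"


def trajectory_to_sentence(points):
    """Turn a sequence of (x, y) points into a list of cell tokens,
    collapsing consecutive repeats so each cell visit appears once."""
    tokens = (cell_token(x, y) for x, y in points)
    return [tok for tok, _ in groupby(tokens)]


# Hypothetical trip, coordinates in meters.
trip = [(10.0, 20.0), (40.0, 45.0), (60.0, 45.0), (120.0, 45.0), (120.0, 80.0)]
print(trajectory_to_sentence(trip))
# ['c0_0', 'c1_0', 'c2_0', 'c2_1']
```

Token lists of this kind could then be passed to a word embedding model in the same way as tokenized text.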

VolTS: A Volatility-based Trading System to forecast Stock Markets Trend using Statistics and Machine Learning

By:	Ivan Letteri
Abstract:	Volatility-based trading strategies have attracted a lot of attention in financial markets due to their ability to capture opportunities for profit from market dynamics. In this article, we propose a new volatility-based trading strategy that combines statistical analysis with machine learning techniques to forecast stock market trends. The method consists of several steps, including data exploration, correlation and autocorrelation analysis, technical indicator use, application of hypothesis tests and statistical models, and use of variable selection algorithms. In particular, we use the k-means++ clustering algorithm to group the mean volatility of the nine largest stocks in the NYSE and NasdaqGS markets. The resulting clusters are the basis for identifying relationships between stocks based on their volatility behaviour. Next, we use the Granger Causality Test on the clustered dataset with mid-volatility to determine the predictive power of a stock over another stock. By identifying stocks with strong predictive relationships, we establish a trading strategy in which the stock acting as a reliable predictor becomes a trend indicator to determine the buy, sell, and hold of target stock trades. Through extensive backtesting and performance evaluation, we find that our volatility-based trading strategy is reliable and robust. The results suggest that our approach effectively captures profitable trading opportunities by leveraging the predictive power of volatility clusters and Granger causality relationships between stocks. The proposed strategy offers valuable insights and practical implications to investors and market participants who seek to improve their trading decisions and capitalize on market trends.
Date:	2023–07
URL:	http://d.repec.org/n?u=RePEc:arx:papers:2307.13422&r=big
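
The clustering step mentioned in the abstract can be sketched with a small one-dimensional k-means++ routine. This is a minimal illustration and not the author's implementation: the ticker names, the volatility values, and the choice of three clusters (low, mid, high) are hypothetical, and the Granger causality step is not covered.

```python
import random


def kmeans_pp_1d(values, k, n_iter=100, seed=0):
    """Cluster scalar values with k-means++ seeding followed by Lloyd updates.

    Returns (centers, labels). Assumes at least k distinct values.
    """
    rng = random.Random(seed)
    # k-means++ seeding: each new center is drawn with probability
    # proportional to the squared distance to the nearest chosen center.
    centers = [rng.choice(values)]
    while len(centers) < k:
        d2 = [min((v - c) ** 2 for c in centers) for v in values]
        centers.append(rng.choices(values, weights=d2, k=1)[0])

    def assign(cs):
        return [min(range(len(cs)), key=lambda j: (v - cs[j]) ** 2) for v in values]

    labels = assign(centers)
    for _ in range(n_iter):
        for j in range(k):
            members = [v for v, lab in zip(values, labels) if lab == j]
            if members:  # keep the old center if a cluster empties
                centers[j] = sum(members) / len(members)
        new_labels = assign(centers)
        if new_labels == labels:
            break
        labels = new_labels
    return centers, labels


# Hypothetical mean volatilities for nine placeholder tickers.
mean_vol = {"S1": 0.10, "S2": 0.11, "S3": 0.12, "S4": 0.30, "S5": 0.31,
            "S6": 0.32, "S7": 0.60, "S8": 0.61, "S9": 0.62}
centers, labels = kmeans_pp_1d(list(mean_vol.values()), k=3)
for ticker, label in zip(mean_vol, labels):
    print(ticker, "-> cluster", label)
```

Like any k-means variant, the result depends on the random seeding, so running with several seeds and keeping the lowest within-cluster variance is a common precaution.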

This nep-big issue is ©2023 by Tom Coupé. It is provided as is without any express or implied warranty. It may be freely redistributed in whole or in part for any purpose. If distributed in part, please include this notice.

General information on the NEP project can be found at http://nep.repec.org. For comments please write to the director of NEP, Marco Novarese at <director@nep.repec.org>. Put “NEP” in the subject, otherwise your mail may be rejected.

NEP’s infrastructure is sponsored by the School of Economics and Finance of Massey University in New Zealand.