nep-big New Economics Papers
on Big Data
Issue of 2024‒05‒13
25 papers chosen by
Tom Coupé, University of Canterbury


  1. Sentiment trading with large language models By Kirtac, Kemal; Germano, Guido
  2. For What It's Worth: Measuring Land Value in the Era of Big Data and Machine Learning By Scott Wentland; Gary Cornwall; Jeremy G. Moulton
  3. QFNN-FFD: Quantum Federated Neural Network for Financial Fraud Detection By Nouhaila Innan; Alberto Marchisio; Muhammad Shafique; Mohamed Bennai
  4. Harnessing Satellite Data to Improve Social Assistance Targeting in the Eastern Caribbean By Sophia Chen; Ryu Matsuura; Flavien Moreau; Ms. Joana Pereira
  5. Neural Network Modeling for Forecasting Tourism Demand in Stopića Cave: A Serbian Cave Tourism Study By Buda Bajić; Srđan Milićević; Aleksandar Antić; Slobodan Marković; Nemanja Tomić
  6. Machine learning-based similarity measure to forecast M&A from patent data By Giambattista Albora; Matteo Straccamore; Andrea Zaccaria
  7. RiskLabs: Predicting Financial Risk Using Large Language Model Based on Multi-Sources Data By Yupeng Cao; Zhi Chen; Qingyun Pei; Fabrizio Dimino; Lorenzo Ausiello; Prashant Kumar; K. P. Subbalakshmi; Papa Momar Ndiaye
  8. Early warning systems for financial markets of emerging economies By Artem Kraevskiy; Artem Prokhorov; Evgeniy Sokolovskiy
  9. Pre-publication revisions of bank financial statements: a novel way to monitor banks? By Andre Guettler; Mahvish Naeem; Lars Norden; Bernardus F Nazar Van Doornik
  10. Artificial Intelligence-based Analysis of Change in Public Finance between US and International Markets By Kapil Panda
  11. A backward differential deep learning-based algorithm for solving high-dimensional nonlinear backward stochastic differential equations By Lorenc Kapllani; Long Teng
  12. From Predictive Algorithms to Automatic Generation of Anomalies By Sendhil Mullainathan; Ashesh Rambachan
  13. Quantifying Trade from Renaissance Merchant Letters By Fabio Gatti
  14. Cultural and Creative Employment Across Italian Regions By Leogrande, Angelo
  15. DeepTraderX: Challenging Conventional Trading Strategies with Deep Learning in Multi-Threaded Market Simulations By Armand Mihai Cismaru
  16. Skewed signals? Confronting biases in Online Job Ads data By FERNANDEZ MACIAS Enrique; SOSTERO Matteo
  17. Predicting Mergers and Acquisitions in Competitive Industries: A Model Based on Temporal Dynamics and Industry Networks By Dayu Yang
  18. Stress index strategy enhanced with financial news sentiment analysis for the equity markets By Baptiste Lefort; Eric Benhamou; Jean-Jacques Ohana; David Saltiel; Beatrice Guez; Thomas Jacquot
  19. Bayesian Bi-level Sparse Group Regressions for Macroeconomic Forecasting By Matteo Mogliani; Anna Simoni
  20. Ethnic Inequality and Economic Growth: Evidence from Harmonized Satellite Data By Klaus Gründler; Andreas Link
  21. StockGPT: A GenAI Model for Stock Prediction and Trading By Dat Mai
  22. ChatGPT Can Predict the Future when it Tells Stories Set in the Future About the Past By Van Pham; Scott Cunningham
  23. Developing An Attention-Based Ensemble Learning Framework for Financial Portfolio Optimisation By Zhenglong Li; Vincent Tam
  24. Algorithmic Collusion by Large Language Models By Sara Fish; Yannai A. Gonczarowski; Ran I. Shorrer
  25. Tracking Real Time Layoffs with SEC Filings: A Preliminary Investigation By Leland D. Crane; Emily Green; Molly Harnish; Will McClennan; Paul E. Soto; Betsy Vrankovich; Jacob Williams

  1. By: Kirtac, Kemal; Germano, Guido
    Abstract: We analyse the performance of the large language models (LLMs) OPT, BERT, and FinBERT, alongside the traditional Loughran-McDonald dictionary, in the sentiment analysis of 965,375 U.S. financial news articles from 2010 to 2023. Our findings reveal that the GPT-3-based OPT model significantly outperforms the others, predicting stock market returns with an accuracy of 74.4%. A long-short strategy based on OPT, accounting for 10 basis points (bps) in transaction costs, yields an exceptional Sharpe ratio of 3.05. From August 2021 to July 2023, this strategy produces an impressive 355% gain, outperforming other strategies and traditional market portfolios. This underscores the transformative potential of LLMs in financial market prediction and portfolio management and the necessity of employing sophisticated language models to develop effective investment strategies based on news sentiment.
    Keywords: artificial intelligence investment strategies; generative pre-trained transformer (GPT); large language models; machine learning in stock return prediction; natural language processing (NLP)
    JEL: C53 G10 G11 G12 G14
    Date: 2024–04–01
    URL: http://d.repec.org/n?u=RePEc:ehl:lserod:122592&r=big
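For readers who want to see what a sentiment-driven long-short construction of this kind looks like in practice, here is a minimal sketch (not the authors' code): it assumes a table of per-stock daily sentiment scores in [-1, 1] and next-day returns, goes long the top decile of sentiment and short the bottom decile, and nets out a 10 bps cost per leg. The column names, decile cutoffs and cost treatment are illustrative assumptions.

```python
import numpy as np
import pandas as pd

def long_short_from_sentiment(df: pd.DataFrame, cost_bps: float = 10.0) -> pd.Series:
    """Equal-weighted decile long-short portfolio built from daily sentiment scores.

    Expects columns 'date', 'ticker', 'sentiment' (score in [-1, 1]) and
    'next_ret' (next-day return). All names are illustrative assumptions.
    """
    daily_pnl = {}
    for date, day in df.groupby("date"):
        hi = day["sentiment"].quantile(0.9)
        lo = day["sentiment"].quantile(0.1)
        longs = day.loc[day["sentiment"] >= hi, "next_ret"]
        shorts = day.loc[day["sentiment"] <= lo, "next_ret"]
        if longs.empty or shorts.empty:
            continue
        gross = longs.mean() - shorts.mean()
        daily_pnl[date] = gross - 2 * cost_bps / 1e4  # cost charged on both legs
    return pd.Series(daily_pnl).sort_index()

# Toy example with synthetic data (real inputs would be model sentiment scores):
rng = np.random.default_rng(0)
toy = pd.DataFrame({
    "date": np.repeat(pd.date_range("2022-01-03", periods=5, freq="B"), 50),
    "ticker": np.tile([f"S{i}" for i in range(50)], 5),
    "sentiment": rng.uniform(-1, 1, 250),
    "next_ret": rng.normal(0, 0.02, 250),
})
pnl = long_short_from_sentiment(toy)
print("Annualised Sharpe on toy data:", round(np.sqrt(252) * pnl.mean() / pnl.std(), 2))
```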
  2. By: Scott Wentland; Gary Cornwall; Jeremy G. Moulton
    Abstract: This paper develops a new method for valuing land, a key asset on a nation’s balance sheet. The method first employs an unsupervised machine learning method, k-means clustering, to discretize unobserved heterogeneity, which we then combine with a supervised learning algorithm, gradient boosted trees (GBT), to obtain property-level price predictions and estimates of the land component. Our initial results from a large national dataset show this approach routinely outperforms hedonic regression methods (as used by the U.K.’s Office for National Statistics, for example) in out-of-sample price predictions. To exploit the best of both methods, we further explore a composite approach using model stacking, finding it outperforms all methods in out-of-sample tests and a benchmark test against nearby vacant land sales. In an application, we value residential, commercial, industrial, and agricultural land for the entire contiguous U.S. from 2006-2015. The results offer new insights into valuation and demonstrate how a unified method can build national and subnational estimates of land value from detailed, parcel-level data. We discuss further applications to economic policy and the property valuation literature more generally.
    JEL: E01
    Date: 2023–06
    URL: http://d.repec.org/n?u=RePEc:bea:papers:0115&r=big
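The two-stage design described above (unsupervised clustering to discretize unobserved heterogeneity, then a supervised learner on the augmented features) can be sketched with scikit-learn roughly as follows. This is an illustrative pipeline on synthetic parcel-level data, not the paper's implementation; the features, cluster count and model settings are assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)

# Synthetic parcel-level data: two location coordinates, structure size, lot size.
n = 5000
X = np.column_stack([
    rng.uniform(0, 10, n),       # longitude-like coordinate
    rng.uniform(0, 10, n),       # latitude-like coordinate
    rng.lognormal(7, 0.4, n),    # structure square footage
    rng.lognormal(8, 0.6, n),    # lot square footage
])
price = (50_000 + 120 * X[:, 2] + 5 * X[:, 3]
         + 10_000 * np.sin(X[:, 0]) + rng.normal(0, 20_000, n))

X_tr, X_te, y_tr, y_te = train_test_split(X, price, random_state=0)

# Stage 1: k-means on the location coordinates discretizes spatial heterogeneity.
km = KMeans(n_clusters=20, n_init=10, random_state=0).fit(X_tr[:, :2])
X_tr_aug = np.column_stack([X_tr, km.predict(X_tr[:, :2])])
X_te_aug = np.column_stack([X_te, km.predict(X_te[:, :2])])

# Stage 2: gradient boosted trees on the original features plus the cluster label.
gbt = GradientBoostingRegressor(random_state=0).fit(X_tr_aug, y_tr)
print("Out-of-sample R^2:", round(gbt.score(X_te_aug, y_te), 3))
```

A stacked variant, as explored in the paper, would additionally fit a meta-learner on out-of-fold predictions from the hedonic regression and the GBT rather than choosing one of them.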
  3. By: Nouhaila Innan; Alberto Marchisio; Muhammad Shafique; Mohamed Bennai
    Abstract: This study introduces the Quantum Federated Neural Network for Financial Fraud Detection (QFNN-FFD), a cutting-edge framework merging Quantum Machine Learning (QML) and quantum computing with Federated Learning (FL) to innovate financial fraud detection. Using quantum technologies' computational power and FL's data privacy, QFNN-FFD presents a secure, efficient method for identifying fraudulent transactions. Implementing a dual-phase training model across distributed clients surpasses existing methods in performance. QFNN-FFD significantly improves fraud detection and ensures data confidentiality, marking a significant advancement in fintech solutions and establishing a new standard for privacy-focused fraud detection.
    Date: 2024–04
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2404.02595&r=big
  4. By: Sophia Chen; Ryu Matsuura; Flavien Moreau; Ms. Joana Pereira
    Abstract: Prioritizing populations most in need of social assistance is an important policy decision. In the Eastern Caribbean, social assistance targeting is constrained by limited data and the need for rapid support in times of large economic and natural disaster shocks. We leverage recent advances in machine learning and satellite imagery processing to propose an implementable strategy in the face of these constraints. We show that local well-being can be predicted with high accuracy in the Eastern Caribbean region using satellite data and that such predictions can be used to improve targeting by reducing aggregation bias, better allocating resources across areas, and proxying for information difficult to verify.
    Keywords: Social assistance targeting; satellite data; machine learning; Eastern Caribbean; Small Island Developing States.
    Date: 2024–04–05
    URL: http://d.repec.org/n?u=RePEc:imf:imfwpa:2024/084&r=big
  5. By: Buda Bajić; Srđan Milićević; Aleksandar Antić; Slobodan Marković; Nemanja Tomić
    Abstract: For modeling the number of visits to Stopića Cave (Serbia) we consider the classical Auto-regressive Integrated Moving Average (ARIMA) model, the Machine Learning (ML) method Support Vector Regression (SVR), and the hybrid NeuralProphet method, which combines classical and ML concepts. The most accurate predictions were obtained with NeuralProphet, which includes the seasonal component and growing trend of the time series. In addition, non-linearity is modeled by a shallow Neural Network (NN), and Google Trends is incorporated as an exogenous variable. Modeling tourist demand is of great importance for management structures and decision-makers due to its applicability in establishing sustainable tourism utilization strategies in environmentally vulnerable destinations such as caves. The data provide insights into the tourist demand in Stopića Cave and preliminary data for addressing the issues of carrying capacity within the most visited cave in Serbia.
    Date: 2024–04
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2404.04974&r=big
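For the classical baseline in that comparison, a seasonal ARIMA with a Google Trends index as an exogenous regressor can be estimated in a few lines with statsmodels. The series below are synthetic, and the order, seasonality and variable names are illustrative assumptions, not the specification used in the paper.

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.statespace.sarimax import SARIMAX

# Synthetic monthly series standing in for cave visits and a Google Trends index.
idx = pd.date_range("2015-01-01", periods=96, freq="MS")
rng = np.random.default_rng(2)
trends = 50 + 20 * np.sin(2 * np.pi * idx.month / 12) + rng.normal(0, 3, 96)
visits = 1000 + 8 * trends + 30 * np.arange(96) / 12 + rng.normal(0, 60, 96)

y = pd.Series(visits, index=idx, name="visits")
exog = pd.Series(trends, index=idx, name="google_trends")

# Seasonal ARIMA(1,1,1)(1,1,1,12) with the Trends index as exogenous regressor;
# the last 12 months are held out and forecast.
model = SARIMAX(y[:-12], exog=exog[:-12], order=(1, 1, 1),
                seasonal_order=(1, 1, 1, 12))
fit = model.fit(disp=False)
forecast = fit.forecast(steps=12, exog=exog[-12:])
print(forecast.round(0))
```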
  6. By: Giambattista Albora; Matteo Straccamore; Andrea Zaccaria
    Abstract: Defining and finalizing Mergers and Acquisitions (M&A) requires complex human skills, which makes it very hard to automatically find the best partner or predict which firms will make a deal. In this work, we propose the MASS algorithm, a specifically designed measure of similarity between companies, and we apply it to patenting activity data to forecast M&A deals. MASS is based on an extreme simplification of tree-based machine learning algorithms and naturally incorporates intuitive criteria for deals; as such, it is fully interpretable and explainable. By applying MASS to the Zephyr and Crunchbase datasets, we show that it outperforms LightGCN, a "black box" graph convolutional network algorithm. When similar companies have disjoint patenting activities, by contrast, LightGCN turns out to be the most effective algorithm. This study provides a simple and powerful tool to model and predict M&A deals, offering valuable insights to managers and practitioners for informed decision-making.
    Date: 2024–04
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2404.07179&r=big
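The MASS measure itself is defined in the paper; as a point of reference, even a plain cosine similarity over firm-by-technology patent-count vectors makes the underlying idea of "similar patenting activity" concrete. The matrix below is made up for illustration, and this baseline is not the MASS algorithm.

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

# Firm-by-technology-class patent counts (rows: firms, columns: technology classes).
patent_counts = np.array([
    [12, 0, 3, 0, 5],   # firm A
    [10, 1, 4, 0, 6],   # firm B: profile similar to A
    [0, 8, 0, 9, 0],    # firm C: disjoint profile
])

sim = cosine_similarity(patent_counts)
print(np.round(sim, 2))
# High off-diagonal values (A vs B) flag candidate M&A partners; near-zero
# values (A vs C) correspond to the disjoint patenting activities for which
# the paper finds LightGCN to be more effective.
```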
  7. By: Yupeng Cao; Zhi Chen; Qingyun Pei; Fabrizio Dimino; Lorenzo Ausiello; Prashant Kumar; K. P. Subbalakshmi; Papa Momar Ndiaye
    Abstract: The integration of Artificial Intelligence (AI) techniques, particularly large language models (LLMs), in finance has garnered increasing academic attention. Despite progress, existing studies predominantly focus on tasks like financial text summarization, question-answering (Q&A), and stock movement prediction (binary classification), with a notable gap in the application of LLMs for financial risk prediction. Addressing this gap, in this paper, we introduce RiskLabs, a novel framework that leverages LLMs to analyze and predict financial risks. RiskLabs uniquely combines different types of financial data, including textual and vocal information from Earnings Conference Calls (ECCs), market-related time series data, and contextual news data surrounding ECC release dates. Our approach involves a multi-stage process: initially extracting and analyzing ECC data using LLMs, followed by gathering and processing time-series data before the ECC dates to model and understand risk over different timeframes. Using multimodal fusion techniques, RiskLabs amalgamates these varied data features for comprehensive multi-task financial risk prediction. Empirical experiment results demonstrate RiskLabs' effectiveness in forecasting both volatility and variance in financial markets. Through comparative experiments, we demonstrate how different data sources contribute to financial risk assessment and discuss the critical role of LLMs in this context. Our findings not only contribute to the application of AI in finance but also open new avenues for applying LLMs in financial risk assessment.
    Date: 2024–04
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2404.07452&r=big
  8. By: Artem Kraevskiy; Artem Prokhorov; Evgeniy Sokolovskiy
    Abstract: We develop and apply a new online early warning system (EWS) for what is known in machine learning as concept drift, in economics as a regime shift and in statistics as a change point. The system goes beyond linearity assumed in many conventional methods, and is robust to heavy tails and tail-dependence in the data, making it particularly suitable for emerging markets. The key component is an effective change-point detection mechanism for conditional entropy of the data, rather than for a particular indicator of interest. Combined with recent advances in machine learning methods for high-dimensional random forests, the mechanism is capable of finding significant shifts in information transfer between interdependent time series when traditional methods fail. We explore when this happens using simulations and we provide illustrations by applying the method to Uzbekistan's commodity and equity markets as well as to Russia's equity market in 2021-2023.
    Date: 2024–04
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2404.03319&r=big
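The key ingredient, change-point detection on the conditional entropy of the data rather than on a particular indicator, can be illustrated with a crude plug-in estimator on simulated data. The paper's estimator and detection rule are more sophisticated; the binning, window length and alarm threshold below are illustrative assumptions.

```python
import numpy as np

def _entropy(p: np.ndarray) -> float:
    p = p[p > 0]
    return float(-np.sum(p * np.log(p)))

def conditional_entropy(x: np.ndarray, y: np.ndarray, bins: int = 8) -> float:
    """Plug-in histogram estimate of H(Y | X) = H(X, Y) - H(X), in nats."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    p_xy = joint / joint.sum()
    return _entropy(p_xy.ravel()) - _entropy(p_xy.sum(axis=1))

# Two heavy-tailed, interdependent series whose coupling breaks down at t = 1000
# (the marginals are unchanged; only the information transfer stops).
rng = np.random.default_rng(3)
n, window = 1500, 250
x = rng.standard_t(df=4, size=n)
y = x + 0.3 * rng.standard_t(df=4, size=n)
y[1000:] = rng.permutation(y[1000:])

ce = np.array([conditional_entropy(x[t - window:t], y[t - window:t])
               for t in range(window, n + 1)])

# Alarm when the rolling estimate leaves a band calibrated on the early sample.
calib = ce[:200]
alarms = np.nonzero(ce > calib.mean() + 4 * calib.std())[0]
print("first alarm near observation", window + alarms[0] if alarms.size else "none")
```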
  9. By: Andre Guettler; Mahvish Naeem; Lars Norden; Bernardus F Nazar Van Doornik
    Abstract: We investigate whether pre-publication revisions of bank financial statements contain forward-looking information about bank risk. Using 7.4 million observations of monthly financial reports from all banks in Brazil during 2007-2019, we show that 78% of all revisions occur before the publication of these statements. The frequency and severity of revisions, as well as missed reporting deadlines, are positively related to future bank risk. Using machine learning techniques, we provide evidence on mechanisms through which revisions affect bank risk. Our findings suggest that private information about pre-publication revisions is useful for supervisors to monitor banks.
    Keywords: banks, bank performance, regulatory reporting quality, regulatory oversight, machine learning
    JEL: G21 G28 M41
    Date: 2024–03
    URL: http://d.repec.org/n?u=RePEc:bis:biswps:1177&r=big
  10. By: Kapil Panda
    Abstract: Public finances are one of the fundamental mechanisms of economic governance, referring to the financial activities and decisions made by government entities to fund public services, projects, and operations through assets. In today's globalized landscape, even subtle shifts in one nation's public debt landscape can have significant impacts on international finances, necessitating a nuanced understanding of the correlations between international and national markets to help investors make informed investment decisions. Therefore, by leveraging the capabilities of artificial intelligence, this study utilizes neural networks to depict the correlations between US and international public finances and to predict changes in international public finances based on changes in US public finances. With the neural network model achieving a commendable Mean Squared Error (MSE) value of 2.79, it is able to affirm a discernible correlation and also plot the effect of US market volatility on international markets. To further test the accuracy and significance of the model, an economic analysis was conducted that aimed to correlate the changes seen in the model's results with historical stock market changes. This model demonstrates significant potential for investors to predict changes in international public finances based on signals from US markets, marking a significant stride in comprehending the intricacies of global public finances and the role of artificial intelligence in decoding its multifaceted patterns for practical forecasting.
    Date: 2023–12
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2403.18823&r=big
  11. By: Lorenc Kapllani; Long Teng
    Abstract: In this work, we propose a novel backward differential deep learning-based algorithm for solving high-dimensional nonlinear backward stochastic differential equations (BSDEs), where the deep neural network (DNN) models are trained not only on the inputs and labels but also on the differentials of the corresponding labels. This is motivated by the fact that differential deep learning can provide an efficient approximation of the labels and their derivatives with respect to inputs. The BSDEs are reformulated as differential deep learning problems by using Malliavin calculus. The Malliavin derivatives of the solution to a BSDE themselves satisfy another BSDE, thus resulting in a system of BSDEs. Such a formulation requires the estimation of the solution, its gradient, and the Hessian matrix, represented by the triple of processes $\left(Y, Z, \Gamma\right).$ All the integrals within this system are discretized by using the Euler-Maruyama method. Subsequently, DNNs are employed to approximate the triple of these unknown processes. The DNN parameters are backwardly optimized at each time step by minimizing a differential learning type loss function, which is defined as a weighted sum of the dynamics of the discretized BSDE system, with the first term providing the dynamics of the process $Y$ and the other the process $Z$. An error analysis is carried out to show the convergence of the proposed algorithm. Various numerical experiments up to $50$ dimensions are provided to demonstrate its high efficiency. Both theoretically and numerically, it is demonstrated that our proposed scheme is more efficient than other contemporary deep learning-based methodologies, especially in the computation of the process $\Gamma$.
    Date: 2024–04
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2404.08456&r=big
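For orientation, the generic form of a backward stochastic differential equation and its Euler-Maruyama discretization, in standard textbook notation rather than necessarily the exact scheme of the paper, are:

```latex
\begin{aligned}
Y_t &= g(X_T) + \int_t^T f\!\left(s, X_s, Y_s, Z_s\right)\,\mathrm{d}s
       - \int_t^T Z_s\,\mathrm{d}W_s, \qquad 0 \le t \le T,\\
Y_{t_n} &\approx Y_{t_{n+1}} + f\!\left(t_n, X_{t_n}, Y_{t_n}, Z_{t_n}\right)\Delta t
       - Z_{t_n}\,\Delta W_n .
\end{aligned}
```

In the scheme described in the abstract, neural networks approximate the triple $(Y, Z, \Gamma)$ at each time step, and the loss combines the discretized dynamics of the original BSDE (the $Y$ term) with those of the Malliavin-derivative BSDE (the $Z$ term).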
  12. By: Sendhil Mullainathan; Ashesh Rambachan
    Abstract: Machine learning algorithms can find predictive signals that researchers fail to notice; yet they are notoriously hard to interpret. How can we extract theoretical insights from these black boxes? History provides a clue. Facing a similar problem -- how to extract theoretical insights from their intuitions -- researchers often turned to "anomalies": constructed examples that highlight flaws in an existing theory and spur the development of new ones. Canonical examples include the Allais paradox and the Kahneman-Tversky choice experiments for expected utility theory. We suggest anomalies can extract theoretical insights from black box predictive algorithms. We develop procedures to automatically generate anomalies for an existing theory when given a predictive algorithm. We cast anomaly generation as an adversarial game between a theory and a falsifier, the solutions to which are anomalies: instances where the black box algorithm predicts that, were we to collect data, we would likely observe violations of the theory. As an illustration, we generate anomalies for expected utility theory using a large, publicly available dataset on real lottery choices. Based on an estimated neural network that predicts lottery choices, our procedures recover known anomalies and discover new ones for expected utility theory. In incentivized experiments, subjects violate expected utility theory on these algorithmically generated anomalies; moreover, the violation rates are similar to observed rates for the Allais paradox and the common ratio effect.
    Date: 2024–04
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2404.10111&r=big
  13. By: Fabio Gatti (University of Bern)
    Abstract: Medieval and Early-Modern business correspondence between European companies constitutes a rich source of economic, business, and trade information, in that the writing of letters was the very instrument through which merchants ordered and organized the shipments of goods and performed financial operations. While a comprehensive analysis of such material enables scholars to reconstruct the supply chains and sales of various goods, as well as identify trading networks in Europe, many of the archival sources have not undergone any systematic and quantitative analysis. In this paper we develop a new holistic and quantitative approach for analysing the entire outgoing, and so far unexploited, correspondence of a major Renaissance merchant bank - the Saminiati & Guasconi company of Florence - for the first years of its activity. After digitization of the letters, we employ an AI-based HTR model on the Transkribus platform and perform an automated text analysis of the HTR model's output. For each letter (6,376 epistles) this results in the identification of the addressee (446 merchants), their place of residence (65 towns), and the traded goods (27 main goods). The approach developed arguably provides a best-practice methodology for the quantitative treatment of medieval and early-modern merchant letters and the use of the derived historical text as data.
    Keywords: HTR, Machine Learning, Text Analysis, Merchant Letters
    JEL: N00 N01 C80 C88 C89
    Date: 2024–04
    URL: http://d.repec.org/n?u=RePEc:hes:wpaper:0258&r=big
  14. By: Leogrande, Angelo
    Abstract: In the following article I analyze the trend of cultural and creative employment in the Italian regions between 2004 and 2022 using ISTAT-BES data. After presenting a static analysis, I also present the results of a clustering analysis aimed at identifying groupings among Italian regions. Subsequently, an econometric model is proposed for estimating the value of cultural and creative employment in the Italian regions. Finally, I compare various machine learning models for predicting the value of cultural and creative employment. The results are critically discussed through an economic policy analysis.
    Keywords: Innovation, Innovation and Invention, Management of Technological Innovation and R&D, Technological Change, Intellectual Property and Intellectual Capital
    JEL: O30 O31 O32 O33 O34
    Date: 2024–02
    URL: http://d.repec.org/n?u=RePEc:pra:mprapa:120603&r=big
  15. By: Armand Mihai Cismaru
    Abstract: In this paper, we introduce DeepTraderX (DTX), a simple Deep Learning-based trader, and present results that demonstrate its performance in a multi-threaded market simulation. In a total of about 500 simulated market days, DTX has learned solely by watching the prices that other strategies produce. By doing this, it has successfully created a mapping from market data to quotes, either bid or ask orders, to place for an asset. Trained on historical Level-2 market data, i.e., the Limit Order Book (LOB) for specific tradable assets, DTX processes the market state $S$ at each timestep $T$ to determine a price $P$ for market orders. The market data used in both training and testing was generated from unique market schedules based on real historic stock market data. DTX was tested extensively against the best strategies in the literature, with its results validated by statistical analysis. Our findings underscore DTX's capability to rival, and in many instances, surpass, the performance of public-domain traders, including those that outclass human traders, emphasising the efficiency of simple models, as this is required to succeed in intricate multi-threaded simulations. This highlights the potential of leveraging "black-box" Deep Learning systems to create more efficient financial markets.
    Date: 2024–02
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2403.18831&r=big
  16. By: FERNANDEZ MACIAS Enrique (European Commission - JRC); SOSTERO Matteo
    Abstract: Most job vacancies in advanced economies are advertised online. With big data analytics, they can be converted into useful data for research. Data based on Online Job Ads (OJA) are a very promising source for labour market analysis and skills intelligence: rich in content, granular in detail, and available almost in real time. These data are increasingly being used, in particular to study the changing nature of skills. Key findings from OJA data include: 1) the growing importance of digital and soft skills; 2) an acceleration in the rate of change in skills demand; 3) a growing hybridisation of jobs. However, some of these findings may be driven by biases inherent in OJA data, because: 1) it tends to overrepresent high-skill occupations relative to manual ones, particularly in ICT; 2) it better covers skills that are formal and standardised, typically associated with professional occupations; 3) it suffers from social desirability bias, with positive and soft attributes being over-emphasized in the vacancy notices. OJA can provide frequent and detailed data on labour market trends and help identify emerging skills and occupations. However, it suffers from biases and cannot provide all the answers. OJA should complement, rather than replace, data from traditional surveys and administrative sources on the labour market.
    Date: 2024–01
    URL: http://d.repec.org/n?u=RePEc:ipt:iptwpa:jrc136599&r=big
  17. By: Dayu Yang
    Abstract: M&A activities are pivotal for market consolidation, enabling firms to augment market power through strategic complementarities. Existing research often overlooks the peer effect, the mutual influence of M&A behaviors among firms, and fails to capture complex interdependencies within industry networks. Common approaches suffer from reliance on ad-hoc feature engineering, data truncation leading to significant information loss, reduced predictive accuracy, and challenges in real-world application. Additionally, the rarity of M&A events necessitates data rebalancing in conventional models, introducing bias and undermining prediction reliability. We propose an innovative M&A predictive model utilizing the Temporal Dynamic Industry Network (TDIN), leveraging temporal point processes and deep learning to adeptly capture industry-wide M&A dynamics. This model facilitates accurate, detailed deal-level predictions without arbitrary data manipulation or rebalancing, demonstrated through superior evaluation results from M&A cases between January 1997 and December 2020. Our approach marks a significant improvement over traditional models by providing detailed insights into M&A activities and strategic recommendations for specific firms.
    Date: 2024–04
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2404.07298&r=big
  18. By: Baptiste Lefort; Eric Benhamou; Jean-Jacques Ohana; David Saltiel; Beatrice Guez; Thomas Jacquot
    Abstract: This paper introduces a new risk-on risk-off strategy for the stock market, which combines a financial stress indicator with a sentiment analysis performed by ChatGPT reading and interpreting Bloomberg daily market summaries. Forecasts of market stress derived from volatility and credit spreads are enhanced when combined with the financial news sentiment derived from GPT-4. As a result, the strategy shows improved performance, evidenced by a higher Sharpe ratio and reduced maximum drawdowns. The improved performance is consistent across the NASDAQ, the S&P 500 and the six major equity markets, indicating that the method generalises across equity markets.
    Date: 2024–03
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2404.00012&r=big
  19. By: Matteo Mogliani; Anna Simoni
    Abstract: We propose a Machine Learning approach for optimal macroeconomic forecasting in a high-dimensional setting with covariates presenting a known group structure. Our model encompasses forecasting settings with many series, mixed frequencies, and unknown nonlinearities. We introduce in time-series econometrics the concept of bi-level sparsity, i.e. sparsity holds at both the group level and within groups, and we assume the true model satisfies this assumption. We propose a prior that induces bi-level sparsity, and the corresponding posterior distribution is demonstrated to contract at the minimax-optimal rate, recover the model parameters, and have a support that includes the support of the model asymptotically. Our theory allows for correlation between groups, while predictors in the same group can be characterized by strong covariation as well as common characteristics and patterns. Finite sample performance is illustrated through comprehensive Monte Carlo experiments and a real-data nowcasting exercise of the US GDP growth rate.
    Date: 2024–04
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2404.02671&r=big
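As a point of reference for readers more familiar with penalized regression, the frequentist analogue of bi-level sparsity is the sparse-group lasso objective, which penalizes whole groups and individual coefficients at the same time; the paper instead works with a prior that induces this sparsity pattern and studies the contraction of the resulting posterior.

```latex
\hat{\beta} \;=\; \arg\min_{\beta \in \mathbb{R}^p}\;
\tfrac{1}{2}\,\lVert y - X\beta \rVert_2^2
\;+\; \lambda_1 \sum_{g=1}^{G} \sqrt{p_g}\,\lVert \beta_g \rVert_2
\;+\; \lambda_2 \lVert \beta \rVert_1 ,
```

where $\beta_g$ collects the $p_g$ coefficients of group $g$: the group penalty can zero out entire groups, while the $\ell_1$ penalty zeroes out individual coefficients within the surviving groups.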
  20. By: Klaus Gründler; Andreas Link
    Abstract: Inequality between ethnic groups has been shown to be negatively related to GDP, but research on its effect on contemporary economic growth is limited by the availability of comparable data. We compile a novel and comprehensive dataset of harmonized Gini indices on ethnic inequality for countries and sub-national units between 1992 and 2013. Our approach exploits differentials in nighttime lights (NTL) across ethnic homelands, using new techniques to harmonize NTL series across geographic regions and years to address concerns about spatial and temporal incomparability of satellite photographs. Our new data shows that ethnic inequality is widespread across countries but has decreased over time. Exploiting the artificiality of sub-national borders in an instrumental variable setting, we provide evidence that income inequality across ethnic groups reduces contemporary economic growth. The negative effect of ethnic inequality is caused by increasing conflict and decreasing public goods provision.
    Keywords: ethnic inequality, economic development, regional data, nighttime lights, satellite photographs, calibration, ethnic groups, conflict, public goods provision
    JEL: O10 O15 O43
    Date: 2024
    URL: http://d.repec.org/n?u=RePEc:ces:ceswps:_11034&r=big
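The basic building block of the dataset, a Gini index over the nighttime-light intensities of a country's ethnic homelands, is simple to compute once the harmonized NTL values have been extracted; a minimal unweighted version (the paper's construction handles weighting and harmonization far more carefully) is:

```python
import numpy as np

def gini(values) -> float:
    """Unweighted Gini coefficient of non-negative values (0 = perfect equality)."""
    v = np.sort(np.asarray(values, dtype=float))
    n = v.size
    # G = 2 * sum_i i * x_(i) / (n * sum_i x_i) - (n + 1) / n
    return float(2 * np.sum(np.arange(1, n + 1) * v) / (n * v.sum()) - (n + 1) / n)

# Hypothetical mean light intensity per capita for the ethnic homelands of one
# country; the numbers are made up purely for illustration.
homeland_lights = [0.8, 1.1, 0.2, 3.5, 0.4]
print(f"Ethnic Gini: {gini(homeland_lights):.3f}")
```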
  21. By: Dat Mai
    Abstract: This paper introduces StockGPT, an autoregressive "number" model pretrained directly on the history of daily U.S. stock returns. Treating each return series as a sequence of tokens, the model excels at understanding and predicting the highly intricate stock return dynamics. Instead of relying on handcrafted trading patterns using historical stock prices, StockGPT automatically learns the hidden representations predictive of future returns via its attention mechanism. On a held-out test sample from 2001 to 2023, a daily rebalanced long-short portfolio formed from StockGPT predictions earns an annual return of 119% with a Sharpe ratio of 6.5. The StockGPT-based portfolio completely explains away momentum and long-/short-term reversals, eliminating the need for manually crafted price-based strategies, and also encompasses most leading stock market factors. This highlights the immense promise of generative AI in surpassing humans in making complex financial investment decisions and illustrates the efficacy of the attention mechanism of large language models when applied to a completely different domain.
    Date: 2024–04
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2404.05101&r=big
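To make "treating each return series as a sequence of tokens" concrete, one simple possibility is to map daily returns into a small vocabulary of quantile bins before feeding them to a decoder-only model; the binning below is an illustrative assumption, not necessarily the tokenization StockGPT actually uses.

```python
import numpy as np

def tokenize_returns(returns: np.ndarray, n_bins: int = 21) -> np.ndarray:
    """Map daily returns to integer tokens in {0, ..., n_bins - 1} via quantile bins."""
    edges = np.quantile(returns, np.linspace(0, 1, n_bins + 1)[1:-1])
    return np.digitize(returns, edges)

rng = np.random.default_rng(4)
rets = rng.normal(0, 0.02, 2500)        # roughly ten years of daily returns
tokens = tokenize_returns(rets)
print(tokens[:15])
# The resulting token sequence can be fed to any autoregressive transformer,
# trained to predict the next token, i.e. the next day's return bucket.
```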
  22. By: Van Pham; Scott Cunningham
    Abstract: This study investigates whether OpenAI's ChatGPT-3.5 and ChatGPT-4 can accurately forecast future events using two distinct prompting strategies. To evaluate the accuracy of the predictions, we take advantage of the fact that the training data at the time of the experiment stopped in September 2021, and ask about events that happened in 2022 using ChatGPT-3.5 and ChatGPT-4. We employed two prompting strategies: direct prediction and what we call future narratives, which ask ChatGPT to tell fictional stories set in the future with characters that share events that have happened to them, but after ChatGPT's training data had been collected. Concentrating on events in 2022, we prompted ChatGPT to engage in storytelling, particularly within economic contexts. After analyzing 100 prompts, we discovered that future narrative prompts significantly enhanced ChatGPT-4's forecasting accuracy. This was especially evident in its predictions of major Academy Award winners as well as economic trends, the latter inferred from scenarios where the model impersonated public figures like the Federal Reserve Chair, Jerome Powell. These findings indicate that narrative prompts leverage the models' capacity for hallucinatory narrative construction, facilitating more effective data synthesis and extrapolation than straightforward predictions. Our research reveals new aspects of LLMs' predictive capabilities and suggests potential future applications in analytical contexts.
    Date: 2024–04
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2404.07396&r=big
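The two prompting strategies can be reproduced in a few lines with the OpenAI Python SDK; the prompt wording below is hypothetical and is not the authors' actual prompt text, and the comparison only makes sense against a model whose training cut-off precedes the events being asked about.

```python
from openai import OpenAI  # OpenAI Python SDK v1.x; assumes OPENAI_API_KEY is set

client = OpenAI()

# Hypothetical examples of the two strategies; not the paper's exact prompts.
direct_prompt = (
    "Who will win the Academy Award for Best Actor at the ceremony held in 2022?"
)
narrative_prompt = (
    "Write a short scene set in late 2022 in which a film critic, looking back, "
    "tells a friend who won the Academy Award for Best Actor earlier that year."
)

for label, prompt in [("direct", direct_prompt), ("future narrative", narrative_prompt)]:
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
    )
    print(f"--- {label} ---")
    print(response.choices[0].message.content)
```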
  23. By: Zhenglong Li; Vincent Tam
    Abstract: In recent years, deep or reinforcement learning approaches have been applied to optimise investment portfolios by learning the spatial and temporal information of the dynamic financial market. Yet in most cases, the existing approaches may produce biased trading signals based on conventional price data due to market noise, which can fail to balance investment returns and risks. Accordingly, a multi-agent and self-adaptive portfolio optimisation framework integrated with attention mechanisms and time series, namely MASAAT, is proposed in this work, in which multiple trading agents are created to observe and analyse the price series and directional-change data that recognise significant changes of asset prices at different levels of granularity, enhancing the signal-to-noise ratio of the price series. Afterwards, by reconstructing the tokens of financial data in a sequence, the attention-based cross-sectional analysis module and temporal analysis module of each agent can effectively capture the correlations between assets and the dependencies between time points. Besides, a portfolio generator is integrated into the proposed framework to fuse the spatial-temporal information and then summarise the portfolios suggested by all trading agents to produce a new ensemble portfolio, reducing biased trading actions and balancing the overall returns and risks. The experimental results clearly demonstrate that the MASAAT framework achieves impressive enhancement when compared with many well-known portfolio optimisation approaches on three challenging data sets of DJIA, S&P 500 and CSI 300. More importantly, our proposal has potential strengths in many possible applications for future study.
    Date: 2024–04
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2404.08935&r=big
  24. By: Sara Fish; Yannai A. Gonczarowski; Ran I. Shorrer
    Abstract: The rise of algorithmic pricing raises concerns of algorithmic collusion. We conduct experiments with algorithmic pricing agents based on Large Language Models (LLMs), and specifically GPT-4. We find that (1) LLM-based agents are adept at pricing tasks, (2) LLM-based pricing agents autonomously collude in oligopoly settings to the detriment of consumers, and (3) variation in seemingly innocuous phrases in LLM instructions ("prompts") may increase collusion. These results extend to auction settings. Our findings underscore the need for antitrust regulation regarding algorithmic pricing, and uncover regulatory challenges unique to LLM-based pricing agents.
    Date: 2024–03
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2404.00806&r=big
  25. By: Leland D. Crane; Emily Green; Molly Harnish; Will McClennan; Paul E. Soto; Betsy Vrankovich; Jacob Williams
    Abstract: We explore a new source of data on layoffs: timely 8-K filings with the Securities and Exchange Commission. We develop measures of both the number of reported layoff events and the number of affected workers. These series are highly correlated with the business cycle and other layoff indicators. Linking firm-level reported layoff events with WARN notices suggests that 8-K filings are sometimes available before WARN notices, and preliminary regression results suggest our layoff series are useful for forecasting. We also document the industry composition of the data and specific areas where the industry shares diverge.
    Keywords: Forecasting; Labor markets; Large language models; Alternative data; Natural language processing; Layoffs
    JEL: E24 G28 J21 M51 C81
    Date: 2024–04–11
    URL: http://d.repec.org/n?u=RePEc:fip:fedgfe:2024-20&r=big
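As a first-pass illustration of turning 8-K text into a layoff indicator, a keyword screen over an already-downloaded filing is shown below; the patterns and example text are made up, and the paper's measures are more refined (its keywords mention large language models and natural language processing).

```python
import re

# Crude screen for layoff language in the body text of an 8-K filing.
LAYOFF_PATTERNS = [
    r"\breductions? in force\b",
    r"\bworkforce reductions?\b",
    r"\blay[- ]?offs?\b",
    r"\bpositions? (?:will be |were )?eliminated\b",
    r"\brestructuring plan\b",
]

def flags_layoff_event(filing_text: str) -> bool:
    """Return True if the filing text matches any layoff-related pattern."""
    text = filing_text.lower()
    return any(re.search(pattern, text) for pattern in LAYOFF_PATTERNS)

sample = ("On April 1, the Company announced a restructuring plan under which "
          "approximately 500 positions will be eliminated.")
print(flags_layoff_event(sample))  # True
```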

This nep-big issue is ©2024 by Tom Coupé. It is provided as is without any express or implied warranty. It may be freely redistributed in whole or in part for any purpose. If distributed in part, please include this notice.
General information on the NEP project can be found at https://nep.repec.org. For comments please write to the director of NEP, Marco Novarese at <director@nep.repec.org>. Put “NEP” in the subject, otherwise your mail may be rejected.
NEP’s infrastructure is sponsored by the School of Economics and Finance of Massey University in New Zealand.