nep-big 2024-12-16 papers

on Big Data

Issue of 2024–12–16
twenty-six papers chosen by
Tom Coupé, University of Canterbury

Analyst Reports and Stock Performance: Evidence from the Chinese Market By Rui Liu; Jiayou Liang; Haolong Chen; Yujia Hu
Taming the Curse of Dimensionality: Quantitative Economics with Deep Learning By Jesús Fernández-Villaverde; Galo Nuño; Jesse Perla
Portfolio Optimization with Feedback Strategies Based on Artificial Neural Networks By Yaacov Kopeliovich; Michael Pokojovy
Price Prediction Using Machine Learning By Asef Yelghi; Aref Yelghi; Shirmohammad Tavangari
Sparse Interval-valued Time Series Modeling with Machine Learning By Haowen Bao; Yongmiao Hong; Yuying Sun; Shouyang Wang
MCI-GRU: Stock Prediction Model Based on Multi-Head Cross-Attention and Improved GRU By Peng Zhu; Yuante Li; Yifan Hu; Sheng Xiang; Qinyuan Liu; Dawei Cheng; Yuqi Liang
Words that Move Markets- Quantifying the Impact of RBI's Monetary Policy Communications on Indian Financial Market By Rohit Kumar; Sourabh Bikas Paul; Nikita Singh
Climate AI for Corporate Decarbonization Metrics Extraction By Aditya Dave; Mengchen Zhu; Dapeng Hu; Sachin Tiwari
On the (Mis)Use of Machine Learning with Panel Data By Augusto Cerqua; Marco Letta; Gabriele Pinto
Quantifying the Differences in Innovation Processes in China, Japan and the United States by Document Level Concordance between Patents and Web Contents By MOTOHASHI Kazuyuki; ZHU Chen
Fitting item response theory models using deep learning computational frameworks By Luo, Nanyu; Ji, Feng; Han, Yuting; He, Jinbo; Zhang, Xiaoya
Semiparametric inference for impulse response functions using double/debiased machine learning By Daniele Ballinari; Alexander Wehrli
Quantifying Qualitative Insights: Leveraging LLMs to Market Predict By Hoyoung Lee; Youngsoo Choi; Yuhee Kwon
Beyond the Fundamentals: How Media-Driven Narratives Influence Cross-Border Capital Flows By Agarwal, Isha; Chen, Wentong; Prasad, Eswar
On the Asymptotic Properties of Debiased Machine Learning Estimators By Amilcar Velez
Utilizing RNN for Real-time Cryptocurrency Price Prediction and Trading Strategy Optimization By Shamima Nasrin Tumpa; Kehelwala Dewage Gayan Maduranga
Bitcoin Research with a Transaction Graph Dataset By Hugo Schnoering; Michalis Vazirgiannis
Hiring and the Dynamics of the Gender Gap By Hannah Illing; Hanna Schwank; Linh T. Tô
A Risk Sensitive Contract-unified Reinforcement Learning Approach for Option Hedging By Xianhua Peng; Xiang Zhou; Bo Xiao; Yi Wu
Reinforcement Learning Framework for Quantitative Trading By Alhassan S. Yasin; Prabdeep S. Gill
Hybrid Vector Auto Regression and Neural Network Model for Order Flow Imbalance Prediction in High Frequency Trading By Abdul Rahman; Neelesh Upadhye
Bridging an energy system model with an ensemble deep-learning approach for electricity price forecasting By Souhir Ben Amor; Thomas Möbius; Felix Müsgens
Bounded Rationality in Central Bank Communication By Wonseong Kim; Choong Lyol Lee
Refined and Segmented Price Sentiment Indices from Survey Comments By Masahiro Suzuki; Hiroki Sakaji
FinRobot: AI Agent for Equity Research and Valuation with Large Language Models By Tianyu Zhou; Pinqiao Wang; Yilin Wu; Hongyang Yang
Nowcasting inflation using prices from the web By Mirko Ðukic; Iva Krsmanovic; Miodrag Petkovic

Analyst Reports and Stock Performance: Evidence from the Chinese Market

By:	Rui Liu; Jiayou Liang; Haolong Chen; Yujia Hu
Abstract:	This article applies natural language processing (NLP) to extract and quantify textual information to predict stock performance. Using an extensive dataset of Chinese analyst reports and employing a customized BERT deep learning model for Chinese text, this study categorizes the sentiment of the reports as positive, neutral, or negative. The findings underscore the predictive capacity of this sentiment indicator for stock volatility, excess returns, and trading volume. Specifically, analyst reports with strong positive sentiment increase excess return and intraday volatility, while reports with strong negative sentiment also increase volatility and trading volume but decrease future excess return. The magnitude of this effect is greater for positive sentiment reports than for negative sentiment reports. This article contributes to the empirical literature on sentiment analysis and the response of the stock market to news in the Chinese stock market.
Date:	2024–11
URL:	https://d.repec.org/n?u=RePEc:arx:papers:2411.08726

Taming the Curse of Dimensionality: Quantitative Economics with Deep Learning

By:	Jesús Fernández-Villaverde; Galo Nuño; Jesse Perla
Abstract:	We argue that deep learning provides a promising avenue for taming the curse of dimensionality in quantitative economics. We begin by exploring the unique challenges posed by solving dynamic equilibrium models, especially the feedback loop between individual agents’ decisions and the aggregate consistency conditions required by equilibrium. Following this, we introduce deep neural networks and demonstrate their application by solving the stochastic neoclassical growth model. Next, we compare deep neural networks with traditional solution methods in quantitative economics. We conclude with a survey of neural network applications in quantitative economics and offer reasons for cautious optimism.
Keywords:	deep learning, quantitative economics
JEL:	C61 C63 E27
Date:	2024
URL:	https://d.repec.org/n?u=RePEc:ces:ceswps:_11448

Portfolio Optimization with Feedback Strategies Based on Artificial Neural Networks

By:	Yaacov Kopeliovich; Michael Pokojovy
Abstract:	With the recent advancements in machine learning (ML), artificial neural networks (ANN) are starting to play an increasingly important role in quantitative finance. Dynamic portfolio optimization is among many problems that have significantly benefited from a wider adoption of deep learning (DL). While most existing research has primarily focused on how DL can alleviate the curse of dimensionality when solving the Hamilton-Jacobi-Bellman (HJB) equation, some very recent developments propose to forego derivation and solution of HJB in favor of empirical utility maximization over dynamic allocation strategies expressed through ANN. In addition to being simple and transparent, this approach is universally applicable, as it is essentially agnostic about market dynamics. To showcase the method, we apply it to optimal portfolio allocation between a cash account and the S&P 500 index modeled using geometric Brownian motion or the Heston model. In both cases, the results are demonstrated to be on par with those under the theoretical optimal weights assuming isoelastic utility and real-time rebalancing. A set of R codes for a broad class of stochastic volatility models are provided as a supplement.
Date:	2024–11
URL:	https://d.repec.org/n?u=RePEc:arx:papers:2411.09899
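
The idea of maximizing utility empirically instead of solving the HJB equation can be illustrated with a deliberately crude sketch. It is not the authors' method: the neural-network feedback strategy is replaced by a constant portfolio weight searched over a grid, the index follows geometric Brownian motion only, and all parameter values (`mu`, `r`, `sigma`, `gamma`) are hypothetical. Under these assumptions, the classical closed-form benchmark for isoelastic utility with continuous rebalancing is the Merton fraction, (mu - r) / (gamma * sigma^2), and the simulated optimum should land close to it.

```python
import numpy as np

def merton_weight(mu, r, sigma, gamma):
    """Closed-form optimal risky weight under GBM and CRRA utility."""
    return (mu - r) / (gamma * sigma**2)

def empirical_best_weight(mu, r, sigma, gamma, horizon=1.0,
                          n_paths=200_000, seed=0):
    """Grid search for the constant weight that maximizes simulated CRRA utility.

    Assumes gamma != 1 and continuous rebalancing, so terminal log-wealth
    is Gaussian for any constant weight w.
    """
    rng = np.random.default_rng(seed)
    z = rng.standard_normal(n_paths)      # common shocks for every candidate
    best_w, best_u = None, -np.inf
    for w in np.linspace(0.0, 1.0, 21):
        log_wealth = ((r + w * (mu - r) - 0.5 * w**2 * sigma**2) * horizon
                      + w * sigma * np.sqrt(horizon) * z)
        # Sample average of W^(1 - gamma) / (1 - gamma)
        utility = np.mean(np.exp((1.0 - gamma) * log_wealth)) / (1.0 - gamma)
        if utility > best_u:
            best_w, best_u = w, utility
    return float(best_w)

# Hypothetical parameters, chosen only for illustration.
params = dict(mu=0.08, r=0.02, sigma=0.2, gamma=3.0)
closed_form = merton_weight(**params)
simulated = empirical_best_weight(**params)
print(closed_form, simulated)
```

Reusing the same shocks for every candidate weight keeps the comparison across weights from being dominated by simulation noise.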

Price Prediction Using Machine Learning

By:	Asef Yelghi; Aref Yelghi; Shirmohammad Tavangari
Abstract:	The development of artificial intelligence has made significant contributions to the financial sector. One of the main interests of investors is price predictions. Technical and fundamental analyses, as well as econometric analyses, are conducted for price predictions; recently, the use of AI-based methods has become more prevalent. This study examines daily Dollar/TL exchange rates from January 1, 2020, to October 4, 2024. Among the artificial intelligence models considered, random forest, support vector machines, k-nearest neighbors, decision trees, and gradient boosting models were not suitable, whereas multilayer perceptron and linear regression models fit the data adequately. Despite the sharp increase in Dollar/TL rates in Turkey as of 2019, the valid models remained suitable.
Date:	2024–11
URL:	https://d.repec.org/n?u=RePEc:arx:papers:2411.04259

Sparse Interval-valued Time Series Modeling with Machine Learning

By:	Haowen Bao; Yongmiao Hong; Yuying Sun; Shouyang Wang
Abstract:	By treating intervals as inseparable sets, this paper proposes sparse machine learning regressions for high-dimensional interval-valued time series. With LASSO or adaptive LASSO techniques, we develop a penalized minimum distance estimation, which covers point-based estimators as special cases. We establish the consistency and oracle properties of the proposed penalized estimator, regardless of whether the number of predictors is diverging with the sample size. Monte Carlo simulations demonstrate the favorable finite sample properties of the proposed estimation. Empirical applications to interval-valued crude oil price forecasting and sparse index-tracking portfolio construction illustrate the robustness and effectiveness of our method against competing approaches, including random forest and multilayer perceptron for interval-valued data. Our findings highlight the potential of machine learning techniques in interval-valued time series analysis, offering new insights for financial forecasting and portfolio management.
Date:	2024–11
URL:	https://d.repec.org/n?u=RePEc:arx:papers:2411.09452

MCI-GRU: Stock Prediction Model Based on Multi-Head Cross-Attention and Improved GRU

By:	Peng Zhu; Yuante Li; Yifan Hu; Sheng Xiang; Qinyuan Liu; Dawei Cheng; Yuqi Liang
Abstract:	As financial markets grow increasingly complex in the big data era, accurate stock prediction has become more critical. Traditional time series models, such as GRUs, have been widely used but often struggle to capture the intricate nonlinear dynamics of markets, particularly in the flexible selection and effective utilization of key historical information. Recently, methods like Graph Neural Networks and Reinforcement Learning have shown promise in stock prediction but require high data quality and quantity, and they tend to exhibit instability when dealing with data sparsity and noise. Moreover, the training and inference processes for these models are typically complex and computationally expensive, limiting their broad deployment in practical applications. Existing approaches also generally struggle to capture unobservable latent market states effectively, such as market sentiment and expectations, microstructural factors, and participant behavior patterns, leading to an inadequate understanding of market dynamics and, in turn, lower prediction accuracy. To address these challenges, this paper proposes a stock prediction model, MCI-GRU, based on a multi-head cross-attention mechanism and an improved GRU. First, we enhance the GRU model by replacing the reset gate with an attention mechanism, thereby increasing the model's flexibility in selecting and utilizing historical information. Second, we design a multi-head cross-attention mechanism for learning unobservable latent market state representations, which are further enriched through interactions with both temporal features and cross-sectional features. Finally, extensive experiments on four main stock markets show that the proposed method outperforms SOTA techniques across multiple metrics. Additionally, its successful application in real-world fund management operations confirms its effectiveness and practicality.
Date:	2024–09
URL:	https://d.repec.org/n?u=RePEc:arx:papers:2410.20679

Words that Move Markets- Quantifying the Impact of RBI's Monetary Policy Communications on Indian Financial Market

By:	Rohit Kumar; Sourabh Bikas Paul; Nikita Singh
Abstract:	We analyze the impact of the Reserve Bank of India's (RBI) monetary policy communications on the Indian financial market from April 2014 to June 2024 using advanced natural language processing techniques. Employing BERTopic for topic modeling and a fine-tuned RoBERTa model for sentiment analysis, we assess how variations in sentiment across different economic topics affect the stock market. Our findings indicate that dovish sentiment generally leads to declines in equity markets, particularly in topics related to the interest rate policy framework and economic growth, suggesting that market participants interpret dovish language as signaling economic weakness rather than policy easing. Conversely, dovish sentiment regarding foreign exchange reserves management has a positive impact on the equity market. These results highlight the importance of topic-specific communication strategies for central banks in emerging markets.
Date:	2024–11
URL:	https://d.repec.org/n?u=RePEc:arx:papers:2411.04808

Climate AI for Corporate Decarbonization Metrics Extraction

By:	Aditya Dave; Mengchen Zhu; Dapeng Hu; Sachin Tiwari
Abstract:	Corporate Greenhouse Gas (GHG) emission targets are important metrics in sustainable investing [12, 16]. To provide a comprehensive view of company emission objectives, we propose an approach to source these metrics from company public disclosures. Without automation, curating these metrics manually is a labor-intensive process that requires combing through lengthy corporate sustainability disclosures that often do not follow a standard format. Furthermore, the resulting dataset needs to be validated thoroughly by Subject Matter Experts (SMEs), further lengthening the time-to-market. We introduce the Climate Artificial Intelligence for Corporate Decarbonization Metrics Extraction (CAI) model and pipeline, a novel approach utilizing Large Language Models (LLMs) to extract and validate linked metrics from corporate disclosures. We demonstrate that the process improves data collection efficiency and accuracy by automating data curation, validation, and metric scoring from public corporate disclosures. We further show that our results are agnostic to the choice of LLMs. This framework can be applied broadly to information extraction from textual data.
Date:	2024–11
URL:	https://d.repec.org/n?u=RePEc:arx:papers:2411.03402

On the (Mis)Use of Machine Learning with Panel Data

By:	Augusto Cerqua; Marco Letta; Gabriele Pinto
Abstract:	Machine Learning (ML) is increasingly employed to inform and support policymaking interventions. This methodological article cautions practitioners about common but often overlooked pitfalls associated with the uncritical application of supervised ML algorithms to panel data. Ignoring the cross-sectional and longitudinal structure of this data can lead to hard-to-detect data leakage, inflated out-of-sample performance, and an inadvertent overestimation of the real-world usefulness and applicability of ML models. After clarifying these issues, we provide practical guidelines and best practices for applied researchers to ensure the correct implementation of supervised ML in panel data environments, emphasizing the need to define ex ante the primary goal of the analysis and align the ML pipeline accordingly. An empirical application based on over 3,000 US counties from 2000 to 2019 illustrates the practical relevance of these points across nearly 500 models for both classification and regression tasks.
Date:	2024–11
URL:	https://d.repec.org/n?u=RePEc:arx:papers:2411.09218
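
The leakage problem described in the Cerqua, Letta and Pinto abstract can be made concrete with a small sketch. It shows two common train/test splits that respect panel structure, chosen according to the goal of the analysis: holding out whole units when the aim is to predict unseen units, and holding out the latest periods when the aim is to forecast. The function names and the toy panel (50 units observed from 2000 to 2019) are hypothetical, and these splits are standard practice, not necessarily the exact guidelines of the paper.

```python
import numpy as np

def split_by_unit(unit_ids, test_share=0.2, seed=0):
    """Hold out entire units so no unit appears in both train and test."""
    rng = np.random.default_rng(seed)
    units = np.unique(unit_ids)
    n_test = max(1, int(round(test_share * len(units))))
    test_units = rng.choice(units, size=n_test, replace=False)
    test_mask = np.isin(unit_ids, test_units)
    return ~test_mask, test_mask

def split_by_time(periods, first_test_period):
    """Train on the past only; test on periods from first_test_period on."""
    test_mask = periods >= first_test_period
    return ~test_mask, test_mask

# Toy balanced panel: 50 hypothetical units, yearly data for 2000-2019.
unit_ids = np.repeat(np.arange(50), 20)
years = np.tile(np.arange(2000, 2020), 50)

train_u, test_u = split_by_unit(unit_ids)
train_t, test_t = split_by_time(years, first_test_period=2016)

# A purely random row-level split, by contrast, would put the same unit
# (and the same year) on both sides of the split.
print(test_u.sum(), test_t.sum())
```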

Quantifying the Differences in Innovation Processes in China, Japan and the United States by Document Level Concordance between Patents and Web Contents

By:	MOTOHASHI Kazuyuki; ZHU Chen
Abstract:	While innovation performance at country level has been analyzed using a variety of STI indicators, the relationship between them, such as the patent-new product relationship, is under-investigated. Historically, the relationship between technology and industrial output has been analyzed using technology-industry concordance matrices, but the granularity of output information is bounded by the industrial classification system. In this study, we use the text information in both patent and product-related keywords extracted from company’s web site contents to come up with detailed concordance information between technology and products, and compare them across three countries, China, Japan and the United States. First, we apply a dual attention model to extract product/service information from web page information. Then, using the textual information of both patent abstracts and product/service keywords, we develop a machine learning model to predict products/services from a particular type of technology. Then, we use this transformation model (from technology to product) to understand the difference in innovation processes of the three countries.
Date:	2024–10
URL:	https://d.repec.org/n?u=RePEc:eti:dpaper:24075

Fitting item response theory models using deep learning computational frameworks

By:	Luo, Nanyu; Ji, Feng; Han, Yuting; He, Jinbo; Zhang, Xiaoya
Abstract:	PyTorch and TensorFlow are two widely adopted, modern deep learning frameworks that offer comprehensive computation libraries for deep learning models. We illustrate how to utilize these deep learning computational platforms and infrastructure to estimate a class of popular psychometric models, dichotomous and polytomous Item Response Theory (IRT) models, along with their multidimensional extensions. Through simulation studies, the estimation performance on the simulated datasets demonstrates low mean square error and bias for model parameters. We discuss the potential of integrating modern deep learning tools and views into psychometric research.
Date:	2024–10–28
URL:	https://d.repec.org/n?u=RePEc:osf:osfxxx:tjxab

Semiparametric inference for impulse response functions using double/debiased machine learning

By:	Daniele Ballinari; Alexander Wehrli
Abstract:	We introduce a double/debiased machine learning (DML) estimator for the impulse response function (IRF) in settings where a time series of interest is subjected to multiple discrete treatments, assigned over time, which can have a causal effect on future outcomes. The proposed estimator can rely on fully nonparametric relations between treatment and outcome variables, opening up the possibility to use flexible machine learning approaches to estimate IRFs. To this end, we extend the theory of DML from an i.i.d. to a time series setting and show that the proposed DML estimator for the IRF is consistent and asymptotically normally distributed at the parametric rate, allowing for semiparametric inference for dynamic effects in a time series setting. The properties of the estimator are validated numerically in finite samples by applying it to learn the IRF in the presence of serial dependence in both the confounder and observation innovation processes. We also illustrate the methodology empirically by applying it to the estimation of the effects of macroeconomic shocks.
Date:	2024–11
URL:	https://d.repec.org/n?u=RePEc:arx:papers:2411.10009

Quantifying Qualitative Insights: Leveraging LLMs to Market Predict

By:	Hoyoung Lee; Youngsoo Choi; Yuhee Kwon
Abstract:	Recent advancements in Large Language Models (LLMs) have the potential to transform financial analytics by integrating numerical and textual data. However, challenges such as insufficient context when fusing multimodal information and the difficulty in measuring the utility of qualitative outputs, which LLMs generate as text, have limited their effectiveness in tasks such as financial forecasting. This study addresses these challenges by leveraging daily reports from securities firms to create high-quality contextual information. The reports are segmented into text-based key factors and combined with numerical data, such as price information, to form context sets. By dynamically updating few-shot examples based on the query time, the sets incorporate the latest information, forming a highly relevant set closely aligned with the query point. Additionally, a crafted prompt is designed to assign scores to the key factors, converting qualitative insights into quantitative results. The derived scores undergo a scaling process, transforming them into real-world values that are used for prediction. Our experiments demonstrate that LLMs outperform time-series models in market forecasting, though challenges such as imperfect reproducibility and limited explainability remain.
Date:	2024–11
URL:	https://d.repec.org/n?u=RePEc:arx:papers:2411.08404

Beyond the Fundamentals: How Media-Driven Narratives Influence Cross-Border Capital Flows

By:	Agarwal, Isha (University of British Columbia); Chen, Wentong (Cornell University); Prasad, Eswar (Cornell University)
Abstract:	We provide the first empirical evidence on how media-driven narratives influence cross-border institutional investment flows. Applying natural language processing techniques to one-and-a-half million newspaper articles, we document substantial cross-country variation in sentiment and risk indices constructed from domestic media narratives about China in 15 countries. These narratives significantly affect portfolio flows, even after controlling for macroeconomic and financial fundamentals. This impact is smaller for investors with greater familiarity or private information about China and larger during periods of heightened uncertainty. Political and environmental narratives are as influential as economic narratives. Investors react more sharply to negative narratives than positive ones.
Keywords:	media narratives, cross-border flows, institutional investors, portfolio investment in China, textual analysis, natural language processing
JEL:	F30 G11 G15
Date:	2024–11
URL:	https://d.repec.org/n?u=RePEc:iza:izadps:dp17442

On the Asymptotic Properties of Debiased Machine Learning Estimators

By:	Amilcar Velez
Abstract:	This paper studies the properties of debiased machine learning (DML) estimators under a novel asymptotic framework, offering insights for improving the performance of these estimators in applications. DML is an estimation method suited to economic models where the parameter of interest depends on unknown nuisance functions that must be estimated. It requires weaker conditions than previous methods while still ensuring standard asymptotic properties. Existing theoretical results do not distinguish between two alternative versions of DML estimators, DML1 and DML2. Under a new asymptotic framework, this paper demonstrates that DML2 asymptotically dominates DML1 in terms of bias and mean squared error, formalizing a previous conjecture based on simulation results regarding their relative performance. Additionally, this paper provides guidance for improving the performance of DML2 in applications.
Date:	2024–11
URL:	https://d.repec.org/n?u=RePEc:arx:papers:2411.01864
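
The distinction between DML1 and DML2 mentioned in the Velez abstract can be sketched for the simplest case, a partially linear model with the partialling-out score. Following the usual definitions, DML1 computes one estimate per cross-fitting fold and averages them, while DML2 pools the residuals from all folds and solves a single estimating equation. The sketch makes assumptions not taken from the paper: the data are simulated with a true effect of 2, and the nuisance functions are fitted by ordinary least squares as a stand-in for a machine learning learner.

```python
import numpy as np

def cross_fit_residuals(y, d, X, n_folds=5, seed=0):
    """Residualize y and d on X, fitting nuisances on the other folds only."""
    rng = np.random.default_rng(seed)
    n = len(y)
    folds = np.array_split(rng.permutation(n), n_folds)
    Xc = np.column_stack([np.ones(n), X])
    y_res, d_res = np.empty(n), np.empty(n)
    for test_idx in folds:
        train = np.ones(n, dtype=bool)
        train[test_idx] = False
        # OLS stands in for any ML learner of E[y|X] and E[d|X].
        beta_y, *_ = np.linalg.lstsq(Xc[train], y[train], rcond=None)
        beta_d, *_ = np.linalg.lstsq(Xc[train], d[train], rcond=None)
        y_res[test_idx] = y[test_idx] - Xc[test_idx] @ beta_y
        d_res[test_idx] = d[test_idx] - Xc[test_idx] @ beta_d
    return y_res, d_res, folds

def dml1(y_res, d_res, folds):
    """Average of the fold-specific estimates."""
    return float(np.mean([(d_res[f] @ y_res[f]) / (d_res[f] @ d_res[f])
                          for f in folds]))

def dml2(y_res, d_res):
    """Single estimate from the estimating equation pooled over all folds."""
    return float((d_res @ y_res) / (d_res @ d_res))

# Simulated data with a true treatment coefficient of 2.0.
rng = np.random.default_rng(1)
n = 2000
X = rng.standard_normal((n, 3))
d = X @ np.array([1.0, 0.5, -0.5]) + rng.standard_normal(n)
y = 2.0 * d + X @ np.array([0.5, -1.0, 0.25]) + rng.standard_normal(n)

y_res, d_res, folds = cross_fit_residuals(y, d, X)
theta_dml1 = dml1(y_res, d_res, folds)
theta_dml2 = dml2(y_res, d_res)
print(theta_dml1, theta_dml2)
```

In a well-behaved design like this one the two estimates are nearly identical; the abstract's comparison concerns their bias and mean squared error under a different asymptotic framework, which this sketch does not attempt to reproduce.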

Utilizing RNN for Real-time Cryptocurrency Price Prediction and Trading Strategy Optimization

By:	Shamima Nasrin Tumpa; Kehelwala Dewage Gayan Maduranga
Abstract:	This study explores the use of Recurrent Neural Networks (RNN) for real-time cryptocurrency price prediction and optimized trading strategies. Given the high volatility of the cryptocurrency market, traditional forecasting models often fall short. By leveraging RNNs' capability to capture long-term patterns in time-series data, this research aims to improve accuracy in price prediction and develop effective trading strategies. The project follows a structured approach involving data collection, preprocessing, and model refinement, followed by rigorous backtesting for profitability and risk assessment. This work contributes to both the academic and practical fields by providing a robust predictive model and optimized trading strategies that address the challenges of cryptocurrency trading.
Date:	2024–11
URL:	https://d.repec.org/n?u=RePEc:arx:papers:2411.05829

Bitcoin Research with a Transaction Graph Dataset

By:	Hugo Schnoering; Michalis Vazirgiannis
Abstract:	Bitcoin, launched in 2008 by Satoshi Nakamoto, established a new digital economy where value can be stored and transferred in a fully decentralized manner - alleviating the need for a central authority. This paper introduces a large scale dataset in the form of a transactions graph representing transactions between Bitcoin users along with a set of tasks and baselines. The graph includes 252 million nodes and 785 million edges, covering a time span of nearly 13 years and 670 million transactions. Each node and edge is timestamped. For supervised tasks, we provide two labeled sets: (i) 33,000 nodes labeled by entity type and (ii) nearly 100,000 Bitcoin addresses labeled with an entity name and an entity type. This is the largest publicly available data set of bitcoin transactions designed to facilitate advanced research and exploration in this domain, overcoming the limitations of existing datasets. Various graph neural network models are trained to predict node labels, establishing a baseline for future research. In addition, several use cases are presented to demonstrate the dataset's applicability beyond Bitcoin analysis. Finally, all data and source code are made publicly available to enable reproducibility of the results.
Date:	2024–11
URL:	https://d.repec.org/n?u=RePEc:arx:papers:2411.10325

Hiring and the Dynamics of the Gender Gap

By:	Hannah Illing (University of Bonn, Institute for Employment Research (IAB) & Institute of Labor Economics (IZA)); Hanna Schwank (University of Bonn & Institute of Labor Economics (IZA)); Linh T. Tô (Boston University)
Abstract:	We investigate how the same hiring opportunity leads to different labor market outcomes for male and female full-time workers. To study firms’ wage-setting behavior following exogenous vacancies, we analyze the wages of new hires after sudden worker deaths between 1981 and 2016. Using administrative data from Germany, we apply a novel technique to identify external replacement workers, and we use machine learning to compare replacements hired for comparable positions by similar firms. We find that female replacement workers’ starting wages are, on average, 10 log points lower than those of replacing men of the same productivity. Differences in labor supply, within-firm adjustments, or outside options do not explain this gap; instead, we attribute it to gender differences in bargaining. We conclude that a significant portion of the gender wage gap emerges within firms at the hiring stage.
Keywords:	Gender Wage Gap, Hiring, Labor Supply
JEL:	J2 J31 J63
Date:	2024–11
URL:	https://d.repec.org/n?u=RePEc:ajk:ajkdps:339

A Risk Sensitive Contract-unified Reinforcement Learning Approach for Option Hedging

By:	Xianhua Peng; Xiang Zhou; Bo Xiao; Yi Wu
Abstract:	We propose a new risk sensitive reinforcement learning approach for the dynamic hedging of options. The approach focuses on the minimization of the tail risk of the final P&L of the seller of an option. Different from most existing reinforcement learning approaches that require a parametric model of the underlying asset, our approach can learn the optimal hedging strategy directly from the historical market data without specifying a parametric model; in addition, the learned optimal hedging strategy is contract-unified, i.e., it applies to different options contracts with different initial underlying prices, strike prices, and maturities. Our approach extends existing reinforcement learning methods by learning the tail risk measures of the final hedging P&L and the optimal hedging strategy at the same time. We carry out a comprehensive empirical study to show that, in the out-of-sample tests, the proposed reinforcement learning hedging strategy can obtain statistically significantly lower tail risk and higher mean of the final P&L than delta hedging methods.
Date:	2024–11
URL:	https://d.repec.org/n?u=RePEc:arx:papers:2411.09659
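
The abstract above does not say which tail risk measure is used. As an illustration only, the sketch below computes two widely used ones from a sample of final hedging P&L values: value-at-risk (VaR) and conditional value-at-risk (CVaR, also called expected shortfall). The function name, the confidence level and the toy P&L numbers are hypothetical.

```python
import numpy as np

def var_cvar(pnl, alpha=0.95):
    """Empirical VaR and CVaR of losses (loss = -P&L) at level alpha."""
    losses = -np.asarray(pnl, dtype=float)
    var = np.quantile(losses, alpha)        # loss threshold at level alpha
    cvar = losses[losses >= var].mean()     # average loss in the tail
    return float(var), float(cvar)

# Toy final P&L outcomes for one hedging strategy (made-up numbers).
pnl = np.array([-10.0, -5.0, 0.0, 5.0, 10.0])
var_80, cvar_80 = var_cvar(pnl, alpha=0.8)
print(var_80, cvar_80)
```

Comparing two hedging strategies on the same out-of-sample paths then amounts to comparing their CVaR values alongside their mean P&L.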

Reinforcement Learning Framework for Quantitative Trading

By:	Alhassan S. Yasin; Prabdeep S. Gill
Abstract:	The inherent volatility and dynamic fluctuations within the financial stock market underscore the necessity for investors to employ a comprehensive and reliable approach that integrates risk management strategies, market trends, and the movement trends of individual securities. By evaluating specific data, investors can make more informed decisions. However, the current body of literature lacks substantial evidence supporting the practical efficacy of reinforcement learning (RL) agents, as many models have only demonstrated success in back testing using historical data. This highlights the urgent need for a more advanced methodology capable of addressing these challenges. There is a significant disconnect in the effective utilization of financial indicators to better understand the potential market trends of individual securities. The disclosure of successful trading strategies is often restricted within financial markets, resulting in a scarcity of widely documented and published strategies leveraging RL. Furthermore, current research frequently overlooks the identification of financial indicators correlated with various market trends and their potential advantages. This research endeavors to address these complexities by enhancing the ability of RL agents to effectively differentiate between positive and negative buy/sell actions using financial indicators. While we do not address all concerns, this paper provides deeper insights and commentary on the utilization of technical indicators and their benefits within reinforcement learning. This work establishes a foundational framework for further exploration and investigation of more complex scenarios.
Date:	2024–11
URL:	https://d.repec.org/n?u=RePEc:arx:papers:2411.07585

Hybrid Vector Auto Regression and Neural Network Model for Order Flow Imbalance Prediction in High Frequency Trading

By:	Abdul Rahman; Neelesh Upadhye
Abstract:	In high frequency trading, accurate prediction of Order Flow Imbalance (OFI) is crucial for understanding market dynamics and maintaining liquidity. This paper introduces a hybrid predictive model that combines Vector Auto Regression (VAR) with a simple feedforward neural network (FNN) to forecast OFI and assess trading intensity. The VAR component captures linear dependencies, while residuals are fed into the FNN to model non-linear patterns, enabling a comprehensive approach to OFI prediction. Additionally, the model calculates the intensity on the Buy or Sell side, providing insights into which side holds greater trading pressure. These insights facilitate the development of trading strategies by identifying periods of high buy or sell intensity. Using both synthetic and real trading data from Binance, we demonstrate that the hybrid model offers significant improvements in predictive accuracy and enhances strategic decision-making based on OFI dynamics. Furthermore, we compare the hybrid model's performance with standalone FNN and VAR models, showing that the hybrid approach achieves superior forecasting accuracy across both synthetic and real datasets, making it the most effective model for OFI prediction in high frequency trading.
Date:	2024–11
URL:	https://d.repec.org/n?u=RePEc:arx:papers:2411.08382
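
The two-stage structure described in the Rahman and Upadhye abstract, a linear VAR followed by a non-linear model of its residuals, can be sketched as follows. The sketch is not the authors' implementation. It uses a VAR(1) fitted by least squares, and in place of a trained feedforward network it uses a single hidden layer of random tanh features whose output weights are fitted by least squares, a swap made only to keep the example short and deterministic. The data are synthetic and all names are hypothetical.

```python
import numpy as np

def fit_var1(Y):
    """Least-squares fit of Y_t = c + A Y_{t-1} + e_t; returns coefs, residuals."""
    X = np.column_stack([np.ones(len(Y) - 1), Y[:-1]])
    B, *_ = np.linalg.lstsq(X, Y[1:], rcond=None)
    return B, Y[1:] - X @ B

def fit_residual_features(lagged, resid, n_hidden=32, seed=0):
    """Fit VAR residuals on random tanh features of the lagged values.

    Hidden weights are random and fixed; only the output layer is estimated.
    This is a stand-in for a trained feedforward network.
    """
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(lagged.shape[1], n_hidden))
    b = rng.normal(size=n_hidden)
    H = np.column_stack([np.ones(len(lagged)), np.tanh(lagged @ W + b)])
    out, *_ = np.linalg.lstsq(H, resid, rcond=None)
    return H @ out          # in-sample non-linear correction

# Synthetic two-variable series with a linear and a non-linear component.
rng = np.random.default_rng(2)
T = 1000
Y = np.zeros((T, 2))
for t in range(1, T):
    Y[t] = (0.5 * Y[t - 1] + 0.4 * np.sin(2.0 * Y[t - 1][::-1])
            + 0.1 * rng.standard_normal(2))

B, resid = fit_var1(Y)
correction = fit_residual_features(Y[:-1], resid)

mse_var = float(np.mean(resid**2))                      # VAR alone
mse_hybrid = float(np.mean((resid - correction)**2))    # VAR + residual model
print(mse_var, mse_hybrid)
```

Both errors here are in-sample, so the hybrid error cannot exceed the VAR error by construction; a fair comparison of forecasting accuracy would evaluate both stages on held-out data.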

Bridging an energy system model with an ensemble deep-learning approach for electricity price forecasting

By:	Souhir Ben Amor; Thomas Möbius; Felix Müsgens
Abstract:	This paper combines a techno-economic energy system model with an econometric model to maximise electricity price forecasting accuracy. The proposed combination model is tested on the German day-ahead wholesale electricity market. Our paper also benchmarks the results against several econometric alternatives. Lastly, we demonstrate the economic value of improved price estimators maximising the revenue from an electric storage resource. The results demonstrate that our integrated model improves overall forecasting accuracy by 18 %, compared to available literature benchmarks. Furthermore, our robustness checks reveal that a) the Ensemble Deep Neural Network model performs best in our dataset and b) adding output from the techno-economic energy systems model as econometric model input improves the performance of all econometric models. The empirical relevance of the forecast improvement is confirmed by the results of the exemplary storage optimisation, in which the integration of the techno-economic energy system model leads to a revenue increase of up to 10 %.
Date:	2024–11
URL:	https://d.repec.org/n?u=RePEc:arx:papers:2411.04880
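
To make the link between forecast quality and storage revenue in the abstract above concrete, here is a deliberately simplified sketch. It is not the paper's storage optimisation: it assumes a single charge and discharge cycle per day, a fixed energy volume, and a round-trip efficiency applied only when settling revenue. The function names, the 0.9 efficiency, and the price figures in the usage note are hypothetical.

```python
def best_cycle(forecast):
    """Pick (charge_hour, discharge_hour) with the largest forecast spread.

    The charge hour must come before the discharge hour. Returns (0, 0)
    when no positive spread exists, meaning the storage stays idle.
    """
    best = (0, 0)
    best_spread = 0.0
    low = 0  # index of the cheapest forecast hour seen so far
    for h in range(1, len(forecast)):
        spread = forecast[h] - forecast[low]
        if spread > best_spread:
            best_spread = spread
            best = (low, h)
        if forecast[h] < forecast[low]:
            low = h
    return best

def realised_revenue(actual, schedule, energy_mwh=1.0, efficiency=0.9):
    """Revenue at actual prices for a schedule chosen from the forecast."""
    charge, discharge = schedule
    if charge == discharge:
        return 0.0
    return energy_mwh * (efficiency * actual[discharge] - actual[charge])
```

For example, with hypothetical actual prices `[50, 30, 40, 90, 60]`, a perfect forecast yields the schedule `(1, 3)`, while a poor forecast such as `[30, 50, 90, 40, 60]` yields `(0, 2)`, which loses money once settled at the actual prices.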

Bounded Rationality in Central Bank Communication

By:	Wonseong Kim; Choong Lyol Lee
Abstract:	This study explores the influence of FOMC sentiment on market expectations, focusing on cognitive differences between experts and non-experts. Using sentiment analysis of FOMC minutes, we integrate these insights into a bounded rationality model to examine the impact on inflation expectations. Results show that experts form more conservative expectations, anticipating FOMC stabilization actions, while non-experts react more directly to inflation concerns. A lead-lag analysis indicates that institutions adjust faster, though the gap with individual investors narrows in the short term. These findings highlight the need for tailored communication strategies to better align public expectations with policy goals.
Date:	2024–11
URL:	https://d.repec.org/n?u=RePEc:arx:papers:2411.04286

Refined and Segmented Price Sentiment Indices from Survey Comments

By:	Masahiro Suzuki; Hiroki Sakaji
Abstract:	We aim to enhance a price sentiment index and to more precisely understand price trends from the perspective of not only consumers but also businesses. We extract comments related to prices from the Economy Watchers Survey conducted by the Cabinet Office of Japan and classify price trends using a large language model (LLM). We classify whether the survey sample reflects the perspective of consumers or businesses, and whether the comments pertain to goods or services by utilizing information on the fields of comments and the industries of respondents included in the Economy Watchers Survey. From these classified price-related comments, we construct price sentiment indices not only for a general purpose but also for more specific objectives by combining perspectives on consumers and prices, as well as goods and services. It becomes possible to achieve a more accurate classification of price directions by employing an LLM for classification. Furthermore, integrating the outputs of multiple LLMs suggests the potential for better classification performance. The use of more accurately classified comments allows for the construction of an index with a higher correlation to existing indices than previous studies. We demonstrate that the correlation of the price index for consumers, which has a larger sample size, is further enhanced by selecting comments for aggregation based on the industry of the survey respondents.
Date:	2024–11
URL:	https://d.repec.org/n?u=RePEc:arx:papers:2411.09937

FinRobot: AI Agent for Equity Research and Valuation with Large Language Models

By:	Tianyu Zhou; Pinqiao Wang; Yilin Wu; Hongyang Yang
Abstract:	As financial markets grow increasingly complex, there is a rising need for automated tools that can effectively assist human analysts in equity research, particularly within sell-side research. While Generative AI (GenAI) has attracted significant attention in this field, existing AI solutions often fall short due to their narrow focus on technical factors and limited capacity for discretionary judgment. These limitations hinder their ability to adapt to new data in real-time and accurately assess risks, which diminishes their practical value for investors. This paper presents FinRobot, the first AI agent framework specifically designed for equity research. FinRobot employs a multi-agent Chain of Thought (CoT) system, integrating both quantitative and qualitative analyses to emulate the comprehensive reasoning of a human analyst. The system is structured around three specialized agents: the Data-CoT Agent, which aggregates diverse data sources for robust financial integration; the Concept-CoT Agent, which mimics an analyst's reasoning to generate actionable insights; and the Thesis-CoT Agent, which synthesizes these insights into a coherent investment thesis and report. FinRobot provides thorough company analysis supported by precise numerical data, industry-appropriate valuation metrics, and realistic risk assessments. Its dynamically updatable data pipeline ensures that research remains timely and relevant, adapting seamlessly to new financial information. Unlike existing automated research tools, such as CapitalCube and Wright Reports, FinRobot delivers insights comparable to those produced by major brokerage firms and fundamental research vendors. We open-source FinRobot at https://github.com/AI4Finance-Foundation/FinRobot.
Date:	2024–11
URL:	https://d.repec.org/n?u=RePEc:arx:papers:2411.08804

Nowcasting inflation using prices from the web

By:	Mirko Ðukic (National Bank of Serbia); Iva Krsmanovic (National Bank of Serbia); Miodrag Petkovic (National Bank of Serbia)
Abstract:	The paper presents the methodology which the National Bank of Serbia uses to nowcast inflation in real time, based on prices from the web, downloaded automatically using web scraping. A specific feature of the method used by the National Bank of Serbia is that it is based not only on prices for online shopping, but on all relevant price data, including prices displayed on the web for information only. The intention of the NBS was to cover as many items in the CPI as possible (around 90% at the time of writing this paper), in an endeavour to acquire a more reliable nowcast of the inflation central tendency. In the first year of applying this method, nowcasting performance has been encouraging – on average, inflation nowcasts were at the level of the official figures (nowcasts are not biased), the mean absolute forecasting error was 0.20 pp, and the median was 0.13 pp, which is not significant given that the observed period was characterized by relatively high and volatile inflation.
Keywords:	inflation forecasting, web prices, web scraping, big data
JEL:	C53 E17 E58
Date:	2023–03
URL:	https://d.repec.org/n?u=RePEc:nsb:bilten:16
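
The evaluation measures named in the abstract above (bias, mean absolute error, and median absolute error, all in percentage points) can be computed as in the following minimal sketch. The function name and the figures in the usage note are hypothetical and are not taken from the paper.

```python
from statistics import mean, median

def nowcast_errors(nowcasts, official):
    """Summarise nowcast accuracy against official inflation figures.

    Both inputs are sequences of inflation rates in percent for the same
    periods; the returned values are in percentage points.
    """
    errors = [n - o for n, o in zip(nowcasts, official)]
    abs_errors = [abs(e) for e in errors]
    return {
        "bias": mean(errors),            # close to zero means unbiased
        "mae": mean(abs_errors),         # mean absolute error
        "median_ae": median(abs_errors)  # median absolute error
    }
```

A call such as `nowcast_errors([1.0, 0.5, 0.8, 0.3], [0.9, 0.7, 0.8, 0.4])` returns all three measures at once, so a small bias can be checked against the size of the typical miss.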

This nep-big issue is ©2024 by Tom Coupé. It is provided as is without any express or implied warranty. It may be freely redistributed in whole or in part for any purpose. If distributed in part, please include this notice.

General information on the NEP project can be found at https://nep.repec.org. For comments please write to the director of NEP, Marco Novarese at <director@nep.repec.org>. Put “NEP” in the subject, otherwise your mail may be rejected.

NEP’s infrastructure is sponsored by the Griffith Business School of Griffith University in Australia.