nep-big New Economics Papers
on Big Data
Issue of 2024‒02‒12
seventeen papers chosen by
Tom Coupé, University of Canterbury


  1. Do Machine Learning Approaches Have the Same Accuracy in Forecasting Cryptocurrencies Volatilities? By Brahmana, Rayenda Khresna
  2. CRISIS ALERT: Forecasting Stock Market Crisis Events Using Machine Learning Methods By Yue Chen; Xingyi Andrew; Salintip Supasanya
  3. Financial Time-Series Forecasting: Towards Synergizing Performance And Interpretability Within a Hybrid Machine Learning Approach By Shun Liu; Kexin Wu; Chufeng Jiang; Bin Huang; Danqing Ma
  4. Machine Learning Developments as Stimuli for Organizational Learning By Vetter, Oliver A.; Sturm, Timo; Fecho, Mariska; Buxmann, Peter
  5. A deep implicit-explicit minimizing movement method for option pricing in jump-diffusion models By Emmanuil H. Georgoulis; Antonis Papapantoleon; Costas Smaragdakis
  6. Forecast model of the price of a product with a cold start By Drin, Svitlana
  7. Model Averaging and Double Machine Learning By Achim Ahrens; Christian B. Hansen; Mark E. Schaffer; Thomas Wiemann
  8. Discovering the Significance of Sports Footwear Brands through Text Analysis By Sara Slamić Tarade
  9. Multi-relational Graph Diffusion Neural Network with Parallel Retention for Stock Trends Classification By Zinuo You; Pengju Zhang; Jin Zheng; John Cartlidge
  10. SpotV2Net: Multivariate Intraday Spot Volatility Forecasting via Vol-of-Vol-Informed Graph Attention Networks By Alessio Brini; Giacomo Toscano
  11. A Deep Learning Representation of Spatial Interaction Model for Resilient Spatial Planning of Community Business Clusters By Haiyan Hao; Yan Wang
  12. Designing Heterogeneous LLM Agents for Financial Sentiment Analysis By Frank Xing
  13. Follow The Money: Exploring the Key Factors Influencing Investment in African Startups By Khalil Liouane
  14. The Determinants of the Transit Accessibility Premium By Gal Amedi
  15. The Balance Permutation Test: A Machine Learning Replacement for Balance Tables By Rametta, Jack T.; Fuller, Sam
  16. An adaptive network-based approach for advanced forecasting of cryptocurrency values By Ali Mehrban; Pegah Ahadian
  17. Can ChatGPT Compute Trustworthy Sentiment Scores from Bloomberg Market Wraps? By Baptiste Lefort; Eric Benhamou; Jean-Jacques Ohana; David Saltiel; Beatrice Guez; Damien Challet

  1. By: Brahmana, Rayenda Khresna
    Abstract: The emergence of cryptocurrencies as digital investments has driven scholars to explore the predictability of their prices. Most research focuses on predicting prices and returns using various models, leaving out the importance of persistent risk for portfolio management. Moreover, most research focuses only on Bitcoin, neglecting other altcoins and stablecoins. Therefore, this study comprehensively examines the persistent risk of cryptocurrency investments from the forecasting point of view. We focus on comparing the best forecasting methods because they are vital for volatility-targeting and risk-parity portfolio strategies. We compare the performance of four time-series models to select a suitable volatility prediction model: Machine Learning-Based GARCH, Machine Learning-Based SVR-GARCH, Neural Network, and Deep Learning. Using six cryptocurrencies (Bitcoin, Ethereum, Ripple, USD Coin, Tether, and Binance Coin), we find that ML-Based SVR-GARCH outperforms its peers in volatility forecasting. However, the differences in prediction accuracy among the models are not significant. Finally, our paper provides new insights into the application of machine learning methods to cryptocurrency market volatility prediction, which is helpful for academics, policy-makers, and investors in forming portfolio strategies.
    Keywords: Volatility Forecasting; Cryptocurrencies; Bitcoin; SVR-GARCH; Neural Network; Deep Learning
    JEL: C53 G17 G32
    Date: 2022–12–01
    URL: http://d.repec.org/n?u=RePEc:pra:mprapa:119598&r=big
  2. By: Yue Chen; Xingyi Andrew; Salintip Supasanya
    Abstract: Historically, economic recessions have often come abruptly and disastrously. For instance, during the 2008 financial crisis, the S&P 500 fell 46 percent from October 2007 to March 2009. If we could detect the signals of a crisis earlier, we could take preventive measures. Motivated by this, we use advanced machine learning techniques, including Random Forest and Extreme Gradient Boosting, to predict potential market crashes, mainly in the US market. We also compare the performance of these methods and examine which model is better for forecasting US stock market crashes. We apply our models to daily financial market data, which tend to be more responsive because of their higher reporting frequency. We consider 75 explanatory variables, including general US stock market indexes, S&P 500 sector indexes, and market indicators that can be used for crisis prediction. Finally, we conclude, using selected classification metrics, that the Extreme Gradient Boosting method performs best in predicting US stock market crisis events.
    Date: 2024–01
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2401.06172&r=big
  3. By: Shun Liu; Kexin Wu; Chufeng Jiang; Bin Huang; Danqing Ma
    Abstract: In the realm of cryptocurrency, the prediction of Bitcoin prices has garnered substantial attention due to its potential impact on financial markets and investment strategies. This paper presents a comparative study of hybrid machine learning algorithms with an emphasis on enhancing model interpretability. Specifically, linear regression (OLS, LASSO), long short-term memory (LSTM), and decision tree regressors are introduced. In our experiments, we observe that the linear regressor achieves the best performance among the candidate models. For interpretability, we give a systematic overview of preprocessing techniques for time-series statistics, including decomposition, the autocorrelation function, and exponential triple forecasting, which aim to uncover latent relations and complex patterns in financial time-series forecasting. We believe this work may draw more attention to, and inspire more research on, time-series analysis and its practical applications.
    Date: 2023–12
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2401.00534&r=big
  4. By: Vetter, Oliver A.; Sturm, Timo; Fecho, Mariska; Buxmann, Peter
    Date: 2023–12
    URL: http://d.repec.org/n?u=RePEc:dar:wpaper:142034&r=big
  5. By: Emmanuil H. Georgoulis; Antonis Papapantoleon; Costas Smaragdakis
    Abstract: We develop a novel deep learning approach for pricing European basket options written on assets that follow jump-diffusion dynamics. The option pricing problem is formulated as a partial integro-differential equation, which is approximated via a new implicit-explicit minimizing movement time-stepping approach, involving approximation by deep, residual-type Artificial Neural Networks (ANNs) for each time step. The integral operator is discretized via two different approaches: a) a sparse-grid Gauss-Hermite approximation following localised coordinate axes arising from singular value decompositions, and b) an ANN-based high-dimensional special-purpose quadrature rule. Crucially, the proposed ANN is constructed to ensure the asymptotic behavior of the solution for large values of the underlyings and also leads to consistent outputs with respect to a priori known qualitative properties of the solution. The performance and robustness with respect to the dimension of the methods are assessed in a series of numerical experiments involving the Merton jump-diffusion model.
    Date: 2024–01
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2401.06740&r=big
  6. By: Drin, Svitlana (Örebro University School of Business)
    Abstract: This article presents a comprehensive study on developing a predictive product pricing model using LightGBM, a machine learning method optimized for regression challenges in situations with limited historical data. It begins by detailing the core principles of LightGBM, including decision trees, boosting, and gradient descent, and then delves into the method’s unique features like Gradient-based One-Side Sampling (GOSS) and Exclusive Feature Bundling (EFB). The model’s efficacy is demonstrated through a comparative analysis with XGBoost, highlighting LightGBM’s enhanced efficiency and slight improvement in prediction accuracy. This research offers valuable insights into the application of LightGBM in developing fast and accurate product pricing models, crucial for businesses in the rapidly evolving data landscape.
    Keywords: GBM; GBDT; LightGBM; GOSS; EFB; predictive model
    JEL: E37
    Date: 2024–01–17
    URL: http://d.repec.org/n?u=RePEc:hhs:oruesi:2024_002&r=big
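The GOSS step named in the abstract above can be illustrated with a minimal sketch. It is not taken from the paper or from the LightGBM code base; the function name `goss_sample` and the parameters `top_rate` and `other_rate` are hypothetical, and the sketch assumes the usual description of GOSS: keep the instances with the largest absolute gradients, draw a random subset of the remaining ones, and up-weight that subset by (1 - top_rate) / other_rate so the gradient statistics stay roughly unbiased.

```python
import random


def goss_sample(gradients, top_rate=0.2, other_rate=0.1, seed=0):
    """Illustrative Gradient-based One-Side Sampling (hypothetical helper).

    Returns (indices, weights): the selected instance indices and the
    weight each selected instance would carry when building the next tree.
    """
    n = len(gradients)
    rng = random.Random(seed)

    # Rank instances by absolute gradient, largest first.
    order = sorted(range(n), key=lambda i: abs(gradients[i]), reverse=True)

    n_top = round(top_rate * n)
    n_other = round(other_rate * n)

    # Always keep the large-gradient (poorly fitted) instances.
    top = order[:n_top]

    # Randomly sample from the small-gradient remainder.
    rest = order[n_top:]
    sampled = rng.sample(rest, min(n_other, len(rest)))

    # Up-weight the sampled small-gradient instances to compensate
    # for the ones that were dropped.
    amplify = (1.0 - top_rate) / other_rate

    indices = top + sampled
    weights = [1.0] * len(top) + [amplify] * len(sampled)
    return indices, weights


if __name__ == "__main__":
    # Toy gradients, made up for illustration only.
    toy_gradients = [(-1) ** i * i for i in range(100)]
    idx, w = goss_sample(toy_gradients)
    print(len(idx), "instances kept out of", len(toy_gradients))
```

With these illustrative rates, 30 percent of the instances are used per iteration, which is where the efficiency gain mentioned in the abstract would come from.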
  7. By: Achim Ahrens; Christian B. Hansen; Mark E. Schaffer; Thomas Wiemann
    Abstract: This paper discusses pairing double/debiased machine learning (DDML) with stacking, a model averaging method for combining multiple candidate learners, to estimate structural parameters. We introduce two new stacking approaches for DDML: short-stacking exploits the cross-fitting step of DDML to substantially reduce the computational burden and pooled stacking enforces common stacking weights over cross-fitting folds. Using calibrated simulation studies and two applications estimating gender gaps in citations and wages, we show that DDML with stacking is more robust to partially unknown functional forms than common alternative approaches based on single pre-selected learners. We provide Stata and R software implementing our proposals.
    Date: 2024–01
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2401.01645&r=big
  8. By: Sara Slamić Tarade (Zagreb University of Applied Sciences, Vrbik 8, 10000, Zagreb, Croatia); Dijana Vuković (University of North, Jurja Križanića 31b, 42000, Varaždin, Croatia)
    Abstract: Objective - This paper focuses on analyzing the significance of sports footwear brands by processing large text data from the Internet. In a modern environment, the brand distinguishes a company's products or services from those of its competitors. A strong brand can help build trust with customers as they perceive the brand as reliable and trustworthy. Methodology/Technique - The study uses NLP (Natural Language Processing) methods to analyze rich text content on the Internet. The research focus is based on the application of innovative methods to determine the importance and value of a brand using NLP techniques by analyzing the content of a large corpus of text originating from websites dealing with sports footwear brands. The NLP analysis models were programmed using the low-code analysis tool KNIME. Findings - The analysis is carried out for the most well-known sports footwear brands such as Nike, Adidas, Puma, Under Armour, Reebok and Asics. The research object refers to the analysis of brand significance and the evaluation of consumer opinions on sports footwear, based on the processing of large text data from the Internet. Novelty - The research results are based on an innovative approach to measuring and evaluating the brand significance in sports footwear using NLP methods to analyze large text content from the Internet. The results obtained show that this new approach to metrics and evaluation can significantly improve existing methods of brand evaluation. Type of Paper - Empirical
    Keywords: Brand, NLP Method, Text Analysis, Online Brand Management Strategies, Sports Footwear
    JEL: M39
    Date: 2023–12–31
    URL: http://d.repec.org/n?u=RePEc:gtr:gatrjs:jmmr326&r=big
  9. By: Zinuo You; Pengju Zhang; Jin Zheng; John Cartlidge
    Abstract: Stock trend classification remains a fundamental yet challenging task, owing to the intricate time-evolving dynamics between and within stocks. To tackle these two challenges, we propose a graph-based representation learning approach aimed at predicting the future movements of multiple stocks. Initially, we model the complex time-varying relationships between stocks by generating dynamic multi-relational stock graphs. This is achieved through a novel edge generation algorithm that leverages information entropy and signal energy to quantify the intensity and directionality of inter-stock relations on each trading day. Then, we further refine these initial graphs through a stochastic multi-relational diffusion process, adaptively learning task-optimal edges. Subsequently, we implement a decoupled representation learning scheme with parallel retention to obtain the final graph representation. This strategy better captures the unique temporal features within individual stocks while also capturing the overall structure of the stock graph. Comprehensive experiments conducted on real-world datasets from two US markets (NASDAQ and NYSE) and one Chinese market (Shanghai Stock Exchange: SSE) validate the effectiveness of our method. Our approach consistently outperforms state-of-the-art baselines in forecasting next trading day stock trends across three test periods spanning seven years. Datasets and code have been released (https://github.com/pixelhero98/MGDPR).
    Date: 2024–01
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2401.05430&r=big
  10. By: Alessio Brini; Giacomo Toscano
    Abstract: This paper introduces SpotV2Net, a multivariate intraday spot volatility forecasting model based on a Graph Attention Network architecture. SpotV2Net represents financial assets as nodes within a graph and includes non-parametric high-frequency Fourier estimates of the spot volatility and co-volatility as node features. Further, it incorporates Fourier estimates of the spot volatility of volatility and co-volatility of volatility as features for node edges. We test the forecasting accuracy of SpotV2Net in an extensive empirical exercise, conducted with high-frequency prices of the components of the Dow Jones Industrial Average index. The results we obtain suggest that SpotV2Net shows improved accuracy, compared to alternative econometric and machine-learning-based models. Further, our results show that SpotV2Net maintains accuracy when performing intraday multi-step forecasts. To interpret the forecasts produced by SpotV2Net, we employ GNNExplainer, a model-agnostic interpretability tool and thereby uncover subgraphs that are critical to a node's predictions.
    Date: 2024–01
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2401.06249&r=big
  11. By: Haiyan Hao; Yan Wang
    Abstract: Existing Spatial Interaction Models (SIMs) are limited in capturing the complex and context-aware interactions between business clusters and trade areas. To address the limitation, we propose a SIM-GAT model to predict spatiotemporal visitation flows between community business clusters and their trade areas. The model innovatively represents the integrated system of business clusters, trade areas, and transportation infrastructure within an urban region using a connected graph. Then, a graph-based deep learning model, i.e., Graph AttenTion network (GAT), is used to capture the complexity and interdependencies of business clusters. We developed this model with data collected from the Miami metropolitan area in Florida. We then demonstrated its effectiveness in capturing varying attractiveness of business clusters to different residential neighborhoods and across scenarios with an eXplainable AI approach. We contribute a novel method supplementing conventional SIMs to predict and analyze the dynamics of inter-connected community business clusters. The analysis results can inform data-evidenced and place-specific planning strategies helping community business clusters better accommodate their customers across scenarios, and hence improve the resilience of community businesses.
    Date: 2024–01
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2401.04849&r=big
  12. By: Frank Xing
    Abstract: Large language models (LLMs) have drastically changed the possible ways to design intelligent systems, shifting the focus from massive data acquisition and training new models to human alignment and strategic elicitation of the full potential of existing pre-trained models. This paradigm shift, however, is not fully realized in financial sentiment analysis (FSA), due to the discriminative nature of this task and a lack of prescriptive knowledge of how to leverage generative models in such a context. This study investigates the effectiveness of the new paradigm, i.e., using LLMs without fine-tuning for FSA. Rooted in Minsky's theory of mind and emotions, a design framework with heterogeneous LLM agents is proposed. The framework instantiates specialized agents using prior domain knowledge of the types of FSA errors and reasons on the aggregated agent discussions. Comprehensive evaluation on FSA datasets shows that the framework yields better accuracies, especially when the discussions are substantial. This study contributes to the design foundations and paves new avenues for LLM-based FSA. Implications for business and management are also discussed.
    Date: 2024–01
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2401.05799&r=big
  13. By: Khalil Liouane
    Abstract: The African continent has witnessed a notable surge in entrepreneurial activity, with the number of startups and investments made in the ecosystem growing significantly in recent years. Against this backdrop, this paper presents an in-depth analysis of the key factors influencing funding amounts in African startup deals. A comprehensive analysis of 2,521 startup investment deals, spanning from January 2019 to March 2023, was conducted using a combination of statistical and several machine learning techniques. The results of this study highlight a significant gender diversity gap, the importance of professional experience, and the impact of founders' academic backgrounds. The study reveals that human capital, a diversified sector approach, and cross-border collaboration strategies are crucial for a robust startup ecosystem. Additionally, we identified the potential positive impact of 'Y combinators' for African startups, the implications of exit strategies on deal amounts, and the heterogeneity as well as the incongruity of investment rounds across the continent. In light of these findings, we propose an assortment of policy recommendations aimed at fostering a propitious milieu for African entrepreneurial ventures, promoting equitable investment distribution, and enhancing cross-border collaboration. By providing a rigorous empirical analysis, this study not only contributes to the existing body of literature but also lays the foundation for future research aimed at promoting investment and catalyzing socio-economic development throughout the African continent.
    Date: 2024–01
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2401.05760&r=big
  14. By: Gal Amedi (Bank of Israel)
    Abstract: Accessibility is a key factor in the utility from living in different areas. In urban models, accessibility is theoretically expected to be internalized by the residential market, creating an 'accessibility premium' in areas with better accessibility. Previous case-study literature found significant and largely unexplained variation in the transit accessibility premium in different urban contexts. This paper proposes a new approach to uncovering the determinants of this variation in a unified framework, utilizing a theoretically grounded measure of accessibility, and both causal machine learning and standard econometric methods applied to highly granular nationwide data on rents and the transportation network. I find that high residential density, mixed-use zoning, and a demographic composition better reflecting typical transit users imply a larger transit accessibility premium. This premium is also higher in areas with a low level of services compared to a reasonable reference point, and positive only up to a threshold level of services. There is some evidence that proximity to rail systems implies a premium over and above the expected premium implied by a reduction in travel times alone. The estimated effect is usually modest.
    JEL: R40 R31 R23 R12
    Date: 2023–06
    URL: http://d.repec.org/n?u=RePEc:boi:wpaper:2023.12&r=big
  15. By: Rametta, Jack T.; Fuller, Sam
    Abstract: Balance tests are standard for experiments in numerous fields, with many journals across disciplines recommending or requiring them for publication. This standard persists despite significant evidence of balance tests' inadequacies and the development of better tools for detecting failures of random assignment and covariate imbalance. To date there is still no consensus on how randomization and balance should be checked, and also how these failures and imbalances should be addressed, or if they should be addressed at all. In this article we provide clear guidelines and implement a new statistical test, the "balance permutation test," designed to detect arbitrarily complex randomization failures. Our approach leverages a combination of permutation inference and the predictive power of machine learning to accomplish this task. Additionally, we advocate reporting both simple unadjusted and "doubly robust" treatment effect estimates in all experimental contexts, but particularly in situations where failures are detected. To justify our recommendations and the use of our method, we report the results of two sets of applications. First, we show how the balance permutation test is able to detect complex imbalance in real, simulated, and even fabricated data. Second, using an extensive set of Monte Carlo simulations, we demonstrate the overwhelming advantages of doubly robust treatment effect estimation over existing methods. Finally, we introduce an efficient, easy-to-use R package, MLbalance, that implements the balance permutation test approach. Our hope is that this method helps resolve the longstanding debate over how to detect and adjust for assignment failures in experiments.
    Date: 2024–01–12
    URL: http://d.repec.org/n?u=RePEc:osf:osfxxx:xcwt9&r=big
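The general idea described in the abstract above, combining permutation inference with a predictive model, can be sketched as follows. This is not the authors' procedure and does not reproduce the MLbalance package: a plain nearest-centroid classifier stands in for the machine learning model, the statistic is in-sample accuracy, and the names `centroid_accuracy` and `balance_permutation_test` are hypothetical. The logic is that if assignment was truly random, covariates should predict treatment no better than they predict randomly reshuffled treatment labels.

```python
import random


def centroid_accuracy(X, t):
    """In-sample accuracy of a nearest-centroid classifier predicting t from X.

    A deliberately simple stand-in for the machine learning model; any
    predictive model could be plugged in here instead.
    """
    dims = len(X[0])
    groups = {0: [], 1: []}
    for row, label in zip(X, t):
        groups[label].append(row)

    # One centroid (mean covariate vector) per treatment arm.
    centroids = {
        g: [sum(r[d] for r in rows) / len(rows) for d in range(dims)]
        for g, rows in groups.items()
    }

    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    correct = 0
    for row, label in zip(X, t):
        pred = min((0, 1), key=lambda g: dist2(row, centroids[g]))
        correct += pred == label
    return correct / len(X)


def balance_permutation_test(X, t, n_perm=200, seed=0):
    """Permutation p-value for 'covariates predict treatment assignment'.

    Assumes t is a list of 0/1 labels with both arms present.
    """
    rng = random.Random(seed)
    observed = centroid_accuracy(X, t)

    shuffled = list(t)
    at_least_as_good = 0
    for _ in range(n_perm):
        # Reshuffling labels mimics a valid random assignment.
        rng.shuffle(shuffled)
        if centroid_accuracy(X, shuffled) >= observed:
            at_least_as_good += 1

    # Add-one correction so the p-value is never exactly zero.
    p_value = (at_least_as_good + 1) / (n_perm + 1)
    return observed, p_value


if __name__ == "__main__":
    # Simulated data, made up for illustration: the first covariate of
    # treated units is shifted, i.e. a blatant randomization failure.
    gen = random.Random(1)
    controls = [[gen.gauss(0, 1), gen.gauss(0, 1)] for _ in range(20)]
    treated = [[gen.gauss(10, 1), gen.gauss(0, 1)] for _ in range(20)]
    acc, p = balance_permutation_test(controls + treated, [0] * 20 + [1] * 20)
    print("accuracy:", acc, "p-value:", p)
```

A small p-value signals that treatment is more predictable from covariates than chance assignment would allow. Because the same statistic is recomputed on every reshuffle, using in-sample accuracy does not invalidate the comparison.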
  16. By: Ali Mehrban; Pegah Ahadian
    Abstract: This paper describes an architecture for predicting the price of cryptocurrencies for the next seven days using the Adaptive Network Based Fuzzy Inference System (ANFIS). The historical data considered are daily series for Bitcoin (BTC), Ethereum (ETH), Bitcoin Dominance (BTC.D), and Ethereum Dominance (ETH.D). The model is trained with hybrid and backpropagation algorithms, while grid partition, subtractive clustering, and Fuzzy C-means clustering (FCM) algorithms are used for data clustering. The performance of the proposed architecture is compared across different inputs and against neural network models in terms of statistical evaluation criteria. Finally, the proposed method can predict the price of digital currencies in a short time.
    Date: 2024–01
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2401.05441&r=big
  17. By: Baptiste Lefort; Eric Benhamou; Jean-Jacques Ohana; David Saltiel; Beatrice Guez; Damien Challet
    Abstract: We used a dataset of daily Bloomberg Financial Market Summaries from 2010 to 2023, reposted on large financial media, to determine how global news headlines may affect stock market movements using ChatGPT and a two-stage prompt approach. We document a statistically significant positive correlation between the sentiment score and future equity market returns over the short to medium term, which reverts to a negative correlation over longer horizons. Validation of this correlation pattern across multiple equity markets indicates its robustness across equity regions and resilience to non-linearity, evidenced by comparison of Pearson and Spearman correlations. Finally, we provide an estimate of the optimal horizon that strikes a balance between reactivity to new information and correlation.
    Date: 2024–01
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2401.05447&r=big
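The Pearson versus Spearman comparison mentioned in the abstract above can be sketched with the standard library alone. This is an illustration of the general check, not the authors' code: the helper names are hypothetical and the toy series are made up. Pearson measures linear association, whereas Spearman is the Pearson correlation of the ranks, so it captures any monotonic relationship; a large gap between the two hints at non-linearity.

```python
import math


def pearson(x, y):
    """Pearson (linear) correlation coefficient of two equal-length series."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j to cover the run of values tied with position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


if __name__ == "__main__":
    # Hypothetical sentiment scores and a monotonic but non-linear
    # response, made up for illustration only.
    sentiment = [1, 2, 3, 4, 5]
    future_return = [s ** 3 for s in sentiment]
    print("Pearson: ", pearson(sentiment, future_return))
    print("Spearman:", spearman(sentiment, future_return))
```

In the toy example the relationship is perfectly monotonic but curved, so the rank correlation is higher than the linear one. When the two coefficients agree closely, as the abstract reports, the relationship is unlikely to be driven by non-linear effects.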

This nep-big issue is ©2024 by Tom Coupé. It is provided as is without any express or implied warranty. It may be freely redistributed in whole or in part for any purpose. If distributed in part, please include this notice.
General information on the NEP project can be found at https://nep.repec.org. For comments please write to the director of NEP, Marco Novarese at <director@nep.repec.org>. Put “NEP” in the subject, otherwise your mail may be rejected.
NEP’s infrastructure is sponsored by the School of Economics and Finance of Massey University in New Zealand.