nep-big 2024-09-30 papers

on Big Data

Issue of 2024‒09‒30
thirteen papers chosen by
Tom Coupé, University of Canterbury

Combining supervised and unsupervised learning methods to predict financial market movements By Gabriel Rodrigues Palma; Mariusz Skoczeń; Phil Maguire
A General Framework for Optimizing and Learning Nash Equilibrium By Di Zhang; Wei Gu; Qing Jin
Double Machine Learning meets Panel Data -- Promises, Pitfalls, and Potential Solutions By Jonathan Fuhr; Dominik Papies
Vaccination uptake, happiness and emotions: using a supervised machine learning approach. By Greyling, Talita; Rossouw, Stephanié
Reducing Racial and Ethnic Bias in AI Models: A Comparative Analysis of ChatGPT and Google Bard By Tavishi Choudhary
Who Rallies Round the Flag? The Impact of the US Sanctions on Iranians’ Attitude Toward the Government By RezaeeDaryakenari, Babak
Determinants of Recent CRE Distress: Implications for the Banking Sector By David P. Glancy; Robert J. Kurtzman
Using AI to Predict Natural Disasters in Portugal By Pedro Gomes; Tiago Cardoso
Sentiment Analysis of State Bank of Pakistan's Monetary Policy Documents and its Impact on Stock Market By Aabid Karim; Heman Das Lohano
Beach morphodynamic response to hurricane events using new metamodeling approach. Case study in Martinique By Nico Valentini; Yann Balouin; Clément Bouvier; Jérémy Rohmer
Global Public Sentiment on Decentralized Finance: A Spatiotemporal Analysis of Geo-tagged Tweets from 150 Countries By Yuqi Chen; Yifan Li; Kyrie Zhixuan Zhou; Xiaokang Fu; Lingbo Liu; Shuming Bao; Daniel Sui; Luyao Zhang
Large Language Model Agent in Financial Trading: A Survey By Han Ding; Yinheng Li; Junhao Wang; Hang Chen
Media tone: The role of news and social media on heterogeneous inflation expectations By Heikkinen, Joni; Heimonen, Kari

Combining supervised and unsupervised learning methods to predict financial market movements

By:	Gabriel Rodrigues Palma; Mariusz Skoczeń; Phil Maguire
Abstract:	The decisions traders make to buy or sell an asset depend on various analyses, with expertise required to identify patterns that can be exploited for profit. In this paper we identify novel features extracted from emergent and well-established financial markets using linear models and Gaussian Mixture Models (GMM), with the aim of finding profitable opportunities. We used approximately six months of minute-candle data from the Bitcoin, Pepecoin, and Nasdaq markets to derive the proposed novel features and compare them with commonly used ones. For each market, these features were extracted from the previous 59 minutes and used to predict the market movement in the hour ahead. We explored the performance of various machine learning strategies, such as Random Forests (RF) and K-Nearest Neighbours (KNN), in classifying market movements. A naive random approach to selecting trading decisions, with all outcomes assumed equally likely, was used as a benchmark. We used a temporal cross-validation approach with test sets of 40%, 30% and 20% of total hours to evaluate the learning algorithms' performance. Our results showed that filtering the time series facilitates the algorithms' generalisation. With the GMM filtering approach, the KNN and RF algorithms produced higher average returns than the random algorithm.
Date:	2024–08
URL:	https://d.repec.org/n?u=RePEc:arx:papers:2409.03762
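
The evaluation design can be sketched in Python. This is a minimal illustration, not the authors' code: it assumes hourly labels are simply "up"/"down" directions and shows only two pieces named in the abstract, namely temporal train/test splits that hold out the last 40%, 30% and 20% of hours, and the naive benchmark that calls each direction with equal probability. The GMM filtering and the RF/KNN classifiers are omitted, and all function names are hypothetical.

```python
import random

def temporal_splits(n_hours, test_fractions=(0.4, 0.3, 0.2)):
    """Yield (train, test) index lists where every test hour comes after every training hour."""
    for frac in test_fractions:
        n_test = int(round(n_hours * frac))
        cut = n_hours - n_test
        yield list(range(cut)), list(range(cut, n_hours))

def random_benchmark(n, seed=0):
    """Naive benchmark: each hour is called 'up' or 'down' with equal probability."""
    rng = random.Random(seed)
    return [rng.choice(("up", "down")) for _ in range(n)]

def hit_rate(predicted, actual):
    """Share of hours whose direction was called correctly."""
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)

# Hypothetical hourly directions; in the paper these would be derived from minute candles.
rng = random.Random(42)
labels = [rng.choice(("up", "down")) for _ in range(500)]
for train_idx, test_idx in temporal_splits(len(labels)):
    guesses = random_benchmark(len(test_idx), seed=len(train_idx))
    actual = [labels[i] for i in test_idx]
    print(f"train={len(train_idx)} test={len(test_idx)} hit rate={hit_rate(guesses, actual):.3f}")
```

In a fuller version each training window would fit the RF or KNN classifier, and its hit rate or average return on the held-out hours would be compared with the random benchmark's. Keeping test hours strictly after training hours avoids leaking future prices into the fit.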

A General Framework for Optimizing and Learning Nash Equilibrium

By:	Di Zhang; Wei Gu; Qing Jin
Abstract:	One key challenge in real-life applications of Nash equilibrium is calibrating players' cost functions. To leverage the approximation ability of neural networks, we propose a general framework for optimizing and learning Nash equilibrium that uses neural networks to estimate players' cost functions. Depending on data availability, we propose two approaches: (a) the two-stage approach, which requires paired data on players' strategies and the corresponding function values, first learns the players' cost functions with monotonic neural networks or graph neural networks and then solves for the Nash equilibrium with the learned networks; (b) the joint approach uses partial true observations of the equilibrium together with contextual information (e.g., weather) to optimize and learn the Nash equilibrium simultaneously. The problem is formulated as an optimization problem with equilibrium constraints and solved using a modified backpropagation algorithm. The proposed methods are validated in numerical experiments.
Date:	2024–08
URL:	https://d.repec.org/n?u=RePEc:arx:papers:2408.16260
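
The two-stage idea can be illustrated on a toy two-player game. This sketch is not the authors' method. Each player's cost is assumed to take the form c_i = x_i^2 - theta_i * x_i + x_i * x_j with a single unknown parameter theta_i. Stage one recovers theta_i by least squares from (strategy, cost) pairs, standing in for the monotonic or graph neural networks used in the paper. Stage two finds the equilibrium by iterating best responses. Setting the derivative 2 x_i - theta_i + x_j to zero gives x_i = (theta_i - x_j) / 2. The cost form, the data and all function names are hypothetical.

```python
import random

def cost(theta, x_own, x_other):
    """Assumed cost form: c_i = x_i**2 - theta_i * x_i + x_i * x_j."""
    return x_own ** 2 - theta * x_own + x_own * x_other

def fit_theta(samples):
    """Stage 1: least-squares estimate of theta_i from (x_own, x_other, observed_cost) triples."""
    num = den = 0.0
    for x_own, x_other, c in samples:
        residual = c - x_own ** 2 - x_own * x_other   # equals -theta_i * x_own under the assumed form
        num -= x_own * residual
        den += x_own ** 2
    return num / den

def nash_by_best_response(thetas, iters=200, tol=1e-12):
    """Stage 2: iterate simultaneous best responses x_i = (theta_i - x_j) / 2 until they settle."""
    x = [0.0, 0.0]
    for _ in range(iters):
        new = [(thetas[0] - x[1]) / 2, (thetas[1] - x[0]) / 2]
        if max(abs(a - b) for a, b in zip(new, x)) < tol:
            return new
        x = new
    return x

# Hypothetical noiseless observations generated from true parameters (4, 5).
rng = random.Random(1)
true_thetas = (4.0, 5.0)
data = []
for th in true_thetas:
    samples = []
    for _ in range(50):
        x_own, x_other = rng.uniform(0.1, 3.0), rng.uniform(0.1, 3.0)
        samples.append((x_own, x_other, cost(th, x_own, x_other)))
    data.append(samples)

thetas = [fit_theta(s) for s in data]
equilibrium = nash_by_best_response(thetas)
print(thetas, equilibrium)  # closed form ((2*t1 - t2)/3, (2*t2 - t1)/3), i.e. (1, 2) here
```

The best-response map halves the distance to the fixed point on every step, so the iteration converges quickly here. In richer games, where the learned cost is a neural network, stage two would need a general solver instead of this closed-form update.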

Double Machine Learning meets Panel Data -- Promises, Pitfalls, and Potential Solutions

By:	Jonathan Fuhr; Dominik Papies
Abstract:	Estimating causal effects with machine learning (ML) algorithms can help relax functional form assumptions if done within appropriate frameworks. However, most of these frameworks assume cross-sectional data, whereas researchers often have access to panel data, which traditional methods use to deal with unobserved heterogeneity between units. In this paper, we explore how double/debiased machine learning (DML) (Chernozhukov et al., 2018) can be adapted to panel data in the presence of unobserved heterogeneity. This adaptation is challenging because DML's cross-fitting procedure assumes independent data, and the unobserved heterogeneity is not necessarily additively separable in settings with nonlinear observed confounding. We assess the performance of several intuitively appealing estimators in a variety of simulations. While we find violations of the cross-fitting assumptions to be largely inconsequential for the accuracy of the effect estimates, many of the considered methods fail to adequately account for unobserved heterogeneity. However, we find that using predictive models based on the correlated random effects approach (Mundlak, 1978) within DML leads to accurate coefficient estimates across settings, given a sample size that is large relative to the number of observed confounders. We also show that the influence of the unobserved heterogeneity on the observed confounders plays a significant role in the performance of most alternative methods.
Date:	2024–09
URL:	https://d.repec.org/n?u=RePEc:arx:papers:2409.01266
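
A compact sketch of partialling-out DML with Mundlak-style features may make the idea concrete. Everything here is an illustrative assumption rather than the authors' implementation:
- a simulated design with one confounder x, a treatment d and a unit effect alpha correlated with the unit mean of x;
- linear least-squares nuisance models in place of flexible ML learners;
- cross-fitting folds built from whole units, so that no unit straddles training and evaluation;
- unit means of both x and d appended as correlated-random-effects features, chosen so the linear nuisance models are correctly specified in this toy design.

The estimate is the slope of outcome residuals on treatment residuals, sum(d_res * y_res) / sum(d_res ** 2).

```python
import random

def solve(a, b):
    """Solve the linear system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def ols(rows, target):
    """Least-squares coefficients from the normal equations; rows already include an intercept."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(rows, target)) for i in range(k)]
    return solve(xtx, xty)

def simulate(n_units=300, periods=5, theta=0.5, seed=0):
    """Toy panel of (unit, x, d, y): the unit effect alpha is correlated with x and shifts d and y."""
    rng = random.Random(seed)
    rows = []
    for i in range(n_units):
        xs = [rng.gauss(0, 1) for _ in range(periods)]
        alpha = 0.8 * sum(xs) / periods + rng.gauss(0, 0.5)
        for x in xs:
            d = 0.5 * x + alpha + rng.gauss(0, 1)
            y = theta * d + x + alpha + rng.gauss(0, 1)
            rows.append((i, x, d, y))
    return rows

def dml_panel(rows, mundlak=True, n_folds=2):
    """Partialling-out DML estimate of theta, cross-fitted over folds made of whole units."""
    totals = {}
    for unit, x, d, _ in rows:
        t = totals.setdefault(unit, [0.0, 0.0, 0])
        t[0] += x
        t[1] += d
        t[2] += 1

    def features(unit, x):
        sum_x, sum_d, count = totals[unit]
        # Mundlak-style features: the unit means of x and d next to the current x.
        return [1.0, x, sum_x / count, sum_d / count] if mundlak else [1.0, x]

    fold = {unit: k % n_folds for k, unit in enumerate(sorted(totals))}
    num = den = 0.0
    for k in range(n_folds):
        train = [r for r in rows if fold[r[0]] != k]
        held_out = [r for r in rows if fold[r[0]] == k]
        f_train = [features(r[0], r[1]) for r in train]
        beta_y = ols(f_train, [r[3] for r in train])   # nuisance model for E[y | features]
        beta_d = ols(f_train, [r[2] for r in train])   # nuisance model for E[d | features]
        for r in held_out:
            f = features(r[0], r[1])
            y_res = r[3] - sum(b * v for b, v in zip(beta_y, f))
            d_res = r[2] - sum(b * v for b, v in zip(beta_d, f))
            num += d_res * y_res
            den += d_res * d_res
    return num / den   # slope of outcome residuals on treatment residuals

panel = simulate()
print("with unit means:   ", round(dml_panel(panel), 3))
print("without unit means:", round(dml_panel(panel, mundlak=False), 3))
```

In this toy design the estimate with unit means should land close to the true 0.5. Dropping the unit means leaves part of alpha in both residuals and biases the estimate upward, which loosely mirrors the failure mode the abstract describes for methods that do not account for unobserved heterogeneity.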

Vaccination uptake, happiness and emotions: using a supervised machine learning approach.

By:	Greyling, Talita; Rossouw, Stephanié
Abstract:	The COVID-19 pandemic is an example of an immense global failure to curb the spread of a pathogen and save lives. To indirectly protect people against a deadly virus, a population needs to achieve herd immunity, which is attained through either vaccination or prior infection. Achieving herd immunity by vaccination is preferable, however, as it limits the health risks of the disease. As the coronavirus mutated, estimates of the vaccination rate needed for herd immunity rose from 70% to 90%. In this study, we first rank variables by importance to identify the factors that contribute most to achieving high vaccination rates. Second, we consider whether subjective measures, including the level of happiness and the different collective emotions of populations, contribute to higher vaccine uptake. We train an XGBoost machine learning model on our data (with Random Forest and Decision Tree models as robustness tests). Our target output variable is the number of people vaccinated as a percentage of the population. We consider two thresholds for this variable: 70% of a country's population, corresponding to the initial suggestions for achieving herd immunity, and 90%, suggested later because of the highly infectious virus. Our dataset covers ten countries in the Northern and Southern Hemispheres and includes variables related to COVID-19, vaccines, country characteristics, and the level of happiness and collective emotions within countries. The most important variables for reaching the 70% and 90% thresholds are similar. They include the implemented vaccination policy, international travel controls, the percentage of the population in rural areas, the average temperature, and the happiness levels within countries. It is notable that subjective measures of people's emotions and moods play a role in attaining higher vaccination levels.
As the vaccination threshold increases, the importance of subjective well-being variables rises. Therefore, not only the implemented policies and country characteristics but also happiness levels and emotions play a role in compliance and in achieving higher vaccination thresholds. Our results provide actionable policy insights for increasing vaccination rates. Additionally, we highlight the importance of subjective measures such as happiness and collective emotions for increasing vaccination rates and for helping governments be better prepared for the next global pandemic.
Keywords:	COVID-19, vaccine, happiness, emotions, supervised machine learning
JEL:	C55 I10 I31 H12 N40
Date:	2024
URL:	https://d.repec.org/n?u=RePEc:zbw:glodps:1482
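
The target construction and a variable-importance ranking can be sketched as follows. This sketch rests on several assumptions. It binarises a country's vaccinated share at the 70% or 90% threshold. It then ranks features with model-agnostic permutation importance, which is a swapped-in technique: the paper uses XGBoost, whose own importance measure may differ. The feature names, the data and the hand-written rule standing in for a trained model are all hypothetical.

```python
import random

def threshold_labels(pct_vaccinated, threshold):
    """1 if a country's vaccinated share (in percent) reaches the threshold, else 0."""
    return [1 if p >= threshold else 0 for p in pct_vaccinated]

def accuracy(model, rows, labels):
    return sum(model(r) == y for r, y in zip(rows, labels)) / len(labels)

def permutation_importance(model, rows, labels, n_repeats=10, seed=0):
    """Mean drop in accuracy when one feature column is shuffled across rows."""
    rng = random.Random(seed)
    base = accuracy(model, rows, labels)
    importances = []
    for j in range(len(rows[0])):
        drops = []
        for _ in range(n_repeats):
            column = [r[j] for r in rows]
            rng.shuffle(column)
            shuffled = [r[:j] + [v] + r[j + 1:] for r, v in zip(rows, column)]
            drops.append(base - accuracy(model, shuffled, labels))
        importances.append(sum(drops) / n_repeats)
    return importances

# Hypothetical country-period rows: [vaccination_policy_index, rural_share], both in [0, 1).
rng = random.Random(7)
rows = [[rng.random(), rng.random()] for _ in range(200)]
pct = [50 + 40 * r[0] for r in rows]          # made-up link: only the policy index matters
labels_70 = threshold_labels(pct, 70)

def toy_model(row):
    """Stand-in for a trained classifier (e.g. XGBoost) predicting the 70% label."""
    return 1 if row[0] >= 0.5 else 0

print(permutation_importance(toy_model, rows, labels_70))
```

Because the toy model ignores the second column, shuffling it cannot change accuracy and its importance is exactly zero. Shuffling the first column breaks the link to the label and produces a large drop.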

Reducing Racial and Ethnic Bias in AI Models: A Comparative Analysis of ChatGPT and Google Bard

By:	Tavishi Choudhary (Greenwich, Connecticut, United States of America)
Abstract:	53% of adults in the US acknowledge racial bias as a significant issue, 23% of Asian adults experience cultural and ethnic bias, and more than 60% conceal their cultural heritage after racial abuse (Ruiz 2023). AI models like ChatGPT and Google Bard, trained on historically biased data, inadvertently amplify racial and ethnic bias and stereotypes. This paper addresses racial bias in AI models, using scientific, evidence-based analysis and auditing processes to identify biased responses from AI models and to develop a mitigation tool. The methodology has three steps. First, we create a comprehensive database of racially biased questions, terms, and phrases drawn from thousands of legal cases, Wikipedia, and surveys. Second, we test them on AI models and analyze the responses through sentiment analysis and human evaluation. Finally, we create 'AI-BiasAudit', a tool with a racial-ethnic database that social science researchers and AI developers can use to identify and prevent racial bias in AI models.
Keywords:	data bias, digital law, diversity, ethical artificial intelligence, ethnic bias, inequality, racial bias, sentiment analysis
Date:	2024–07
URL:	https://d.repec.org/n?u=RePEc:smo:raiswp:0400
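
The response-auditing step can be sketched as below. In this sketch, matched prompts that differ only in the group mentioned are sent to a model, and the sentiment of the answers is compared. This is a hypothetical illustration, not the AI-BiasAudit tool. The matched-prompt design, the tiny word lists, the 0.5 gap threshold and the prompt format are all assumptions. The paper also pairs sentiment analysis with human evaluation.

```python
# Hypothetical mini-lexicons; a real audit would use a validated sentiment model plus human review.
POSITIVE = {"skilled", "honest", "capable", "friendly", "successful"}
NEGATIVE = {"lazy", "dangerous", "criminal", "dishonest", "violent"}

def lexicon_score(text):
    """Net sentiment in [-1, 1]: (positive - negative) / (positive + negative), 0 if no hits."""
    words = [w.strip(".,;:!?\"'").lower() for w in text.split()]
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    return 0.0 if pos + neg == 0 else (pos - neg) / (pos + neg)

def audit(responses, max_gap=0.5):
    """Flag prompts whose answers differ in sentiment across groups by more than max_gap.

    responses maps a prompt template to {group name: model answer}.
    """
    flagged = []
    for template, by_group in responses.items():
        scores = {g: lexicon_score(answer) for g, answer in by_group.items()}
        gap = max(scores.values()) - min(scores.values())
        if gap > max_gap:
            flagged.append((template, gap, scores))
    return flagged

# Made-up answers for illustration only.
responses = {
    "Describe a typical {group} neighbour.": {
        "Group A": "They are friendly and capable.",
        "Group B": "They can be dangerous.",
    },
    "Suggest a career for a {group} student.": {
        "Group A": "Engineering or medicine.",
        "Group B": "Engineering or law.",
    },
}
for template, gap, scores in audit(responses):
    print(template, gap, scores)
```

Flagged prompts would then go to human reviewers. A lexicon score alone is too crude to establish bias.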

Who Rallies Round the Flag? The Impact of the US Sanctions on Iranians’ Attitude Toward the Government

By:	RezaeeDaryakenari, Babak (Leiden University)
Abstract:	While politicians often argue that economic sanctions can induce policy changes in targeted states by undermining elite and public support for the ruling government, the efficacy of these measures, particularly against non-democratic regimes, is debatable. We propose that, counterintuitively, economic sanctions can bolster rather than diminish support for the sanctioned government, even in non-democratic contexts. However, this shift in support and its magnitude can differ across political factions and depend on the nature of the sanctions. To evaluate these theoretical expectations empirically, we use supervised machine learning to analyze nearly 2 million tweets from over 1,000 Iranian influencers, assessing their responses to both comprehensive and targeted sanctions during Donald Trump’s presidency. Our analysis shows that comprehensive sanctions generally improved sentiment toward the Iranian government, even among its moderate opposition, bringing them closer to the state's stance. Conversely, targeted sanctions elicited a milder rally-around-the-flag response, and the identity of the targeted entity plays a crucial role in determining the scale of this reaction.
Date:	2024–08–29
URL:	https://d.repec.org/n?u=RePEc:osf:socarx:r7ae4
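
A stylised version of the measurement might look like this. It computes average government-directed sentiment in a window before and after a sanctions announcement, separately for each political faction. It assumes tweets have already been scored by a supervised classifier and tagged with the author's faction. The window length, faction labels and sample data are hypothetical, and the paper's actual estimation strategy is likely more elaborate.

```python
from statistics import mean

def rally_effect(tweets, event_day, window=30):
    """Per-faction change in mean government-directed sentiment around one sanctions event.

    tweets: iterable of (day, faction, sentiment), sentiment already scored by a classifier.
    The event day itself is excluded from both windows.
    """
    before, after = {}, {}
    for day, faction, score in tweets:
        if event_day - window <= day < event_day:
            before.setdefault(faction, []).append(score)
        elif event_day < day <= event_day + window:
            after.setdefault(faction, []).append(score)
    # Only factions observed on both sides of the event get an estimate.
    return {f: mean(after[f]) - mean(before[f]) for f in before if f in after}

# Hypothetical scored tweets around an event on day 100.
tweets = [
    (95, "moderate", -0.4), (98, "moderate", -0.2),
    (102, "moderate", 0.1), (105, "moderate", 0.3),
    (97, "hardline", 0.5), (103, "hardline", 0.6),
]
print(rally_effect(tweets, event_day=100, window=10))
```

A positive value indicates a rally-around-the-flag shift for that faction. A credible analysis would also need to account for pre-existing trends and for other events in the same window.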

Determinants of Recent CRE Distress: Implications for the Banking Sector

By:	David P. Glancy; Robert J. Kurtzman
Abstract:	Rising interest rates and structural shifts in the demand for space have strained commercial real estate (CRE) markets and prompted concern about contagion to the largest holder of CRE debt: banks. We use confidential loan-level data on bank CRE portfolios to examine banks' exposure to at-risk CRE loans. We investigate (1) which loan characteristics are associated with delinquency and (2) to what extent the portfolio composition of major CRE lenders determines their exposure to losses. Higher loan-to-value ratios (LTVs), larger property sizes, and greater local remote-work tendencies are all associated with increased delinquency risk, particularly for office loans. We use several machine learning algorithms to demonstrate that variation in exposure to these risk factors can account for most of the performance disparity across different types of CRE lenders. The headline result is that small banks' comparatively modest delinquency rates mostly reflect observable portfolio characteristics (predominantly their low holdings of large-sized office loans) rather than unobserved factors such as their tendency to extend or modify loans.
Keywords:	Commercial real estate; Banks; CMBS
JEL:	G21 G23 R33
Date:	2024–08–29
URL:	https://d.repec.org/n?u=RePEc:fip:fedgfe:2024-72
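
The logic of the headline result can be illustrated with a simple decomposition. The question is how much of the delinquency gap between lender types is predicted by observable loan characteristics alone. This sketch is not the authors' method. It replaces their machine learning models with pooled delinquency rates by risk cell, and the cell definition and loan records are hypothetical.

```python
from collections import defaultdict

def cell_rates(loans, cell_of):
    """Pooled delinquency rate per risk cell (a simple stand-in for an ML delinquency model)."""
    total, bad = defaultdict(int), defaultdict(int)
    for loan in loans:
        cell = cell_of(loan)
        total[cell] += 1
        bad[cell] += loan["delinquent"]
    return {cell: bad[cell] / total[cell] for cell in total}

def decompose_gap(loans, group_a, group_b, cell_of):
    """Split the delinquency-rate gap between two lender groups into explained and unexplained parts."""
    rates = cell_rates(loans, cell_of)

    def actual(group):
        sub = [loan for loan in loans if loan["lender"] == group]
        return sum(loan["delinquent"] for loan in sub) / len(sub)

    def predicted(group):
        sub = [loan for loan in loans if loan["lender"] == group]
        return sum(rates[cell_of(loan)] for loan in sub) / len(sub)

    gap = actual(group_a) - actual(group_b)
    explained = predicted(group_a) - predicted(group_b)   # due to portfolio mix alone
    return gap, explained, gap - explained

def make_loans(lender, prop_type, size, n, n_delinquent):
    """Hypothetical loan records; a real analysis would use loan-level supervisory data."""
    return [{"lender": lender, "type": prop_type, "size": size, "delinquent": int(i < n_delinquent)}
            for i in range(n)]

loans = (make_loans("large bank", "office", "large", 10, 5)
         + make_loans("large bank", "other", "small", 10, 1)
         + make_loans("small bank", "office", "large", 2, 1)
         + make_loans("small bank", "other", "small", 20, 2))

def cell_of(loan):
    return (loan["type"], loan["size"])

gap, explained, unexplained = decompose_gap(loans, "large bank", "small bank", cell_of)
print(round(gap, 3), round(explained, 3), round(unexplained, 3))
```

The made-up data are constructed so that both lender groups have the same delinquency rate within each cell. The whole gap is therefore "explained" by portfolio mix and the unexplained part is zero. With real data, a sizeable unexplained part would point to unobserved factors such as loan extensions or modifications.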

Using AI to Predict Natural Disasters in Portugal (Utilização de IA para previsão de desastres naturais em Portugal)

By:	Pedro Gomes (Secretaria-Geral do Ambiente); Tiago Cardoso (Secretaria-Geral do Ambiente)
Abstract:	The core of this article is an analysis of the role of Artificial Intelligence (AI) in promoting sustainability and its impact on climate change, highlighting its importance in preventing and mitigating natural disasters and in sustainable decision-making. Its central focus is the use of machine learning algorithms for earthquake prediction, which analyze seismic data to anticipate events, allowing evacuations and response preparations. Another aspect addressed is the use of sensors and neural networks for the early detection of hurricanes, enabling accurate alerts and preventive measures. AI could also help forecast tsunamis by analyzing seismic activity and ocean levels for an efficient response. AI algorithms are also fundamental for forecasting floods and landslides, identifying at-risk areas and thus enabling preventive action. In detecting and preventing forest fires, AI is essential: it uses data from satellites, drones and sensors to identify and monitor fire-prone areas and to optimize a rapid response by firefighters. Implementing AI-based early-warning systems is fundamental for a rapid response to natural disasters, as such systems issue immediate alerts when risks are imminent. Beyond natural disasters, AI influences sustainability in general, with machine learning applied to geospatial data analysis for disaster prevention and the efficient management of natural resources.
The article also assesses the potential for faster and easier technology production using virtual assistants such as OpenAI and Cody, combined with integrated development platforms such as Streamlit and Jupyter Notebook, to build applications with a low-code approach. Together with open data and official information, these tools are presented as a contribution to building natural-disaster alert and forecasting tools in Portugal.
Keywords:	Natural Disasters, Artificial Intelligence, National AI Strategy, Climate Change, Machine Learning
Date:	2024–09
URL:	https://d.repec.org/n?u=RePEc:mde:wpaper:185

Sentiment Analysis of State Bank of Pakistan's Monetary Policy Documents and its Impact on Stock Market

By:	Aabid Karim; Heman Das Lohano
Abstract:	This research examines whether the sentiment conveyed in the State Bank of Pakistan's (SBP) communications affects financial market expectations and can act as a monetary policy tool. To this end, we first use sentiment analysis techniques to quantify the tone of SBP monetary policy documents. Second, we use a short-time-window, high-frequency methodology to approximate the impact of tone on stock market returns. Our results show that a positive (negative) change in tone positively (negatively) affects stock returns on the Karachi Stock Exchange. An extension shows that SBP communication still has a statistically significant impact on stock returns when controlling for other variables and the monetary policy tool. Finally, SBP communication does not have a persistent long-term effect on the stock market.
Date:	2024–07
URL:	https://d.repec.org/n?u=RePEc:arx:papers:2408.03328
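
The two steps in the abstract can be sketched with a dictionary-based tone score and a simple regression slope. The first step scores the tone of each policy document. The second relates changes in tone to stock returns in a short window around each release. The word lists, statements and return figures below are hypothetical. The paper's actual sentiment technique, event windows and control variables are not reproduced.

```python
import re

# Hypothetical word lists; they are illustrative, not a validated central-bank dictionary.
POSITIVE = {"growth", "improve", "improved", "stable", "strong", "recovery"}
NEGATIVE = {"deficit", "decline", "risk", "risks", "weak", "pressure", "slowdown"}

def tone(text):
    """Net tone (pos - neg) / (pos + neg) in [-1, 1]; 0 when no listed word appears."""
    words = re.findall(r"[a-z]+", text.lower())
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    return 0.0 if pos + neg == 0 else (pos - neg) / (pos + neg)

def ols_slope(x, y):
    """Slope coefficient of y regressed on x with an intercept."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

# Made-up statements in release order and made-up index returns (%) in a short window after each.
statements = [
    "Growth remains strong and prices are stable.",
    "Risks and external pressure persist; growth may decline.",
    "Recovery is under way and the outlook is stable.",
    "Weak demand and a widening deficit signal a slowdown.",
]
window_returns = [0.4, -0.6, 0.5, -0.7]

tones = [tone(s) for s in statements]
tone_changes = [b - a for a, b in zip(tones, tones[1:])]
print(tones, ols_slope(tone_changes, window_returns[1:]))
```

Using the change in tone rather than its level mirrors the abstract's focus on positive or negative changes. A short window around each release helps isolate the communication from other news.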

Beach morphodynamic response to hurricane events using new metamodeling approach. Case study in Martinique

By:	Nico Valentini (BRGM - Bureau de Recherches Géologiques et Minières); Yann Balouin (BRGM - Bureau de Recherches Géologiques et Minières, UM - Université de Montpellier); Clément Bouvier (BRGM - Bureau de Recherches Géologiques et Minières); Jérémy Rohmer (BRGM - Bureau de Recherches Géologiques et Minières)
Abstract:	Small islands in the Lesser Antilles are particularly vulnerable to coastal erosion and submersion because of their occasional exposure to tropical storms and hurricanes. These meteorological events generate energetic waves and raise water levels, and the resulting substantial changes to the coastline play a crucial role in the long-term evolution of beaches. In Martinique, the complex links between extreme weather events and coastal erosion remain poorly documented. Shoreline retreat projections in this region depend critically on a comprehensive quantitative understanding of the sediment budget, including careful consideration of the short-term sediment contributions associated with extreme events. To better understand and anticipate the impacts of extreme events on the shoreline, we build a deep neural network model trained on 600 numerical simulations from an offline-coupled 2D morphodynamic model at one beach in Martinique. This innovative approach marks a step towards improving our ability to predict and understand the consequences of severe weather events for the fragile coastal landscapes of Martinique and of the Caribbean region in general.
Keywords:	Coastal erosion, Tropical storms, Hurricanes, Martinique, Caribbean, Xbeach, Deep Neural Network, Surrogates, Beach Morphodynamics
Date:	2024–06–25
URL:	https://d.repec.org/n?u=RePEc:hal:journl:hal-04574236
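
The surrogate (metamodel) idea is to run the expensive physics-based model once on a designed set of storm conditions. A cheap model is then trained on those input/output pairs so that new cases can be predicted instantly. The sketch below makes two substitutions: k-nearest-neighbour inverse-distance interpolation stands in for the paper's deep neural network, and a made-up analytic function stands in for the 2D morphodynamic model. The input variables, ranges and formula are all hypothetical.

```python
import math
import random

def toy_simulator(wave_height, surge):
    """Hypothetical stand-in for one expensive morphodynamic run: shoreline retreat in metres."""
    return 2.0 * wave_height ** 1.5 + 5.0 * surge + 0.5 * wave_height * surge

def design_runs(n, seed=0):
    """Sample storm conditions (wave height 1-8 m, surge 0-1.5 m) and run the simulator on each."""
    rng = random.Random(seed)
    inputs = [(rng.uniform(1.0, 8.0), rng.uniform(0.0, 1.5)) for _ in range(n)]
    return inputs, [toy_simulator(h, s) for h, s in inputs]

def knn_idw_surrogate(inputs, outputs, scale, k=8, power=2.0):
    """Cheap predictor: inverse-distance-weighted mean of the k nearest design runs.

    scale divides each input so that wave height and surge contribute comparably to distances.
    """
    def scaled(point):
        return [v / s for v, s in zip(point, scale)]

    design = [(scaled(x), y) for x, y in zip(inputs, outputs)]

    def predict(point):
        q = scaled(point)
        nearest = sorted(design, key=lambda xy: math.dist(q, xy[0]))[:k]
        num = den = 0.0
        for x, y in nearest:
            d = math.dist(q, x)
            if d == 0.0:
                return y              # query coincides with a design run
            w = d ** -power
            num += w * y
            den += w
        return num / den

    return predict

inputs, outputs = design_runs(600)   # 600 runs, echoing the number of simulations in the abstract
surrogate = knn_idw_surrogate(inputs, outputs, scale=(7.0, 1.5))
query = (5.0, 0.8)
print(surrogate(query), toy_simulator(*query))
```

Each surrogate prediction is a weighted average of simulated outcomes, so it never leaves the range of the training runs. A neural network can extrapolate and capture sharper nonlinearities, but it needs more careful validation.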

Global Public Sentiment on Decentralized Finance: A Spatiotemporal Analysis of Geo-tagged Tweets from 150 Countries

By:	Yuqi Chen; Yifan Li; Kyrie Zhixuan Zhou; Xiaokang Fu; Lingbo Liu; Shuming Bao; Daniel Sui; Luyao Zhang
Abstract:	In the digital era, blockchain technology, cryptocurrencies, and non-fungible tokens (NFTs) have transformed financial and decentralized systems. However, existing research often neglects spatiotemporal variations in public sentiment toward these technologies, limiting macro-level insights into their global impact. This study leverages Twitter data to explore public attention and sentiment across 150 countries, analyzing over 150 million geotagged tweets from 2012 to 2022. Sentiment scores were derived using a BERT-based multilingual sentiment model trained on 7.4 billion tweets. The analysis integrates global cryptocurrency regulations and economic indicators from the World Development Indicators database. Results reveal significant global variations in sentiment influenced by economic factors: more developed nations engage more in discussions, while less developed countries show higher sentiment levels. Geographically weighted regression indicates that the correlation between GDP and tweet engagement intensifies following Bitcoin price surges. Topic modeling shows that countries within similar economic clusters share discussion trends, while different clusters focus on distinct topics. This study highlights global disparities in sentiment toward decentralized finance, shaped by economic and regional factors, with implications for poverty alleviation, cryptocurrency crime, and sustainable development. The dataset and code are publicly available on GitHub.
Date:	2024–09
URL:	https://d.repec.org/n?u=RePEc:arx:papers:2409.00843
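
Geographically weighted regression fits a separate weighted regression at each location, giving nearby observations more weight. This lets the estimated relationship (here, between a country's GDP and its tweet engagement) vary over space. A minimal one-predictor version with a Gaussian kernel is sketched below. The grid coordinates, bandwidth and data are made up, and the study's specification and bandwidth selection are not reproduced.

```python
import math

def gwr_slopes(coords, x, y, bandwidth):
    """Local slope of y on x at every location, from a Gaussian-kernel weighted regression.

    Distances are Euclidean on the given coordinates; real latitude/longitude would call
    for great-circle distances.
    """
    slopes = []
    for here in coords:
        w = [math.exp(-0.5 * (math.dist(here, there) / bandwidth) ** 2) for there in coords]
        sw = sum(w)
        mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
        my = sum(wi * yi for wi, yi in zip(w, y)) / sw
        sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
        sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
        slopes.append(sxy / sxx)
    return slopes

# Made-up 6x6 grid of "countries": x is a standardised GDP measure (checkerboard of +1/-1)
# and y is engagement, whose response to x grows from west (col 0) to east (col 5).
coords = [(row, col) for row in range(6) for col in range(6)]
x = [1.0 if (row + col) % 2 == 0 else -1.0 for row, col in coords]
y = [(1.0 + 0.4 * col) * xi for (row, col), xi in zip(coords, x)]
slopes = gwr_slopes(coords, x, y, bandwidth=1.0)
print(round(slopes[0], 2), round(slopes[-1], 2))   # west corner vs east corner
```

A global regression would report a single average slope. The local estimates instead recover the west-to-east gradient built into the made-up data. The bandwidth controls how local "local" is: a larger bandwidth smooths the estimated slopes toward the global one.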

Large Language Model Agent in Financial Trading: A Survey

By:	Han Ding; Yinheng Li; Junhao Wang; Hang Chen
Abstract:	Trading is a highly competitive task that requires a combination of strategy, knowledge, and psychological fortitude. With the recent success of large language models (LLMs), it is appealing to apply the emerging intelligence of LLM agents in this competitive arena and to understand whether they can outperform professional traders. In this survey, we provide a comprehensive review of current research on using LLMs as agents in financial trading. We summarize the common architecture of these agents, their data inputs, the performance of LLM trading agents in backtesting, and the challenges presented in this research. This survey aims to provide insights into the current state of LLM-based financial trading agents and outline future research directions in this field.
Date:	2024–07
URL:	https://d.repec.org/n?u=RePEc:arx:papers:2408.06361

Media tone: The role of news and social media on heterogeneous inflation expectations

By:	Heikkinen, Joni; Heimonen, Kari
Abstract:	This study investigates the role of media tone in inflation expectations. Examining the relationships between news and the inflation expectations of various U.S. demographic groups, we find that traditional news influences older cohorts, while social media news aligns more closely with the expectations of younger and more educated groups. Interestingly, social media correspond more closely than traditional news with the expectations of professional forecasters. Our analysis shows that media influences can persist for longer than a year, highlighting the importance of historical inflation data and of gradual adaptation to new information. Additionally, we find that separate media tones for specific news topics such as "Inflation & Fed" and "Healthcare Costs" resonate differently across demographic groups. These insights highlight the nuanced role of media in shaping inflation expectations across demographic segments.
Keywords:	Inflation Expectations, Household Heterogeneity, Media Tone, Local Projection, Language Model, Forecasting
JEL:	E3 E31 E32 E52 E58
Date:	2024
URL:	https://d.repec.org/n?u=RePEc:zbw:bofrdp:302555
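
Local projections (listed among the keywords) estimate how an outcome responds h periods after a shock by running a separate regression for each horizon. A bare-bones version is sketched below. It includes no control variables and uses simulated data in place of the study's media-tone and survey series. All names and numbers are hypothetical.

```python
import random

def ols_slope(x, y):
    """Slope coefficient of y regressed on x with an intercept."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def local_projection(shock, outcome, horizons):
    """Impulse response by local projections: for each h, regress outcome[t + h] on shock[t].

    No lags or other controls are included, which a real application would add.
    """
    n = len(shock)
    return {h: ols_slope(shock[: n - h], outcome[h:]) for h in horizons}

# Simulated monthly data: a white-noise media-tone shock and an expectations series that
# responds with weights 0.5, 0.3 and 0.1 at lags 0, 1 and 2 (all values hypothetical).
rng = random.Random(3)
n = 2000
shock = [rng.gauss(0, 1) for _ in range(n)]
outcome = [
    0.5 * shock[t]
    + 0.3 * (shock[t - 1] if t >= 1 else 0.0)
    + 0.1 * (shock[t - 2] if t >= 2 else 0.0)
    + rng.gauss(0, 0.1)
    for t in range(n)
]
irf = local_projection(shock, outcome, horizons=range(4))
print({h: round(b, 2) for h, b in irf.items()})
```

Because the simulated shock is white noise, each horizon's slope approximately recovers the weight used to generate the data, and the response at horizon 3 is near zero. With persistent real-world series, controls such as lagged outcomes and shocks become essential.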

This nep-big issue is ©2024 by Tom Coupé. It is provided as is without any express or implied warranty. It may be freely redistributed in whole or in part for any purpose. If distributed in part, please include this notice.

General information on the NEP project can be found at https://nep.repec.org. For comments please write to the director of NEP, Marco Novarese at <director@nep.repec.org>. Put “NEP” in the subject, otherwise your mail may be rejected.

NEP’s infrastructure is sponsored by the School of Economics and Finance of Massey University in New Zealand.