nep-big 2024-01-08 papers

on Big Data

Issue of 2024‒01‒08
25 papers chosen by
Tom Coupé, University of Canterbury

Illuminating Africa? By Tanner Regan; Giorgio Chiovelli; Stelios Michalopoulos; Elias Papaioannou
Machine learning methods for American-style path-dependent contracts By Matteo Gambara; Giulia Livieri; Andrea Pallavicini
Forecasting Cryptocurrency Prices Using Deep Learning: Integrating Financial, Blockchain, and Text Data By Vincent Gurgul; Stefan Lessmann; Wolfgang Karl Härdle
From Reactive to Proactive Volatility Modeling with Hemisphere Neural Networks By Philippe Goulet Coulombe; Mikael Frenette; Karin Klieber
Generative Machine Learning for Multivariate Equity Returns By Ruslan Tepelyan; Achintya Gopal
From Deep Filtering to Deep Econometrics By Robert Stok; Paul Bilokon
Inheritances and wealth inequality: a machine learning approach By Salas-Rojo, Pedro; Rodríguez, Juan Gabriel
Potential of ChatGPT in predicting stock market trends based on Twitter Sentiment Analysis By Ummara Mumtaz; Summaya Mumtaz
K-Means Clustering algorithms in Urban studies: A Review of Unsupervised Machine Learning techniques By kilani, bochra hadj
Narratives from GPT-derived Networks of News, and a link to Financial Markets Dislocations By Deborah Miori; Constantin Petrov
Uniswap Daily Transaction Indices by Network By Chemaya, Nir; Cong, Lin William; Joergensen, Emma; Liu, Dingyue; Zhang, Luyao
Algorithmic Persuasion Through Simulation: Information Design in the Age of Generative AI By Keegan Harris; Nicole Immorlica; Brendan Lucier; Aleksandrs Slivkins
The Power to Conserve: A Field Experiment on Electricity Use in Qatar By Omar Al-Ubaydli; Alecia W. Cassidy; Anomitro Chatterjee; Ahmed Khalifa; Michael K. Price
The Fundamental Properties, Stability and Predictive Power of Distributional Preferences By Ernst Fehr; Thomas Epper; Julien Senn
An investigation of auctions in the Regional Greenhouse Gas Initiative By Khezr, Peyman; Pourkhanali, Armin
Predicting Failure of P2P Lending Platforms through Machine Learning: The Case in China By Jen-Yin Yeh; Hsin-Yu Chiu; Jhih-Huei Huang
Round-Number Effects in Real Estate Prices: Evidence from Germany By Florian Englmaier; Andreas Roider; Lars Schlereth; Steffen Sebastian
Economic Complexity for Regional Industrial Strategies By DIODATO Dario; NAPOLITANO Lorenzo; PUGLIESE Emanuele; TACCHELLA Andrea
On the adaptation of causal forests to manifold data By Yiyi Huo; Yingying Fan; Fang Han
Predictability and (co-)incidence of labor and health shocks By Emile Cammeraat; Brinn Hekkelman; Pim Kastelein; Suzanne Vissers
The Impact of AI and Cross-Border Data Regulation on International Trade in Digital Services: A Large Language Model By Ruiqi Sun; Daniel Trefler
Benchmarking Large Language Model Volatility By Boyang Yu
Words of the RBNZ: Textual analysis of Monetary Policy Statements By Rennae Cherry; Eric Tong
Rental housing market and directed search By Julien Pascal
Deficiency of Large Language Models in Finance: An Empirical Examination of Hallucination By Haoqiang Kang; Xiao-Yang Liu

Illuminating Africa?

By:	Tanner Regan (George Washington University); Giorgio Chiovelli (Universidad de Montevideo); Stelios Michalopoulos (Brown University); Elias Papaioannou (London Business School)
Abstract:	Satellite images of nighttime lights are commonly used to proxy local economic conditions. Despite their popularity, there are concerns about how accurately they capture local development in low-income settings and at different scales. We compile a yearly series of comparable nighttime lights for Africa from 1992 to 2020, considering key factors that affect accuracy and comparability over time: sensor quality, top coding, blooming, and, importantly, variations in satellite systems (DMSP and VIIRS), using an ensemble machine learning approach. The harmonized luminosity series outperforms the unadjusted series as a predictor of local development, particularly over time and at higher spatial resolutions.
Keywords:	Night Lights, Economic Development, Measurement, Africa
JEL:	O1 R1 E01 I32
Date:	2023–11
URL:	http://d.repec.org/n?u=RePEc:gwi:wpaper:2023-11&r=big

Machine learning methods for American-style path-dependent contracts

By:	Matteo Gambara; Giulia Livieri; Andrea Pallavicini
Abstract:	In the present work, we introduce and compare state-of-the-art algorithms, that are now classified under the name of machine learning, to price Asian and look-back products with early-termination features. These include randomized feed-forward neural networks, randomized recurrent neural networks, and a novel method based on signatures of the underlying price process. Additionally, we explore potential applications on callable certificates. Furthermore, we present an innovative approach for calculating sensitivities, specifically Delta and Gamma, leveraging Chebyshev interpolation techniques.
Date:	2023–11
URL:	http://d.repec.org/n?u=RePEc:arx:papers:2311.16762&r=big

Forecasting Cryptocurrency Prices Using Deep Learning: Integrating Financial, Blockchain, and Text Data

By:	Vincent Gurgul; Stefan Lessmann; Wolfgang Karl Härdle
Abstract:	This paper explores the application of Machine Learning (ML) and Natural Language Processing (NLP) techniques in cryptocurrency price forecasting, specifically Bitcoin (BTC) and Ethereum (ETH). Focusing on news and social media data, primarily from Twitter and Reddit, we analyse the influence of public sentiment on cryptocurrency valuations using advanced deep learning NLP methods. Alongside conventional price regression, we treat cryptocurrency price forecasting as a classification problem. This includes both the prediction of price movements (up or down) and the identification of local extrema. We compare the performance of various ML models, both with and without NLP data integration. Our findings reveal that incorporating NLP data significantly enhances the forecasting performance of our models. We discover that pre-trained models, such as Twitter-RoBERTa and BART MNLI, are highly effective in capturing market sentiment, and that fine-tuning Large Language Models (LLMs) also yields substantial forecasting improvements. Notably, the BART MNLI zero-shot classification model shows considerable proficiency in extracting bullish and bearish signals from textual data. All of our models consistently generate profit across different validation scenarios, with no observed decline in profits or reduction in the impact of NLP data over time. The study highlights the potential of text analysis in improving financial forecasts and demonstrates the effectiveness of various NLP techniques in capturing nuanced market sentiment.
Date:	2023–11
URL:	http://d.repec.org/n?u=RePEc:arx:papers:2311.14759&r=big

From Reactive to Proactive Volatility Modeling with Hemisphere Neural Networks

By:	Philippe Goulet Coulombe; Mikael Frenette; Karin Klieber
Abstract:	We reinvigorate maximum likelihood estimation (MLE) for macroeconomic density forecasting through a novel neural network architecture with dedicated mean and variance hemispheres. Our architecture features several key ingredients making MLE work in this context. First, the hemispheres share a common core at the entrance of the network which accommodates for various forms of time variation in the error variance. Second, we introduce a volatility emphasis constraint that breaks mean/variance indeterminacy in this class of overparametrized nonlinear models. Third, we conduct a blocked out-of-bag reality check to curb overfitting in both conditional moments. Fourth, the algorithm utilizes standard deep learning software and thus handles large data sets - both computationally and statistically. Ergo, our Hemisphere Neural Network (HNN) provides proactive volatility forecasts based on leading indicators when it can, and reactive volatility based on the magnitude of previous prediction errors when it must. We evaluate point and density forecasts with an extensive out-of-sample experiment and benchmark against a suite of models ranging from classics to more modern machine learning-based offerings. In all cases, HNN fares well by consistently providing accurate mean/variance forecasts for all targets and horizons. Studying the resulting volatility paths reveals its versatility, while probabilistic forecasting evaluation metrics showcase its enviable reliability. Finally, we also demonstrate how this machinery can be merged with other structured deep learning models by revisiting Goulet Coulombe (2022)'s Neural Phillips Curve.
Date:	2023–11
URL:	http://d.repec.org/n?u=RePEc:arx:papers:2311.16333&r=big

Generative Machine Learning for Multivariate Equity Returns

By:	Ruslan Tepelyan; Achintya Gopal
Abstract:	The use of machine learning to generate synthetic data has grown in popularity with the proliferation of text-to-image models and especially large language models. The core methodology these models use is to learn the distribution of the underlying data, similar to the classical methods common in finance of fitting statistical models to data. In this work, we explore the efficacy of using modern machine learning methods, specifically conditional importance weighted autoencoders (a variant of variational autoencoders) and conditional normalizing flows, for the task of modeling the returns of equities. The main problem we work to address is modeling the joint distribution of all the members of the S&P 500, or, in other words, learning a 500-dimensional joint distribution. We show that this generative model has a broad range of applications in finance, including generating realistic synthetic data, volatility and correlation estimation, risk analysis (e.g., value at risk, or VaR, of portfolios), and portfolio optimization.
Date:	2023–11
URL:	http://d.repec.org/n?u=RePEc:arx:papers:2311.14735&r=big

From Deep Filtering to Deep Econometrics

By:	Robert Stok; Paul Bilokon
Abstract:	Calculating true volatility is an essential task for option pricing and risk management. However, it is made difficult by market microstructure noise. Particle filtering has been proposed to solve this problem as it has favorable statistical properties, but it relies on assumptions about underlying market dynamics. Machine learning methods have also been proposed but lack interpretability, and often lag in performance. In this paper we implement the SV-PF-RNN: a hybrid neural network and particle filter architecture. Our SV-PF-RNN is designed specifically with stochastic volatility estimation in mind. We then show that it can improve on the performance of a basic particle filter.
Date:	2023–09
URL:	http://d.repec.org/n?u=RePEc:arx:papers:2311.06256&r=big

Inheritances and wealth inequality: a machine learning approach

By:	Salas-Rojo, Pedro; Rodríguez, Juan Gabriel
Abstract:	This paper explores the relationship between received inheritances and the distribution of wealth (financial, non-financial and total) in four developed countries: the United States, Canada, Italy and Spain. We follow the inequality of opportunity (IOp) literature and, considering inheritances as the only circumstance, show that traditional IOp approaches can lead to non-robust and arbitrary measures of IOp depending on discretionary cut-off choices of a continuous circumstance such as inheritances. To overcome this limitation, we apply Machine Learning methods (‘random forest’ algorithm) to optimize the choice of cut-offs and we find that IOp explains over 60% of wealth inequality in the US and Spain (using the Gini coefficient), and more than 40% in Italy and Canada. Including parental education as an additional circumstance (available for the US and Italy), we find that inheritances are still the main contributor. Finally, using the S-Gini index with different parameters to weight different parts of the distribution, we find that the effect of inheritances is more prominent at the middle of the wealth distribution, while parental education is more important for the asset-poor.
Keywords:	inequality of opportunity; inheritances; machine learning; parental education; wealth inequality
JEL:	C60 D31 D63 G51 J1
Date:	2022–03–10
URL:	http://d.repec.org/n?u=RePEc:ehl:lserod:120916&r=big

Potential of ChatGPT in predicting stock market trends based on Twitter Sentiment Analysis

By:	Ummara Mumtaz; Summaya Mumtaz
Abstract:	The rise of ChatGPT has brought a notable shift to the AI sector, with its exceptional conversational skills and deep grasp of language. Recognizing its value across different areas, our study investigates ChatGPT's capacity to predict stock market movements using only social media tweets and sentiment analysis. We aim to see if ChatGPT can tap into the vast sentiment data on platforms like Twitter to offer insightful predictions about stock trends. We focus on determining if a tweet has a positive, negative, or neutral effect on the stock value of two big tech giants, Microsoft and Google. Our findings highlight a positive link between ChatGPT's evaluations and the following day's stock results for both tech companies. This research enriches our view on ChatGPT's adaptability and emphasizes the growing importance of AI in shaping financial market forecasts.
Date:	2023–10
URL:	http://d.repec.org/n?u=RePEc:arx:papers:2311.06273&r=big

K-Means Clustering algorithms in Urban studies: A Review of Unsupervised Machine Learning techniques

By:	kilani, bochra hadj
Abstract:	In recent years there has been an increase in the interest surrounding the utilization of unsupervised machine learning methods, particularly the application of K-means clustering algorithms within urban studies. These techniques have demonstrated their usefulness in examining and comprehending facets of planning including land usage patterns, transportation systems and population distribution. The objective of this article is to offer an overview of how K-means clustering algorithms are employed in urban studies. The review examines the different methodologies and approaches employed in utilizing K-means clustering for urban analysis, highlighting its advantages and limitations. Additionally, the article discusses the specific challenges and considerations that arise when applying K-means clustering in urban studies, including data preprocessing, feature selection, and interpretation of the cluster results. The findings of this review demonstrate the wide range of applications of K-means clustering in urban studies, from identifying distinct land use categories to understanding the spatial distribution of social amenities. Furthermore, it is revealed that the use of K-means clustering in urban studies allows for the identification and characterization of hidden patterns and similarities among urban areas that might not be immediately apparent through traditional analysis methods. Overall, the use of K-means clustering algorithms provides a valuable tool for urban planners and researchers in gaining insights and making informed decisions in urban design.
Date:	2023–11–30
URL:	http://d.repec.org/n?u=RePEc:osf:osfxxx:bs6wy&r=big
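
For readers unfamiliar with the technique reviewed in the entry above, K-means can be sketched in a few lines. The following is a minimal illustration of Lloyd's algorithm on made-up two-dimensional points (standing in for, say, two features of urban districts). The function name `kmeans`, the data, and the first-k initialisation are assumptions for illustration and are not taken from the paper.

```python
import numpy as np

def kmeans(points, k, n_iter=100):
    """Plain Lloyd's algorithm: alternate assignment and centroid update."""
    # Simplification: start from the first k observations.
    # Randomised schemes such as k-means++ are the usual choice in practice.
    centroids = points[:k].copy()
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest centroid.
        dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: each centroid moves to the mean of its members
        # (an empty cluster keeps its previous centroid).
        new_centroids = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids

# Hypothetical data: two well-separated groups of areas, two features each.
data = np.array([[0.0, 0.1], [0.2, 0.0], [0.1, 0.2],
                 [5.0, 5.1], [5.2, 4.9], [4.9, 5.0]])
labels, centroids = kmeans(data, k=2)
print(labels)
```

Both starting centroids here fall in the same group, yet the alternating steps still separate the two groups. The preprocessing and feature-selection issues the abstract mentions (for example, scaling features before computing distances) are not handled in this sketch.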

Narratives from GPT-derived Networks of News, and a link to Financial Markets Dislocations

By:	Deborah Miori; Constantin Petrov
Abstract:	Starting from a corpus of economic articles from The Wall Street Journal, we present a novel systematic way to analyse news content that evolves over time. We leverage state-of-the-art natural language processing techniques (i.e. GPT3.5) to extract the most important entities of each article available, and aggregate co-occurrence of entities in a related graph at the weekly level. Network analysis techniques and fuzzy community detection are tested on the proposed set of graphs, and a framework is introduced that allows systematic but interpretable detection of topics and narratives. In parallel, we propose to consider the sentiment around main entities of an article as a more accurate proxy for the overall sentiment of such a piece of text, and describe a case-study to motivate this choice. Finally, we design features that characterise the type and structure of news within each week, and map them to moments of financial markets dislocations. The latter are identified as dates with unusually high volatility across asset classes, and we find quantitative evidence that they relate to instances of high entropy in the high-dimensional space of interconnected news. This result further motivates the pursued efforts to provide a novel framework for the systematic analysis of narratives within news.
Date:	2023–11
URL:	http://d.repec.org/n?u=RePEc:arx:papers:2311.14419&r=big

Uniswap Daily Transaction Indices by Network

By:	Chemaya, Nir; Cong, Lin William; Joergensen, Emma; Liu, Dingyue; Zhang, Luyao
Abstract:	Decentralized Finance (DeFi) is revolutionizing traditional financial services by enabling direct, intermediary-free transactions, thereby generating a substantial volume of open-source transaction data. This evolving DeFi landscape is particularly influenced by the emergence of Layer 2 (L2) solutions, which are poised to enhance network efficiency and scalability significantly, surpassing the existing capabilities of Layer 1 (L1) infrastructures. However, the detailed impact of these L2 solutions has been somewhat obscured due to a dearth of transaction data indices that can provide in-depth economic insights for empirical research. This study seeks to address this critical gap by conducting a comprehensive analysis of raw transactions sourced from Uniswap, a central decentralized exchange (DEX) within the DeFi ecosystem. The dataset encompasses an extensive collection of over 50 million transactions from both L1 and L2 networks. Additionally, we have curated a wide-ranging repository of daily indices derived from transaction trading data across prominent blockchain networks, including Ethereum, Optimism, Arbitrum, and Polygon. These indices shed light on crucial network dynamics, such as adoption trends, evaluations of scalability, decentralization metrics, wealth distribution patterns, and other key aspects of the DeFi landscape. This rich dataset serves as an invaluable tool, enabling researchers to dissect the complex interplay between DeFi and Layer 2 solutions, thus enhancing our collective understanding of this rapidly evolving ecosystem. Its notable contribution to the data science pipeline includes the implementation of a flexible, open-source Python framework, enabling the dynamic calculation of decentralization indices, customizable to specific research requirements. 
This adaptability makes the dataset particularly suitable for advanced machine learning applications, including deep learning, thereby solidifying its role as a critical asset in shaping Blockchain as the foundational infrastructure for the intelligent Web3 ecosystem.
Date:	2023–12–05
URL:	http://d.repec.org/n?u=RePEc:osf:osfxxx:ube2z&r=big

Algorithmic Persuasion Through Simulation: Information Design in the Age of Generative AI

By:	Keegan Harris; Nicole Immorlica; Brendan Lucier; Aleksandrs Slivkins
Abstract:	How can an informed sender persuade a receiver, having only limited information about the receiver's beliefs? Motivated by research showing generative AI can simulate economic agents, we initiate the study of information design with an oracle. We assume the sender can learn more about the receiver by querying this oracle, e.g., by simulating the receiver's behavior. Aside from AI motivations such as general-purpose Large Language Models (LLMs) and problem-specific machine learning models, alternate motivations include customer surveys and querying a small pool of live users. Specifically, we study Bayesian Persuasion where the sender has a second-order prior over the receiver's beliefs. After a fixed number of queries to an oracle to refine this prior, the sender commits to an information structure. Upon receiving the message, the receiver takes a payoff-relevant action maximizing her expected utility given her posterior beliefs. We design polynomial-time querying algorithms that optimize the sender's expected utility in this Bayesian Persuasion game. As a technical contribution, we show that queries form partitions of the space of receiver beliefs that can be used to quantify the sender's knowledge.
Date:	2023–11
URL:	http://d.repec.org/n?u=RePEc:arx:papers:2311.18138&r=big

The Power to Conserve: A Field Experiment on Electricity Use in Qatar

By:	Omar Al-Ubaydli; Alecia W. Cassidy; Anomitro Chatterjee; Ahmed Khalifa; Michael K. Price
Abstract:	High resource users often have the strongest response to behavioral interventions promoting conservation. Yet, little is known about how to motivate them. We implement a field experiment in Qatar, where residential customers have some of the highest energy use per capita in the world. Our dataset consists of 207,325 monthly electricity meter readings from a panel of 6,096 customers. We employ two normative treatments priming identity: a religious message quoting the Qur’an, and a national message reminding households that Qatar prioritizes energy conservation. The treatments reduce electricity use by 3.8% and both messages are equally effective. Using machine learning methods on supplemental survey data, we elucidate how agency, motivation, and responsibility activate conservation responses to our identity primes.
JEL:	C93 D90 Q4
Date:	2023–12
URL:	http://d.repec.org/n?u=RePEc:nbr:nberwo:31931&r=big

The Fundamental Properties, Stability and Predictive Power of Distributional Preferences

By:	Ernst Fehr (Department of Economics, Zurich University. Blümlisalpstrasse 10, 8006 Zurich, Switzerland); Thomas Epper (IESEG School of Management, Univ. Lille, CNRS, UMR 9221 - LEM - Lille Economie Management, F-59000 Lille, France; iRisk Center on Risk and Uncertainty); Julien Senn (Department of Economics, Zurich University. Blümlisalpstrasse 10, 8006 Zurich, Switzerland)
Abstract:	Parsimony is a desirable feature of economic models but almost all human behaviors are characterized by vast individual variation that appears to defy parsimony. How much parsimony do we need to give up to capture the fundamental aspects of a population’s distributional preferences and to maintain high predictive ability? Using a Bayesian nonparametric clustering method that makes the trade-off between parsimony and descriptive accuracy explicit, we show that three preference types—an inequality averse, an altruistic and a predominantly selfish type—capture the essence of behavioral heterogeneity. These types independently emerge in four different data sets and are strikingly stable over time. They predict out-of-sample behavior equally well as a model that permits all individuals to differ and substantially better than a representative agent model and a state-of-the-art machine learning algorithm. Thus, a parsimonious model with three stable types captures key characteristics of distributional preferences and has excellent predictive power.
Keywords:	Distributional Preferences, Altruism, Inequality Aversion, Preference Heterogeneity, Stability, Out-of-Sample Prediction, Parsimony, Bayesian Nonparametrics.
JEL:	D31 D63 C49 C90
Date:	2023–12
URL:	http://d.repec.org/n?u=RePEc:ies:wpaper:e202310&r=big

An investigation of auctions in the Regional Greenhouse Gas Initiative

By:	Khezr, Peyman; Pourkhanali, Armin
Abstract:	The Regional Greenhouse Gas Initiative (RGGI), as the largest cap-and-trade system in the United States, employs quarterly auctions to distribute emissions permits to firms. This study examines firm behavior and auction performance from both theoretical and empirical perspectives. We utilize auction theory to offer theoretical insights regarding the optimal bidding behavior of firms participating in these auctions. Subsequently, we analyze data from the past 58 RGGI auctions to assess the relevant parameters, employing panel random effects and machine learning models. Our findings indicate that most significant policy changes within RGGI, such as the Cost Containment Reserve, positively impacted the auction clearing price. Furthermore, we identify critical parameters, including the number of bidders and the extent of their demand in the auction, demonstrating their influence on the auction clearing price. This paper presents valuable policy insights for all cap-and-trade systems that allocate permits through auctions, as we employ data from an established market to substantiate the efficacy of policies and the importance of specific parameters.
Keywords:	Emissions permit, auctions, uniform-price, RGGI
JEL:	C5 D21 Q5
Date:	2023–04–24
URL:	http://d.repec.org/n?u=RePEc:pra:mprapa:119289&r=big

Predicting Failure of P2P Lending Platforms through Machine Learning: The Case in China

By:	Jen-Yin Yeh; Hsin-Yu Chiu; Jhih-Huei Huang
Abstract:	This study employs machine learning models to predict the failure of Peer-to-Peer (P2P) lending platforms, specifically in China. By employing the filter method and wrapper method with forward selection and backward elimination, we establish a rigorous and practical procedure that ensures the robustness and importance of variables in predicting platform failures. The research identifies a set of robust variables that consistently appear in the feature subsets across different selection methods and models, suggesting their reliability and relevance in predicting platform failures. The study highlights that reducing the number of variables in the feature subset leads to an increase in the false acceptance rate while the performance metrics remain stable, with an AUC value of approximately 0.96 and an F1 score of around 0.88. The findings of this research provide significant practical implications for regulatory authorities and investors operating in the Chinese P2P lending industry.
Date:	2023–11
URL:	http://d.repec.org/n?u=RePEc:arx:papers:2311.14577&r=big

Round-Number Effects in Real Estate Prices: Evidence from Germany

By:	Florian Englmaier (LMU Munich); Andreas Roider (Universität Regensburg); Lars Schlereth (Universität Regensburg); Steffen Sebastian (Universität Regensburg)
Abstract:	Round numbers affect behavior in various domains, e.g., as prominent thresholds or focal points in bargaining. In line with earlier findings, residential real estate transactions in Germany cluster at round-number prices, but there are also interesting (presumably cultural) differences. We extend our analysis to the commercial real estate market, where stakes are even higher and market participants arguably more experienced. For the same type of object, professionals cluster significantly less on round-number prices compared to non-professionals. We employ machine learning and show that transactions of family homes and condominiums at round-number prices are 2–7% above their hedonic values.
Keywords:	round-number effects; focal points; residential real estate; commercial real estate; housing prices; machine learning;
JEL:	D01 D91 C78 R31
Date:	2023–11–06
URL:	http://d.repec.org/n?u=RePEc:rco:dpaper:446&r=big

Economic Complexity for Regional Industrial Strategies

By:	DIODATO Dario (European Commission - JRC); NAPOLITANO Lorenzo (European Commission - JRC); PUGLIESE Emanuele; TACCHELLA Andrea
Abstract:	Innovation and industrial policy in the EU is often undertaken at the regional level. Policymakers who have to design regional industrial strategies need quantitative tools for guidance. Economic complexity can support policymakers especially during the early phase of policy design: patent and trade data are fed into predictive models to assess the chances of success of a strategy. The methods of economic complexity follow the driving principles of machine learning to predict the probability that a region becomes successful in a given technology or product. We present a series of quantitative tools for regions: (1) relative innovation capabilities; (2) expected diversification by sector; (3) expected diversification by product; (4) fitness of a region for a project.
Date:	2023–12
URL:	http://d.repec.org/n?u=RePEc:ipt:iptwpa:jrc136443&r=big

On the adaptation of causal forests to manifold data

By:	Yiyi Huo; Yingying Fan; Fang Han
Abstract:	Researchers often hold the belief that random forests are "the cure to the world's ills" (Bickel, 2010). But how exactly do they achieve this? Focused on the recently introduced causal forests (Athey and Imbens, 2016; Wager and Athey, 2018), this manuscript aims to contribute to an ongoing research trend towards answering this question, proving that causal forests can adapt to the unknown covariate manifold structure. In particular, our analysis shows that a causal forest estimator can achieve the optimal rate of convergence for estimating the conditional average treatment effect, with the covariate dimension automatically replaced by the manifold dimension. These findings align with analogous observations in the realm of deep learning and resonate with the insights presented in Peter Bickel's 2004 Rietz lecture.
Date:	2023–11
URL:	http://d.repec.org/n?u=RePEc:arx:papers:2311.16486&r=big

Predictability and (co-)incidence of labor and health shocks

By:	Emile Cammeraat (CPB Netherlands Bureau for Economic Policy Analysis); Brinn Hekkelman (CPB Netherlands Bureau for Economic Policy Analysis); Pim Kastelein (CPB Netherlands Bureau for Economic Policy Analysis); Suzanne Vissers (CPB Netherlands Bureau for Economic Policy Analysis)
Abstract:	Setbacks such as dismissal or illness can turn the lives of people upside down. This study shows that such adverse events can be anticipated in advance and that their occurrence is strongly interrelated. These insights suggest that social security policy should consider the fact that vulnerable groups are likely to face multiple difficulties at the same time. Using machine learning techniques and anonymous data on millions of Dutch people, this study maps out the entire probability distribution of a wide range of labor market and health shocks. The degree of inequality in risk exposure across the population is striking. Most people have a low probability of becoming seriously ill or dependent on social benefits, while one percent of people bears up to thirty times more risk compared to the population average. People with a flexible employment contract, low income, little wealth and migration background are overrepresented within this high-risk group.
JEL:	C53 H55 I10 J01 J64
Date:	2023–12
URL:	http://d.repec.org/n?u=RePEc:cpb:discus:453&r=big

The Impact of AI and Cross-Border Data Regulation on International Trade in Digital Services: A Large Language Model

By:	Ruiqi Sun; Daniel Trefler
Abstract:	The rise of artificial intelligence (AI) and of cross-border restrictions on data flows has created a host of new questions and related policy dilemmas. This paper addresses two questions: How is digital service trade shaped (1) by AI algorithms and (2) by the interplay between AI algorithms and cross-border restrictions on data flows? Answers lie in the palm of your hand: From London to Lagos, mobile app users trigger international transactions when they open AI-powered foreign apps. We have 2015-2020 usage data for the most popular 35,575 mobile apps and, to quantify the AI deployed in each of these apps, we use a large language model (LLM) to link each app to each of the app developer's AI patents. (This linkage of specific products to specific patents is a methodological innovation.) Armed with data on app usage by country, with AI deployed in each app, and with an instrument for AI (a Heckscher-Ohlin cost-shifter), we answer our two questions. (1) On average, AI causally raises an app's number of foreign users by 2.67 log points or by more than 10-fold. (2) The impact of AI on foreign users is halved if the foreign users are in a country with strong restrictions on cross-border data flows. These countries are usually autocracies. We also provide a new way of measuring AI knowledge spillovers across firms and find large spillovers. Finally, our work suggests numerous ways in which LLMs such as ChatGPT can be used in other applications.
JEL:	F12 F13 F14 F23
Date:	2023–11
URL:	http://d.repec.org/n?u=RePEc:nbr:nberwo:31925&r=big

Benchmarking Large Language Model Volatility

By:	Boyang Yu
Abstract:	The impact of non-deterministic outputs from Large Language Models (LLMs) is not well examined for financial text understanding tasks. Through a compelling case study on investing in the US equity market via news sentiment analysis, we uncover substantial variability in sentence-level sentiment classification results, underscoring the innate volatility of LLM outputs. These uncertainties cascade downstream, leading to more significant variations in portfolio construction and return. While tweaking the temperature parameter in the language model decoder presents a potential remedy, it comes at the expense of stifled creativity. Similarly, while ensembling multiple outputs mitigates the effect of volatile outputs, it demands a notable computational investment. This work furnishes practitioners with invaluable insights for adeptly navigating uncertainty in the integration of LLMs into financial decision-making, particularly in scenarios dictated by non-deterministic information.
Date:	2023–11
URL:	http://d.repec.org/n?u=RePEc:arx:papers:2311.15180&r=big

Words of the RBNZ: Textual analysis of Monetary Policy Statements

By:	Rennae Cherry; Eric Tong (Reserve Bank of New Zealand)
Abstract:	Clear communication helps New Zealanders understand monetary policy and its relationship to them. Communication explains to the public the purpose and rationale behind monetary policy decisions and, when done right, may enhance monetary policy transmission via different channels (RBNZ, 2020; Blinder et al. 2008; Blot and Hubert, 2018). With this motivation, we apply textual analysis to flagship publications of the Reserve Bank, with the aim of assessing Reserve Bank communications and supporting its mandates of maintaining price stability over the medium term and supporting maximum sustainable employment. Key findings: - Textual analysis shows that keywords mentioned in the Monetary Policy Statements (MPS) align with the objectives in the Remit. - The tone of MPS has been neutral and objective, even as the sentiment in the MPS moves in tandem with household and business confidence surveys. - Similar to monetary policy documents published by central banks overseas, the MPS are complex and may not be accessible to the general public. However, readability, which measures the complexity of a text based on sentence length and the number of syllables in words, has remained stable over 1997Q1-2021Q4 and has marginally improved recently. - The Monetary Policy Snapshots, introduced in 2018, are easier to read than the main part of the MPS; they are accessible to a high school graduate rather than a university graduate.
Date:	2023–07
URL:	http://d.repec.org/n?u=RePEc:nzb:nzbans:2023/4&r=big

Rental housing market and directed search

By:	Julien Pascal
Abstract:	This paper introduces new empirical findings concerning the rental housing market in the Paris metropolitan area. Combining a new dataset gathered from online advertisements for Parisian rentals with a hedonic model that incorporates both apartment features and property-specific photographs, two main stylized facts are established. First, with comparable property features, landlords who ask for lower rent attract a greater number of applicants, consistent with predictions from standard directed search models. Second, many landlords employ a two-stage pricing approach, initially advertising a high rent and then reducing it after a "wait-and-see" period. This previously unreported feature is consistent with the slow Dutch auction mechanism studied in the auction literature and observed in the property sales market.
Keywords:	Rental Housing Market; Hedonic Model; Directed Search Models; Landlords’ Pricing Strategies; Machine Learning
JEL:	R31 R21 C21 D83 C45
Date:	2023–12
URL:	http://d.repec.org/n?u=RePEc:bcl:bclwop:bclwp179&r=big

Deficiency of Large Language Models in Finance: An Empirical Examination of Hallucination

By:	Haoqiang Kang; Xiao-Yang Liu
Abstract:	The hallucination issue is recognized as a fundamental deficiency of large language models (LLMs), especially when applied to fields such as finance, education, and law. Despite the growing concerns, there has been a lack of empirical investigation. In this paper, we provide an empirical examination of LLMs' hallucination behaviors in financial tasks. First, we empirically investigate LLMs' ability to explain financial concepts and terminology. Second, we assess LLMs' capacity to query historical stock prices. Third, to alleviate the hallucination issue, we evaluate the efficacy of four practical methods: few-shot learning, Decoding by Contrasting Layers (DoLa), Retrieval-Augmented Generation (RAG), and a prompt-based tool learning method in which a function generates a query command. Finally, our major finding is that off-the-shelf LLMs exhibit serious hallucination behaviors in financial tasks. Therefore, there is an urgent need for research on mitigating LLMs' hallucination.
Date:	2023–11
URL:	http://d.repec.org/n?u=RePEc:arx:papers:2311.15548&r=big

This nep-big issue is ©2024 by Tom Coupé. It is provided as is without any express or implied warranty. It may be freely redistributed in whole or in part for any purpose. If distributed in part, please include this notice.

General information on the NEP project can be found at https://nep.repec.org. For comments please write to the director of NEP, Marco Novarese at <director@nep.repec.org>. Put “NEP” in the subject, otherwise your mail may be rejected.

NEP’s infrastructure is sponsored by the School of Economics and Finance of Massey University in New Zealand.