nep-big New Economics Papers
on Big Data
Issue of 2023‒05‒15
24 papers chosen by
Tom Coupé
University of Canterbury

  1. Exploring economic activity from outer space: A Python notebook for processing and analyzing satellite nighttime lights By Carlos Mendez; Ayush Patnaik
  2. Teletrabajo en Twitter: Análisis mediante Deep Learning By Gutierrez-Lythgoe, Antonio
  3. Movilidad urbana sostenible: Predicción de demanda con Inteligencia Artificial By Gutierrez-Lythgoe, Antonio
  4. The Determinants of CO2 Emissions in the Context of ESG Models at World Level By Costantiello, Alberto; Leogrande, Angelo
  5. Structured Multifractal Scaling of the Principal Cryptocurrencies: Examination using a Self-Explainable Machine Learning By Foued Saâdaoui
  6. Quantitative Trading using Deep Q Learning By Soumyadip Sarkar
  7. Assessing the Credit Risk of Crypto-Assets Using Daily Range Volatility Models By Fantazzini, Dean
  8. The Impact of Research and Development Expenditures on ESG Model in the Global Economy By Costantiello, Alberto; Leogrande, Angelo
  9. GDP nowcasting with artificial neural networks: How much does long-term memory matter? By Kristóf Németh; Dániel Hadházi
  10. Multi-Modal Deep Learning for Credit Rating Prediction Using Text and Numerical Data Streams By Mahsa Tavakoli; Rohitash Chandra; Fengrui Tian; Cristián Bravo
  11. Parameterized Neural Networks for Finance By Daniel Oeltz; Jan Hamaekers; Kay F. Pilz
  12. Measuring the Temporal Dimension of Text: An Application to Policymaker Speeches By Byrne, David; Goodhead, Robert; McMahon, Michael; Parle, Conor
  13. Artificial neural networks and time series of counts: A class of nonlinear INGARCH models By Malte Jahn
  14. Short-Term Volatility Prediction Using Deep CNNs Trained on Order Flow By Mingyu Hao; Artem Lenskiy
  15. How can Big Data improve the quality of tourism statistics? The Bank of Italy’s experience in compiling the “travel” item in the Balance of Payments. By Costanza Catalano; Andrea Carboni; Claudio Doria
  16. From Euclidean Distance to Spatial Classification: Unraveling the Technology behind GPT Models By Alfredo B. Roisenzvit
  17. Modelling customer lifetime-value in the retail banking industry By Greig Cowan; Salvatore Mercuri; Raad Khraishi
  18. Determinants and Social Dividends of Digital Adoption By David Amaglobeli; Mariano Moszoro; Utkarsh Kumar
  19. Can ChatGPT Forecast Stock Price Movements? Return Predictability and Large Language Models By Alejandro Lopez-Lira; Yuehua Tang
  20. The Central Bank Crystal Ball: Temporal information in monetary policy communication By Byrne, David; Goodhead, Robert; McMahon, Michael; Parle, Conor
  21. What We Teach about Race and Gender: Representation in Images and Text of Children's Books By Adukia, Anjali; Eble, Alex; Harrison, Emileigh; Runesha, Hakizumwami Birali; Szasz, Teodora
  22. Academic Migration and Academic Networks: Evidence from Scholarly Big Data and the Iron Curtain By Donia Kamel; Laura Pollacci
  23. Leaning Against the Data: Policymaker Communications under State-Based Forward Guidance By Taeyoung Doh; Joseph W. Gruber; Dongho Song
  24. Data, Competition, and Digital Platforms By Dirk Bergemann; Alessandro Bonatti

  1. By: Carlos Mendez (Nagoya University); Ayush Patnaik (xKDR Forum)
    Abstract: Nighttime lights (NTL) data are widely recognized as a useful proxy for monitoring national, subnational, and supranational economic activity. These data offer advantages over traditional economic indicators such as GDP, including greater spatial granularity, timeliness, lower cost, and comparability between regions regardless of statistical capacity or political interference. However, despite these benefits, the use of NTL data in regional science has been limited. This is in part due to the lack of accessible methods for processing and analyzing satellite images. To address this issue, this paper presents a user-friendly geocomputational notebook that illustrates how to process and analyze satellite NTL images. First, the notebook introduces a cloud-based Python environment for visualizing, analyzing, and transforming raster satellite images into tabular data. Next, it presents interactive tools to explore the space-time patterns of the tabulated data. Finally, it describes methods for evaluating the usefulness of NTL data in terms of their cross-sectional predictions, time-series predictions, and regional inequality dynamics.
    JEL: Y9
    Date: 2023–04
    URL: http://d.repec.org/n?u=RePEc:anf:wpaper:21&r=big
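    Illustration: the paper's notebook is the reference implementation; as a rough sketch of the raster-to-table step it describes, the following Python snippet aggregates nighttime-lights radiance by region using the rasterstats library (file names here are hypothetical placeholders, not the paper's data):
      # Illustrative sketch, not the paper's notebook: turn an NTL raster into
      # tabular data by summing radiance within each regional boundary.
      import geopandas as gpd
      from rasterstats import zonal_stats

      regions = gpd.read_file("regions.geojson")               # regional boundaries (hypothetical file)
      stats = zonal_stats("regions.geojson", "ntl_2020.tif",   # hypothetical NTL GeoTIFF
                          stats=["sum", "mean"])

      regions["ntl_sum"] = [s["sum"] for s in stats]
      regions["ntl_mean"] = [s["mean"] for s in stats]

      # The resulting table can then be merged with GDP or population data for
      # cross-sectional and time-series analysis.
      print(regions[["ntl_sum", "ntl_mean"]].head())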
  2. By: Gutierrez-Lythgoe, Antonio
    Abstract: In this article we analyse Twitter users’ perceptions of remote working. To do so, we use artificial intelligence techniques for natural language processing. Specifically, we run a Sentiment Analysis and Latent Dirichlet Allocation (LDA) on a sample of 12,986 tweets related to remote working published in Spanish. Our results show that 21.2% of the tweets present a positive sentiment, 43.5% a negative sentiment and 35.3% a neutral connotation. This article contributes to the application of machine learning and deep learning techniques in the study of the social sciences.
    Keywords: Artificial Intelligence, Sentiment analysis, Big Data, remote working, telework
    JEL: C88 D83 J22 J23
    Date: 2023–04
    URL: http://d.repec.org/n?u=RePEc:pra:mprapa:117101&r=big
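    Illustration: the sentiment scores and the 12,986-tweet sample are the paper's own; as a minimal sketch of the LDA step alone, scikit-learn can fit a topic model on a handful of made-up Spanish tweets:
      # Minimal LDA sketch on made-up tweets (illustrative only, not the paper's pipeline).
      from sklearn.feature_extraction.text import CountVectorizer
      from sklearn.decomposition import LatentDirichletAllocation

      tweets = [
          "el teletrabajo mejora la conciliacion familiar",
          "odio el teletrabajo, me siento aislado en casa",
          "mi empresa elimina el trabajo remoto a partir de mayo",
      ]

      vectorizer = CountVectorizer()
      doc_term = vectorizer.fit_transform(tweets)
      lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(doc_term)

      # Print the top words of each latent topic.
      terms = vectorizer.get_feature_names_out()
      for k, weights in enumerate(lda.components_):
          print(f"Topic {k}:", [terms[i] for i in weights.argsort()[-5:][::-1]])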
  3. By: Gutierrez-Lythgoe, Antonio
    Abstract: The evolution of cities has led to changes in urban mobility patterns, including an increased number of trips and longer, more dispersed routes. It is therefore crucial to study urban mobility efficiently in order to promote sustainability and well-being. In this context, we reviewed the existing literature on the applications of artificial intelligence (AI) in urban mobility research, focusing specifically on Deep Learning techniques such as CNN and LSTM models. These AI tools are being used to address the challenges of urban mobility research and offer new possibilities for tackling the pressing issues faced by cities, such as sustainability in transportation. AI can contribute to improving sustainability by predicting real-time traffic, optimizing transportation efficiency, and informing public policies that promote sustainable modes of transportation. In this study, we propose a Random Forest model for predicting demand for sustainable urban mobility based on machine learning, achieving accurate and consistent predictions. Overall, the application of AI in urban mobility research presents a unique opportunity to advance towards more sustainable, livable cities and resilient societies.
    Keywords: Artificial Intelligence, Urban mobility, Deep Learning, Machine Learning, sustainability
    JEL: C45 C53 Q56 R41 R42
    Date: 2023–04
    URL: http://d.repec.org/n?u=RePEc:pra:mprapa:117103&r=big
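    Illustration: a schematic Random Forest demand model of the kind the abstract proposes, trained here on synthetic calendar and weather features (all data below is simulated, not the paper's):
      # Schematic Random Forest demand prediction on synthetic features.
      import numpy as np
      from sklearn.ensemble import RandomForestRegressor
      from sklearn.model_selection import train_test_split
      from sklearn.metrics import mean_absolute_error

      rng = np.random.default_rng(0)
      n = 2000
      X = np.column_stack([
          rng.integers(0, 24, n),     # hour of day
          rng.integers(0, 7, n),      # day of week
          rng.normal(20, 5, n),       # temperature
      ])
      y = 50 + 10 * np.sin(X[:, 0] / 24 * 2 * np.pi) - 5 * (X[:, 1] >= 5) + rng.normal(0, 3, n)

      X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
      model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X_tr, y_tr)
      print("MAE:", mean_absolute_error(y_te, model.predict(X_te)))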
  4. By: Costantiello, Alberto; Leogrande, Angelo
    Abstract: We estimate the determinants of CO2 Emissions (COE) in the context of the Environmental, Social and Governance (ESG) model at the world level. We use World Bank data for 193 countries over the period 2011-2020. We find that the level of COE is positively associated with, among others, “Methane Emissions” and “Research and Development Expenditures”, and negatively associated with, among others, “Renewable Energy Consumption” and “Mean Drought Index”. Furthermore, we apply a cluster analysis with the k-Means algorithm optimized with the Elbow Method and find the presence of four clusters. Finally, we apply eight machine-learning algorithms to predict the future value of COE and find that the Artificial Neural Network (ANN) algorithm is the best predictor. The ANN predicts a reduction in the level of COE of 5.69% on average for the analysed countries.
    Keywords: Analysis of Collective Decision-Making, General, Political Processes: Rent-Seeking, Lobbying, Elections, Legislatures, and Voting Behaviour, Bureaucracy, Administrative Processes in Public Organizations, Corruption, Positive Analysis of Policy Formulation, Implementation.
    JEL: D7 D70 D72 D73 D78
    Date: 2023–04–20
    URL: http://d.repec.org/n?u=RePEc:pra:mprapa:117110&r=big
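    Illustration: a schematic version of the clustering step, k-Means with the Elbow Method, run here on simulated country-level indicators rather than the paper's ESG data:
      # k-Means with the Elbow Method on simulated data (193 "countries", 6 indicators).
      import numpy as np
      from sklearn.cluster import KMeans
      from sklearn.preprocessing import StandardScaler

      rng = np.random.default_rng(1)
      X = StandardScaler().fit_transform(rng.normal(size=(193, 6)))

      for k in range(1, 10):
          km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
          print(k, round(km.inertia_, 1))   # within-cluster sum of squares
      # The "elbow" is the k after which inertia stops falling sharply.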
  5. By: Foued Saâdaoui
    Abstract: Multifractal analysis is a forecasting technique used to study the scaling regularity properties of financial returns and to analyze the long-term memory and predictability of financial markets. In this paper, we propose a novel structural detrended multifractal fluctuation analysis (S-MF-DFA) to investigate the efficiency of the main cryptocurrencies. The new methodology generalizes the conventional approach by allowing it to proceed on the different fluctuation regimes previously determined using a change-point detection test. In this framework, the characterization of the various exogenous factors influencing the scaling behavior is performed on the basis of a single-factor model, thus creating a kind of self-explainable machine learning for price forecasting. The proposal is tested on daily data for three of the main cryptocurrencies in order to examine whether the digital market has experienced upheavals in recent years and whether this has in some way led to structured multifractal behavior. The sampled period ranges from April 2017 to December 2022. In particular, we detect common periods of local scaling for the three prices, with decreasing multifractality after 2018. Complementary tests on shuffled and surrogate data show that the distribution, linear correlation, and nonlinear structure also explain, to some extent, the structural multifractality. Finally, prediction experiments based on neural networks fed with multi-fractionally differentiated data show the value of this new self-explainable algorithm, giving decision-makers and investors the ability to use it for more accurate and interpretable forecasts.
    Date: 2023–04
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2304.08440&r=big
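    Illustration: for readers unfamiliar with the notation, the standard MF-DFA fluctuation function that S-MF-DFA builds on is given below (textbook definitions, not the paper's structural extension):
      % Profile, segment-wise detrended fluctuation, and q-th order fluctuation function.
      \[
      Y(i) = \sum_{k=1}^{i}\bigl(x_k - \bar{x}\bigr), \qquad
      F^2(\nu, s) = \frac{1}{s}\sum_{i=1}^{s}\bigl[Y\bigl((\nu-1)s + i\bigr) - y_{\nu}(i)\bigr]^2,
      \]
      \[
      F_q(s) = \left\{\frac{1}{N_s}\sum_{\nu=1}^{N_s}\bigl[F^2(\nu, s)\bigr]^{q/2}\right\}^{1/q}
      \sim s^{h(q)},
      \]
      % where y_nu(i) is the local polynomial trend in segment nu; a q-dependent
      % generalized Hurst exponent h(q) indicates multifractal scaling.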
  6. By: Soumyadip Sarkar
    Abstract: Reinforcement learning (RL) is a branch of machine learning that has been used in a variety of applications such as robotics, game playing, and autonomous systems. In recent years, there has been growing interest in applying RL to quantitative trading, where the goal is to make profitable trades in financial markets. This paper explores the use of RL in quantitative trading and presents a case study of an RL-based trading algorithm. The results show that RL can be a powerful tool for quantitative trading, and that it has the potential to outperform traditional trading algorithms. The use of reinforcement learning in quantitative trading represents a promising area of research that can potentially lead to the development of more sophisticated and effective trading systems. Future work could explore the use of alternative reinforcement learning algorithms, incorporate additional data sources, and test the system on different asset classes. Overall, our research demonstrates the potential of using reinforcement learning in quantitative trading and highlights the importance of continued research and development in this area. By developing more sophisticated and effective trading systems, we can potentially improve the efficiency of financial markets and generate greater returns for investors.
    Date: 2023–04
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2304.06037&r=big
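    Illustration: the abstract does not spell out the algorithm; as a minimal fix of ideas, a tabular Q-learning loop on a simulated price series is sketched below (the paper's deep Q-learning replaces the table with a neural network):
      # Minimal tabular Q-learning on a simulated price series (illustrative only).
      import numpy as np

      rng = np.random.default_rng(0)
      prices = 100 + np.cumsum(rng.normal(0, 1, 500))
      returns = np.diff(prices)

      n_states, n_actions = 2, 3          # state: last return up/down; actions: short/flat/long
      Q = np.zeros((n_states, n_actions))
      alpha, gamma, eps = 0.1, 0.95, 0.1  # learning rate, discount, exploration

      for t in range(1, len(returns)):
          s = int(returns[t - 1] > 0)
          a = rng.integers(n_actions) if rng.random() < eps else int(Q[s].argmax())
          reward = (a - 1) * returns[t]   # position in {-1, 0, +1} times the next return
          s_next = int(returns[t] > 0)
          Q[s, a] += alpha * (reward + gamma * Q[s_next].max() - Q[s, a])

      print("Learned Q-values:\n", Q.round(3))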
  7. By: Fantazzini, Dean
    Abstract: In this paper, we analyzed a dataset of over 2000 crypto-assets to assess their credit risk by computing their probability of death using the daily range. Unlike conventional low-frequency volatility models that only utilize close-to-close prices, the daily range incorporates all the information provided in traditional daily datasets, including the open-high-low-close (OHLC) prices for each asset. We evaluated the accuracy of the probability of death estimated with the daily range against various forecasting models, including credit scoring models, machine learning models, and time-series-based models. Our study considered different definitions of ``dead coins'' and various forecasting horizons. Our results indicate that credit scoring models and machine learning methods incorporating lagged trading volumes and online searches were the best models for short-term horizons up to 30 days. Conversely, time-series models using the daily range were more appropriate for longer term forecasts, up to one year. Additionally, our analysis revealed that the models using the daily range signaled, far in advance, the weakened credit position of the crypto derivatives trading platform FTX, which filed for Chapter 11 bankruptcy protection in the United States on 11 November 2022.
    Keywords: daily range; bitcoin; crypto-assets; cryptocurrencies; credit risk; default probability; probability of death; ZPP; cauchit; random forests
    JEL: C32 C35 C51 C53 C58 G12 G17 G32 G33
    Date: 2023
    URL: http://d.repec.org/n?u=RePEc:pra:mprapa:117141&r=big
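    Illustration: a standard daily-range (Parkinson) volatility estimator, the kind of OHLC-based measure the paper's models build on, sketched here on hypothetical prices:
      # Parkinson range-based volatility from daily highs and lows (hypothetical prices).
      import numpy as np

      high = np.array([105.0, 107.2, 103.8, 110.5, 108.0])
      low = np.array([98.5, 101.0, 99.2, 102.3, 104.1])

      # Parkinson variance: mean of ln(H/L)^2 divided by 4*ln(2).
      park_var = np.mean(np.log(high / low) ** 2) / (4 * np.log(2))
      daily_vol = np.sqrt(park_var)
      annualized = daily_vol * np.sqrt(365)   # crypto markets trade every day

      print(f"daily: {daily_vol:.4f}  annualized: {annualized:.4f}")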
  8. By: Costantiello, Alberto; Leogrande, Angelo
    Abstract: We estimate the value of Research and Development Expenditures as a percentage of GDP (RDE) in the context of the Environmental, Social and Governance (ESG) model. We use the World Bank ESG database and analyze data from 193 countries over the period 2011-2020. We apply a set of econometric techniques, i.e., Pooled Ordinary Least Squares (OLS), Panel Data with Random Effects, Panel Data with Fixed Effects, and Weighted Least Squares (WLS). We find that the level of RDE is positively associated with, among others, “Nitrous Oxide Emissions” and “Scientific and Technical Journal Articles”, and negatively associated with, among others, “Heat Index 35” and “Maximum 5-day Rainfall”. Furthermore, we perform a cluster analysis with the k-Means algorithm optimized with the Elbow Method. The results show the presence of four clusters. Finally, we compare eight different machine-learning algorithms to predict the future value of RDE and find that Linear Regression is the best predictive algorithm. RDE is expected to grow by 0.07% on average for the analysed countries.
    Keywords: Analysis of Collective Decision-Making, General, Political Processes: Rent-Seeking, Lobbying, Elections, Legislatures, and Voting Behaviour, Bureaucracy, Administrative Processes in Public Organizations, Corruption, Positive Analysis of Policy Formulation, Implementation.
    JEL: D7 D70 D72 D73 D78
    Date: 2023–04–10
    URL: http://d.repec.org/n?u=RePEc:pra:mprapa:117013&r=big
  9. By: Kristóf Németh; Dániel Hadházi
    Abstract: In our study, we apply different statistical models to nowcast quarterly GDP growth for the US economy. Using the monthly FRED-MD database, we compare the nowcasting performance of the dynamic factor model (DFM) and four artificial neural networks (ANNs): the multilayer perceptron (MLP), the one-dimensional convolutional neural network (1D CNN), the long short-term memory network (LSTM), and the gated recurrent unit (GRU). The empirical analysis presents results from two distinctly different evaluation periods. The first (2010:Q1–2019:Q4) is characterized by balanced economic growth, while the second (2010:Q1–2022:Q3) also includes periods of the COVID-19 recession. According to our results, longer input sequences result in more accurate nowcasts in periods of balanced economic growth. However, this effect ceases above a relatively low threshold value of around six quarters (eighteen months). During periods of economic turbulence (e.g., during the COVID-19 recession), longer training sequences do not help the models' predictive performance; instead, they seem to weaken their generalization capability. Our results show that the 1D CNN, with the same parameters, generates accurate nowcasts in both evaluation periods. Consequently, we are the first in the literature to propose the use of this specific neural network architecture for economic nowcasting.
    Date: 2023–04
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2304.05805&r=big
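    Illustration: a bare-bones version of one of the compared architectures, an LSTM mapping a window of monthly indicators to quarterly GDP growth; shapes and data below are synthetic stand-ins for the FRED-MD inputs:
      # Bare-bones LSTM nowcaster on synthetic data (shapes are hypothetical).
      import numpy as np
      import tensorflow as tf

      n_samples, window, n_features = 200, 18, 50   # 18 months of 50 monthly indicators
      X = np.random.normal(size=(n_samples, window, n_features)).astype("float32")
      y = np.random.normal(size=(n_samples, 1)).astype("float32")   # quarterly GDP growth

      model = tf.keras.Sequential([
          tf.keras.Input(shape=(window, n_features)),
          tf.keras.layers.LSTM(32),
          tf.keras.layers.Dense(1),
      ])
      model.compile(optimizer="adam", loss="mse")
      model.fit(X, y, epochs=2, batch_size=32, verbose=0)

      print("Nowcast for the latest window:", float(model.predict(X[-1:], verbose=0)[0, 0]))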
  10. By: Mahsa Tavakoli; Rohitash Chandra; Fengrui Tian; Cristián Bravo
    Abstract: Knowing which factors are significant in credit rating assignment leads to better decision-making. However, the focus of the literature thus far has been mostly on structured data, and fewer studies have addressed unstructured or multi-modal datasets. In this paper, we present an analysis of the most effective architectures for the fusion of deep learning models for the prediction of company credit rating classes, by using structured and unstructured datasets of different types. In these models, we tested different combinations of fusion strategies with different deep learning models, including CNN, LSTM, GRU, and BERT. We studied data fusion strategies in terms of level (including early and intermediate fusion) and techniques (including concatenation and cross-attention). Our results show that a CNN-based multi-modal model with two fusion strategies outperformed other multi-modal techniques. In addition, by comparing simple architectures with more complex ones, we found that more sophisticated deep learning models do not necessarily produce the highest performance; however, if attention-based models are producing the best results, cross-attention is necessary as a fusion strategy. Finally, our comparison of rating agencies on short-, medium-, and long-term performance shows that Moody's credit ratings outperform those of other agencies like Standard & Poor's and Fitch Ratings.
    Date: 2023–04
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2304.10740&r=big
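    Illustration: a schematic fusion-by-concatenation model, one of the strategies the paper compares, combining a text-embedding branch with a numerical branch before the rating head (dimensions below are hypothetical):
      # Schematic concatenation fusion of a text branch and a numerical branch.
      import tensorflow as tf

      text_in = tf.keras.Input(shape=(768,), name="text_embedding")    # e.g. a sentence vector
      num_in = tf.keras.Input(shape=(20,), name="financial_ratios")

      x_text = tf.keras.layers.Dense(64, activation="relu")(text_in)
      x_num = tf.keras.layers.Dense(16, activation="relu")(num_in)
      fused = tf.keras.layers.Concatenate()([x_text, x_num])
      out = tf.keras.layers.Dense(5, activation="softmax", name="rating_class")(fused)

      model = tf.keras.Model(inputs=[text_in, num_in], outputs=out)
      model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
      model.summary()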
  11. By: Daniel Oeltz; Jan Hamaekers; Kay F. Pilz
    Abstract: We discuss and analyze a neural network architecture that enables learning a model class for a set of different data samples rather than just learning a single model for a specific data sample. In this sense, it may help to reduce the overfitting problem, since, after learning the model class over a larger data sample consisting of such different data sets, just a few parameters need to be adjusted for modeling a new, specific problem. After analyzing the method theoretically and through regression examples for different one-dimensional problems, we finally apply the approach to one of the standard problems asset managers and banks are facing: the calibration of spread curves. The presented results clearly show the potential that lies within this method. Furthermore, this application is of particular interest to financial practitioners, since nearly all asset managers and banks with solutions in place may need to adapt or even change their current methodologies when ESG ratings additionally affect bond spreads.
    Date: 2023–04
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2304.08883&r=big
  12. By: Byrne, David (Central Bank of Ireland); Goodhead, Robert (Central Bank of Ireland); McMahon, Michael (University of Oxford); Parle, Conor (European Central Bank and Trinity College Dublin)
    Abstract: Discussions of time are central to many questions in the social sciences and to official announcements of policy. Despite the growing popularity of applying Natural Language Processing (NLP) techniques to social science research questions, before now there have been few attempts to measure expressions of time. This paper provides a methodology to measure the “third T of Text”: the Time dimension. We also survey the techniques used to measure the other Ts, namely Topic and Tone. We document key stylised facts relating to temporal information in a corpus of policymaker speeches.
    Keywords: Textual analysis, Machine Learning, Communication.
    JEL: C55 C80 E58
    Date: 2023–02
    URL: http://d.repec.org/n?u=RePEc:cbi:wpaper:2/rt/23&r=big
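    Illustration: the measurement methodology is the paper's own contribution; a much simpler proxy for a sentence's temporal orientation, counting past, present and future verb forms with part-of-speech tags, is sketched below:
      # Crude temporal-orientation proxy via POS tags (not the paper's methodology).
      import nltk

      nltk.download("averaged_perceptron_tagger", quiet=True)

      sentence = "Inflation rose last year , is moderating now , and will return to target"
      tags = nltk.pos_tag(sentence.split())

      past = sum(1 for _, t in tags if t in ("VBD", "VBN"))
      present = sum(1 for _, t in tags if t in ("VBP", "VBZ", "VBG"))
      future = sum(1 for w, t in tags if t == "MD" and w.lower() in ("will", "shall"))
      print({"past": past, "present": present, "future": future})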
  13. By: Malte Jahn
    Abstract: Time series of counts are frequently analyzed using generalized integer-valued autoregressive models with conditional heteroskedasticity (INGARCH). These models employ response functions to map a vector of past observations and past conditional expectations to the conditional expectation of the present observation. In this paper, it is shown how INGARCH models can be combined with artificial neural network (ANN) response functions to obtain a class of nonlinear INGARCH models. The ANN framework allows for the interpretation of many existing INGARCH models as a degenerate version of a corresponding neural model. Details on maximum likelihood estimation, marginal effects and confidence intervals are given. The empirical analysis of time series of bounded and unbounded counts reveals that the neural INGARCH models are able to outperform reasonable degenerate competitor models in terms of the information loss.
    Date: 2023–04
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2304.01025&r=big
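    Illustration: schematically, with y_t a count and lambda_t its conditional mean, the linear INGARCH(1,1) recursion and the neural generalization described in the abstract can be written as:
      % y_t | past ~ Poisson(lambda_t), with the conditional mean following either
      \[
      \lambda_t = \beta_0 + \beta_1\, y_{t-1} + \alpha_1\, \lambda_{t-1}
      \quad\text{(linear INGARCH(1,1))}
      \]
      \[
      \lambda_t = f_{\theta}\bigl(y_{t-1}, \lambda_{t-1}\bigr)
      \quad\text{(neural INGARCH, } f_{\theta}\text{ an ANN with positive output),}
      \]
      % so the linear model is recovered as a degenerate special case of the ANN
      % response function, as the abstract notes.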
  14. By: Mingyu Hao; Artem Lenskiy
    Abstract: As a newly emerged asset class, cryptocurrency is evidently more volatile than traditional equity markets. Due to its largely unregulated nature and often low liquidity, the price of crypto assets can change significantly within minutes, which in turn might result in considerable losses. In this paper, we encode market information into images and predict short-term realized volatility using Convolutional Neural Networks. We then compare the performance of the proposed encoding and corresponding model with other benchmark models. The experimental results demonstrate that this representation of market data, with a Convolutional Neural Network as the predictive model, has the potential to better capture market dynamics and yield better volatility predictions.
    Date: 2023–04
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2304.02472&r=big
  15. By: Costanza Catalano (Bank of Italy); Andrea Carboni (Bank of Italy); Claudio Doria (Bank of Italy)
    Abstract: In tourism statistics it is becoming more and more important to identify data sources that are more timely and cheaper than the traditional ones, such as surveys. In this paper, we investigate how mobile phone data (MPD), electronic payments data and internet search data (Google Trends) can improve the compilation of tourism statistics and the 'travel' item in the Balance of Payments (BoP). We find that MPD have the potential to improve the estimates of the number of international travelers and can be integrated with surveys, although a constant interaction with the data supplier is required to identify the phenomena to be captured. We highlight the limitations and issues in using electronic payment data for estimating expenditure in tourism statistics, and we propose a model for producing more timely preliminary estimates for BoP purposes. Finally, we point out that Google Trends data can be used to complement the sample estimates of international travelers and to improve the quality of provisional data.
    Keywords: big data, international tourism, mobile phone data, payments statistics, Google Trends
    JEL: I31 I32 D63 D31
    Date: 2023–04
    URL: http://d.repec.org/n?u=RePEc:bdi:opques:qef_761_23&r=big
  16. By: Alfredo B. Roisenzvit
    Abstract: In this paper, we present a comprehensive analysis of the technology underpinning Generative Pre-trained Transformer (GPT) models, with a particular emphasis on the interrelationships between Euclidean distance, spatial classification, and the functioning of GPT models. Our investigation begins with a thorough examination of Euclidean distance, elucidating its role as a fundamental metric for quantifying the proximity between points in a multi-dimensional space. Following this, we provide an overview of spatial classification techniques, explicating their utility in discerning patterns and relationships within complex data structures. With this foundation, we delve into the inner workings of GPT models, outlining their architectural components, such as the self-attention mechanism and positional encoding. We then explore the process of training GPT models, detailing the significance of tokenization and embeddings. Additionally, we scrutinize the role of Euclidean distance and spatial classification in enabling GPT models to effectively process input sequences and generate coherent output in a wide array of natural language processing tasks. Ultimately, this paper aims to provide a comprehensive understanding of the intricate connections between Euclidean distance, spatial classification, and GPT models, fostering a deeper appreciation of their collective impact on the advancements in artificial intelligence and natural language processing.
    Date: 2023–04
    URL: http://d.repec.org/n?u=RePEc:cem:doctra:853&r=big
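    Illustration: a toy numerical example of the geometric intuition the paper discusses: words represented as vectors and compared by Euclidean distance (the three-dimensional vectors below are invented; real GPT embeddings have thousands of dimensions):
      # Toy example: Euclidean distance between invented word vectors.
      import numpy as np

      embeddings = {
          "king": np.array([0.80, 0.65, 0.10]),
          "queen": np.array([0.78, 0.70, 0.12]),
          "banana": np.array([0.10, 0.05, 0.90]),
      }

      query = embeddings["king"]
      for word, vec in embeddings.items():
          print(f"distance(king, {word}) = {np.linalg.norm(query - vec):.3f}")
      # The nearest neighbour of "king" (other than itself) is "queen":
      # proximity in the embedding space encodes semantic similarity.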
  17. By: Greig Cowan; Salvatore Mercuri; Raad Khraishi
    Abstract: Understanding customer lifetime value is key to nurturing long-term customer relationships; however, estimating it is far from straightforward. In the retail banking industry, commonly used approaches rely on simple heuristics and do not take advantage of the high predictive ability of modern machine learning techniques. We present a general framework for modelling customer lifetime value which may be applied to industries with long-lasting contractual and product-centric customer relationships, of which retail banking is an example. This framework is novel in facilitating CLV predictions over arbitrary time horizons and product-based propensity models. We also detail an implementation of this model which is currently in production at a large UK lender. In testing, we estimate a 43% improvement in out-of-time CLV prediction error relative to a popular baseline approach. Propensity models derived from our CLV model have been used to support customer contact marketing campaigns. In testing, we saw that the top 10% of customers ranked by their propensity to take up investment products were 3.2 times more likely to take up an investment product in the next year than a customer chosen at random.
    Date: 2023–04
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2304.03038&r=big
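    Illustration: the production model is described in the paper; as a back-of-the-envelope schematic, customer lifetime value can be written as a discounted sum over products and years of take-up propensity times expected margin (all numbers below are invented):
      # Schematic CLV: discounted sum of propensity-weighted product margins.
      propensities = {"mortgage": 0.04, "investment": 0.10, "credit_card": 0.25}  # annual take-up prob.
      margins = {"mortgage": 900.0, "investment": 300.0, "credit_card": 120.0}    # expected annual margin
      discount_rate, horizon_years = 0.05, 5

      clv = sum(
          propensities[p] * margins[p] / (1 + discount_rate) ** t
          for p in propensities
          for t in range(1, horizon_years + 1)
      )
      print(f"Schematic 5-year CLV: {clv:.2f}")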
  18. By: David Amaglobeli; Mariano Moszoro; Utkarsh Kumar
    Abstract: We identify key drivers of digital adoption, estimate fiscal costs to provide internet subsidies to households, and calculate social dividends from digital adoption. Using cross-country panel regressions and machine learning, we find that digital infrastructure coverage, internet price, and usability are the most statistically robust predictors of internet use in the short run. Based on estimates from a model of demand for internet, we find that demand is most price responsive in low-income developing countries and almost unresponsive in advanced economies. We estimate that moving low-income developing and emerging market economies to the levels of digital adoption in emerging and advanced economies, respectively, will require annual targeted subsidies of 1.8 and 0.05 percent of GDP, respectively. To aid with subsidy targeting, we use microdata from over 150 countries and document a digital divide by gender, socio-economic status, and demographics. Finally, we find substantial aggregate and distributional gains from digital adoption for education quality, time spent doing unpaid work, and labor force participation by gender.
    Keywords: Social Dividends; Digitalization; GovTech; Internet use; internet price; Internet adoption; Internet coverage; internet use; Labor force participation; Income; Women; Purchasing power parity; Sub-Saharan Africa
    Date: 2023–03–17
    URL: http://d.repec.org/n?u=RePEc:imf:imfwpa:2023/065&r=big
  19. By: Alejandro Lopez-Lira; Yuehua Tang
    Abstract: We examine the potential of ChatGPT, and other large language models, in predicting stock market returns using sentiment analysis of news headlines. We use ChatGPT to indicate whether a given headline is good, bad, or irrelevant news for firms' stock prices. We then compute a numerical score and document a positive correlation between these ``ChatGPT scores'' and subsequent daily stock market returns. Further, ChatGPT outperforms traditional sentiment analysis methods. We find that more basic models such as GPT-1, GPT-2, and BERT cannot accurately forecast returns, indicating return predictability is an emerging capacity of complex models. Our results suggest that incorporating advanced language models into the investment decision-making process can yield more accurate predictions and enhance the performance of quantitative trading strategies.
    Date: 2023–04
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2304.07619&r=big
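    Illustration: the headline classification itself comes from prompting the language model; the aggregation into daily scores and the correlation with next-day returns can be sketched as follows, with a placeholder standing in for the model call and invented data:
      # Aggregate headline labels into daily scores and correlate with next-day returns.
      import numpy as np

      def classify_headline(headline: str) -> int:
          """Placeholder for an LLM call returning +1 (good), -1 (bad) or 0 (irrelevant)."""
          text = headline.lower()
          return 1 if "beats" in text else -1 if "misses" in text else 0

      headlines_by_day = {    # invented headlines
          "2023-04-03": ["ACME beats earnings estimates", "Sector outlook unchanged"],
          "2023-04-04": ["ACME misses revenue target", "CEO steps down"],
      }
      next_day_returns = {"2023-04-03": 0.012, "2023-04-04": -0.008}   # invented returns

      days = sorted(headlines_by_day)
      scores = [np.mean([classify_headline(h) for h in headlines_by_day[d]]) for d in days]
      corr = np.corrcoef(scores, [next_day_returns[d] for d in days])[0, 1]
      print("daily scores:", dict(zip(days, scores)), " correlation:", round(corr, 3))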
  20. By: Byrne, David (Central Bank of Ireland); Goodhead, Robert (Central Bank of Ireland); McMahon, Michael (University of Oxford); Parle, Conor (European Central Bank and Trinity College Dublin)
    Abstract: Effective central bank communication provides information that the public wants but does not have. Using a new textual methodology to quantify the temporal information in central bank communication, we argue that central bank assessments of the (latent) state of the economy can be the source of the public’s information deficit, rather than necessarily superior information. The implication is that communication of a single, fixed reaction function, even if desirable, is likely impossible even if preferences remain fixed over time. Communication of how the central bank is assessing the economy should be emphasised in addition to any forward guidance.
    Keywords: Monetary Policy, Communication, Natural Language Processing.
    JEL: E52 E58 C55
    Date: 2023–02
    URL: http://d.repec.org/n?u=RePEc:cbi:wpaper:1/rt/23&r=big
  21. By: Adukia, Anjali (Harris School, University of Chicago); Eble, Alex (Columbia University); Harrison, Emileigh (University of Chicago); Runesha, Hakizumwami Birali (University of Chicago); Szasz, Teodora (University of Chicago)
    Abstract: Books shape how children learn about society and norms, in part through representation of different characters. We introduce new artificial intelligence methods for systematically converting images into data and apply them, along with text analysis methods, to measure the representation of skin color, race, gender, and age in award-winning children's books widely read in homes, classrooms, and libraries over the last century. We find that more characters with darker skin color appear over time, but the most influential books persistently depict characters with lighter skin color, on average, than other books, even after conditioning on race; we also find that children are depicted with lighter skin than adults on average. Relative to their growing share of the U.S. population, Black and Latinx people are underrepresented in these same books, while White males are overrepresented. Over time, females are increasingly present but appear less often in text than in images, suggesting greater symbolic inclusion in pictures than substantive inclusion in stories. We then present analysis of the supply of, and demand for, books with different levels of representation to better understand the economic behavior that may contribute to these patterns. On the demand side, we show that people consume books that center their own identities. On the supply side, we document higher prices for books that center non-dominant social identities and fewer copies of these books in libraries that serve predominantly White communities. Lastly, we show that the types of children's books purchased in a neighborhood are related to local political beliefs.
    Keywords: representation, images as data, curriculum, children, education, libraries, race, gender
    JEL: I24 I21 Z1 J15 J16
    Date: 2023–04
    URL: http://d.repec.org/n?u=RePEc:iza:izadps:dp16058&r=big
  22. By: Donia Kamel; Laura Pollacci
    Abstract: Iron Curtain and Big Data are two terms usually used to denote two completely different eras. Yet the context the former offers and the rich data source the latter provides enable the causal identification of the effect of networks on migration. Academics in countries behind the Iron Curtain were strongly isolated from the rest of the world. This context poses the question of the importance of academic networks for migration after the fall of the Berlin Wall and the Iron Curtain. Using the Microsoft Academic Knowledge Graph, a scholarly big data source, we map academics' networks and obtain information about the size and quality of their co-authorships, by location. Focusing on academics from Eastern Europe (henceforth EE) from 1980-1988 and their academic networks (1980-1988), we investigate the effect of academic network characteristics, by location, on the probability of migrating after the fall of the Berlin Wall in 1989 and up to 2003, the year many EE countries held referendums or signed treaties to join the EU. The unique context ensures that there was no anticipation of the fall of the Eastern Bloc, and together with the uniquely rich information the data offers, identification is achieved. Approximately 30k academics from EE were identified, of whom 3% were migrants. The results can be explained by two channels: the cost channel and the signalling channel. The cost channel captures how a network characteristic reduces or increases the cost of migration, thus acting as a facilitator or de-facilitator of migration. The signalling channel, on the other hand, captures how a network characteristic serves as a signal of the academic's quality and potential contribution to the new host institution, thus also acting as a facilitator or de-facilitator of migration. We find that the network size and network quality results are mostly explained by the cost channel and the signalling channel, respectively. The size of the network tends to be more important than its quality, which is a context-specific result. We find heterogeneous effects by field of study that align with previous lines of research. These heterogeneous effects are explained by two factors: the threat of attention and arrest by the KGB, and the role of reputation, language, and network barriers.
    Keywords: networks, migration, academic networks, Big Data, brain drain, Iron Curtain, Eastern Europe
    JEL: C55 D85 F50 I20 I23 J24 N34 N44 O15
    Date: 2023
    URL: http://d.repec.org/n?u=RePEc:ces:ceswps:_10377&r=big
  23. By: Taeyoung Doh; Joseph W. Gruber; Dongho Song
    Abstract: A purported benefit of state-based forward guidance is that the private sector adjusts the expected stance of policy without further policymaker communications. This assumes a shared understanding of how policymakers are interpreting the data and that policymakers are consistent in their assessment of the data. Using text analysis, we test whether the FOMC’s introduction of state-based forward guidance in December 2012 changed the tone of policymaker communications. We find that policymakers tended to downplay positive data following the introduction of the guidance, in effect leaning against the data and reinforcing the dependence of policy expectations on policymaker communications.
    Keywords: Monetary policy; Forward guidance; Financial markets
    JEL: E30 E40 E50 G12
    Date: 2022–09–08
    URL: http://d.repec.org/n?u=RePEc:fip:fedkrw:94764&r=big
  24. By: Dirk Bergemann; Alessandro Bonatti
    Abstract: We analyze digital markets where a monopolist platform uses data to match multiproduct sellers with heterogeneous consumers who can purchase both on and off the platform. The platform sells targeted ads to sellers that recommend their products to consumers and reveals information to consumers about their values. The revenue-optimal mechanism is a managed advertising campaign that matches products and preferences efficiently. In equilibrium, sellers offer higher qualities at lower unit prices on than off the platform. Privacy-respecting data-governance rules such as organic search results or federated learning can lead to welfare gains for consumers.
    Date: 2023–04
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2304.07653&r=big

This nep-big issue is ©2023 by Tom Coupé. It is provided as is without any express or implied warranty. It may be freely redistributed in whole or in part for any purpose. If distributed in part, please include this notice.
General information on the NEP project can be found at http://nep.repec.org. For comments please write to the director of NEP, Marco Novarese at <director@nep.repec.org>. Put “NEP” in the subject, otherwise your mail may be rejected.
NEP’s infrastructure is sponsored by the School of Economics and Finance of Massey University in New Zealand.