nep-big 2024-01-01 papers

on Big Data

Issue of 2024‒01‒01
fifteen papers chosen by
Tom Coupé, University of Canterbury

A Data-driven Deep Learning Approach for Bitcoin Price Forecasting By Parth Daxesh Modi; Kamyar Arshi; Pertami J. Kunz; Abdelhak M. Zoubir
Predicting the Law: Artificial Intelligence Findings from the IMF’s Central Bank Legislation Database By Khaled AlAjmi; Jose Deodoro; Mr. Ashraf Khan; Kei Moriya
Whose Inflation Rates Matter Most? A DSGE Model and Machine Learning Approach to Monetary Policy in the Euro Area By Stempel, Daniel; Zahner, Johannes
Boosting Stock Price Prediction with Anticipated Macro Policy Changes By Md Sabbirul Haque; Md Shahedul Amin; Jonayet Miah; Duc Minh Cao; Ashiqul Haque Ahmed
Forecasting Economic Activity with a Neural Network in Uncertain Times: Monte Carlo Evidence and Application to German GDP By Holtemöller, Oliver; Kozyrev, Boris
Estimation of Semiparametric Multi–Index Models Using Deep Neural Networks By Chaohua Dong; Jiti Gao; Bin Peng; Yayi Yan
“Sovereign Risk and Economic Complexity: Machine Learning Insights on Causality and Prediction” By Jose E. Gomez-Gonzalez; Jorge M. Uribe; Oscar M. Valencia
Grammar In Language Models: Bert Study By Ksenia E. Chistyakova; Tatiana B. Kazakova
Predicting Recessions in (almost) Real Time in a Big-data Setting By Alexandre Bonnet R. Costa; Pedro Cavalcanti G. Ferreira; Wagner Piazza Gaglianone; Osmani Teixeira C. Guillén; João Victor Issler; Artur Brasil Fialho Rodrigues
The Determinants of Missed Funding: Predicting the Paradox of Increased Need and Reduced Allocation By Di Stefano, Roberta; Resce, Giuliano
Stigma and Take-up of Labor Market Assistance: Evidence from Two Field Experiments By Osman, Adam; Speer, Jamin D.
Do we listen to what we are told? An empirical study on human behaviour during the COVID-19 pandemic: neural networks vs. regression analysis By Yuxi Heluo; Kexin Wang; Charles W. Robson
Multi-Layer Spillovers between Volatility and Skewness in International Stock Markets Over a Century of Data: The Role of Disaster Risks By Matteo Foglia; Vasilios Plakandaras; Rangan Gupta; Elie Bouri
Categorías municipales en Colombia: Avanzando hacia un modelo de descentralización asimétrica By Karina Acosta; Yuri Reina-Aranza
Doombot: a machine learning algorithm for predicting downturns in OECD countries By Thomas Chalaux; David Turner

A Data-driven Deep Learning Approach for Bitcoin Price Forecasting

By:	Parth Daxesh Modi; Kamyar Arshi; Pertami J. Kunz; Abdelhak M. Zoubir
Abstract:	Bitcoin has been one of the most important digital coins and was the first decentralized digital currency. Deep neural networks have shown promising results recently; however, they require a huge amount of high-quality data to realize their power. Techniques such as augmentation can increase the dataset size, but they cannot be applied to historical bitcoin data. As a result, we propose a shallow Bidirectional-LSTM (Bi-LSTM) model, fed with data produced by our proposed feature engineering method, to forecast bitcoin closing prices at a daily time frame. We compare its performance with that of other forecasting methods and show that, with the help of the proposed feature engineering method, a shallow deep neural network outperforms other popular price forecasting models.
Date:	2023–10
URL:	http://d.repec.org/n?u=RePEc:arx:papers:2311.06280&r=big

Predicting the Law: Artificial Intelligence Findings from the IMF’s Central Bank Legislation Database

By:	Khaled AlAjmi; Jose Deodoro; Mr. Ashraf Khan; Kei Moriya
Abstract:	Using the 2010, 2015, and 2020/2021 datasets of the IMF’s Central Bank Legislation Database (CBLD), we explore artificial intelligence (AI) and machine learning (ML) approaches to analyzing patterns in central bank legislation. Our findings highlight that: (i) a simple Naïve Bayes algorithm can link CBLD search categories with a significant and increasing level of accuracy to specific articles and phrases in articles in laws (i.e., predict search classification); (ii) specific patterns or themes emerge across central bank legislation (most notably, on central bank governance, central bank policy and operations, and central bank stakeholders and transparency); and (iii) other AI/ML approaches yield interesting results, meriting further research.
Keywords:	central bank legislation; central banking; artificial intelligence; machine learning; Bayesian algorithm; Boolean algorithm; central bank governance; law and economics
Date:	2023–11–17
URL:	http://d.repec.org/n?u=RePEc:imf:imfwpa:2023/241&r=big

Whose Inflation Rates Matter Most? A DSGE Model and Machine Learning Approach to Monetary Policy in the Euro Area

By:	Stempel, Daniel; Zahner, Johannes
JEL:	E58 C45 C53
Date:	2023
URL:	http://d.repec.org/n?u=RePEc:zbw:vfsc23:277627&r=big

Boosting Stock Price Prediction with Anticipated Macro Policy Changes

By:	Md Sabbirul Haque; Md Shahedul Amin; Jonayet Miah; Duc Minh Cao; Ashiqul Haque Ahmed
Abstract:	Prediction of stock prices plays a significant role in aiding the decision-making of investors. Given its importance, a growing literature has emerged that tries to forecast stock prices with improved accuracy. In this study, we introduce an approach that incorporates information on the external economic environment along with stock prices. We improve the performance of stock price prediction by taking into account variations due to expected future macroeconomic policy changes, since investors adjust their current behavior ahead of time based on those expectations. Furthermore, we incorporate macroeconomic variables along with historical stock prices to make predictions. Results from this study strongly support the inclusion of future economic policy changes along with current macroeconomic information. We confirm the superiority of our method over the conventional approach using several tree-based machine-learning algorithms, and the results are consistent across the machine learning models considered. Our preferred model achieves an RMSE of 1.61, compared with an RMSE of 1.75 for the conventional approach.
Date:	2023–10
URL:	http://d.repec.org/n?u=RePEc:arx:papers:2311.06278&r=big

Forecasting Economic Activity with a Neural Network in Uncertain Times: Monte Carlo Evidence and Application to German GDP

By:	Holtemöller, Oliver; Kozyrev, Boris
JEL:	C22 C45 C53
Date:	2023
URL:	http://d.repec.org/n?u=RePEc:zbw:vfsc23:277688&r=big

Estimation of Semiparametric Multi–Index Models Using Deep Neural Networks

By:	Chaohua Dong; Jiti Gao; Bin Peng; Yayi Yan
Abstract:	In this paper, we consider estimation and inference for both the multi-index parameters and the link function involved in a class of semiparametric multi–index models via deep neural networks (DNNs). We contribute to the design of DNN by i) providing more transparency for practical implementation, ii) defining different types of sparsity, iii) showing the differentiability, iv) pointing out the set of effective parameters, and v) offering a new variant of rectified linear activation function (ReLU), etc. Asymptotic properties for the joint estimates of both the index parameters and the link functions are established, and a feasible procedure for the purpose of inference is also proposed. We conduct extensive numerical studies to examine the finite-sample performance of the estimation methods, and we also evaluate the empirical relevance and applicability of the proposed models and estimation methods to real data.
Keywords:	asymptotic theory, multi-index model, ReLU, semiparametric regression
Date:	2023
URL:	http://d.repec.org/n?u=RePEc:msh:ebswps:2023-21&r=big

“Sovereign Risk and Economic Complexity: Machine Learning Insights on Causality and Prediction”

By:	Jose E. Gomez-Gonzalez (City University of New York-Lehman College (USA). Visiting Professor - Universidad de la Sabana); Jorge M. Uribe (Universitat Oberta de Catalunya, Barcelona (Spain)); Oscar M. Valencia (Fiscal Management Division, Inter-American Development Bank, Washington (USA).)
Abstract:	We investigate how a country’s economic complexity influences its sovereign yield spread with respect to the US. We analyze various maturities across 28 countries, consisting of 16 emerging and 12 advanced economies. Notably, a one-unit increase in the economic complexity index is associated with a reduction of about 87 basis points in the 10-year yield spread (p
Keywords:	Sovereign Credit Risk, Convenience Yields, Yield Curve, Government Debt, Double-Machine-Learning, XGBoost. JEL classification: F34, G12, G15, H63, O40.
Date:	2023–11
URL:	http://d.repec.org/n?u=RePEc:ira:wpaper:202315&r=big

Grammar In Language Models: Bert Study

By:	Ksenia E. Chistyakova (National Research University Higher School of Economics); Tatiana B. Kazakova (National Research University Higher School of Economics)
Abstract:	The problem of interpreting language models has been studied extensively, but no universal answers have been found. Our study proposes combining widely accepted probing methods with a novel approach to the neural network under investigation. We propose breaking grammatical forms at the pre-training step in order to obtain two "sibling" models, which sheds some light on how different linguistic features are encoded and distributed across the neural language architecture.
Keywords:	probing, language models, transformers, BERT.
JEL:	Z
Date:	2023
URL:	http://d.repec.org/n?u=RePEc:hig:wpaper:115/lng/2023&r=big

Predicting Recessions in (almost) Real Time in a Big-data Setting

By:	Alexandre Bonnet R. Costa; Pedro Cavalcanti G. Ferreira; Wagner Piazza Gaglianone; Osmani Teixeira C. Guillén; João Victor Issler; Artur Brasil Fialho Rodrigues
Abstract:	The objective of this paper is to propose an approach for dating recessions in real time (or slightly a posteriori) that is suitable for a big data environment. Our proposal is to combine the canonical correlation approach of Issler and Vahid (2006) with the big data approach advocated by Stock and Watson (2014), incorporating the good elements of each. This involves solving the problems of both missing data and high dimensionality in big databases, as well as defining a decision rule on how to choose the best forecasting model in real time. Our empirical results show it is possible to track the state of the U.S. and European economies using the models developed here, as long as appropriate techniques to reduce the dimensionality of the databases are implemented, namely canonical correlations coupled with principal component analysis. Depending on the cutoffs chosen, the models predict recessions in real time with an accuracy of 98% and 80%, respectively, for the U.S. and the Euro Area.
Date:	2023–11
URL:	http://d.repec.org/n?u=RePEc:bcb:wpaper:587&r=big

The Determinants of Missed Funding: Predicting the Paradox of Increased Need and Reduced Allocation

By:	Di Stefano, Roberta; Resce, Giuliano
Abstract:	This research investigates how local governments overlook competitive funding opportunities within cohesion policies, utilizing machine learning and analyzing data from open calls within the European Next Generation EU funds. The focus is on predicting which local governments may face challenges in utilizing available funding, specifically examining the allocation of funds for Italian childcare services. The results demonstrate that it is possible to make out-of-sample predictions of municipalities that are likely to abstain from invitations, also identifying key determinants. Population-related factors play a pivotal role in predicting inertia, alongside other service-demand-related elements, particularly in regions with limited services. The study emphasizes the importance of local institutional quality and individual attributes of policymakers. The adverse effects on participation resulting from factors that justify fund allocation may place regions with higher investment needs at a competitive disadvantage. Anticipating potential non-participants in calls can aid in achieving policy targets and optimizing the allocation of funds across various local governments.
Keywords:	Competitive funding; Cohesion policies; Predictive modeling; Machine learning.
JEL:	H5 H7 I3 J1 R5
URL:	http://d.repec.org/n?u=RePEc:mol:ecsdps:esdp23092&r=big

Stigma and Take-up of Labor Market Assistance: Evidence from Two Field Experiments

By:	Osman, Adam (University of Illinois at Urbana-Champaign); Speer, Jamin D. (University of Memphis)
Abstract:	Aversion to "stigma" - disutility associated with a program or activity due to beliefs about how it is perceived - may affect labor market choices and utilization of social programs, but empirical evidence of its importance is scarce. Using two randomized field experiments, we show that stigma can affect consequential labor market decisions. Treatments designed to alleviate stigma concerns about taking entry-level jobs - such as how those jobs are perceived by society - had small average effects on take-up of job assistance programs. However, using compositional analysis and machine learning methods, we document large heterogeneity in the responses to our treatments. Stigma significantly affects the composition of who takes up a program: the treatments were successful in overcoming stigma for older, wealthier, and working respondents. For other people, we show that our treatments merely increased the salience of the stigma without dispelling it. We conclude that social image concerns affect labor market decisions and that messaging surrounding programs can have important effects on program take-up and composition.
Keywords:	stigma, experiment, machine learning
JEL:	J22 C93 I38 Z13
Date:	2023–11
URL:	http://d.repec.org/n?u=RePEc:iza:izadps:dp16599&r=big

Do we listen to what we are told? An empirical study on human behaviour during the COVID-19 pandemic: neural networks vs. regression analysis

By:	Yuxi Heluo; Kexin Wang; Charles W. Robson
Abstract:	In this work, we contribute the first visual open-source empirical study on human behaviour during the COVID-19 pandemic, in order to investigate how compliant a general population is with mask-wearing-related public-health policy. Object-detection-based convolutional neural networks, regression analysis and multilayer perceptrons are combined to analyse visual data of the Viennese public during 2020. We find that mask-wearing-related government regulations and public-transport announcements encouraged correct mask-wearing behaviours during the COVID-19 pandemic. Importantly, changes in announcement and regulation contents led to heterogeneous effects on people's behaviour. Comparing the predictive power of regression analysis and neural networks, we demonstrate that the latter produce more accurate predictions of population reactions during the COVID-19 pandemic. Our use of regression modelling also allows us to unearth possible causal pathways underlying societal behaviour. Since our findings highlight the importance of appropriate communication contents, our results will facilitate the development of more effective non-pharmaceutical interventions in the future. Adding to the literature, we demonstrate that regression modelling and neural networks are not mutually exclusive but instead complement each other.
Date:	2023–11
URL:	http://d.repec.org/n?u=RePEc:arx:papers:2311.13046&r=big

Multi-Layer Spillovers between Volatility and Skewness in International Stock Markets Over a Century of Data: The Role of Disaster Risks

By:	Matteo Foglia (Department of Economics and Finance, University of Bari "Aldo Moro", Italy); Vasilios Plakandaras (Department of Economics, Democritus University of Thrace, Komotini, Greece); Rangan Gupta (Department of Economics, University of Pretoria, Private Bag X20, Hatfield 0028, South Africa); Elie Bouri (School of Business, Lebanese American University, Lebanon)
Abstract:	Measuring risk lies at the core of the decision-making process of every financial market participant and monetary authority. However, the bulk of the literature treats risk as a function of the second moment (volatility) of the return distribution, based on the implicit, unrealistic assumption that asset returns are normally distributed. In this paper, we depart from centred moments of distribution by examining risk spillovers involving robust estimates of second and third moments of model-implied distributions of stock returns derived from the quantile autoregressive distributed lag mixed-frequency data sampling (QADL-MIDAS) method. Using a century of data on the stock indices of the G7 and Switzerland over the period May 1917 to February 2023 and applying the multilayer approach to spillovers, we show the following. Firstly, the risk spillover among stock markets is significant within each layer (i.e. volatility and skewness) and across the two layers. Secondly, geopolitical risks have the power to shape both risk layer values, based on an out-of-sample forecasting exercise involving machine-learning methods. Interestingly, the multi-layer approach offers a comprehensive and nuanced view of how risk information is transmitted across major stock markets. Global measures of geopolitical risk affect risk spillovers at shorter horizons of up to 6 months, whereas at longer horizons the forecasting exercise is dominated by market-specific characteristics.
Keywords:	Risk spillover, advanced stock markets, multi-layer spillover approach, machine learning, geopolitical risks, forecasting
JEL:	C22 C32 C53 G15
Date:	2023–12
URL:	http://d.repec.org/n?u=RePEc:pre:wpaper:202337&r=big

Categorías municipales en Colombia: Avanzando hacia un modelo de descentralización asimétrica

By:	Karina Acosta; Yuri Reina-Aranza
Abstract:	Asymmetric decentralization has become a relevant discussion in developing countries like Colombia, where capabilities differ dramatically between subnational governments. Although these discussions have been present since the 1960s, the success of decentralization is still a relevant discussion in these contexts, where the classification of the territory is fundamental. This document discusses the possible obstacles of the territorial classes currently used in Colombia. It also proposes and discusses the usefulness of recent algorithms from the unsupervised machine learning literature for classifying subnational territories. Specifically, this document implements Clustering via Optimal Trees (ICOT), an algorithm that makes it possible to classify the territories and, at the same time, identify the rules underlying the defined classes. This study also proposes the creation of different territorial typologies according to their uses.
Keywords:	descentralización, capacidades, resultados, aprendizaje de máquinas, ICOT, Colombia, decentralization, capabilities, results, unsupervised learning, ICOT
JEL:	C45 R11 E62
Date:	2023–12
URL:	http://d.repec.org/n?u=RePEc:bdr:region:321&r=big

Doombot: a machine learning algorithm for predicting downturns in OECD countries

By:	Thomas Chalaux; David Turner
Abstract:	This paper describes an algorithm, “DoomBot”, which selects parsimonious models to predict downturns over different quarterly horizons covering the ensuing two years for 20 OECD countries. The models are country- and horizon-specific and are automatically updated as the estimation sample period is extended, so facilitating out-of-sample evaluation of the algorithm. A limited combination of explanatory variables is chosen from a much larger pool of potential variables that includes those that have been most useful in predicting downturns in previous OECD work. The most frequently selected variables are financial variables, especially those relating to credit and house prices, but also include equity prices and various measures of interest rates (such as the slope of the yield curve). Business cycle variables -- survey measures of capacity utilisation, industrial production, GDP and unemployment -- are also selected, but more frequently at very short horizons. The variables selected do not just relate to the domestic economy of the country being considered, but also to international aggregates, consistent with findings from previous OECD work. The in-sample fit of the models is very good on standard performance metrics, although the out-of-sample performance is less impressive. The models do, however, provide a clear out-of-sample early warning of the Global Financial Crisis (GFC), especially when considered collectively, although they do generate ‘false alarms’ just ahead of the crisis. The models are less good at predicting the euro area crisis out-of-sample, but it is clear from the evolution of the choice of variables that the algorithm learns from this episode, for example through the more frequent selection of a variable measuring euro area sovereign bond spreads.
The latest out-of-sample predictions, made in mid-2023, suggest the probability of a downturn is at its greatest and most widespread since the GFC, with the largest contributions to such risks coming from house prices, interest rate developments (as measured by the slope of the yield curve and the rapidity of the change in short rates) and oil prices. On the other hand, warning signals from business cycle variables and equity prices, which are often good downturn predictors at short horizons, are conspicuously absent.
Keywords:	Downturn, forecast, GDP growth, recession, risk
JEL:	E01 E17 E65 E66 E58
Date:	2023–12–12
URL:	http://d.repec.org/n?u=RePEc:oec:ecoaaa:1780-en&r=big

This nep-big issue is ©2024 by Tom Coupé. It is provided as is without any express or implied warranty. It may be freely redistributed in whole or in part for any purpose. If distributed in part, please include this notice.

General information on the NEP project can be found at https://nep.repec.org. For comments please write to the director of NEP, Marco Novarese at <director@nep.repec.org>. Put “NEP” in the subject, otherwise your mail may be rejected.

NEP’s infrastructure is sponsored by the School of Economics and Finance of Massey University in New Zealand.
