nep-big 2021-04-26 papers

on Big Data

Issue of 2021‒04‒26
28 papers chosen by
Tom Coupé
University of Canterbury

CATE meets ML - The Conditional Average Treatment Effect and Machine Learning By Daniel Jacob
Accuracies of Model Risks in Finance using Machine Learning By Berthine Nyunga Mpinda; Jules Sadefo Kamdem; Salomey Osei; Jeremiah Fadugba
Aiding Long-Term Investment Decisions with XGBoost Machine Learning Model By Ekaterina Zolotareva
Neural basis expansion analysis with exogenous variables: Forecasting electricity prices with NBEATSx By Kin G. Olivares; Cristian Challu; Grzegorz Marcjasz; Rafal Weron; Artur Dubrawski
“Nowcasting and forecasting GDP growth with machine-learning sentiment indicators” By Oscar Claveria; Enric Monte; Salvador Torra
FORECASTING RUSSIAN CPI WITH DATA VINTAGES AND MACHINE LEARNING TECHNIQUES By Denis Shibitov; Mariam Mamedli
A Machine Learning Approach to Analyze and Support Anti-Corruption Policy By Elliott Ash; Sergio Galletta; Tommaso Giommoni
Power, Hate Speech, Machine Learning, and Intersectional Approach By Kim, Jae Yeon
Interpretability in deep learning for finance: a case study for the Heston model By Damiano Brigo; Xiaoshan Huang; Andrea Pallavicini; Haitz Saez de Ocariz Borde
The Effect of Sport in Online Dating: Evidence from Causal Machine Learning By Boller, Daniel; Lechner, Michael; Okasa, Gabriel
Predicting Inflation with Neural Networks By Paranhos, Livia
Accuracies of some Learning or Scoring Models for Credit Risk Measurement By Salomey Osei; Berthine Nyunga Mpinda; Jules Sadefo Kamdem; Jeremiah Fadugba
Система Galymzhan: online-оценка потребительской инфляции в Казахстане // Galymzhan System: Online Assessment of Consumer Inflation in Kazakhstan By Тулеуов Олжас // Tuleuov Olzhas; Ержан Ислам // Yerzhan Islam; Сейдахметов Ансар // Seidakhmetov Ansar
Adaptive learning for financial markets mixing model-based and model-free RL for volatility targeting By Eric Benhamou; David Saltiel; Serge Tabachnik; Sui Kai Wong; François Chareyron
Herramientas de Google para la predicción de variables económicas. Una aplicación al Índice Compuesto Coincidente de Actividad Económica de la Provincia de Santa Fe (ICASFe) By Ramiro Emmanuel Jorge
Deep Reinforcement Learning in a Monetary Model By Mingli Chen; Andreas Joseph; Michael Kumhof; Xinlei Pan; Rui Shi; Xuan Zhou
Applications of Machine Learning in Mental Healthcare By Davcheva, Elena
Automatic Double Machine Learning for Continuous Treatment Effects By Sylvia Klosin
Micro-Estimates of Wealth for all Low- and Middle-Income Countries By Guanghua Chi; Han Fang; Sourav Chatterjee; Joshua E. Blumenstock
Forecasting Oil and Gold Volatilities with Sentiment Indicators Under Structural Breaks By Jiawen Luo; Riza Demirer; Rangan Gupta; Qiang Ji
The Current State of AI Governance – An EU Perspective By Dempsey, Mark; McBride, Keegan; Bryson, Joanna J.
Estimating the Causal Effects of Cruise Traffic on Air Pollution using Randomization-Based Inference By Zabrocki, Léo; Leroutier, Marion; Bind, Marie-Abèle
fsdaSAS: a package for robust regression for very large datasets including the batch forward search By Torti, Francesca; Corbellini, Aldo; Atkinson, Anthony C.
Refugee influx and economic activity: evidence from Rohingya refugee camps in Bangladesh By José Joaquín Endara
Assessing the Impact of COVID-19 on Trade: a Machine Learning Counterfactual Analysis By Dueñas, Marco; Ortiz, Víctor; Riccaboni, Massimo; Serti, Francesco
Event studies on investor sentiment By Marc-Aurèle Divernois; Damir Filipović
Adoption of digital technologies: Insights from a global survey initiative By James Fudurich; Lena Suchanek; Lise Pichette
Information theoretic causality detection between financial and sentiment data By Roberta Scaramozzino; Paola Cerchiello; Tomaso Aste

CATE meets ML - The Conditional Average Treatment Effect and Machine Learning

By:	Daniel Jacob
Abstract:	For treatment effects - one of the core issues in modern econometric analysis - prediction and estimation are two sides of the same coin. As it turns out, machine learning methods are the tool for generalized prediction models. Combined with econometric theory, they allow us to estimate not only the average but a personalized treatment effect - the conditional average treatment effect (CATE). In this tutorial, we give an overview of novel methods, explain them in detail, and apply them via Quantlets in real data applications. As two empirical examples, we study the effect that microcredit availability has on the amount of money borrowed and whether 401(k) pension plan eligibility has an impact on net financial assets. The presented toolbox of methods contains meta-learners, like the Doubly-Robust, R-, T- and X-learner, and methods that are specially designed to estimate the CATE, like the causal BART and the generalized random forest. In both the microcredit and the 401(k) examples, we find a positive treatment effect for all observations but conflicting evidence of treatment effect heterogeneity. An additional simulation study, where the true treatment effect is known, allows us to compare the different methods and to observe patterns and similarities.
Date:	2021–04
URL:	http://d.repec.org/n?u=RePEc:arx:papers:2104.09935&r=
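
As a rough illustration of one of the meta-learners named in the abstract, the T-learner fits one outcome model on treated units and another on control units, then takes the difference of their predictions as the CATE estimate. The sketch below is not the tutorial's Quantlet code: it assumes a single covariate, uses simple least-squares lines as the two outcome models, and runs on toy data invented for the example. The names `fit_line` and `t_learner` are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    return mean_y - b * mean_x, b


def t_learner(x, treated, y):
    """Fit separate outcome models for treated and control units.

    Returns a function mapping a covariate value to the estimated CATE,
    i.e. the treated-model prediction minus the control-model prediction.
    """
    x1 = [xi for xi, t in zip(x, treated) if t]
    y1 = [yi for yi, t in zip(y, treated) if t]
    x0 = [xi for xi, t in zip(x, treated) if not t]
    y0 = [yi for yi, t in zip(y, treated) if not t]
    a1, b1 = fit_line(x1, y1)
    a0, b0 = fit_line(x0, y0)
    return lambda v: (a1 + b1 * v) - (a0 + b0 * v)


# Toy data (hypothetical): control outcome is 1 + x, treated outcome is 1 + 3x,
# so the true effect at covariate value x is 2x.
x = [0.0, 1.0, 2.0, 3.0, 0.0, 1.0, 2.0, 3.0]
treated = [0, 0, 0, 0, 1, 1, 1, 1]
y = [1.0, 2.0, 3.0, 4.0, 1.0, 4.0, 7.0, 10.0]

cate = t_learner(x, treated, y)
```

In practice the two linear fits would be replaced by flexible machine learning regressors, and the other learners listed in the abstract add further correction steps on top of this basic idea.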

Accuracies of Model Risks in Finance using Machine Learning

By:	Berthine Nyunga Mpinda; Jules Sadefo Kamdem (MRE - Montpellier Recherche en Economie - UM - Université de Montpellier); Salomey Osei; Jeremiah Fadugba
Abstract:	There is increasing interest in using Artificial Intelligence (AI) and machine learning techniques to enhance risk management, from credit risk to operational risk. Moreover, recent applications of machine learning models in risk management have proved efficient. That notwithstanding, while machine learning techniques can have considerable benefits, they can also introduce risk of their own when the models are wrong. Therefore, machine learning models must be tested and validated before they can be used. The aim of this work is to explore some existing machine learning models for operational risk by comparing their accuracies. Because a model should add value and reduce risk, particular attention is paid to how to evaluate its performance, robustness and limitations. After applying existing machine learning and deep learning methods to operational risk, particularly the risk of fraud, we compared these models on the following metrics: accuracy, F1-score, AUROC and precision. We also used quantitative validation such as back-testing and stress-testing to analyze the performance of the model on historical data and its sensitivity to extreme but plausible scenarios like the Covid-19 period. Our results show that logistic regression outperforms all deep learning models considered for fraud detection.
Keywords:	Machine Learning,Model Risk,Credit Card Fraud,Decisions Support,Stress-Testing
Date:	2021–04–07
URL:	http://d.repec.org/n?u=RePEc:hal:wpaper:hal-03191437&r=
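
The abstract compares models on several metrics, not accuracy alone. A small sketch can show why that matters for fraud data, where positives are rare. The labels below are invented for illustration and are not from the paper; `classification_metrics` is a hypothetical helper.

```python
def classification_metrics(y_true, y_pred):
    """Compute accuracy, precision, recall and F1 for binary labels (1 = fraud)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical data: 100 transactions, 5 of them fraudulent.
y_true = [1] * 5 + [0] * 95

# A degenerate classifier that never flags fraud.
always_legit = [0] * 100

# A classifier that catches 4 of 5 frauds and raises 2 false alarms.
detector = [1, 1, 1, 1, 0] + [1, 1] + [0] * 93

baseline = classification_metrics(y_true, always_legit)
model = classification_metrics(y_true, detector)
```

Here the degenerate classifier reaches 95% accuracy while its F1-score is zero, which is why precision and F1 are reported alongside accuracy on imbalanced data.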

Aiding Long-Term Investment Decisions with XGBoost Machine Learning Model

By:	Ekaterina Zolotareva
Abstract:	The ability to identify stock market trends has obvious advantages for investors. Buying stock on an upward trend (as well as selling it in case of downward movement) results in profit. Accordingly, the start and end-points of the trend are the optimal points for entering and leaving the market. The research concentrates on recognizing stock market long-term upward and downward trends. The key results are obtained with the use of gradient boosting algorithms, XGBoost in particular. The raw data is represented by time series with basic stock market quotes with periods labelled by experts as Trend or Flat. The features are then obtained via various data transformations, aiming to catch implicit factors resulting in a change of stock direction. Modelling is done in two stages: stage one aims to detect endpoints of tendencies (i.e. sliding windows), stage two recognizes the tendency itself inside the window. The research addresses such issues as imbalanced datasets and contradicting labels, as well as the need for specific quality metrics to keep up with practical applicability. The model can be used to design an investment strategy, though further research in feature engineering and fine calibration is required. This paper is the full text of the research presented at the 20th International Conference on Artificial Intelligence and Soft Computing Web System (ICAISC 2021).
Date:	2021–04
URL:	http://d.repec.org/n?u=RePEc:arx:papers:2104.09341&r=

Neural basis expansion analysis with exogenous variables: Forecasting electricity prices with NBEATSx

By:	Kin G. Olivares; Cristian Challu; Grzegorz Marcjasz; Rafal Weron; Artur Dubrawski
Abstract:	We extend the neural basis expansion analysis (NBEATS) to incorporate exogenous factors. The resulting method, called NBEATSx, improves on a well performing deep learning model, extending its capabilities by including exogenous variables and allowing it to integrate multiple sources of useful information. To showcase the utility of the NBEATSx model, we conduct a comprehensive study of its application to electricity price forecasting (EPF) tasks across a broad range of years and markets. We observe state-of-the-art performance, significantly improving the forecast accuracy by nearly 20% over the original NBEATS model, and by up to 5% over other well established statistical and machine learning methods specialized for these tasks. Additionally, the proposed neural network has an interpretable configuration that can structurally decompose time series, visualizing the relative impact of trend and seasonal components and revealing the modeled processes' interactions with exogenous factors.
Keywords:	Deep learning; NBEATS and NBEATSx models; Interpretable neural network; Time series decomposition; Fourier series; Electricity price forecasting
JEL:	C22 C32 C45 C51 C53 Q41 Q47
Date:	2021–04–19
URL:	http://d.repec.org/n?u=RePEc:ahh:wpaper:worms2107&r=

“Nowcasting and forecasting GDP growth with machine-learning sentiment indicators”

By:	Oscar Claveria (AQR-IREA, University of Barcelona); Enric Monte (Polytechnic University of Catalunya); Salvador Torra (Riskcenter-IREA, University of Barcelona)
Abstract:	We apply the two-step machine-learning method proposed by Claveria et al. (2021) to generate country-specific sentiment indicators that provide estimates of year-on-year GDP growth rates. In the first step, by means of genetic programming, business and consumer expectations are evolved to derive sentiment indicators for 19 European economies. In the second step, the sentiment indicators are iteratively re-computed and combined each period to forecast yearly growth rates. To assess the performance of the proposed approach, we have designed two out-of-sample experiments: a nowcasting exercise in which we recursively generate estimates of GDP at the end of each quarter using the latest survey data available, and an iterative forecasting exercise for different forecast horizons. We found that forecasts generated with the sentiment indicators outperform those obtained with time series models. These results show the potential of the methodology as a predictive tool.
Keywords:	Forecasting, Economic growth, Business and consumer expectations, Symbolic regression, Evolutionary algorithms, Genetic programming. JEL classification: C51, C55, C63, C83, C93
Date:	2021–02
URL:	http://d.repec.org/n?u=RePEc:aqr:wpaper:202101&r=

FORECASTING RUSSIAN CPI WITH DATA VINTAGES AND MACHINE LEARNING TECHNIQUES

By:	Denis Shibitov (Bank of Russia, Russian Federation); Mariam Mamedli (Bank of Russia, Russian Federation)
Abstract:	We show how the forecasting performance of models varies when certain inaccuracies are present in a pseudo real-time experiment. We consider the case of Russian CPI forecasting and estimate several models on vintages of data that are not seasonally adjusted. Particular attention is paid to the availability of the variables at the moment of the forecast: we take into account the release timing of the series and the corresponding release delays in order to reconstruct real-time forecasting. In a series of experiments, we quantify how each of these issues affects the out-of-sample error. We illustrate that neglecting the release timing generally lowers the errors. The same is true for the use of seasonally adjusted data. The impact of the data vintages depends on the model and the forecasting period. The overall effect of all three inaccuracies varies from 8% to 17% depending on the forecasting horizon. This means that the actual forecasting error can be significantly underestimated when an inaccurate pseudo real-time experiment is run. We underline the need to take these aspects into account when real-time forecasting is considered.
Keywords:	inflation, pseudo real-time forecasting, data vintages, machine learning, neural networks.
JEL:	C14 C45 C51 C53
Date:	2021–04
URL:	http://d.repec.org/n?u=RePEc:bkr:wpaper:wps70&r=

A Machine Learning Approach to Analyze and Support Anti-Corruption Policy

By:	Elliott Ash; Sergio Galletta; Tommaso Giommoni
Abstract:	Can machine learning support better governance? In the context of Brazilian municipalities, 2001-2012, we have access to detailed accounts of local budgets and audit data on the associated fiscal corruption. Using the budget variables as predictors, we train a tree-based gradient-boosted classifier to predict the presence of corruption in held-out test data. The trained model, when applied to new data, provides a prediction-based measure of corruption that can be used for new empirical analysis or to support policy responses. We validate the empirical usefulness of this measure by replicating and extending some previous empirical evidence on corruption issues in Brazil. We then explore how the predictions can be used to support policies toward corruption. Our policy simulations show that, relative to the status quo policy of random audits, a targeted policy guided by the machine predictions could detect almost twice as many corrupt municipalities for the same audit rate. Similar gains can be achieved for a politically neutral targeting policy that equalizes audit rates across political parties.
Keywords:	algorithmic decision-making, corruption policy, local public finance
JEL:	D73 E62 K14 K42
Date:	2021
URL:	http://d.repec.org/n?u=RePEc:ces:ceswps:_9015&r=

Power, Hate Speech, Machine Learning, and Intersectional Approach

By:	Kim, Jae Yeon (University of California, Berkeley)
Abstract:	The advent of social media has increased digital content and, with it, hate speech. Advancements in machine learning algorithms help detect online hate speech at scale; nevertheless, these systems are far from perfect. Human-annotated hate speech data, used to train automated hate speech detection systems, is susceptible to racial/ethnic, gender, and other bias. To address societal and historical biases in automated hate speech detection, scholars and practitioners need to focus on the power dynamics: who decides what comprises hate speech. Examining inter- and intra-group dynamics can facilitate understanding of this causal mechanism. This intersectional approach deepens knowledge of the limitations of automated hate speech detection systems and bridges social science and machine learning literature on biases and fairness.
Date:	2021–04–10
URL:	http://d.repec.org/n?u=RePEc:osf:socarx:chvgp&r=

Interpretability in deep learning for finance: a case study for the Heston model

By:	Damiano Brigo; Xiaoshan Huang; Andrea Pallavicini; Haitz Saez de Ocariz Borde
Abstract:	Deep learning is a powerful tool whose applications in quantitative finance are growing every day. Yet, artificial neural networks behave as black boxes and this hinders validation and accountability processes. Being able to interpret the inner functioning and the input-output relationship of these networks has become key for the acceptance of such tools. In this paper we focus on the calibration process of a stochastic volatility model, a subject recently tackled by deep learning algorithms. We analyze the Heston model in particular, as this model's properties are well known, resulting in an ideal benchmark case. We investigate the capability of local strategies and global strategies coming from cooperative game theory to explain the trained neural networks, and we find that global strategies such as Shapley values can be effectively used in practice. Our analysis also highlights that Shapley values may help choose the network architecture, as we find that fully-connected neural networks perform better than convolutional neural networks in predicting and interpreting the Heston model prices to parameters relationship.
Date:	2021–04
URL:	http://d.repec.org/n?u=RePEc:arx:papers:2104.09476&r=
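
For readers unfamiliar with the cooperative game theory behind Shapley values, the following sketch computes them exactly for a tiny toy game by averaging each player's marginal contribution over all orderings. It is only an illustration of the definition, not the paper's procedure: the value function and the feature names `x1`, `x2`, `x3` are invented, and exact enumeration is feasible only for a handful of inputs.

```python
from itertools import permutations


def shapley_values(value, players):
    """Exact Shapley values: average marginal contribution over all orderings."""
    totals = {p: 0.0 for p in players}
    orderings = list(permutations(players))
    for order in orderings:
        coalition = frozenset()
        for p in order:
            with_p = coalition | {p}
            totals[p] += value(with_p) - value(coalition)
            coalition = with_p
    return {p: totals[p] / len(orderings) for p in players}


def toy_value(coalition):
    """Hypothetical payoff: x1 adds 1, x2 adds 2, and together they add 1 more.

    x3 contributes nothing, so its Shapley value should be zero.
    """
    v = 0.0
    if "x1" in coalition:
        v += 1.0
    if "x2" in coalition:
        v += 2.0
    if "x1" in coalition and "x2" in coalition:
        v += 1.0
    return v


phi = shapley_values(toy_value, ["x1", "x2", "x3"])
```

The interaction term is split equally between `x1` and `x2`, and the attributions sum to the payoff of the full coalition, which is the efficiency property that makes Shapley values attractive for explaining model outputs.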

The Effect of Sport in Online Dating: Evidence from Causal Machine Learning

By:	Boller, Daniel (University of St. Gallen); Lechner, Michael (University of St. Gallen); Okasa, Gabriel (University of St. Gallen)
Abstract:	Online dating has emerged as a key platform for human mating. Previous research focused on socio-demographic characteristics to explain human mating in online dating environments, neglecting the commonly recognized relevance of sport. This research investigates the effect of sport activity on human mating by exploiting a unique data set from an online dating platform. We leverage recent advances in the causal machine learning literature to estimate the causal effect of sport frequency on contact chances. We find that for male users, doing sport on a weekly basis increases the probability of receiving a first message from a woman by 50%, relative to not doing sport at all. For female users, we do not find evidence of such an effect. In addition, for male users the effect increases with higher income.
Keywords:	online dating, sports economics, big data, causal machine learning, effect heterogeneity, Modified Causal Forest
JEL:	J12 Z29 C21 C45
Date:	2021–04
URL:	http://d.repec.org/n?u=RePEc:iza:izadps:dp14259&r=all

Predicting Inflation with Neural Networks

By:	Paranhos, Livia (University of Warwick)
Abstract:	This paper applies neural network models to forecast inflation. The use of a particular recurrent neural network, the long short-term memory model, or LSTM, that summarizes macroeconomic information into common components is a major contribution of the paper. Results from an exercise with US data indicate that the estimated neural nets usually present better forecasting performance than standard benchmarks, especially at long horizons. The LSTM in particular is found to outperform the traditional feed-forward network at long horizons, suggesting an advantage of the recurrent model in capturing the long-term trend of inflation. This finding can be rationalized by the so-called long memory of the LSTM, which incorporates relatively old information in the forecast as long as accuracy is improved, while economizing on the number of estimated parameters. Interestingly, the neural nets containing macroeconomic information capture well the features of inflation during and after the Great Recession, possibly indicating a role for nonlinearities and macro information in this episode. The estimated common components used in the forecast seem able to capture the business cycle dynamics, as well as information on prices.
Keywords:	forecasting ; inflation ; neural networks ; deep learning ; LSTM model
Date:	2021
URL:	http://d.repec.org/n?u=RePEc:wrk:warwec:1344&r=

Accuracies of some Learning or Scoring Models for Credit Risk Measurement

By:	Salomey Osei (AMMI - African Masters of Machine Intelligence); Berthine Nyunga Mpinda (AMMI - African Masters of Machine Intelligence); Jules Sadefo Kamdem (MRE - Montpellier Recherche en Economie - UM - Université de Montpellier); Jeremiah Fadugba (AMMI - African Masters of Machine Intelligence)
Abstract:	Given the role banks play in the financial system, their risks are subject to regulatory attention, and credit risk is one of the major financial risks they face. According to Basel I to III, banks have the responsibility to implement the credit risk strategy. Nowadays, machine learning techniques have attracted considerable interest for applications in financial institutions, and these applications have received much attention from investors and researchers. Hence, in this paper we discuss the existing literature, shedding more light on a number of techniques, and examine machine learning models for credit risk, focusing on Multi-Layer Perceptrons (MLP) and Convolutional Neural Networks (CNN). The models were evaluated through back-testing on Home Credit historical data and stress-testing on simulated data. We found that the MLP and CNN models predicted well, with accuracies of 91% and 67% respectively for back-testing. To test our models in stress and extreme scenarios, we consider generated imbalanced data with 80% defaults and 20% non-defaults. Using the same model trained on Home Credit data, we perform a stress test on the simulated data and find that the MLP model did not perform well compared to the CNN model, with an accuracy of 43% as against 89% obtained during training. Thus, the CNN model performed better in stressed situations on accuracy and on other metrics such as ROC AUC, recall, and precision.
Keywords:	Model Accuracy,Machine Learning,Credit Risk,Basel III,Risk Management
Date:	2021–03
URL:	http://d.repec.org/n?u=RePEc:hal:wpaper:hal-03194081&r=

Система Galymzhan: online-оценка потребительской инфляции в Казахстане // Galymzhan System: Online Assessment of Consumer Inflation in Kazakhstan

By:	Тулеуов Олжас // Tuleuov Olzhas (National Bank of Kazakhstan); Ержан Ислам // Yerzhan Islam (National Bank of Kazakhstan); Сейдахметов Ансар // Seidakhmetov Ansar (National Bank of Kazakhstan)
Abstract:	This paper describes the methodology and results of constructing a high-frequency proxy indicator of inflation in Kazakhstan at the National Bank using web scraping, that is, the automated collection of data by extracting them from web pages with a software algorithm.
Keywords:	inflation, web scraping, Galymzhan, online store, prices, big data, CPI, parsing, consumer goods
JEL:	E31 E37 E39
Date:	2021
URL:	http://d.repec.org/n?u=RePEc:aob:wpaper:18&r=

Adaptive learning for financial markets mixing model-based and model-free RL for volatility targeting

By:	Eric Benhamou; David Saltiel; Serge Tabachnik; Sui Kai Wong; François Chareyron
Abstract:	Model-Free Reinforcement Learning has achieved meaningful results in stable environments but, to this day, it remains problematic in regime-changing environments like financial markets. In contrast, model-based RL is able to capture some fundamental and dynamical concepts of the environment but suffers from cognitive bias. In this work, we propose to combine the best of the two techniques by selecting various model-based approaches thanks to Model-Free Deep Reinforcement Learning. Beyond past performance and volatility, we include additional contextual information such as macro and risk appetite signals to account for implicit regime changes. We also adapt traditional RL methods to real-life situations by considering only past data for the training sets. Hence, we cannot use future information in our training data set as implied by K-fold cross validation. Building on traditional statistical methods, we use the traditional "walk-forward analysis", which is defined by successive training and testing based on expanding periods, to assess the robustness of the resulting agent. Finally, we present the concept of the statistical significance of differences, based on a two-tailed t-test, to highlight the ways in which our models differ from more traditional ones. Our experimental results show that our approach outperforms traditional financial baseline portfolio models such as the Markowitz model in almost all evaluation metrics commonly used in financial mathematics, namely net performance, Sharpe and Sortino ratios, maximum drawdown, and maximum drawdown over volatility.
Date:	2021–04
URL:	http://d.repec.org/n?u=RePEc:arx:papers:2104.10483&r=
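
The walk-forward analysis described in the abstract, with successive training and testing on expanding periods, can be sketched as an index generator. This is a minimal illustration and not the authors' code; the function name and the parameters `initial_train` and `test_size` are assumptions made for the example.

```python
def walk_forward_splits(n_obs, initial_train, test_size):
    """Yield (train_indices, test_indices) pairs with an expanding training window.

    Every test block lies strictly after its training block, so no future
    observation leaks into training (unlike shuffled K-fold cross validation).
    """
    start = initial_train
    while start + test_size <= n_obs:
        train = list(range(0, start))
        test = list(range(start, start + test_size))
        yield train, test
        start += test_size


# Ten observations, first four used for the initial fit, tested two at a time.
splits = list(walk_forward_splits(n_obs=10, initial_train=4, test_size=2))
```

Each successive split reuses all earlier data for training, so the agent is always evaluated on a period it has never seen.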

Herramientas de Google para la predicción de variables económicas. Una aplicación al Índice Compuesto Coincidente de Actividad Económica de la Provincia de Santa Fe (ICASFe)

By:	Ramiro Emmanuel Jorge
Abstract:	The paper incorporates information from the Google Trends and Google Correlate tools in order to produce timely predictions of the Composite Coincident Index of Economic Activity of the Province of Santa Fe (ICASFe), an indicator that is published with a two-month lag. To this end, it identifies the search terms whose patterns correlate most strongly with the ICASFe and then proposes an aggregation method to incorporate them into the target series. The estimates obtained from the model are compared ex post with actual data for the target series. The results indicate that the tools and the procedure adopted yield consistent estimates and a gain in timeliness relative to official publications.
Keywords:	Cycles, nowcast, big data, Google tools
JEL:	E27 E32
Date:	2020–11
URL:	http://d.repec.org/n?u=RePEc:aep:anales:4360&r=all

Deep Reinforcement Learning in a Monetary Model

By:	Mingli Chen; Andreas Joseph; Michael Kumhof; Xinlei Pan; Rui Shi; Xuan Zhou
Abstract:	We propose using deep reinforcement learning to solve dynamic stochastic general equilibrium models. Agents are represented by deep artificial neural networks and learn to solve their dynamic optimisation problem by interacting with the model environment, of which they have no a priori knowledge. Deep reinforcement learning offers a flexible yet principled way to model bounded rationality within this general class of models. We apply our proposed approach to a classical model from the adaptive learning literature in macroeconomics which looks at the interaction of monetary and fiscal policy. We find that, contrary to adaptive learning, the artificially intelligent household can solve the model in all policy regimes.
Date:	2021–04
URL:	http://d.repec.org/n?u=RePEc:arx:papers:2104.09368&r=

Applications of Machine Learning in Mental Healthcare

By:	Davcheva, Elena
Abstract:	This thesis summarizes three studies in the area of machine learning applications within mental healthcare, specifically in the area of treatments and diagnostics. Mental healthcare today is challenging to provide worldwide because of a stark rise in demand for services. Traditional healthcare structures cannot keep up with the demand, and information systems have the potential to fill this gap. The thesis explores online mental health forums as a digital mental health platform and the possibility of automating treatments and diagnostics based on user-shared information.
Date:	2021
URL:	http://d.repec.org/n?u=RePEc:dar:wpaper:126173&r=

Automatic Double Machine Learning for Continuous Treatment Effects

By:	Sylvia Klosin
Abstract:	In this paper, we introduce and prove asymptotic normality for a new nonparametric estimator of continuous treatment effects. Specifically, we estimate the average dose-response function - the expected value of an outcome of interest at a particular level of the treatment. We utilize tools from both the double debiased machine learning (DML) and the automatic double machine learning (ADML) literatures to construct our estimator. Our estimator utilizes a novel debiasing method that leads to nice theoretical stability and balancing properties. In simulations our estimator performs well compared to current methods.
Date:	2021–04
URL:	http://d.repec.org/n?u=RePEc:arx:papers:2104.10334&r=

Micro-Estimates of Wealth for all Low- and Middle-Income Countries

By:	Guanghua Chi; Han Fang; Sourav Chatterjee; Joshua E. Blumenstock
Abstract:	Many critical policy decisions, from strategic investments to the allocation of humanitarian aid, rely on data about the geographic distribution of wealth and poverty. Yet many poverty maps are out of date or exist only at very coarse levels of granularity. Here we develop the first micro-estimates of wealth and poverty that cover the populated surface of all 135 low and middle-income countries (LMICs) at 2.4km resolution. The estimates are built by applying machine learning algorithms to vast and heterogeneous data from satellites, mobile phone networks, topographic maps, as well as aggregated and de-identified connectivity data from Facebook. We train and calibrate the estimates using nationally-representative household survey data from 56 LMICs, then validate their accuracy using four independent sources of household survey data from 18 countries. We also provide confidence intervals for each micro-estimate to facilitate responsible downstream use. These estimates are provided free for public use in the hope that they enable targeted policy response to the COVID-19 pandemic, provide the foundation for new insights into the causes and consequences of economic development and growth, and promote responsible policymaking in support of the Sustainable Development Goals.
Date:	2021–04
URL:	http://d.repec.org/n?u=RePEc:arx:papers:2104.07761&r=

Forecasting Oil and Gold Volatilities with Sentiment Indicators Under Structural Breaks

By:	Jiawen Luo (School of Business Administration, South China University of Technology, Guangzhou, China); Riza Demirer (Department of Economics and Finance, Southern Illinois University Edwardsville, Edwardsville, IL 62026-1102, USA); Rangan Gupta (Department of Economics, University of Pretoria, Private Bag X20, Hatfield, 0028, South Africa); Qiang Ji (Institutes of Science and Development, Chinese Academy of Sciences, Beijing, China)
Abstract:	This paper contributes to the literature on forecasting the realized volatility of oil and gold by (i) utilizing the Infinite Hidden Markov (IHM) switching model within the Heterogeneous Autoregressive (HAR) framework to accommodate structural breaks in the data and (ii) incorporating, for the first time in the literature, various sentiment indicators that proxy for the speculative and hedging tendencies of investors in these markets as predictors in the forecasting models. We show that accounting for structural breaks and incorporating sentiment-related indicators in the forecasting model not only improves the out-of-sample forecasting performance of volatility models but also has significant economic implications, offering improved risk-adjusted returns for investors, particularly for short-term and mid-term forecasts. We also find evidence of significant cross-market information spilling over across the oil, gold, and stock markets that also contributes to the predictability of short-term market fluctuations due to sentiment-related factors. The results highlight the predictive role of investor sentiment-related factors in improving the forecast accuracy of volatility dynamics in commodities, with the potential to also yield economic gains for investors in these markets.
Keywords:	Crude oil, realized volatility forecast, Infinite Hidden Markov model, structural break, speculation
URL:	http://d.repec.org/n?u=RePEc:pre:wpaper:202130&r=all

The Current State of AI Governance – An EU Perspective

By:	Dempsey, Mark; McBride, Keegan; Bryson, Joanna J.
Abstract:	The rapid pace of technological advancement and innovation has put governance and regulatory mechanisms to the test. There is a clear need for new and innovative regulatory mechanisms that enable governments to successfully manage the integration of such technologies into our societies and ensure that such integration occurs in a sustainable, beneficial, and just manner. Artificial Intelligence stands out as one of the most debated such innovations. What exactly is it, how should it be built, how can it be used, and how should it be regulated, if at all? Yet, amid this debate, AI is becoming widely utilized within existing, evolving, and bespoke regulatory contexts. The present chapter explores in particular what is arguably the most successful AI regulatory approach to date, that of the European Union. We explore core definitional concepts, shared understandings, values, and approaches currently in play. We argue that due to the so-called ‘Brussels effect’, regulatory initiatives within the European Union have a much broader global impact and, therefore, warrant close inspection.
Date:	2021–04–21
URL:	http://d.repec.org/n?u=RePEc:osf:socarx:xu3jr&r=

Estimating the Causal Effects of Cruise Traffic on Air Pollution using Randomization-Based Inference

By:	Zabrocki, Léo (Paris School of Economics - EHESS); Leroutier, Marion; Bind, Marie-Abèle
Abstract:	Local environmental organizations and media have recently expressed concerns over air pollution induced by maritime traffic and its potential adverse health effects on the population of Mediterranean port cities. We explore this issue with unique high-frequency data from Marseille, France’s largest port for cruise ships, over the 2008-2018 period. Using a new pair-matching algorithm designed for time series data, we create hypothetical randomized experiments and estimate the variation in air pollutant concentrations caused by a short-term increase in cruise vessel traffic. We use a randomization-based approach to compute 95% Fisherian intervals (FI) for constant treatment effects consistent with the matched data and the hypothetical intervention. At the hourly level, cruise vessels’ arrivals increase concentrations of nitrogen dioxide (NO2) by 4.7 μg/m³ (95% FI: [1.4, 8.0]), of sulfur dioxide (SO2) by 1.2 μg/m³ (95% FI: [-0.1, 2.5]), and of particulate matter (PM10) by 4.6 μg/m³ (95% FI: [0.9, 8.3]). At the daily level, cruise traffic increases concentrations of NO2 by 1.2 μg/m³ (95% FI: [-0.5, 3.0]) and of PM10 by 1.3 μg/m³ (95% FI: [-0.3, 3.0]). Our results suggest that well-designed hypothetical randomized experiments provide a principled approach to better understand the negative externalities of maritime traffic.
Date:	2021–04–18
URL:	http://d.repec.org/n?u=RePEc:osf:osfxxx:v7ctk&r=
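
The Fisherian intervals mentioned in the abstract above can be illustrated with a minimal sketch for matched pairs. Under a sharp null of a constant effect tau, swapping treatment within a pair flips the sign of the pair difference minus tau, so a sign-flip randomization test can be run for each candidate tau, and the interval collects the values that are not rejected. The test statistic (absolute mean), the grid, the number of draws, and the synthetic data are assumptions for illustration; the paper's matching algorithm and exact procedure are not reproduced here.

```python
import numpy as np

def sign_flip_pvalue(diffs, tau, n_draws=2000, seed=0):
    """Randomization p-value for the sharp null of a constant effect `tau` in matched pairs."""
    rng = np.random.default_rng(seed)
    adjusted = np.asarray(diffs, dtype=float) - tau
    observed = abs(adjusted.mean())
    # Each row is one hypothetical re-assignment of treatment within pairs.
    signs = rng.choice([-1.0, 1.0], size=(n_draws, adjusted.size))
    null_stats = np.abs((signs * adjusted).mean(axis=1))
    return float((null_stats >= observed).mean())

def fisherian_interval(diffs, grid, alpha=0.05, n_draws=2000, seed=0):
    """Range of constant effects on `grid` that the randomization test does not reject."""
    kept = [tau for tau in grid if sign_flip_pvalue(diffs, tau, n_draws, seed) > alpha]
    return (min(kept), max(kept)) if kept else None

# Synthetic pair differences (treated minus control), for illustration only.
rng = np.random.default_rng(1)
demo_diffs = rng.normal(2.0, 1.0, size=200)
demo_grid = np.arange(0.0, 4.0001, 0.05)
demo_interval = fisherian_interval(demo_diffs, demo_grid)
```

A finer grid gives sharper interval endpoints at the cost of more randomization tests.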

fsdaSAS: a package for robust regression for very large datasets including the batch forward search

By:	Torti, Francesca; Corbellini, Aldo; Atkinson, Anthony C.
Abstract:	The forward search (FS) is a general method of robust data fitting that moves smoothly from very robust to maximum likelihood estimation. The regression procedures are included in the MATLAB toolbox FSDA. The work on a SAS version of the FS originates from the need for the analysis of large datasets expressed by law enforcement services operating in the European Union that use our SAS software for detecting data anomalies that may point to fraudulent customs returns. Specific to our SAS implementation, the fsdaSAS package, we describe the approximation used to provide fast analyses of large datasets using an FS which progresses through the inclusion of batches of observations, rather than progressing one observation at a time. We do, however, test for outliers one observation at a time. We demonstrate that our SAS implementation becomes appreciably faster than the MATLAB version as the sample size increases and is also able to analyse larger datasets. The series of fits provided by the FS leads to the adaptive data-dependent choice of maximally efficient robust estimates. This also allows the monitoring of residuals and parameter estimates for fits of differing robustness levels. We mention that our fsdaSAS also applies the idea of monitoring to several robust estimators for regression for a range of values of breakdown point or nominal efficiency, leading to adaptive values for these parameters. We have also provided a variety of plots linked through brushing. Further programmed analyses include the robust transformations of the response in regression. Our package also provides the SAS community with methods of monitoring robust estimators for multivariate data, including multivariate data transformations.
Keywords:	approximate analysis; big data; linked plots; monitoring; robust regression
JEL:	C1
Date:	2021–04–18
URL:	http://d.repec.org/n?u=RePEc:ehl:lserod:109895&r=
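
The batch idea described in the abstract above can be sketched as follows: starting from a small subset, fit by least squares, rank all observations by squared residual, and grow the subset by a batch of observations rather than one at a time. This is a simplified illustration in Python, not the fsdaSAS or FSDA implementation. The function name, the way the initial subset is supplied, and the synthetic data are assumptions, and the outlier tests described in the abstract are omitted.

```python
import numpy as np

def forward_search(X, y, start, batch=1):
    """Return a list of (subset_size, coefficients) as the fitted subset grows."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y)
    subset = np.array(sorted(start))
    steps = []
    while True:
        # Fit on the current subset only.
        beta, *_ = np.linalg.lstsq(X[subset], y[subset], rcond=None)
        steps.append((len(subset), beta))
        if len(subset) == n:
            break
        # Rank every observation by squared residual and keep the best-fitting ones.
        res2 = (y - X @ beta) ** 2
        size = min(n, len(subset) + batch)
        subset = np.sort(np.argsort(res2)[:size])
    return steps

# Synthetic example: a clean line with five shifted observations at the end.
x = np.linspace(0.0, 1.0, 50)
demo_X = np.column_stack([np.ones_like(x), x])
demo_y = 2.0 + 3.0 * x
demo_y[45:] += 10.0
demo_steps = forward_search(demo_X, demo_y, start=[0, 1, 2, 3, 4], batch=5)
```

Monitoring how the coefficients change along the returned steps shows where the contaminated observations enter the fit.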

Refugee influx and economic activity: evidence from Rohingya refugee camps in Bangladesh

By:	José Joaquín Endara
Abstract:	Using nighttime lights data and the location of historically important markets for host populations in Southern Bangladesh, we assess the impact of the sudden refugee influx in August 2017 on economic activity in the local community. Using a difference-in-differences estimation, we find that the sudden refugee influx produced an increase of 24% in economic activity in host markets within 5 kilometers of refugee camps. The results are robust to different specifications, and we include as controls the population around markets from the High-Resolution Settlement Layer by CIESIN and Facebook and travel times through the local road networks. We argue that the refugee influx plus the humanitarian response are responsible for this effect. This paper contributes to the literature documenting the impacts of refugees on host communities.
Keywords:	Refugee impacts, Forced migration impacts, Nighttime lights, Difference in Difference, Rohingyas, Travel times, High Resolution Settlement Layer
JEL:	O15 O12 R23 D62
Date:	2020–11
URL:	http://d.repec.org/n?u=RePEc:aep:anales:4341&r=all
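
As a minimal sketch of the difference-in-differences logic named in the abstract above, the effect can be read off the interaction coefficient in a regression of the outcome on a treatment-group indicator, a post-period indicator, and their product. The variable names, the absence of controls, and the noise-free synthetic data are assumptions for illustration only; the paper's specification includes controls such as population and travel times that are omitted here.

```python
import numpy as np

def did_estimate(y, treated, post):
    """Difference-in-differences estimate: coefficient on treated x post from OLS."""
    y = np.asarray(y, dtype=float)
    treated = np.asarray(treated, dtype=float)
    post = np.asarray(post, dtype=float)
    X = np.column_stack([np.ones_like(y), treated, post, treated * post])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return float(beta[3])

# Synthetic outcome with a built-in interaction effect of 0.2 (illustrative only).
demo_treated = np.tile([0, 0, 1, 1], 25)   # e.g. markets near a camp (hypothetical)
demo_post = np.tile([0, 1, 0, 1], 25)      # e.g. after the influx (hypothetical)
demo_y = 1.0 + 0.5 * demo_treated + 0.3 * demo_post + 0.2 * demo_treated * demo_post
demo_effect = did_estimate(demo_y, demo_treated, demo_post)
```

The estimate is credible only under the parallel-trends assumption, namely that treated and comparison markets would have evolved similarly without the influx.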

Assessing the Impact of COVID-19 on Trade: a Machine Learning Counterfactual Analysis

By:	Dueñas, Marco; Ortiz, Víctor; Riccaboni, Massimo; Serti, Francesco
Abstract:	By interpreting exporters’ dynamics as a complex learning process, this paper constitutes the first attempt to investigate the effectiveness of different Machine Learning (ML) techniques in predicting firms’ trade status. We focus on the probability of Colombian firms surviving in the export market under two different scenarios: a COVID-19 setting and a non-COVID-19 counterfactual situation. By comparing the resulting predictions, we estimate the individual treatment effect of the COVID-19 shock on firms’ outcomes. Finally, we use recursive partitioning methods to identify subgroups with differential treatment effects. We find that, besides the temporal dimension, the main factors predicting treatment heterogeneity are interactions between firm size and industry.
Keywords:	Machine Learning; International Trade; COVID-19
JEL:	F14 F17 D22 L25
Date:	2021–04
URL:	http://d.repec.org/n?u=RePEc:rie:riecdt:79&r=all

Event studies on investor sentiment

By:	Marc-Aurèle Divernois (EPFL; Swiss Finance Institute); Damir Filipović (Ecole Polytechnique Fédérale de Lausanne; Swiss Finance Institute)
Abstract:	Sixty million tweets are scraped from Stocktwits.com over 10 years and classified into bullish, bearish, or neutral classes to create firm-level polarity time series. Changes in polarity are associated with changes of the same sign in contemporaneous stock returns. On average, polarity does not predict next-day stock returns, but when we focus on specific events (defined as sudden peaks in tweet activity), polarity has predictive power for abnormal returns. Finally, we show that bad events act more as surprises than good events.
Keywords:	Investor sentiment, Event study, Polarity, Social Media, Microblogging, Natural Language Processing, Crowd Wisdom
JEL:	G11 G14 C32
Date:	2021–04
URL:	http://d.repec.org/n?u=RePEc:chf:rpseri:rp2133&r=

Adoption of digital technologies: Insights from a global survey initiative

By:	James Fudurich; Lena Suchanek; Lise Pichette
Abstract:	Firms are at the forefront of adopting new technology. Using survey data from a global network of central banks, we assess the effects of digitalization on firms’ pricing and employment decisions.
Keywords:	Firm dynamics; Inflation and prices; Labour markets
JEL:	D22 E31 J21 O33
URL:	http://d.repec.org/n?u=RePEc:bca:bocadp:21-7&r=

Information theoretic causality detection between financial and sentiment data

By:	Roberta Scaramozzino (University of Pavia); Paola Cerchiello (University of Pavia); Tomaso Aste (Computer Science Department, University College London)
Abstract:	The interaction between the flow of sentiment expressed on blogs and media and the dynamics of stock market prices is analyzed through an information-theoretic measure, the transfer entropy, to quantify causal relations. We analyzed daily stock prices and daily social media sentiment for the top 50 companies in the S&P index during the period from November 2018 to November 2020. We also analyzed news mentioning these companies during the same period. We found a causal flux of information that links those companies. The largest fraction of significant causal links is between prices and between sentiments, but there is also significant causal information flowing in both directions, from sentiment to prices and from prices to sentiment. We observe that the strongest causal signal between sentiment and prices is associated with the Tech sector.
Keywords:	Information theory; Textual analysis; Transfer Entropy; Financial news; Causality; Time Series
Date:	2021–04
URL:	http://d.repec.org/n?u=RePEc:pav:demwpp:demwp0202&r=all
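
To make the transfer entropy measure in the abstract above concrete, the following is a minimal sketch of a plug-in estimator for discrete-valued series with a single lag. It measures how much knowing x at time t reduces uncertainty about y at time t+1 beyond what y at time t already provides. The lag of one, the discretization into symbols, and the synthetic series are assumptions for illustration; the estimator actually used in the paper may differ.

```python
import numpy as np
from collections import Counter

def transfer_entropy(x, y):
    """Plug-in transfer entropy from x to y (in bits), lag 1, for discrete sequences."""
    x, y = list(x), list(y)
    n = len(y) - 1
    triples = Counter(zip(y[1:], y[:-1], x[:-1]))   # (y_next, y_now, x_now)
    pairs_yx = Counter(zip(y[:-1], x[:-1]))         # (y_now, x_now)
    pairs_yy = Counter(zip(y[1:], y[:-1]))          # (y_next, y_now)
    singles = Counter(y[:-1])                       # y_now
    te = 0.0
    for (y_next, y_now, x_now), count in triples.items():
        p_joint = count / n
        p_given_both = count / pairs_yx[(y_now, x_now)]
        p_given_own = pairs_yy[(y_next, y_now)] / singles[y_now]
        te += p_joint * np.log2(p_given_both / p_given_own)
    return te

# Synthetic example: y copies x with a one-step delay, so information flows from x to y.
rng = np.random.default_rng(0)
demo_x = rng.integers(0, 2, size=5000)
demo_y = np.roll(demo_x, 1)
demo_y[0] = 0
te_x_to_y = transfer_entropy(demo_x, demo_y)
te_y_to_x = transfer_entropy(demo_y, demo_x)
```

In applied work the plug-in estimate is biased upward in finite samples, so significance is usually judged against shuffled or surrogate series.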

This nep-big issue is ©2021 by Tom Coupé. It is provided as is without any express or implied warranty. It may be freely redistributed in whole or in part for any purpose. If distributed in part, please include this notice.

General information on the NEP project can be found at http://nep.repec.org. For comments please write to the director of NEP, Marco Novarese at <director@nep.repec.org>. Put “NEP” in the subject, otherwise your mail may be rejected.

NEP’s infrastructure is sponsored by the School of Economics and Finance of Massey University in New Zealand.