nep-big New Economics Papers
on Big Data
Issue of 2022‒05‒30
twenty-two papers chosen by
Tom Coupé
University of Canterbury

  1. Is your machine better than you? You may never know By Francis de Véricourt; Huseyin Gurkan
  2. Fair Governance with Humans and Machines By Yoan Hermstrüwer; Pascal Langenbach
  3. Supervised machine learning classification for short straddles on the S&P500 By Alexander Brunhuemer; Lukas Larcher; Philipp Seidl; Sascha Desmettre; Johannes Kofler; Gerhard Larcher
  4. Discovering material information using hierarchical Reformer model on financial regulatory filings By Francois Mercier; Makesh Narsimhan
  5. Stacking machine-learning models for anomaly detection: comparing AnaCredit to other banking datasets By Pasquale Maddaloni; Davide Nicola Continanza; Andrea del Monaco; Daniele Figoli; Marco di Lucido; Filippo Quarta; Giuseppe Turturiello
  6. A Neural Network Approach to the Environmental Kuznets Curve By Mikkel Bennedsen; Eric Hillebrand; Sebastian Jensen
  7. Assessment of Support Vector Machine performance for default prediction and credit rating By Karim Amzile; Mohamed Habachi
  8. Application of the XGBoost algorithm and Bayesian optimization for the Bitcoin price prediction during the COVID-19 period By Jakub Drahokoupil
  9. Modeling dynamic volatility under uncertain environment with fuzziness and randomness By Xianfei Hui; Baiqing Sun; Yan Zhou
  10. The Determinants of Risk Weighted Asset in Europe By Leogrande, Angelo; Costantiello, Alberto; Laureti, Lucio; Matarrese, Marco Maria
  11. Big data analytics application in multi-criteria decision making: the case of eWallet adoption By Babak Naysary; Mehdi Malekzadeh; Ruth Tacneng; Amine Tarazi
  12. The impact of Artificial Intelligence and how it is shaping banking By Theuri, Joseph; Olukuru, John
  13. High Performance Export Portfolio: Design Growth-Enhancing Export Structure with Machine Learning By Ms. Natasha X Che; Xuege Zhang
  14. The response of illegal mining to revealing its existence By Saavedra, Santiago
  15. Policy Gradient Stock GAN for Realistic Discrete Order Data Generation in Financial Markets By Masanori Hirano; Hiroki Sakaji; Kiyoshi Izumi
  16. The response of illegal mining to revealing its existence By Saavedra, S
  17. Fuzzy Expert System for Stock Portfolio Selection: An Application to Bombay Stock Exchange By Gour Sundar Mitra Thakur; Rupak Bhattacharyya; Seema Sarkar
  18. Sequence-Based Target Coin Prediction for Cryptocurrency Pump-and-Dump By Sihao Hu; Zhen Zhang; Shengliang Lu; Bingsheng He; Zhao Li
  19. Measuring Firm Activity from Outer Space By Katarzyna Anna Bilicka; André Seidel
  20. Local Gaussian process extrapolation for BART models with applications to causal inference By Meijiang Wang; Jingyu He; P. Richard Hahn
  21. Towards a “Text as Data” Approach in the History of Economics: An Application to Adam Smith’s Classics By Ballandonne, Matthieu; Cersosimo, Igor
  22. What COVID-19 May Leave Behind: Technology-Related Job Postings in Canada By Bellatin, Alejandra; Galassi, Gabriela

  1. By: Francis de Véricourt (ESMT European School of Management and Technology GmbH); Huseyin Gurkan (ESMT European School of Management and Technology GmbH)
    Abstract: Artificial intelligence systems are increasingly demonstrating their capacity to make better predictions than human experts. Yet, recent studies suggest that professionals sometimes doubt the quality of these systems and overrule machine-based prescriptions. This paper explores the extent to which a decision maker (DM) supervising a machine to make high-stakes decisions can properly assess whether the machine produces better recommendations. To that end, we study a set-up in which a machine performs repeated decision tasks (e.g., whether to perform a biopsy) under the DM’s supervision. Because stakes are high, the DM primarily focuses on making the best choice for the task at hand. Nonetheless, as the DM observes the correctness of the machine’s prescriptions across tasks, she updates her belief about the machine. However, the DM observes the machine’s correctness only if she ultimately decides to act on the task. Further, the DM sometimes overrides the machine depending on her belief, which affects learning. In this set-up, we characterize the evolution of the DM’s belief and overruling decisions over time. We identify situations under which the DM hesitates forever whether the machine is better, i.e., she never fully ignores the machine but regularly overrules it. Moreover, with positive probability, the DM wrongly believes that the machine is better. We fully characterize the conditions under which these learning failures occur and explore how mistrusting the machine affects them. Our results highlight some fundamental limitations in determining whether machines make better decisions than experts and provide a novel explanation for human-machine complementarity.
    Keywords: machine accuracy, decision making, human-in-the-loop, algorithm aversion, dynamic learning
    Date: 2022–05–23
    URL: http://d.repec.org/n?u=RePEc:esm:wpaper:esmt-22-02&r=
  2. By: Yoan Hermstrüwer (Max Planck Institute for Research on Collective Goods, Bonn); Pascal Langenbach (Max Planck Institute for Research on Collective Goods, Bonn)
    Abstract: How fair are government decisions based on algorithmic predictions? And to what extent can the government delegate decisions to machines without sacrificing procedural fairness? Using a set of vignettes in the context of predictive policing, school admissions, and refugee-matching, we explore how different degrees of human-machine interaction affect fairness perceptions and procedural preferences. We implement four treatments varying the extent of responsibility delegation to the machine and the degree of human involvement in the decision-making process, ranging from full human discretion, through machine-based predictions with high and low human involvement, to fully machine-based decisions. We find that machine-based predictions with high human involvement yield the highest fairness scores and fully machine-based decisions the lowest. Differences in accuracy assessments can partly explain these differences. Fairness scores follow a similar pattern across contexts, with a negative level effect and lower fairness perceptions of human decisions in the context of predictive policing. Our results shed light on the behavioral foundations of several legal human-in-the-loop rules.
    Keywords: algorithms, predictive policing, school admissions, refugee-matching, fairness
    Date: 2022–05–24
    URL: http://d.repec.org/n?u=RePEc:mpg:wpaper:2022_04&r=
  3. By: Alexander Brunhuemer; Lukas Larcher; Philipp Seidl; Sascha Desmettre; Johannes Kofler; Gerhard Larcher
    Abstract: In this working paper we present our current progress in training machine learning models to execute short option strategies on the S&P500. As a first step, we reduce the problem to a supervised classification task: deciding, on a daily basis, whether or not a short straddle on the S&P500 should be executed. We describe the framework we use and present an overview of our evaluation metrics for different classification models. In this preliminary work, using standard machine learning techniques and without hyperparameter search, we find no statistically significant outperformance over a simple "trade always" strategy, but we gain additional insights into how to proceed in further experiments.
    Date: 2022–04
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2204.13587&r=
  4. By: Francois Mercier; Makesh Narsimhan
    Abstract: Most applications of machine learning in finance are related to forecasting tasks for investment decisions. Instead, we aim to promote a better understanding of financial markets with machine learning techniques. Leveraging the tremendous progress in deep learning models for natural language processing, we construct a hierarchical Reformer ([15]) model capable of processing SEDAR, a large document-level dataset of Canadian financial regulatory filings. Using this model, we show that it is possible to predict trade volume changes from regulatory filings. We adapt the pretraining task of HiBERT ([36]) to obtain good sentence-level representations from a large unlabelled document dataset. Fine-tuning the model to successfully predict trade volume changes indicates that the model captures a view of financial markets and that processing regulatory filings is beneficial. Analyzing the attention patterns of our model reveals that it can detect some indications of material information without explicit training, which is highly relevant for investors and for the market surveillance mandate of financial regulators.
    Date: 2022–03
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2204.05979&r=
  5. By: Pasquale Maddaloni (Bank of Italy); Davide Nicola Continanza (Bank of Italy); Andrea del Monaco (Bank of Italy); Daniele Figoli (Bank of Italy); Marco di Lucido (Bank of Italy); Filippo Quarta (Bank of Italy); Giuseppe Turturiello (Bank of Italy)
    Abstract: This paper addresses the issue of assessing the quality of granular datasets reported by banks via machine learning models. In particular, it investigates how supervised and unsupervised learning algorithms can exploit patterns recognizable in other data sources dealing with similar phenomena (although available at a different level of aggregation), in order to detect potential outliers to be submitted to banks for their own checks. These machine learning algorithms are finally stacked in a semi-supervised fashion to enhance their individual outlier detection ability. The described methodology is applied to compare the granular AnaCredit dataset, first with the Balance Sheet Items (BSI) statistics and second with the harmonised supervisory statistics of Financial Reporting (FinRep), which are compiled for the Eurosystem and the Single Supervisory Mechanism, respectively. In both cases, we show that the performance of the stacking technique, in terms of F1-score, is higher than that of each algorithm alone.
    Keywords: banking data, data quality management, outlier and anomaly detection, machine learning, auto-encoder, robust regression, pseudo labelling
    JEL: C18 C81 G21
    Date: 2022–04
    URL: http://d.repec.org/n?u=RePEc:bdi:opques:qef_689_22&r=
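The stacking idea in the abstract above can be illustrated with a minimal sketch: two simple base detectors (a z-score rule and Tukey's IQR fences) are combined, and the combination is scored with the F1 metric the authors report. Everything here (the data, the base detectors, the union rule) is an illustrative stand-in, not the Bank of Italy's actual pipeline:

```python
# Illustrative sketch (synthetic data, not the authors' pipeline):
# stack two simple outlier detectors and score the result with F1.

def zscore_flags(xs, threshold=3.0):
    """Flag points more than `threshold` standard deviations from the mean."""
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs) / n
    std = var ** 0.5 or 1.0
    return [abs(x - mean) / std > threshold for x in xs]

def iqr_flags(xs, k=1.5):
    """Flag points outside the Tukey fences (Q1 - k*IQR, Q3 + k*IQR)."""
    s = sorted(xs)
    q1, q3 = s[len(s) // 4], s[(3 * len(s)) // 4]
    iqr = q3 - q1
    return [x < q1 - k * iqr or x > q3 + k * iqr for x in xs]

def stacked_flags(xs):
    """Naive 'stacking': flag a point if either base detector flags it."""
    return [a or b for a, b in zip(zscore_flags(xs), iqr_flags(xs))]

def f1(pred, truth):
    tp = sum(p and t for p, t in zip(pred, truth))
    fp = sum(p and not t for p, t in zip(pred, truth))
    fn = sum(t and not p for p, t in zip(pred, truth))
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0

# Synthetic reported values with two injected anomalies at the end.
values = [100 + (i % 7) for i in range(60)] + [500, -300]
truth = [False] * 60 + [True, True]
print(f1(stacked_flags(values), truth))
```

In the paper the stacking is semi-supervised (pseudo-labelling, auto-encoders, robust regression); the union rule above is only the simplest possible combiner.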
  6. By: Mikkel Bennedsen (Aarhus University and CREATES); Eric Hillebrand (Aarhus University and CREATES); Sebastian Jensen (Aarhus University and CREATES)
    Abstract: We investigate the relationship between per capita gross domestic product and per capita carbon dioxide emissions using national-level panel data for the period 1960-2018. We propose a novel semiparametric panel data methodology that combines country and time fixed effects with a nonparametric neural network regression component. Globally and for the regions OECD and Asia, we find evidence of an inverse U-shaped relationship, often referred to as an environmental Kuznets curve (EKC). For OECD, the EKC-shape disappears when using consumption-based emissions data, suggesting the EKC-shape observed for OECD is driven by emissions exports. For Asia, the EKC-shape becomes even more pronounced when using consumption-based emissions data and exhibits an earlier turning point.
    Keywords: Territorial carbon dioxide emissions, Consumption-based carbon dioxide emissions, Environmental Kuznets curve, Climate econometrics, Panel data, Machine learning, Neural networks
    JEL: C14 C23 C45 C51 C52 C53
    Date: 2022–05–24
    URL: http://d.repec.org/n?u=RePEc:aah:create:2022-09&r=
  7. By: Karim Amzile (Université Mohammed V); Mohamed Habachi (Université Mohammed V)
    Abstract: Predicting the creditworthiness of bank customers is a major concern for banking institutions, as modeling the probability of default is a key focus of the Basel regulations. Practitioners propose different default modeling techniques, such as linear discriminant analysis, logistic regression, the Bayesian approach, and artificial intelligence techniques. This study assesses the performance of the Support Vector Machine (SVM) for default prediction using three types of kernels, namely the polynomial kernel, the linear kernel and the Gaussian kernel, with performance evaluated by the Receiver Operating Characteristic (ROC) curve. To justify the performance of the model, the study compares the default predictions of the support vector machine with those of logistic regression, using data from a portfolio of retail bank customers. The results show that the SVM model with the Radial Basis Function kernel performs better in prediction than the logistic regression model, with a value of the ROC curve equal to 98%, against 71.7% for the logistic regression model. The paper also presents the design of an SVM-based rating tool to classify bank customers and determine their probability of default. This probability is computed empirically and represents the proportion of defaulting customers in each class.
    Keywords: bank,credit risk,data mining,probability of default,scoring,artificial intelligence
    Date: 2022
    URL: http://d.repec.org/n?u=RePEc:hal:journl:halshs-03643738&r=
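The ROC comparison reported in the abstract above can be made concrete with a small sketch. The scores and labels below are invented for illustration; only the metric (area under the ROC curve, computed here via the Mann-Whitney rank statistic) matches what the paper uses:

```python
# Toy illustration (invented scores, not the authors' data): comparing two
# default-scoring models by the area under the ROC curve (AUC).

def auc(scores, labels):
    """AUC via the Mann-Whitney U statistic: the probability that a
    randomly chosen defaulter scores higher than a non-defaulter."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# 1 = default, 0 = no default, with hypothetical scores from two models.
labels       = [1, 1, 1, 0, 0, 0, 0, 0]
svm_scores   = [0.9, 0.8, 0.7, 0.6, 0.3, 0.2, 0.2, 0.1]  # separates classes well
logit_scores = [0.9, 0.4, 0.3, 0.8, 0.5, 0.2, 0.2, 0.1]  # separates less well

print(auc(svm_scores, labels), auc(logit_scores, labels))
```

A higher AUC means the model ranks defaulters above non-defaulters more reliably, which is the sense in which the paper's SVM (98%) beats its logistic regression (71.7%).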
  8. By: Jakub Drahokoupil
    Abstract: The aim of this paper is to use the machine learning algorithm XGBoost, developed by Tianqi Chen and Carlos Guestrin in 2016, to predict the future development of the Bitcoin (BTC) price and to build an algorithmic trading strategy based on the model's predictions. For the final algorithmic strategy, six XGBoost models are estimated, each predicting the BTC closing price n days ahead, for n = 1, 2, 5, 10, 20, 30. Bayesian optimization is used twice during the development of the trading strategy: first, to select appropriate hyperparameters of the XGBoost model; second, to optimize the weight of each model's prediction, in order to obtain the most profitable trading strategy. The paper shows that even though the XGBoost model has several limitations, it can predict the future development of the BTC price fairly accurately, even at longer horizons. The paper focuses specifically on the potential of algorithmic trading during the COVID-19 period, in which the BTC cryptocurrency went through an extremely volatile stretch, reaching new all-time high prices as well as 50% losses within a few consecutive months. The applied trading strategy shows promising results, as it beats the buy-and-hold (B&H) strategy in terms of total profit, Sharpe ratio and Sortino ratio.
    Keywords: XGBoost, Bayesian Optimization, Bitcoin, Algorithmic trading
    JEL: C11 C39 C61 G11
    Date: 2022–03–24
    URL: http://d.repec.org/n?u=RePEc:prg:jnlwps:v:4:y:2022:id:4.006&r=
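A minimal sketch of the kind of evaluation the abstract above describes: comparing a signal-driven strategy against buy-and-hold on total return and Sharpe ratio. The prices and signals below are synthetic stand-ins, not outputs of the paper's XGBoost models:

```python
# Toy backtest sketch (synthetic prices and signals, not the paper's model):
# compare a long/flat strategy against buy-and-hold (B&H) on total return
# and Sharpe ratio, two of the metrics the paper reports.

def returns(prices):
    """Simple (arithmetic) one-period returns."""
    return [(b - a) / a for a, b in zip(prices, prices[1:])]

def sharpe(rets):
    """Per-period Sharpe ratio with a zero risk-free rate."""
    n = len(rets)
    mean = sum(rets) / n
    var = sum((r - mean) ** 2 for r in rets) / n
    return mean / (var ** 0.5) if var else 0.0

def backtest(prices, signals):
    """Earn the return of day t+1 only when signals[t] is 1 (long), else 0 (flat)."""
    return [s * r for s, r in zip(signals, returns(prices))]

prices = [100, 104, 101, 99, 103, 108, 105, 110]
# A hypothetical model's long/flat predictions, one per upcoming day.
signals = [1, 0, 0, 1, 1, 0, 1]

strat = backtest(prices, signals)
bh = returns(prices)
print(sum(strat), sum(bh))            # total arithmetic returns
print(sharpe(strat) > sharpe(bh))
```

The paper's actual strategy additionally weights six prediction horizons, with the weights themselves tuned by Bayesian optimization.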
  9. By: Xianfei Hui; Baiqing Sun; Yan Zhou
    Abstract: Predicting dynamic volatility in financial markets provides a promising basis for risk prediction, asset pricing and market supervision. The Barndorff-Nielsen and Shephard (BN-S) model, used to capture the stochastic behavior of high-frequency time series, is an accepted stochastic volatility model driven by a Lévy process. Although this model is attractive and successful in theory, it needs to be improved in application. We build a new generalized BN-S model suitable for an uncertain environment with fuzziness and randomness. This new model accounts for the delay between price fluctuations and volatility changes and addresses the lack of long-range dependence in classic models. Calculation results show that the new model outperforms the classic model in volatility forecasting. Experiments on Dow Jones Industrial Average futures price data are conducted to verify the feasibility and practicability of the proposed approach, and numerical examples illustrate the theoretical results. Three machine learning algorithms are applied to estimate the new model's parameters. Compared with the classical model, our method effectively incorporates the characteristics of an uncertain environment, which makes the prediction of dynamic volatility more flexible and yields strong performance.
    Date: 2022–04
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2204.12657&r=
  10. By: Leogrande, Angelo; Costantiello, Alberto; Laureti, Lucio; Matarrese, Marco Maria
    Abstract: We estimate the level of Risk Weighted Assets (RWAs) for 30 European countries over 30 quarters, using European Banking Authority (EBA) data on 139 variables. We build an econometric model using Pooled OLS, Panel Data with Fixed Effects, Panel Data with Random Effects, and Weighted Least Squares. We find that Risk Weighted Assets are negatively associated, among others, with the level of NFC loans in mining and quarrying, in public administration and defence, and in financial and insurance activities, and positively associated, among others, with the distribution of NFC loans in human health services and social work activities and in education, and with the level of net fee and commission income. Furthermore, we apply a cluster analysis with the k-Means algorithm and find the presence of two clusters. A comparison of eight different machine learning algorithms for predicting the value of RWAs shows that the best predictor is linear regression, which predicts the RWA value to increase by 1.5%.
    Keywords: Financial Institutions and Services; General; Banks, Depository Institutions, Micro Finance Institutions, Mortgages; Investment Banking, Government Policy, and Regulation
    JEL: G0 G20 G21 G24 G28
    Date: 2022–05–01
    URL: http://d.repec.org/n?u=RePEc:pra:mprapa:112924&r=
  11. By: Babak Naysary (Monash University, School of Business, Selangor, Malaysia); Mehdi Malekzadeh (Service Rocket Inc., Kuala Lumpur, Malaysia); Ruth Tacneng (LAPE - Laboratoire d'Analyse et de Prospective Economique - GIO - Gouvernance des Institutions et des Organisations - UNILIM - Université de Limoges); Amine Tarazi (LAPE - Laboratoire d'Analyse et de Prospective Economique - GIO - Gouvernance des Institutions et des Organisations - UNILIM - Université de Limoges)
    Abstract: This multidisciplinary study aims to overcome the shortcomings of traditional data collection methods used in the literature to investigate drivers of e-wallet adoption. We apply big data analytics to gather and analyze real-world data from users' sentiments and opinions available on online platforms. We use a text analytics approach to identify and categorize the principal themes of concern affecting user adoption. We then use the Analytical Hierarchy Process (AHP) technique to weigh and rank these themes and subsequently construct a structural framework for choosing the optimal e-wallet alternative in the market. Our results identify 10 clusters of e-wallet adoption drivers that can be categorized into three groups. The first group includes factors such as usefulness, ease of use, trust, risk, security, and associated costs, confirming existing findings in the literature. The second group reinforces the importance of more implicit factors that existing theories fail to integrate, such as customer service, user interface, and promotional rewards. Finally, the last group comprises interoperability, highlighting the importance of e-wallet connectivity and how conveniently transactions are performed with other platforms, systems, and applications. Based on the results of the clustering and the AHP model, we provide several managerial recommendations that can guide decision-making and eventually optimize the performance of e-wallets. Our study makes a significant contribution by adopting a holistic, multi-criteria framework to evaluate e-wallet adoption comprehensively.
    Keywords: E-wallet adoption,big data analytics,AHP,mobile payment,text mining
    Date: 2022–04–06
    URL: http://d.repec.org/n?u=RePEc:hal:wpaper:hal-03632834&r=
  12. By: Theuri, Joseph; Olukuru, John
    Abstract: The importance and adoption of Artificial Intelligence in supporting business operations, managing risk and spurring revenue growth continue to gain traction globally. While adoption has been accelerated by the disruptions the COVID-19 pandemic caused to traditional sources of information, utilization remains low across many countries. In advanced economies, however, as AI gains popularity in banking, financial institutions (FIs) are building on their existing solutions to transform customer experiences and meet increasingly complex challenges and expectations. This paper illustrates the potential of employing AI in banking to reduce costs, including the opportunities it offers banks to leverage algorithms on the front end to smooth customer identification and authentication, mimic live employees through chatbots and voice assistants, deepen customer relationships, and provide personalized insights and recommendations. Further, AI can be used by banks within middle-office functions to assess risks, detect and prevent payment fraud, improve processes for anti-money laundering (AML) and perform know-your-customer (KYC) regulatory checks. The main output is an interactive dashboard illustrating the application of descriptive and predictive analytics at a click for a given business unit of a bank.
    Date: 2022
    URL: http://d.repec.org/n?u=RePEc:zbw:kbawps:61&r=
  13. By: Ms. Natasha X Che; Xuege Zhang
    Abstract: This paper studies the relationship between export structure and growth performance. We design an export recommendation system using a collaborative filtering algorithm based on countries' revealed comparative advantages. The system is used to produce export portfolio recommendations covering over 190 economies and over 30 years. We find that economies whose export structure is more closely aligned with the recommended export structure achieve better growth performance, in terms of both higher GDP growth rates and lower growth volatility. These findings demonstrate that export structure matters for achieving high and stable growth. Our recommendation system can serve as a practical tool for policymakers seeking actionable insights into their countries’ export potential and into diversification strategies that may otherwise be complex and hard to quantify.
    Keywords: export diversification, comparative advantage, machine learning, collaborative filtering, economic growth, international trade; export structure; export portfolio recommendation; export recommendation system; performance export portfolio; export potential; Exports; Comparative advantage; Export diversification; Human capital; Total factor productivity; Global; East Asia
    Date: 2022–04–29
    URL: http://d.repec.org/n?u=RePEc:imf:imfwpa:2022/075&r=
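The collaborative filtering idea in the abstract above can be sketched on a toy revealed-comparative-advantage (RCA) matrix: products a country lacks are scored by the similarity-weighted presence of those products among other countries. Country and product names below are hypothetical, and the paper's actual system covers over 190 economies:

```python
# Stylized sketch (synthetic data, hypothetical names): collaborative
# filtering over a binary RCA matrix, recommending products that
# similar economies export competitively but the target does not.

def cosine(u, v):
    """Cosine similarity between two equal-length numeric vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = sum(a * a for a in u) ** 0.5
    nv = sum(b * b for b in v) ** 0.5
    return dot / (nu * nv) if nu and nv else 0.0

def recommend(rca, target, top_n=2):
    """Score each product the target lacks by the similarity-weighted
    count of other countries holding an RCA in it."""
    scores = {}
    target_vec = list(rca[target].values())
    for product, has_rca in rca[target].items():
        if has_rca:
            continue  # only recommend products the target does not yet export
        scores[product] = sum(
            cosine(target_vec, list(row.values())) * row[product]
            for country, row in rca.items() if country != target
        )
    return sorted(scores, key=scores.get, reverse=True)[:top_n]

# 1 = revealed comparative advantage in the product, 0 = none.
rca = {
    "A": {"textiles": 1, "electronics": 1, "machinery": 1, "chemicals": 0},
    "B": {"textiles": 1, "electronics": 1, "machinery": 0, "chemicals": 0},
    "C": {"textiles": 0, "electronics": 1, "machinery": 1, "chemicals": 1},
}
print(recommend(rca, "B"))  # products ranked for country B
```

Country B resembles A more than C, so machinery (held by both A and C) outranks chemicals (held only by the less similar C).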
  14. By: Saavedra, Santiago
    Abstract: New monitoring technologies can help curb illegal activities by reducing information asymmetries between enforcing and monitoring government agents. I created a novel dataset using machine learning predictions on satellite imagery that detects illegal mining. Then I disclosed the predictions to government agents to study the response of illegal activity. I randomly assigned municipalities to one of four groups: (1) information to the observer (local government) of potential mine locations in his jurisdiction; (2) information to the enforcer (National government) of potential mine locations; (3) information to both observer and enforcer; and (4) a control group, where I informed no one. The effect of information is relatively similar regardless of who is informed: in treated municipalities, illegal mining is reduced by 11% in the disclosed locations and surrounding areas. However, when accounting for negative spillovers (increases in illegal mining in areas not targeted by the information), the net reduction is only 7%. These results illustrate the benefits of new technologies for building state capacity and reducing illegal activity.
    Keywords: Illegal mining; Monitoring Technology; Colombia
    JEL: H26 K42 O13 O17 Q53
    Date: 2022–05
    URL: http://d.repec.org/n?u=RePEc:rie:riecdt:89&r=
  15. By: Masanori Hirano; Hiroki Sakaji; Kiyoshi Izumi
    Abstract: This study proposes a new generative adversarial network (GAN) for generating realistic orders in financial markets. Previous GANs for financial markets generated fake orders in continuous spaces because of the learning limitations of GAN architectures. In reality, however, order attributes are discrete: order prices have a minimum price unit, and order types are categorical. In this study we therefore change the generation method to place the generated fake orders in discrete spaces. Because this change makes the ordinary GAN learning algorithm inapplicable, we employ the policy gradient method, frequently used in reinforcement learning, as the learning algorithm. Through our experiments, we show that the proposed model outperforms previous models in terms of the distribution of generated orders. As an additional benefit of introducing the policy gradient, the entropy of the generated policy can be used to check the GAN's learning status. Future work can address higher-performance GANs, better evaluation methods, and applications of our GAN.
    Date: 2022–04
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2204.13338&r=
  16. By: Saavedra, S
    Abstract: New monitoring technologies can help curb illegal activities by reducing information asymmetries between enforcing and monitoring government agents. I created a novel dataset using machine learning predictions on satellite imagery that detects illegal mining. Then I disclosed the predictions to government agents to study the response of illegal activity. I randomly assigned municipalities to one of four groups: (1) information to the observer (local government) of potential mine locations in his jurisdiction; (2) information to the enforcer (National government) of potential mine locations; (3) information to both observer and enforcer; and (4) a control group, where I informed no one. The effect of information is relatively similar regardless of who is informed: in treated municipalities, illegal mining is reduced by 11% in the disclosed locations and surrounding areas. However, when accounting for negative spillovers (increases in illegal mining in areas not targeted by the information), the net reduction is only 7%. These results illustrate the benefits of new technologies for building state capacity and reducing illegal activity.
    Keywords: Illegal mining, Monitoring Technology, Colombia
    JEL: H26 K42 O13 O17 Q53
    Date: 2022–05–09
    URL: http://d.repec.org/n?u=RePEc:col:000092:020078&r=
  17. By: Gour Sundar Mitra Thakur; Rupak Bhattacharyya; Seema Sarkar (Mondal)
    Abstract: Selecting the right stocks before allocating investment ratios is always a crucial task for investors. The presence of many factors influencing stock performance has motivated researchers to adopt various Artificial Intelligence (AI) techniques to make this challenging task easier. In this paper a novel fuzzy expert system model is proposed to evaluate and rank stocks listed on the Bombay Stock Exchange (BSE). Dempster-Shafer (DS) evidence theory is used for the first time to automatically generate the consequents of the fuzzy rule base, reducing the effort involved in developing the expert system's knowledge base. A portfolio optimization model is then constructed, in which the objective function is the ratio of the difference between the fuzzy portfolio return and the risk-free return to the weighted mean semi-variance of the assets. The model is solved with the Ant Colony Optimization (ACO) algorithm, giving preference to the top-ranked stocks. The performance of the model proved satisfactory for short-term investment periods when compared with the recent performance of the stocks.
    Date: 2022–04
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2204.13385&r=
  18. By: Sihao Hu; Zhen Zhang; Shengliang Lu; Bingsheng He; Zhao Li
    Abstract: As the pump-and-dump schemes (P&Ds) proliferate in the cryptocurrency market, it becomes imperative to detect such fraudulent activities in advance, to inform potentially susceptible investors before they become victims. In this paper, we focus on the target coin prediction task, i.e., to predict the pump probability of all coins listed in the target exchange before a pump. We conduct a comprehensive study of the latest P&Ds, investigate 709 events organized in Telegram channels from Jan. 2019 to Jan. 2022, and unearth some abnormal yet interesting patterns of P&Ds. Empirical analysis demonstrates that pumped coins exhibit intra-channel homogeneity and inter-channel heterogeneity, which inspires us to develop a novel sequence-based neural network named SNN. Specifically, SNN encodes each channel's pump history as a sequence representation via a positional attention mechanism, which filters useful information and alleviates the noise introduced when the sequence length is long. We also identify and address the coin-side cold-start problem in a practical setting. Extensive experiments show a lift of 1.6% AUC and 41.0% Hit Ratio@3 brought by our method, making it well-suited for real-world application. As a side contribution, we release the source code of our entire data science pipeline on GitHub, along with the dataset tailored for studying the latest P&Ds.
    Date: 2022–04
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2204.12929&r=
  19. By: Katarzyna Anna Bilicka; André Seidel
    Abstract: To understand how global firm networks operate, we need consistent information on their activities, unbiased by their reporting choices. In this paper, we collect a novel dataset on the light that factories emit at night for a large sample of car manufacturing plants. We show that nightlight data can measure activity at such a granular level, using annual firm financial data and high-frequency data related to COVID-19 pandemic production shocks. We use this data to quantify the extent of misreported global operations of these car manufacturing firms and examine differences between sources of nightlight.
    Keywords: multinational firms, nightlight data, global firm networks
    JEL: H32 H26 F23
    Date: 2022
    URL: http://d.repec.org/n?u=RePEc:ces:ceswps:_9701&r=
  20. By: Meijiang Wang; Jingyu He; P. Richard Hahn
    Abstract: Bayesian additive regression trees (BART) is a semi-parametric regression model offering state-of-the-art performance on out-of-sample prediction. Despite this success, standard implementations of BART typically provide inaccurate predictions and overly narrow prediction intervals at points outside the range of the training data. This paper proposes a novel extrapolation strategy that grafts Gaussian processes onto the leaf nodes of BART for predicting points outside the range of the observed data. The new method is compared to standard BART implementations and recent frequentist resampling-based methods for predictive inference. We apply the new approach to a challenging problem from causal inference, wherein for some regions of predictor space, only treated or untreated units are observed (but not both). In simulation studies, the new approach boasts superior performance compared to popular alternatives, such as Jackknife+.
    Date: 2022–04
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2204.10963&r=
  21. By: Ballandonne, Matthieu; Cersosimo, Igor
    Abstract: Quantitative techniques have received increasing attention in the history and methodology of economics. Nonetheless, a “text as data” approach has mostly been overlooked and its applicability to the history of economics remains to be examined. To understand what we gain from such quantitative techniques in relation to existing historical analyses, we apply some “text as data” techniques to Adam Smith’s The Theory of Moral Sentiments and The Wealth of Nations. We explore the books’ topics, styles, and sentiments. We show how word frequency analysis can be used to examine the differences between the books, shed light on conceptual discussions and reveal an important stylistic aspect, specifically Smith’s use of personal pronouns. Style analysis shows the similarities and differences in terms of lexical richness and readability between the two books. Finally, we show the limitations of a third technique, sentiment analysis, when applied to historical economic texts.
    Date: 2022–04–09
    URL: http://d.repec.org/n?u=RePEc:osf:osfxxx:mg3zb&r=
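A word-frequency comparison of the kind the abstract above applies to Smith's two books can be sketched in a few lines, here focused on personal pronouns, the stylistic marker the authors highlight. The passages below are toy stand-ins, not Smith's actual texts:

```python
# Minimal "text as data" sketch (toy passages, not Smith's actual texts):
# compare the relative frequency of personal pronouns across two works.

from collections import Counter
import re

def pronoun_rate(text, pronouns=("i", "we", "he", "she", "they", "you")):
    """Share of word tokens that are personal pronouns."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    return sum(counts[p] for p in pronouns) / len(tokens)

# Invented stand-ins for passages from the two books.
moral_sentiments_like = "we imagine how he feels and we judge as they would"
wealth_of_nations_like = "the division of labour raises the produce of industry"

print(pronoun_rate(moral_sentiments_like) > pronoun_rate(wealth_of_nations_like))
```

Real analyses would normalize for document length and tokenization choices, exactly the kind of methodological decision the authors discuss.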
  22. By: Bellatin, Alejandra (University of Toronto); Galassi, Gabriela (Bank of Canada)
    Abstract: We use data from online job postings listed on a job board to study how the demand for jobs linked to new technologies during the COVID-19 crisis responded to pandemic mitigation policies. We classify job postings into a standard occupation classification, using text analytics, and we group occupations according to their involvement in the production and use of digital technologies. We leverage the variation in the stringency of containment policies over time and across provinces. We find that when policies become more stringent, job postings in occupations that are related to digital infrastructure or that allow for remote work fare relatively better than postings in more traditional occupations. Job postings for positions in occupations with low risk of automation recover faster during reopenings than postings for more traditional occupations. Occupations typically populated by disadvantaged groups (e.g., women and low-wage workers) gather relatively few job postings if they are not linked to new technologies. We also find that cities with scarce pre-pandemic job postings related to digital technologies post fewer job ads overall when policies become more stringent.
    Keywords: COVID-19, job vacancies, technology adoption
    JEL: J23 J24 O14
    Date: 2022–04
    URL: http://d.repec.org/n?u=RePEc:iza:izadps:dp15209&r=

This nep-big issue is ©2022 by Tom Coupé. It is provided as is without any express or implied warranty. It may be freely redistributed in whole or in part for any purpose. If distributed in part, please include this notice.
General information on the NEP project can be found at http://nep.repec.org. For comments please write to the director of NEP, Marco Novarese at <director@nep.repec.org>. Put “NEP” in the subject, otherwise your mail may be rejected.
NEP’s infrastructure is sponsored by the School of Economics and Finance of Massey University in New Zealand.