nep-big New Economics Papers
on Big Data
Issue of 2022‒04‒25
23 papers chosen by
Tom Coupé
University of Canterbury

  1. Machine learning in international trade research - evaluating the impact of trade agreements By Breinlich, Holger; Corradi, Valentina; Rocha, Nadia; Ruta, Michele; Silva, J.M.C. Santos; Zylkin, Tom
  2. Nowcasting GDP - A Scalable Approach Using DFM, Machine Learning and Novel Data, Applied to European Economies By Mr. Jean-Francois Dauphin; Marzie Taheri Sanjani; Mrs. Nujin Suphaphiphat; Mr. Kamil Dybczak; Hanqi Zhang; Morgan Maneely; Yifei Wang
  3. e-Government in Europe. A Machine Learning Approach By Leogrande, Angelo; Magaletti, Nicola; Cosoli, Gabriele; Massaro, Alessandro
  4. Big data forecasting of South African inflation By Byron Botha; Rulof Burger; Kevin Kotze; Neil Rankin; Daan Steenkamp
  5. Broadband Price Index in Europe By Leogrande, Angelo; Magaletti, Nicola; Cosoli, Gabriele; Massaro, Alessandro
  6. ICT Specialists in Europe By Leogrande, Angelo; Magaletti, Nicola; Cosoli, Gabriele; Giardinelli, Vito; Massaro, Alessandro
  7. Fixed Broadband Take-Up in Europe By Leogrande, Angelo; Magaletti, Nicola; Cosoli, Gabriele; Massaro, Alessandro
  8. The application of techniques derived from artificial intelligence to the prediction of the solvency of bank customers: case of the application of the cart type decision tree (dt) By Karim Amzile; Rajaa Amzile
  9. The Hidden Cost of Smoking: Rent Premia in the Housing Market By Cigdem Gedikli; Robert Hill; Oleksandr Talavera; Okan Yilmaz
  10. The Ensemble Approach to Forecasting: A Review and Synthesis By Hao Wu; David Levinson
  11. New Evidence on the Effect of Technology on Employment and Skill Demand By Hirvonen, Johannes; Stenhammar, Aapo; Tuhkuri, Joonas
  12. JAQ of All Trades: Job Mismatch, Firm Productivity and Managerial Quality By Coraggio, Luca; Pagano, Marco; Scognamiglio, Annalisa; Tåg, Joacim
  13. ‘When a Stranger Shall Sojourn with Thee’: The Impact of the Venezuelan Exodus on Colombian Labor Markets By Santamaria, J.
  14. Automation and the changing nature of work By Josten, Cecily; Lordan, Grace
  15. Implementing and managing Algorithmic Decision-Making in the public sector By Rocco, Salvatore
  16. Hidden Hazards and Screening Policy: Predicting Undetected Lead Exposure in Illinois Using Machine Learning By Abbasi, A; Gazze, L; Pals, B
  17. Anomaly Detection applied to Money Laundering Detection using Ensemble Learning By Otero Gomez, Daniel; Agudelo, Santiago Cartagena; Patiño, Andres Ospina; Lopez-Rojas, Edgar
  18. GCNET: graph-based prediction of stock price movement using graph convolutional network By Alireza Jafari; Saman Haratizadeh
  19. Exposure at default: estimation for a credit card portfolio By Bambino-Contreras, Carlos; Morales-Oñate, Víctor
  20. Text Mining Approaches Oriented on Customer Care Efficiency By Massaro, Alessandro; Magaletti, Nicola; Cosoli, Gabriele; Giardinelli, Vito O. M.; Leogrande, Angelo
  21. Text Mining Approaches Oriented on Customer Care Efficiency By Massaro, Alessandro; Magaletti, Nicola; Cosoli, Gabriele; Giardinelli, Vito; Leogrande, Angelo
  22. Information flows and the law of one price By Rui Fan; Oleksandr Talavera; Vu Tran
  23. Deep Reinforcement Learning and Convex Mean-Variance Optimisation for Portfolio Management By Ruan Pretorius; Terence van Zyl

  1. By: Breinlich, Holger; Corradi, Valentina; Rocha, Nadia; Ruta, Michele; Silva, J.M.C. Santos; Zylkin, Tom
    Abstract: Modern trade agreements contain a large number of provisions in addition to tariff reductions, in areas as diverse as services trade, competition policy, trade-related investment measures, and public procurement. Existing research has struggled with overfitting and severe multicollinearity problems when trying to estimate the effects of these provisions on trade flows. Building on recent developments in the machine learning and variable selection literature, this paper proposes data-driven methods for selecting the most important provisions and quantifying their impact on trade flows, without the need to make ad hoc assumptions about how to aggregate individual provisions. The analysis finds that provisions related to antidumping, competition policy, technical barriers to trade, and trade facilitation are associated with enhancing the trade-increasing effect of trade agreements.
    Keywords: lasso; machine learning; preferential trade agreements; deep trade agreements
    JEL: F14 F15 F17
    Date: 2021–06–16
    URL: http://d.repec.org/n?u=RePEc:ehl:lserod:114379&r=
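    A minimal sketch of the kind of lasso-based provision selection the abstract describes, here with scikit-learn on simulated data; the provision dummies, coefficients and sample sizes are invented placeholders, not the authors' dataset or estimates:

      import numpy as np
      from sklearn.linear_model import LassoCV

      rng = np.random.default_rng(0)
      n_pairs, n_provisions = 500, 50
      X = rng.integers(0, 2, size=(n_pairs, n_provisions)).astype(float)  # provision dummies per agreement
      beta = np.zeros(n_provisions)
      beta[[3, 7, 12]] = [0.4, 0.3, 0.2]          # only a few provisions truly matter
      y = X @ beta + rng.normal(0, 0.5, n_pairs)  # simulated log trade flows

      lasso = LassoCV(cv=5).fit(X, y)             # penalty chosen by cross-validation
      print("selected provisions:", np.flatnonzero(lasso.coef_))

    The lasso penalty shrinks irrelevant coefficients to exactly zero, which is how this style of method sidesteps the overfitting and multicollinearity problems the abstract mentions.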
  2. By: Mr. Jean-Francois Dauphin; Marzie Taheri Sanjani; Mrs. Nujin Suphaphiphat; Mr. Kamil Dybczak; Hanqi Zhang; Morgan Maneely; Yifei Wang
    Abstract: This paper describes recent work to strengthen nowcasting capacity at the IMF’s European department. It motivates and compiles datasets of standard and nontraditional variables, such as Google search and air quality. It applies standard dynamic factor models (DFMs) and several machine learning (ML) algorithms to nowcast GDP growth across a heterogenous group of European economies during normal and crisis times. Most of our methods significantly outperform the AR(1) benchmark model. Our DFMs tend to perform better during normal times while many of the ML methods we used performed strongly at identifying turning points. Our approach is easily applicable to other countries, subject to data availability.
    Keywords: Nowcasting, Factor Model, Machine Learning, Large Data Sets
    Date: 2022–03–11
    URL: http://d.repec.org/n?u=RePEc:imf:imfwpa:2022/052&r=
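    A minimal sketch of the nowcasting horse race the abstract describes, comparing an AR(1) benchmark with one ML method (a random forest) on simulated data; the indicators and sample are illustrative stand-ins, not the IMF dataset:

      import numpy as np
      from sklearn.linear_model import LinearRegression
      from sklearn.ensemble import RandomForestRegressor

      rng = np.random.default_rng(1)
      T, split = 121, 100
      indicators = rng.normal(size=(T, 10))             # stand-ins for surveys, Google search, air quality
      growth = 0.5 * indicators[:, 0] + rng.normal(0, 0.3, T)

      X_ar, y = growth[:-1].reshape(-1, 1), growth[1:]  # lagged growth vs. next period
      X_ml = indicators[1:]                             # same target periods for the ML model

      ar1 = LinearRegression().fit(X_ar[:split], y[:split])
      rf = RandomForestRegressor(n_estimators=200, random_state=0).fit(X_ml[:split], y[:split])

      rmse = lambda p: float(np.sqrt(np.mean((p - y[split:]) ** 2)))
      print("AR(1) RMSE:", rmse(ar1.predict(X_ar[split:])))
      print("RF RMSE:   ", rmse(rf.predict(X_ml[split:])))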
  3. By: Leogrande, Angelo; Magaletti, Nicola; Cosoli, Gabriele; Massaro, Alessandro
    Abstract: This article analyzes the determinants of e-government in 28 European countries between 2016 and 2021, using the DESI (Digital Economy and Society Index) database. The econometric analysis applies Panel Data with Fixed Effects and Panel Data with Random Effects. The results show that the value of “e-Government” is negatively associated with “Fast BB (NGA) coverage”, “Female ICT specialists”, “e-Invoices” and “Big data”, and positively associated with “Open Data”, “e-Government Users”, “ICT for environmental sustainability”, “Artificial intelligence”, “Cloud”, “SMEs with at least a basic level of digital intensity”, “ICT Specialists”, “At least 1 Gbps take-up”, “At least 100 Mbps fixed BB take-up” and “Fixed Very High Capacity Network (VHCN) coverage”. Subsequently, a cluster analysis was carried out using the unsupervised k-Means algorithm optimized with the Silhouette coefficient, identifying four clusters. Finally, eight different machine learning algorithms were compared using “augmented data”. The most efficient algorithm for predicting the value of e-government, both on the historical series and with augmented data, is the ANN (Artificial Neural Network).
    Keywords: Innovation and Invention: Processes and Incentives; Management of Technological Innovation and R&D; Diffusion Processes; Open Innovation.
    JEL: O30 O31 O32 O33 O34
    Date: 2022–03–05
    URL: http://d.repec.org/n?u=RePEc:pra:mprapa:112242&r=
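    A minimal sketch of the cluster-count selection step used here and in the related papers below: run k-Means for several values of k and keep the k with the highest Silhouette coefficient. The country-by-indicator matrix is a random placeholder:

      import numpy as np
      from sklearn.cluster import KMeans
      from sklearn.metrics import silhouette_score

      X = np.random.default_rng(2).normal(size=(28, 12))   # 28 countries x 12 DESI-style indicators

      scores = {}
      for k in range(2, 8):
          labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
          scores[k] = silhouette_score(X, labels)          # mean silhouette over all countries

      best_k = max(scores, key=scores.get)
      print("silhouette by k:", scores, "-> chosen k:", best_k)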
  4. By: Byron Botha; Rulof Burger; Kevin Kotze; Neil Rankin; Daan Steenkamp
    Abstract: We investigate whether the use of machine learning techniques and big data can enhance the accuracy of inflation forecasts and our understanding of the drivers of South African inflation. We make use of a large dataset for the disaggregated prices of consumption goods and services to compare the forecasting performance of a suite of statistical learning models to that of several traditional time series models. We find that the statistical learning models are able to compete with most benchmarks, but their relative performance is more impressive when the rate of inflation deviates from its steady state, as was the case during the recent COVID-19 lockdown, and when one makes use of a conditional forecasting function that allows for future information on the evolution of the inflationary process. We find that the accuracy of the Reserve Bank’s near-term inflation forecasts compares favourably with those from the models considered, reflecting the inclusion of off-model information such as electricity tariff adjustments and within-month data. Lastly, we generate Shapley values to identify the most important contributors to future inflationary pressure and to provide policymakers with information about the potential sources of future inflationary pressure.
    Keywords: Micro-data, Inflation, High dimensional regression, Penalised likelihood, Bayesian methods, Statistical learning
    JEL: C10 C11 C52 C55 E31
    Date: 2022–02
    URL: http://d.repec.org/n?u=RePEc:rza:wpaper:873&r=
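    A minimal sketch of the Shapley-value step the abstract describes, using the shap package on a tree-based model; the features and data are simulated placeholders rather than the disaggregated South African price data:

      import numpy as np
      import shap
      from sklearn.ensemble import GradientBoostingRegressor

      rng = np.random.default_rng(3)
      X = rng.normal(size=(300, 5))                 # e.g. disaggregated price categories
      y = 0.8 * X[:, 0] - 0.3 * X[:, 1] + rng.normal(0, 0.1, 300)

      model = GradientBoostingRegressor().fit(X, y)
      shap_values = shap.TreeExplainer(model).shap_values(X)   # per-feature contribution to each forecast
      print("mean |SHAP| per feature:", np.abs(shap_values).mean(axis=0))

    Averaging absolute Shapley values over observations gives the kind of "most important contributors" ranking the authors report to policymakers.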
  5. By: Leogrande, Angelo; Magaletti, Nicola; Cosoli, Gabriele; Massaro, Alessandro
    Abstract: This article analyzes the determinants of the “Broadband Price Index” in Europe. The data refer to 28 European countries between 2014 and 2021 and come from the Digital Economy and Society Index (DESI) of the European Commission. The data were analyzed with the following econometric techniques: Panel Data with Random Effects, Panel Data with Fixed Effects, Pooled OLS, WLS and Dynamic Panel. The value of the “Broadband Price Index” is positively associated with the DESI Index and “Connectivity”, while it is negatively associated with “Fixed Broadband Take-Up”, “Fixed Broadband Coverage”, “Mobile Broadband”, “e-Government”, “Advanced Skills and Development”, “Integration of Digital Technology”, “At Least Basic Digital Skills”, “Above Basic Digital Skills” and “At Least Basic Software Skills”. Subsequently, a cluster analysis was carried out using the k-Means algorithm optimized with the Silhouette coefficient, which revealed the existence of three clusters. Finally, machine learning algorithms were compared to predict the future value of the “Broadband Price Index”. The results show that the most useful algorithm for prediction is the ANN (Artificial Neural Network), with a predicted value of 9.21%.
    Keywords: Innovation and Invention: Processes and Incentives; Management of Technological Innovation and R&D; Diffusion Processes; Open Innovation
    JEL: O3 O30 O31 O32 O33
    Date: 2022–03–05
    URL: http://d.repec.org/n?u=RePEc:pra:mprapa:112243&r=
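    A minimal sketch of the fixed- and random-effects panel regressions this family of papers reports, via the linearmodels package; the 28-country panel below is simulated and the variable names are placeholders:

      import numpy as np
      import pandas as pd
      from linearmodels.panel import PanelOLS, RandomEffects

      rng = np.random.default_rng(4)
      idx = pd.MultiIndex.from_product([range(28), range(2014, 2022)], names=["country", "year"])
      df = pd.DataFrame({"connectivity": rng.normal(size=len(idx)),
                         "price_index": rng.normal(size=len(idx))}, index=idx)

      fe = PanelOLS.from_formula("price_index ~ connectivity + EntityEffects", data=df).fit()
      re = RandomEffects.from_formula("price_index ~ 1 + connectivity", data=df).fit()
      print(fe.params, re.params, sep="\n")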
  6. By: Leogrande, Angelo; Magaletti, Nicola; Cosoli, Gabriele; Giardinelli, Vito; Massaro, Alessandro
    Abstract: The following article examines the value of “ICT Specialists” in 28 European countries between 2016 and 2021. The data were analyzed with the following econometric techniques: Panel Data with Fixed Effects, Panel Data with Random Effects, WLS and Pooled OLS. The results show that the value of “ICT Specialists” in Europe is positively associated with the following variables: “Desi Index”, “SMEs with at least a basic level of digital intensity” and “At least 100 Mbps fixed BB take-up”, and negatively associated with the following variables: “4G Coverage”, “5G Coverage”, “5G Readiness”, “Fixed broadband coverage”, “e-Government”, “At least Basic Digital Skills”, “Fixed broadband take-up”, “Broadband price index” and “Integration of Digital Technology”. Subsequently, two European clusters were found by value of “ICT Specialists” using the k-Means clustering algorithm optimized with the Silhouette coefficient. Eight different machine learning algorithms were then compared to predict the value of “ICT Specialists” in Europe; the best prediction algorithm is the ANN (Artificial Neural Network), with an estimated growth value of 12.53%. Finally, “augmented data” were generated with the ANN, and a new prediction on these data estimated growth of 3.18% in the predicted variable.
    Keywords: Innovation and Invention: Processes and Incentives; Management of Technological Innovation and R&D; Diffusion Processes; Open Innovation.
    JEL: O30 O31 O32 O33 O34
    Date: 2022–03–05
    URL: http://d.repec.org/n?u=RePEc:pra:mprapa:112241&r=
  7. By: Leogrande, Angelo; Magaletti, Nicola; Cosoli, Gabriele; Massaro, Alessandro
    Abstract: In this article the value of “Fixed Broadband Take-Up” in Europe is investigated. Data are collected from the DESI (Digital Economy and Society Index) for 28 countries over the period 2016-2021 and analyzed with Panel Data with Fixed Effects and Random Effects. The “Fixed Broadband Take-Up” value is positively associated with the value of “Connectivity”, “Human Capital”, “Desi Index”, “Fast BB NGA Coverage” and “Fixed Very High-Capacity Network VHCN coverage”, and negatively associated with “Digital Public Services for Businesses”, “e-Government”, “At least Basic Digital Skills”, “At Least Basic Software Skills”, “Above Basic Digital Skills”, “Advanced Skills and Development”, “Integration of Digital Technology”, “Broadband Price Index”, “Mobile Broadband” and “Fixed Broadband Coverage”. Subsequently, the k-Means algorithm optimized with the Silhouette coefficient was used to identify the number of clusters; the analysis shows the presence of two clusters. Eight different machine learning algorithms were then used to predict the future value of “Fixed Broadband Take-Up” in Europe. The analysis shows that the most efficient algorithm for the prediction is the ANN (Artificial Neural Network), with a predicted value of 26.39%.
    Keywords: Innovation and Invention: Processes and Incentives; Management of Technological Innovation and R&D; Diffusion Processes; Open Innovation.
    JEL: O3 O30 O31 O32 O33 O34
    Date: 2022–03–05
    URL: http://d.repec.org/n?u=RePEc:pra:mprapa:112246&r=
  8. By: Karim Amzile; Rajaa Amzile
    Abstract: In this study, we apply the CART-type decision tree (DT-CART) method, a technique derived from artificial intelligence, to predict the solvency of bank customers using historical customer data. We follow a standard data mining process. We first preprocess the data, cleaning them and deleting all rows with outliers, missing values or empty columns. We then fix the variable to be explained (the dependent variable, or target) and eliminate the explanatory (independent) variables that are not significant, using univariate analysis and the correlation matrix. Finally, we apply the CART decision tree method using the SPSS tool. After building the DT-CART model, we evaluate and test its performance: the accuracy and precision of the model are 71%, corresponding to an error rate of 29%. This allows us to conclude that the model performs at a fairly good level of precision and predictability in predicting the solvency of bank customers.
    Date: 2022–03
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2203.13001&r=
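    A minimal sketch of a CART-style solvency classifier in the spirit of the abstract (the paper uses SPSS; scikit-learn's DecisionTreeClassifier implements the same CART algorithm); features, labels and the 70/30 split are synthetic stand-ins for the bank data:

      import numpy as np
      from sklearn.model_selection import train_test_split
      from sklearn.tree import DecisionTreeClassifier
      from sklearn.metrics import accuracy_score

      rng = np.random.default_rng(5)
      X = rng.normal(size=(1000, 6))                # e.g. income, debt ratio, payment history
      y = (X[:, 0] - 0.5 * X[:, 1] + rng.normal(0, 1, 1000) > 0).astype(int)  # 1 = solvent

      X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
      tree = DecisionTreeClassifier(criterion="gini", max_depth=4).fit(X_tr, y_tr)  # CART uses Gini splits
      print("accuracy:", accuracy_score(y_te, tree.predict(X_te)))                  # cf. the paper's 71%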
  9. By: Cigdem Gedikli (Swansea University); Robert Hill (University of Graz); Oleksandr Talavera (University of Birmingham); Okan Yilmaz (Swansea University)
    Abstract: In this paper, we provide novel evidence on the additional costs associated with smoking. While it may not be surprising that smokers pay a rent premium, we are the first to quantify the size of this premium. Our approach is innovative in that we use text mining methods that extract implicit information on landlords' attitudes to smoking directly from Zoopla UK rental listings. Applying hedonic, matching, and machine-learning methods to the text-mined data, we find a positive smoking rent premium of around 6 percent. This translates into GBP 14.40 of indirect costs, in addition to the GBP 40 of weekly spending on cigarettes estimated for an average smoker in the UK.
    Keywords: Smoking; Rental market; Hedonic regression; Matching; Text mining; Random forest; Smoking rent premium; Contracting frictions
    JEL: I30 R21 R31
    Date: 2022–03
    URL: http://d.repec.org/n?u=RePEc:bir:birmec:22-06&r=
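    A minimal sketch of the two ingredients the abstract combines: a text-mined smoking indicator and a hedonic log-rent regression. The listings, wording and covariates are invented examples, not Zoopla data:

      import numpy as np
      import pandas as pd
      import statsmodels.formula.api as smf

      ads = pd.DataFrame({
          "rent": [650, 700, 620, 810],
          "bedrooms": [2, 2, 1, 3],
          "text": ["no smokers please", "smokers welcome", "non-smoking flat", "smoking permitted"],
      })
      ads["smoking_ok"] = ads["text"].str.contains("smokers welcome|smoking permitted").astype(int)

      # in a log-rent hedonic, the smoking_ok coefficient approximates the percentage premium
      model = smf.ols("np.log(rent) ~ smoking_ok + bedrooms", data=ads).fit()
      print(model.params)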
  10. By: Hao Wu; David Levinson (TransportLab, School of Civil Engineering, University of Sydney)
    Abstract: Ensemble forecasting is a modeling approach that combines data sources and models of different types, with alternative assumptions, using distinct pattern recognition methods. The aim is to use all available information in predictions, without the limiting and arbitrary choices and dependencies that result from a single statistical or machine learning approach, a single functional form, or a limited data source, and with uncertainties systematically accounted for. Outputs of ensemble models can be presented as a range of possibilities, to indicate the amount of uncertainty in modeling. We review methods and applications of ensemble models both within and outside of transport research. The review finds that ensemble forecasting generally improves forecast accuracy and robustness in many fields, particularly in weather forecasting, where the method originated. We note that ensemble methods are highly siloed across disciplines, and that both the knowledge and the application of ensemble forecasting are lacking in transport. In this paper we review and synthesize methods of ensemble forecasting within a unifying framework, categorizing ensemble methods into two broad and not mutually exclusive categories, namely combining models and combining data; this framework further extends to ensembles of ensembles. We apply ensemble forecasting to transport-related cases, which shows the potential of ensemble models to improve forecast accuracy and reliability. This paper sheds light on the apparatus of ensemble forecasting, which we hope contributes to the better understanding and wider adoption of ensemble models.
    Keywords: Ensemble forecasting, Combining models, Data fusion, Ensembles of ensembles
    JEL: R41 C93
    Date: 2022
    URL: http://d.repec.org/n?u=RePEc:nex:wpaper:ensembleapproachforecasting&r=
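    A minimal sketch of the "combining models" flavour of ensemble forecasting the review describes: several heterogeneous models, an equal-weight combination, and the cross-member spread as an uncertainty gauge. Models and data are illustrative only:

      import numpy as np
      from sklearn.linear_model import LinearRegression
      from sklearn.ensemble import RandomForestRegressor
      from sklearn.neighbors import KNeighborsRegressor

      rng = np.random.default_rng(6)
      X = rng.normal(size=(200, 4))
      y = X[:, 0] ** 2 + X[:, 1] + rng.normal(0, 0.2, 200)
      X_tr, X_te, y_tr, y_te = X[:150], X[150:], y[:150], y[150:]

      members = [LinearRegression(), RandomForestRegressor(random_state=0), KNeighborsRegressor()]
      preds = np.column_stack([m.fit(X_tr, y_tr).predict(X_te) for m in members])

      ensemble = preds.mean(axis=1)                 # equal-weight model combination
      spread = preds.std(axis=1)                    # disagreement as a rough uncertainty band
      print("ensemble RMSE:", np.sqrt(np.mean((ensemble - y_te) ** 2)))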
  11. By: Hirvonen, Johannes; Stenhammar, Aapo; Tuhkuri, Joonas
    Abstract: We present novel evidence on the effects of advanced technologies on employment, skill demand, and firm performance. The main finding is that advanced technologies led to increases in employment and no change in skill composition. Our main research design focuses on a technology subsidy program in Finland that induced sharp increases in technology investment in manufacturing firms. Our data directly measure multiple technologies and skills and track firms and workers over time. We demonstrate novel text analysis and machine learning methods to perform matching and to measure specific technological changes. To explain our findings, we outline a theoretical framework that contrasts two types of technological change: process versus product. We document that firms used new technologies to produce new types of output rather than replace workers with technologies within the same type of production. The results contrast with the ideas that technologies necessarily replace workers or are skill biased.
    Keywords: Technology, Labor, Skills, Industrial policy
    JEL: J23 J24 O33
    Date: 2022–04–11
    URL: http://d.repec.org/n?u=RePEc:rif:wpaper:93&r=
  12. By: Coraggio, Luca (University of Naples Federico II); Pagano, Marco (University of Naples Federico II); Scognamiglio, Annalisa (University of Naples Federico II); Tåg, Joacim (Research Institute of Industrial Economics (IFN))
    Abstract: Does the matching between workers and jobs help explain productivity differentials across firms? To address this question we develop a job-worker allocation quality measure (JAQ) by combining employer-employee administrative data with machine learning techniques. The proposed measure is positively and significantly associated with labor earnings over workers' careers. At firm level, it features a robust positive correlation with firm productivity, and with managerial turnover leading to an improvement in the quality and experience of management. JAQ can be constructed for any employer-employee data including workers' occupations, and used to explore the effect of corporate restructuring on workers' allocation and careers.
    Keywords: Jobs; Workers; Matching; Mismatch; Machine Learning; Productivity; Management
    JEL: D22 D23 D24 G34 J24 J31 J62 L22 L23 M12 M54
    Date: 2022–04–01
    URL: http://d.repec.org/n?u=RePEc:hhs:iuiwop:1427&r=
  13. By: Santamaria, J.
    Abstract: This paper analyzes the effect of open-door immigration policies on local labor markets. Using the sharp and unprecedented surge of Venezuelan refugees into Colombia, I study the impact on wages and employment in a context where work permits were granted at scale. To identify which labor markets immigrants are entering, I overcome limitations in official records and generate novel evidence of refugee settlement patterns by tracking the geographical distribution of Internet search terms that Venezuelans but not Colombians use. While official records suggest migrants are concentrated in a few cities, the Internet search index shows migrants are located across the country. Using this index, high-frequency labor market data, and a difference-in-differences design, I find precise null effects on employment and wages in the formal and informal sectors. A machine learning approach that compares counterfactual cities with locations most impacted by immigration yields similar results. All in all, the results suggest that open-door policies do not harm labor markets in the host community.
    Keywords: Migration; Employment; Wages; Google searches
    JEL: J61 J68 C81
    Date: 2022–02–07
    URL: http://d.repec.org/n?u=RePEc:col:000561:020046&r=
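    A minimal sketch of the difference-in-differences design the abstract describes, estimated with statsmodels on a simulated city panel; the exposure indicator (standing in for the Internet-search index) and the outcome are placeholders:

      import numpy as np
      import pandas as pd
      import statsmodels.formula.api as smf

      rng = np.random.default_rng(7)
      df = pd.DataFrame({
          "high_exposure": np.repeat([0, 1], 200),  # cities with many migrant-specific searches
          "post": np.tile([0, 1], 200),             # after the refugee inflow
          "wage": rng.normal(10, 1, 400),
      })
      # the coefficient on high_exposure:post is the DiD estimate (about zero here by
      # construction, mirroring the paper's precise null effects)
      did = smf.ols("wage ~ high_exposure * post", data=df).fit(cov_type="HC1")
      print(did.summary().tables[1])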
  14. By: Josten, Cecily; Lordan, Grace
    Abstract: This study identifies the job attributes, and in particular the skills and abilities, that predict the likelihood a job is recently automatable, drawing on the Josten and Lordan (2020) classification of automatability, EU Labour Force Survey data, and a machine learning regression approach. We find that skills and abilities related to non-linear abstract thinking are the safest from automation. We also find that jobs requiring ‘people’ engagement interacted with ‘brains’ are less likely to be automated; the skills required for these jobs include soft skills. Finally, we find that jobs that require physically making objects, or physicality more generally, are most likely to be automated unless they involve interaction with ‘brains’ and/or ‘people’.
    Keywords: work; automatability; job skills; job abilities; EU Labour Force Survey
    JEL: R14 J01
    Date: 2022
    URL: http://d.repec.org/n?u=RePEc:ehl:lserod:114539&r=
  15. By: Rocco, Salvatore
    Abstract: This paper examines the current evolution of Artificial Intelligence (AI) systems for “algorithmic decision-making” (ADM) in the public sector (§1). In particular, it focuses on the challenges brought by such new uses of AI in the field of governance and public administration. From a review of the rising global scholarship on the matter, three strands of research are expanded here. First, the technical approach (§2): to close the gaps between law, policy and technology, it is necessary to understand what an AI system is and why and how it can affect decision-making. Second, the legal and “algor-ethical” approach (§3), aimed at showing the big picture wherein the governance concerns arise – namely, the wider framework of principles and key practices needed to secure a good use of AI in the public sector against its potential risks and misuses. Third, as the core subject of this analysis, the governance approach stricto sensu (§4), which traces the renowned issue of the “governance of AI” back to four major sets of challenges that ADM poses in the public management chain: (i) defining clear goals and responsibilities; (ii) gaining competency and knowledge; (iii) managing and involving stakeholders; (iv) managing and auditing risks.
    Date: 2022–03–27
    URL: http://d.repec.org/n?u=RePEc:osf:socarx:ex93w&r=
  16. By: Abbasi, A (University of California San Francisco); Gazze, L (University of Warwick); Pals, B (New York University)
    Abstract: Lead exposure remains a significant threat to children’s health despite decades of policies aimed at getting the lead out of homes and neighborhoods. Generally, lead hazards are identified through inspections triggered by high blood lead levels (BLLs) in children. Yet, it is unclear how best to screen children for lead exposure to balance the costs of screening and the potential benefits of early detection, treatment, and lead hazard removal. While some states require universal screening, others employ a targeted approach, but no regime achieves 100% compliance. We estimate the extent and geographic distribution of undetected lead poisoning in Illinois. We then compare the estimated detection rate of a universal screening program to the current targeted screening policy under different compliance levels. To do so, we link 2010-2016 Illinois lead test records to 2010-2014 birth records, demographics, and housing data. We train a random forest classifier that predicts the likelihood a child has a BLL above 5µg/dL. We estimate that 10,613 untested children had a BLL≥5µg/dL in addition to the 18,115 detected cases. Due to the unequal spatial distribution of lead hazards, 60% of these undetected cases should have been screened under the current policy, suggesting limited benefits from universal screening.
    Keywords: Lead Poisoning, Environmental Health, Screening
    Date: 2022
    URL: http://d.repec.org/n?u=RePEc:cge:wacage:612&r=
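    A minimal sketch of the screening idea in the abstract: train a random forest on tested children, then rank untested children by predicted risk of an elevated blood lead level. Features, threshold and sample are simulated, not the Illinois records:

      import numpy as np
      from sklearn.ensemble import RandomForestClassifier

      rng = np.random.default_rng(8)
      X = rng.normal(size=(5000, 8))                # e.g. housing age, demographics, neighborhood
      y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(0, 1, 5000) > 1.5).astype(int)  # BLL >= 5 ug/dL

      clf = RandomForestClassifier(n_estimators=300, random_state=0).fit(X[:4000], y[:4000])
      risk = clf.predict_proba(X[4000:])[:, 1]      # screening priority score for "untested" children
      top100 = np.argsort(-risk)[:100]
      print("elevated-BLL rate among top 100 ranked:", y[4000:][top100].mean())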
  17. By: Otero Gomez, Daniel; Agudelo, Santiago Cartagena; Patiño, Andres Ospina; Lopez-Rojas, Edgar
    Abstract: Financial crime and, specifically, the illegal business of money laundering are increasing dramatically with the expansion of modern technology and global communication, resulting in the loss of billions of dollars worldwide each year. Money laundering, the process that transforms the proceeds of crime into clean, legitimate assets, is a common phenomenon around the world. Irregularly obtained money is generally cleaned through transfers involving banks or companies; see Walker (1999). Hence, one of the main problems remains finding an efficient way to identify suspicious actors and transactions: in each operation, attention should be focused on the type, amount, motive, frequency, and consistency with previous activity and the geographic area. This identification must be the result of a process that cannot be based solely on individual judgments but must, at least in part, be automated. Although prevention technologies are the best way to reduce fraud, fraudsters are adaptive and, given time, will usually find ways to overcome such measures; see Perols (2011). We therefore propose to enrich this set of information by building an anomaly detection model for money transfer operations, in order to benefit from the power of artificial intelligence. Anti-money laundering is a complex problem, but we believe artificial intelligence can play a powerful role in this area.
    Date: 2021–12–08
    URL: http://d.repec.org/n?u=RePEc:osf:osfxxx:f84ht&r=
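    A minimal sketch of one plausible building block for the proposed approach: an unsupervised anomaly detector over transaction features. The abstract does not name a specific algorithm, so an isolation forest is used here purely for illustration, on synthetic data:

      import numpy as np
      from sklearn.ensemble import IsolationForest

      rng = np.random.default_rng(9)
      normal = rng.normal(100, 20, size=(980, 2))       # (amount, frequency) of ordinary transfers
      suspicious = rng.normal(900, 50, size=(20, 2))    # injected outliers
      X = np.vstack([normal, suspicious])

      iso = IsolationForest(contamination=0.02, random_state=0).fit(X)
      flags = iso.predict(X)                            # -1 marks anomalous transactions
      print("flagged:", int((flags == -1).sum()), "of", len(X))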
  18. By: Alireza Jafari; Saman Haratizadeh
    Abstract: The prediction of stocks' direction of movement using historical price information has attracted considerable attention as a challenging problem in the field of machine learning. However, modeling and analyzing the hidden relations among stock prices, an important source of information for predicting their future behavior, has not yet been explored well. Existing methods in this domain suffer from a lack of generality and flexibility and cannot easily be applied to an arbitrary set of inter-related stocks. The main challenges are to find a way to model the relations among an arbitrary set of stocks and to exploit such a model to improve prediction performance for those stocks. In this paper, we introduce a novel framework, called GCNET, that models the relations among an arbitrary set of stocks as a graph structure called the influence network and uses a set of history-based prediction models to infer plausible initial labels for a subset of the stock nodes in the graph. GCNET then uses the Graph Convolutional Network algorithm to analyze this partially labeled graph and predicts the next direction of price movement for each stock in the graph. GCNET is a general prediction framework that can be applied to predict the price fluctuations of any set of interacting stocks based on their historical data. Our experiments and evaluations on sets of stocks from the S&P 500 and NASDAQ show that GCNET significantly improves on the performance of state-of-the-art (SOTA) methods in terms of accuracy and MCC measures.
    Date: 2022–02
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2203.11091&r=
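    A minimal numpy sketch of the core graph-convolution step a GCN applies to a network like GCNET's influence graph: add self-loops, symmetrically normalize the adjacency matrix, and propagate node features through a weight matrix with a ReLU. The three-stock graph and feature values are toys:

      import numpy as np

      A = np.array([[0, 1, 1], [1, 0, 0], [1, 0, 0]], dtype=float)  # stock influence graph
      H = np.random.default_rng(10).normal(size=(3, 4))             # node features (price history)
      W = np.random.default_rng(11).normal(size=(4, 2))             # learnable layer weights

      A_hat = A + np.eye(3)                               # add self-loops
      D_inv_sqrt = np.diag(A_hat.sum(axis=1) ** -0.5)     # symmetric degree normalisation
      H_next = np.maximum(D_inv_sqrt @ A_hat @ D_inv_sqrt @ H @ W, 0)  # one GCN layer with ReLU
      print(H_next)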
  19. By: Bambino-Contreras, Carlos; Morales-Oñate, Víctor
    Abstract: This work estimates the exposure at default of a credit card portfolio of an Ecuadorian bank without using the credit conversion factor, a common mechanism used in the expected loss distribution estimation literature and suggested by the Basel Committee. To achieve this goal, the probability distribution of this variable (exposure at default) has been identified so that it can be used in the context of generalized linear models. The results show that the model can be used to make predictions based on assumptions closer to the reality of customer behavior based on the variables used in the regression.
    Keywords: Expected loss, Credit risk, Exposure at default, Generalized linear models, Gamma Distribution, Machine Learning
    JEL: C1 G32
    Date: 2021–12
    URL: http://d.repec.org/n?u=RePEc:pra:mprapa:112333&r=
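    A minimal sketch of a Gamma GLM for exposure at default, matching the distributional choice described in the abstract; the covariate (a hypothetical credit-line utilisation measure) and the data-generating process are simulated guesses, not the Ecuadorian portfolio:

      import numpy as np
      import statsmodels.api as sm

      rng = np.random.default_rng(12)
      limit_use = rng.uniform(0.1, 1.0, 500)              # hypothetical utilisation covariate
      X = sm.add_constant(limit_use)
      ead = rng.gamma(shape=2.0, scale=500 * limit_use)   # positive, right-skewed outcome

      model = sm.GLM(ead, X, family=sm.families.Gamma(link=sm.families.links.Log())).fit()
      print(model.params)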
  20. By: Massaro, Alessandro; Magaletti, Nicola; Cosoli, Gabriele; Giardinelli, Vito O. M.; Leogrande, Angelo
    Abstract: In the proposed work, a text classification is performed for a chatbot application used by a company providing assistance services for automatic warehouses. Specifically, a text mining technique is adopted for the classification of questions and answers. Business Process Modeling Notation (BPMN) models describe the passage from “AS-IS” to “TO BE” processes in the analyzed industry, focusing mainly on the customer and technical support services where the chatbot is adopted. A two-step process model is used to connect technological improvements and relationship marketing in chatbot assistance: the first step applies hierarchical clustering, able to classify questions and answers through the Latent Dirichlet Allocation (LDA) algorithm, and the second generates the Tag Cloud, a visual representation of the most frequent words contained in the experimental dataset. The Tag Cloud is used to show the critical issues that customers find in using the proposed service. Starting from an initial dataset, 24 hierarchical clusters are found, representing preliminary question-answer pairs. The proposed approach is suitable for automatically constructing combinations of chatbot questions and appropriate answers in intelligent systems.
    Keywords: Chatbot, Speech Recognition, Natural Language Processing-NLP, Hierarchical Clustering, Business Process Management and Notation-BPMN
    JEL: O30 O31 O32 O33 O34
    Date: 2022–03–08
    URL: http://d.repec.org/n?u=RePEc:pra:mprapa:112300&r=
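    A minimal sketch of the first step of the two-step process described above, grouping chatbot questions by topic with Latent Dirichlet Allocation; the toy corpus is invented:

      from sklearn.feature_extraction.text import CountVectorizer
      from sklearn.decomposition import LatentDirichletAllocation

      questions = [
          "conveyor belt stopped during picking",
          "how do I reset the warehouse scanner",
          "error code on the automatic crane",
          "scanner battery does not charge",
      ]
      counts = CountVectorizer(stop_words="english").fit_transform(questions)
      lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(counts)
      print(lda.transform(counts).argmax(axis=1))       # topic assignment per question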
  21. By: Massaro, Alessandro; Magaletti, Nicola; Cosoli, Gabriele; Giardinelli, Vito; Leogrande, Angelo
    Abstract: In the proposed work, a text classification is performed for a chatbot application used by a company providing assistance services for automatic warehouses. Specifically, a text mining technique is adopted for the classification of questions and answers. Business Process Modeling Notation (BPMN) models describe the passage from “AS-IS” to “TO BE” processes in the analyzed industry, focusing mainly on the customer and technical support services where the chatbot is adopted. A two-step process model is used to connect technological improvements and relationship marketing in chatbot assistance: the first step applies hierarchical clustering, able to classify questions and answers through the Latent Dirichlet Allocation (LDA) algorithm, and the second generates the Tag Cloud, a visual representation of the most frequent words contained in the experimental dataset. The Tag Cloud is used to show the critical issues that customers find in using the proposed service. Starting from an initial dataset, 24 hierarchical clusters are found, representing preliminary question-answer pairs. The proposed approach is suitable for automatically constructing combinations of chatbot questions and appropriate answers in intelligent systems.
    Keywords: Chatbot, Speech Recognition, Natural Language Processing-NLP, Hierarchical Clustering, Business Process Management and Notation-BPMN.
    JEL: O3 O30 O31 O32 O33 O34
    Date: 2022–03–05
    URL: http://d.repec.org/n?u=RePEc:pra:mprapa:112244&r=
  22. By: Rui Fan (Swansea University); Oleksandr Talavera (University of Birmingham); Vu Tran (University of Reading)
    Abstract: This paper explores the role of information flows for the law of one price in an almost frictionless environment. Specifically, we examine whether the volume and content of social media messages are related to the exchange rate pass-through to prices of dual-listed stocks. Our sample includes 37 million Twitter messages mentioning the name of a UK-US cross-listed stock from 2015 to 2018. Using a high-frequency intraday data sample, we observe that message volume is negatively, and message agreement positively, linked to the pass-through. The findings suggest that large information flows and a high degree of disagreement add extra frictions for the law of one price. In addition, there is an asymmetric pattern in the pass-through, notwithstanding that there are no import/export or geographically related frictions. This presents further evidence for the importance of information flows in understanding the law of one price.
    Keywords: Twitter, investor sentiment, exchange rate pass-through, dual-listing, market integration, text classification, computational linguistics
    JEL: G12 G14 L86
    Date: 2022–03
    URL: http://d.repec.org/n?u=RePEc:bir:birmec:22-05&r=
  23. By: Ruan Pretorius; Terence van Zyl
    Abstract: Traditional portfolio management methods can incorporate specific investor preferences but rely on accurate forecasts of asset returns and covariances. Reinforcement learning (RL) methods do not rely on these explicit forecasts and are better suited for multi-stage decision processes. To address limitations of the existing research, we conducted experiments on three markets in different economies with different overall trends. By incorporating specific investor preferences into our RL models' reward functions, a more comprehensive comparison could be made to traditional methods in risk-return space. Transaction costs were also modelled more realistically, by including the nonlinear changes introduced by market volatility and trading volume. The results of this study suggest that there can be an advantage to using RL methods compared to traditional convex mean-variance optimisation methods under certain market conditions. Our RL models could significantly outperform traditional single-period optimisation (SPO) and multi-period optimisation (MPO) models in upward trending markets, but only up to specific risk limits. In sideways trending markets, the performance of SPO and MPO models can be closely matched by our RL models for the majority of the excess risk range tested. The specific market conditions under which these models could outperform each other highlight the importance of a more comprehensive comparison of Pareto optimal frontiers in risk-return space. These frontiers give investors a more granular view of which models might provide better performance for their specific risk tolerance or return targets.
    Date: 2022–02
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2203.11318&r=
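    A minimal sketch of the single-period mean-variance (SPO) baseline that RL models of this kind are compared against, solved with cvxpy; the expected returns, covariance matrix and risk-aversion value are made up:

      import numpy as np
      import cvxpy as cp

      mu = np.array([0.08, 0.05, 0.03])                 # expected asset returns (illustrative)
      Sigma = np.array([[0.10, 0.02, 0.01],
                        [0.02, 0.06, 0.01],
                        [0.01, 0.01, 0.04]])            # return covariance
      gamma = 2.0                                       # risk-aversion parameter

      w = cp.Variable(3)
      objective = cp.Maximize(mu @ w - gamma * cp.quad_form(w, Sigma))
      problem = cp.Problem(objective, [cp.sum(w) == 1, w >= 0])  # fully invested, long-only
      problem.solve()
      print("weights:", np.round(w.value, 3))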

This nep-big issue is ©2022 by Tom Coupé. It is provided as is without any express or implied warranty. It may be freely redistributed in whole or in part for any purpose. If distributed in part, please include this notice.
General information on the NEP project can be found at http://nep.repec.org. For comments please write to the director of NEP, Marco Novarese at <director@nep.repec.org>. Put “NEP” in the subject, otherwise your mail may be rejected.
NEP’s infrastructure is sponsored by the School of Economics and Finance of Massey University in New Zealand.