nep-big 2021-11-01 papers

on Big Data

Issue of 2021‒11‒01
eleven papers chosen by
Tom Coupé
University of Canterbury

Machine Learning in Finance-Emerging Trends and Challenges By Jaydip Sen; Rajdeep Sen; Abhishek Dutta
A new machine learning-based treatment bite for long run minimum wage evaluations By Börschlein, Benjamin; Bossler, Mario
Predicting Status of Pre and Post M&A Deals Using Machine Learning and Deep Learning Techniques By Tugce Karatas; Ali Hirsa
Embracing advanced AI/ML to help investors achieve success: Vanguard Reinforcement Learning for Financial Goal Planning By Shareefuddin Mohammed; Rusty Bealer; Jason Cohen
The impact of CSR performance on Efficiency of Investments using Machine Learning By Nadia Lakhal; Asma Guizani; Asma Sghaier; Mohammed El Amine Abdelli; Imen Ben Slimene
Bank transactions embeddings help to uncover current macroeconomics By Maria Begicheva; Oleg Travkin; Alexey Zaytsev
Forecasting Financial Market Structure from Network Features using Machine Learning By Douglas Castilho; Tharsis T. P. Souza; Soong Moon Kang; João Gama; André C. P. L. F. de Carvalho
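Big Data et Technologies de Stockage et de Traitement des Données Massives : Comprendre les bases de l’écosystème HADOOP (HDFS, MAPREDUCE, YARN, HIVE, HBASE, KAFKA et SPARK) [Big Data and Technologies for Storing and Processing Massive Data: Understanding the Basics of the HADOOP Ecosystem] By Keita, Moussa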
Using sentiment analysis in tourism research: A systematic, bibliometric, and integrative review By Cristina Franciele; Thays Christina Domareski Ruiz
An Economy of Neural Networks: Learning from Heterogeneous Experiences By Artem Kuriksha
Understanding a Less Developed Labor Market through the Lens of Social Security Data By Nada Wasi; Chinnawat Devahastin Na Ayudhya; Pucktada Treeratpituk; Chommanart Nittayo

Machine Learning in Finance-Emerging Trends and Challenges

By:	Jaydip Sen; Rajdeep Sen; Abhishek Dutta
Abstract:	The paradigm of machine learning and artificial intelligence has pervaded our everyday life to such an extent that it is no longer an area reserved for academics and scientists working on esoteric research problems. This evolution is natural rather than accidental. With exponential growth in processing speed and the emergence of smarter algorithms for solving complex and challenging problems, organizations have found it possible to harness huge volumes of data to build solutions with far-reaching business value. This introductory chapter highlights some of the challenges and barriers that organizations in the financial services sector currently encounter in adopting machine learning and artificial intelligence-based models and applications in their day-to-day operations.
Date:	2021–10
URL:	http://d.repec.org/n?u=RePEc:arx:papers:2110.11999&r=

A new machine learning-based treatment bite for long run minimum wage evaluations

By:	Börschlein, Benjamin; Bossler, Mario
JEL:	J31 J38 C49 C21
Date:	2021
URL:	http://d.repec.org/n?u=RePEc:zbw:vfsc21:242441&r=

Predicting Status of Pre and Post M&A Deals Using Machine Learning and Deep Learning Techniques

By:	Tugce Karatas; Ali Hirsa
Abstract:	Risk arbitrage, or merger arbitrage, is a well-known investment strategy that speculates on the success of M&A deals. Predicting the deal status in advance is of great importance for risk arbitrageurs. If a deal is mistakenly classified as completed, investing in the target company's shares can incur an enormous cost. Conversely, if a deal that will complete is misclassified as failing, risk arbitrageurs may lose the opportunity to make a profit. In this paper, we present an ML- and DL-based methodology for the takeover success prediction problem. We first apply various ML techniques for data preprocessing, such as kNN for data imputation, PCA for a lower-dimensional representation of numerical variables, MCA for categorical variables, and an LSTM autoencoder for sentiment scores. We experiment with different cost functions, different evaluation metrics, and oversampling techniques to address class imbalance in our dataset. We then implement feedforward neural networks (FFNNs) to predict the deal status. Our preliminary results indicate that our methodology outperforms benchmark models such as logit and weighted logit models. We also integrate sentiment scores into our methodology using different model architectures, but our preliminary results show that performance does not change much compared to the simple FFNN framework. As future work, we will explore different architectures and carry out thorough hyperparameter tuning for the sentiment scores.
Date:	2021–08
URL:	http://d.repec.org/n?u=RePEc:arx:papers:2110.09315&r=
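
The abstract names logit and weighted logit models as benchmarks, and it also mentions cost functions for class imbalance. The Python below is a minimal sketch of one such benchmark, not the authors' FFNN methodology. It fits a logistic regression whose log-loss weights each class inversely to its frequency. The "balanced" weighting rule, the plain batch gradient descent, and the function names are all assumptions made for illustration.

```python
import numpy as np

def fit_weighted_logit(X, y, lr=0.1, epochs=2000):
    """Class-weighted logistic regression trained by batch gradient descent.

    Each sample is weighted by n / (2 * n_class), so the rare class contributes
    as much total loss as the common class ("balanced" weights, an assumption).
    X: (n, d) features; y: (n,) array of 0/1 labels.
    """
    n, d = X.shape
    Xb = np.hstack([np.ones((n, 1)), X])              # prepend an intercept column
    n_pos = y.sum()
    w_sample = np.where(y == 1, n / (2 * n_pos), n / (2 * (n - n_pos)))
    beta = np.zeros(d + 1)
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-Xb @ beta))          # predicted P(y = 1)
        grad = Xb.T @ (w_sample * (p - y)) / n        # gradient of the weighted log-loss
        beta -= lr * grad
    return beta

def predict_proba(beta, X):
    """Predicted probability of the positive class for each row of X."""
    Xb = np.hstack([np.ones((X.shape[0], 1)), X])
    return 1.0 / (1.0 + np.exp(-Xb @ beta))
```

Reweighting moves the 0.5 decision threshold toward the rare outcome, trading some precision for recall on that class. Oversampling the rare class, which the abstract also mentions, has a broadly similar effect on the fitted decision boundary.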

Embracing advanced AI/ML to help investors achieve success: Vanguard Reinforcement Learning for Financial Goal Planning

By:	Shareefuddin Mohammed; Rusty Bealer; Jason Cohen
Abstract:	In the world of advice and financial planning, there is seldom one right answer. While traditional algorithms have been successful in solving linear problems, their success often depends on choosing the right features from a dataset, which can be a challenge for nuanced financial planning scenarios. Reinforcement learning is a machine learning approach that can be employed with complex datasets where picking the right features can be nearly impossible. In this paper, we explore the use of machine learning for financial forecasting, predicting economic indicators, and creating a savings strategy. Vanguard's ML algorithm for goals-based financial planning is based on deep reinforcement learning that identifies optimal savings rates across multiple goals and sources of income to help clients achieve financial success. Vanguard's learning algorithms are trained to identify market indicators and behaviors too complex to capture with formulas and rules; instead, they model the financial success trajectory of investors and their investment outcomes as a Markov decision process. We believe that reinforcement learning can create value for advisors and end investors by improving efficiency, enabling more personalized plans, and providing data for customized solutions.
Date:	2021–10
URL:	http://d.repec.org/n?u=RePEc:arx:papers:2110.12003&r=
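
The abstract frames goal-based planning as a Markov decision process. The Python below is a toy sketch of that framing and is not Vanguard's deep reinforcement learning model. Every modeling choice in it is a hypothetical assumption:

- a single goal
- integer wealth states
- three savings rates
- a square-root consumption reward
- a coin-flip gross return of 1.0 or 1.1

Because the transition probabilities are known here, backward induction solves the MDP exactly. Reinforcement learning is needed when no such model is available and the policy has to be learned from sampled experience.

```python
import math

def plan_savings(horizon=10, income=10, goal=40, bonus=20.0,
                 rates=(0.0, 0.2, 0.4), w_max=200):
    """Toy finite-horizon MDP for one savings goal, solved by backward induction.

    State: integer wealth w in [0, w_max]. Action: savings rate r.
    Per-period reward: sqrt(consumption). Terminal reward: `bonus` if wealth >= goal.
    Gross return is 1.0 or 1.1 with equal probability (hypothetical).
    """
    outcomes = ((0.5, 1.0), (0.5, 1.1))                          # (probability, gross return)
    V = [bonus if w >= goal else 0.0 for w in range(w_max + 1)]  # terminal values V_T
    policy = []                                                   # policy[t][w] = best rate
    for _ in range(horizon):                                      # step back from T-1 to 0
        V_new = [0.0] * (w_max + 1)
        best_rate = [0.0] * (w_max + 1)
        for w in range(w_max + 1):
            best = -math.inf
            for r in rates:
                saved = r * income
                q = math.sqrt(income - saved)                     # immediate reward
                for p, g in outcomes:                             # expected continuation value
                    w_next = min(w_max, round((w + saved) * g))
                    q += p * V[w_next]
                if q > best:
                    best, best_rate[w] = q, r
            V_new[w] = best
        V = V_new
        policy.insert(0, best_rate)
    return V, policy
```

In the result, V[w] is the optimal expected value starting from wealth w, and policy[0][w] is the savings rate chosen in the first period. Once wealth already meets the goal, the sketch never saves, because wealth cannot fall under these returns and saving would only lower consumption.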

The impact of CSR performance on Efficiency of Investments using Machine Learning

By:	Nadia Lakhal (Lamided, ISG, Université de Sousse - LAMIDED); Asma Guizani (LIRSA - Laboratoire interdisciplinaire de recherche en sciences de l'action - CNAM - Conservatoire National des Arts et Métiers [CNAM]); Asma Sghaier (Department of Finance, University of Sousse, Sousse, Tunisia); Mohammed El Amine Abdelli (University of Brest); Imen Ben Slimene (UGA [2016-2019] - Université Grenoble Alpes [2016-2019])
Keywords:	CSR, Investment Efficiency, Machine learning, Stakeholder Theory
Date:	2021–09–28
URL:	http://d.repec.org/n?u=RePEc:hal:journl:hal-03375264&r=

Bank transactions embeddings help to uncover current macroeconomics

By:	Maria Begicheva; Oleg Travkin; Alexey Zaytsev
Abstract:	Macroeconomic indexes are highly important for banks: many risk-control decisions rely on them. The typical workflow for evaluating these indexes is costly and protracted, with a lag of a couple of months between the reference date and the date the index becomes available. To make decisions in a rapidly changing environment, banks currently predict such indexes with autoregressive models. However, autoregressive models fail in complex scenarios such as the onset of a crisis. We propose to use clients' financial transaction data from a large Russian bank to obtain such indexes. Transaction histories are long and the number of clients is huge, so we develop an efficient approach that allows fast and accurate estimation of macroeconomic indexes from a stream of millions of transactions. The approach uses a neural network paradigm and a smart sampling scheme. The results show that our neural network approach outperforms a baseline method built on hand-crafted transaction features. The computed embeddings show a correlation between clients' transaction activity and the bank's macroeconomic indexes over time.
Date:	2021–10
URL:	http://d.repec.org/n?u=RePEc:arx:papers:2110.12000&r=
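
The abstract does not describe the network or the "smart sampling" scheme in detail. The Python below is a minimal sketch of the downstream step only. It assumes each client already has an embedding vector produced by some sequence encoder. For each period, it averages the embeddings of a uniform random subsample of clients and then regresses the macroeconomic index on those averages. Uniform subsampling and ridge regression are stand-ins chosen for illustration, and both function names are hypothetical.

```python
import numpy as np

def period_features(client_embeddings, sample_size, rng):
    """Mean embedding over a uniform random subsample of clients for one period.

    client_embeddings: (n_clients, dim) array, one vector per client (assumed to
    come from some encoder over that client's transactions). Subsampling keeps
    the per-period cost bounded when n_clients is huge.
    """
    idx = rng.choice(client_embeddings.shape[0], size=sample_size, replace=False)
    return client_embeddings[idx].mean(axis=0)

def fit_ridge(F, y, alpha=1.0):
    """Closed-form ridge regression y ~ F @ w + b; the intercept b is not penalized."""
    F_mean, y_mean = F.mean(axis=0), y.mean()
    Fc, yc = F - F_mean, y - y_mean
    w = np.linalg.solve(Fc.T @ Fc + alpha * np.eye(F.shape[1]), Fc.T @ yc)
    b = y_mean - F_mean @ w
    return w, b
```

Sampling a few hundred clients per period makes the cost independent of the total number of clients. The trade-off is sampling noise in the averaged features, which the ridge penalty partly absorbs.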

Forecasting Financial Market Structure from Network Features using Machine Learning

By:	Douglas Castilho; Tharsis T. P. Souza; Soong Moon Kang; João Gama; André C. P. L. F. de Carvalho
Abstract:	We propose a model that uses machine learning to forecast market correlation structure from link- and node-based financial network features. To do so, market structure is modeled as a dynamic asset network by quantifying the time-dependent co-movement of asset price returns across the constituent companies of major global market indices. We provide empirical evidence using three different network filtering methods to estimate market structure: Dynamic Asset Graph (DAG), Dynamic Minimal Spanning Tree (DMST), and Dynamic Threshold Networks (DTN). Experimental results show that the proposed model forecasts market structure with high predictive performance, with up to a 40% improvement over a time-invariant correlation-based benchmark. For all markets studied, non-pairwise correlation features proved more important than the traditionally used pairwise correlation measures, particularly in long-term forecasting of stock market structure. Evidence is provided for the stock constituents of the DAX30, EUROSTOXX50, FTSE100, HANGSENG50, NASDAQ100, and NIFTY50 market indices. The findings can help improve portfolio selection and risk management methods, which commonly rely on a backward-looking covariance matrix to estimate portfolio risk.
Date:	2021–10
URL:	http://d.repec.org/n?u=RePEc:arx:papers:2110.11751&r=
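
Two of the three filtering methods named above, the minimal spanning tree and the threshold network, are standard constructions on a correlation matrix. The Python below is a sketch that builds both for a single window of returns. It assumes the commonly used distance d_ij = sqrt(2(1 - rho_ij)) and an illustrative threshold theta. The paper's exact definitions and features may differ. A dynamic version would recompute both objects over rolling windows, for example returns[t - w:t] for each date t.

```python
import numpy as np

def correlation_distance(returns):
    """(T, n) returns -> (n, n) distance matrix d = sqrt(2 * (1 - rho))."""
    rho = np.corrcoef(returns, rowvar=False)
    return np.sqrt(np.clip(2.0 * (1.0 - rho), 0.0, None))  # clip tiny negatives

def minimum_spanning_tree(dist):
    """Prim's algorithm on a dense distance matrix; returns a list of (i, j) edges."""
    n = dist.shape[0]
    in_tree = np.zeros(n, dtype=bool)
    in_tree[0] = True
    best = dist[0].copy()            # cheapest known link from the tree to each node
    parent = np.zeros(n, dtype=int)  # tree node providing that cheapest link
    edges = []
    for _ in range(n - 1):
        cand = np.where(in_tree, np.inf, best)
        j = int(np.argmin(cand))
        edges.append((int(parent[j]), j))
        in_tree[j] = True
        closer = dist[j] < best      # does the new node offer a cheaper link?
        best = np.where(closer, dist[j], best)
        parent = np.where(closer, j, parent)
    return edges

def threshold_network(returns, theta=0.5):
    """Boolean adjacency matrix linking assets whose |correlation| exceeds theta."""
    rho = np.corrcoef(returns, rowvar=False)
    adj = np.abs(rho) > theta
    np.fill_diagonal(adj, False)
    return adj
```

Node-level features such as degree, or link-level features such as whether an edge persists between consecutive windows, could then feed a classifier. The abstract does not say which specific features the authors use.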

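Big Data et Technologies de Stockage et de Traitement des Données Massives : Comprendre les bases de l’écosystème HADOOP (HDFS, MAPREDUCE, YARN, HIVE, HBASE, KAFKA et SPARK) [Big Data and Technologies for Storing and Processing Massive Data: Understanding the Basics of the HADOOP Ecosystem (HDFS, MAPREDUCE, YARN, HIVE, HBASE, KAFKA and SPARK)]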

By:	Keita, Moussa
Abstract:	Over the past decade, many technological solutions have been designed to meet the multiple challenges of Big Data, namely the problem of storing and processing huge volumes of data generated at a continuous pace. Two major concepts are at the heart of these solutions: storage in a distributed architecture and parallelized processing. HADOOP is one of the first frameworks to implement this approach. In this document, we provide a general overview of the HADOOP framework, its main functionalities, and some of the technological layers that form its ecosystem. First, we present the basic components of HADOOP technology: HDFS, MAPREDUCE, and YARN. Second, we present some tools for exploiting data stored in a HADOOP environment. In particular, we present HIVE, a query engine; HBASE, a distributed database; KAFKA, a tool for ingesting and integrating data streams; and SPARK, a parallelized data processing engine.
Keywords:	Big data, data Science, Hadoop, HDFS, MAPREDUCE, YARN, Spark, Kafka, Hbase, java, python, scala
JEL:	C8
Date:	2021–10
URL:	http://d.repec.org/n?u=RePEc:pra:mprapa:110334&r=
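
The MAPREDUCE model covered in this document splits a job into three stages:

- a map phase
- a shuffle that groups intermediate pairs by key
- a reduce phase

The single-process Python below simulates that data flow for the classic word-count job. It is an illustration only and does not use the Hadoop API. On a real cluster, the mappers would run in parallel on HDFS blocks, and YARN would schedule the tasks.

```python
from collections import defaultdict
from itertools import chain

def map_phase(line):
    """Mapper: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]

def shuffle_phase(pairs):
    """Shuffle/sort: group intermediate values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))

def reduce_phase(key, values):
    """Reducer: sum the counts for one word."""
    return key, sum(values)

def word_count(lines):
    """Run map, shuffle, and reduce in sequence over an iterable of text lines."""
    intermediate = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle_phase(intermediate)
    return dict(reduce_phase(k, v) for k, v in grouped.items())
```

For example, word_count(["Hadoop stores data", "Spark processes data"]) returns {'data': 2, 'hadoop': 1, 'processes': 1, 'spark': 1, 'stores': 1}.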

Using sentiment analysis in tourism research: A systematic, bibliometric, and integrative review

By:	Cristina Franciele (UFPR - Universidade Federal do Paraná); Thays Christina Domareski Ruiz (UFPR - Universidade Federal do Parana [Curitiba] - UFPR - Universidade Federal do Paraná)
Abstract:	Purpose: Sentiment analysis draws on information provided in text (such as reviews) to help organizations understand social sentiment toward their brand, product, or service. The main purpose of this paper is to give an overview of the topics and uses of the sentiment analysis approach in tourism research. Methods: The study is a bibliometric analysis (VOSviewer) combined with a systematic and integrative review. The search was conducted in Scopus in March 2021, applying the search terms "sentiment analysis" and "tourism" to the title, abstract, or keywords, and yielded a final sample of 111 papers. Results: The analysis shows that China (35) and the United States (24) are the leading countries studying sentiment analysis in tourism. The first paper using sentiment analysis was published in 2012; interest in the topic is growing, with both qualitative and quantitative approaches. The main results identify four clusters. Cluster 1 discusses sentiment analysis and its application in tourism research, examining how online reviews can affect decision-making. Cluster 2 examines the resources used for sentiment analysis, such as social media. Cluster 3 addresses methodological approaches to sentiment analysis in tourism, such as deep learning and sentiment classification, for understanding user-generated content. Cluster 4 highlights questions relating to the internet and tourism. Implications: The use of sentiment analysis in tourism research shows that governments and entrepreneurs can design and enhance communication strategies, reduce cost and time, and, above all, support decision-making and understand consumer behavior.
Keywords:	Sentiment analysis, tourism, bibliometrics, systematic review, integrative review, VOSviewer
Date:	2021–10–18
URL:	http://d.repec.org/n?u=RePEc:hal:journl:hal-03373984&r=
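
The Python below is a minimal sketch of what scoring the sentiment of a review can involve. It averages word polarities from a tiny hypothetical lexicon and flips a word's polarity when that word directly follows a negator. This lexicon-based method is an illustrative assumption. The review also points to deep learning and sentiment classification approaches, which learn polarity from labeled data instead of a fixed word list.

```python
import re

# Hypothetical toy lexicon; real studies use curated lexicons or trained classifiers.
LEXICON = {"great": 1.0, "excellent": 1.0, "friendly": 0.8, "clean": 0.5,
           "noisy": -0.5, "bad": -0.7, "dirty": -0.8, "rude": -1.0}
NEGATORS = {"not", "never", "no"}

def sentiment_score(review):
    """Average polarity of lexicon words; a word right after a negator is flipped."""
    tokens = re.findall(r"[a-z']+", review.lower())
    scores = []
    for i, tok in enumerate(tokens):
        if tok in LEXICON:
            s = LEXICON[tok]
            if i > 0 and tokens[i - 1] in NEGATORS:
                s = -s
            scores.append(s)
    return sum(scores) / len(scores) if scores else 0.0

def label(review, threshold=0.1):
    """Map the average score to positive / negative / neutral."""
    s = sentiment_score(review)
    return "positive" if s > threshold else "negative" if s < -threshold else "neutral"
```

For example, label("Not clean, and the street was noisy") returns "negative" because "clean" is flipped by the preceding "not". A review with no lexicon words, such as "We arrived on Tuesday", scores 0.0 and is labeled "neutral".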

An Economy of Neural Networks: Learning from Heterogeneous Experiences

By:	Artem Kuriksha
Abstract:	This paper proposes a new way to model behavioral agents in dynamic macro-financial environments. Agents are described as neural networks and learn policies from idiosyncratic past experiences. I investigate the feedback between irrationality and past outcomes in an economy with heterogeneous shocks similar to Aiyagari (1994). In the model, the rational expectations assumption is seriously violated because learning a decision rule for savings is unstable. Agents who fall into learning traps either save excessively or save nothing, which provides a candidate explanation for several empirical puzzles about the wealth distribution. Neural network agents have a higher average marginal propensity to consume (MPC) and exhibit excess sensitivity of consumption. Learning can negatively affect intergenerational mobility.
Date:	2021–10
URL:	http://d.repec.org/n?u=RePEc:arx:papers:2110.11582&r=

Understanding a Less Developed Labor Market through the Lens of Social Security Data

By:	Nada Wasi; Chinnawat Devahastin Na Ayudhya; Pucktada Treeratpituk; Chommanart Nittayo
Abstract:	While understanding labor market dynamics is crucial for designing a country's social protection programs, longitudinal surveys are prohibitively expensive and rarely available in less developed countries. Using data from a middle-income country, Thailand, we show that employment histories from Social Security records can provide several important insights. First, contrary to the traditional view, we find that the formal and informal sectors are closely connected. Our analysis of millions of individual histories using a machine learning technique shows that more than half of registered workers left the formal sector, either seasonally or permanently, long before their retirement age. This finding raises the question of whether social protection schemes designed separately for formal and informal workers are effective. Second, semi-formal workers had a much flatter wage-age profile than those who always stayed in the formal sector. This observation calls for effective redistributive tools to prevent earnings inequality from translating into old-age disparities and being transmitted to the next generation. Lastly, regarding employer size, we find that almost half of formally registered firms had fewer than five employees, the benchmark often used to define informal firms. This result suggests that firm-size distributions differ across countries and that employer size alone is unlikely to be sufficient to define informal workers.
Keywords:	Employment; Work History; Social Security; K-means Clustering; Thailand
JEL:	J01 J08 J21 J60
Date:	2021–01
URL:	http://d.repec.org/n?u=RePEc:pui:dpaper:147&r=
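
The keywords list K-means clustering. The Python below is a minimal sketch of clustering employment histories with plain Lloyd's K-means. It assumes each worker is encoded as a vector of monthly 0/1 formal-registration indicators. That encoding is hypothetical, since the abstract does not describe the paper's features. The sketch uses a deterministic farthest-first initialization instead of random starts so that results are reproducible.

```python
import numpy as np

def farthest_first_init(X, k):
    """Deterministic start: X[0], then repeatedly the point farthest from all chosen centers."""
    centers = [X[0]]
    for _ in range(k - 1):
        d2 = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[int(np.argmax(d2))])
    return np.array(centers, dtype=float)

def kmeans(X, k, n_iter=100):
    """Lloyd's algorithm: alternate nearest-centroid assignment and centroid update.

    X: (n_workers, n_months) array of 0/1 monthly registration indicators.
    Returns (labels, centroids); each centroid is a per-month share registered.
    """
    centroids = farthest_first_init(X, k)
    for _ in range(n_iter):
        # squared Euclidean distance from every history to every centroid
        d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # recompute centroids; keep the old one if a cluster ends up empty
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
                        for j in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    return labels, centroids
```

On synthetic histories of always-registered workers, seasonal leavers, and permanent leavers, the three groups fall into separate clusters. Each centroid can then be read as the share of that cluster registered in each month.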

This nep-big issue is ©2021 by Tom Coupé. It is provided as is without any express or implied warranty. It may be freely redistributed in whole or in part for any purpose. If distributed in part, please include this notice.

General information on the NEP project can be found at http://nep.repec.org. For comments please write to the director of NEP, Marco Novarese at <director@nep.repec.org>. Put “NEP” in the subject, otherwise your mail may be rejected.

NEP’s infrastructure is sponsored by the School of Economics and Finance of Massey University in New Zealand.
