nep-big New Economics Papers
on Big Data
Issue of 2021‒11‒01
eleven papers chosen by
Tom Coupé
University of Canterbury

  1. Machine Learning in Finance-Emerging Trends and Challenges By Jaydip Sen; Rajdeep Sen; Abhishek Dutta
  2. A new machine learning-based treatment bite for long run minimum wage evaluations By Börschlein, Benjamin; Bossler, Mario
  3. Predicting Status of Pre and Post M&A Deals Using Machine Learning and Deep Learning Techniques By Tugce Karatas; Ali Hirsa
  4. Embracing advanced AI/ML to help investors achieve success: Vanguard Reinforcement Learning for Financial Goal Planning By Shareefuddin Mohammed; Rusty Bealer; Jason Cohen
  5. The impact of CSR performance on Efficiency of Investments using Machine Learning By Nadia Lakhal; Asma Guizani; Asma Sghaier; Mohammed El Amine Abdelli; Imen Ben Slimene
  6. Bank transactions embeddings help to uncover current macroeconomics By Maria Begicheva; Oleg Travkin; Alexey Zaytsev
  7. Forecasting Financial Market Structure from Network Features using Machine Learning By Douglas Castilho; Tharsis T. P. Souza; Soong Moon Kang; João Gama; André C. P. L. F. de Carvalho
  8. Big Data et Technologies de Stockage et de Traitement des Données Massives : Comprendre les bases de l’écosystème HADOOP (HDFS, MAPREDUCE, YARN, HIVE, HBASE, KAFKA et SPARK) By Keita, Moussa
  9. Using sentiment analysis in tourism research: A systematic, bibliometric, and integrative review By Cristina Franciele; Thays Christina Domareski Ruiz
  10. An Economy of Neural Networks: Learning from Heterogeneous Experiences By Artem Kuriksha
  11. Understanding a Less Developed Labor Market through the Lens of Social Security Data By Nada Wasi; Chinnawat Devahastin Na Ayudhya; Pucktada Treeratpituk; Chommanart Nittayo

  1. By: Jaydip Sen; Rajdeep Sen; Abhishek Dutta
    Abstract: The paradigm of machine learning and artificial intelligence has pervaded our everyday life in such a way that it is no longer an area reserved for esoteric academics and scientists working to solve a challenging research problem. The evolution is quite natural rather than accidental. With the exponential growth in processing speed and with the emergence of smarter algorithms for solving complex and challenging problems, organizations have found it possible to harness a humongous volume of data in realizing solutions that have far-reaching business value. This introductory chapter highlights some of the challenges and barriers that organizations in the financial services sector currently encounter in adopting machine learning and artificial intelligence-based models and applications in their day-to-day operations.
    Date: 2021–10
  2. By: Börschlein, Benjamin; Bossler, Mario
    JEL: J31 J38 C49 C21
    Date: 2021
  3. By: Tugce Karatas; Ali Hirsa
    Abstract: Risk arbitrage or merger arbitrage is a well-known investment strategy that speculates on the success of M&A deals. Prediction of the deal status in advance is of great importance for risk arbitrageurs. If a deal is mistakenly classified as a completed deal, then enormous cost can be incurred as a result of investing in target company shares. Conversely, if a deal is mistakenly classified as failed, risk arbitrageurs may lose the opportunity to make a profit. In this paper, we present an ML and DL based methodology for the takeover success prediction problem. We initially apply various ML techniques for data preprocessing such as kNN for data imputation, PCA for lower dimensional representation of numerical variables, MCA for categorical variables, and LSTM autoencoder for sentiment scores. We experiment with different cost functions, different evaluation metrics, and oversampling techniques to address class imbalance in our dataset. We then implement feedforward neural networks (FFNN) to predict the deal status. Our preliminary results indicate that our methodology outperforms benchmark models such as logit and weighted logit models. We also integrate sentiment scores into our methodology using different model architectures, but our preliminary results show that performance does not change much compared to the simple FFNN framework. We will explore different architectures and employ thorough hyperparameter tuning for sentiment scores in future work.
    Date: 2021–08
  4. By: Shareefuddin Mohammed; Rusty Bealer; Jason Cohen
    Abstract: In the world of advice and financial planning, there is seldom one right answer. While traditional algorithms have been successful in solving linear problems, their success often depends on choosing the right features from a dataset, which can be a challenge for nuanced financial planning scenarios. Reinforcement learning is a machine learning approach that can be employed with complex data sets where picking the right features can be nearly impossible. In this paper, we will explore the use of machine learning for financial forecasting, predicting economic indicators, and creating a savings strategy. The Vanguard ML algorithm for goals-based financial planning is based on deep reinforcement learning that identifies optimal savings rates across multiple goals and sources of income to help clients achieve financial success. Vanguard learning algorithms are trained to identify market indicators and behaviors too complex to capture with formulas and rules; instead, they model the financial success trajectory of investors and their investment outcomes as a Markov decision process. We believe that reinforcement learning can be used to create value for advisors and end-investors, creating efficiency, more personalized plans, and data to enable customized solutions.
    Date: 2021–10
  5. By: Nadia Lakhal (Lamided, ISG, Université de Sousse - LAMIDED); Asma Guizani (LIRSA - Laboratoire interdisciplinaire de recherche en sciences de l'action - CNAM - Conservatoire National des Arts et Métiers [CNAM]); Asma Sghaier (Department of Finance, University Of Sousse, Sousse, Tunisia.); Mohammed El Amine Abdelli (University of Brest); Imen Ben Slimene (UGA [2016-2019] - Université Grenoble Alpes [2016-2019])
    Keywords: CSR,Investment Efficiency,Machine learning,Stakeholder Theory
    Date: 2021–09–28
  6. By: Maria Begicheva; Oleg Travkin; Alexey Zaytsev
    Abstract: Macroeconomic indexes are of high importance for banks: many risk-control decisions utilize these indexes. The typical workflow for evaluating these indexes is costly and protracted, with a lag of a couple of months between the actual date and the available index. Banks currently predict such indexes using autoregressive models to make decisions in a rapidly changing environment. However, autoregressive models fail in complex scenarios related to the emergence of crises. We propose to use clients' financial transactions data from a large Russian bank to get such indexes. Transaction histories are long, and the number of clients is huge, so we develop an efficient approach that allows fast and accurate estimation of macroeconomic indexes based on a stream of millions of transactions. The approach uses a neural network paradigm and a smart sampling scheme. The results show that our neural network approach outperforms the baseline method on hand-crafted features based on transactions. Calculated embeddings show the correlation between the client's transaction activity and bank macroeconomic indexes over time.
    Date: 2021–10
  7. By: Douglas Castilho; Tharsis T. P. Souza; Soong Moon Kang; João Gama; André C. P. L. F. de Carvalho
    Abstract: We propose a model that forecasts market correlation structure from link- and node-based financial network features using machine learning. To this end, market structure is modeled as a dynamic asset network by quantifying time-dependent co-movement of asset price returns across company constituents of major global market indices. We provide empirical evidence using three different network filtering methods to estimate market structure, namely Dynamic Asset Graph (DAG), Dynamic Minimal Spanning Tree (DMST) and Dynamic Threshold Networks (DTN). Experimental results show that the proposed model can forecast market structure with high predictive performance with up to 40% improvement over a time-invariant correlation-based benchmark. Non-pair-wise correlation features proved to be important compared to traditionally used pair-wise correlation measures for all markets studied, particularly in the long-term forecasting of stock market structure. Evidence is provided for stock constituents of the DAX30, EUROSTOXX50, FTSE100, HANGSENG50, NASDAQ100 and NIFTY50 market indices. Findings can be useful to improve portfolio selection and risk management methods, which commonly rely on a backward-looking covariance matrix to estimate portfolio risk.
    Date: 2021–10
  8. By: Keita, Moussa
    Abstract: Over the past decade, many technological solutions have been designed to meet the multiple challenges of Big Data, namely the problem of storing and processing huge volumes of data generated at a continuous pace. Two major concepts are at the heart of the solutions designed to meet these challenges: storage in a distributed architecture and parallelized processing. HADOOP is one of the first frameworks that implemented this approach. In this document, we provide a general overview of the HADOOP framework, its main functionalities, and some of the technological layers that form its ecosystem. First, we present the basic components of HADOOP technology: HDFS, MAPREDUCE and YARN. Second, we present some tools for exploiting data stored in a HADOOP environment. In particular, we present HIVE, a query engine; HBASE, a distributed database; KAFKA, a tool for ingesting and integrating data streams; and SPARK, a parallelized data processing engine.
    Keywords: Big data, data Science, Hadoop, HDFS, MAPREDUCE, YARN, Spark, Kafka, Hbase, java, python, scala
    JEL: C8
    Date: 2021–10
  9. By: Cristina Franciele (UFPR - Universidade Federal do Paraná); Thays Christina Domareski Ruiz (UFPR - Universidade Federal do Parana [Curitiba] - UFPR - Universidade Federal do Paraná)
    Abstract: Purpose: Sentiment analysis is built from the information provided through text (reviews) to help understand the social sentiment toward their brand, product, or service. The main purpose of this paper is to draw an overview of the topics and the use of the sentiment analysis approach in tourism research. Methods: The study is a bibliometric analysis (VOSviewer), with a systematic and integrative review. The search occurred in March 2021 (Scopus) applying the search terms "sentiment analysis" and "tourism" in the title, abstract, or keywords, resulting in a final sample of 111 papers. Results: This analysis pointed out that China (35) and the United States (24) are the leading countries studying sentiment analysis with tourism. The first paper using sentiment analysis was published in 2012; there is a growing interest in this topic, presenting qualitative and quantitative approaches. The main results present four clusters to understand this subject. Cluster 1 discusses sentiment analysis and its application in tourism research, searching how online reviews can impact decision-making. Cluster 2 examines the resources used to make sentiment analysis, such as social media. Cluster 3 argues about methodological approaches in sentiment analysis and tourism, such as deep learning and sentiment classification, to understand the user-generated content. Cluster 4 highlights questions relating to the internet and tourism. Implications: The use of sentiment analysis in tourism research shows that government and entrepreneurship can draw and enhance communication strategies, reduce cost and time, and mainly contribute to the decision-making process and understand consumer behavior.
    Keywords: Sentiment analysis,tourism,bibliometrics,systematic review,integrative review,Vosviewer
    Date: 2021–10–18
  10. By: Artem Kuriksha
    Abstract: This paper proposes a new way to model behavioral agents in dynamic macro-financial environments. Agents are described as neural networks and learn policies from idiosyncratic past experiences. I investigate the feedback between irrationality and past outcomes in an economy with heterogeneous shocks similar to Aiyagari (1994). In the model, the rational expectations assumption is seriously violated because learning of a decision rule for savings is unstable. Agents who fall into learning traps save either excessively or save nothing, which provides a candidate explanation for several empirical puzzles about wealth distribution. Neural network agents have a higher average MPC and exhibit excess sensitivity of consumption. Learning can negatively affect intergenerational mobility.
    Date: 2021–10
  11. By: Nada Wasi; Chinnawat Devahastin Na Ayudhya; Pucktada Treeratpituk; Chommanart Nittayo
    Abstract: While understanding labor market dynamics is crucial for designing the country’s social protection programs, prohibitive longitudinal surveys are rarely available in less developed countries. We illustrate that employment history from Social Security records can provide several important insights by using data from a middle-income country, Thailand. First, contrary to the traditional view, we find that the formal and informal sectors are quite connected. Our analysis of millions of individual histories by a machine learning technique shows that more than half of registered workers left the formal sector either seasonally or permanently long before their retirement age. This finding raises a question of whether the social protection schemes being separately designed for formal and informal workers are effective. Second, the semi-formal workers also had a much flatter wage-age profile compared to those always staying in the formal sector. This observation calls for effective redistributive tools to prevent earnings inequality from translating into disparities in old age and being transmitted to the next generation. Lastly, on the employer size, we find that almost half of formally registered firms had fewer than five employees, the benchmark often used to define informal firms. This result suggests that the distributions of firm sizes differ across countries and that employer size alone is unlikely to be sufficient to define informal workers.
    Keywords: Employment; Work History; Social Security; K-means Clustering; Thailand
    JEL: J01 J08 J21 J60
    Date: 2021–01

This nep-big issue is ©2021 by Tom Coupé. It is provided as is without any express or implied warranty. It may be freely redistributed in whole or in part for any purpose. If distributed in part, please include this notice.
General information on the NEP project can be found at For comments please write to the director of NEP, Marco Novarese at <>. Put “NEP” in the subject, otherwise your mail may be rejected.
NEP’s infrastructure is sponsored by the School of Economics and Finance of Massey University in New Zealand.