nep-big New Economics Papers
on Big Data
Issue of 2019‒08‒19
fourteen papers chosen by
Tom Coupé
University of Canterbury

  1. A short review on the economics of artificial intelligence By Yingying Lu; Yixiao Zhou
  2. Industrial Growth in Sub-Saharan Africa: Evidence from Machine Learning with Insights from Nightlight Satellite Images By Christian S. Otchia; Simplice A. Asongu
  3. Industrial Growth in Sub-Saharan Africa: Evidence from Machine Learning with Insights from Nightlight Satellite Images By Christian S. Otchia; Simplice A. Asongu
  4. Mid-price Prediction Based on Machine Learning Methods with Technical and Quantitative Indicators By Adamantios Ntakaris; Juho Kanniainen; Moncef Gabbouj; Alexandros Iosifidis
  5. Clustering, Forecasting and Cluster Forecasting: using k-medoids, k-NNs and random forests for cluster selection By Dinesh Reddy Vangumalli; Konstantinos Nikolopoulos; Konstantia Litsiou
  6. Detection of Accounting Anomalies in the Latent Space using Adversarial Autoencoder Neural Networks By Marco Schreyer; Timur Sattarov; Christian Schulze; Bernd Reimer; Damian Borth
  7. Risk Management via Anomaly Circumvent: Mnemonic Deep Learning for Midterm Stock Prediction By Xinyi Li; Yinchuan Li; Xiao-Yang Liu; Christina Dan Wang
  8. Neural network regression for Bermudan option pricing By Bernard Lapeyre; Jérôme Lelong
  9. Machine Learning for Forecasting Excess Stock Returns – The Five-Year-View By Ioannis Kyriakou; Parastoo Mousavi; Jens Perch Nielsen; Michael Scholz
  10. Machine learning explainability in finance: an application to default risk analysis By Bracke, Philippe; Datta, Anupam; Jung, Carsten; Sen, Shayak
  11. Agglomerative Fast Super-Paramagnetic Clustering By Lionel Yelibi; Tim Gebbie
  12. Worried about the fourth industrial revolution's impact on jobs? Scale up skills development and training! By Terry McKinley
  13. Managing the Complexity of Processing Financial Data at Scale -- an Experience Report By Sebastian Frischbier; Mario Paic; Alexander Echler; Christian Roth
  14. Solving high-dimensional optimal stopping problems using deep learning By Sebastian Becker; Patrick Cheridito; Arnulf Jentzen; Timo Welti

  1. By: Yingying Lu; Yixiao Zhou
    Abstract: The rapid development of artificial intelligence (AI) is not only a scientific breakthrough; it also affects human society and the economy, as well as the development of economics itself. Research on the economics of AI is new and growing fast, with a current focus on the productivity and employment effects of AI. This paper reviews the recent literature in order to answer three key questions. First, what approaches are being used to represent AI in economic models? Second, will AI technology have a different impact on the economy than previous new technologies? Third, in which areas will AI have an impact, and what is the empirical evidence of these effects? Our review reveals that most empirical studies cannot rule out the existence of a Solow Paradox for AI technology, but some studies find that AI would have a different and broader impact than previous technologies such as information technology, although it would follow a similar adoption path. Second, incorporating AI into economic models raises fundamental questions, including what a human being is and what role human beings play in economic models; it also poses the question of whether AI itself can be an economic agent in such models. Third, studies on the labor market seem to have reached consensus on the stylized fact that AI would increase unemployment within sectors but may create employment gains at the aggregate level. AI also widens the income gap between low- and medium-skilled workers on the one hand and high-skilled workers on the other. AI's impacts on international trade and education have been largely neglected in the current literature and are worth further research.
    Keywords: Artificial Intelligence, Development of Economics, Literature Review
    JEL: A12 E1 E24 E65 F41 J21
    Date: 2019–08
    URL: http://d.repec.org/n?u=RePEc:een:camaaa:2019-54&r=all
  2. By: Christian S. Otchia (Hyogo, Japan); Simplice A. Asongu (Yaoundé, Cameroon)
    Abstract: This study uses night-time light (nightlight) data and machine learning techniques to predict industrial development in Africa. The results provide the first evidence on how machine learning techniques and nightlight data can be used to predict economic development in places where subnational data are missing or imprecise. Taken together, the research confirms four groups of important determinants of industrial growth: natural resources, agricultural growth, institutions, and manufacturing imports. Our findings indicate that Africa should follow a more multisectoral approach to development, putting natural resources and agricultural productivity growth at the forefront. (A minimal illustrative sketch of this style of analysis follows this entry.)
    Keywords: Industrial growth; Machine learning; Africa
    JEL: I32 O15 O40 O55
    Date: 2019–08
    URL: http://d.repec.org/n?u=RePEc:exs:wpaper:19/046&r=all
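    Note: The following is a minimal, hypothetical sketch of the style of analysis described above, not the authors' code or data. A random forest regressor is fit on synthetic region-level features whose names merely echo the four groups of determinants mentioned in the abstract, and the impurity-based importances are read off to rank them.

      # Illustrative sketch only: synthetic data stand in for region-level
      # nightlight-proxied growth and candidate determinants.
      import numpy as np
      from sklearn.ensemble import RandomForestRegressor

      rng = np.random.default_rng(0)
      n = 500  # hypothetical number of subnational regions
      features = ["natural_resources", "agric_growth", "institutions", "manuf_imports"]
      X = rng.normal(size=(n, len(features)))

      # Synthetic target standing in for nightlight-based industrial growth.
      y = (0.5 * X[:, 0] + 0.8 * X[:, 1] + 0.3 * X[:, 2] + 0.2 * X[:, 3]
           + rng.normal(scale=0.5, size=n))

      model = RandomForestRegressor(n_estimators=300, random_state=0)
      model.fit(X, y)

      # Rank the candidate determinants by impurity-based importance.
      for name, imp in sorted(zip(features, model.feature_importances_),
                              key=lambda t: -t[1]):
          print(f"{name:20s} {imp:.3f}")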
  3. By: Christian S. Otchia (Hyogo, Japan); Simplice A. Asongu (Yaoundé, Cameroon)
    Abstract: This study uses night-time light (nightlight) data and machine learning techniques to predict industrial development in Africa. The results provide the first evidence on how machine learning techniques and nightlight data can be used to predict economic development in places where subnational data are missing or imprecise. Taken together, the research confirms four groups of important determinants of industrial growth: natural resources, agricultural growth, institutions, and manufacturing imports. Our findings indicate that Africa should follow a more multisectoral approach to development, putting natural resources and agricultural productivity growth at the forefront.
    Keywords: Industrial growth; Machine learning; Africa
    JEL: I32 O15 O40 O55
    Date: 2019–08
    URL: http://d.repec.org/n?u=RePEc:agd:wpaper:19/046&r=all
  4. By: Adamantios Ntakaris; Juho Kanniainen; Moncef Gabbouj; Alexandros Iosifidis
    Abstract: Stock price prediction is a challenging task, but machine learning methods have recently been used successfully for this purpose. In this paper, we extract over 270 hand-crafted features (factors) inspired by technical and quantitative analysis and test their validity for short-term mid-price movement prediction. We focus on a wrapper feature selection method using entropy, least-mean squares, and linear discriminant analysis. We also build a new quantitative feature based on adaptive logistic regression for online learning, which is consistently selected first by the majority of the proposed feature selection methods. This study examines the best combination of features using high-frequency limit order book data from Nasdaq Nordic. Our results suggest that feature-sorting methods and classifiers can be combined so that the best performance is reached with only a very small number of advanced hand-crafted features. (A hand-rolled sketch of online logistic regression of this kind follows this entry.)
    Date: 2019–07
    URL: http://d.repec.org/n?u=RePEc:arx:papers:1907.09452&r=all
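    Note: The abstract above mentions an adaptive logistic regression updated online as one of the hand-crafted features. The sketch below is a hand-rolled stand-in under stated assumptions: logistic weights are updated one observation at a time by stochastic gradient descent to classify the direction of the next mid-price move; the synthetic stream replaces the authors' order-book features, and their wrapper feature selection is not reproduced.

      # Minimal online (adaptive) logistic regression; not the authors' implementation.
      import numpy as np

      def sigmoid(z):
          return 1.0 / (1.0 + np.exp(-z))

      class OnlineLogisticRegression:
          """Logistic classifier updated one observation at a time via SGD."""

          def __init__(self, n_features, lr=0.05):
              self.w = np.zeros(n_features)
              self.b = 0.0
              self.lr = lr

          def predict_proba(self, x):
              return sigmoid(x @ self.w + self.b)

          def update(self, x, y):
              # Gradient of the log loss for a single (x, y) pair, y in {0, 1}.
              err = self.predict_proba(x) - y
              self.w -= self.lr * err * x
              self.b -= self.lr * err

      # Synthetic stream standing in for hand-crafted limit order book features
      # and binary labels for the direction of the next mid-price move.
      rng = np.random.default_rng(1)
      model = OnlineLogisticRegression(n_features=5)
      correct = 0
      for t in range(10_000):
          x = rng.normal(size=5)
          y = int(x[0] + 0.5 * x[1] + rng.normal(scale=0.5) > 0)
          correct += int((model.predict_proba(x) > 0.5) == y)  # predict before updating
          model.update(x, y)
      print("online accuracy:", correct / 10_000)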
  5. By: Dinesh Reddy Vangumalli (Oracle America Inc); Konstantinos Nikolopoulos (Bangor University); Konstantia Litsiou (Manchester Metropolitan University)
    Abstract: When facing a forecasting task involving a large number of time series, data analysts regularly employ one of two methodological approaches: either select a single forecasting method for the entire dataset (aggregate selection), or use the best forecasting method for each time series (individual selection). There is evidence in the predictive analytics literature that the former is more robust than the latter, as individual selection tends to overfit models to the data. A third approach is to first identify homogeneous clusters within the dataset and then select a single forecasting method for each cluster (cluster selection). This research examines the performance of three well-established machine learning methods: k-medoids, k-NN and random forests. We then forecast every cluster with the best possible method, and the performance is compared to that of aggregate selection. The aforementioned methods are very often used for classification tasks, but since in our case there is no set of predefined classes, the methods are used for pure clustering. The evaluation is performed on the 645 yearly series of the M3 competition. The empirical evidence suggests that: a) random forests provide the best clusters for the sequential forecasting task, and b) cluster selection has the potential to outperform aggregate selection. (A minimal k-medoids sketch follows this entry.)
    Keywords: Clustering; k-medoids; Nearest Neighbors; Random Forests; Forecasting;
    Date: 2019–08
    URL: http://d.repec.org/n?u=RePEc:bng:wpaper:19016&r=all
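    Note: A minimal k-medoids implementation is sketched below under stated assumptions: equal-length synthetic series stand in for the M3 yearly data, the distance is plain Euclidean, and the per-cluster choice of forecasting method (the cluster-selection step) is only indicated in a closing comment rather than implemented.

      # Minimal k-medoids sketch (alternating assignment / medoid update).
      import numpy as np

      def k_medoids(X, k, n_iter=100, seed=0):
          """Cluster the rows of X with a simple alternating k-medoids heuristic."""
          rng = np.random.default_rng(seed)
          dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)  # pairwise distances
          medoids = rng.choice(len(X), size=k, replace=False)
          for _ in range(n_iter):
              labels = np.argmin(dist[:, medoids], axis=1)       # assignment step
              new_medoids = medoids.copy()
              for j in range(k):
                  members = np.where(labels == j)[0]
                  if len(members) == 0:
                      continue
                  within = dist[np.ix_(members, members)].sum(axis=1)
                  new_medoids[j] = members[np.argmin(within)]    # medoid update step
              if np.array_equal(new_medoids, medoids):
                  break
              medoids = new_medoids
          return labels, medoids

      # Synthetic stand-in for a collection of short yearly series of equal length.
      rng = np.random.default_rng(2)
      series = np.vstack([
          np.cumsum(rng.normal(loc=mu, scale=1.0, size=(100, 14)), axis=1)
          for mu in (0.0, 0.5, 1.0)
      ])
      labels, medoids = k_medoids(series, k=3)
      print("cluster sizes:", np.bincount(labels))
      # Cluster selection would then fit a single forecasting method per cluster
      # and apply it to every series assigned to that cluster.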
  6. By: Marco Schreyer; Timur Sattarov; Christian Schulze; Bernd Reimer; Damian Borth
    Abstract: The detection of fraud in accounting data is a long-standing challenge in financial statement audits. Nowadays, the majority of applied techniques rely on handcrafted rules derived from known fraud scenarios. While fairly successful, these rules exhibit the drawback that they often fail to generalize beyond known fraud scenarios, and fraudsters gradually find ways to circumvent them. In contrast, more advanced approaches inspired by the recent success of deep learning often lack seamless interpretability of the detected results. To overcome this challenge, we propose the application of adversarial autoencoder networks. We demonstrate that such artificial neural networks are capable of learning a semantically meaningful representation of real-world journal entries. The learned representation provides a holistic view of a given set of journal entries and significantly improves the interpretability of detected accounting anomalies. We show that such a representation, combined with the network's reconstruction error, can be utilized as an unsupervised and highly adaptive anomaly assessment. Experiments on two datasets and initial feedback received from forensic accountants underpin the effectiveness of the approach. (A simplified reconstruction-error sketch follows this entry.)
    Date: 2019–08
    URL: http://d.repec.org/n?u=RePEc:arx:papers:1908.00734&r=all
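    Note: The sketch below illustrates only the reconstruction-error part of the idea above, using a plain autoencoder; the adversarial regularisation of the latent space that gives the paper its title is omitted, and the synthetic matrices merely stand in for encoded journal-entry attributes.

      # Plain autoencoder reconstruction-error anomaly scoring (illustrative only).
      import numpy as np
      import tensorflow as tf

      rng = np.random.default_rng(3)
      normal_entries = rng.normal(size=(5000, 20)).astype("float32")    # "regular" entries
      anomalies = rng.normal(loc=4.0, size=(50, 20)).astype("float32")  # injected anomalies

      autoencoder = tf.keras.Sequential([
          tf.keras.Input(shape=(20,)),
          tf.keras.layers.Dense(8, activation="relu"),   # encoder
          tf.keras.layers.Dense(3, activation="relu"),   # latent bottleneck
          tf.keras.layers.Dense(8, activation="relu"),   # decoder
          tf.keras.layers.Dense(20),
      ])
      autoencoder.compile(optimizer="adam", loss="mse")
      autoencoder.fit(normal_entries, normal_entries, epochs=10, batch_size=64, verbose=0)

      def reconstruction_error(x):
          return np.mean((autoencoder.predict(x, verbose=0) - x) ** 2, axis=1)

      print("mean error, regular entries  :", reconstruction_error(normal_entries).mean())
      print("mean error, injected anomalies:", reconstruction_error(anomalies).mean())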
  7. By: Xinyi Li; Yinchuan Li; Xiao-Yang Liu; Christina Dan Wang
    Abstract: Midterm stock price prediction is crucial for value investments in the stock market. However, most deep learning models are essentially short-term, and applying them to midterm prediction leads to large cumulative errors because they cannot avoid anomalies. In this paper, we propose a novel deep neural network, Mid-LSTM, for midterm stock prediction, which incorporates the market trend as hidden states. First, based on the autoregressive moving average model (ARMA), a midterm ARMA is formulated by taking into consideration both hidden states and the capital asset pricing model. Then, a midterm LSTM-based deep neural network is designed, which consists of three components: LSTM, a hidden Markov model and linear regression networks. The proposed Mid-LSTM can avoid anomalies to reduce large prediction errors, and offers good explanatory power for the factors affecting stock prices. Extensive experiments on S&P 500 stocks show that (i) the proposed Mid-LSTM achieves a 2-4% improvement in prediction accuracy, and (ii) in portfolio allocation, we achieve an annual return of up to 120.16% and an average Sharpe ratio of 2.99. (A bare-bones LSTM sketch follows this entry.)
    Date: 2019–08
    URL: http://d.repec.org/n?u=RePEc:arx:papers:1908.01112&r=all
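    Note: The sketch below is a bare-bones LSTM next-price predictor on a synthetic random-walk series; it is not the Mid-LSTM of the entry above (the ARMA-based trend states, hidden Markov model and linear regression networks are all omitted), and the window length and architecture are arbitrary assumptions.

      # Minimal LSTM regression sketch on a synthetic price series.
      import numpy as np
      import tensorflow as tf

      rng = np.random.default_rng(4)
      prices = np.cumsum(rng.normal(scale=1.0, size=2000)) + 100.0  # synthetic random walk

      window = 30
      X = np.stack([prices[i:i + window] for i in range(len(prices) - window)])
      y = prices[window:]
      X = X[..., None].astype("float32")   # shape (samples, timesteps, 1 feature)
      y = y.astype("float32")

      model = tf.keras.Sequential([
          tf.keras.Input(shape=(window, 1)),
          tf.keras.layers.LSTM(32),
          tf.keras.layers.Dense(1),
      ])
      model.compile(optimizer="adam", loss="mse")
      model.fit(X[:-200], y[:-200], epochs=5, batch_size=64, verbose=0)
      # A real application would normalise inputs (e.g. work with returns)
      # and evaluate on a proper walk-forward split.
      print("hold-out MSE:", model.evaluate(X[-200:], y[-200:], verbose=0))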
  8. By: Bernard Lapeyre (CERMICS - Centre d'Enseignement et de Recherche en Mathématiques et Calcul Scientifique - ENPC - École des Ponts ParisTech, MATHRISK - Mathematical Risk Handling - Inria de Paris - Inria - Institut National de Recherche en Informatique et en Automatique - ENPC - École des Ponts ParisTech - UPEM - Université Paris-Est Marne-la-Vallée); Jérôme Lelong (LJK - Laboratoire Jean Kuntzmann - UPMF - Université Pierre Mendès France - Grenoble 2 - UJF - Université Joseph Fourier - Grenoble 1 - Institut Polytechnique de Grenoble - Grenoble Institute of Technology - CNRS - Centre National de la Recherche Scientifique - UGA - Université Grenoble Alpes)
    Abstract: The pricing of Bermudan options amounts to solving a dynamic programming principle, in which the main difficulty, especially in high dimension, comes from computing the conditional expectations involved in the continuation value. These conditional expectations are classically computed by regression techniques on a finite-dimensional vector space. In this work, we study neural network approximations of these conditional expectations. We prove the convergence of the well-known Longstaff and Schwartz algorithm when the standard least-squares regression is replaced by a neural network approximation. (A sketch of Longstaff-Schwartz with a neural network regressor follows this entry.)
    Keywords: Bermudan options, Optimal stopping, Regression methods, Deep learning, Neural networks
    Date: 2019–07–15
    URL: http://d.repec.org/n?u=RePEc:hal:wpaper:hal-02183587&r=all
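    Note: The sketch below prices a Bermudan put by Longstaff-Schwartz backward induction with the classical least-squares regression swapped for a small scikit-learn neural network, in the spirit of the entry above; the dynamics (geometric Brownian motion), the payoff and all parameters are illustrative assumptions, not taken from the paper.

      # Longstaff-Schwartz with a neural network continuation-value regressor.
      import numpy as np
      from sklearn.neural_network import MLPRegressor

      rng = np.random.default_rng(5)
      S0, K, r, sigma, T = 100.0, 100.0, 0.05, 0.2, 1.0
      n_steps, n_paths = 10, 10_000
      dt = T / n_steps
      disc = np.exp(-r * dt)

      def payoff(s):
          return np.maximum(K - s, 0.0)  # Bermudan put

      # Simulate geometric Brownian motion paths on the exercise grid.
      z = rng.standard_normal((n_paths, n_steps))
      S = S0 * np.exp(np.cumsum((r - 0.5 * sigma**2) * dt + sigma * np.sqrt(dt) * z, axis=1))
      S = np.hstack([np.full((n_paths, 1), S0), S])

      # Backward induction over exercise dates.
      cash = payoff(S[:, -1])
      for t in range(n_steps - 1, 0, -1):
          cash *= disc                       # discount one step back
          itm = payoff(S[:, t]) > 0          # regress only on in-the-money paths
          if itm.sum() > 100:
              net = MLPRegressor(hidden_layer_sizes=(32, 32), max_iter=500, random_state=0)
              net.fit(S[itm, t].reshape(-1, 1) / K, cash[itm])
              continuation = net.predict(S[itm, t].reshape(-1, 1) / K)
              exercise = payoff(S[itm, t]) > continuation
              cash[itm] = np.where(exercise, payoff(S[itm, t]), cash[itm])

      print("Bermudan put price (sketch):", round(float(np.mean(cash * disc)), 3))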
  9. By: Ioannis Kyriakou (Cass Business School, City, University of London, UK); Parastoo Mousavi (Cass Business School, City, University of London, UK); Jens Perch Nielsen (Cass Business School, City, University of London, UK); Michael Scholz (University of Graz, Austria)
    Abstract: In this paper, we apply machine learning to forecast stock returns in excess of different benchmarks, including the short-term interest rate, long-term interest rate, earnings-by-price ratio, and inflation. In particular, we adopt and implement a fully nonparametric smoother with the covariates and the smoothing parameter chosen by cross-validation. We find that for both one-year and five-year returns, the term spread is, overall, the most powerful predictive variable for excess stock returns. Different combinations of covariates can then achieve higher predictability for different forecast horizons. Nevertheless, the set of earnings-by-price and term spread predictors under the inflation benchmark strikes the right balance between the one-year and five-year horizons. (A sketch of a cross-validated kernel smoother follows this entry.)
    Keywords: Benchmark; Cross-validation; Prediction; Stock returns; Long-term forecasts; Overlapping returns; Autocorrelation
    JEL: C14 C53 C58 G17 G22
    Date: 2019–08
    URL: http://d.repec.org/n?u=RePEc:grz:wpaper:2019-06&r=all
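    Note: Below is a toy version of a fully nonparametric smoother with its bandwidth chosen by cross-validation: a Nadaraya-Watson (local-constant) estimator with a Gaussian kernel and leave-one-out bandwidth selection. The single synthetic covariate merely stands in for a predictor such as the term spread; the authors' validated estimator and data are not reproduced.

      # Nadaraya-Watson smoother with leave-one-out cross-validated bandwidth.
      import numpy as np

      rng = np.random.default_rng(6)
      x = rng.uniform(-2, 2, size=300)                        # hypothetical predictor
      y = np.sin(1.5 * x) + rng.normal(scale=0.4, size=300)   # hypothetical excess return

      def nw_predict(x_train, y_train, x_eval, h):
          """Local-constant (Nadaraya-Watson) estimate with a Gaussian kernel."""
          w = np.exp(-0.5 * ((x_eval[:, None] - x_train[None, :]) / h) ** 2)
          return (w @ y_train) / w.sum(axis=1)

      def loo_cv_error(h):
          """Leave-one-out cross-validated squared error for bandwidth h."""
          w = np.exp(-0.5 * ((x[:, None] - x[None, :]) / h) ** 2)
          np.fill_diagonal(w, 0.0)                            # drop own observation
          fitted = (w @ y) / w.sum(axis=1)
          return np.mean((y - fitted) ** 2)

      bandwidths = np.linspace(0.05, 1.0, 40)
      h_star = bandwidths[np.argmin([loo_cv_error(h) for h in bandwidths])]
      print("CV-selected bandwidth:", round(h_star, 3))
      print("fitted value at x = 0:", round(nw_predict(x, y, np.array([0.0]), h_star)[0], 3))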
  10. By: Bracke, Philippe (UK Financial Conduct Authority); Datta, Anupam (Carnegie Mellon University); Jung, Carsten (Bank of England); Sen, Shayak (Carnegie Mellon University)
    Abstract: We propose a framework for addressing the ‘black box’ problem present in some Machine Learning (ML) applications. We implement our approach by using the Quantitative Input Influence (QII) method of Datta et al (2016) in a real-world example: an ML model to predict mortgage defaults. This method investigates the inputs and outputs of the model, but not its inner workings. It measures feature influences by intervening on inputs and estimating their Shapley values, representing the features’ average marginal contributions over all possible feature combinations. This method identifies key drivers of mortgage defaults, such as the loan-to-value ratio and the current interest rate, which are in line with the findings of the economics and finance literature. However, given the non-linearity of the ML model, explanations vary significantly for different groups of loans. We use clustering methods to arrive at groups of explanations for different areas of the input space. Finally, we conduct simulations on data that the model has not been trained or tested on. Our main contribution is to develop a systematic analytical framework that could be used for approaching explainability questions in real-world financial applications. We conclude, though, that notable model uncertainties remain, of which stakeholders ought to be aware. (A sketch of a sampling-based Shapley estimate follows this entry.)
    Keywords: Machine learning; explainability; mortgage defaults
    JEL: G21
    Date: 2019–08–09
    URL: http://d.repec.org/n?u=RePEc:boe:boeewp:0816&r=all
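    Note: The sketch below shows a generic sampling-based Shapley-value estimate of feature influence for a black-box classifier, in the spirit of the QII method referenced above; it is not the paper's implementation, and the model, feature names and data are synthetic assumptions.

      # Monte Carlo Shapley values for a black-box model (illustrative only).
      import numpy as np
      from sklearn.ensemble import GradientBoostingClassifier

      rng = np.random.default_rng(7)
      X = rng.normal(size=(2000, 4))   # e.g. LTV, interest rate, income, age (all hypothetical)
      y = (2.0 * X[:, 0] + 1.0 * X[:, 1] + rng.normal(size=2000) > 0).astype(int)
      model = GradientBoostingClassifier(random_state=0).fit(X, y)

      def shapley_values(x, background, n_samples=200):
          """Permutation-sampling estimate of the Shapley values of model.predict_proba at x."""
          d = len(x)
          phi = np.zeros(d)
          for _ in range(n_samples):
              perm = rng.permutation(d)
              z = background[rng.integers(len(background))].copy()  # random baseline row
              prev = model.predict_proba(z.reshape(1, -1))[0, 1]
              for j in perm:                                        # switch features on one by one
                  z[j] = x[j]
                  cur = model.predict_proba(z.reshape(1, -1))[0, 1]
                  phi[j] += cur - prev
                  prev = cur
          return phi / n_samples

      print("estimated Shapley values:", np.round(shapley_values(X[0], X), 3))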
  11. By: Lionel Yelibi; Tim Gebbie
    Abstract: We consider the problem of fast time-series data clustering. Building on previous work modeling the correlation-based Hamiltonian of spin variables, we present a fast, computationally inexpensive agglomerative algorithm. The method is tested on synthetic correlated time series and noisy synthetic datasets with built-in cluster structure to demonstrate that the algorithm produces meaningful, non-trivial results. We argue that ASPC can reduce computation time and resource usage for large-scale clustering while running serially, and hence has no obvious need for parallelization. The algorithm can be an effective choice for state detection in online learning within fast, non-linear data environments because it requires no prior information about the number of clusters.
    Date: 2019–08
    URL: http://d.repec.org/n?u=RePEc:arx:papers:1908.00951&r=all
  12. By: Terry McKinley (IPC-IG)
    Abstract: "We have been living through the third industrial revolution, 'digitalisation', since 1980. However, the fourth industrial revolution (driven mainly by robotics and artificial intelligence) already appears to be fast approaching. What will be its likely impacts on jobs, incomes and economic inequality? And, more importantly, what can be done about them? This One Pager focuses on this revolution's practical implications for social protection programmes". (...)
    Keywords: fourth industrial revolution, impact on jobs, scale up, skills, development, training
    Date: 2019–07
    URL: http://d.repec.org/n?u=RePEc:ipc:oparab:425&r=all
  13. By: Sebastian Frischbier; Mario Paic; Alexander Echler; Christian Roth
    Abstract: Financial markets are extremely data-driven and regulated. Participants rely on notifications about significant events and background information that meet their requirements regarding timeliness, accuracy, and completeness. As one of Europe's leading providers of financial data and regulatory solutions, vwd processes a daily average of 18 billion notifications from 500+ data sources for 30 million symbols. Our large-scale, geo-distributed systems handle daily peak rates of 1+ million notifications/sec. In this paper, we give practical insights into the different types of complexity we face regarding the data we process, the systems we operate, and the regulatory constraints we must comply with. We describe the volume, variety, velocity, and veracity of the data we process, the infrastructure we operate, and the architecture we apply. We illustrate the load patterns created by trading and how the markets' attention to the Brexit vote and similar events stressed our systems.
    Date: 2019–08
    URL: http://d.repec.org/n?u=RePEc:arx:papers:1908.03206&r=all
  14. By: Sebastian Becker; Patrick Cheridito; Arnulf Jentzen; Timo Welti
    Abstract: Nowadays, many financial derivatives traded on stock and futures exchanges, such as American or Bermudan options, are of the early exercise type. Often the pricing of early exercise options gives rise to high-dimensional optimal stopping problems, since the dimension corresponds to the number of underlyings in the associated hedging portfolio. High-dimensional optimal stopping problems are, however, notoriously difficult to solve due to the well-known curse of dimensionality. In this work we propose an algorithm for solving such problems, which is based on deep learning and computes, in the context of early exercise option pricing, approximations of both an optimal exercise strategy and the price of the considered option. The proposed algorithm can also be applied to optimal stopping problems that arise in other areas where the underlying stochastic process can be efficiently simulated. We present numerical results for a large number of example problems, including the pricing of many high-dimensional American and Bermudan options such as Bermudan max-call options in up to 5000 dimensions. Most of the obtained results are compared to reference values computed by exploiting the specific problem design or, where available, to reference values from the literature. These numerical results suggest that the proposed algorithm is highly effective in the case of many underlyings, in terms of both accuracy and speed. (A toy deep-stopping sketch follows this entry.)
    Date: 2019–08
    URL: http://d.repec.org/n?u=RePEc:arx:papers:1908.01602&r=all
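    Note: The sketch below is a heavily simplified, single-asset toy in the spirit of the entry above, not the authors' algorithm: one small network maps (time, price) to a soft stopping probability for a Bermudan put, and its parameters are trained to maximise the expected discounted payoff under the resulting randomised stopping rule. Model, payoff and hyperparameters are all assumptions.

      # Toy deep-learning optimal stopping sketch for a Bermudan put.
      import torch

      torch.manual_seed(0)
      S0, K, r, sigma, T = 100.0, 100.0, 0.05, 0.2, 1.0
      n_steps, n_paths = 10, 10_000
      dt = T / n_steps

      # Simulate geometric Brownian motion paths on the exercise grid (times 0..n_steps).
      z = torch.randn(n_paths, n_steps)
      logret = (r - 0.5 * sigma**2) * dt + sigma * dt**0.5 * z
      S = S0 * torch.exp(torch.cumsum(logret, dim=1))
      S = torch.cat([torch.full((n_paths, 1), S0), S], dim=1)
      payoff = torch.clamp(K - S, min=0.0)
      disc = torch.exp(-r * dt * torch.arange(n_steps + 1, dtype=torch.float32))

      net = torch.nn.Sequential(
          torch.nn.Linear(2, 32), torch.nn.ReLU(),
          torch.nn.Linear(32, 32), torch.nn.ReLU(),
          torch.nn.Linear(32, 1), torch.nn.Sigmoid(),
      )
      opt = torch.optim.Adam(net.parameters(), lr=1e-3)

      for step in range(500):
          alive = torch.ones(n_paths)    # probability of not having stopped before t
          value = torch.zeros(n_paths)   # expected discounted payoff per path
          for t in range(1, n_steps + 1):
              feat = torch.stack([torch.full((n_paths,), t * dt), S[:, t] / K], dim=1)
              # Force exercise of whatever is still "alive" at maturity.
              p = net(feat).squeeze(1) if t < n_steps else torch.ones(n_paths)
              value = value + alive * p * disc[t] * payoff[:, t]
              alive = alive * (1.0 - p)
          loss = -value.mean()           # maximise the expected payoff
          opt.zero_grad()
          loss.backward()
          opt.step()

      # Price estimate from the last training pass (a sketch, not a converged estimate).
      print("Bermudan put price (sketch):", round(float(value.mean()), 3))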

This nep-big issue is ©2019 by Tom Coupé. It is provided as is without any express or implied warranty. It may be freely redistributed in whole or in part for any purpose. If distributed in part, please include this notice.
General information on the NEP project can be found at http://nep.repec.org. For comments please write to the director of NEP, Marco Novarese at <director@nep.repec.org>. Put “NEP” in the subject, otherwise your mail may be rejected.
NEP’s infrastructure is sponsored by the School of Economics and Finance of Massey University in New Zealand.