nep-big New Economics Papers
on Big Data
Issue of 2017‒07‒23
six papers chosen by
Tom Coupé
University of Canterbury

  1. Regulatory Learning: how to supervise machine learning models? An application to credit scoring By Dominique Guegan; Bertrand Hassani
  2. Machine learning application in online lending risk prediction By Xiaojiao Yu
  3. Forecasting the U.S. Real House Price Index By Vasilios Plakandaras; Rangan Gupta; Periklis Gogas; Theophilos Papadimitriou
  4. Sequence Classification of the Limit Order Book using Recurrent Neural Networks By Matthew F Dixon
  5. The R package MitISEM: Efficient and robust simulation procedures for Bayesian inference By Nalan Basturk; Stefano Grassi; Lennart Hoogerheide; Anne Opschoor; Herman K. van Dijk
  6. "Investment with deep learning" (in Japanese) This paper considers investment methods based on deep neural networks. In particular, since different specifications of the loss function yield different investment strategies, the paper presents two examples: exploiting return anomalies detected by supervised deep learning (SL), and maximizing profit through deep reinforcement learning (RL). It also applies learning of individual asset return dynamics to portfolio strategies. Moreover, it turns out that investment performance is quite sensitive to exogenously specified choices such as the frequency of the input data (e.g. monthly or daily returns), the learning method, the updating of the learned model, the number of layers in the network and the number of units in the intermediate layers. In particular, RL delivers relatively good results in portfolio investment, and a further analysis suggests that performance may be improved by reducing the number of units from the input layer to the intermediate layers. By Takaya Fukui; Akihiko Takahashi

  1. By: Dominique Guegan (Centre d'Economie de la Sorbonne and LabEx ReFi); Bertrand Hassani (Group Capgemini and Centre d'Economie de la Sorbonne and LabEx ReFi)
    Abstract: The arrival of big data strategies is threatening the latest trends in financial regulation, which aim at simplifying models and enhancing the comparability of the approaches chosen by financial institutions. Indeed, the intrinsically dynamic philosophy of Big Data strategies is almost incompatible with the current legal and regulatory framework, as illustrated in this paper. Besides, as our application to credit scoring shows, model selection may also evolve dynamically, forcing both practitioners and regulators to develop libraries of models, strategies for switching from one model to another, and supervisory approaches that allow financial institutions to innovate in a risk-mitigated environment. The purpose of this paper is therefore to analyse the issues raised by the Big Data environment, and in particular by machine learning models, highlighting the tensions in the current framework between the data flows, the model selection process and the necessity to generate appropriate outcomes.
    Keywords: Financial Regulation; Algorithm; Big Data; Risk
    JEL: C55
    Date: 2017–07
  2. By: Xiaojiao Yu
    Abstract: Online lending has disrupted the traditional consumer banking sector with more effective loan processing. Risk prediction and monitoring are critical to the success of the business model. Traditional credit scoring models fall short of applying big data technology to building risk models. In this manuscript, data of various formats and sizes were collected from public websites and third parties and assembled with clients' loan application information. Two ensemble machine learning models, a random forest model and an XGBoost model, were built and trained on the historical transaction data and subsequently tested on separate data. The XGBoost model shows a higher K-S value, suggesting better classification capability for this task. The top 10 important features from the two models suggest that external data, such as the zhimaScore, multi-platform loan-stacking information, and social network information, are important factors in predicting loan default probability.
    Date: 2017–07
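The K-S value the abstract uses to compare the two classifiers is the Kolmogorov–Smirnov statistic: the maximum separation between the cumulative score distributions of defaulters and non-defaulters. A minimal sketch, with purely illustrative scores and labels (not the paper's data):

```python
def ks_statistic(scores, labels):
    """Max gap between the cumulative score distributions of
    defaulters (label 1) and non-defaulters (label 0)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    best = 0.0
    for t in sorted(set(scores)):          # each score as a cut-off
        cdf_pos = sum(s <= t for s in pos) / len(pos)
        cdf_neg = sum(s <= t for s in neg) / len(neg)
        best = max(best, abs(cdf_pos - cdf_neg))
    return best

# Illustrative predicted default probabilities and true outcomes.
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
labels = [1,   1,   0,   1,   0,   0,   1,   0]
print(ks_statistic(scores, labels))  # higher = better separation
```

A perfect classifier would reach a K-S of 1.0; random scoring tends toward 0.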
  3. By: Vasilios Plakandaras; Rangan Gupta; Periklis Gogas; Theophilos Papadimitriou
    Abstract: The sudden and immense downturn in U.S. house prices in 2006 sparked the 2007 global financial crisis and revived interest in forecasting such imminent threats to economic stability. In this paper we propose a novel hybrid forecasting methodology that combines Ensemble Empirical Mode Decomposition (EEMD), from the field of signal processing, with the Support Vector Regression (SVR) methodology that originates in machine learning. We test the forecasting ability of the proposed model against a Random Walk (RW) model, a Bayesian Autoregressive model and a Bayesian Vector Autoregressive model. The proposed methodology outperforms all the competing models, achieving half the out-of-sample forecasting error of the RW model with and without drift. Finally, we argue that this new methodology can be used as an early warning system for forecasting sudden house price drops, with direct policy implications.
    Date: 2017–07
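The hybrid's structure is: decompose the series into components, forecast each component separately, then recombine. A minimal sketch of that pipeline shape, where EEMD and SVR are replaced by stand-ins (a moving-average trend split and a last-value forecaster) purely for illustration; the paper itself uses the genuine EEMD and SVR methods:

```python
def decompose(series, window=3):
    """Stand-in for EEMD: split into a smooth trend and a residual."""
    trend = []
    for i in range(len(series)):
        lo = max(0, i - window + 1)
        trend.append(sum(series[lo:i + 1]) / (i + 1 - lo))
    residual = [x - t for x, t in zip(series, trend)]
    return [trend, residual]

def forecast_component(component):
    """Stand-in for SVR: naive last-value forecast."""
    return component[-1]

def hybrid_forecast(series):
    # Forecast each component separately, then sum the component
    # forecasts -- the same recombination step the EEMD-SVR hybrid uses.
    return sum(forecast_component(c) for c in decompose(series))

# Illustrative house-price-index values, not the paper's data.
print(hybrid_forecast([100.0, 101.0, 103.0, 102.0, 104.0]))
```

In the paper, each intrinsic mode function from EEMD gets its own trained SVR model, so smooth low-frequency components and noisy high-frequency ones are forecast by separately tuned regressors before being summed.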
  4. By: Matthew F Dixon
    Abstract: Recurrent neural networks (RNNs) are artificial neural networks (ANNs) that are well suited to forecasting and sequence classification. They have been applied extensively to forecasting univariate financial time series; however, their application to high frequency trading has not been previously considered. This paper solves a sequence classification problem in which a short sequence of observations of limit order book depths and market orders is used to predict the next event's price-flip. The capability to adjust quotes according to this prediction reduces the likelihood of adverse price selection. Our results demonstrate the ability of the RNN to capture the non-linear relationship between near-term price-flips and a spatio-temporal representation of the limit order book. The RNN compares favorably with other classifiers, including a linear Kalman filter, on S&P 500 E-mini futures level II data over the month of August 2016. Further results assess the effect of retraining the RNN daily and the sensitivity of its performance to trade latency.
    Date: 2017–07
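The core mechanism is an RNN consuming a short sequence of order book snapshots and emitting a class for the next price move. A minimal forward-pass sketch; the tiny fixed weights, two-feature inputs, and two-class output are illustrative stand-ins for the paper's trained network and full spatio-temporal book representation:

```python
import math

def rnn_classify(sequence, w_in, w_rec, w_out):
    """One recurrent layer (tanh), then a linear read-out; the argmax
    over the logits is the predicted class."""
    h = [0.0, 0.0]                       # hidden state carried across steps
    for x in sequence:                   # one step per book snapshot
        h = [math.tanh(sum(wi * xi for wi, xi in zip(w_in[j], x))
                       + sum(wr * hi for wr, hi in zip(w_rec[j], h)))
             for j in range(2)]
    logits = [sum(wo * hi for wo, hi in zip(row, h)) for row in w_out]
    return logits.index(max(logits))     # 0 = down-flip, 1 = up-flip

# Illustrative features per event, e.g. depth imbalance and order flow.
seq = [[0.2, -0.1], [0.5, 0.0], [0.7, 0.3]]
w_in = [[1.0, 0.0], [0.0, 1.0]]
w_rec = [[0.5, 0.0], [0.0, 0.5]]
w_out = [[-1.0, 0.0], [1.0, 0.0]]
print(rnn_classify(seq, w_in, w_rec, w_out))
```

The recurrence is what lets the classifier condition on the whole recent event history rather than a single snapshot, which is where it gains over memoryless baselines such as the linear Kalman filter mentioned in the abstract.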
  5. By: Nalan Basturk (Maastricht University & RCEA); Stefano Grassi (University of Rome “Tor Vergata”); Lennart Hoogerheide (Vrije Universiteit Amsterdam & Tinbergen Institute); Anne Opschoor (Vrije Universiteit Amsterdam & Tinbergen Institute); Herman K. van Dijk (Erasmus University Rotterdam, Norges Bank (Central Bank of Norway) & Tinbergen Institute & RCEA)
    Abstract: This paper presents the R package MitISEM (mixture of t by importance sampling weighted expectation maximization), which provides an automatic and flexible two-stage method to approximate a non-elliptical target density kernel – typically a posterior density kernel – using an adaptive mixture of Student-t densities as the approximating density. In the first stage a mixture of Student-t densities is fitted to the target using an expectation maximization algorithm in which each step of the optimization procedure is weighted using importance sampling. In the second stage this mixture density serves as a candidate density for efficient and robust application of importance sampling or the Metropolis-Hastings (MH) method to estimate properties of the target distribution. The package enables Bayesian inference and prediction on model parameters and probabilities, in particular for models whose densities have multi-modal or other non-elliptical shapes such as curved ridges. These shapes occur in research topics in several scientific fields: for instance, the analysis of DNA data in bio-informatics, loan acquisition by heterogeneous groups in financial economics, and the analysis of education's effect on earned income in labor economics. The package MitISEM also provides an extended algorithm, 'sequential MitISEM', which substantially decreases computation time when the target density has to be approximated for increasing data samples. This occurs when the posterior or predictive density is updated with new observations and/or when one computes model probabilities using predictive likelihoods. We illustrate the MitISEM algorithm using three canonical statistical and econometric models that are characterized by several types of non-elliptical posterior shapes and that describe well-known data patterns in econometrics and finance. We show that MH using the candidate density obtained by MitISEM outperforms, in terms of numerical efficiency, MH using a simpler candidate, as well as the Gibbs sampler. The MitISEM approach is also used for Bayesian model comparison using predictive likelihoods.
    Keywords: Finite mixtures, Student-t densities, importance sampling, MCMC, Metropolis-Hastings algorithm, expectation maximization, Bayesian inference, R software
    Date: 2017–06–26
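The package's second stage is self-normalized importance sampling with a Student-t candidate density. A minimal Python sketch of that step; the bimodal target kernel below is illustrative, and MitISEM itself fits a *mixture* of Student-t densities in its first stage rather than the single t-candidate used here:

```python
import math, random

def target_kernel(x):
    # Unnormalized bimodal "posterior" kernel (illustrative), symmetric
    # about zero, so its true mean is 0.
    return math.exp(-0.5 * (x - 2) ** 2) + math.exp(-0.5 * (x + 2) ** 2)

def student_t_pdf(x, df=3.0):
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def sample_student_t(df=3):
    # t draw = normal / sqrt(chi-square / df)
    z = random.gauss(0, 1)
    chi2 = sum(random.gauss(0, 1) ** 2 for _ in range(df))
    return z / math.sqrt(chi2 / df)

def is_posterior_mean(n=20000, seed=0):
    """Self-normalized importance sampling estimate of the target mean."""
    random.seed(seed)
    num = den = 0.0
    for _ in range(n):
        x = sample_student_t()
        w = target_kernel(x) / student_t_pdf(x)  # importance weight
        num += w * x
        den += w
    return num / den

print(is_posterior_mean())  # should be near 0 by symmetry
```

The heavy t tails keep the importance weights bounded for this light-tailed target; a poorly matched candidate (the situation MitISEM's adaptive first stage is designed to avoid) would concentrate the weight on a few draws and ruin the estimate's numerical efficiency.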
  6. By: Takaya Fukui (Mizuho Securities Co., Ltd.); Akihiko Takahashi (Faculty of Economics, University of Tokyo)
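The abstract's central point is that the choice of loss function defines the strategy: a prediction loss for the supervised (SL) approach versus direct profit maximization for the reinforcement learning (RL) approach. A minimal sketch contrasting the two objectives on the same model; the linear "network", toy returns, and sign-based position rule are illustrative stand-ins for the paper's deep networks:

```python
def predict(weights, features):
    # Stand-in for a deep network: a linear score of the features.
    return sum(w * f for w, f in zip(weights, features))

def supervised_loss(weights, data):
    """SL objective: mean squared error of predicted next-period returns."""
    return sum((predict(weights, f) - r) ** 2 for f, r in data) / len(data)

def rl_objective(weights, data):
    """RL-style objective: cumulative profit when the position is the
    sign of the network output (long/short one unit)."""
    profit = 0.0
    for f, r in data:
        position = 1.0 if predict(weights, f) > 0 else -1.0
        profit += position * r
    return profit

# Illustrative (features, realized next return) pairs.
toy = [([0.3, -0.1], 0.02), ([-0.4, 0.2], -0.01), ([0.1, 0.5], 0.03)]
w = [1.0, 0.5]
print(supervised_loss(w, toy), rl_objective(w, toy))
```

Minimizing the first objective trains an accurate return predictor; maximizing the second trains the trading rule directly, which is why the two specifications can yield quite different strategies from the same inputs.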

This nep-big issue is ©2017 by Tom Coupé. It is provided as is without any express or implied warranty. It may be freely redistributed in whole or in part for any purpose. If distributed in part, please include this notice.
General information on the NEP project can be found at For comments please write to the director of NEP, Marco Novarese at <>. Put “NEP” in the subject, otherwise your mail may be rejected.
NEP’s infrastructure is sponsored by the School of Economics and Finance of Massey University in New Zealand.