nep-big New Economics Papers
on Big Data
Issue of 2019‒01‒14
seventeen papers chosen by
Tom Coupé
University of Canterbury

  1. An evaluation of early warning models for systemic banking crises: Does machine learning improve predictions? By Beutel, Johannes; List, Sophia; von Schweinitz, Gregor
  2. Antecedents and consequences of individuals' trust formation in artificial intelligence in Korea By Kim, Moon-Koo; Park, Jong-Hyun; Lee, Duk Hee
  3. Racing With or Against the Machine? Evidence from Europe By Gregory, Terry; Salomons, Anna; Zierahn, Ulrich
  4. Modified Causal Forests for Estimating Heterogeneous Causal Effects By Lechner, Michael
  5. Predicting the Stock Price of Frontier Markets Using Modified Black-Scholes Option Pricing Model and Machine Learning By Reaz Chowdhury; M. R. C. Mahdy; Tanisha Nourin Alam; Golam Dastegir Al Quaderi
  6. Deep neural networks algorithms for stochastic control problems on finite horizon, Part 2: numerical applications By Achref Bachouch; Côme Huré; Nicolas Langrené; Huyen Pham
  7. Sustainability and Wellbeing: A Text Analysis of New Zealand Parliamentary Debates, Official Yearbooks and Ministerial Documents By Mubashir Qasim
  8. In the Eye of the Beholder? Empirically Decomposing Different Economic Implications of the Online Rating Variance By Dominik Gutt
  9. Use of big data in financial sector of Bangladesh – A review By Abu Taher, Sheikh; Uddin, Md. Kama
  10. Adoption of AI in Firms and the Issues to be Overcome - An Empirical Analyses of the Evolutionary Path of Development by Firms By Miyazaki, Kumiko; Sato, Ryusuke
  11. Machine learning applied to accounting variables yields the risk-return metrics of private company portfolios By Elias Cavalcante-Filho; Flavio Abdenur; Rodrigo De Losso
  12. Multimodal deep learning for short-term stock volatility prediction By Marcelo Sardelich; Suresh Manandhar
  13. Regime changes and fiscal sustainability in Kenya with comparative nonlinear Granger causalities across East-African countries By William Nganga; Julien Chevallier; Simon Ndiritu
  14. What Are We Learning about Artificial Intelligence in Financial Services?: a speech at Fintech and the New Financial Landscape, Philadelphia, Pennsylvania By Brainard, Lael
  15. Semiparametric Difference-in-Differences with Potentially Many Control Variables By Neng-Chieh Chang
  16. Generative Adversarial Networks for Financial Trading Strategies Fine-Tuning and Combination By Adriano Koshiyama; Nick Firoozye; Philip Treleaven
  17. Inference on average treatment effects in aggregate panel data settings By Victor Chernozhukov; Kaspar Wuthrich; Yinchu Zhu

  1. By: Beutel, Johannes; List, Sophia; von Schweinitz, Gregor
    Abstract: This paper compares the out-of-sample predictive performance of different early warning models for systemic banking crises using a sample of advanced economies covering the past 45 years. We compare a benchmark logit approach to several machine learning approaches recently proposed in the literature. We find that while machine learning methods often attain a very high in-sample fit, they are outperformed by the logit approach in recursive out-of-sample evaluations. This result is robust to the choice of performance measure, crisis definition, preference parameter, and sample length, as well as to using different sets of variables and data transformations. Thus, our paper suggests that further enhancements to machine learning early warning models are needed before they are able to offer a substantial value-added for predicting systemic banking crises. Conventional logit models appear to use the available information already fairly efficiently, and would for instance have been able to predict the 2007/2008 financial crisis out-of-sample for many countries. In line with economic intuition, these models identify credit expansions, asset price booms and external imbalances as key predictors of systemic banking crises.
    Keywords: early warning system,logit,machine learning,systemic banking crises
    JEL: C35 C53 G01
    Date: 2018
  2. By: Kim, Moon-Koo; Park, Jong-Hyun; Lee, Duk Hee
    Abstract: Since the mid-2010s, the rapid advancement of artificial intelligence (AI) has increased both people's expectations and their anxieties. Technology-centered optimism is widespread, with hopes that AI will benefit human life and society by maximizing productivity and efficiency. However, serious concerns, such as job substitution, deepening polarization, and human alienation, reinforce society's skepticism of AI (Hurlburt, 2017). To achieve a hopeful and sustainable diffusion of AI, building human trust in the technology is a critical task. Some studies have stressed the role and importance of trust in the successful deployment and diffusion of AI-based applications (Choi and Ji, 2015; Hengstler et al., 2016; Mcknight et al., 2011). However, to the best of our knowledge, little or no attention has been paid to the antecedents and consequences of trust formation in AI. Therefore, in the Korean context, we aim to investigate the personal and technical factors influencing that trust formation, which in turn will affect individuals' value perceptions of AI. We address this problem with three research questions. RQ1: What are the perceived technological factors that affect the formation of trust in AI? RQ2: What are the personal characteristics that affect the formation of trust in AI? RQ3: Does trust in AI affect individuals' value perceptions?
    Date: 2018
  3. By: Gregory, Terry (IZA); Salomons, Anna (Utrecht University); Zierahn, Ulrich (ZEW Mannheim)
    Abstract: A fast-growing literature shows that digital technologies are displacing labor from routine tasks, raising concerns that labor is racing against the machine. We develop a task-based framework to estimate the aggregate labor demand and employment effects of routine-replacing technological change (RRTC), along with the underlying mechanisms. We show that while RRTC has indeed had strong displacement effects in the European Union between 1999 and 2010, it has simultaneously created new jobs through increased product demand, outweighing displacement effects and resulting in net employment growth. However, we also show that this finding depends on the distribution of gains from technological progress.
    Keywords: labor demand, employment, routine-replacing technological change, tasks, local demand spillovers
    JEL: E24 J23 J24 O33
    Date: 2019–01
  4. By: Lechner, Michael
    Abstract: Uncovering the heterogeneity of causal effects of policies and business decisions at various levels of granularity provides substantial value to decision makers. This paper develops new estimation and inference procedures for multiple treatment models in a selection-on-observables framework by modifying the Causal Forest approach suggested by Wager and Athey (2018). The new estimators have desirable theoretical and computational properties for various aggregation levels of the causal effects. An Empirical Monte Carlo study shows that they may outperform previously suggested estimators. Inference tends to be accurate for effects relating to larger groups and conservative for effects relating to fine levels of granularity. An application to the evaluation of an active labour market programme shows the value of the new methods for applied research.
    Keywords: Causal machine learning, statistical learning, average treatment effects, conditional average treatment effects, multiple treatments, selection-on-observable, causal forests
    JEL: C21 J68
    Date: 2019–01
  5. By: Reaz Chowdhury; M. R. C. Mahdy; Tanisha Nourin Alam; Golam Dastegir Al Quaderi
    Abstract: The Black-Scholes option pricing model (BSOPM) has long been in use for the valuation of equity options to find the prices of stocks. In this work, using BSOPM, we have come up with a comparative analytical approach and numerical technique to find the prices of the call option and the put option, and we treat these two prices as the buying price and selling price of stocks of frontier markets so that we can predict the stock price (close price). Changes have been made to the model to find the parameters strike price and time of expiration for calculating the stock price of frontier markets. To verify the result obtained using the modified BSOPM, we have used a machine learning approach with the software RapidMiner, where we have adopted different algorithms such as the decision tree, ensemble learning methods and neural networks. It has been observed that the prediction of the close price using machine learning is very similar to the one obtained using BSOPM. The machine learning approach stands out as a better predictor than BSOPM because the Black-Scholes-Merton equation includes risk and dividend parameters, which change continuously. We have also numerically calculated volatility. As the prices of the stocks go high due to overpricing, volatility increases at a tremendous rate, and when volatility becomes very high the market tends to fall, which can be observed and determined using our modified BSOPM. The proposed modified BSOPM has also been explained based on the analogy of the Schrödinger equation (and heat equation) of quantum physics.
    Date: 2018–12
  6. By: Achref Bachouch (UiO - University of Oslo); Côme Huré (LPSM UMR 8001 - Laboratoire de Probabilités, Statistique et Modélisation - UPMC - Université Pierre et Marie Curie - Paris 6 - UPD7 - Université Paris Diderot - Paris 7 - CNRS - Centre National de la Recherche Scientifique, UPD7 - Université Paris Diderot - Paris 7); Nicolas Langrené (CSIRO - Data61 [Canberra] - ANU - Australian National University - CSIRO - Commonwealth Scientific and Industrial Research Organisation [Canberra]); Huyen Pham (LPSM UMR 8001 - Laboratoire de Probabilités, Statistique et Modélisation - UPD7 - Université Paris Diderot - Paris 7 - Sorbonne Université - CNRS - Centre National de la Recherche Scientifique, UPD7 - Université Paris Diderot - Paris 7)
    Abstract: This paper presents several numerical applications of deep learning-based algorithms that have been analyzed in [11]. Numerical and comparative tests using TensorFlow illustrate the performance of our different algorithms, namely control learning by performance iteration (algorithms NNcontPI and ClassifPI), control learning by hybrid iteration (algorithms Hybrid-Now and Hybrid-LaterQ), on the 100-dimensional nonlinear PDEs examples from [6] and on quadratic Backward Stochastic Differential equations as in [5]. We also provide numerical results for an option hedging problem in finance, and energy storage problems arising in the valuation of gas storage and in microgrid management.
    Date: 2018–12–12
  7. By: Mubashir Qasim (University of Waikato)
    Abstract: Recent advances in natural language processing and semantic analysis methods are enabling scholars to analyse text extensively. These techniques have not only minimized the margins of error arising from missing data in traditionally conducted discourse analysis but also permitted reproducibility of research results. In this paper, we use several text analysis methods to analyse the evolution of the terms ‘sustainability’ and ‘wellbeing’ (SaW) in parliamentary debates (Hansard), New Zealand Official Yearbooks (NZOYBs) and ministerial documents over 125 years. The term ‘welfare’ has existed in the NZOYBs and Hansard text since the start of our analysis (1893), with a steadily increasing trend until the mid-1980s. The term ‘wellbeing’ gained momentum in the mid-1930s and has been linked strongly with ‘sustainability’ in the following decades. Our analysis re-emphasizes the importance of the Brundtland Report (‘Our Common Future’), which acted as a catalyst to the sustainable movement in the late 1980s. ‘Sustainability’ and ‘wellbeing’ then began to appear in conjunction. Our analysis includes the finding that SaW differ significantly when political parties are considered.
    Keywords: sustainable development; wellbeing; text analysis; resilience; parliamentary debates; Hansard
    JEL: C80 I31 N00 Q01 Q56
    Date: 2019–01–10
  8. By: Dominik Gutt (Paderborn University)
    Abstract: The growing body of literature on online ratings has reached a consensus on the positive impact of the average rating and the number of ratings on economic outcomes. Yet, little is known about the economic implication of the online rating variance, and existing studies have presented contradictory results. Therefore, this study examines the impact of the online rating variance on the prices and sales of digital cameras from […]. The key feature of our study is that we employ and validate a machine learning approach to decompose the online rating variance into a product failure-related and a taste-related share. In line with our theoretical foundation, our empirical results highlight that the failure-related variance share has a negative impact on price and sales, while the impact of the taste-related share is positive. Our results highlight a new perspective on the online rating variance that has been largely neglected by prior studies. Sellers can benefit from our results by adjusting their pricing strategy and improving their sales forecasts. Review platforms can facilitate the identification of product failure-related ratings to support the purchasing decision process of customers.
    Keywords: Online Rating Variance, Text Mining, Econometrics, User-Generated Social Media.
    JEL: M15 M31 O32 D12
    Date: 2018–12
  9. By: Abu Taher, Sheikh; Uddin, Md. Kama
    Abstract: The objective of the paper is to review the use of big data analytics (BDA) in bank and non-bank financial institutions (BNBFI) in Bangladesh. Since the advent of information technology (IT), data collection for BNBFI has become easy through various channels. As BNBFI conduct business through information, data play an essential role in making accurate decisions. Besides, the literature suggests that the use of big data helps BNBFI to reduce the customer churn rate, enhance loyalty, manage risk and increase revenue. BNBFI are leveraging big data to transform their processes, their organizations and, soon, the entire industry. For this reason, the study attempts to explore the effect of BDA on the efficiency of BNBFI in Bangladesh using an explorative study. Since no study has been conducted until now, collection of data is difficult. However, data have been collected from an extensive literature review, company websites, annual reports and formal conversations with bank employees. The primary observations suggest that some BNBFI apply data analytics to customer data from ATM transactions, debit and credit card use, and online banking, as well as data generated from the Internet and computers, and that they perform better than the banks which do not. However, the study does not identify which specific functions BNBFI should emphasize most to promote efficiency and growth. Besides, the observation is preliminary in nature and needs further study to provide recommendations.
    Keywords: big data analytics,finance,efficiency,innovation
    Date: 2018
  10. By: Miyazaki, Kumiko; Sato, Ryusuke
    Abstract: AI has been through several booms, and we have currently reached the 3rd AI boom. Although AI has been evolving over six decades, it seems that the current boom is different from the previous booms. In this paper, we attempt to elucidate the issues for widespread adoption of AI in firms. From one of the authors' work experience related to AI, it appears that although companies are willing to consider adopting AI for various applications, only a few are willing to make a commitment to full-scale adoption. The main goal of this paper is to identify the characteristics of the current 3rd AI boom and to analyze the issues for adoption by firms. For this purpose we have put forward 3 research questions. 1) How has the technological performance in AI changed at the national level during the 2nd and the 3rd boom? 2) How have the key technologies and the applications of AI changed over time? 3) What is the companies' perspective on AI and what are the necessary conditions for firms to adopt AI? Through bibliometric analysis, we were able to extract the important keywords in the 3rd AI boom, which were Machine learning and Deep learning. The main focus of AI research has been shifting towards AI applications. The interviews with firms which were considering adopting AI suggested the existence of a gap between the needs of the company and what AI can deliver at present. AI could be used for finding suitable treatment for genetic illnesses if some issues are solved.
    Date: 2018
  11. By: Elias Cavalcante-Filho; Flavio Abdenur; Rodrigo De Losso
    Abstract: Constructing optimal Markowitz Mean-Variance portfolios of publicly-traded stock is a straightforward and well-known task. Doing the same for portfolios of privately-owned firms, given the lack of historical price data, is a challenge. We apply machine learning models to historical accounting variable data to estimate risk-return metrics – specifically, expected excess returns, price volatility and (pairwise) price correlation – of private companies, which should allow the construction of Mean-Variance optimized portfolios consisting of private companies. We attain out-of-sample $R^2$s around 45%, while linear regressions yield $R^2$s of only about 10%. This short paper is the result of a real-world consulting project on behalf of Votorantim S.A (“VSA”), a multinational holding company. To the authors’ best knowledge this is a novel application of machine learning in the finance literature.
    Keywords: asset pricing; Machine Learning; Portfolio Theory
    JEL: G12 G17
    Date: 2018–12–20
  12. By: Marcelo Sardelich; Suresh Manandhar
    Abstract: Stock market volatility forecasting is a task relevant to assessing market risk. We investigate the interaction between news and prices for the one-day-ahead volatility prediction using state-of-the-art deep learning approaches. The proposed models are trained either end-to-end or using sentence encoders transferred from other tasks. We evaluate a broad range of stock market sectors, namely Consumer Staples, Energy, Utilities, Healthcare, and Financials. Our experimental results show that adding news improves the volatility forecasting as compared to the mainstream models that rely only on price data. In particular, our model outperforms the widely-recognized GARCH(1,1) model for all sectors in terms of coefficient of determination $R^2$, $MSE$ and $MAE$, achieving the best performance when training from both news and price data.
    Date: 2018–12
  13. By: William Nganga (Strathmore University); Julien Chevallier (UP8 - Université Paris 8 Vincennes-Saint-Denis); Simon Ndiritu (Strathmore University)
    Abstract: This study seeks to investigate the nature of the fiscal policy regime in Kenya, and the extent to which fiscal policy is sustainable in the long run by taking into account periodic regime changes. Markov-switching models were used to determine fiscal policy regimes endogenously. Regime switching tests were used to test whether the No-Ponzi game condition and the debt stabilizing condition were met. The results established that the regime-switching model was suitable in explaining sustainable and unsustainable regime cycles. An investigation of fiscal policy regimes established that both sustainable and unsustainable regimes were dominant and each lasted for an average of four years. There was evidence to suggest the existence of procyclical fiscal policy in Kenya. Regime switching tests for long-run sustainability suggested that the No-Ponzi game condition weakly holds in the Kenyan economy. Regime-based sensitivity analysis suggests that the persistence of an unsustainable regime for more than four years could threaten long-run fiscal sustainability. Sensitivity tests are conducted by resorting to (i) Self-Exciting Threshold Autoregressive Models at the country-level, and (ii) non-linear Granger causalities across a Feed-Forward Artificial Neural Network composed of East-African countries (Burundi, Kenya, Rwanda, Tanzania and Uganda).
    Keywords: Fiscal policy; Markov-switching; No-Ponzi game condition; SETAR; Non-linear Granger causality; Feed-Forward Artificial Neural Network
    Date: 2018–11–30
  14. By: Brainard, Lael (Board of Governors of the Federal Reserve System (U.S.))
    Date: 2018–11–13
  15. By: Neng-Chieh Chang
    Abstract: This paper discusses difference-in-differences (DID) estimation when there exist many control variables, potentially more than the sample size. In this case, traditional estimation methods, which require a limited number of variables, do not work. One may consider using statistical or machine learning (ML) methods. However, by the well-known theory of inference of ML methods proposed in Chernozhukov et al. (2018), directly applying ML methods to the conventional semiparametric DID estimators will cause significant bias and make these DID estimators fail to be $\sqrt{N}$-consistent. This article proposes three new DID estimators for three different data structures, which are able to shrink the bias and achieve $\sqrt{N}$-consistency and asymptotic normality with mean zero when applying ML methods. This leads to straightforward inferential procedures. In addition, I show that these new estimators have the small bias property (SBP), meaning that their bias will converge to zero faster than the pointwise bias of the nonparametric estimator on which they are based.
    Date: 2018–12
  16. By: Adriano Koshiyama; Nick Firoozye; Philip Treleaven
    Abstract: Systematic trading strategies are algorithmic procedures that allocate assets aiming to optimize a certain performance criterion. To obtain an edge in a highly competitive environment, the analyst needs to properly fine-tune their strategy, or discover how to combine weak signals in novel alpha-creating manners. Both aspects, namely fine-tuning and combination, have been extensively researched using several methods, but emerging techniques such as Generative Adversarial Networks can have an impact on such aspects. Therefore, our work proposes the use of Conditional Generative Adversarial Networks (cGANs) for trading strategies calibration and aggregation. To this purpose, we provide a full methodology on: (i) the training and selection of a cGAN for time series data; (ii) how each sample is used for strategies calibration; and (iii) how all generated samples can be used for ensemble modelling. To provide evidence that our approach is well grounded, we have designed an experiment with multiple trading strategies, encompassing 579 assets. We compared cGAN with an ensemble scheme and model validation methods, both suited for time series. Our results suggest that cGANs are a suitable alternative for strategies calibration and combination, providing outperformance when the traditional techniques fail to generate any alpha.
    Date: 2019–01
  17. By: Victor Chernozhukov; Kaspar Wuthrich; Yinchu Zhu
    Abstract: This paper studies inference on treatment effects in aggregate panel data settings with a single treated unit and many control units. We propose new methods for making inference on average treatment effects in settings where both the number of pre-treatment and the number of post-treatment periods are large. We use linear models to approximate the counterfactual mean outcomes in the absence of the treatment. The counterfactuals are estimated using constrained Lasso, an essentially tuning-free regression approach that nests difference-in-differences and synthetic control as special cases. We propose a $K$-fold cross-fitting procedure to remove the bias induced by regularization. To avoid the estimation of the long run variance, we construct a self-normalized $t$-statistic. The test statistic has an asymptotically pivotal distribution (a Student $t$-distribution with $K-1$ degrees of freedom), which makes our procedure very easy to implement. Our approach has several theoretical advantages. First, it does not rely on any sparsity assumptions. Second, it is fully robust against misspecification of the linear model. Third, it is more efficient than difference-in-means and difference-in-differences estimators. The proposed method demonstrates excellent performance in simulation experiments, and is taken to a data application, where we re-evaluate the economic consequences of terrorism.
    Date: 2018–12

This nep-big issue is ©2019 by Tom Coupé. It is provided as is without any express or implied warranty. It may be freely redistributed in whole or in part for any purpose. If distributed in part, please include this notice.
General information on the NEP project can be found at For comments please write to the director of NEP, Marco Novarese at <>. Put “NEP” in the subject, otherwise your mail may be rejected.
NEP’s infrastructure is sponsored by the School of Economics and Finance of Massey University in New Zealand.