nep-big New Economics Papers
on Big Data
Issue of 2020‒03‒09
fifteen papers chosen by
Tom Coupé
University of Canterbury

  1. Data and Competition: a General Framework with Applications to Mergers, Market Structure, and Privacy Policy By de Cornière, Alexandre; Taylor, Greg
  2. Inventory effects on the price dynamics of VSTOXX futures quantified via machine learning By Daniel Guterding
  3. Firms Default Prediction with Machine Learning By Tesi Aliaj; Aris Anagnostopoulos; Stefano Piersanti
  4. Cross-sectional Stock Price Prediction using Deep Learning for Actual Investment Management By Masaya Abe; Kei Nakagawa
  5. TPLVM: Portfolio Construction by Student's $t$-process Latent Variable Model By Yusuke Uchiyama; Kei Nakagawa
  6. Does a District-Vote Matter for the Behavior of Politicians? A Textual Analysis of Parliamentary Speeches By Born, Andreas; Janssen, Aljoscha
  7. Deep Learning for Asset Bubbles Detection By Oksana Bashchenko; Alexis Marchal
  8. Double/Debiased Machine Learning for Dynamic Treatment Effects By Greg Lewis; Vasilis Syrgkanis
  9. G-Learner and GIRL: Goal Based Wealth Management with Reinforcement Learning By Matthew Dixon; Igor Halperin
  10. Using Reinforcement Learning in the Algorithmic Trading Problem By Evgeny Ponomarev; Ivan Oseledets; Andrzej Cichocki
  11. Safe Counterfactual Reinforcement Learning By Yusuke Narita; Shota Yasui; Kohei Yata
  12. Trimming the Sail: A Second-order Learning Paradigm for Stock Prediction By Chi Chen; Li Zhao; Wei Cao; Jiang Bian; Chunxiao Xing
  13. Tourism Demand Forecasting with Tourist Attention: An Ensemble Deep Learning Approach By Shaolong Sun; Yanzhao Li; Shouyang Wang; Ju-e Guo
  14. ESG investments: Filtering versus machine learning approaches By Carmine de Franco; Christophe Geissler; Vincent Margot; Bruno Monnier
  15. The interconnectedness of the economic content in the speeches of the US Presidents By Matteo Cinelli; Valerio Ficcadenti; Jessica Riccioni

  1. By: de Cornière, Alexandre; Taylor, Greg
    Abstract: What role does data play in competition? This question has been at the center of a fierce debate around competition policy in the digital economy. We use a competition-in-utilities approach to provide a general framework for studying the competitive effects of data, encompassing a wide range of markets where data has many different uses. We identify conditions for data to be unilaterally pro- or anti-competitive (UPC or UAC). The conditions are simple and often require no information about market demand. We apply our framework to study various applications of data, including training algorithms, targeting advertisements, and personalizing prices. We also show that whether data is UPC or UAC has important implications for policy issues such as data-driven mergers, market structure, and privacy policy.
    JEL: L1 L4 L5
    Date: 2020–02
    URL: http://d.repec.org/n?u=RePEc:tse:wpaper:124102&r=all
  2. By: Daniel Guterding
    Abstract: The VSTOXX index tracks the expected 30-day volatility of the EURO STOXX 50 equity index. Futures on the VSTOXX index can, therefore, be used to hedge against economic uncertainty. We investigate the effect of trader inventory on the price of VSTOXX futures through a combination of stochastic processes and machine learning methods. We formulate a simple and efficient pricing methodology for VSTOXX futures, which assumes a Heston-type stochastic process for the underlying EURO STOXX 50 market. Under these dynamics, approximate analytical formulas for the implied volatility smile and the VSTOXX index have recently been derived. We use the EURO STOXX 50 option implied volatilities and the VSTOXX index value to estimate the parameters of this Heston model. Following the calibration, we calculate theoretical VSTOXX future prices and compare them to the actual market prices. While theoretical and market prices are usually in line, we also observe time periods, during which the market price does not agree with our Heston model. We collect a variety of market features that could potentially explain the price deviations and calibrate two machine learning models to the price difference: a regularized linear model and a random forest. We find that both models indicate a strong influence of accumulated trader positions on the VSTOXX futures price.
    Date: 2020–02
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2002.08207&r=all
  3. By: Tesi Aliaj; Aris Anagnostopoulos; Stefano Piersanti
    Abstract: Academics and practitioners have long studied models for predicting firms' bankruptcy, using statistical and machine-learning approaches. An earlier sign that a company has financial difficulties and may eventually go bankrupt is going into default, which, loosely speaking, means that the company has been having difficulties in repaying its loans to the banking system. A firm's default status is not technically a failure, but it is very relevant for bank lending policies and often anticipates the failure of the company. Our study uses, for the first time to our knowledge, a very large database of granular credit data from the Italian Central Credit Register of the Bank of Italy, which contains information on all Italian companies' past behavior towards the entire Italian banking system, to predict their default using machine-learning techniques. Furthermore, we combine these data with companies' public balance sheet data. We find that ensemble techniques and random forest provide the best results, corroborating the findings of Barboza et al. (Expert Syst. Appl., 2017).
    Date: 2020–02
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2002.11705&r=all
  4. By: Masaya Abe; Kei Nakagawa
    Abstract: Stock price prediction has been an important research theme both academically and practically. Various methods to predict stock prices have been studied until now. A feature that explains the stock price in a cross-sectional analysis is called a "factor" in the field of finance. Many empirical studies in finance have identified which cross-sectional features mark stocks whose prices rise relative to others and which mark stocks whose prices fall. Recently, stock price prediction methods using machine learning, especially deep learning, have been proposed since the relationship between these factors and stock prices is complex and non-linear. However, there are no practical examples for actual investment management. In this paper, therefore, we present a cross-sectional daily stock price prediction framework using deep learning for actual investment management. For example, we build a portfolio with information available at the time of market closing and invest at the time of market opening the next day. We perform empirical analysis in the Japanese stock market and confirm the profitability of our framework.
    Date: 2020–02
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2002.06975&r=all
  5. By: Yusuke Uchiyama; Kei Nakagawa
    Abstract: Optimal asset allocation is a key topic in modern finance theory. To realize the optimal asset allocation for an investor's risk aversion, various portfolio construction methods have been proposed. Recently, applications of machine learning have been growing rapidly in the area of finance. In this article, we propose the Student's $t$-process latent variable model (TPLVM) to describe non-Gaussian fluctuations of financial time series with lower-dimensional latent variables. Subsequently, we apply the TPLVM to the minimum-variance portfolio as an alternative to existing nonlinear factor models. To test the performance of the proposed portfolio, we construct minimum-variance portfolios of global stock market indices based on the TPLVM or the Gaussian process latent variable model. By comparing these portfolios, we confirm that the proposed portfolio outperforms that of the existing Gaussian process latent variable model.
    Date: 2020–01
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2002.06243&r=all
  6. By: Born, Andreas (Department of Economics); Janssen, Aljoscha (Singapore Management University)
    Abstract: In most democracies, members of parliament are either elected over a party list or by a district. We use a discontinuity in the German parliamentary system to investigate the causal effect of a district-election on an MP’s conformity with her party-line. A district-election does not affect roll call voting behavior causally, possibly due to overall high adherence to party voting. Analyzing the parliamentary speeches of each MP allows us to overcome the high party discipline with regard to parliamentary voting. Using textual analysis and machine learning techniques, we create two measures of closeness of an MP’s speeches to her party. We find that district-elected members of parliament do not differ, in terms of speeches, from those of their party-peers who have been elected through closed party lists. However, both speeches and voting correlate with district characteristics suggesting that district-elections allow districts to select more similar politicians.
    Keywords: Party-line; Textual Analysis; Regression Discontinuity; Parliamentary Speeches; Voting
    JEL: D72
    Date: 2020–02–24
    URL: http://d.repec.org/n?u=RePEc:hhs:iuiwop:1320&r=all
  7. By: Oksana Bashchenko; Alexis Marchal
    Abstract: We develop a methodology for detecting asset bubbles using a neural network. We rely on the theory of local martingales in continuous time and use a deep network to estimate the diffusion coefficient of the price process more accurately than the current estimator, obtaining an improved detection of bubbles. We show the outperformance of our algorithm over the existing statistical method in a laboratory created with simulated data. We then apply the network classification to real data and build a zero net exposure trading strategy that exploits the risky arbitrage emanating from the presence of bubbles in the US equity market from 2006 to 2008. The profitability of the strategy provides an estimate of the economic magnitude of bubbles as well as support for the theoretical assumptions relied on.
    Date: 2020–02
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2002.06405&r=all
  8. By: Greg Lewis; Vasilis Syrgkanis
    Abstract: We consider the estimation of treatment effects in settings where multiple treatments are assigned over time and treatments can have a causal effect on future outcomes. We formulate the problem as a linear state space Markov process with a high dimensional state and propose an extension of the double/debiased machine learning framework to estimate the dynamic effects of treatments. Our method allows the use of arbitrary machine learning methods to control for the high dimensional state, subject to a mean square error guarantee, while still allowing parametric estimation and construction of confidence intervals for the dynamic treatment effect parameters of interest. Our method is based on a sequential regression peeling process, which we show can be equivalently interpreted as a Neyman orthogonal moment estimator. This allows us to show root-n asymptotic normality of the estimated causal effects.
    Date: 2020–02
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2002.07285&r=all
  9. By: Matthew Dixon; Igor Halperin
    Abstract: We present a reinforcement learning approach to goal based wealth management problems such as optimization of retirement plans or target dated funds. In such problems, an investor seeks to achieve a financial goal by making periodic investments in the portfolio while being employed, and periodically draws from the account when in retirement, in addition to the ability to re-balance the portfolio by selling and buying different assets (e.g. stocks). Instead of relying on a utility of consumption, we present G-Learner: a reinforcement learning algorithm that operates with explicitly defined one-step rewards, does not assume a data generation process, and is suitable for noisy data. Our approach is based on G-learning - a probabilistic extension of the Q-learning method of reinforcement learning. In this paper, we demonstrate how G-learning, when applied to a quadratic reward and Gaussian reference policy, gives an entropy-regulated Linear Quadratic Regulator (LQR). This critical insight provides a novel and computationally tractable tool for wealth management tasks which scales to high dimensional portfolios. In addition to the solution of the direct problem of G-learning, we also present a new algorithm, GIRL, that extends our goal-based G-learning approach to the setting of Inverse Reinforcement Learning (IRL) where rewards collected by the agent are not observed, and should instead be inferred. We demonstrate that GIRL can successfully learn the reward parameters of a G-Learner agent and thus imitate its behavior. Finally, we discuss potential applications of the G-Learner and GIRL algorithms for wealth management and robo-advising.
    Date: 2020–02
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2002.10990&r=all
  10. By: Evgeny Ponomarev; Ivan Oseledets; Andrzej Cichocki
    Abstract: The development of reinforcement learning methods has extended their application to many areas, including algorithmic trading. In this paper, trading on the stock exchange is interpreted as a game with a Markov property consisting of states, actions, and rewards. A system for trading a fixed volume of a financial instrument is proposed and experimentally tested; it is based on the asynchronous advantage actor-critic method with the use of several neural network architectures. The application of recurrent layers in this approach is investigated. The experiments were performed on real anonymized data. The best architecture demonstrated a trading strategy for the RTS Index futures (MOEX:RTSI) with a profitability of 66% per annum accounting for commission. The project source code is available via the following link: http://github.com/evgps/a3c_trading.
    Date: 2020–02
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2002.11523&r=all
  11. By: Yusuke Narita; Shota Yasui; Kohei Yata
    Abstract: We develop a method for predicting the performance of reinforcement learning and bandit algorithms, given historical data that may have been generated by a different algorithm. Our estimator has the property that its prediction converges in probability to the true performance of a counterfactual algorithm at the fast $\sqrt{N}$ rate, as the sample size $N$ increases. We also show a correct way to estimate the variance of our prediction, thus allowing the analyst to quantify the uncertainty in the prediction. These properties hold even when the analyst does not know which among a large number of potentially important state variables are really important. These theoretical guarantees make our estimator safe to use. We finally apply it to improve advertisement design by a major advertisement company. We find that our method produces smaller mean squared errors than state-of-the-art methods.
    Date: 2020–02
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2002.08536&r=all
  12. By: Chi Chen; Li Zhao; Wei Cao; Jiang Bian; Chunxiao Xing
    Abstract: Nowadays, machine learning methods are widely used in stock prediction. Traditional approaches assume an identical data distribution, under which a model learned on the training data is fixed and applied directly to the test data. Although such an assumption has made traditional machine learning techniques succeed in many real-world tasks, the highly dynamic nature of the stock market invalidates this strict assumption in stock prediction. To address this challenge, we propose the second-order identical distribution assumption, where the data distribution is assumed to fluctuate over time with certain patterns. Based on this assumption, we develop a second-order learning paradigm with multi-scale patterns. Extensive experiments on real-world Chinese stock data demonstrate the effectiveness of our second-order learning paradigm in stock prediction.
    Date: 2020–02
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2002.06878&r=all
  13. By: Shaolong Sun; Yanzhao Li; Shouyang Wang; Ju-e Guo
    Abstract: The large amount of tourism-related data presents a series of challenges for tourism demand forecasting, including data deficiencies, multicollinearity and long calculation time. In this study, a Bagging-based multivariate ensemble deep learning model integrating Stacked Autoencoders and KELM (B-SAKE) is proposed to address these challenges. We forecast tourist arrivals in Beijing from four countries, using historical data on tourist arrivals in Beijing, economic indicators and tourist online behavior variables. The results from the cases of four origin countries suggest that our proposed B-SAKE model outperforms benchmark models in horizontal accuracy, directional accuracy and statistical significance. Both Bagging and Stacked Autoencoder can improve the forecasting performance of the models. Moreover, the forecasting performance of the models is evaluated, with consistent results, by means of the multi-step-ahead forecasting scheme.
    Date: 2020–02
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2002.07964&r=all
  14. By: Carmine de Franco; Christophe Geissler; Vincent Margot; Bruno Monnier
    Abstract: We designed a machine learning algorithm that identifies patterns between ESG profiles and financial performances for companies in a large investment universe. The algorithm consists of regularly updated sets of rules that map regions in the high-dimensional space of ESG features to excess return predictions. The final aggregated predictions are transformed into scores, which allow us to design simple strategies that screen the investment universe for stocks with positive scores. By linking the ESG features with financial performances in a non-linear way, our strategy based upon our machine learning algorithm turns out to be an efficient stock picking tool, which outperforms classic strategies that screen stocks according to their ESG ratings, such as the popular best-in-class approach. Our paper brings new ideas to the growing field of financial literature that investigates the links between ESG behavior and the economy. Indeed, we show that there is clearly some form of alpha in the ESG profile of a company, but that this alpha can be accessed only with powerful, non-linear techniques such as machine learning.
    Date: 2020–02
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2002.07477&r=all
  15. By: Matteo Cinelli; Valerio Ficcadenti; Jessica Riccioni
    Abstract: Speeches delivered by influential politicians can have a decisive impact on the future of a country. In particular, the economic content of such speeches affects the economy of countries and their financial markets. For this reason, we examine a novel dataset containing the economic content of 951 speeches delivered by 45 US Presidents from George Washington (April 1789) to Donald Trump (February 2017). In doing so, we use an economic glossary built by means of text mining techniques. The goal of our study is to examine the structure of significant interconnections within a network obtained from the economic content of presidential speeches. In such a network, nodes are represented by talks and links by values of cosine similarity, the latter computed using the occurrences of the economic terms in the speeches. The resulting network displays a peculiar structure made up of a core (i.e. a set of highly central and densely connected nodes) and a periphery (i.e. a set of non-central and sparsely connected nodes). The presence of different economic dictionaries employed by the Presidents characterizes the core-periphery structure. The Presidents' talks belonging to the network's core share the usage of generic (non-technical) economic locutions like "interest" or "trade", while the use of more technical and less frequent terms (e.g. "yield") characterizes the periphery. Furthermore, speeches close in time share a common economic dictionary. These results, together with the usage of the economic glossary during US periods of boom and crisis, provide unique insights into the relationships among the economic content of Presidents' speeches.
    Date: 2020–02
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2002.07880&r=all
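
Editor's illustration for item 15: the abstract describes a network whose nodes are speeches and whose links are cosine similarities computed from occurrences of economic terms. The following is a minimal sketch of that idea, not the authors' implementation. The three-word glossary, the whitespace tokenization, the function names and the `threshold` cut-off are all assumptions made for illustration; the paper's own glossary and its test for "significant" links are not given in the abstract.

```python
import math
from collections import Counter

# Hypothetical glossary (assumption): the abstract only mentions these as example terms.
GLOSSARY = ("interest", "trade", "yield")


def term_counts(text, glossary=GLOSSARY):
    """Count how often each glossary term occurs in a speech (naive whitespace tokens)."""
    words = Counter(text.lower().split())
    return [words[term] for term in glossary]


def cosine_similarity(u, v):
    """Cosine of the angle between two count vectors; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def similarity_links(speeches, threshold=0.0):
    """Return {(speech_a, speech_b): weight} for pairs whose similarity exceeds threshold.

    `threshold` is a stand-in (assumption) for whatever significance filter is used.
    """
    vectors = {name: term_counts(text) for name, text in speeches.items()}
    names = sorted(vectors)
    links = {}
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            weight = cosine_similarity(vectors[a], vectors[b])
            if weight > threshold:
                links[(a, b)] = weight
    return links
```

Calling `similarity_links` on a dictionary that maps speech identifiers to their text yields a weighted edge list; a core-periphery analysis like the one the abstract mentions would then be run on that graph with a network library of the reader's choice.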

This nep-big issue is ©2020 by Tom Coupé. It is provided as is without any express or implied warranty. It may be freely redistributed in whole or in part for any purpose. If distributed in part, please include this notice.
General information on the NEP project can be found at http://nep.repec.org. For comments please write to the director of NEP, Marco Novarese at <director@nep.repec.org>. Put “NEP” in the subject, otherwise your mail may be rejected.
NEP’s infrastructure is sponsored by the School of Economics and Finance of Massey University in New Zealand.