nep-big 2017-09-17 papers

on Big Data

Issue of 2017–09–17
seven papers chosen by
Tom Coupé, University of Canterbury

Scalable Price Targeting By Jean-Pierre Dubé; Sanjog Misra
A Modified Levy Jump-Diffusion Model Based on Market Sentiment Memory for Online Jump Prediction By Zheqing Zhu; Jian-guo Liu; Lei Li
Deep Stock Representation Learning: From Candlestick Charts to Investment Decisions By Guosheng Hu; Yuxin Hu; Kai Yang; Zehao Yu; Flood Sung; Zhihong Zhang; Fei Xie; Jianguo Liu; Neil Robertson; Timothy Hospedales; Qiangwei Miemie
Rights on Data: The EU Communication ‘Building a European Data Economy’ From an Economic Perspective By Wolfgang Kerber
Support Spinor Machine By Kabin Kanjamapornkul; Richard Pinčák; Sanphet Chunithpaisan; Erik Bartoš
eltmle: Ensemble learning targeted maximum likelihood estimation By Miguel-Angel Luque Fernandez
kmatch: Kernel matching with automatic bandwidth selection By Ben Jann

Scalable Price Targeting

By:	Jean-Pierre Dubé; Sanjog Misra
Abstract:	We study the welfare implications of scalable price targeting, an extreme form of third-degree price discrimination implemented with machine learning for a large, digital firm. Targeted prices are computed by solving the firm's Bayesian Decision-Theoretic pricing problem based on a database with a high-dimensional vector of customer features that are observed prior to the price quote. To identify the causal effect of price on demand, we first run a large, randomized price experiment and use these data to train our demand model. We use l1 regularization (lasso) to select the set of customer features that moderate the heterogeneous treatment effect of price on demand. We use a weighted likelihood Bayesian bootstrap to quantify the firm's approximate statistical uncertainty in demand and profitability. We then conduct a second experiment that implements our proposed price targeting scheme out of sample. Theoretically, both firm and customer surplus could rise with scalable price targeting. Optimized uniform pricing improves revenues by 64.9% relative to the control pricing, whereas scalable price targeting improves revenues by 81.5%. Firm profits increase by over 10% under targeted pricing relative to optimal uniform pricing. Customer surplus declines by less than 1% with price targeting, although nearly 70% of customers are charged less than the uniform price. Our weighted likelihood bootstrap estimator also predicts demand and demand uncertainty out of sample better than several alternative approaches.
JEL:	C11 C93 D4 L11 M3
Date:	2017–09
URL:	https://d.repec.org/n?u=RePEc:nbr:nberwo:23775
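
The weighted likelihood Bayesian bootstrap mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' demand model: it assumes a much simpler Bernoulli purchase model, for which the weighted maximum likelihood estimate is just a weighted mean, and the function name and data are hypothetical. Each replication draws independent Exponential(1) weights for the observations and re-estimates the parameter, so the spread of the draws approximates posterior uncertainty.

```python
import random


def weighted_likelihood_bootstrap(purchases, n_draws=1000, seed=0):
    """Approximate posterior draws for a Bernoulli purchase probability.

    Assumption (not from the paper): demand is a single Bernoulli rate,
    so maximizing the weighted log-likelihood gives the weighted mean.
    """
    rng = random.Random(seed)
    draws = []
    for _ in range(n_draws):
        # One Exponential(1) weight per observation; normalizing them
        # is equivalent to a flat Dirichlet draw over observations.
        weights = [rng.expovariate(1.0) for _ in purchases]
        total = sum(weights)
        # Weighted MLE of the Bernoulli rate under these weights.
        estimate = sum(w * y for w, y in zip(weights, purchases)) / total
        draws.append(estimate)
    return draws


if __name__ == "__main__":
    # Hypothetical data: 30 purchases out of 100 price quotes.
    data = [1] * 30 + [0] * 70
    posterior = sorted(weighted_likelihood_bootstrap(data, n_draws=2000))
    low, high = posterior[50], posterior[-51]  # rough 95% interval
    print(f"approximate 95% interval: [{low:.3f}, {high:.3f}]")
```

In a richer model the weighted mean would be replaced by a weighted fit of the full likelihood, for example a penalized regression refitted once per weight draw.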

A Modified Levy Jump-Diffusion Model Based on Market Sentiment Memory for Online Jump Prediction

By:	Zheqing Zhu; Jian-guo Liu; Lei Li
Abstract:	In this paper, we propose a modified Levy jump diffusion model with market sentiment memory for stock prices, where the market sentiment comes from data mining implementation using Tweets on Twitter. We take the market sentiment process, which has memory, as the signal of Levy jumps in the stock price. An online learning and optimization algorithm with the Unscented Kalman filter (UKF) is then proposed to learn the memory and to predict possible price jumps. Experiments show that the algorithm provides a relatively good performance in identifying asset return trends.
Date:	2017–09
URL:	https://d.repec.org/n?u=RePEc:arx:papers:1709.03611

Deep Stock Representation Learning: From Candlestick Charts to Investment Decisions

By:	Guosheng Hu; Yuxin Hu; Kai Yang; Zehao Yu; Flood Sung; Zhihong Zhang; Fei Xie; Jianguo Liu; Neil Robertson; Timothy Hospedales; Qiangwei Miemie
Abstract:	We propose a novel investment decision strategy based on deep learning. Many conventional algorithmic strategies are based on raw time-series analysis of historical prices. In contrast many human traders make decisions based on visually observing candlestick charts of prices. Our key idea is to endow an algorithmic strategy with the ability to make decisions with a similar kind of visual cues used by human traders. To this end we apply Convolutional AutoEncoder (CAE) to learn an asset representation based on visual inspection of the asset's trading history. Based on this representation we propose a novel portfolio construction strategy by: (i) using the deep learned representation and modularity optimisation to cluster stocks and identify diverse sectors, (ii) picking stocks within each cluster according to their Sharpe ratio. Overall this strategy provides low-risk high-return portfolios. We use the Financial Times Stock Exchange 100 Index (FTSE 100) data for evaluation. Results show our portfolio outperforms FTSE 100 index and many well known funds in terms of total return in 2000 trading days.
Date:	2017–09
URL:	https://d.repec.org/n?u=RePEc:arx:papers:1709.03803
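
Step (ii) of the portfolio construction described above, picking stocks within each cluster by Sharpe ratio, can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the cluster labels are taken as given (in the paper they come from the learned representation and modularity optimisation), the risk-free rate is assumed to be zero, one stock is picked per cluster, and all names and numbers are hypothetical.

```python
from statistics import mean, stdev


def sharpe_ratio(returns, risk_free=0.0):
    """Mean excess return divided by its sample standard deviation."""
    excess = [r - risk_free for r in returns]
    return mean(excess) / stdev(excess)


def pick_best_per_cluster(returns_by_stock, cluster_of):
    """Return {cluster: stock} with the highest Sharpe ratio per cluster.

    returns_by_stock: {stock: [periodic returns]}
    cluster_of: {stock: cluster label}, assumed precomputed elsewhere.
    """
    best = {}
    for stock, returns in returns_by_stock.items():
        cluster = cluster_of[stock]
        score = sharpe_ratio(returns)
        if cluster not in best or score > best[cluster][1]:
            best[cluster] = (stock, score)
    return {cluster: stock for cluster, (stock, _) in best.items()}


if __name__ == "__main__":
    # Hypothetical returns and cluster labels, for illustration only.
    returns_by_stock = {
        "A": [0.01, 0.02, 0.03],
        "B": [0.05, -0.05, 0.03],
        "C": [0.00, 0.01, 0.02],
        "D": [0.02, 0.02, 0.05],
    }
    cluster_of = {"A": 0, "B": 0, "C": 1, "D": 1}
    print(pick_best_per_cluster(returns_by_stock, cluster_of))
```

Selecting across clusters rather than globally is what gives the portfolio its sector diversity; a global top-k by Sharpe ratio could concentrate in a single cluster.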

Rights on Data: The EU Communication ‘Building a European Data Economy’ From an Economic Perspective

By:	Wolfgang Kerber (University of Marburg)
Abstract:	In its Communication "Building a European data economy" the EU Commission discusses the introduction of a new exclusive property right on data ("data producer right") for non-personal (or anonymised) machine-generated data, and mandatory access rights to privately held data for achieving more access, transfer and reuse of data, esp. in the context of "Internet of Things" applications. This article analyzes the problem of "rights on data" from an economic perspective (incentive problem, data markets, bargaining power problems, access problems in multi-stakeholder situations) and the reasonings and proposals in the Communication from an economic perspective. Important results are that a "data producer right" cannot be recommended but that access rights to data can be part of specifically tailored data governance solutions in certain sectors.
Keywords:	Big Data, machine-generated data, data ownership, data access, data markets, internet of things
JEL:	L86 O34
Date:	2017
URL:	https://d.repec.org/n?u=RePEc:mar:magkse:201735

Support Spinor Machine

By:	Kabin Kanjamapornkul; Richard Pinčák; Sanphet Chunithpaisan; Erik Bartoš
Abstract:	We generalize a support vector machine to a support spinor machine by using the mathematical structure of wedge product over vector machine in order to extend field from vector field to spinor field. The separated hyperplane is extended to Kolmogorov space in time series data which allow us to extend a structure of support vector machine to a support tensor machine and a support tensor machine moduli space. Our performance test on support spinor machine is done over one class classification of end point in physiology state of time series data after empirical mode analysis and compared with support vector machine test. We implement algorithm of support spinor machine by using Holo-Hilbert amplitude modulation for fully nonlinear and nonstationary time series data analysis.
Date:	2017–09
URL:	https://d.repec.org/n?u=RePEc:arx:papers:1709.03943

eltmle: Ensemble learning targeted maximum likelihood estimation

By:	Miguel-Angel Luque Fernandez (London School of Hygiene and Tropical Medicine)
Abstract:	Modern Epidemiology has been able to identify significant limitations of classic epidemiological methods, like outcome regression analysis, when estimating causal quantities such as the average treatment effect (ATE) for observational data. For example, using classical regression models to estimate the ATE requires assuming the effect measure is constant across levels of confounders included in the model, i.e. that there is no effect modification. Other methods do not require this assumption, including g-methods (e.g. the g-formula) and targeted maximum likelihood estimation (TMLE). Many estimators of the ATE but not all rely on parametric modeling assumptions. Therefore, the correct model specification is crucial to obtain unbiased estimates of the true ATE. TMLE is a semiparametric, efficient substitution estimator allowing for data-adaptive estimation while obtaining valid statistical inference based on the targeted minimum loss-based estimation. Being doubly robust, TMLE allows inclusion of machine learning algorithms to minimise the risk of model misspecification, a problem that persists for competing estimators. Evidence shows that TMLE typically provides the least biased estimates of the ATE compared with other double robust estimators. eltmle is a Stata program implementing the targeted maximum likelihood estimation for the ATE for a binary outcome and binary treatment. eltmle includes the use of a super-learner called from the Super Learner R-package v.2.0-21 (Polley E., et al. 2011). The Super-Learner uses V-fold cross-validation (10-fold by default) to assess the performance of prediction regarding the potential outcomes and the propensity score as weighted averages of a set of machine learning algorithms. We used the default Super Learner algorithms implemented in the base installation of the tmle-R package v.1.2.0-5 (Susan G. and Van der Laan M., 2017), which included the following: i) stepwise selection, ii) generalized linear modelling (GLM), iii) a GLM variant that includes second order polynomials and two-by-two interactions of the main terms included in the model. Additionally, eltmle users will have the option to include Bayes Generalized Linear Models and Generalised Additive Models as additional Super-Learner algorithms. Future implementations will offer more advanced machine learning algorithms.
Date:	2017–09–14
URL:	https://d.repec.org/n?u=RePEc:boc:usug17:18

kmatch: Kernel matching with automatic bandwidth selection

By:	Ben Jann (Institute of Sociology, University of Bern)
Abstract:	In this talk I will present a new matching software for Stata called kmatch. The command matches treated and untreated observations with respect to covariates and, if outcome variables are provided, estimates treatment effects based on the matched observations, optionally including regression adjustment bias-correction. Multivariate (Mahalanobis) distance matching as well as propensity score matching is supported, either using kernel matching, ridge matching, or nearest-neighbor matching. For kernel and ridge matching, several methods for data-driven bandwidth selection such as cross-validation are offered. The package also includes various commands for evaluating balancing and common-support violations. A focus of the talk will be on how kernel and ridge matching with automatic bandwidth selection compare to nearest-neighbor matching.
Date:	2017–09–14
URL:	https://d.repec.org/n?u=RePEc:boc:usug17:11
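
The idea behind kernel matching on a propensity score can be shown with a short sketch. It is written in Python rather than Stata and does not reproduce kmatch or its options: the bandwidth is fixed by hand instead of being selected by cross-validation, the Epanechnikov kernel is an assumed choice, and the function names and data are hypothetical. Each treated unit is compared with a kernel-weighted average of control outcomes, and treated units with no controls inside the bandwidth are dropped as common-support violations.

```python
def epanechnikov(u):
    """Epanechnikov kernel: positive only for |u| < 1."""
    return 0.75 * (1.0 - u * u) if abs(u) < 1.0 else 0.0


def kernel_match_att(treated, controls, bandwidth):
    """Average treatment effect on the treated via kernel matching.

    treated, controls: lists of (propensity_score, outcome) pairs.
    bandwidth: fixed here by assumption; data-driven selection is
    outside the scope of this sketch.
    """
    effects = []
    for score_t, outcome_t in treated:
        weights = [
            epanechnikov((score_c - score_t) / bandwidth)
            for score_c, _ in controls
        ]
        total = sum(weights)
        if total == 0.0:
            # No control within the bandwidth: outside common support.
            continue
        counterfactual = sum(
            w * outcome_c for w, (_, outcome_c) in zip(weights, controls)
        ) / total
        effects.append(outcome_t - counterfactual)
    if not effects:
        raise ValueError("no treated unit has controls within the bandwidth")
    return sum(effects) / len(effects)


if __name__ == "__main__":
    # Hypothetical (propensity score, outcome) pairs.
    treated = [(0.50, 3.0), (0.60, 4.0)]
    controls = [(0.45, 1.0), (0.55, 2.0), (0.90, 10.0)]
    print(kernel_match_att(treated, controls, bandwidth=0.2))
```

Nearest-neighbor matching would instead use only the closest control or controls for each treated unit; the kernel version trades some bias for lower variance by averaging over all controls within the bandwidth.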

This nep-big issue is ©2017 by Tom Coupé. It is provided as is without any express or implied warranty. It may be freely redistributed in whole or in part for any purpose. If distributed in part, please include this notice.

General information on the NEP project can be found at https://nep.repec.org. For comments please write to the director of NEP, Marco Novarese at <director@nep.repec.org>. Put “NEP” in the subject, otherwise your mail may be rejected.

NEP’s infrastructure is sponsored by the Griffith Business School of Griffith University in Australia.