Big Data
http://lists.repec.org/mailman/listinfo/nep-big
2017-09-17
Scalable Price Targeting
http://d.repec.org/n?u=RePEc:nbr:nberwo:23775&r=big
We study the welfare implications of scalable price targeting, an extreme form of third-degree price discrimination implemented with machine learning for a large, digital firm. Targeted prices are computed by solving the firm's Bayesian Decision-Theoretic pricing problem based on a database with a high-dimensional vector of customer features that are observed prior to the price quote. To identify the causal effect of price on demand, we first run a large, randomized price experiment and use these data to train our demand model. We use l1 regularization (lasso) to select the set of customer features that moderate the heterogeneous treatment effect of price on demand. We use a weighted likelihood Bayesian bootstrap to quantify the firm's approximate statistical uncertainty in demand and profitability. We then conduct a second experiment that implements our proposed price targeting scheme out of sample. Theoretically, both firm and customer surplus could rise with scalable price targeting. Optimized uniform pricing improves revenues by 64.9% relative to the control pricing, whereas scalable price targeting improves revenues by 81.5%. Firm profits increase by over 10% under targeted pricing relative to optimal uniform pricing. Customer surplus declines by less than 1% with price targeting, although nearly 70% of customers are charged less than the uniform price. Our weighted likelihood bootstrap estimator also predicts demand and demand uncertainty out of sample better than several alternative approaches.
Jean-Pierre Dubé
Sanjog Misra
2017-09
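As a rough illustration of the weighted likelihood Bayesian bootstrap mentioned in the abstract above, the sketch below reweights toy purchase data from a price experiment with Dirichlet(1, ..., 1) weights and records the expected revenue per customer at each tested price for every draw. It is not the authors' model: the data layout, the function names, and the reduction of the demand model to one purchase probability per price (for which the weighted Bernoulli maximum likelihood estimate is simply the weighted mean) are assumptions made for illustration.

```python
import random


def bayesian_bootstrap_weights(n, rng):
    """Draw Dirichlet(1, ..., 1) weights by normalizing n Exp(1) draws."""
    draws = [rng.expovariate(1.0) for _ in range(n)]
    total = sum(draws)
    return [d / total for d in draws]


def weighted_purchase_rate(purchased, weights):
    """Weighted Bernoulli MLE of the purchase probability.

    With weights that sum to one, this is the weighted mean of the
    0/1 purchase indicators.
    """
    return sum(w * y for w, y in zip(weights, purchased))


def posterior_revenue_draws(data_by_price, n_draws=500, seed=0):
    """Approximate posterior draws of expected revenue per customer.

    data_by_price maps each experimental price to a list of 0/1 purchase
    indicators (a hypothetical layout chosen for this sketch).
    """
    rng = random.Random(seed)
    draws = {}
    for price, purchased in data_by_price.items():
        n = len(purchased)
        draws[price] = [
            price * weighted_purchase_rate(
                purchased, bayesian_bootstrap_weights(n, rng)
            )
            for _ in range(n_draws)
        ]
    return draws


if __name__ == "__main__":
    # Toy experiment: 100 customers quoted each of two prices.
    toy = {10.0: [1] * 30 + [0] * 70, 20.0: [1] * 20 + [0] * 80}
    for price, revenues in posterior_revenue_draws(toy).items():
        mean = sum(revenues) / len(revenues)
        print(f"price {price}: mean expected revenue per customer {mean:.2f}")
```

The spread of the draws at each price gives a simple picture of the uncertainty a firm would face when comparing candidate prices. A targeting scheme would repeat the same reweighting with a demand model that depends on customer features.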
A Modified Levy Jump-Diffusion Model Based on Market Sentiment Memory for Online Jump Prediction
http://d.repec.org/n?u=RePEc:arx:papers:1709.03611&r=big
In this paper, we propose a modified Levy jump diffusion model with market sentiment memory for stock prices, where the market sentiment is obtained by data mining of Tweets on Twitter. We take the market sentiment process, which has memory, as the signal of Levy jumps in the stock price. An online learning and optimization algorithm with the Unscented Kalman filter (UKF) is then proposed to learn the memory and to predict possible price jumps. Experiments show that the algorithm performs relatively well in identifying asset return trends.
Zheqing Zhu
Jian-guo Liu
Lei Li
2017-09
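For readers unfamiliar with jump-diffusion price models, the sketch below simulates a generic path: a geometric Brownian motion whose log price occasionally receives a Gaussian jump. It does not reproduce the paper's model. The sentiment-driven jump signal and the Unscented Kalman filter are left out, and the parameter names, the Gaussian jump sizes, and the at-most-one-jump-per-step approximation to Poisson arrivals are assumptions of this sketch.

```python
import math
import random


def simulate_jump_diffusion(s0, mu, sigma, jump_rate, jump_mean, jump_std,
                            horizon, n_steps, seed=0):
    """Simulate one price path of a simple jump-diffusion.

    The log price follows a Brownian motion with drift; in each step a
    jump arrives with probability jump_rate * dt (a Bernoulli
    approximation to Poisson arrivals, reasonable when that product is
    small) and adds a Normal(jump_mean, jump_std) shock to the log price.
    """
    rng = random.Random(seed)
    dt = horizon / n_steps
    log_s = math.log(s0)
    path = [s0]
    for _ in range(n_steps):
        # Diffusive part, with the usual Ito correction in the drift.
        log_s += (mu - 0.5 * sigma ** 2) * dt
        log_s += sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0)
        # Jump part: at most one jump per step in this approximation.
        if rng.random() < jump_rate * dt:
            log_s += rng.gauss(jump_mean, jump_std)
        path.append(math.exp(log_s))
    return path


if __name__ == "__main__":
    path = simulate_jump_diffusion(
        s0=100.0, mu=0.05, sigma=0.2, jump_rate=5.0,
        jump_mean=-0.02, jump_std=0.05, horizon=1.0, n_steps=252,
    )
    print(f"simulated {len(path) - 1} steps, final price {path[-1]:.2f}")
```

In a sentiment-driven variant, jump_rate would vary over time with the sentiment signal instead of staying constant. How that dependence is specified is exactly what the paper models, and it is not attempted here.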
Deep Stock Representation Learning: From Candlestick Charts to Investment Decisions
http://d.repec.org/n?u=RePEc:arx:papers:1709.03803&r=big
We propose a novel investment decision strategy based on deep learning. Many conventional algorithmic strategies are based on raw time-series analysis of historical prices. In contrast, many human traders make decisions based on visually observing candlestick charts of prices. Our key idea is to endow an algorithmic strategy with the ability to make decisions using the same kind of visual cues used by human traders. To this end, we apply a Convolutional AutoEncoder (CAE) to learn an asset representation based on visual inspection of the asset's trading history. Based on this representation, we propose a novel portfolio construction strategy by: (i) using the deep learned representation and modularity optimisation to cluster stocks and identify diverse sectors, (ii) picking stocks within each cluster according to their Sharpe ratio. Overall, this strategy provides low-risk, high-return portfolios. We use the Financial Times Stock Exchange 100 Index (FTSE 100) data for evaluation. Results show our portfolio outperforms the FTSE 100 index and many well-known funds in terms of total return over 2000 trading days.
Guosheng Hu
Yuxin Hu
Kai Yang
Zehao Yu
Flood Sung
Zhihong Zhang
Fei Xie
Jianguo Liu
Neil Robertson
Timothy Hospedales
Qiangwei Miemie
2017-09
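Step (ii) of the strategy described above, picking stocks within each cluster by Sharpe ratio, can be sketched as follows. The cluster labels are taken as given (in the paper they come from the learned representation and modularity optimisation, which this sketch does not attempt). The tickers, the toy returns, and the choice of one stock per cluster are hypothetical.

```python
from statistics import mean, stdev


def sharpe_ratio(returns, risk_free=0.0):
    """Mean excess return divided by the sample standard deviation."""
    excess = [r - risk_free for r in returns]
    spread = stdev(excess)
    if spread == 0:
        raise ValueError("Sharpe ratio is undefined for constant returns")
    return mean(excess) / spread


def pick_best_per_cluster(returns_by_ticker, cluster_of):
    """Return {cluster: ticker with the highest Sharpe ratio}.

    returns_by_ticker maps a ticker to its list of periodic returns;
    cluster_of maps a ticker to a cluster label. Both layouts are
    assumptions of this sketch.
    """
    best = {}
    for ticker, returns in returns_by_ticker.items():
        cluster = cluster_of[ticker]
        score = sharpe_ratio(returns)
        if cluster not in best or score > best[cluster][1]:
            best[cluster] = (ticker, score)
    return {cluster: ticker for cluster, (ticker, _) in best.items()}


if __name__ == "__main__":
    toy_returns = {
        "AAA": [0.01, 0.02, 0.03],
        "BBB": [0.01, 0.05, -0.03],
        "CCC": [0.02, 0.04, 0.06],
        "DDD": [0.03, 0.03, 0.06],
    }
    toy_clusters = {"AAA": 0, "BBB": 0, "CCC": 1, "DDD": 1}
    print(pick_best_per_cluster(toy_returns, toy_clusters))
```

Selecting from every cluster rather than from the overall ranking is what spreads the portfolio across the sectors the clustering identifies.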
Rights on Data: The EU Communication ‘Building a European Data Economy’ From an Economic Perspective
http://d.repec.org/n?u=RePEc:mar:magkse:201735&r=big
In its Communication "Building a European data economy" the EU Commission discusses the introduction of a new exclusive property right on data ("data producer right") for non-personal (or anonymised) machine-generated data, and mandatory access rights to privately held data for achieving more access, transfer and reuse of data, esp. in the context of "Internet of Things" applications. This article analyzes the problem of "rights on data" from an economic perspective (incentive problem, data markets, bargaining power problems, access problems in multi-stakeholder situations) and the reasonings and proposals in the Communication from an economic perspective. Important results are that a "data producer right" cannot be recommended but that access rights to data can be part of specifically tailored data governance solutions in certain sectors.
Wolfgang Kerber
Big Data, machine-generated data, data ownership, data access, data markets, internet of things
2017
Support Spinor Machine
http://d.repec.org/n?u=RePEc:arx:papers:1709.03943&r=big
We generalize the support vector machine to a support spinor machine by using the mathematical structure of the wedge product over the vector machine in order to extend the field from a vector field to a spinor field. The separating hyperplane is extended to Kolmogorov space in time series data, which allows us to extend the structure of the support vector machine to a support tensor machine and a support tensor machine moduli space. Our performance test of the support spinor machine is done on one-class classification of the end point in the physiology state of time series data after empirical mode analysis, and is compared with a support vector machine test. We implement the support spinor machine algorithm using Holo-Hilbert amplitude modulation for fully nonlinear and nonstationary time series data analysis.
Kabin Kanjamapornkul
Richard Pinčák
Sanphet Chunithpaisan
Erik Bartoš
2017-09
eltmle: Ensemble learning targeted maximum likelihood estimation
http://d.repec.org/n?u=RePEc:boc:usug17:18&r=big
Modern epidemiology has identified significant limitations of classic epidemiological methods, such as outcome regression analysis, when estimating causal quantities such as the average treatment effect (ATE) from observational data. For example, using classical regression models to estimate the ATE requires assuming that the effect measure is constant across levels of the confounders included in the model, i.e. that there is no effect modification. Other methods do not require this assumption, including g-methods (e.g. the g-formula) and targeted maximum likelihood estimation (TMLE). Many, but not all, estimators of the ATE rely on parametric modeling assumptions. Therefore, correct model specification is crucial to obtain unbiased estimates of the true ATE. TMLE is a semiparametric, efficient substitution estimator allowing for data-adaptive estimation while obtaining valid statistical inference based on targeted minimum loss-based estimation. Being doubly robust, TMLE allows inclusion of machine learning algorithms to minimise the risk of model misspecification, a problem that persists for competing estimators. Evidence shows that TMLE typically provides the least biased estimates of the ATE compared with other doubly robust estimators. eltmle is a Stata program implementing targeted maximum likelihood estimation of the ATE for a binary outcome and binary treatment. eltmle includes the use of a super-learner called from the Super Learner R-package v.2.0-21 (Polley E., et al. 2011). The Super-Learner uses V-fold cross-validation (10-fold by default) to assess the performance of prediction regarding the potential outcomes and the propensity score as weighted averages of a set of machine learning algorithms. We used the default Super Learner algorithms implemented in the base installation of the tmle-R package v.1.2.0-5 (Susan G. and Van der Laan M., 2017), which included the following: i) stepwise selection, ii) generalized linear modelling (GLM), iii) a GLM variant that includes second order polynomials and two-by-two interactions of the main terms included in the model. Additionally, eltmle users will have the option to include Bayes Generalized Linear Models and Generalised Additive Models as additional Super-Learner algorithms. Future implementations will offer more advanced machine learning algorithms.
Miguel-Angel Luque Fernandez
2017-09-14
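To make the g-formula mentioned in the abstract above concrete, the sketch below estimates the ATE by standardization over a single discrete confounder and compares it with the naive difference in means. This is not TMLE and not the eltmle program. The record layout, the function names, and the toy data are assumptions chosen so that confounding is visible.

```python
from collections import defaultdict


def naive_difference(records):
    """Unadjusted difference in mean outcome, treated minus untreated."""
    treated = [y for _, a, y in records if a == 1]
    control = [y for _, a, y in records if a == 0]
    return sum(treated) / len(treated) - sum(control) / len(control)


def g_formula_ate(records):
    """Standardized ATE for records of (confounder, treatment, outcome).

    Computes sum over l of (E[Y | A=1, L=l] - E[Y | A=0, L=l]) * P(L=l),
    using stratum-specific sample means, so the effect is allowed to
    differ across levels of the confounder.
    """
    outcome_sum = defaultdict(float)
    count = defaultdict(int)
    stratum_size = defaultdict(int)
    for level, treatment, outcome in records:
        outcome_sum[(level, treatment)] += outcome
        count[(level, treatment)] += 1
        stratum_size[level] += 1
    total = sum(stratum_size.values())
    ate = 0.0
    for level, size in stratum_size.items():
        if count[(level, 1)] == 0 or count[(level, 0)] == 0:
            raise ValueError(f"no treated or no untreated units at L={level}")
        mean_treated = outcome_sum[(level, 1)] / count[(level, 1)]
        mean_control = outcome_sum[(level, 0)] / count[(level, 0)]
        ate += (mean_treated - mean_control) * size / total
    return ate


if __name__ == "__main__":
    # Toy data: treatment is more common when L = 1, and so is the outcome.
    toy = (
        [(0, 1, 1), (0, 1, 0)]
        + [(0, 0, 1)] * 2 + [(0, 0, 0)] * 6
        + [(1, 1, 1)] * 6 + [(1, 1, 0)] * 2
        + [(1, 0, 1), (1, 0, 0)]
    )
    print("naive difference:", naive_difference(toy))
    print("g-formula ATE:   ", g_formula_ate(toy))
```

In the toy data, the naive comparison overstates the effect because treated units are concentrated in the stratum with higher baseline risk. With many or continuous confounders, the stratum means have to be replaced by fitted models, which is where model specification and data-adaptive learners become relevant.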
kmatch: Kernel matching with automatic bandwidth selection
http://d.repec.org/n?u=RePEc:boc:usug17:11&r=big
In this talk I will present new matching software for Stata called kmatch. The command matches treated and untreated observations with respect to covariates and, if outcome variables are provided, estimates treatment effects based on the matched observations, optionally including regression-adjustment bias correction. Multivariate (Mahalanobis) distance matching as well as propensity score matching is supported, using either kernel matching, ridge matching, or nearest-neighbor matching. For kernel and ridge matching, several methods for data-driven bandwidth selection, such as cross-validation, are offered. The package also includes various commands for evaluating balancing and common-support violations. A focus of the talk will be on how kernel and ridge matching with automatic bandwidth selection compare to nearest-neighbor matching.
Ben Jann
2017-09-14
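The idea behind kernel matching on a propensity score can be sketched in a few lines. For each treated unit, the counterfactual outcome is a kernel-weighted average of untreated outcomes, with weights that shrink as the score distance grows and vanish beyond the bandwidth. This is not kmatch's implementation or syntax: the function names, the Epanechnikov kernel, the fixed (not automatically selected) bandwidth, and the rule of dropping treated units with no untreated neighbour inside the bandwidth are assumptions of this sketch.

```python
def epanechnikov(u):
    """Epanechnikov kernel: positive only for |u| < 1."""
    return 0.75 * (1.0 - u * u) if abs(u) < 1.0 else 0.0


def kernel_matching_att(treated, controls, bandwidth):
    """Kernel-matching estimate of the effect on the treated (ATT).

    treated and controls are lists of (propensity_score, outcome) pairs.
    Treated units with no control inside the bandwidth are treated as
    off common support and skipped.
    """
    effects = []
    for score_t, outcome_t in treated:
        weights = [
            epanechnikov((score_c - score_t) / bandwidth)
            for score_c, _ in controls
        ]
        total = sum(weights)
        if total == 0.0:
            continue  # no comparable untreated unit for this observation
        counterfactual = sum(
            w * outcome_c for w, (_, outcome_c) in zip(weights, controls)
        ) / total
        effects.append(outcome_t - counterfactual)
    if not effects:
        raise ValueError("no treated unit has a control within the bandwidth")
    return sum(effects) / len(effects)


if __name__ == "__main__":
    toy_treated = [(0.5, 3.0), (0.7, 4.0)]
    toy_controls = [(0.4, 1.0), (0.6, 3.0), (0.75, 2.0)]
    print(kernel_matching_att(toy_treated, toy_controls, bandwidth=0.2))
```

The bandwidth controls the usual trade-off: a small value uses only close matches and yields noisier estimates, while a large value averages over less comparable units. That trade-off is why data-driven bandwidth selection, such as cross-validation, matters.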