nep-big New Economics Papers
on Big Data
Issue of 2018‒11‒19
eight papers chosen by
Tom Coupé
University of Canterbury

  1. Instrument Validity Tests with Causal Trees: With an Application to the Same-sex Instrument By Guber, Raphael
  2. Deep Learning can Replicate Adaptive Traders in a Limit-Order-Book Financial Market By Arthur le Calvez; Dave Cliff
  3. Using Stock Prices as Ground Truth in Sentiment Analysis to Generate Profitable Trading Signals By Ellie Birbeck; Dave Cliff
  4. Machine Learning for Regularized Survey Forecast Combination: Partially Egalitarian Lasso and its Derivatives By Francis X. Diebold; Minchul Shin
  5. The use of social media and artificial intelligence tools by online doctoral students at the thesis stage By Ruolan Wang; José Reis-Jorge; Lucilla Crosta; Anthony Edwards; Mageswary Mudaliar
  6. Multi-channel discourse as an indicator for Bitcoin price and volume movements By Marvin Aron Kennis
  7. Mainstreaming Biodiversity in Development Cooperation Projects through the Application of Mitigation Hierarchy and Green Infrastructure Approaches By Tetsuya Kamijo
  8. La donnée, une marchandise comme les autres ? By Henri Isaac

  1. By: Guber, Raphael (Munich Center for the Economics of Aging (MEA))
    Abstract: The use of instrumental variables (IVs) to identify causal effects is widespread in empirical economics, but it is fundamentally impossible to prove their validity. However, assumptions sufficient for the identification of local average treatment effects (LATEs) jointly generate necessary conditions in the observed data that allow one to refute an IV's validity. Suitable tests exist, but they may not be able to detect even severe violations of IV validity in practice. In this paper, we employ recently developed machine learning tools as data-driven improvements for these tests. Specifically, we use the causal tree (CT) algorithm from Athey and Imbens (2016) to directly search the covariate space for violations of the LATE assumptions. The new approach is applied to the sibling sex composition instrument in census data from China and the United States. We expect that, because of son preferences, the sibling sex instrument is invalid in the Chinese context. However, existing IV validity tests are unable to detect violations, while our CT-based procedure does.
    JEL: C12 C18 C26
    Date: 2018–09–17
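The abstract does not spell out the necessary conditions it refers to. For a binary instrument Z, binary treatment D and binary outcome Y, one commonly cited form is that P(Y=y, D=1 | Z=1) is at least P(Y=y, D=1 | Z=0), and P(Y=y, D=0 | Z=0) is at least P(Y=y, D=0 | Z=1), for each y. The Python sketch below only checks the sample analogues of these inequalities on made-up data. It assumes the instrument raises treatment take-up, performs no statistical inference, and does not reproduce the paper's causal-tree search; the function names and data are hypothetical.

```python
def joint_prob(rows, z, d, y):
    """Sample estimate of P(Y=y, D=d | Z=z) from (z, d, y) tuples."""
    in_arm = [r for r in rows if r[0] == z]
    hits = sum(1 for r in in_arm if r[1] == d and r[2] == y)
    return hits / len(in_arm)


def violations(rows):
    """Return the inequalities whose sample analogue is negative."""
    found = []
    for y in (0, 1):
        # Treated: the Z=1 arm should carry at least as much mass as Z=0.
        gap_treated = joint_prob(rows, 1, 1, y) - joint_prob(rows, 0, 1, y)
        if gap_treated < 0:
            found.append(("D=1", y, gap_treated))
        # Untreated: the Z=0 arm should carry at least as much mass as Z=1.
        gap_untreated = joint_prob(rows, 0, 0, y) - joint_prob(rows, 1, 0, y)
        if gap_untreated < 0:
            found.append(("D=0", y, gap_untreated))
    return found


# Made-up (z, d, y) observations, for illustration only.
consistent = [(0, 0, 0)] * 4 + [(0, 1, 1)] + [(1, 1, 1)] * 3 + [(1, 0, 0)] * 2
suspicious = [(0, 1, 1)] * 3 + [(0, 0, 0)] * 2 + [(1, 1, 0)] * 3 + [(1, 0, 0)] * 2

print(violations(consistent))
print(violations(suspicious))
```

Running the same check within subgroups defined by covariates is the kind of search that the abstract delegates to the causal tree algorithm.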
  2. By: Arthur le Calvez; Dave Cliff
    Abstract: We report successful results from using deep learning neural networks (DLNNs) to learn, purely by observation, the behavior of profitable traders in an electronic market closely modelled on the limit-order-book (LOB) market mechanisms that are commonly found in the real-world global financial markets for equities (stocks & shares), currencies, bonds, commodities, and derivatives. Successful real human traders, and advanced automated algorithmic trading systems, learn from experience and adapt over time as market conditions change; our DLNN learns to copy this adaptive trading behavior. A novel aspect of our work is that we do not involve the conventional approach of attempting to predict time-series of prices of tradeable securities. Instead, we collect large volumes of training data by observing only the quotes issued by a successful sales-trader in the market, details of the orders that trader is executing, and the data available on the LOB (as would usually be provided by a centralized exchange) over the period that the trader is active. In this paper we demonstrate that suitably configured DLNNs can learn to replicate the trading behavior of a successful adaptive automated trader, an algorithmic system previously demonstrated to outperform human traders. We also demonstrate that DLNNs can learn to perform better (i.e., more profitably) than the trader that provided the training data. We believe that this is the first ever demonstration that DLNNs can successfully replicate a human-like, or super-human, adaptive trader operating in a realistic emulation of a real-world financial market. Our results can be considered as proof-of-concept that a DLNN could, in principle, observe the actions of a human trader in a real financial market and over time learn to trade as well as that human trader, and possibly better.
    Date: 2018–11
  3. By: Ellie Birbeck; Dave Cliff
    Abstract: The increasing availability of "big" (large volume) social media data has motivated a great deal of research in applying sentiment analysis to predict the movement of prices within financial markets. Previous work in this field investigates how the true sentiment of text (i.e. positive or negative opinions) can be used for financial predictions, based on the assumption that sentiments expressed online are representative of the true market sentiment. Here we consider the converse idea, that using the stock price as the ground-truth in the system may be a better indication of sentiment. Tweets are labelled as Buy or Sell depending on whether the price of the stock discussed rose or fell over the following hour, and from this, stock-specific dictionaries are built for individual companies. A Bayesian classifier is used to generate stock predictions, which are input to an automated trading algorithm. Placing 468 trades over a one-month period yields a return rate of 5.18%, which annualises to approximately 83% per annum. This approach performs significantly better than random chance and outperforms two baseline sentiment analysis methods tested.
    Date: 2018–11
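The annualised figure in the abstract is consistent with compounding the monthly return over twelve months rather than multiplying it by twelve (which would give about 62%). The Python sketch below shows that arithmetic; the function name is hypothetical, and it assumes the 5.18% monthly return would repeat every month, which the abstract does not claim.

```python
def annualise(period_return: float, periods_per_year: int = 12) -> float:
    """Compound a per-period return over one year."""
    return (1.0 + period_return) ** periods_per_year - 1.0


monthly_return = 0.0518            # 5.18% over one month, from the abstract
compounded = annualise(monthly_return)
simple = monthly_return * 12       # non-compounded alternative, for contrast

print(f"compounded: {compounded:.1%}")   # about 83.3%
print(f"simple:     {simple:.1%}")       # about 62.2%
```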
  4. By: Francis X. Diebold (Department of Economics, University of Pennsylvania); Minchul Shin (Department of Economics, University of Illinois)
    Abstract: Despite the clear success of forecast combination in many economic environments, several important issues remain incompletely resolved. The issues relate to selection of the set of forecasts to combine, and whether some form of additional regularization (e.g., shrinkage) is desirable. Against this background, and also considering the frequently-found good performance of simple-average combinations, we propose a LASSO-based procedure that sets some combining weights to zero and shrinks the survivors toward equality (“partially-egalitarian LASSO”). Ex-post analysis reveals that the optimal solution has a very simple form: The vast majority of forecasters should be discarded, and the remainder should be averaged. We therefore propose and explore direct subset-averaging procedures motivated by the structure of partially-egalitarian LASSO and the lessons learned, which, unlike LASSO, do not require choice of a tuning parameter. Intriguingly, in an application to the European Central Bank Survey of Professional Forecasters, our procedures outperform simple average and median forecasts – indeed they perform approximately as well as the ex-post best forecaster.
    Keywords: Forecast combination, forecast surveys, shrinkage, model selection, LASSO, regularization
    JEL: C53
    Date: 2018–08–17
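The "discard most forecasters, average the rest" idea can be illustrated with a short Python sketch. This is not the authors' procedure: the selection rule used here (keep the k forecasters with the lowest past mean squared error) and the fixed k are assumptions made for illustration, and the forecaster names and numbers are invented.

```python
from statistics import fmean


def subset_average(past_forecasts, past_actuals, new_forecasts, k):
    """Average the new forecasts of the k forecasters with lowest past MSE.

    past_forecasts: dict mapping forecaster name -> list of past forecasts
    past_actuals:   list of realised values aligned with those forecasts
    new_forecasts:  dict mapping forecaster name -> current forecast
    """
    def mse(name):
        errors = [(f - a) ** 2 for f, a in zip(past_forecasts[name], past_actuals)]
        return fmean(errors)

    kept = sorted(past_forecasts, key=mse)[:k]          # discard the rest
    combined = fmean([new_forecasts[name] for name in kept])
    return kept, combined


# Invented inflation forecasts, for illustration only.
past_actuals = [2.0, 2.1, 1.9]
past_forecasts = {
    "A": [2.0, 2.0, 2.0],
    "B": [2.5, 2.6, 2.4],
    "C": [1.9, 2.3, 1.8],
    "D": [1.0, 3.0, 2.5],
}
new_forecasts = {"A": 2.0, "B": 2.6, "C": 2.2, "D": 1.5}

kept, combined = subset_average(past_forecasts, past_actuals, new_forecasts, k=2)
print(kept, round(combined, 3))
```

Equal weights on the survivors mirror the simple-average benchmark that the abstract describes as hard to beat.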
  5. By: Ruolan Wang (Laureate Online Education in partnership with the University of Liverpool); José Reis-Jorge (Laureate Online Education in partnership with the University of Liverpool); Lucilla Crosta (Laureate Online Education in partnership with the University of Liverpool); Anthony Edwards (Laureate Online Education in partnership with the University of Liverpool); Mageswary Mudaliar (Laureate Online Education in partnership with the University of Liverpool)
    Abstract: Our paper aims to explore how doctoral students made use of digital technologies, namely Social Media (SM) and Artificial Intelligence (AI) tools, in the thesis stage of their fully online doctoral studies and what impact those tools had on their studies. Data were collected from an online survey (n=28) and a series of semi-structured interviews (n=9). The analysis of the survey data informed the qualitative phase of data collection. Both survey and interview data show a similar pattern of digital technology use, in which our participants' use of SM tools far outpaces their use of AI tools. We argue that the unique characteristics of online doctoral students might have determined the popularity of some digital tools. The study findings help us to better understand students' digital experience as both individuals and learners.
    Keywords: Online doctoral studies, doctoral students, EdD programme, digital tools, social media, artificial intelligence
    JEL: I23
    Date: 2018–11
  6. By: Marvin Aron Kennis
    Abstract: This research aims to identify how Bitcoin-related news publications and online discourse are expressed in Bitcoin exchange movements of price and volume. Being inherently digital, all Bitcoin-related fundamental data (from exchanges, as well as transactional data directly from the blockchain) is available online, something that is not true for traditional businesses or currencies traded on exchanges. This makes Bitcoin an interesting subject for such research, as it enables the mapping of sentiment to fundamental events that might otherwise be inaccessible. Furthermore, Bitcoin discussion largely takes place on online forums and chat channels. In stock trading, the value of sentiment data in trading decisions has been demonstrated numerous times [1] [2] [3], and this research aims to determine whether there is value in such data for Bitcoin trading models. To achieve this, data over the year 2015 has been collected from, (the biggest Bitcoin forum in post volume), established news sources such as Bloomberg and the Wall Street Journal, the complete /r/btc and /r/Bitcoin subreddits, and the bitcoin-otc and bitcoin-dev IRC channels. By analyzing this data on sentiment and volume, we find weak to moderate correlations between forum, news, and Reddit sentiment and movements in price and volume from 1 to 5 days after the sentiment was expressed. A Granger causality test confirms the predictive causality of the sentiment on the daily percentage price and volume movements, and at the same time underscores the predictive causality of market movements on sentiment expressions in online communities.
    Date: 2018–11
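The 1-to-5-day correlations described in the abstract can be pictured as correlating each day's sentiment score with the price movement a fixed number of days later. The Python sketch below computes plain Pearson correlations at each lag; it is not a Granger causality test, and the function names and the synthetic series (in which price movements simply echo sentiment two days later) are assumptions made for illustration.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def lagged_correlations(sentiment, moves, max_lag=5):
    """Correlate sentiment on day t with the market move on day t + lag."""
    return {
        lag: pearson(sentiment[:-lag], moves[lag:])
        for lag in range(1, max_lag + 1)
    }


# Synthetic daily series: moves repeat sentiment with a two-day delay.
sentiment = [0.1, -0.3, 0.5, 0.2, -0.4, 0.0, 0.3, -0.1, 0.4, -0.2, 0.1, 0.6]
moves = [0.0, 0.0] + sentiment[:-2]

by_lag = lagged_correlations(sentiment, moves)
best_lag = max(by_lag, key=by_lag.get)
print(best_lag, {lag: round(r, 2) for lag, r in by_lag.items()})
```

Swapping the two arguments gives the reverse direction, which corresponds to the abstract's point that market movements also help predict later sentiment.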
  7. By: Tetsuya Kamijo
    Abstract: The importance of biodiversity to human welfare is widely recognized, and environmental impact assessment (EIA) is regarded as a useful tool to minimize adverse impacts of development on biodiversity. However, biodiversity loss continues, particularly in developing countries, even though biodiversity-inclusive assessment has been implemented for a long time. The purpose of this working paper is to propose a practical approach for mainstreaming biodiversity into development cooperation projects. This paper examines the biodiversity mitigation measures of 120 EIA reports prepared by the Japan International Cooperation Agency from 2001 to 2012 using quantitative text analysis. Biodiversity considerations are at present inadequately addressed, and avoidance measures are quite scarce. Ecosystems have multiple benefits, and it is worthwhile to incorporate these benefits into development cooperation projects. Applying the mitigation hierarchy, which aims for no net loss, together with green infrastructure approaches that make wise use of ecosystem services, can be one solution to stop biodiversity loss and satisfy development needs.
    Keywords: biodiversity, ecosystem services, mitigation hierarchy, green infrastructure, ecosystem-based disaster risk reduction
    Date: 2018–09
  8. By: Henri Isaac (DRM - Dauphine Recherches en Management - Université Paris-Dauphine - CNRS - Centre National de la Recherche Scientifique)
    Abstract: Many economic agents see data as a new raw material, a new commodity of the 21st century. The capture, possession and use of data are, accordingly, seen as a new source of wealth, as the success of some digital firms is taken to show. However, the characteristics of digital data keep it from being ordinary merchandise. Moreover, the use value and exchange value of data depend on the legal framework governing how data are produced and exchanged.
    Abstract: La donnée apparaît, aux yeux de nombreux acteurs économiques, comme une nouvelle matière première, une nouvelle marchandise du XXIe siècle. Sa captation, sa possession et son exploitation seraient la source de nouvelles richesses, comme certaines réussites d'entreprises numériques le démontreraient. Cependant, les caractéristiques de la donnée numérique sont loin d'en faire une marchandise comme les autres. Plus encore, la valeur d'usage et la valeur d'échange des données sont conditionnées par le régime juridique de production et d'échange de ces données.
    Keywords: big data
    Date: 2018

This nep-big issue is ©2018 by Tom Coupé. It is provided as is without any express or implied warranty. It may be freely redistributed in whole or in part for any purpose. If distributed in part, please include this notice.
General information on the NEP project can be found at For comments please write to the director of NEP, Marco Novarese at <>. Put “NEP” in the subject, otherwise your mail may be rejected.
NEP’s infrastructure is sponsored by the School of Economics and Finance of Massey University in New Zealand.