nep-big New Economics Papers
on Big Data
Issue of 2018‒07‒30
nine papers chosen by
Tom Coupé
University of Canterbury

  1. Generic machine learning inference on heterogeneous treatment effects in randomized experiments By Victor Chernozhukov; Mert Demirer; Esther Duflo; Ivan Fernandez-Val
  2. "Bitcoin Technical Trading with Articial Neural Network" By Masafumi Nakano; Akihiko Takahashi; Soichiro Takahashi
  3. How to transfer real-time and big data management control methods from industry to services? By Amaury Gayet
  4. Exact and robust conformal inference methods for predictive machine learning with dependent data By Victor Chernozhukov; Kaspar Wüthrich; Yinchu Zhu
  5. Prediction for Big Data through Kriging: Small Sequential and One-Shot Designs By Kleijnen, J.P.C.; van Beers, W.C.M.
  6. Financial Trading as a Game: A Deep Reinforcement Learning Approach By Chien Yi Huang
  7. Using job vacancies to understand the effects of labour market mismatch on UK output and productivity By Turrell, Arthur; Speigner, Bradley; Djumalieva, Jyldyz; Copple, David; Thurgood, James
  8. Digitization, Computerization, Networking, Automation, and the Future of Jobs in Japan By IWAMOTO Koichi; TANOUE Yuta
  9. ICOs success drivers: a textual and statistical analysis By Paola Cerchiello; Anca Mirela Toma

  1. By: Victor Chernozhukov (Institute for Fiscal Studies and MIT); Mert Demirer (Institute for Fiscal Studies); Esther Duflo (Institute for Fiscal Studies); Ivan Fernandez-Val (Institute for Fiscal Studies and Boston University)
    Abstract: We propose strategies to estimate and make inference on key features of heterogeneous effects in randomized experiments. These key features include best linear predictors of the effects using machine learning proxies, average effects sorted by impact groups, and average characteristics of most and least impacted units. The approach is valid in high-dimensional settings, where the effects are proxied by machine learning methods. We post-process these proxies into the estimates of the key features. Our approach is generic: it can be used in conjunction with penalized methods, deep and shallow neural networks, canonical and new random forests, boosted trees, and ensemble methods. Our approach is agnostic and does not make unrealistic or hard-to-check assumptions; we do not require conditions for consistency of the ML methods. Estimation and inference rely on repeated data splitting to avoid overfitting and achieve validity. For inference, we take medians of p-values and medians of confidence intervals, resulting from many different data splits, and then adjust their nominal level to guarantee uniform validity. This variational inference method is shown to be uniformly valid and quantifies the uncertainty coming from both parameter estimation and data splitting. The inference method could be of substantial independent interest in many machine learning applications. An empirical application to the impact of micro-credit on economic development illustrates the use of the approach in randomized experiments. An additional application to the impact of gender discrimination on wages illustrates the potential use of the approach in observational studies, where machine learning methods can be used to condition flexibly on very high-dimensional controls.
    Keywords: Agnostic Inference, Machine Learning, Confidence Intervals, Causal Effects, Variational P-values and Confidence Intervals, Uniformly Valid Inference, Quantification of Uncertainty, Sample Splitting, Multiple Splitting, Assumption-Freeness
    Date: 2017–12–30
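The split-and-aggregate inference described in the abstract can be sketched as follows. This is an illustrative simplification (a difference-in-means estimate on one random half of the data, no ML proxies); `split_and_test` and its defaults are hypothetical, not the authors' code:

```python
import math
import numpy as np

def split_and_test(y, d, rng, n_splits=50):
    """Illustrative sketch: compute a two-sample treatment-effect p-value
    on a random half of the data, repeat over many random splits, and
    report twice the median p-value, which preserves (approximate)
    validity across splits."""
    n, pvals = len(y), []
    for _ in range(n_splits):
        half = rng.permutation(n)[n // 2:]  # half used for inference
        y1 = y[half][d[half] == 1]          # treated outcomes
        y0 = y[half][d[half] == 0]          # control outcomes
        se = math.sqrt(y1.var() / len(y1) + y0.var() / len(y0))
        t = (y1.mean() - y0.mean()) / se
        # two-sided normal p-value
        phi = 0.5 * (1 + math.erf(abs(t) / math.sqrt(2)))
        pvals.append(2 * (1 - phi))
    # median across splits, doubled to adjust the nominal level
    return min(1.0, 2 * float(np.median(pvals)))
```

Doubling the median p-value is the level adjustment that makes the aggregated test conservative regardless of how the individual splits are correlated.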
  2. By: Masafumi Nakano (Graduate School of Economics, Faculty of Economics, The University of Tokyo); Akihiko Takahashi (Faculty of Economics, The University of Tokyo); Soichiro Takahashi (Graduate School of Economics, Faculty of Economics, The University of Tokyo)
    Abstract: This paper explores Bitcoin intraday technical trading based on artificial neural networks for return prediction. In particular, our deep learning method successfully discovers trading signals through a seven-layer neural network structure for given input data of technical indicators, which are calculated from past time-series data over every 15 minutes. Under feasible settings of execution costs, the numerical experiments demonstrate that our approach significantly improves the performance of a buy-and-hold strategy. Notably, our model performs well for a challenging period from December 2017 to January 2018, during which Bitcoin suffered substantial negative returns. Furthermore, various sensitivity analyses are implemented for changes in the number of layers, activation functions, input data and output classification to confirm the robustness of our approach.
    Date: 2018–07
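As a toy illustration of the input side of such a model, the sketch below builds a few technical-indicator features from 15-minute closing prices, paired with next-bar direction labels. The window lengths and indicator choices are assumptions, not the paper's:

```python
import numpy as np

def indicator_features(close, short=4, long=16):
    """Hypothetical feature builder: simple technical indicators from
    15-minute closing prices, as inputs to a return-direction classifier.
    'short' and 'long' window lengths are illustrative."""
    ret = np.diff(np.log(close))       # log returns per 15-minute bar
    feats = []
    for t in range(long, len(ret)):
        window = ret[t - long:t]       # past returns only (no look-ahead)
        feats.append([
            window[-short:].mean(),    # short-horizon momentum
            window.mean(),             # long-horizon momentum
            window.std(),              # recent volatility
        ])
    labels = (ret[long:] > 0).astype(int)  # direction of the next bar
    return np.array(feats), labels
```

Each feature row uses only information available before the bar it predicts, which is the alignment any such classifier needs to avoid look-ahead bias.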
  3. By: Amaury Gayet (CEROS - Centre d'Etudes et de Recherches sur les Organisations et la Stratégie - UPN - Université Paris Nanterre)
    Abstract: Our two previous papers (Gayet et al., 2014; 2016) at Manufacturing Accounting Research (MAR) proposed a reflection on a real-time management control method for industry. The aim of this paper is to initiate a transfer to the service sector. The approach is to summarize our previous main contributions; to define the conceptual framework, which consists in describing the real-time Information Systems (IS) and big data architecture and a methodology for deploying a Business Intelligence (BI) solution; and then to present the solution implemented in industry in order to open a reflection on its transferability. The paper proposes an approach for the service sector similar to that deployed at a software vendor that is a leader in manufacturing intelligence, by analogy with the main software products on the market. This research raises multiple implications because the digital transformation of organizations is strategic. The practical implications are to help in the deployment of real-time and big data solutions, and they should not evade the social implications, which imply a profound change in managerial practices. The originality of this paper is to build a bridge between the industrial world, characterized by the rigour of its procedures, and that of services.
    Keywords: Management Control, Information System (IS), Real Time, Big Data, Decision Support System, Business Intelligence (BI)
    Date: 2018–06–15
  4. By: Victor Chernozhukov (Institute for Fiscal Studies and MIT); Kaspar Wüthrich (Institute for Fiscal Studies); Yinchu Zhu (Institute for Fiscal Studies)
    Abstract: We extend conformal inference to general settings that allow for time series data. Our proposal is developed as a randomization method and accounts for potential serial dependence by including block structures in the permutation scheme. As a result, the proposed method retains exact, model-free validity when the data are i.i.d. or, more generally, exchangeable, like usual conformal inference methods. When exchangeability fails, as is the case for common time series data, the proposed approach is approximately valid under weak assumptions on the conformity score.
    Keywords: Conformal inference, permutation and randomization, dependent data, groups
    Date: 2018–03–02
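The block-wise randomization idea can be sketched as follows. The conformity score, block length, and p-value construction here are illustrative assumptions, not the paper's exact scheme:

```python
import numpy as np

def block_permutation_pvalue(scores, block_size, n_perm=999, rng=None):
    """Sketch of a block-wise randomization p-value: the conformity score
    of the last observation is compared against its distribution under
    permutations that move whole blocks, preserving short-range
    dependence within each block."""
    rng = rng or np.random.default_rng(0)
    n = len(scores)
    # cut the score sequence into consecutive blocks
    blocks = [scores[i:i + block_size] for i in range(0, n, block_size)]
    observed = scores[-1]
    count = 1  # the identity permutation always matches the observed score
    for _ in range(n_perm):
        order = rng.permutation(len(blocks))
        permuted = np.concatenate([blocks[j] for j in order])
        if permuted[-1] >= observed:
            count += 1
    return count / (n_perm + 1)
```

Because blocks are shuffled as units, serial dependence inside a block never has to be modeled; only approximate between-block exchangeability is used.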
  5. By: Kleijnen, J.P.C. (Tilburg University, Center For Economic Research); van Beers, W.C.M. (Tilburg University, Center For Economic Research)
    Abstract: Kriging or Gaussian process (GP) modeling is an interpolation method that assumes the outputs (responses) are more correlated the closer the inputs (explanatory or independent variables) are. A GP has unknown (hyper)parameters that must be estimated; the standard estimation method uses the "maximum likelihood" criterion. However, big data make it hard to compute the estimates of these GP parameters, the resulting Kriging predictor, and the variance of this predictor. To solve this problem, some authors select a relatively small subset from the big set of previously observed "old" data; their method is sequential and depends on the variance of the Kriging predictor. The resulting designs turn out to be "local"; i.e., most design points are concentrated around the point to be predicted. We develop three alternative one-shot methods that do not depend on GP parameters: (i) select a small subset such that this subset still covers the original input space, albeit more coarsely; (ii) select a subset with relatively many, but not all, combinations close to the new combination that is to be predicted; and (iii) select a subset with the nearest neighbors (NNs) of this new combination. To evaluate these designs, we compare their squared prediction errors in several numerical (Monte Carlo) experiments. These experiments show that our NN design is a viable alternative to the more sophisticated sequential designs.
    Keywords: kriging; Gaussian process; big data; experimental design; nearest neighbor
    JEL: C0 C1 C9 C15 C44
    Date: 2018
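The nearest-neighbor (NN) design, the simplest of the three one-shot methods, reduces to selecting the k old input combinations closest to the new point. A minimal sketch, with Euclidean distance as an assumed metric:

```python
import numpy as np

def nn_design(X_old, y_old, x_new, k):
    """One-shot nearest-neighbor design: from the big set of old input
    combinations X_old (with outputs y_old), keep the k rows closest to
    the new point x_new, so the Kriging model is fit on a small subset."""
    dist = np.linalg.norm(X_old - x_new, axis=1)  # Euclidean distances
    idx = np.argsort(dist)[:k]                    # indices of k nearest
    return X_old[idx], y_old[idx]
```

Unlike the sequential methods it replaces, this selection needs no GP parameter estimates, which is what makes it a one-shot design.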
  6. By: Chien Yi Huang
    Abstract: An automatic program that generates consistent profit from the financial market is lucrative for every market practitioner. Recent advances in deep reinforcement learning provide a framework for end-to-end training of such a trading agent. In this paper, we propose a Markov Decision Process (MDP) model suitable for the financial trading task and solve it with the state-of-the-art deep recurrent Q-network (DRQN) algorithm. We propose several modifications to the existing learning algorithm to make it more suitable for the financial trading setting: (1) we employ a substantially smaller replay memory (only a few hundred transitions in size) than the ones used in modern deep reinforcement learning algorithms (often millions in size); (2) we develop an action augmentation technique that mitigates the need for random exploration by providing extra feedback signals for all actions to the agent, which enables us to use a greedy policy over the course of learning and shows strong empirical performance compared to the more commonly used epsilon-greedy exploration, although this technique is specific to financial trading under a few market assumptions; (3) we sample a longer sequence for recurrent neural network training, a side product of which is that we can train the agent every T steps, greatly reducing training time since the overall computation goes down by a factor of T. We combine all of the above into a complete online learning algorithm and validate our approach on the spot foreign exchange market.
    Date: 2018–07
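The first modification, the substantially smaller replay memory, amounts to a bounded FIFO buffer of transitions; the sketch below is illustrative, and the capacity shown is an assumption rather than the paper's figure:

```python
import random
from collections import deque

class SmallReplayMemory:
    """Sketch of a small replay memory: a bounded FIFO buffer of a few
    hundred transitions instead of millions, so the agent trains mostly
    on recent market conditions. Capacity is illustrative."""

    def __init__(self, capacity=480):
        self.buffer = deque(maxlen=capacity)  # oldest transitions drop out

    def push(self, state, action, reward, next_state):
        self.buffer.append((state, action, reward, next_state))

    def sample(self, batch_size):
        return random.sample(list(self.buffer), batch_size)
```

Keeping the buffer small trades sample diversity for recency, a sensible bias in non-stationary markets where old transitions quickly stop reflecting current dynamics.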
  7. By: Turrell, Arthur (Bank of England); Speigner, Bradley (Bank of England); Djumalieva, Jyldyz (Bank of England); Copple, David (Bank of England); Thurgood, James (Bank of England)
    Abstract: Mismatch in the labour market has been implicated as a driver of the UK’s productivity ‘puzzle’, the phenomenon describing how the growth rate and level of UK productivity have fallen behind their respective pre-Great Financial Crisis trends. Using a new dataset of around 15 million job adverts originally posted online, we examine the extent to which eliminating occupational or regional mismatch would have boosted productivity and output growth in the UK in the post-crisis period. To show how aggregate labour market data hide important heterogeneity, we map the naturally occurring vacancy data into official occupational classifications using a novel application of text analysis. The effects of mismatch on aggregate UK productivity and output are driven by dispersion in regional or occupational productivity, tightness, and matching efficiency. We find, contrary to previous work, that unwinding occupational mismatch would have had a weak effect on growth in the post-crisis period. However, unwinding regional mismatch would have substantially boosted output and productivity relative to their realised paths, bringing them in line with their pre-crisis trends.
    Keywords: Vacancies; matching; mismatch
    JEL: E24 J63
    Date: 2018–07–06
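A toy version of mapping free-text vacancy adverts to official occupational codes by token overlap follows; the paper's actual text-analysis pipeline is more sophisticated, and the codes and keyword sets below are made up for illustration:

```python
def map_to_occupation(ad_text, occupation_keywords):
    """Toy mapping of a job advert to the occupational code whose keyword
    set shares the most tokens with the advert text. Returns None when no
    keywords match at all."""
    tokens = set(ad_text.lower().split())
    best_code, best_overlap = None, 0
    for code, keywords in occupation_keywords.items():
        overlap = len(tokens & keywords)
        if overlap > best_overlap:
            best_code, best_overlap = code, overlap
    return best_code
```

Run over millions of adverts, even this crude matcher shows how naturally occurring vacancy text can be aggregated into official occupational classifications for mismatch analysis.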
  8. By: IWAMOTO Koichi; TANOUE Yuta
    Abstract: In a seminal study, Frey and Osborne reported that 47% of total employment in the United States is at risk of computerization. Many studies estimate how the automation of work influences employment. In Japan, however, there are few studies that investigate the effect of automation and networking on future employment. It is important to discuss this based on facts and evidence. This paper describes the present situation in Japan regarding the impact of artificial intelligence (AI) on employment, utilizing a survey study. We first visited Japanese companies and conducted a field survey on the new technologies being introduced. Following this, in August 2017, we conducted a survey study of about 10,000 companies. This paper discusses the output of the survey study.
    Date: 2018–07
  9. By: Paola Cerchiello (Department of Economics and Management, University of Pavia); Anca Mirela Toma (Department of Economics and Management, University of Pavia)
    Abstract: Initial coin offerings (ICOs) represent one of the several by-products of the cryptocurrency world. To avoid the rigid and long money-raising protocols imposed by classical channels like banks or venture capitalists, new-generation start-ups and existing businesses offer the inner value of their business by selling tokens, i.e., units of the chosen cryptocurrency, much as a regular firm would do with an IPO. Investors, of course, hope for an increase in the value of the tokens in the near future, provided there is a solid and valid business idea, typically described by the ICO issuers in a white paper, which is both a descriptive and a technical report of the proposed business. However, fraudulent activities perpetrated by unscrupulous start-ups happen quite often, and it would be crucial to highlight clear signs of illegal money raising in advance. In this paper, we employ a statistical approach to detect which characteristics of an ICO are significantly related to fraudulent behaviours. We leverage a number of different variables, such as entrepreneurial skills, the number of people chatting on Telegram about the given ICO and the relative sentiment, type of business, issuing country, and token pre-sale price. Through logistic regression and classification trees, we are able to shed light on the riskiest ICOs.
    Keywords: ICOs, cryptocurrencies, fundraising, classification models, text analysis
    Date: 2018–07
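The logistic-regression step can be sketched as follows; the features, data, and hyperparameters are placeholders, and the paper's actual estimation is not reproduced here:

```python
import numpy as np

def fit_logistic(X, y, lr=0.1, n_iter=2000):
    """Minimal logistic regression by gradient descent, of the kind used
    to relate ICO characteristics (e.g. pre-sale price, Telegram
    activity) to a fraud indicator. All inputs here are placeholders."""
    X = np.column_stack([np.ones(len(X)), X])  # add intercept column
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ w))       # predicted probabilities
        w -= lr * X.T @ (p - y) / len(y)       # gradient of log-loss
    return w

def predict(w, X):
    """Classify as fraudulent (1) when the fitted probability exceeds 0.5."""
    X = np.column_stack([np.ones(len(X)), X])
    return (1.0 / (1.0 + np.exp(-X @ w)) > 0.5).astype(int)
```

The fitted coefficients, not just the predictions, are the object of interest in such an analysis: their signs and significance indicate which ICO characteristics are associated with fraud.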

This nep-big issue is ©2018 by Tom Coupé. It is provided as is without any express or implied warranty. It may be freely redistributed in whole or in part for any purpose. If distributed in part, please include this notice.
General information on the NEP project can be found at For comments please write to the director of NEP, Marco Novarese at <>. Put “NEP” in the subject, otherwise your mail may be rejected.
NEP’s infrastructure is sponsored by the School of Economics and Finance of Massey University in New Zealand.