nep-big New Economics Papers
on Big Data
Issue of 2017‒10‒15
three papers chosen by
Tom Coupé
University of Canterbury

  1. Term Structure Analysis with Big Data By Andreasen, Martin M.; Christensen, Jens H. E.; Rudebusch, Glenn D.
  2. High Frequency Market Making with Machine Learning By Matthew F Dixon
  3. On the Use of the Lasso for Instrumental Variables Estimation with Some Invalid Instruments By Windmeijer, Frank; Farbmacher, Helmut; Davies, Neil; Smith, George Davey

  1. By: Andreasen, Martin M. (Aarhus University); Christensen, Jens H. E. (Federal Reserve Bank of San Francisco); Rudebusch, Glenn D. (Federal Reserve Bank of San Francisco)
    Abstract: Analysis of the term structure of interest rates almost always takes a two-step approach. First, actual bond prices are summarized by interpolated synthetic zero-coupon yields, and second, a small set of these yields is used as the source data for further empirical examination. In contrast, we consider the advantages of a one-step approach that directly analyzes the universe of bond prices. To illustrate the feasibility and desirability of the one-step approach, we compare arbitrage-free dynamic term structure models estimated using both approaches. We also provide a simulation study showing that a one-step approach can extract the information in large panels of bond prices and avoid any arbitrary noise introduced by a first-stage interpolation of yields.
    JEL: C58 G12 G17
    Date: 2017–09–15
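The contrast between the two approaches can be made concrete with a minimal sketch of a one-step objective: model bond prices are computed directly from a parametric zero-coupon curve and compared with observed prices, with no interpolated yields in between. This is an illustration under stated assumptions, not the authors' method: the paper estimates arbitrage-free dynamic term structure models, whereas the sketch uses a static Nelson-Siegel curve (chosen here only for illustration), continuous compounding, and hypothetical bonds and parameter values.

```python
import math


def ns_zero_yield(t, beta0, beta1, beta2, lam):
    """Nelson-Siegel zero-coupon yield (continuously compounded) at maturity t > 0.
    The Nelson-Siegel form is an illustrative assumption, not the paper's model."""
    x = lam * t
    slope_loading = (1.0 - math.exp(-x)) / x
    curvature_loading = slope_loading - math.exp(-x)
    return beta0 + beta1 * slope_loading + beta2 * curvature_loading


def model_price(cashflows, params):
    """Discount each (time_in_years, amount) cash flow at the model zero curve and sum."""
    return sum(amount * math.exp(-ns_zero_yield(t, *params) * t)
               for t, amount in cashflows)


def one_step_loss(bonds, observed_prices, params):
    """Sum of squared pricing errors measured directly on observed bond prices,
    with no first-stage interpolation of zero-coupon yields."""
    return sum((p - model_price(cf, params)) ** 2
               for cf, p in zip(bonds, observed_prices))


# Hypothetical example: a 1-year zero-coupon bond and a 2-year bond paying a
# 2% annual coupon in semiannual installments (face value 100).
bonds = [
    [(1.0, 100.0)],
    [(0.5, 1.0), (1.0, 1.0), (1.5, 1.0), (2.0, 101.0)],
]
true_params = (0.03, -0.01, 0.005, 0.6)   # hypothetical (beta0, beta1, beta2, lambda)
prices = [model_price(cf, true_params) for cf in bonds]

print(one_step_loss(bonds, prices, true_params))                   # 0.0
print(one_step_loss(bonds, prices, (0.04, -0.01, 0.005, 0.6)) > 0)  # True
```

In a one-step estimation this loss (or a likelihood built from the same pricing errors) would be minimized over the curve parameters using every available bond, so the information in coupon bonds enters the fit directly rather than through a smoothed set of synthetic yields.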
  2. By: Matthew F Dixon
    Abstract: High frequency trading has been characterized as an arms race with 'Red Queen' characteristics [Farmer, 2012]. It is improbable, even impossible, that many market participants can sustain a competitive advantage by relying solely on low-latency trade execution systems. The growth in the volume of market data, advances in computer hardware and the commensurate prominence of machine learning in other disciplines have spurred the exploration of machine learning for price discovery. Even though the application of machine learning to price prediction has been extensively researched, the merit of this approach for high frequency market making has received little attention. This paper introduces a trade execution model to evaluate the economic impact of classifiers through backtesting. Extending the concept of the confusion matrix, we present a 'trade information matrix' that attributes the expected profit and loss of tick-level predictive classifiers, under execution constraints such as fill probabilities and position-dependent trade rules, to correct and incorrect predictions. We apply the execution model and trade information matrix to Level II E-mini S&P 500 futures history and demonstrate an approach for estimating the sensitivity of the P&L to classification error. We describe the training of a recurrent neural network (RNN) and show (i) that there is little gain from re-training the model frequently; (ii) that there are distinct intra-day trends in classifier performance; and (iii) that classifier accuracy quickly erodes as the prediction horizon lengthens. Our findings suggest that this computationally tractable approach can be used to directly evaluate the sensitivity of a market making strategy's performance to classifier error and can augment traditional testing based on market simulation.
    Date: 2017–10
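The idea of a 'trade information matrix' can be sketched as a confusion matrix whose cells accumulate expected P&L instead of counts. This is a simplified illustration, not Dixon's exact construction: the three labels, the per-trade P&L inputs, and the treatment of execution constraints as a single fill probability per trade are all assumptions made here for clarity (position-dependent trade rules are not modelled).

```python
LABELS = ("down", "neutral", "up")   # hypothetical three-class price-move labels


def trade_information_matrix(predicted, actual, pnl, fill_prob=None):
    """Confusion-matrix-style tables keyed by (predicted, actual) label pairs.

    counts[(p, a)]  -- number of trades in each cell
    exp_pnl[(p, a)] -- summed expected P&L, where trade i contributes
                       fill_prob[i] * pnl[i] (assumed execution model)."""
    if fill_prob is None:
        fill_prob = [1.0] * len(pnl)   # assume every order is filled
    counts = {(p, a): 0 for p in LABELS for a in LABELS}
    exp_pnl = {(p, a): 0.0 for p in LABELS for a in LABELS}
    for p, a, x, q in zip(predicted, actual, pnl, fill_prob):
        counts[(p, a)] += 1
        exp_pnl[(p, a)] += q * x
    return counts, exp_pnl


def split_by_correctness(exp_pnl):
    """Expected P&L attributed to correct (diagonal) vs incorrect (off-diagonal) predictions."""
    correct = sum(v for (p, a), v in exp_pnl.items() if p == a)
    incorrect = sum(v for (p, a), v in exp_pnl.items() if p != a)
    return correct, incorrect


# Hypothetical backtest output for four trades.
predicted = ["up", "up", "down", "neutral"]
actual = ["up", "down", "down", "up"]
pnl = [1.0, -2.0, 0.5, 0.0]
fill = [0.8, 0.5, 1.0, 1.0]

counts, matrix = trade_information_matrix(predicted, actual, pnl, fill)
print(split_by_correctness(matrix))
```

Splitting the P&L this way shows how much of a strategy's expected profit comes from correct predictions and how much is given back by errors; perturbing the classification error rate and recomputing the split is one simple way to probe the sensitivity of P&L to classifier error.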
  3. By: Windmeijer, Frank; Farbmacher, Helmut; Davies, Neil; Smith, George Davey
    Abstract: We investigate the behaviour of the Lasso for selecting invalid instruments in linear instrumental variables models for estimating causal effects of exposures on outcomes, as proposed recently by Kang, Zhang, Cai and Small (2016, Journal of the American Statistical Association). Invalid instruments are those that fail the exclusion restriction and therefore enter the model as explanatory variables. We show that, in this setup, the Lasso may not select all invalid instruments in large samples if they are relatively strong. Consistent selection also depends on the correlation structure of the instruments. We propose a median estimator that is consistent when fewer than 50% of the instruments are invalid, and whose consistency does not depend on the relative strength of the instruments or their correlation structure. This estimator can therefore be used as the initial estimator for adaptive Lasso estimation. The methods are applied to a Mendelian randomisation study to estimate the causal effect of BMI on diastolic blood pressure using data on individuals from the UK Biobank, with 96 single nucleotide polymorphisms as potential instruments for BMI.
    Keywords: causal inference, instrumental variables estimation, invalid instruments, Lasso, Mendelian randomisation
    Date: 2017
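Why a median estimator tolerates fewer than 50% invalid instruments can be seen in a minimal simulation. The sketch assumes the estimator is the median of the instrument-by-instrument ratio estimates, reduced-form coefficient divided by first-stage coefficient, with both regressions run on all instruments. For a valid instrument the ratio is close to the causal effect; an invalid instrument's ratio is shifted by its direct effect, so as long as most ratios are valid the median falls on a valid one. The simulated design (five instruments, two invalid, all coefficient values) is hypothetical, the data are mean zero so no intercept is used, and the adaptive Lasso step is not shown.

```python
import random
import statistics


def ols(X, y):
    """OLS coefficients (no intercept) from the normal equations X'X b = X'y,
    solved by Gaussian elimination with partial pivoting."""
    k = len(X[0])
    A = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    b = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    for c in range(k):
        piv = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[piv] = A[piv], A[c]
        b[c], b[piv] = b[piv], b[c]
        for r in range(c + 1, k):
            f = A[r][c] / A[c][c]
            for j in range(c, k):
                A[r][j] -= f * A[c][j]
            b[r] -= f * b[c]
    coef = [0.0] * k
    for r in range(k - 1, -1, -1):
        coef[r] = (b[r] - sum(A[r][j] * coef[j] for j in range(r + 1, k))) / A[r][r]
    return coef


def median_estimator(Z, d, y):
    """Median of the per-instrument ratio estimates Gamma_j / gamma_j."""
    gamma = ols(Z, d)   # first stage: exposure on all instruments
    Gamma = ols(Z, y)   # reduced form: outcome on all instruments
    return statistics.median([G / g for G, g in zip(Gamma, gamma)])


def simulate(n=5000, beta=0.5, seed=1):
    """Hypothetical design: 5 instruments, the first 2 invalid (direct effects on y)."""
    rng = random.Random(seed)
    gamma = [0.5] * 5                   # first-stage strengths
    alpha = [1.0, 1.0, 0.0, 0.0, 0.0]   # direct effects violate the exclusion restriction
    Z, d, y = [], [], []
    for _ in range(n):
        z = [rng.gauss(0, 1) for _ in range(5)]
        v = rng.gauss(0, 1)
        u = 0.8 * v + rng.gauss(0, 1)   # confounding: outcome error correlated with v
        di = sum(g * zj for g, zj in zip(gamma, z)) + v
        yi = beta * di + sum(a * zj for a, zj in zip(alpha, z)) + u
        Z.append(z)
        d.append(di)
        y.append(yi)
    return Z, d, y


Z, d, y = simulate()
print(median_estimator(Z, d, y))   # close to the true effect 0.5
```

In this design the two invalid ratios sit near 0.5 + 1.0 / 0.5 = 2.5, so a simple average of the five ratios would be pulled towards roughly 1.3, while the median stays with the three valid ratios.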

This nep-big issue is ©2017 by Tom Coupé. It is provided as is without any express or implied warranty. It may be freely redistributed in whole or in part for any purpose. If distributed in part, please include this notice.
General information on the NEP project can be found at For comments please write to the director of NEP, Marco Novarese at <>. Put “NEP” in the subject, otherwise your mail may be rejected.
NEP’s infrastructure is sponsored by the School of Economics and Finance of Massey University in New Zealand.