Abstract: |
This paper examines the problem of classifying trades as buys or sells. I
propose using estimated quotes in the midpoint and bid/ask tests, together with
a modeling approach to classification. Prevailing quotes are estimated using
flexible approximations to the distribution of quote delays relative to trade
timestamps.
Classification is performed by a generalized linear model that includes
improved versions of the midpoint, tick, and bid/ask tests. The model also
weighs the relative strengths of these tests, can account for market
microstructure peculiarities, and allows for autocorrelation and
cross-correlation in trade direction. Modeling these correlations corrects for
pseudoreplication, yielding more accurate standard errors and fixed-effect
estimates. The model also estimates the probability that each classification is
correct. The model is compared to various trade classification methods using a
sample of 2,836 domestic US stocks from an unexplored, recent, and readily
available dataset. Out of sample, modeled classifications are 1-2% more
accurate overall than current methods, and this improvement is consistent
across dates, sectors, and trade locations relative to the inside quote. For
Nasdaq and NYSE stocks, respectively, 1% and 1.3% of the improvement comes from
using the relative strengths of the various tests, and 0.9% and 0.7% comes from
using some form of estimated quotes. For AMEX stocks, a 0.4% improvement is
attributed to using a lagged version of the bid/ask test. I also find
indications of short- and ultra-short-term alpha. |