nep-ecm New Economics Papers
on Econometrics
Issue of 2018‒12‒17
twenty papers chosen by
Sune Karlsson
Örebro universitet

  1. Fixed-Bandwidth CUSUM Tests Under Long Memory By Wenger, Kai; Leschinski, Christian
  2. Necessary and Probably Sufficient Test for Finding Valid Instrumental Variables By Amit Sharma
  3. Heterogenous Coefficients, Discrete Instruments, and Identification of Treatment Effects By Whitney K. Newey; Sami Stouli
  4. “Flexible maximum conditional likelihood estimation for single-index models to predict accident severity with telematics data” By Catalina Bolancé; Ricardo Cao; Montserrat Guillen
  5. Conditional Distributions of Crop Yields: A Bayesian Approach for Characterizing Technological Change By Ramsey, A.
  6. Simple Local Polynomial Density Estimators By Matias D. Cattaneo; Michael Jansson; Xinwei Ma
  7. Comparing latent inequality with ordinal data By David M. Kaplan; Longhao Zhuo
  8. Rationalizing Rational Expectations? Tests and Deviations By Xavier D'Haultfoeuille; Christophe Gaillac; Arnaud Maurel
  9. Non-Randomly Sampled Networks: Biases and Corrections By Chih-Sheng Hsieh; Stanley I. M. Ko; Jaromír Kovářík; Trevon Logan
  10. LM-BIC Model Selection in Semiparametric Models By Ivan Korolev
  11. High Dimensional Classification through $\ell_0$-Penalized Empirical Risk Minimization By Le-Yu Chen; Sokbae Lee
  12. Estimation of a Nonseparable Heterogeneous Demand Function with Shape Restrictions and Berkson Errors By Richard Blundell; Joel Horowitz; Matthias Parey
  13. Column Generation Algorithms for Nonparametric Analysis of Random Utility Models By Bart Smeulders
  14. A mollifier approach to the deconvolution of probability densities By Maréchal, Pierre; Simar, Léopold; Vanhems, Anne
  15. Comparing models for sequence data: prediction and dissimilarities By Marco Raffaella Piccarreta; Marco Bonetti; Stefano Lombardi
  16. A Residual Bootstrap for Conditional Expected Shortfall By Alexander Heinemann; Sean Telg
  17. Financial time series forecasting using empirical mode decomposition and support vector regression By Nava, Noemi; Di Matteo, Tiziana; Aste, Tomaso
  18. BDLOB: Bayesian Deep Convolutional Neural Networks for Limit Order Books By Zihao Zhang; Stefan Zohren; Stephen Roberts
  19. Machine Learning for Yield Curve Feature Extraction: Application to Illiquid Corporate Bonds By Greg Kirczenow; Masoud Hashemi; Ali Fathi; Matt Davison
  20. Forensic Econometrics: Demand Estimation when Data are Missing By Julian Hidalgo; Michelle Sovinsky

  1. By: Wenger, Kai; Leschinski, Christian
    Abstract: We propose a family of self-normalized CUSUM tests for structural change under long memory. The test statistics apply non-parametric kernel-based fixed-b and fixed-m long-run variance estimators and have well-defined limiting distributions that only depend on the long-memory parameter. A Monte Carlo simulation shows that these tests provide finite sample size control while outperforming competing procedures in terms of power.
    Keywords: Fixed-bandwidth asymptotics; Fractional Integration; Long Memory; Structural Breaks
    JEL: C12 C22
    Date: 2018–12
  2. By: Amit Sharma
    Abstract: Can instrumental variables be found from data? While instrumental variable (IV) methods are widely used to identify causal effects, testing their validity from observed data remains a challenge. This is because validity of an IV depends on two assumptions, exclusion and as-if-random, that are largely believed to be untestable from data. In this paper, we show that under certain conditions, testing for instrumental variables is possible. We build upon prior work on necessary tests to derive a test that characterizes the odds of being a valid instrument, thus yielding the name "necessary and probably sufficient". The test works by defining the class of invalid-IV and valid-IV causal models as Bayesian generative models and comparing their marginal likelihood based on observed data. When all variables are discrete, we also provide a method to efficiently compute these marginal likelihoods. We evaluate the test on an extensive set of simulations for binary data, inspired by an open problem for IV testing proposed in past work. We find that the test is most powerful when an instrument follows monotonicity (its effect on treatment is either non-decreasing or non-increasing) and has moderate-to-weak strength; incidentally, such instruments are commonly used in observational studies. Among as-if-random and exclusion, it detects exclusion violations with higher power. Applying the test to IVs from two seminal studies on instrumental variables and five recent studies from the American Economic Review shows that many of the instruments may be flawed, at least when all variables are discretized. The proposed test opens the possibility of data-driven validation and search for instrumental variables.
    Date: 2018–12
  3. By: Whitney K. Newey; Sami Stouli
    Abstract: Multidimensional heterogeneity and endogeneity are important features of a wide class of econometric models. We consider heterogenous coefficients models where the outcome is a linear combination of known functions of treatment and heterogenous coefficients. We use control variables to obtain identification results for average treatment effects. With discrete instruments in a triangular model we find that average treatment effects cannot be identified when the number of support points is less than or equal to the number of coefficients. A sufficient condition for identification is that the second moment matrix of the treatment functions given the control is nonsingular with probability one. We relate this condition to identification of average treatment effects with multiple treatments.
    Date: 2018–11
  4. By: Catalina Bolancé (Department of Econometrics, Riskcenter-IREA, University of Barcelona, Avinguda Diagonal 690, 08034 Barcelona, Spain.); Ricardo Cao (Research Group MODES, Department of Mathematics, CITIC, Universidade da Coruña and ITMATI Campus de Elviña, s/n 15071 A Coruña, Spain.); Montserrat Guillen (Department of Econometrics, Riskcenter-IREA, University of Barcelona, Avinguda Diagonal 690, 08034 Barcelona, Spain.)
    Abstract: Estimation in single-index models for risk assessment is developed. Statistical properties are given and an application to estimate the cost of traffic accidents in an innovative insurance data set that has information on driving style is presented. A new kernel approach for the estimator covariance matrix is provided. Both the simulation study and the real case show that the method provides the best results when data are highly skewed and when the conditional distribution is of interest. Supplementary materials containing appendices are available online.
    Keywords: Insurance loss data; heavy tailed distributions; quantiles; non-parametric conditional distribution
    JEL: C51 C14 G22
    Date: 2018–12
  5. By: Ramsey, A.
    Abstract: What changes in the distribution of crop yields occur as a result of technological innovation? Viewing observed yields as random variables, estimation of the yield distribution conditional on time provides one approach for characterizing distributional transformation. Yields are also affected by weather and other covariates, spatial correlation, and a paucity of data in any one location. Common parametric and nonparametric methods rarely consider these aspects in a unified manner. Comprehensive solutions for describing the distribution of yields can be considered ideal. We implement a Bayesian spatial quantile regression model for the conditional distribution of yields that is distribution-free, includes weather (covariate) effects, smooths across space, and models the complete quantile process. Results provide insight into the temporal and spatial evolution of crop yields with implications for the measurement of technological change.
    Keywords: Crop Production/Industries
    Date: 2018–07
  6. By: Matias D. Cattaneo; Michael Jansson; Xinwei Ma
    Abstract: This paper introduces an intuitive and easy-to-implement nonparametric density estimator based on local polynomial techniques. The estimator is fully boundary adaptive and automatic, but does not require pre-binning or any other transformation of the data. We study the main asymptotic properties of the estimator, and use these results to provide principled estimation, inference, and bandwidth selection methods. As a substantive application of our results, we develop a novel discontinuity in density testing procedure, an important problem in regression discontinuity designs and other program evaluation settings. An illustrative empirical application is provided. Two companion Stata and R software packages are provided.
    Date: 2018–11
  7. By: David M. Kaplan (Department of Economics, University of Missouri); Longhao Zhuo (University of Missouri)
    Abstract: Using health as an example, we consider comparing two latent distributions when only ordinal data are available. Distinct from the literature, we assume a continuous latent distribution but not a parametric model. Primarily, we contribute (partial) identification results: given two known ordinal distributions, what can be learned about the relationship between the two corresponding latent distributions? Secondarily, we discuss Bayesian and frequentist inference on the relevant ordinal relationships, which are combinations of moment inequalities. Simulations and empirical examples illustrate our contributions.
    Keywords: health; nonparametric inference; partial identification; partial ordering; shape restrictions
    JEL: C25 D30 I14
    Date: 2018–12–03
  8. By: Xavier D'Haultfoeuille; Christophe Gaillac; Arnaud Maurel
    Abstract: In this paper, we build a new test of rational expectations based on the marginal distributions of realizations and subjective beliefs. This test is widely applicable, including in the common situation where realizations and beliefs are observed in two different datasets that cannot be matched. We show that whether one can rationalize rational expectations is equivalent to the distribution of realizations being a mean-preserving spread of the distribution of beliefs. The null hypothesis can then be rewritten as a system of many moment inequality and equality constraints, for which tests have been recently developed in the literature. Next, we go beyond testing by defining and estimating the minimal deviations from rational expectations that can be rationalized by the data. In the context of structural models, we build on this concept to propose an easy-to-implement way to conduct a sensitivity analysis on the assumed form of expectations. Finally, we apply our framework to test for and quantify deviations from rational expectations about future earnings, and examine the consequences of such departures in the context of a life-cycle model of consumption.
    JEL: C12 D84
    Date: 2018–11
  9. By: Chih-Sheng Hsieh; Stanley I. M. Ko; Jaromír Kovářík; Trevon Logan
    Abstract: This paper analyzes statistical issues arising from non-representative network samples of the population, the most common network data used. We first characterize the biases in both network statistics and estimates of network effects under non-random sampling theoretically and numerically. Sampled network data systematically bias the properties of observed networks and suffer from non-classical measurement-error problems if applied as regressors. Apart from the sampling rate and the elicitation procedure, these biases depend in a non-trivial way on which subpopulations are missing with higher probability. We then propose a methodology, adapting post-stratification weighting approaches to networked contexts, which enables researchers to recover several network-level statistics and reduce the biases in the estimated network effects. The advantages of the proposed methodology are that it can be applied to network data collected via both designed and non-designed sampling procedures, does not require one to assume any network formation model, and is straightforward to implement. We use Monte Carlo simulation and two widely used empirical network data sets to show that accounting for the non-representativeness of the sample dramatically changes the results of regression analysis.
    JEL: C4 D85 L14 Z13
    Date: 2018–11
  10. By: Ivan Korolev
    Abstract: This paper studies model selection in semiparametric econometric models. It develops a consistent series-based model selection procedure based on a Bayesian Information Criterion (BIC) type criterion to select between several classes of models. The procedure selects a model by minimizing the semiparametric Lagrange Multiplier (LM) type test statistic from Korolev (2018) but additionally rewards simpler models. The paper also develops consistent upward testing (UT) and downward testing (DT) procedures based on the semiparametric LM type specification test. The proposed semiparametric LM-BIC and UT procedures demonstrate good performance in simulations. To illustrate the use of these semiparametric model selection procedures, I apply them to the parametric and semiparametric gasoline demand specifications from Yatchew and No (2001). The LM-BIC procedure selects the semiparametric specification that is nonparametric in age but parametric in all other variables, which is in line with the conclusions in Yatchew and No (2001). The results of the UT and DT procedures heavily depend on the choice of tuning parameters and assumptions about the model errors.
    Date: 2018–11
  11. By: Le-Yu Chen; Sokbae Lee
    Abstract: We consider a high dimensional binary classification problem and construct a classification procedure by minimizing the empirical misclassification risk with a penalty on the number of selected features. We derive non-asymptotic probability bounds on the estimated sparsity as well as on the excess misclassification risk. In particular, we show that our method yields a sparse solution whose l0-norm can be arbitrarily close to true sparsity with high probability and obtain the rates of convergence for the excess misclassification risk. The proposed procedure is implemented via the method of mixed integer linear programming. Its numerical performance is illustrated in Monte Carlo experiments.
    Date: 2018–11
  12. By: Richard Blundell; Joel Horowitz; Matthias Parey
    Abstract: Berkson errors are commonplace in empirical microeconomics and occur whenever we observe an average in a specified group rather than the true individual value. In consumer demand this form of measurement error is present because the price an individual pays is often measured by the average price paid by individuals in a specified group (e.g., a county). We show the importance of such measurement errors for the estimation of demand in a setting with nonseparable unobserved heterogeneity. We develop a consistent estimator using external information on the true distribution of prices. Examining the demand for gasoline in the U.S., we find that accounting for Berkson errors is quantitatively important for estimating price effects and for welfare calculations. Imposing the Slutsky shape constraint greatly reduces the sensitivity to Berkson errors.
    Date: 2018–11
  13. By: Bart Smeulders
    Abstract: Kitamura and Stoye (2014) develop a nonparametric test for linear inequality constraints, when these are represented as vertices of a polyhedron instead of its faces. They implement this test for an application to nonparametric tests of Random Utility Models. As they note in their paper, testing such models is computationally challenging. In this paper, we develop and implement more efficient algorithms, based on column generation, to carry out the test. These improved algorithms allow us to tackle larger datasets.
    Date: 2018–12
  14. By: Maréchal, Pierre; Simar, Léopold; Vanhems, Anne
    Abstract: In this paper, we use a mollifier approach to regularize the deconvolution, which has been used in research fields like medical imaging, tomography, and astrophysics but, to the best of our knowledge, never in statistics or econometrics. We show that the analysis of this new regularization method offers a unifying and generalizing framework for comparing the benefits of various filter-type techniques like deconvolution kernels, Tikhonov or spectral cut-off methods. In particular, the mollifier approach allows one to relax some restrictive assumptions required for the deconvolution problem, and has better stabilizing properties compared to spectral cut-off and Tikhonov. We prove the asymptotic convergence of our estimator and provide a simulation analysis to compare the finite sample properties of our estimator with those of the well-known methods.
    Date: 2018–11
  15. By: Marco Raffaella Piccarreta; Marco Bonetti; Stefano Lombardi
    Abstract: We consider the case when it is of interest to study the different states experienced over time by a set of subjects, focusing on the resulting trajectories as a whole rather than on the occurrence of specific events. Such a situation occurs commonly in a variety of settings, for example in social and biomedical studies. Model‐based approaches, such as multistate models or Hidden Markov models, are being used increasingly to analyze trajectories and to study their relationships with a set of explanatory variables. The different assumptions underlying different models typically make the comparison of their performances difficult. In this work we introduce a novel way to accomplish this task, based on microsimulation‐based predictions. We discuss some criteria to evaluate one model and/or to compare competing models with respect to their ability to generate trajectories similar to the observed ones.
    Keywords: Dissimilarity; Hidden Markov model; Interpoint distance distribution; Micro‐simulation; Multistate model; Optimal Matching; Sequence analysis
    Date: 2018–01
  16. By: Alexander Heinemann; Sean Telg
    Abstract: This paper studies a fixed-design residual bootstrap method for the two-step estimator of Francq and Zakoïan (2015) associated with the conditional Expected Shortfall. For a general class of volatility models the bootstrap is shown to be asymptotically valid under the conditions imposed by Beutner et al. (2018). A simulation study is conducted revealing that the average coverage rates are satisfactory for most settings considered. There is no clear evidence to have a preference for any of the three proposed bootstrap intervals. This contrasts with results in Beutner et al. (2018) for the VaR, for which the reversed-tails interval has a superior performance.
    Date: 2018–11
  17. By: Nava, Noemi; Di Matteo, Tiziana; Aste, Tomaso
    Abstract: We introduce a multistep-ahead forecasting methodology that combines empirical mode decomposition (EMD) and support vector regression (SVR). This methodology is based on the idea that the forecasting task is simplified by using as input for SVR the time series decomposed with EMD. The outcomes of this methodology are compared with benchmark models commonly used in the literature. The results demonstrate that the combination of EMD and SVR can outperform benchmark models significantly, predicting the Standard & Poor’s 500 Index from 30 s to 25 min ahead. The high-frequency components better forecast short-term horizons, whereas the low-frequency components better forecast long-term horizons.
    Keywords: empirical mode decomposition; support vector regression; forecasting
    JEL: G1 G2
    Date: 2018–02–05
  18. By: Zihao Zhang; Stefan Zohren; Stephen Roberts
    Abstract: We showcase how dropout variational inference can be applied to a large-scale deep learning model that predicts price movements from limit order books (LOBs), the canonical data source representing trading and pricing movements. We demonstrate that uncertainty information derived from posterior predictive distributions can be utilised for position sizing, avoiding unnecessary trades and improving profits. Further, we test our models by using millions of observations across several instruments and markets from the London Stock Exchange. Our results suggest that those Bayesian techniques not only deliver uncertainty information that can be used for trading but also improve predictive performance as stochastic regularisers. To the best of our knowledge, we are the first to apply Bayesian networks to LOBs.
    Date: 2018–11
  19. By: Greg Kirczenow; Masoud Hashemi; Ali Fathi; Matt Davison
    Abstract: This paper studies an application of machine learning in extracting features from the historical market implied corporate bond yields. We consider an example of a hypothetical illiquid fixed income market. After choosing a surrogate liquid market, we apply the Denoising Autoencoder (DAE) algorithm to learn the features of the missing yield parameters from the historical data of the instruments traded in the chosen liquid market. The DAE algorithm is then challenged by two "point-in-time" inpainting algorithms taken from the image processing and computer vision domain. It is observed that, when tested on unobserved rate surfaces, the DAE algorithm exhibits superior performance thanks to the features it has learned from the historical shapes of yield curves.
    Date: 2018–11
  20. By: Julian Hidalgo; Michelle Sovinsky
    Abstract: Empirical researchers often face many data constraints when estimating models of demand. These constraints can sometimes prevent adequate evaluation of policies. In this article, we discuss two such missing data problems that arise frequently: missing data on prices and missing information on the size of the potential market. We present some ways to overcome these limitations in the context of two recent research projects: Liana and Sovinsky (2018), which addresses how to incorporate unobserved price heterogeneity, and Hidalgo and Sovinsky (2018), which focuses on how to use modeling techniques to estimate missing market size. Our aim is to provide a starting point for thinking about ways to overcome common data issues.
    Date: 2018–11

This nep-ecm issue is ©2018 by Sune Karlsson. It is provided as is without any express or implied warranty. It may be freely redistributed in whole or in part for any purpose. If distributed in part, please include this notice.
General information on the NEP project can be found at For comments please write to the director of NEP, Marco Novarese at <>. Put “NEP” in the subject, otherwise your mail may be rejected.
NEP’s infrastructure is sponsored by the School of Economics and Finance of Massey University in New Zealand.