
on Econometrics 
By:  Wenger, Kai; Leschinski, Christian 
Abstract:  We propose a family of self-normalized CUSUM tests for structural change under long memory. The test statistics apply nonparametric kernel-based fixed-b and fixed-m long-run variance estimators and have well-defined limiting distributions that only depend on the long-memory parameter. A Monte Carlo simulation shows that these tests provide finite sample size control while outperforming competing procedures in terms of power. 
Keywords:  Fixedbandwidth asymptotics; Fractional Integration; Long Memory; Structural Breaks 
JEL:  C12 C22 
Date:  2018–12 
URL:  http://d.repec.org/n?u=RePEc:han:dpaper:dp647&r=ecm 
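The self-normalized statistics above build on the classical CUSUM functional. As a rough numerical sketch of that building block only — with a naive i.i.d. scale estimate in place of the paper's fixed-b/fixed-m long-run variance estimators, and illustrative simulated data:

```python
import numpy as np

def cusum_statistic(x):
    """Classical CUSUM statistic for a break in mean:
    sup_k |S_k - (k/n) S_n| / (sigma_hat * sqrt(n))."""
    x = np.asarray(x, dtype=float)
    n = x.size
    s = np.cumsum(x)
    dev = np.abs(s - (np.arange(1, n + 1) / n) * s[-1])
    sigma = x.std(ddof=1)   # naive i.i.d. scale; the paper instead uses
                            # fixed-b / fixed-m long-run variance estimators
                            # that are robust under long memory
    return dev.max() / (sigma * np.sqrt(n))

rng = np.random.default_rng(0)
no_break = rng.normal(0.0, 1.0, 500)
with_break = np.concatenate([rng.normal(0.0, 1.0, 250),
                             rng.normal(1.0, 1.0, 250)])
print(cusum_statistic(no_break), cusum_statistic(with_break))
```

A mid-sample mean shift inflates the maximal partial-sum deviation, so the second statistic should be clearly larger than the first.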
By:  Amit Sharma 
Abstract:  Can instrumental variables be found from data? While instrumental variable (IV) methods are widely used to identify causal effects, testing their validity from observed data remains a challenge. This is because the validity of an IV depends on two assumptions, exclusion and as-if-random, that are largely believed to be untestable from data. In this paper, we show that under certain conditions, testing for instrumental variables is possible. We build upon prior work on necessary tests to derive a test that characterizes the odds of being a valid instrument, thus yielding the name "necessary and probably sufficient". The test works by defining the class of invalid-IV and valid-IV causal models as Bayesian generative models and comparing their marginal likelihood based on observed data. When all variables are discrete, we also provide a method to efficiently compute these marginal likelihoods. We evaluate the test on an extensive set of simulations for binary data, inspired by an open problem for IV testing proposed in past work. We find that the test is most powerful when an instrument follows monotonicity (its effect on treatment is either non-decreasing or non-increasing) and has moderate-to-weak strength; incidentally, such instruments are commonly used in observational studies. Between as-if-random and exclusion, it detects exclusion violations with higher power. Applying the test to IVs from two seminal studies on instrumental variables and five recent studies from the American Economic Review shows that many of the instruments may be flawed, at least when all variables are discretized. The proposed test opens the possibility of data-driven validation and search for instrumental variables. 
Date:  2018–12 
URL:  http://d.repec.org/n?u=RePEc:arx:papers:1812.01412&r=ecm 
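The core mechanic of the proposed test is comparing marginal likelihoods of competing Bayesian generative models for discrete data. A minimal sketch of that machinery on a toy question (does an outcome rate depend on a binary variable?), using closed-form Beta-Binomial evidence with illustrative counts — this is not the paper's invalid-IV versus valid-IV comparison:

```python
from math import lgamma, exp

def log_beta(a, b):
    return lgamma(a) + lgamma(b) - lgamma(a + b)

def log_evidence(k, n):
    """Log marginal likelihood of k successes in n Bernoulli trials
    under a uniform Beta(1, 1) prior on the success rate."""
    return log_beta(1 + k, 1 + n - k) - log_beta(1, 1)

# Two subsamples, e.g. split by a candidate binary variable Z
k0, n0 = 12, 100   # successes when Z = 0 (illustrative counts)
k1, n1 = 55, 100   # successes when Z = 1

# Model A: one common rate (no dependence on Z)
log_mA = log_evidence(k0 + k1, n0 + n1)
# Model B: separate rates for each value of Z (dependence on Z)
log_mB = log_evidence(k0, n0) + log_evidence(k1, n1)

bayes_factor = exp(log_mB - log_mA)
print(bayes_factor)   # large => the data favour dependence on Z
```

With rates as different as 12% versus 55%, the evidence strongly favours the model with separate rates; the paper applies the same comparison to full causal models.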
By:  Whitney K. Newey; Sami Stouli 
Abstract:  Multidimensional heterogeneity and endogeneity are important features of a wide class of econometric models. We consider heterogeneous coefficients models where the outcome is a linear combination of known functions of treatment and heterogeneous coefficients. We use control variables to obtain identification results for average treatment effects. With discrete instruments in a triangular model we find that average treatment effects cannot be identified when the number of support points is less than or equal to the number of coefficients. A sufficient condition for identification is that the second moment matrix of the treatment functions given the control is nonsingular with probability one. We relate this condition to identification of average treatment effects with multiple treatments. 
Date:  2018–11 
URL:  http://d.repec.org/n?u=RePEc:arx:papers:1811.09837&r=ecm 
By:  Catalina Bolancé (Department of Econometrics, Riskcenter-IREA, University of Barcelona, Avinguda Diagonal 690, 08034 Barcelona, Spain.); Ricardo Cao (Research Group MODES, Department of Mathematics, CITIC, Universidade da Coruña and ITMATI, Campus de Elviña, s/n, 15071 A Coruña, Spain.); Montserrat Guillen (Department of Econometrics, Riskcenter-IREA, University of Barcelona, Avinguda Diagonal 690, 08034 Barcelona, Spain.) 
Abstract:  Estimation in single-index models for risk assessment is developed. Statistical properties are given and an application to estimate the cost of traffic accidents in an innovative insurance data set that has information on driving style is presented. A new kernel approach for the estimator covariance matrix is provided. Both the simulation study and the real case show that the method provides the best results when data are highly skewed and when the conditional distribution is of interest. Supplementary materials containing appendices are available online. 
Keywords:  Insurance loss data, heavy-tailed distributions, quantiles, nonparametric conditional distribution. 
JEL:  C51 C14 G22 
Date:  2018–12 
URL:  http://d.repec.org/n?u=RePEc:ira:wpaper:201829&r=ecm 
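As a sketch of kernel-based estimation of a conditional distribution — the setting this abstract targets, though not the authors' single-index estimator — here is a Nadaraya-Watson conditional CDF on simulated data:

```python
import numpy as np

def cond_cdf(y0, x0, y, x, h):
    """Nadaraya-Watson estimate of F(y0 | X = x0) = P(Y <= y0 | X = x0),
    using a Gaussian kernel in x with bandwidth h."""
    w = np.exp(-0.5 * ((x - x0) / h) ** 2)
    return np.sum(w * (y <= y0)) / np.sum(w)

rng = np.random.default_rng(1)
x = rng.uniform(0.0, 1.0, 4000)
y = 2.0 * x + rng.normal(0.0, 0.5, 4000)   # Y | X=x ~ N(2x, 0.25)

# At x0 = 0.5 the conditional median of Y is 1.0
est = cond_cdf(1.0, 0.5, y, x, h=0.05)
print(est)   # should be close to 0.5
```

A single-index model generalizes this by conditioning on a linear index of several covariates rather than a scalar regressor.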
By:  Ramsey, A. 
Abstract:  What changes in the distribution of crop yields occur as a result of technological innovation? Viewing observed yields as random variables, estimation of the yield distribution conditional on time provides one approach for characterizing distributional transformation. Yields are also affected by weather and other covariates, spatial correlation, and a paucity of data in any one location. Common parametric and nonparametric methods rarely consider these aspects in a unified manner. Comprehensive solutions for describing the distribution of yields can be considered ideal. We implement a Bayesian spatial quantile regression model for the conditional distribution of yields that is distribution-free, includes weather (covariate) effects, smooths across space, and models the complete quantile process. Results provide insight into the temporal and spatial evolution of crop yields with implications for the measurement of technological change. 
Keywords:  Crop Production/Industries 
Date:  2018–07 
URL:  http://d.repec.org/n?u=RePEc:ags:iaae18:277253&r=ecm 
By:  Matias D. Cattaneo; Michael Jansson; Xinwei Ma 
Abstract:  This paper introduces an intuitive and easy-to-implement nonparametric density estimator based on local polynomial techniques. The estimator is fully boundary adaptive and automatic, but does not require pre-binning or any other transformation of the data. We study the main asymptotic properties of the estimator, and use these results to provide principled estimation, inference, and bandwidth selection methods. As a substantive application of our results, we develop a novel discontinuity-in-density testing procedure, an important problem in regression discontinuity designs and other program evaluation settings. An illustrative empirical application is provided. Two companion Stata and R software packages are provided. 
Date:  2018–11 
URL:  http://d.repec.org/n?u=RePEc:arx:papers:1811.11512&r=ecm 
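The estimator described above is based on local polynomial techniques without pre-binning. A stripped-down sketch of the general idea — estimating the density as the slope of a kernel-weighted local linear fit to the empirical CDF, with illustrative bandwidth and data, not the authors' implementation in the companion packages:

```python
import numpy as np

def lp_density(x0, data, h):
    """Sketch: estimate f(x0) as the slope of a kernel-weighted local
    linear fit of the empirical CDF near x0."""
    data = np.sort(np.asarray(data, dtype=float))
    n = data.size
    Fhat = np.arange(1, n + 1) / n            # empirical CDF at the order statistics
    u = data - x0
    w = np.maximum(0.0, 1.0 - (u / h) ** 2)   # Epanechnikov-type weights
    m = w > 0
    A = np.column_stack([np.ones(m.sum()), u[m]]) * np.sqrt(w[m])[:, None]
    b = Fhat[m] * np.sqrt(w[m])
    intercept, slope = np.linalg.lstsq(A, b, rcond=None)[0]
    return slope

rng = np.random.default_rng(2)
sample = rng.uniform(0.0, 1.0, 5000)
est = lp_density(0.5, sample, h=0.1)
print(est)   # uniform(0, 1) density: should be close to 1
```

Because the local fit is linear in the running variable, the same construction adapts automatically near boundary points, which is the motivation for its use in discontinuity-in-density testing.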
By:  David M. Kaplan (Department of Economics, University of Missouri); Longhao Zhuo (University of Missouri) 
Abstract:  Using health as an example, we consider comparing two latent distributions when only ordinal data are available. Distinct from the literature, we assume a continuous latent distribution but not a parametric model. Primarily, we contribute (partial) identification results: given two known ordinal distributions, what can be learned about the relationship between the two corresponding latent distributions? Secondarily, we discuss Bayesian and frequentist inference on the relevant ordinal relationships, which are combinations of moment inequalities. Simulations and empirical examples illustrate our contributions. 
Keywords:  health; nonparametric inference; partial identification; partial ordering; shape restrictions 
JEL:  C25 D30 I14 
Date:  2018–12–03 
URL:  http://d.repec.org/n?u=RePEc:umc:wpaper:1816&r=ecm 
By:  Xavier D'Haultfoeuille; Christophe Gaillac; Arnaud Maurel 
Abstract:  In this paper, we build a new test of rational expectations based on the marginal distributions of realizations and subjective beliefs. This test is widely applicable, including in the common situation where realizations and beliefs are observed in two different datasets that cannot be matched. We show that whether one can rationalize rational expectations is equivalent to the distribution of realizations being a mean-preserving spread of the distribution of beliefs. The null hypothesis can then be rewritten as a system of many moment inequality and equality constraints, for which tests have been recently developed in the literature. Next, we go beyond testing by defining and estimating the minimal deviations from rational expectations that can be rationalized by the data. In the context of structural models, we build on this concept to propose an easy-to-implement way to conduct a sensitivity analysis on the assumed form of expectations. Finally, we apply our framework to test for and quantify deviations from rational expectations about future earnings, and examine the consequences of such departures in the context of a life-cycle model of consumption. 
JEL:  C12 D84 
Date:  2018–11 
URL:  http://d.repec.org/n?u=RePEc:nbr:nberwo:25274&r=ecm 
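The null hypothesis reduces to the realization distribution being a mean-preserving spread of the belief distribution. A minimal sketch of how that can be checked on empirical distributions via integrated CDFs on a grid — illustrative discrete data, not the paper's moment-inequality test:

```python
import numpy as np

def integrated_cdf(sample, grid):
    """Approximate integral of the empirical CDF up to each grid point."""
    cdf = np.searchsorted(np.sort(sample), grid, side="right") / sample.size
    return np.cumsum(cdf) * (grid[1] - grid[0])

# Beliefs concentrated at 0.5; realizations split between 0 and 1.
# The realization distribution is a mean-preserving spread of beliefs.
beliefs = np.full(1000, 0.5)
realizations = np.concatenate([np.zeros(500), np.ones(500)])

grid = np.linspace(-0.5, 1.5, 401)
icdf_beliefs = integrated_cdf(beliefs, grid)
icdf_real = integrated_cdf(realizations, grid)

same_mean = abs(beliefs.mean() - realizations.mean()) < 1e-12
# mean-preserving spread: same mean, and the integrated CDF of
# realizations dominates pointwise (slack of order the grid step)
mps = same_mean and bool(np.all(icdf_real >= icdf_beliefs - 0.01))
print(mps)   # True: consistent with rational expectations in this sketch
```

The paper discretizes this family of pointwise inequalities (plus the mean equality) into a testable system of moment constraints.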
By:  Chih-Sheng Hsieh; Stanley I. M. Ko; Jaromír Kovářík; Trevon Logan 
Abstract:  This paper analyzes statistical issues arising from non-representative network samples of the population, the most commonly used type of network data. We first characterize the biases in both network statistics and estimates of network effects under non-random sampling, theoretically and numerically. Sampled network data systematically bias the properties of observed networks and suffer from non-classical measurement-error problems if used as regressors. Apart from the sampling rate and the elicitation procedure, these biases depend in a non-trivial way on which subpopulations are missing with higher probability. We then propose a methodology, adapting post-stratification weighting approaches to networked contexts, which enables researchers to recover several network-level statistics and reduce the biases in the estimated network effects. The advantages of the proposed methodology are that it can be applied to network data collected via both designed and non-designed sampling procedures, does not require one to assume any network formation model, and is straightforward to implement. We use Monte Carlo simulation and two widely used empirical network data sets to show that accounting for the non-representativeness of the sample dramatically changes the results of regression analysis. 
JEL:  C4 D85 L14 Z13 
Date:  2018–11 
URL:  http://d.repec.org/n?u=RePEc:nbr:nberwo:25270&r=ecm 
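The proposed methodology adapts post-stratification weighting to networked contexts. A minimal sketch of plain post-stratification weighting itself, with hypothetical strata and values, showing how reweighting by population-to-sample shares corrects a non-representative sample mean:

```python
import numpy as np

# Population shares of two strata (e.g. degree classes) and a sample
# that over-represents high-degree nodes (hypothetical numbers).
pop_share = {"low_degree": 0.7, "high_degree": 0.3}
sample = [("low_degree", 2.0)] * 30 + [("high_degree", 5.0)] * 70

n = len(sample)
sample_share = {s: sum(1 for g, _ in sample if g == s) / n for s in pop_share}
weights = np.array([pop_share[g] / sample_share[g] for g, _ in sample])
values = np.array([v for _, v in sample])

naive = values.mean()
weighted = np.sum(weights * values) / np.sum(weights)
print(naive, weighted)   # 4.1 vs the population mean 0.7*2 + 0.3*5 = 2.9
```

Each observation is weighted by its stratum's population share divided by its sample share, so over-sampled strata are down-weighted; the paper applies this idea to network-level statistics and network-effect regressions.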
By:  Ivan Korolev 
Abstract:  This paper studies model selection in semiparametric econometric models. It develops a consistent series-based model selection procedure based on a Bayesian Information Criterion (BIC) type criterion to select between several classes of models. The procedure selects a model by minimizing the semiparametric Lagrange Multiplier (LM) type test statistic from Korolev (2018) but additionally rewards simpler models. The paper also develops consistent upward testing (UT) and downward testing (DT) procedures based on the semiparametric LM type specification test. The proposed semiparametric LM-BIC and UT procedures demonstrate good performance in simulations. To illustrate the use of these semiparametric model selection procedures, I apply them to the parametric and semiparametric gasoline demand specifications from Yatchew and No (2001). The LM-BIC procedure selects the semiparametric specification that is nonparametric in age but parametric in all other variables, which is in line with the conclusions in Yatchew and No (2001). The results of the UT and DT procedures heavily depend on the choice of tuning parameters and assumptions about the model errors. 
Date:  2018–11 
URL:  http://d.repec.org/n?u=RePEc:arx:papers:1811.10676&r=ecm 
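A BIC-type criterion trades off fit against model dimension. A minimal sketch of BIC selection among nested polynomial series expansions — ordinary least squares with a Gaussian BIC on illustrative data, not the semiparametric LM-BIC procedure itself:

```python
import numpy as np

def bic(y, X):
    """Gaussian BIC for a least-squares fit of y on the regressor matrix X."""
    n, k = X.shape
    resid = y - X @ np.linalg.lstsq(X, y, rcond=None)[0]
    return n * np.log(np.mean(resid ** 2)) + k * np.log(n)

rng = np.random.default_rng(3)
x = rng.uniform(-1.0, 1.0, 400)
y = 1.0 + 2.0 * x - 1.5 * x**2 + rng.normal(0.0, 0.3, 400)  # true order: 2

# candidate series expansions: polynomials of increasing order
candidates = {p: np.column_stack([x**j for j in range(p + 1)]) for p in range(1, 5)}
best = min(candidates, key=lambda p: bic(y, candidates[p]))
print(best)   # BIC rewards fit but penalizes dimension; typically picks 2
```

Under-fitting inflates the residual variance term, while over-fitting pays the `k * log(n)` penalty; consistency of BIC-type selection rests on this asymmetry.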
By:  LeYu Chen; Sokbae Lee 
Abstract:  We consider a high-dimensional binary classification problem and construct a classification procedure by minimizing the empirical misclassification risk with a penalty on the number of selected features. We derive non-asymptotic probability bounds on the estimated sparsity as well as on the excess misclassification risk. In particular, we show that our method yields a sparse solution whose l0-norm can be arbitrarily close to the true sparsity with high probability and obtain the rates of convergence for the excess misclassification risk. The proposed procedure is implemented via the method of mixed integer linear programming. Its numerical performance is illustrated in Monte Carlo experiments. 
Date:  2018–11 
URL:  http://d.repec.org/n?u=RePEc:arx:papers:1811.09540&r=ecm 
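The paper solves an l0-penalized empirical risk minimization exactly via mixed integer linear programming. A brute-force sketch of the same kind of objective on a low-dimensional toy problem, using a least-squares hyperplane per feature subset as a stand-in for the exact MILP solution:

```python
import numpy as np
from itertools import combinations

def subset_errors(X, y, S):
    """In-sample misclassification count using a least-squares hyperplane
    on the feature subset S (a convex surrogate; the paper instead solves
    the exact misclassification-risk problem by mixed integer programming)."""
    Xc = np.column_stack([np.ones(len(y)), X[:, list(S)]])
    beta = np.linalg.lstsq(Xc, y, rcond=None)[0]
    return int(np.sum((Xc @ beta > 0.5).astype(int) != y))

rng = np.random.default_rng(4)
n, d = 300, 5
X = rng.normal(size=(n, d))
y = (X[:, 0] + 0.2 * rng.normal(size=n) > 0).astype(int)  # only feature 0 matters

lam = 5.0  # penalty per selected feature (the l0 penalty weight)
best_subset, best_obj = None, np.inf
for k in range(1, d + 1):
    for S in combinations(range(d), k):
        obj = subset_errors(X, y, S) + lam * k
        if obj < best_obj:
            best_subset, best_obj = S, obj
print(best_subset)   # the informative feature 0 should be selected
```

Enumerating all subsets is only feasible for tiny d; the MILP formulation is what makes the exact problem tractable at realistic dimensions.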
By:  Richard Blundell; Joel Horowitz; Matthias Parey 
Abstract:  Berkson errors are commonplace in empirical microeconomics and occur whenever we observe an average in a specified group rather than the true individual value. In consumer demand this form of measurement error is present because the price an individual pays is often measured by the average price paid by individuals in a specified group (e.g., a county). We show the importance of such measurement errors for the estimation of demand in a setting with nonseparable unobserved heterogeneity. We develop a consistent estimator using external information on the true distribution of prices. Examining the demand for gasoline in the U.S., accounting for Berkson errors is found to be quantitatively important for estimating price effects and for welfare calculations. Imposing the Slutsky shape constraint greatly reduces the sensitivity to Berkson errors. 
Date:  2018–11 
URL:  http://d.repec.org/n?u=RePEc:arx:papers:1811.10690&r=ecm 
By:  Bart Smeulders 
Abstract:  Kitamura and Stoye (2014) develop a nonparametric test for linear inequality constraints when these are represented as vertices of a polyhedron instead of its faces. They implement this test for an application to nonparametric tests of Random Utility Models. As they note in their paper, testing such models is computationally challenging. In this paper, we develop and implement more efficient algorithms, based on column generation, to carry out the test. These improved algorithms allow us to tackle larger datasets. 
Date:  2018–12 
URL:  http://d.repec.org/n?u=RePEc:arx:papers:1812.01400&r=ecm 
By:  Maréchal, Pierre; Simar, Léopold; Vanhems, Anne 
Abstract:  In this paper, we use a mollifier approach to regularize the deconvolution problem, an approach that has been used in research fields like medical imaging, tomography, and astrophysics but, to the best of our knowledge, never in statistics or econometrics. We show that the analysis of this new regularization method offers a unifying and generalizing framework for comparing the benefits of various filter-type techniques like deconvolution kernels, Tikhonov regularization, or the spectral cut-off method. In particular, the mollifier approach allows one to relax some restrictive assumptions required for the deconvolution problem, and it has better stabilizing properties than spectral cut-off and Tikhonov. We prove the asymptotic convergence of our estimator and provide a simulation analysis to compare the finite sample properties of our estimator with those of the well-known methods. 
Date:  2018–11 
URL:  http://d.repec.org/n?u=RePEc:tse:wpaper:33097&r=ecm 
By:  Raffaella Piccarreta; Marco Bonetti; Stefano Lombardi 
Abstract:  We consider the case in which it is of interest to study the different states experienced over time by a set of subjects, focusing on the resulting trajectories as a whole rather than on the occurrence of specific events. Such situations arise commonly in a variety of settings, for example in social and biomedical studies. Model-based approaches, such as multistate models or Hidden Markov models, are increasingly used to analyze trajectories and to study their relationships with a set of explanatory variables. The different assumptions underlying different models typically make the comparison of their performances difficult. In this work we introduce a novel way to accomplish this task, based on microsimulation-based predictions. We discuss criteria to evaluate a single model and/or to compare competing models with respect to their ability to generate trajectories similar to the observed ones. 
Keywords:  Dissimilarity; Hidden Markov model; Interpoint distance distribution; Micro‐simulation; Multistate model; Optimal Matching; Sequence analysis 
Date:  2018–01 
URL:  http://d.repec.org/n?u=RePEc:don:donwpa:113&r=ecm 
By:  Alexander Heinemann; Sean Telg 
Abstract:  This paper studies a fixed-design residual bootstrap method for the two-step estimator of Francq and Zakoïan (2015) associated with the conditional Expected Shortfall. For a general class of volatility models, the bootstrap is shown to be asymptotically valid under the conditions imposed by Beutner et al. (2018). A simulation study reveals that the average coverage rates are satisfactory for most settings considered. There is no clear evidence favoring any of the three proposed bootstrap intervals. This contrasts with results in Beutner et al. (2018) for the VaR, for which the reversed-tails interval has superior performance. 
Date:  2018–11 
URL:  http://d.repec.org/n?u=RePEc:arx:papers:1811.11557&r=ecm 
By:  Nava, Noemi; Di Matteo, Tiziana; Aste, Tomaso 
Abstract:  We introduce a multi-step-ahead forecasting methodology that combines empirical mode decomposition (EMD) and support vector regression (SVR). This methodology is based on the idea that the forecasting task is simplified by using as input for SVR the time series decomposed with EMD. The outcomes of this methodology are compared with benchmark models commonly used in the literature. The results demonstrate that the combination of EMD and SVR can outperform benchmark models significantly, predicting the Standard & Poor’s 500 Index from 30 s to 25 min ahead. The high-frequency components better forecast short-term horizons, whereas the low-frequency components better forecast long-term horizons. 
Keywords:  empirical mode decomposition; support vector regression; forecasting 
JEL:  G1 G2 
Date:  2018–02–05 
URL:  http://d.repec.org/n?u=RePEc:ehl:lserod:91028&r=ecm 
By:  Zihao Zhang; Stefan Zohren; Stephen Roberts 
Abstract:  We showcase how dropout variational inference can be applied to a large-scale deep learning model that predicts price movements from limit order books (LOBs), the canonical data source representing trading and pricing movements. We demonstrate that uncertainty information derived from posterior predictive distributions can be utilised for position sizing, avoiding unnecessary trades and improving profits. Further, we test our models using millions of observations across several instruments and markets from the London Stock Exchange. Our results suggest that these Bayesian techniques not only deliver uncertainty information that can be used for trading but also improve predictive performance as stochastic regularisers. To the best of our knowledge, we are the first to apply Bayesian networks to LOBs. 
Date:  2018–11 
URL:  http://d.repec.org/n?u=RePEc:arx:papers:1811.10041&r=ecm 
By:  Greg Kirczenow; Masoud Hashemi; Ali Fathi; Matt Davison 
Abstract:  This paper studies an application of machine learning to extracting features from historical market-implied corporate bond yields. We consider an example of a hypothetical illiquid fixed income market. After choosing a surrogate liquid market, we apply the Denoising Autoencoder (DAE) algorithm to learn the features of the missing yield parameters from the historical data of the instruments traded in the chosen liquid market. The DAE algorithm is then challenged by two "point-in-time" inpainting algorithms taken from the image processing and computer vision domain. It is observed that, when tested on unobserved rate surfaces, the DAE algorithm exhibits superior performance thanks to the features it has learned from the historical shapes of yield curves. 
Date:  2018–11 
URL:  http://d.repec.org/n?u=RePEc:arx:papers:1812.01102&r=ecm 
By:  Julian Hidalgo; Michelle Sovinsky 
Abstract:  Empirical researchers often face many data constraints when estimating models of demand. These constraints can sometimes prevent adequate evaluation of policies. In this article, we discuss two such missing data problems that arise frequently: missing data on prices and missing information on the size of the potential market. We present some ways to overcome these limitations in the context of two recent research projects: Liana and Sovinsky (2018), which addresses how to incorporate unobserved price heterogeneity, and Hidalgo and Sovinsky (2018), which focuses on how to use modeling techniques to estimate missing market size. Our aim is to provide a starting point for thinking about ways to overcome common data issues. 
Date:  2018–11 
URL:  http://d.repec.org/n?u=RePEc:bon:boncrc:crctr224_058_2018&r=ecm 