
on Econometrics 
By:  Ruixuan Liu; Zhengfei Yu 
Abstract:  We propose two simple semiparametric estimation methods for ordered response models with an unknown error distribution. The proposed methods do not require users to choose any tuning parameter and they automatically incorporate the monotonicity restriction of the unknown distribution function. Fixing finite dimensional parameters in the model, we construct nonparametric maximum likelihood estimates (NPMLE) for the error distribution based on the related binary choice data or the entire ordered response data. We then obtain estimates for finite dimensional parameters based on moment conditions given the estimated distribution function. Our semiparametric approaches deliver root-n consistent and asymptotically normal estimators of the regression coefficient and threshold parameter. We also develop valid bootstrap procedures for inference. We apply our methods to the interdependent durations model in Honore and de Paula (2010), where the social interaction effect is directly related to the threshold parameter in the corresponding ordered response model. The advantages of our methods are borne out in simulation studies and a real data application to the joint retirement decision of married couples.
Date:  2019–11 
URL:  http://d.repec.org/n?u=RePEc:tsu:tewpjp:2019004&r=all 
By:  Fabio Franco (University of Rome "Tor Vergata") 
Abstract:  Particle filtering is a useful statistical tool for making inference on the latent variables and the structural parameters of state space models when it is embedded in MCMC algorithms (Flury and Shephard, 2011). It relies on only two assumptions (Gordon et al., 1993): (a) one can simulate from the dynamics of the model; (b) the predictive measurement density can be computed. In practice the second assumption may not hold in an obvious way, and implementing a particle filter can become difficult. Gallant, Giacomini and Ragusa (2016) have recently developed a particle filter which does not rely on the structural form of the measurement equation. This method uses a set of moment conditions to induce the likelihood function of a structural model under a GMM criterion. The semiparametric structure allows particle filtering to be used where the standard techniques are not applicable or are difficult to implement. On the other hand, the GMM representation is less efficient than the standard technique, and in some cases it can impair the proper functioning of the particle filter and in turn deliver poor estimates. The contribution of this paper is to compare the standard techniques, such as the Kalman filter and the standard bootstrap particle filter, with the method proposed by Gallant et al. (2016) in order to measure the performance of the particle filter with the GMM representation.
Keywords:  Bootstrap particle filter, GMM likelihood representation, Metropolis-Hastings algorithm, Kalman filter, nonlinear/non-Gaussian state space models.
JEL:  C4 C8 
Date:  2019–12–04 
URL:  http://d.repec.org/n?u=RePEc:rtv:ceisrp:477&r=all 
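As a rough illustration of the two standard benchmarks named in the abstract above, the sketch below computes the log-likelihood of a linear Gaussian state space model twice: exactly with a Kalman filter, and approximately with a bootstrap particle filter. The AR(1)-plus-noise model, the parameter values, and all function names are assumptions made for this example; it does not implement the GMM-based filter of Gallant et al. (2016).

```python
import math
import random


def simulate(T, phi, sigma_x, sigma_y, rng):
    """Simulate x_t = phi*x_{t-1} + sigma_x*eps_t, y_t = x_t + sigma_y*eta_t, x_0 = 0."""
    x, ys = 0.0, []
    for _ in range(T):
        x = phi * x + sigma_x * rng.gauss(0.0, 1.0)
        ys.append(x + sigma_y * rng.gauss(0.0, 1.0))
    return ys


def normal_logpdf(y, mean, var):
    """Log density of N(mean, var) evaluated at y."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (y - mean) ** 2 / var)


def kalman_loglik(ys, phi, sigma_x, sigma_y):
    """Exact log-likelihood via the Kalman filter (state known to be 0 at time 0)."""
    m, p, ll = 0.0, 0.0, 0.0
    for y in ys:
        m_pred = phi * m                      # predicted state mean
        p_pred = phi * phi * p + sigma_x ** 2  # predicted state variance
        s = p_pred + sigma_y ** 2              # predictive variance of y_t
        ll += normal_logpdf(y, m_pred, s)
        k = p_pred / s                         # Kalman gain
        m = m_pred + k * (y - m_pred)
        p = (1.0 - k) * p_pred
    return ll


def bootstrap_pf_loglik(ys, phi, sigma_x, sigma_y, n_particles, rng):
    """Log-likelihood estimate from a bootstrap particle filter.

    Uses the two ingredients listed in the abstract: (a) simulation from the
    state dynamics and (b) evaluation of the measurement density.
    """
    particles = [0.0] * n_particles
    ll = 0.0
    for y in ys:
        # (a) propagate every particle through the state equation
        particles = [phi * x + sigma_x * rng.gauss(0.0, 1.0) for x in particles]
        # (b) weight by the measurement density, stabilised in logs
        logw = [normal_logpdf(y, x, sigma_y ** 2) for x in particles]
        mx = max(logw)
        w = [math.exp(l - mx) for l in logw]
        ll += mx + math.log(sum(w) / n_particles)
        # multinomial resampling proportional to the weights
        particles = rng.choices(particles, weights=w, k=n_particles)
    return ll


if __name__ == "__main__":
    ys = simulate(50, 0.8, 1.0, 1.0, random.Random(0))
    print("Kalman:", kalman_loglik(ys, 0.8, 1.0, 1.0))
    print("Particle filter:", bootstrap_pf_loglik(ys, 0.8, 1.0, 1.0, 1000, random.Random(1)))
```

In this linear Gaussian case the Kalman value is exact, so the gap between the two numbers gives a direct reading of the Monte Carlo error of the particle filter.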
By:  Schnücker, A.M. 
Abstract:  This paper proposes LASSO estimation specific for panel vector autoregressive (PVAR) models. The penalty term allows for shrinkage for different lags, for shrinkage towards homogeneous coefficients across panel units, for penalization of lags of variables belonging to another cross-sectional unit, and for varying penalization across equations. The penalty parameters therefore build on time series and cross-sectional properties that are commonly found in PVAR models. Simulation results point towards advantages of using the proposed LASSO for PVAR models over ordinary least squares in terms of forecast accuracy. An empirical forecasting application with five countries supports these findings.
Keywords:  Model selection, multi-country model, shrinkage estimation
JEL:  C13 C32 C33 
Date:  2019–11–01 
URL:  http://d.repec.org/n?u=RePEc:ems:eureir:122072&r=all 
By:  Duván Humberto Cataño (University of Antioquia); Carlos Vladimir Rodríguez-Caballero (ITAM and CREATES); Daniel Peña (Universidad Carlos III de Madrid)
Abstract:  We introduce a nonstationary high-dimensional factor model with time-varying loadings. We propose an estimation procedure based on two stages. First, we estimate common factors by principal components. Afterwards, in the second step, treating the factor estimates as observed, the time-varying loadings are estimated by an iterative procedure of generalized least squares using wavelet functions. We investigate the finite sample features of the proposed methodology by some Monte Carlo simulations. Finally, we use this methodology to study the electricity prices and loads of the Nord Pool power market.
Keywords:  Factor models, wavelet functions, generalized least squares, electricity prices and loads 
JEL:  C13 C32 Q43 
Date:  2019–12–09 
URL:  http://d.repec.org/n?u=RePEc:aah:create:201923&r=all 
By:  He, Yang (Microsoft Corporation); Bartalotti, Otávio (Iowa State University) 
Abstract:  This paper develops a novel wild bootstrap procedure to construct robust bias-corrected (RBC) valid confidence intervals (CIs) for fuzzy regression discontinuity designs, providing an intuitive complement to existing RBC methods. The CIs generated by this procedure are valid under conditions similar to the procedures proposed by Calonico et al. (2014) and related literature. Simulations provide evidence that this new method is at least as accurate as the plug-in analytical corrections when applied to a variety of data generating processes featuring endogeneity and clustering. Finally, we demonstrate its empirical relevance by revisiting Angrist and Lavy's (1999) analysis of class size on student outcomes.
Keywords:  fuzzy regression discontinuity, robust confidence intervals, wild bootstrap, average treatment effect 
JEL:  C14 C21 C26 
Date:  2019–11 
URL:  http://d.repec.org/n?u=RePEc:iza:izadps:dp12801&r=all 
By:  Zhishui Hu (University of Science and Technology of China); Peter C.B. Phillips (Cowles Foundation, Yale University); Qiying Wang (School of Mathematics and Statistics, The University of Sydney) 
Abstract:  This paper develops an asymptotic theory for nonlinear cointegrating power function regression. The framework extends earlier work on the deterministic trend case and allows for both endogeneity and heteroskedasticity, which makes the models and inferential methods relevant to many empirical economic and financial applications, including predictive regression. Accompanying the asymptotic theory of nonlinear regression, the paper establishes some new results on weak convergence to stochastic integrals that go beyond the usual semimartingale structure and considerably extend existing limit theory, complementing other recent findings on stochastic integral asymptotics. The paper also provides a general framework for extremum estimation limit theory that encompasses stochastically nonstationary time series and should be of wide applicability.
Keywords:  Nonlinear power regression, Least squares estimation, Nonstationarity, Endogeneity, Heteroscedasticity 
JEL:  C13 C22 
Date:  2019–12 
URL:  http://d.repec.org/n?u=RePEc:cwl:cwldpp:2211&r=all 
By:  Thomas-Agnan, Christine; Morais, Joanna
Abstract:  In the framework of Compositional Data Analysis, vectors carrying relative information, also called compositional vectors, can appear in regression models either as dependent or as explanatory variables. In some situations, they can be on both sides of the regression equation. Measuring the marginal impacts of covariates in these types of models is not straightforward since a change in one component of a closed composition automatically affects the rest of the composition. J. Morais, C. Thomas-Agnan and M. Simioni [Austrian Journal of Statistics, 47(5), 125, 2018] have shown how to measure, compute and interpret these marginal impacts in the case of linear regression models with compositions on both sides of the equation. The resulting natural interpretation is in terms of an elasticity, a quantity commonly used in econometrics and marketing applications. They also demonstrate the link between these elasticities and simplicial derivatives. The aim of this contribution is to extend these results to other situations, namely when the compositional vector is on a single side of the regression equation. In these cases, the marginal impact is related to a semi-elasticity and also linked to some simplicial derivative. Moreover, we consider the possibility that a total variable is used as an explanatory variable, with several possible interpretations of this total, and we derive the elasticity formulas in that case.
Keywords:  compositional regression model; marginal effects; simplicial derivative; elasticity; semi-elasticity.
JEL:  C10 C25 C35 C46 M31 D12 
Date:  2019–12 
URL:  http://d.repec.org/n?u=RePEc:tse:wpaper:123765&r=all 
By:  Stephan Smeekes; Etienne Wijler 
Abstract:  We investigate how the possible presence of unit roots and cointegration affects forecasting with Big Data. As most macroeconomic time series are very persistent and may contain unit roots, a proper handling of unit roots and cointegration is of paramount importance for macroeconomic forecasting. The high-dimensional nature of Big Data complicates the analysis of unit roots and cointegration in two ways. First, transformations to stationarity require performing many unit root tests, increasing room for errors in the classification. Second, modelling unit roots and cointegration directly is more difficult, as standard high-dimensional techniques such as factor models and penalized regression are not directly applicable to (co)integrated data and need to be adapted. We provide an overview of both issues and review methods proposed to address these issues. These methods are also illustrated with two empirical applications.
Date:  2019–11 
URL:  http://d.repec.org/n?u=RePEc:arx:papers:1911.10552&r=all 
By:  Ruixuan Liu; Zhengfei Yu 
Abstract:  We study accelerated failure time (AFT) models in which the survivor function of the additive error term is log-concave. The log-concavity assumption covers large families of commonly used distributions and also represents the aging or wear-out phenomenon of the baseline duration. For right-censored failure time data, we construct semiparametric maximum likelihood estimates of the finite dimensional parameter and establish the large sample properties. The shape restriction is incorporated via a nonparametric maximum likelihood estimator (NPMLE) of the hazard function. Our approach guarantees the uniqueness of a global solution for the estimating equations and delivers semiparametric efficient estimates. Simulation studies and empirical applications demonstrate the usefulness of our method.
Date:  2019–11 
URL:  http://d.repec.org/n?u=RePEc:tsu:tewpjp:2019003&r=all 
By:  Alexander M. Chinco; Andreas Neuhierl; Michael Weber 
Abstract:  The academic literature literally contains hundreds of variables that seem to predict the cross-section of expected returns. This so-called "anomaly zoo" has caused many to question whether researchers are using the right tests of statistical significance. But, here's the thing: even if researchers use the right tests, they will still draw the wrong conclusions from their econometric analyses if they start out with the wrong priors, i.e., if they start out with incorrect beliefs about the ex ante probability of encountering a tradable anomaly. So, what are the right priors? What is the correct anomaly base rate? We develop a first way to estimate the anomaly base rate by combining two key insights: 1) Empirical-Bayes methods capture the implicit process by which researchers form priors based on their past experience with other variables in the anomaly zoo. 2) Under certain conditions, there is a one-to-one mapping between these prior beliefs and the best-fit tuning parameter in a penalized regression. We study trading-strategy performance to verify our estimation results. If you trade on two variables with similar one-month-ahead return forecasts in different anomaly-base-rate regimes (low vs. high), the variable in the low base-rate regime consistently underperforms the otherwise identical variable in the high base-rate regime.
JEL:  C12 C52 G11 
Date:  2019–11 
URL:  http://d.repec.org/n?u=RePEc:nbr:nberwo:26493&r=all 
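The mapping between prior beliefs and a penalty parameter mentioned in the abstract above can be illustrated with the textbook Gaussian case. This is a sketch under assumed notation ($y$, $X$, $\beta$, $\sigma^2$, $\tau^2$, $\lambda$ are not taken from the paper), and the paper's own conditions and penalty may differ.

```latex
% Assumed setup: Gaussian likelihood and independent Gaussian prior
%   y \mid \beta \sim N(X\beta, \sigma^2 I), \qquad \beta \sim N(0, \tau^2 I).
% The posterior mode then solves a ridge-penalized least squares problem:
\hat\beta
  = \arg\max_\beta \Big\{ -\tfrac{1}{2\sigma^2}\,\lVert y - X\beta\rVert^2
                          -\tfrac{1}{2\tau^2}\,\lVert \beta\rVert^2 \Big\}
  = \arg\min_\beta \Big\{ \lVert y - X\beta\rVert^2 + \lambda\,\lVert \beta\rVert^2 \Big\},
\qquad \lambda = \frac{\sigma^2}{\tau^2}.
```

For a fixed noise variance $\sigma^2$, each prior variance $\tau^2$ corresponds to exactly one $\lambda$: a tighter prior (smaller $\tau^2$) implies a larger penalty, and vice versa.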
By:  Giuseppe Cavaliere; Iliyan Georgiev 
Abstract:  Asymptotic bootstrap validity is usually understood as consistency of the distribution of a bootstrap statistic, conditional on the data, for the unconditional limit distribution of a statistic of interest. From this perspective, randomness of the limit bootstrap measure is regarded as a failure of the bootstrap. We show that such limiting randomness does not necessarily invalidate bootstrap inference if validity is understood as control over the frequency of correct inferences in large samples. We first establish sufficient conditions for asymptotic bootstrap validity in cases where the unconditional limit distribution of a statistic can be obtained by averaging a (random) limiting bootstrap distribution. Further, we provide results ensuring the asymptotic validity of the bootstrap as a tool for conditional inference, the leading case being that where a bootstrap distribution estimates consistently a conditional (and thus, random) limit distribution of a statistic. We apply our framework to several inference problems in econometrics, including linear models with possibly nonstationary regressors, functional CUSUM statistics, conditional Kolmogorov-Smirnov specification tests, the 'parameter on the boundary' problem and tests for constancy of parameters in dynamic econometric models.
Date:  2019–11 
URL:  http://d.repec.org/n?u=RePEc:arx:papers:1911.12779&r=all 
By:  Yu, Hanchen; Fotheringham, Alexander Stewart; Li, Ziqi; Oshan, Taylor M.; Wolf, Levi John (University of Bristol) 
Abstract:  Recognizing that Geographically Weighted Regression (GWR) is a data-borrowing technique, this paper derives expressions for the amount of bias introduced to local parameter estimates by borrowing data from locations where the processes might be different from those at the regression location. This is done for both GWR and Multiscale GWR (MGWR). We demonstrate the accuracy of our expressions for bias through a comparison with empirically derived estimates based on a simulated data set with known local parameter values. By being able to compute the bias in both models we are able to demonstrate the superiority of MGWR. We then demonstrate the utility of a corrected Akaike Information Criterion statistic in finding optimal bandwidths in both GWR and MGWR as a trade-off between minimizing bias and minimizing uncertainty. We further show how bias in one set of local parameter estimates can affect the bias in another set of local estimates. The bias derived from borrowing data from other locations appears to be very small.
Date:  2019–07–18 
URL:  http://d.repec.org/n?u=RePEc:osf:osfxxx:etb42&r=all 
By:  Annalisa Cadonna; Sylvia Frühwirth-Schnatter; Peter Knaus
Abstract:  Time-varying parameter (TVP) models are very flexible in capturing gradual changes in the effect of a predictor on the outcome variable. However, in particular when the number of predictors is large, there is a known risk of overfitting and poor predictive performance, since the effect of some predictors is constant over time. We propose a prior for variance shrinkage in TVP models, called triple gamma. The triple gamma prior encompasses a number of priors that have been suggested previously, such as the Bayesian lasso, the double gamma prior and the Horseshoe prior. We present the desirable properties of such a prior and its relationship to Bayesian Model Averaging for variance selection. The features of the triple gamma prior are then illustrated in the context of time-varying parameter vector autoregressive models, both for simulated datasets and for a series of macroeconomic variables in the Euro Area.
Date:  2019–12 
URL:  http://d.repec.org/n?u=RePEc:arx:papers:1912.03100&r=all 
By:  Chan Shen (Pennsylvania State University); Roger Klein (Rutgers University)
Abstract:  Controlling the bias is central to estimating semiparametric models. Many methods have been developed to control bias in estimating conditional expectations while maintaining a desirable variance order. However, these methods typically do not perform well at moderate sample sizes. Moreover, and perhaps related to their performance, non-optimal windows are selected with undersmoothing needed to ensure the appropriate bias order. In this paper, we propose a recursive differencing estimator for conditional expectations. When this method is combined with a bias control targeting the derivative of the semiparametric expectation, we are able to obtain asymptotic normality under optimal windows. As suggested by the structure of the recursion, in a wide variety of triple index designs, the proposed bias control performs much better at moderate sample sizes than regular or higher order kernels and local polynomials.
Keywords:  semiparametric model, bias reduction, conditional expectation 
JEL:  C1 C14 
Date:  2019–11–24 
URL:  http://d.repec.org/n?u=RePEc:rut:rutres:201903&r=all 
By:  Schnaubelt, Matthias 
Abstract:  Machine learning is increasingly applied to time series data, as it constitutes an attractive alternative to forecasts based on traditional time series models. For independent and identically distributed observations, cross-validation is the prevalent scheme for estimating out-of-sample performance in both model selection and assessment. For time series data, however, it is unclear whether forward-validation schemes, i.e., schemes that keep the temporal order of observations, should be preferred. In this paper, we perform a comprehensive empirical study of eight common validation schemes. We introduce a study design that perturbs global stationarity by introducing a slow evolution of the underlying data-generating process. Our results demonstrate that, even for relatively small perturbations, commonly used cross-validation schemes often yield estimates with the largest bias and variance, and forward-validation schemes yield better estimates of the out-of-sample error. We provide an interpretation of these results in terms of an additional evolution-induced bias and the sample-size dependent estimation error. Using a large-scale financial data set, we demonstrate the practical significance in a replication study of a statistical arbitrage problem. We conclude with some general guidelines on the selection of suitable validation schemes for time series data.
Keywords:  machine learning, model selection, model validation, time series, cross-validation
Date:  2019 
URL:  http://d.repec.org/n?u=RePEc:zbw:iwqwdp:112019&r=all 
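To make the distinction drawn in the abstract above concrete, the sketch below generates index splits for a plain k-fold cross-validation and for one simple forward-validation scheme (a rolling origin with an expanding training window). The function names and the particular scheme are assumptions for illustration; the paper studies eight schemes, which are not reproduced here.

```python
def kfold_splits(n, k):
    """k-fold cross-validation over indices 0..n-1 using contiguous folds.

    Training sets may contain observations that come AFTER the test fold,
    so the temporal order is not respected.
    """
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    splits, start = [], 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n) if i < start or i >= start + size]
        splits.append((train, test))
        start += size
    return splits


def rolling_origin_splits(n, initial, horizon):
    """Forward validation: train on 0..t-1, test on the next `horizon` points.

    The origin t starts at `initial` and advances by `horizon`, so every
    training index precedes every test index.
    """
    splits, t = [], initial
    while t + horizon <= n:
        splits.append((list(range(t)), list(range(t, t + horizon))))
        t += horizon
    return splits


if __name__ == "__main__":
    for train, test in rolling_origin_splits(10, 4, 2):
        print("train", train, "test", test)
```

With ten observations, `kfold_splits(10, 5)` uses observations 2 to 9 to predict observations 0 and 1 in its first fold, whereas every split from `rolling_origin_splits(10, 4, 2)` trains only on the past.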
By:  Kenichiro McAlinn; Kosaku Takanashi 
Abstract:  This paper proposes a new estimator for selecting weights to average over least squares estimates obtained from a set of models. Our proposed estimator builds on the Mallows model average (MMA) estimator of Hansen (2007), but, unlike MMA, simultaneously controls for location bias and regression error through a common constant. We show that our proposed estimator, the mean-shift Mallows model average (MSA) estimator, is asymptotically optimal relative to the original MMA estimator in terms of mean squared error. A simulation study is presented, where we show that our proposed estimator uniformly outperforms the MMA estimator.
Date:  2019–12 
URL:  http://d.repec.org/n?u=RePEc:arx:papers:1912.01194&r=all 
By:  Antoine Deeb; Clément de Chaisemartin
Abstract:  In the literature studying randomized controlled trials (RCTs), it is often assumed that the potential outcomes of units participating in the experiment are deterministic. This assumption is unlikely to hold, as stochastic shocks may take place during the experiment. In this paper, we consider the case of an RCT with individual-level treatment assignment, and we allow for individual-level and cluster-level (e.g. village-level) shocks to affect the potential outcomes. We show that one can draw inference on two estimands: the ATE conditional on the realizations of the cluster-level shocks, using heteroskedasticity-robust standard errors; the ATE netted out of those shocks, using cluster-robust standard errors. By clustering, researchers can test if the treatment would still have had an effect, had the stochastic shocks that occurred during the experiment been different.
Date:  2019–12 
URL:  http://d.repec.org/n?u=RePEc:arx:papers:1912.01052&r=all 
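The two kinds of standard errors contrasted in the abstract above can be sketched for the simplest case, a regression of the outcome on a constant and a treatment dummy. The code uses the plain sandwich formulas (HC0 and its cluster analogue) without finite-sample corrections. The function name and the toy data are assumptions for illustration, not the authors' implementation.

```python
import math


def slope_and_ses(y, d, cluster):
    """OLS slope of y on d (with intercept) and two sandwich standard errors.

    Returns (beta, se_hc0, se_cluster). With a binary d, beta equals the
    difference in means between treated and control units.
    """
    n = len(y)
    d_bar = sum(d) / n
    y_bar = sum(y) / n
    dc = [di - d_bar for di in d]                 # demeaned treatment
    sxx = sum(v * v for v in dc)
    beta = sum(v * yi for v, yi in zip(dc, y)) / sxx
    alpha = y_bar - beta * d_bar
    resid = [yi - alpha - beta * di for yi, di in zip(y, d)]
    scores = [v * e for v, e in zip(dc, resid)]   # per-unit score for the slope

    # Heteroskedasticity-robust (HC0): sum of squared individual scores
    var_hc0 = sum(s * s for s in scores) / sxx ** 2

    # Cluster-robust: add up scores within each cluster before squaring
    sums = {}
    for g, s in zip(cluster, scores):
        sums[g] = sums.get(g, 0.0) + s
    var_cl = sum(s * s for s in sums.values()) / sxx ** 2

    return beta, math.sqrt(var_hc0), math.sqrt(var_cl)


if __name__ == "__main__":
    # Toy data: two villages, treatment assigned at the individual level
    y = [1.0, 2.0, 1.5, 2.5, 3.0, 4.5, 3.2, 4.0]
    d = [0, 1, 0, 1, 0, 1, 0, 1]
    village = [0, 0, 0, 0, 1, 1, 1, 1]
    print(slope_and_ses(y, d, village))
```

When every unit is its own cluster, the two formulas coincide; they differ only through the within-cluster cross products of the scores.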
By:  Alexander Jurisch 
Abstract:  We develop a method that relates the truncated cumulant function of the fourth order with the Lévian cumulant function. This gives us explicit formulas for the Lévy parameters, which allow a real-time analysis of the state of a random motion. Cumbersome procedures like maximum-likelihood or least-squares methods are unnecessary. Furthermore, we treat the Lévy system in terms of statistical mechanics and work out its thermodynamic properties. This also includes a discussion of the fractal nature of relativistic corrections. As examples for a time-series analysis, we apply our results to the time series of the German DAX and the American S&P 500.
Date:  2019–02 
URL:  http://d.repec.org/n?u=RePEc:arx:papers:1902.09425&r=all 
By:  Millie Yi Mao (Azusa Pacific University); Aman Ullah (Department of Economics, University of California Riverside) 
Abstract:  This chapter introduces an information theoretic approach to specifying econometric functions as an alternative that avoids parametric assumptions. We investigate the performance of the information theoretic method in estimating the regression (conditional mean) and response (derivative) functions. We demonstrate that the resulting estimators are easy to implement and are advantageous over parametric models and nonparametric kernel techniques.
Keywords:  Information theory, Maximum entropy distributions, Econometric functions, Conditional mean 
Date:  2019–11 
URL:  http://d.repec.org/n?u=RePEc:ucr:wpaper:201923&r=all 
By:  Indranil SenGupta; William Nganje; Erik Hanson 
Abstract:  A commonly used stochastic model for derivative and commodity market analysis is the Barndorff-Nielsen and Shephard (BNS) model. Though this model is very efficient and analytically tractable, it suffers from the absence of long range dependence and many other issues. For this paper, the analysis is restricted to crude oil price dynamics. A simple way of improving the BNS model with the implementation of various machine learning algorithms is proposed. This refined BNS model is more efficient and has fewer parameters than other models which are used in practice as improvements of the BNS model. The procedure and the model show the application of data science for extracting a "deterministic component" out of processes that are usually considered to be completely stochastic. Empirical applications validate the efficacy of the proposed model for long range dependence.
Date:  2019–11 
URL:  http://d.repec.org/n?u=RePEc:arx:papers:1911.13300&r=all 
By:  Christian Bongiorno (MICS - Mathématiques et Informatique pour la Complexité et les Systèmes - CentraleSupélec); Damien Challet (MICS - Mathématiques et Informatique pour la Complexité et les Systèmes - CentraleSupélec)
Abstract:  We introduce a method to predict which correlation matrix coefficients are likely to change their signs in the future in the high-dimensional regime, i.e. when the number of features is larger than the number of samples per feature. The stability of correlation signs, two-by-two relationships, is found to depend on three-by-three relationships inspired by Heider social cohesion theory in this regime. We apply our method to US and Hong Kong equities historical data to illustrate how the structure of correlation matrices influences the stability of the sign of its coefficients.
Date:  2019–10–28 
URL:  http://d.repec.org/n?u=RePEc:hal:wpaper:hal02335586&r=all 
By:  Massimo Franchi ("Sapienza" University of Rome); Paolo Paruolo (European Commission, Joint Research Centre) 
Abstract:  This paper discusses the concept of cointegrating space for systems integrated of order higher than 1. It is first observed that the notions of (polynomial) cointegrating vectors and of root functions coincide. Second, the cointegrating space is defined as a subspace of the space of rational vectors. Third, it is shown that canonical sets of root functions can be used to generate a basis of the cointegrating space. Fourth, results on how to reduce bases of rational vector spaces to polynomial bases with minimal order (i.e. minimal bases) are shown to imply the separation of cointegrating vectors that potentially do not involve differences of the process from the ones that require them. Finally, it is argued that minimality of polynomial bases and economic identification of cointegrating vectors can be properly combined. 
Keywords:  VAR, Cointegration, I(d), Vector spaces. 
JEL:  C12 C33 C55 
Date:  2019–12 
URL:  http://d.repec.org/n?u=RePEc:sas:wpaper:20192&r=all 