
on Econometrics 
By:  Gilles de Truchis; Elena Ivona Dumitrescu 
Abstract:  We discuss cointegration relationships when covariance stationary observables exhibit unbalanced integration orders. Least-squares-type estimates of the long run coefficient are expected to converge either to 0 or to infinity if one does not account for the true unknown unbalance parameter. We propose a class of narrowband weighted nonlinear least squares estimators of these two parameters and analyze its asymptotic properties. The limit distribution is shown to be Gaussian, albeit singular, and it covers the entire stationary region in the particular case of the generalized nonlinear least squares estimator, thereby allowing for straightforward statistical inference. A Monte Carlo study documents the good finite sample properties of our class of estimators. They are further used to provide new perspectives on the risk-return relationship on financial stock markets. In particular, we find that the variance risk premium estimated in an appropriately rebalanced cointegration system is a better return predictor than existing risk premia measures. 
Keywords:  Unbalanced cointegration, Long memory, Stationarity, Generalized Least Squares, Nonlinear Least Squares 
JEL:  C22 G10 
Date:  2019 
URL:  http://d.repec.org/n?u=RePEc:drm:wpaper:201914&r=all 
By:  Claudia Noack; Christoph Rothe 
Abstract:  Fuzzy regression discontinuity (FRD) designs occur frequently in many areas of applied economics. We argue that the confidence intervals based on nonparametric local linear regression that are commonly reported in empirical FRD studies can have poor finite sample coverage properties for reasons related to their general construction based on the delta method, and to how they account for smoothing bias. We therefore propose new confidence sets, which are based on an Anderson-Rubin-type construction. These confidence sets are bias-aware, in the sense that they explicitly take into account the exact smoothing bias of the local linear estimators on which they are based. They are simple to compute, highly efficient, and have excellent coverage properties in finite samples. They are also valid under weak identification (that is, if the jump in treatment probabilities at the threshold is small) and irrespective of whether the distribution of the running variable is continuous, discrete, or of some intermediate form. 
Date:  2019–06 
URL:  http://d.repec.org/n?u=RePEc:arx:papers:1906.04631&r=all 
By:  Feiyu Jiang; Dong Li; Ke Zhu 
Abstract:  This paper considers a semiparametric generalized autoregressive conditional heteroscedastic (S-GARCH) model, which has a smooth long run component with unknown form to depict time-varying parameters, and a GARCH-type short run component to capture the temporal dependence. For this S-GARCH model, we first estimate the time-varying long run component by the kernel estimator, and then estimate the non-time-varying parameters in the short run component by the quasi maximum likelihood estimator (QMLE). We show that the QMLE is asymptotically normal with the usual parametric convergence rate. Next, we provide a consistent Bayesian information criterion for order selection. Furthermore, we construct a Lagrange multiplier (LM) test for the linear parameter constraint and a portmanteau test for model diagnostic checking, and prove that both tests have the standard chi-squared limiting null distributions. Our entire statistical inference procedure not only works for nonstationary data, but also has three novel features: first, our QMLE and two tests are adaptive to the unknown form of the long run component; second, our QMLE and two tests are easy to implement due to their simple asymptotic variance expressions; third, our QMLE and two tests share the same efficiency and testing power as those in the variance targeting method when the S-GARCH model is stationary. 
Date:  2019–07 
URL:  http://d.repec.org/n?u=RePEc:arx:papers:1907.04147&r=all 
By:  Dan Li; Adam Clements; Christopher Drovandi 
Abstract:  This paper exploits the advantages of sequential Monte Carlo (SMC) to develop parameter estimation and model selection methods for GARCH (Generalized AutoRegressive Conditional Heteroskedasticity) style models. This approach provides an alternative method for quantifying estimation uncertainty relative to classical inference. We demonstrate that even with long time series, the posterior distributions of model parameters are non-normal, highlighting the need for a Bayesian approach and an efficient posterior sampling method. Efficient approaches for both constructing the sequence of distributions in SMC and leave-one-out cross-validation for long time series data are also proposed. Finally, we develop an unbiased estimator of the likelihood for the Bad Environment-Good Environment model, a complex GARCH-type model, which permits exact Bayesian inference not previously available in the literature. 
Date:  2019–06 
URL:  http://d.repec.org/n?u=RePEc:arx:papers:1906.03828&r=all 
By:  Isaac Loh 
Abstract:  In a nonparametric instrumental regression model, we strengthen the conventional moment independence assumption towards full statistical independence between instrument and error term. This allows us to prove identification results and develop estimators for a structural function of interest when the instrument is discrete, and in particular binary. When the regressor of interest is also discrete with more mass points than the instrument, we state straightforward conditions under which the structural function is partially identified, and give modified assumptions which imply point identification. These stronger assumptions are shown to hold outside of a small set of conditional moments of the error term. Estimators for the identified set are given when the structural function is either partially or point identified. When the regressor is continuously distributed, we prove that if the instrument induces a sufficiently rich variation in the joint distribution of the regressor and error term then point identification of the structural function is still possible. This approach is relatively tractable, and under some standard conditions we demonstrate that our point identifying assumption holds on a topologically generic set of density functions for the joint distribution of regressor, error, and instrument. Our method also applies to a well-known nonparametric quantile regression framework, and we are able to state analogous point identification results in that context. 
Date:  2019–06 
URL:  http://d.repec.org/n?u=RePEc:arx:papers:1906.05231&r=all 
By:  Gilles de Truchis; Elena Ivona Dumitrescu; Florent Dubois 
Abstract:  In this paper we propose a local Whittle estimator of stationary bivariate unbalanced fractional cointegration systems. Unbalanced cointegration refers to the situation where the observables have different integration orders, but their filtered versions have equal integration orders and are cointegrated in the usual sense. Based on the frequency domain representation of the unbalanced version of Phillips’ triangular system, we develop a semiparametric approach to jointly estimate the unbalance parameter, the long run coefficient, and the integration orders of the regressand and cointegrating errors. The paper establishes the consistency and asymptotic normality of this estimator. We find a peculiar rate of convergence for the unbalance estimator (possibly faster than root-n) and a singular joint limiting distribution of the unbalance and long-run coefficients. Its good finite-sample properties are emphasized through Monte Carlo experiments. We illustrate the relevance of the developed estimator for financial data in an empirical application to the information flowing between the crude oil spot and CME-NYMEX markets. 
Keywords:  Unbalanced cointegration, Long memory, Stationarity, Local Whittle likelihood 
JEL:  C22 G10 
Date:  2019 
URL:  http://d.repec.org/n?u=RePEc:drm:wpaper:201915&r=all 
By:  Donald W.K. Andrews (Cowles Foundation, Yale University); Soonwoo Kwon (Department of Economics, Yale University) 
Abstract:  Standard tests and confidence sets in the moment inequality literature are not robust to model misspecification in the sense that they exhibit spurious precision when the identified set is empty. This paper introduces tests and confidence sets that provide correct asymptotic inference for a pseudo-true parameter in such scenarios, and hence, do not suffer from spurious precision. 
Keywords:  Asymptotics, confidence set, identification, inference, misspecification, moment inequalities, robust, spurious precision, test 
JEL:  C10 C12 
Date:  2019–07 
URL:  http://d.repec.org/n?u=RePEc:cwl:cwldpp:2184&r=all 
By:  Zhihua Ma; Yishu Xue; Guanyu Hu 
Abstract:  Most of the existing spatial clustering literature discusses clustering algorithms for spatial responses. In this paper, we consider a Bayesian clustered regression for spatially dependent data in order to detect clusters in the covariate effects. Our proposed method is based on the Dirichlet process, which provides a probabilistic framework for simultaneous inference of the number of clusters and the clustering configurations. A Markov chain Monte Carlo sampling algorithm is used to sample from the posterior distribution of the proposed model. In addition, Bayesian model diagnostic techniques are developed to assess the fit of our proposed model and to check the accuracy of the clustering results. Extensive simulation studies are conducted to evaluate the empirical performance of the proposed models. For illustration, our methodology is applied to a housing cost dataset of Georgia. 
Date:  2019–07 
URL:  http://d.repec.org/n?u=RePEc:arx:papers:1907.02212&r=all 
By:  Mika Meitz (University of Helsinki); Daniel Preve (City University of Hong Kong); Pentti Saikkonen (University of Helsinki) 
Abstract:  A new mixture autoregressive model based on Student’s t–distribution is proposed. A key feature of our model is that the conditional t–distributions of the component models are based on autoregressions that have multivariate t–distributions as their (low-dimensional) stationary distributions. That autoregressions with such stationary distributions exist is not immediate. Our formulation implies that the conditional mean of each component model is a linear function of past observations and the conditional variance is also time varying. Compared to previous mixture autoregressive models, our model may therefore be useful in applications where the data exhibit rather strong conditional heteroskedasticity. Our formulation also has the theoretical advantage that conditions for stationarity and ergodicity are always met and these properties are much more straightforward to establish than is common in nonlinear autoregressive models. An empirical example employing a realized kernel series based on S&P 500 high-frequency data shows that the proposed model performs well in volatility forecasting. 
Keywords:  Conditional heteroskedasticity; mixture model; regime switching; Student’s t–distribution 
URL:  http://d.repec.org/n?u=RePEc:cth:wpaper:gru_2018_013&r=all 
By:  Simon Lee; Serena Ng 
Abstract:  Datasets that are terabytes in size are increasingly common, but computer bottlenecks often frustrate a complete analysis of the data. While more data are better than less, diminishing returns suggest that we may not need terabytes of data to estimate a parameter or test a hypothesis. But which rows of data should we analyze, and might an arbitrary subset of rows preserve the features of the original data? This paper reviews a line of work that is grounded in theoretical computer science and numerical linear algebra, and which finds that an algorithmically desirable {\em sketch} of the data must have a {\em subspace embedding} property. Building on this work, we study how prediction and inference are affected by data sketching within a linear regression setup. The sketching error is small compared to the sample size effect, which is within the control of the researcher. As a sketch size that is algorithmically optimal may not be suitable for prediction and inference, we use statistical arguments to provide `inference conscious' guides to the sketch size. When appropriately implemented, an estimator that pools over different sketches can be nearly as efficient as the infeasible one using the full sample. 
Date:  2019–07 
URL:  http://d.repec.org/n?u=RePEc:arx:papers:1907.01954&r=all 
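The pooling idea in the abstract above can be illustrated with a minimal sketch: run OLS on a few uniformly sampled row subsets of a simulated regression and average the estimates. Uniform row sampling is only the simplest sketching scheme (the paper reviews subspace-embedding sketches with stronger guarantees), and all data and sizes below are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated "full" dataset: n rows, k regressors (illustration only).
n, k = 100_000, 5
X = rng.standard_normal((n, k))
beta = np.arange(1.0, k + 1.0)            # true coefficients 1..5
y = X @ beta + rng.standard_normal(n)

def ols(X, y):
    """Least-squares fit via numpy's lstsq."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

beta_full = ols(X, y)                      # full-sample estimate

# A uniform-sampling sketch: keep only m randomly chosen rows.
m = 2_000
idx = rng.choice(n, size=m, replace=False)
beta_sketch = ols(X[idx], y[idx])

# Pooling estimates over several independent sketches recovers much of the
# efficiency of the full sample at a fraction of the computational cost.
ests = []
for _ in range(10):
    idx = rng.choice(n, size=m, replace=False)
    ests.append(ols(X[idx], y[idx]))
beta_pooled = np.mean(ests, axis=0)
```

The pooled estimator's sampling variance shrinks roughly with the number of sketches, which is the intuition behind the paper's efficiency claim.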
By:  Xinyu Song 
Abstract:  We provide a novel method for large volatility matrix prediction with high-frequency data by applying eigendecomposition to daily realized volatility matrix estimators and capturing eigenvalue dynamics with ARMA models. Given a sequence of daily volatility matrix estimators, we compute the aggregated eigenvectors and obtain the corresponding eigenvalues. Eigenvalues of the same relative magnitude form a time series, and ARMA models are further employed to model the dynamics within each eigenvalue time series to produce a predictor. We predict the future large volatility matrix based on the predicted eigenvalues and the aggregated eigenvectors, and demonstrate the advantages of the proposed method in volatility prediction and portfolio allocation problems. 
Date:  2019–07 
URL:  http://d.repec.org/n?u=RePEc:arx:papers:1907.01196&r=all 
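The eigendecomposition-plus-ARMA pipeline described above can be sketched as follows. This is a simplified illustration on simulated matrices: a plain AR(1) fitted by least squares stands in for a full ARMA model, and the aggregated eigenvectors are taken from the time-averaged matrix, which may differ from the paper's exact aggregation.

```python
import numpy as np

rng = np.random.default_rng(1)
p, T = 4, 250  # assets, days

# Simulate daily realized volatility matrices with slowly moving eigenvalues
# (in practice these come from high-frequency estimators).
Q, _ = np.linalg.qr(rng.standard_normal((p, p)))            # fixed basis
lam = 1.0 + 0.5 * np.sin(np.arange(T)[:, None] * 0.05 + np.arange(p))
mats = np.stack([Q @ np.diag(lam[t]) @ Q.T for t in range(T)])

# Step 1: aggregated eigenvectors from the time-averaged matrix.
avg = mats.mean(axis=0)
evals, evecs = np.linalg.eigh(avg)

# Step 2: eigenvalue series, projecting each daily matrix on the common basis:
# series[t, j] = (V' S_t V)[j, j].
series = np.einsum('ip,tij,jp->tp', evecs, mats, evecs)

# Step 3: fit a simple AR(1) to each eigenvalue series, forecast one step.
def ar1_forecast(x):
    x0, x1 = x[:-1], x[1:]
    Xr = np.column_stack([np.ones_like(x0), x0])
    c, phi = np.linalg.lstsq(Xr, x1, rcond=None)[0]
    return c + phi * x[-1]

lam_hat = np.array([ar1_forecast(series[:, j]) for j in range(p)])

# Step 4: reconstruct the predicted volatility matrix.
S_pred = evecs @ np.diag(lam_hat) @ evecs.T
```

Because the eigenvectors are held fixed and only the eigenvalues are forecast, the predicted matrix stays symmetric by construction.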
By:  Bo Honore; Thomas Jorgensen; Aureo de Paula 
Abstract:  This paper introduces measures for how each moment contributes to the precision of the parameter estimates in GMM settings. For example, one of the measures asks what would happen to the variance of the parameter estimates if a particular moment were dropped from the estimation. The measures are all easy to compute. We illustrate the usefulness of the measures through two simple examples as well as an application to a model of joint retirement planning of couples. We estimate the model using the UK-BHPS, and we find evidence of complementarities in leisure. Our sensitivity measures illustrate that the precision of the estimate of the complementarity is primarily driven by the distribution of the differences in planned retirement dates. The estimated econometric model can be interpreted as a bivariate ordered choice model that allows for simultaneity. This makes the model potentially useful in other applications. 
Date:  2019–07 
URL:  http://d.repec.org/n?u=RePEc:arx:papers:1907.02101&r=all 
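The "what if this moment were dropped" measure can be sketched for the efficient-GMM case, where the asymptotic variance is (G'Ω⁻¹G)⁻¹ for moment Jacobian G and moment covariance Ω. The matrices below are illustrative toys, not from the paper, and the paper's measures are more general than this sketch.

```python
import numpy as np

def gmm_variance(G, Omega):
    """Asymptotic variance of the efficient GMM estimator: (G' Omega^-1 G)^-1,
    with G the (moments x parameters) Jacobian of the moment conditions."""
    A = G.T @ np.linalg.solve(Omega, G)
    return np.linalg.inv(A)

def variance_if_dropped(G, Omega, j):
    """Variance when moment j is removed from the estimation."""
    keep = [i for i in range(G.shape[0]) if i != j]
    return gmm_variance(G[keep], Omega[np.ix_(keep, keep)])

# Toy example: 3 moments, 2 parameters (numbers purely illustrative).
G = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
Omega = np.eye(3)

V_all = gmm_variance(G, Omega)
for j in range(3):
    V_j = variance_if_dropped(G, Omega, j)
    # Relative variance increase for each parameter when moment j is dropped.
    print(j, np.diag(V_j) / np.diag(V_all))
```

A large ratio flags a moment whose removal would substantially erode the precision of a given parameter, which is the sensitivity notion the abstract describes.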
By:  Zihao Zhang; Stefan Zohren; Stephen Roberts 
Abstract:  We showcase how Quantile Regression (QR) can be applied to forecast financial returns using Limit Order Books (LOBs), the canonical data source of high-frequency financial time series. We develop a deep learning architecture that simultaneously models the return quantiles for both buy and sell positions. We test our model over millions of LOB updates across multiple different instruments on the London Stock Exchange. Our results suggest that the proposed network not only delivers excellent performance but also provides improved prediction robustness by combining quantile estimates. 
Date:  2019–06 
URL:  http://d.repec.org/n?u=RePEc:arx:papers:1906.04404&r=all 
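Quantile forecasts of the kind described above are typically trained with the pinball (quantile) loss. The sketch below defines the loss and checks that minimising it over a constant predictor recovers the empirical quantile; it illustrates the loss function only, not the paper's network architecture, and the data are simulated.

```python
import numpy as np

def pinball_loss(y, q_pred, tau):
    """Pinball (quantile) loss: tau*(y - q) when y >= q, (tau - 1)*(y - q)
    otherwise. Minimising it targets the tau-quantile of y."""
    e = y - q_pred
    return np.mean(np.maximum(tau * e, (tau - 1) * e))

rng = np.random.default_rng(2)
y = rng.standard_normal(10_000)
tau = 0.9

# Grid-search a constant predictor: the minimiser should sit at the
# empirical 0.9-quantile of the sample.
grid = np.linspace(-3, 3, 601)
losses = [pinball_loss(y, c, tau) for c in grid]
best = grid[np.argmin(losses)]
print(best, np.quantile(y, tau))  # the two values should be close
```

In a network, the same loss is simply applied to the model's quantile output for each tau, which is what lets one architecture emit several quantiles at once.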
By:  Bajzik, Jozef; Havranek, Tomas; Irsova, Zuzana; Schwarz, Jiri 
Abstract:  A key parameter in international economics is the elasticity of substitution between domestic and foreign goods, also called the Armington elasticity. Yet estimates vary widely. We collect 3,524 reported estimates of the elasticity, construct 34 variables that reflect the context in which researchers obtain their estimates, and examine what drives the heterogeneity in the results. To account for inherent model uncertainty, we employ Bayesian and frequentist model averaging. We present the first application of newly developed nonlinear techniques to correct for publication bias. Our main results are threefold. First, there is publication bias against small and statistically insignificant elasticities. Second, differences in results are best explained by differences in data: aggregation, frequency, size, and dimension. Third, the mean elasticity implied by the literature after correcting for both publication bias and potential misspecifications is 3. 
Keywords:  Armington, trade elasticity, meta-analysis, publication bias, Bayesian model averaging 
JEL:  C83 D12 F14 
Date:  2019 
URL:  http://d.repec.org/n?u=RePEc:zbw:esprep:200207&r=all 
By:  Sangalli, Laura M.; Paganoni, Anna M.; Jiménez Recaredo, Raúl José; Elías Fernández, Antonio 
Abstract:  Censored functional data are becoming more common in applications. In such cases, the existing depth measures are useless. In this paper, an approach for measuring the depth of censored functional data is presented. Its finite-sample performance is tested by simulation, showing that the new depth agrees with an integrated depth for uncensored functional data. 
Keywords:  Integrated Depth; Partially Observed Data; Functional Data 
Date:  2019–07–10 
URL:  http://d.repec.org/n?u=RePEc:cte:wsrepe:28579&r=all 
By:  Donovan Platt 
Abstract:  Recent advances in computing power and the potential to make more realistic assumptions due to increased flexibility have led to the increased prevalence of simulation models in economics. While models of this class, and particularly agent-based models, are able to replicate a number of empirically observed stylised facts not easily recovered by more traditional alternatives, such models remain notoriously difficult to estimate due to their lack of tractable likelihood functions. While the estimation literature continues to grow, existing attempts have approached the problem primarily from a frequentist perspective, with the Bayesian estimation literature remaining comparatively less developed. For this reason, we introduce a Bayesian estimation protocol that makes use of deep neural networks to construct an approximation to the likelihood, which we then benchmark against a prominent alternative from the existing literature. Overall, we find that our proposed methodology consistently results in more accurate estimates in a variety of settings, including the estimation of financial heterogeneous agent models and the identification of changes in dynamics occurring in models incorporating structural breaks. 
Date:  2019–06 
URL:  http://d.repec.org/n?u=RePEc:arx:papers:1906.04522&r=all 
By:  Andreas Hagemann 
Abstract:  I introduce a simple permutation procedure to test conventional (non-sharp) hypotheses about the effect of a binary treatment in the presence of a finite number of large, heterogeneous clusters when the treatment effect is identified by comparisons across clusters. The procedure asymptotically controls size by applying a level-adjusted permutation test to a suitable statistic. The adjustments needed for most empirically relevant situations are tabulated in the paper. The adjusted permutation test is easy to implement in practice and performs well at conventional levels of significance with at least four treated clusters and a similar number of control clusters. It is particularly robust to situations where some clusters are much more variable than others. Examples and an empirical application are provided. 
Date:  2019–07 
URL:  http://d.repec.org/n?u=RePEc:arx:papers:1907.01049&r=all 
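A baseline (non-level-adjusted) cluster-level permutation test conveys the idea: permute the treatment assignment across clusters and compare the observed difference in cluster means with its permutation distribution. The level adjustment tabulated in the paper is deliberately omitted here, the statistic is a plain difference in means rather than the paper's, and the numbers are illustrative.

```python
import numpy as np
from itertools import combinations

def cluster_permutation_pvalue(cluster_means, treated):
    """Two-sided permutation p-value for a difference in means across
    clusters, permuting treatment assignment at the cluster level."""
    cluster_means = np.asarray(cluster_means, float)
    G = len(cluster_means)
    n_treat = int(np.sum(treated))
    obs = cluster_means[treated].mean() - cluster_means[~treated].mean()
    stats = []
    for combo in combinations(range(G), n_treat):   # all reassignments
        mask = np.zeros(G, bool)
        mask[list(combo)] = True
        stats.append(cluster_means[mask].mean() - cluster_means[~mask].mean())
    stats = np.abs(np.array(stats))
    return np.mean(stats >= abs(obs) - 1e-12)

# Eight clusters, four treated (illustrative numbers).
means = [2.1, 1.9, 2.3, 2.2, 0.9, 1.1, 1.0, 0.8]
treated = np.array([True] * 4 + [False] * 4)
pval = cluster_permutation_pvalue(means, treated)
print(pval)
```

With four treated and four control clusters there are C(8,4) = 70 reassignments, so the smallest attainable two-sided p-value is 2/70, which is why the paper's recommendation of at least four treated clusters matters.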
By:  Nicolas Apfel 
Abstract:  The widely used shift-share instrument is generated by summing the products of regional shares and aggregate shifts. All products must fulfill the exclusion restriction for the instrument to be valid. I propose applying methods which can pre-select invalid products when either more than half or the largest group of products is valid. I discuss extensions of these methods for fixed effects models. I illustrate the procedures with three applications: a simulation study, the labor market effect of Chinese import competition, and the effect of immigration to the US. My results help explain why previous studies have found low effects of immigration. 
Date:  2019–06 
URL:  http://d.repec.org/n?u=RePEc:arx:papers:1907.00222&r=all 
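The shift-share (Bartik) instrument itself, as described in the first sentence of the abstract above, is the share-weighted sum of aggregate shifts, B_r = Σ_k s_{rk} g_k. A minimal sketch with made-up shares and shifts:

```python
import numpy as np

# Regional industry shares s_{rk}: rows are regions, columns are industries
# (each row sums to one; numbers purely illustrative).
shares = np.array([[0.5, 0.3, 0.2],
                   [0.2, 0.2, 0.6],
                   [0.1, 0.7, 0.2]])

# Aggregate shifts g_k: national growth rate per industry (illustrative).
shifts = np.array([0.04, -0.01, 0.02])

# Shift-share instrument per region: B_r = sum_k s_{rk} * g_k.
instrument = shares @ shifts
print(instrument)
```

Each region-industry product s_{rk} g_k is one of the "products" the abstract refers to; the paper's contribution is selecting which of these may violate the exclusion restriction.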
By:  Matias D. Cattaneo; Rocio Titiunik; Gonzalo VazquezBare 
Abstract:  This handbook chapter gives an introduction to the sharp regression discontinuity design, covering identification, estimation, inference, and falsification methods. 
Date:  2019–06 
URL:  http://d.repec.org/n?u=RePEc:arx:papers:1906.04242&r=all 
By:  Lotfi Boudabsa; Damir Filipovic 
Abstract:  We introduce a computational framework for dynamic portfolio valuation and risk management building on machine learning with kernels. We learn the replicating martingale of a portfolio from a finite sample of its terminal cumulative cash flow. The learned replicating martingale is given in closed form thanks to a suitable choice of the kernel. We develop an asymptotic theory and prove convergence and a central limit theorem. We also derive finite sample error bounds and concentration inequalities. Numerical examples show good results for a relatively small training sample size. 
Date:  2019–06 
URL:  http://d.repec.org/n?u=RePEc:arx:papers:1906.03726&r=all 
By:  Philip Dawid; Macartan Humphreys; Monica Musio 
Abstract:  Suppose X and Y are binary exposure and outcome variables, and we have full knowledge of the distribution of Y, given application of X. From this we know the average causal effect of X on Y. We are now interested in assessing, for a case that was exposed and exhibited a positive outcome, whether it was the exposure that caused the outcome. The relevant "probability of causation", PC, typically is not identified by the distribution of Y given X, but bounds can be placed on it, and these bounds can be improved if we have further information about the causal process. Here we consider cases where we know the probabilistic structure for a sequence of complete mediators between X and Y. We derive a general formula for calculating bounds on PC for any pattern of data on the mediators (including the case with no data). We show that the largest and smallest upper and lower bounds that can result from any complete mediation process can be obtained in processes with at most two steps. We also consider homogeneous processes with many mediators. PC can sometimes be identified as 0 with negative data, but it cannot be identified at 1 even with positive data on an infinite set of mediators. The results have implications for learning about causation from knowledge of general processes and of data on cases. 
Date:  2019–06 
URL:  http://d.repec.org/n?u=RePEc:arx:papers:1907.00399&r=all 
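In the no-mediator case, the standard bounds on the probability of causation for an exposed case with a positive outcome (often attributed to Tian and Pearl, assuming an exogenous exposure) depend only on p1 = P(Y=1|X=1) and p0 = P(Y=1|X=0); the mediator data discussed in the abstract serve to tighten such bounds. A sketch of the baseline formulas, which are an assumption of this illustration rather than the paper's general result:

```python
def pc_bounds(p1, p0):
    """Baseline bounds on the probability of causation for an exposed case
    with a positive outcome, assuming exogenous X and no mediator data:
    p1 = P(Y=1|X=1), p0 = P(Y=1|X=0)."""
    lower = max(0.0, (p1 - p0) / p1)
    upper = min(1.0, (1.0 - p0) / p1)
    return lower, upper

# Example: exposure raises the outcome probability from 0.2 to 0.6.
lo, hi = pc_bounds(p1=0.6, p0=0.2)
print(lo, hi)
```

Here the lower bound is the attributable fraction among the exposed, and the upper bound reaches 1 whenever (1 - p0) >= p1, which is why PC typically cannot be point-identified at 1 without further structure.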
By:  Matthieu Garcin (Research Center  Léonard de Vinci Pôle Universitaire  De Vinci Research Center) 
Date:  2019–06–03 
URL:  http://d.repec.org/n?u=RePEc:hal:journl:hal02163662&r=all 
By:  Kibrom A. Abay; Leah Bevis; Christopher B. Barrett 
Abstract:  The mechanism(s) that generate measurement error matter to inference. Survey measurement error is typically thought to represent simple misreporting, correctable through improved measurement. But errors might also, or alternatively, reflect respondent misperceptions that materially affect the respondent decisions under study. We show analytically that these alternate data generating processes imply different appropriate regression specifications and have distinct effects on the bias in parameter estimates. We introduce a simple empirical technique to generate unbiased estimates under more general conditions and to apportion measurement error between misreporting and misperceptions when one has both self-reported and objectively measured observations of the same explanatory variable. We then apply these techniques to the long-standing question of agricultural intensification: do farmers increase input application rates per unit area as the size of the plots they cultivate decreases? Using nationally representative data from four sub-Saharan African countries, we find strong evidence that measurement error in plot size reflects a mixture of farmer misreporting and misperceptions. The results matter to inference around the intensification hypothesis and call into question whether more objective, precise measures are always preferable when estimating behavioral parameters. 
JEL:  C18 O13 O55 
Date:  2019–07 
URL:  http://d.repec.org/n?u=RePEc:nbr:nberwo:26066&r=all 
By:  Fabrice Daniel 
Abstract:  This article studies financial time series data processing for machine learning. It introduces the most frequent scaling methods, then compares the resulting stationarity and preservation of useful information for trend forecasting. It proposes an empirical test based on the capability to learn a simple data relationship with simple models. It also discusses the data-splitting method specific to time series, which avoids unwanted overfitting, and proposes various labelling schemes for classification and regression. 
Date:  2019–07 
URL:  http://d.repec.org/n?u=RePEc:arx:papers:1907.03010&r=all 
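The time-series-specific split mentioned above is commonly implemented as a walk-forward (chronological) split, in which each test block strictly follows its training window so no future information leaks into training. A minimal sketch; the fold sizes and scheme are illustrative, not necessarily the article's exact procedure:

```python
import numpy as np

def walk_forward_splits(n, n_folds, min_train):
    """Chronological train/test splits for time series: each test block
    follows its training window, preventing lookahead bias."""
    fold = (n - min_train) // n_folds
    for i in range(n_folds):
        train_end = min_train + i * fold
        yield np.arange(train_end), np.arange(train_end, train_end + fold)

# 100 observations, 3 folds, first 40 observations as the initial training set.
splits = list(walk_forward_splits(100, 3, 40))
for train, test in splits:
    assert train.max() < test.min()   # no lookahead
    print(len(train), len(test))
```

Contrast this with random k-fold cross-validation, where shuffled test points can precede training points in time and inflate apparent forecasting performance.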
By:  Jean-François Bégin; Mathieu Boudreault 
Abstract:  In this study, we develop a deterministic nonlinear filtering algorithm based on a high-dimensional version of Kitagawa (1987) to evaluate the likelihood function of models that allow for stochastic volatility and jumps whose arrival intensity is also stochastic. We show numerically that the deterministic filtering method is precise and much faster than the particle filter, in addition to yielding a smooth function over the parameter space. We then find the maximum likelihood estimates of various models that include stochastic volatility, jumps in the returns and variance, and also stochastic jump arrival intensity with the S&P 500 daily returns. During the Great Recession, the jump arrival intensity increases significantly and contributes to the clustering of volatility and negative returns. 
Date:  2019–06 
URL:  http://d.repec.org/n?u=RePEc:arx:papers:1906.04322&r=all 
By:  Nghia Nguyen; MinhNgoc Tran; David Gunawan; R. Kohn 
Abstract:  Stochastic Volatility (SV) models are widely used in the financial sector, while Long Short-Term Memory (LSTM) models have been successfully used in many large-scale industrial applications of Deep Learning. Our article combines these two methods non-trivially and proposes a model for capturing the dynamics of the financial volatility process, which we call the LSTM-SV model. The proposed model overcomes the short-term memory problem in conventional SV models, is able to capture nonlinear dependence in the latent volatility process, and often has a better out-of-sample forecast performance than SV models. The conclusions are illustrated through simulation studies and applications to three financial time series datasets: the US stock market weekly index SP500, the Australian stock weekly index ASX200, and Australian-US dollar daily exchange rates. We argue that there are significant differences in the underlying dynamics between the volatility process of the SP500 and ASX200 datasets and that of the exchange rate dataset. For the stock index data, there is strong evidence of long-term memory and nonlinear dependence in the volatility process, while this is not the case for the exchange rates. A user-friendly software package together with the examples reported in the paper are available at https://github.com/vbayeslab. 
Date:  2019–06 
URL:  http://d.repec.org/n?u=RePEc:arx:papers:1906.02884&r=all 
By:  Gonzalo, Jesús; Pitarakis, JeanYves 
Abstract:  Predictive regressions are a widely used econometric environment for assessing the predictability of economic and financial variables using past values of one or more predictors. The applications considered by practitioners often involve predictors with highly persistent, smoothly varying dynamics, as opposed to the much noisier nature of the variable being predicted. This imbalance tends to affect the accuracy of the estimates of the model parameters and the validity of inferences about them when one uses standard methods that do not explicitly recognise this and related complications. A growing literature aimed at introducing novel techniques specifically designed to produce accurate inferences in such environments has ensued. The frequent use of these predictive regressions in applied work has also led practitioners to question the validity of viewing predictability within a linear setting that ignores the possibility that predictability may occasionally be switched off. This in turn has generated a new stream of research aiming to introduce regime-specific behaviour within predictive regressions in order to explicitly capture phenomena such as episodic predictability. 
Keywords:  Cusum; Structural Breaks; Thresholds; Economic Regime Shifts; Nonlinear Predictability; Nuisance Parameters; Instrumental Variables; Local To Unit Root; Persistence; Predictability 
Date:  2019–07 
URL:  http://d.repec.org/n?u=RePEc:cte:werepe:28554&r=all 