nep-ecm 2019-05-13 papers

on Econometrics

Issue of 2019‒05‒13
twenty papers chosen by
Sune Karlsson
Örebro universitet

Composite Likelihood Estimation of an Autoregressive Panel Probit Model with Random Effects By Kerem Tuzcuoglu
Copula Multivariate GARCH Model with Constrained Hamiltonian Monte Carlo By Martin Burda; Louis Belisle
Instrumental Variable Estimation of Dynamic Linear Panel Data Models with Defactored Regressors and a Multifactor Error Structure By Milda Norkuté; Vasilis Sarafidis; Takashi Yamagata; Guowei Cui
Bias assessment and reduction for the 2SLS estimator in general dynamic simultaneous equations models By Wang, Dandan; Phillips, Garry David Alan
The power of (non-)linear shrinking: a review and guide to covariance matrix estimation By Olivier Ledoit; Michael Wolf
Non-standard inference for augmented double autoregressive models with null volatility coefficients By Feiyu Jiang; Dong Li; Ke Zhu
Modelling with Discretized Ordered Choice Covariates By Felix Chan; Ágoston Reguly; László Mátyás
Estimation of high-dimensional factor models and its application in power data analysis By Xin Shi; Robert Qiu; Tiebin Mi
Asymptotically Valid Bootstrap Inference for Proxy SVARs By Jentsch, Carsten; Lunsford, Kurt Graden
Lasso under Multi-way Clustering: Estimation and Post-selection Inference By Harold D. Chiang; Yuya Sasaki
Estimating the term structure with linear regressions: Getting to the roots of the problem By Adam Golinski; Peter Spencer
A Uniform Bound of the Operator Norm of Random Element Matrices and Operator Norm Minimizing Estimation By Hyungsik Roger Moon
The analysis of marked and weighted empirical processes of estimated residuals By Vanessa Berenguer-Rico; Søren Johansen; Bent Nielsen
A new approach to dating the reference cycle By Máximo Camacho; María Dolores Gadea; Ana Gómez Loscos
IFRS9 Expected Credit Loss Estimation: Advanced Models for Estimating Portfolio Loss and Weighting Scenario Losses By Yang, Bill Huajian; Wu, Biao; Cui, Kaijie; Du, Zunwei; Fei, Glenn
The Likelihood of Mixed Hitting Times By Jaap H. Abbring; Tim Salimans
Distance-Based Metrics: A Bayesian Solution to the Power and Extreme-Error Problems in Asset-Pricing Tests By Amit Goyal; Zhongzhi Lawrence He; Sahn-Wook Huh
Decision Making with Machine Learning and ROC Curves By Kai Feng; Han Hong; Ke Tang; Jingyuan Wang
Demand and Welfare Analysis in Discrete Choice Models with Social Interactions By Bhattacharya, Debopam; Dupas, Pascaline; Kanaya, Shin
p-Hacking: Evidence from Two Million Trading Strategies By Tarun Chordia; Amit Goyal; Alessio Saretto

Composite Likelihood Estimation of an Autoregressive Panel Probit Model with Random Effects

By:	Kerem Tuzcuoglu
Abstract:	Modeling and estimating persistent discrete data can be challenging. In this paper, we use an autoregressive panel probit model where the autocorrelation in the discrete variable is driven by the autocorrelation in the latent variable. In such a non-linear model, the autocorrelation in an unobserved variable results in an intractable likelihood containing high-dimensional integrals. To tackle this problem, we use composite likelihoods that involve a much lower order of integration. However, parameter identification becomes problematic since the information employed in lower dimensional distributions may not be rich enough for identification. Therefore, we characterize types of composite likelihoods that are valid for this model and study conditions under which the parameters can be identified. Moreover, we provide consistency and asymptotic normality results of the pairwise composite likelihood estimator and conduct Monte Carlo simulations to assess its finite-sample performance. Finally, we apply our method to analyze credit ratings. The results indicate a significant improvement in the estimated transition probabilities between rating classes compared with static models.
Keywords:	Credit risk management; Econometric and statistical methods; Economic models
JEL:	C23 C25 C58 G24
Date:	2019–05
URL:	http://d.repec.org/n?u=RePEc:bca:bocawp:19-16&r=all

Copula Multivariate GARCH Model with Constrained Hamiltonian Monte Carlo

By:	Martin Burda; Louis Belisle
Abstract:	The Copula Multivariate GARCH (CMGARCH) model is based on a dynamic copula function with time-varying parameters. It is particularly suited for modelling dynamic dependence of non-elliptically distributed financial returns series. The model allows for capturing more flexible dependence patterns than a multivariate GARCH model and also generalizes static copula dependence models. Nonetheless, the model is subject to a number of parameter constraints that ensure positivity of variances and covariance stationarity of the modeled stochastic processes. As such, the resulting distribution of parameters of interest is highly irregular, characterized by skewness, asymmetry, and truncation, hindering the applicability and accuracy of asymptotic inference. In this paper, we propose Bayesian analysis of the CMGARCH model based on Constrained Hamiltonian Monte Carlo (CHMC), which has been shown in other contexts to yield efficient inference on complicated constrained dependence structures. In the CMGARCH context, we contrast CHMC with traditional random-walk sampling used in the previous literature and highlight the benefits of CHMC for applied researchers. We estimate the posterior mean, median and Bayesian confidence intervals for the coefficients of tail dependence. The analysis is performed in an application to a recent portfolio of S&P500 financial asset returns.
Keywords:	Dynamic conditional volatility, varying correlation model, Markov Chain Monte Carlo
JEL:	C11 C15 C32 C63
Date:	2019–04–29
URL:	http://d.repec.org/n?u=RePEc:tor:tecipa:tecipa-638&r=all

Instrumental Variable Estimation of Dynamic Linear Panel Data Models with Defactored Regressors and a Multifactor Error Structure

By:	Milda Norkuté; Vasilis Sarafidis; Takashi Yamagata; Guowei Cui
Abstract:	This paper develops two instrumental variable (IV) estimators for dynamic panel data models with exogenous covariates and a multifactor error structure when both cross-sectional and time series dimensions, N and T respectively, are large. Our approach initially projects out the common factors from the exogenous covariates of the model, and constructs instruments based on these defactored covariates. For models with homogeneous slope coefficients, we propose a two-step IV estimator: the first-step IV estimator is obtained using the defactored covariates as instruments. In the second step, the entire model is defactored by the factors extracted from the residuals of the first-step estimation, and the final IV estimator is then obtained. For models with heterogeneous slope coefficients, we propose a mean-group type estimator, which is the cross-sectional average of first-step IV estimators of cross-section specific slopes. It is noteworthy that our estimators do not require us to seek instrumental variables outside the model. Furthermore, our estimators are linear, hence computationally robust and inexpensive. Moreover, they require no bias correction, and they are not subject to the small sample bias of least squares type estimators. The finite sample performances of the proposed estimators and associated statistical tests are investigated, and the results show that the estimators and the tests perform well even for small N and T.
Date:	2018–02
URL:	http://d.repec.org/n?u=RePEc:dpr:wpaper:1019r&r=all

Bias assessment and reduction for the 2SLS estimator in general dynamic simultaneous equations models

By:	Wang, Dandan; Phillips, Garry David Alan
Abstract:	We consider the bias of the 2SLS estimator in general dynamic simultaneous equation models with g endogenous regressors. By using asymptotic expansion techniques we approximate 2SLS coefficient estimation bias under innovation errors, p lagged-dependent variables and strongly-exogenous explanatory variables. The large-T approximation of the structural-form bias is then used to construct corrected estimators for the parameters of interest in the general DSEM (C2SLS). Simulations show that the C2SLS gives almost unbiased estimators and low mean squared errors. Alternatively, the numerical bootstrap method results suggest that the non-parametric bootstrap could be used in 2SLS for improving estimation in general DSEM.
Keywords:	C2sls; 2sls; Monte Carlo Simulations; Bootstrap; Bias Correction; Asymptotic Approximations; General Dynamic Simultaneous Equations Model
JEL:	C32 C13
Date:	2019–04–29
URL:	http://d.repec.org/n?u=RePEc:cte:wsrepe:28322&r=all

The power of (non-)linear shrinking: a review and guide to covariance matrix estimation

By:	Olivier Ledoit; Michael Wolf
Abstract:	Many econometric and data-science applications require a reliable estimate of the covariance matrix, such as Markowitz portfolio selection. When the number of variables is of the same magnitude as the number of observations, this constitutes a difficult estimation problem; the sample covariance matrix certainly will not do. In this paper, we review our work in this area going back 15+ years. We have promoted various shrinkage estimators, which can be classified into linear and nonlinear. Linear shrinkage is simpler to understand, to derive, and to implement. But nonlinear shrinkage can deliver another level of performance improvement, especially if overlaid with stylized facts such as time-varying co-volatility or factor models.
Keywords:	Dynamic conditional correlations, factor models, large-dimensional asymptotics, Markowitz portfolio selection, rotation equivariance
JEL:	C13 C58 G11
Date:	2019–05
URL:	http://d.repec.org/n?u=RePEc:zur:econwp:323&r=all
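
The linear shrinkage idea in the abstract above can be illustrated with a small sketch. It assumes the shrinkage target is the scaled identity matrix (one common choice) and takes the intensity `delta` as a user-supplied number; both are assumptions made for illustration, whereas the estimators reviewed in the paper choose the intensity from the data. The function names are hypothetical.

```python
def sample_covariance(rows):
    """Sample covariance matrix (divisor n) of a list of observation rows."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    return [[sum((r[i] - means[i]) * (r[j] - means[j]) for r in rows) / n
             for j in range(p)] for i in range(p)]


def linear_shrinkage(cov, delta):
    """Convex combination (1 - delta) * cov + delta * mu * I.

    mu is the average variance, so the trace of the matrix is preserved.
    delta in [0, 1] is a hypothetical fixed intensity, not an estimated one.
    """
    if not 0.0 <= delta <= 1.0:
        raise ValueError("delta must lie in [0, 1]")
    p = len(cov)
    mu = sum(cov[i][i] for i in range(p)) / p
    return [[(1.0 - delta) * cov[i][j] + (delta * mu if i == j else 0.0)
             for j in range(p)] for i in range(p)]


# Two perfectly collinear variables: the sample covariance matrix is singular.
data = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]]
s = sample_covariance(data)
shrunk = linear_shrinkage(s, delta=0.5)
```

With any `delta` above zero the shrunk matrix is invertible even though the sample covariance matrix is not, which is the basic reason shrinkage helps when the number of variables is close to the number of observations.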

Non-standard inference for augmented double autoregressive models with null volatility coefficients

By:	Feiyu Jiang; Dong Li; Ke Zhu
Abstract:	This paper considers an augmented double autoregressive (DAR) model, which allows null volatility coefficients to circumvent the over-parameterization problem in the DAR model. Since the volatility coefficients might be on the boundary, the statistical inference methods based on the Gaussian quasi-maximum likelihood estimation (GQMLE) become non-standard, and their asymptotics require the data to have a finite sixth moment, which narrows the applicable scope in studying heavy-tailed data. To overcome this deficiency, this paper develops a systematic statistical inference procedure based on the self-weighted GQMLE for the augmented DAR model. Except for the Lagrange multiplier test statistic, the Wald, quasi-likelihood ratio and portmanteau test statistics are all shown to have non-standard asymptotics. The entire procedure is valid as long as the data is stationary, and its usefulness is illustrated by simulation studies and one real example.
Date:	2019–05
URL:	http://d.repec.org/n?u=RePEc:arx:papers:1905.01798&r=all

Modelling with Discretized Ordered Choice Covariates

By:	Felix Chan; Ágoston Reguly; László Mátyás
Abstract:	This paper deals with econometric models where some (or all) explanatory variables (or covariates) are observed as discretized ordered choices. Such variables are continuous in theory but are not observed in that form: their distribution is unknown, and only a set of discrete choices is observed. We explore how such variables influence inference; more precisely, we show that this leads to a very special form of measurement error, and consequently to endogeneity bias. We then propose appropriate sub-sampling and instrumental variables (IV) estimation methods to deal with the problem.
Date:	2019–05–02
URL:	http://d.repec.org/n?u=RePEc:ceu:econwp:2019_2&r=all

Estimation of high-dimensional factor models and its application in power data analysis

By:	Xin Shi; Robert Qiu; Tiebin Mi
Abstract:	In dealing with high-dimensional data, factor models are often used for reducing dimensions and extracting relevant information. The spectrum of covariance matrices from power data exhibits two aspects: 1) bulk, which arises from random noise or fluctuations and 2) spikes, which represent factors caused by anomaly events. In this paper, we propose a new approach to the estimation of high-dimensional factor models, minimizing the distance between the empirical spectral density (ESD) of covariance matrices of the residuals of power data that are obtained by subtracting principal components and the limiting spectral density (LSD) from a multiplicative covariance structure model. The free probability techniques in random matrix theory (RMT) are used to calculate the spectral density of the multiplicative covariance model, which efficiently solves the computational difficulties. The proposed approach connects the estimation of the number of factors to the LSD of covariance matrices of the residuals, which provides estimators of the number of factors and the correlation structure information in the residuals. Because power data contain a lot of measurement noise and the correlation structure of the residuals is complex, the approach approximates the ESD of covariance matrices of the residuals through a multiplicative covariance model, which avoids making crude assumptions or simplifications on the complex structure of the data. Theoretical studies show the proposed approach is robust to noise and sensitive to the presence of weak factors. Synthetic data from the IEEE 118-bus power system are used to validate the effectiveness of the approach. Furthermore, the application to the analysis of the real-world online monitoring data in a power grid shows that the estimators in the approach can be used to indicate the system states.
Date:	2019–05
URL:	http://d.repec.org/n?u=RePEc:arx:papers:1905.02061&r=all

Asymptotically Valid Bootstrap Inference for Proxy SVARs

By:	Jentsch, Carsten (TU Dortmund University); Lunsford, Kurt Graden (Federal Reserve Bank of Cleveland)
Abstract:	Proxy structural vector autoregressions identify structural shocks in vector autoregressions with external variables that are correlated with the structural shocks of interest but uncorrelated with all other structural shocks. We provide asymptotic theory for this identification approach under mild α-mixing conditions that cover a large class of uncorrelated, but possibly dependent innovation processes, including conditional heteroskedasticity. We prove consistency of a residual-based moving block bootstrap for inference on statistics such as impulse response functions and forecast error variance decompositions. Wild bootstraps are proven to be generally invalid for these statistics and their coverage rates can be badly and persistently mis-sized.
Keywords:	External Instruments; Mixing; Proxy Variables; Residual-Based Moving Block Bootstrap; Structural Vector Autoregression; Wild Bootstrap
JEL:	C30 C32
Date:	2019–05–03
URL:	http://d.repec.org/n?u=RePEc:fip:fedcwq:190800&r=all
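
The resampling step behind a moving block bootstrap, as named in the abstract above, can be sketched as follows. This is only the generic block-resampling step applied to a single series; it is not the authors' full procedure, which also involves re-estimating the VAR and handling the proxy variable. The function name and the block length are assumptions for illustration.

```python
import random


def moving_block_bootstrap(series, block_length, rng):
    """Resample `series` by concatenating randomly drawn overlapping blocks.

    Blocks of `block_length` consecutive observations are drawn with
    replacement, which keeps the short-run dependence inside each block.
    """
    n = len(series)
    if not 1 <= block_length <= n:
        raise ValueError("block_length must be between 1 and len(series)")
    n_starts = n - block_length + 1      # number of overlapping blocks
    n_blocks = -(-n // block_length)     # ceil(n / block_length)
    resampled = []
    for _ in range(n_blocks):
        start = rng.randrange(n_starts)
        resampled.extend(series[start:start + block_length])
    return resampled[:n]                 # trim to the original length


residuals = [float(t) for t in range(10)]   # placeholder values, not real residuals
draw = moving_block_bootstrap(residuals, block_length=3, rng=random.Random(0))
```

Drawing whole blocks rather than single observations is what lets the bootstrap preserve dependence such as conditional heteroskedasticity within each block.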

Lasso under Multi-way Clustering: Estimation and Post-selection Inference

By:	Harold D. Chiang; Yuya Sasaki
Abstract:	This paper studies regression models with lasso when data is sampled under multi-way clustering. First, we establish the convergence rates for the lasso and post-lasso estimators. Second, we propose a novel inference method based on a post-double-selection procedure and show its asymptotic validity. Our procedure can be easily implemented with existing statistical packages. Simulation results demonstrate that the proposed procedure works well in finite samples.
Date:	2019–05
URL:	http://d.repec.org/n?u=RePEc:arx:papers:1905.02107&r=all

Estimating the term structure with linear regressions: Getting to the roots of the problem

By:	Adam Golinski; Peter Spencer
Abstract:	Linear estimators of the affine term structure model are inconsistent since they cannot reproduce the factors used in estimation. This is a serious handicap empirically, giving a worse fit than the conventional ML estimator that ensures consistency. We show that a simple self-consistent estimator can be constructed using the eigenvalue decomposition of a regression estimator. The remaining parameters of the model follow analytically. The fit of this model is virtually indistinguishable from that of the ML estimator. We apply the method to estimate various models of U.S. Treasury yields and a joint model of the U.S. and German yield curves.
Keywords:	term structure, linear regression estimators, self-consistent model, estimation methods, two-country model
JEL:	C13 G12
Date:	2019–05
URL:	http://d.repec.org/n?u=RePEc:yor:yorken:19/05&r=all

A Uniform Bound of the Operator Norm of Random Element Matrices and Operator Norm Minimizing Estimation

By:	Hyungsik Roger Moon
Abstract:	In this paper, we derive a uniform stochastic bound of the operator norm (or equivalently, the largest singular value) of random matrices whose elements are indexed by parameters. As an application, we propose a new estimator that minimizes the operator norm of the matrix that consists of the moment functions. We show the consistency of the estimator.
Date:	2019–05
URL:	http://d.repec.org/n?u=RePEc:arx:papers:1905.01096&r=all
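
The abstract above uses the fact that the operator norm of a matrix equals its largest singular value. A minimal sketch of computing it by power iteration on A'A is given below; the function name, the iteration count and the random starting vector are assumptions for illustration, and the sketch says nothing about the estimator proposed in the paper.

```python
import math
import random


def operator_norm(matrix, iterations=500, seed=0):
    """Approximate the largest singular value of `matrix` by power iteration.

    Repeatedly applies A^T A to a unit vector; at convergence the vector is
    the leading right singular vector and ||A^T A v|| equals sigma_max ** 2.
    """
    rows, cols = len(matrix), len(matrix[0])
    rng = random.Random(seed)
    # A Gaussian start is almost surely not orthogonal to the leading vector.
    v = [rng.gauss(0.0, 1.0) for _ in range(cols)]
    scale = math.sqrt(sum(x * x for x in v))
    v = [x / scale for x in v]
    sigma = 0.0
    for _ in range(iterations):
        av = [sum(matrix[i][j] * v[j] for j in range(cols)) for i in range(rows)]
        atav = [sum(matrix[i][j] * av[i] for i in range(rows)) for j in range(cols)]
        norm = math.sqrt(sum(x * x for x in atav))
        if norm == 0.0:          # A maps v to zero, e.g. the zero matrix
            return 0.0
        v = [x / norm for x in atav]
        sigma = math.sqrt(norm)
    return sigma
```

For a diagonal matrix the result should match the largest absolute diagonal entry, which gives a quick sanity check. Convergence is slow when the two largest singular values are close, so a production implementation would normally rely on a library SVD routine instead.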

The analysis of marked and weighted empirical processes of estimated residuals

By:	Vanessa Berenguer-Rico (University of Oxford); Søren Johansen (University of Copenhagen and CREATES); Bent Nielsen (University of Oxford)
Abstract:	An extended and improved theory is presented for marked and weighted empirical processes of residuals of time series regressions. The theory is motivated by 1-step Huber-skip estimators, where a set of good observations is selected using an initial estimator and an updated estimator is found by applying least squares to the selected observations. In this case, the weights and marks represent powers of the regressors and the regression errors, respectively. The inclusion of marks is a non-trivial extension to previous theory and requires refined martingale arguments.
Keywords:	1-step Huber-skip, Non-stationarity, Robust Statistics, Stationarity
JEL:	C13
Date:	2019–04–29
URL:	http://d.repec.org/n?u=RePEc:aah:create:2019-06&r=all
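
The 1-step Huber-skip idea described in the abstract above (select good observations with an initial estimator, then apply least squares to the selected observations) can be sketched for a simple regression with one regressor. The choice of full-sample least squares as the initial estimator, the root-mean-square residual as the scale, and the cutoff of 2 are all assumptions made for this illustration, not choices taken from the paper.

```python
def ols(x, y):
    """Least squares fit of y = intercept + slope * x; returns (intercept, slope)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def one_step_huber_skip(x, y, cutoff=2.0):
    """Initial fit, skip observations with large residuals, refit on the rest.

    Returns ((intercept, slope), kept_indices).
    """
    intercept0, slope0 = ols(x, y)                      # initial estimator
    residuals = [yi - intercept0 - slope0 * xi for xi, yi in zip(x, y)]
    scale = (sum(r * r for r in residuals) / len(residuals)) ** 0.5
    kept = [i for i, r in enumerate(residuals) if abs(r) <= cutoff * scale]
    updated = ols([x[i] for i in kept], [y[i] for i in kept])
    return updated, kept


# Hypothetical data: the line y = 1 + 2x with one gross outlier at x = 5.
x = [float(i) for i in range(10)]
y = [1.0 + 2.0 * xi for xi in x]
y[5] = 100.0
(intercept, slope), kept = one_step_huber_skip(x, y)
```

In this example the outlier is the only observation skipped, so the updated fit recovers the underlying line, whereas the initial full-sample fit is pulled away from it.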

A new approach to dating the reference cycle

By:	Máximo Camacho; María Dolores Gadea (University of Zaragoza); Ana Gómez Loscos (Banco de España)
Abstract:	This paper proposes a new approach to the analysis of the reference cycle turning points, defined on the basis of the specific turning points of a broad set of coincident economic indicators. Each individual pair of specific peaks and troughs from these indicators is viewed as a realization of a mixture of an unspecified number of separate bivariate Gaussian distributions whose different means are the reference turning points. These dates break the sample into separate reference cycle phases, whose shifts are modeled by a hidden Markov chain. The transition probability matrix is constrained so that the specification is equivalent to a multiple changepoint model. Bayesian estimation techniques for finite Markov mixture models are suggested to estimate the model. Several Monte Carlo experiments are used to show the accuracy of the model in dating reference cycles that suffer from short phases, uncertain turning points, small samples and asymmetric cycles. In the empirical section, we show the high performance of our approach in identifying the US reference cycle, with little difference from the timing of the turning point dates established by the NBER. In a pseudo real-time analysis, we also show the good performance of this methodology in terms of accuracy and speed of detection of turning point dates.
Keywords:	business cycles, turning points, finite mixture models
JEL:	E32 C22 E27
Date:	2019–05
URL:	http://d.repec.org/n?u=RePEc:bde:wpaper:1914&r=all

IFRS9 Expected Credit Loss Estimation: Advanced Models for Estimating Portfolio Loss and Weighting Scenario Losses

By:	Yang, Bill Huajian; Wu, Biao; Cui, Kaijie; Du, Zunwei; Fei, Glenn
Abstract:	Estimation of portfolio expected credit loss is required for IFRS9 regulatory purposes. It starts with the estimation of scenario losses at loan level, which are then aggregated and weighted by scenario probabilities to obtain the portfolio expected loss. This estimated loss can vary significantly, depending on the levels of loss severity generated by the IFRS9 models, and the probability weights chosen. There is a need for a quantitative approach for determining the weights for scenario losses. In this paper, we propose a model to estimate the expected portfolio losses brought by recession risk, and a quantitative approach for determining the scenario weights. The model and approach are validated by an empirical example, where we stress portfolio expected loss by recession risk, and calculate the scenario weights accordingly.
Keywords:	Scenario weight, stressed expected credit loss, loss severity, recession probability, Vasicek distribution, probit mixed model
JEL:	C02 C1 C10 C13 C18 C22 C32 C46 C51 C52 C53 G1 G18 G31 G32 G38
Date:	2019–04–18
URL:	http://d.repec.org/n?u=RePEc:pra:mprapa:93634&r=all
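
The aggregation described in the abstract above (loan-level scenario losses summed to portfolio level and then weighted by scenario probabilities) amounts to simple arithmetic, sketched below. The scenario names, loss figures and weights are hypothetical illustration values, and the function names are assumptions; the paper's contribution is how the weights are determined, which this sketch does not attempt.

```python
def portfolio_scenario_losses(loan_losses):
    """Sum loan-level losses into one portfolio loss per scenario.

    `loan_losses` is a list of dicts mapping scenario name -> loss for one loan.
    """
    totals = {}
    for loan in loan_losses:
        for scenario, loss in loan.items():
            totals[scenario] = totals.get(scenario, 0.0) + loss
    return totals


def expected_credit_loss(scenario_losses, scenario_weights):
    """Probability-weighted sum of portfolio scenario losses."""
    if set(scenario_losses) != set(scenario_weights):
        raise ValueError("losses and weights must cover the same scenarios")
    if abs(sum(scenario_weights.values()) - 1.0) > 1e-9:
        raise ValueError("scenario weights must sum to 1")
    return sum(scenario_losses[s] * scenario_weights[s] for s in scenario_losses)


# Hypothetical two-loan portfolio and scenario weights.
loans = [
    {"upside": 20.0, "base": 40.0, "downside": 100.0},
    {"upside": 40.0, "base": 60.0, "downside": 150.0},
]
weights = {"upside": 0.2, "base": 0.5, "downside": 0.3}
ecl = expected_credit_loss(portfolio_scenario_losses(loans), weights)
```

Because the downside loss is much larger than the others, even a small shift in its weight moves the result noticeably, which is why the abstract stresses a quantitative basis for choosing the weights.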

The Likelihood of Mixed Hitting Times

By:	Jaap H. Abbring; Tim Salimans
Abstract:	We present a method for computing the likelihood of a mixed hitting-time model that specifies durations as the first time a latent Lévy process crosses a heterogeneous threshold. This likelihood is not generally known in closed form, but its Laplace transform is. Our approach to its computation relies on numerical methods for inverting Laplace transforms that exploit special properties of the first passage times of Lévy processes. We use our method to implement a maximum likelihood estimator of the mixed hitting-time model in MATLAB. We illustrate the application of this estimator with an analysis of Kennan's (1985) strike data.
Date:	2019–05
URL:	http://d.repec.org/n?u=RePEc:arx:papers:1905.03463&r=all

Distance-Based Metrics: A Bayesian Solution to the Power and Extreme-Error Problems in Asset-Pricing Tests

By:	Amit Goyal (University of Lausanne); Zhongzhi Lawrence He (Brock University, Goodman School of Business); Sahn-Wook Huh (State University of New York (SUNY) - Department of Finance)
Abstract:	We propose a unified set of distance-based performance metrics that address the power and extreme-error problems inherent in traditional measures for asset-pricing tests. From a Bayesian perspective, the distance metrics coherently incorporate both pricing errors and their standard errors. Measured in units of return, they have an economic interpretation as the minimum cost of holding a dogmatic belief in a model. Our metrics identify the Fama and French (2015) factor model (augmented with the momentum factor and/or without the value factor) as the best model and thus highlight the importance of the momentum factor. In contrast, the traditional alpha-based statistics often lead to inconsistent and counter-intuitive model rankings.
Keywords:	Asset-Pricing Tests, Power Problem, Extreme-Error Problem, Distance-Based Metrics, Optimal Transport Theory, Bayesian Interpretations, Model Comparisons and Rankings
JEL:	C11 G11 G12
Date:	2018–12
URL:	http://d.repec.org/n?u=RePEc:chf:rpseri:rp1878&r=all

Decision Making with Machine Learning and ROC Curves

By:	Kai Feng; Han Hong; Ke Tang; Jingyuan Wang
Abstract:	The Receiver Operating Characteristic (ROC) curve is a representation of the statistical information discovered in binary classification problems and is a key concept in machine learning and data science. This paper studies the statistical properties of ROC curves and their implications for model selection. We analyze the implications of different models of incentive heterogeneity and information asymmetry on the relation between human decisions and the ROC curves. Our theoretical discussion is illustrated in the context of a large data set of pregnancy outcomes and doctor diagnoses from the Pre-Pregnancy Checkups of reproductive-age couples in Henan Province provided by the Chinese Ministry of Health.
Date:	2019–05
URL:	http://d.repec.org/n?u=RePEc:arx:papers:1905.02810&r=all
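
For readers unfamiliar with the object studied in the abstract above, the following sketch builds an empirical ROC curve from classifier scores and binary labels and computes the area under it. The function names and the toy scores are assumptions for illustration; nothing here reproduces the paper's analysis.

```python
def roc_points(scores, labels):
    """Return (false positive rate, true positive rate) points.

    The decision threshold is swept from high to low; labels are 1 (positive)
    or 0 (negative). Tied scores produce a single point.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one positive and one negative label")
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    true_pos = false_pos = 0
    for idx, (score, label) in enumerate(ranked):
        if label == 1:
            true_pos += 1
        else:
            false_pos += 1
        # Emit a point only after the last observation sharing this score.
        if idx + 1 == len(ranked) or ranked[idx + 1][0] != score:
            points.append((false_pos / negatives, true_pos / positives))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


curve = roc_points([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 0])
area = auc(curve)   # 0.75: three of the four positive/negative pairs are ranked correctly
```

Each point on the curve corresponds to one decision threshold, so choosing a point is equivalent to choosing how to trade off false positives against false negatives.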

Demand and Welfare Analysis in Discrete Choice Models with Social Interactions

By:	Bhattacharya, Debopam; Dupas, Pascaline; Kanaya, Shin
Abstract:	Many real-life settings of consumer choice involve social interactions, causing targeted policies to have spillover effects. This paper develops novel empirical tools for analyzing demand and welfare effects of policy interventions in binary choice settings with social interactions. Examples include subsidies for health product adoption and vouchers for attending a high-achieving school. We establish the connection between econometrics of large games and Brock-Durlauf-type interaction models, under both I.I.D. and spatially correlated unobservables. We develop new convergence results for associated beliefs and estimates of preference parameters under increasing domain spatial asymptotics. Next, we show that even with fully parametric specifications and unique equilibrium, choice data, which are sufficient for counterfactual demand prediction under interactions, are insufficient for welfare calculations. This is because distinct underlying mechanisms producing the same interaction coefficient can imply different welfare effects and deadweight-loss from a policy intervention. Standard index-restrictions imply distribution-free bounds on welfare. We illustrate our results using experimental data on mosquito-net adoption in rural Kenya.
Date:	2019–04
URL:	http://d.repec.org/n?u=RePEc:cpr:ceprdp:13707&r=all

p-Hacking: Evidence from Two Million Trading Strategies

By:	Tarun Chordia (Emory University - Department of Finance); Amit Goyal (University of Lausanne); Alessio Saretto (University of Texas at Dallas - School of Management - Department of Finance & Managerial Economics)
Abstract:	We implement a data mining approach to generate about 2.1 million trading strategies. This large set of strategies serves as a laboratory to evaluate the seriousness of p-hacking and data snooping in finance. We apply multiple hypothesis testing techniques that account for cross-correlations in signals and returns to produce t-statistic thresholds that control the proportion of false discoveries. We find that the difference in rejection rates produced by single and multiple hypothesis testing is such that most rejections of the null of no outperformance under single hypothesis testing are likely false (i.e., we find a very high rate of type I errors). Combining statistical criteria with economic considerations, we find that a remarkably small number of strategies survive our thorough vetting procedure. Even these surviving strategies have no theoretical underpinnings. Overall, p-hacking is a serious problem and, correcting for it, outperforming trading strategies are rare.
Keywords:	Hypothesis testing, False discoveries, Trading strategies
JEL:	G10 G11 G12
Date:	2017–08
URL:	http://d.repec.org/n?u=RePEc:chf:rpseri:rp1737&r=all
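
The gap between single and multiple hypothesis testing described in the abstract above can be illustrated with the Benjamini-Hochberg step-up rule, a standard procedure for controlling the false discovery rate. It is used here as a simple stand-in: it is justified under independence or certain forms of positive dependence, whereas the paper applies techniques that account for cross-correlations. The p-values below are hypothetical.

```python
def benjamini_hochberg(p_values, fdr=0.05):
    """Return the sorted indices of hypotheses rejected by the BH step-up rule.

    With m hypotheses, find the largest rank k such that the k-th smallest
    p-value is at most fdr * k / m, and reject the k smallest p-values.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= fdr * rank / m:
            cutoff_rank = rank
    return sorted(order[:cutoff_rank])


# Hypothetical p-values for five strategies.
p_values = [0.001, 0.045, 0.035, 0.2, 0.009]
single_test = [i for i, p in enumerate(p_values) if p <= 0.05]   # [0, 1, 2, 4]
multiple_test = benjamini_hochberg(p_values, fdr=0.05)           # [0, 4]
```

Testing each strategy on its own at the 5% level rejects four of the five nulls, while the multiple-testing rule keeps only two, which mirrors in miniature the pattern the abstract reports.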

This nep-ecm issue is ©2019 by Sune Karlsson. It is provided as is without any express or implied warranty. It may be freely redistributed in whole or in part for any purpose. If distributed in part, please include this notice.

General information on the NEP project can be found at http://nep.repec.org. For comments please write to the director of NEP, Marco Novarese at <director@nep.repec.org>. Put “NEP” in the subject, otherwise your mail may be rejected.

NEP’s infrastructure is sponsored by the School of Economics and Finance of Massey University in New Zealand.