nep-ecm New Economics Papers
on Econometrics
Issue of 2021‒02‒15
twenty papers chosen by
Sune Karlsson
Örebro universitet

  1. A Test of Sufficient Condition for Infinite-step Granger Noncausality in Infinite Order Vector Autoregressive Process By Umberto Triacca; Olivier Damette; Alessandro Giovannelli
  2. The Bootstrap for Network Dependent Processes By Denis Kojevnikov
  3. A Design-Based Perspective on Synthetic Control Methods By Lea Bottmer; Guido Imbens; Jann Spiess; Merrill Warnick
  4. Local linear tie-breaker designs By Dan M. Kluger; Art B. Owen
  5. On an integer-valued stochastic intensity model for time series of counts By Aknouche, Abdelhakim; Dimitrakopoulos, Stefanos
  6. Tree-based Node Aggregation in Sparse Graphical Models By Ines Wilms; Jacob Bien
  7. Adaptive Estimation of Quadratic Functionals in Nonparametric Instrumental Variable Models By Christoph Breunig; Xiaohong Chen
  8. A note on likelihood ratio tests for models with latent variables By Chen, Yunxiao; Moustaki, Irini; Zhang, H
  9. Inference on the New Keynesian Phillips Curve with Very Many Instrumental Variables By Max-Sebastian Dovì
  10. Consistent specification testing under spatial dependence By Abhimanyu Gupta; Xi Qu
  11. Gaussian Process Latent Class Choice Models By Georges Sfeir; Filipe Rodrigues; Maya Abou-Zeid
  12. Discrete Choice Analysis with Machine Learning Capabilities By Youssef M. Aboutaleb; Mazen Danaf; Yifei Xie; Moshe Ben-Akiva
  13. A new class of tail dependence measures and their maximization By Takaaki Koike; Shogo Kato; Marius Hofert
  14. Kernel Density Estimation with Linked Boundary Conditions By Matthew J. Colbrook; Zdravko I. Botev; Karsten Kuritz; Shev MacNamara
  15. Designing multi-model applications with surrogate forecast systems By Smith, Leonard A.; Du, Hailiang; Higgins, Sarah
  16. Nowcasting Monthly GDP with Big Data: a Model Averaging Approach By Tommaso Proietti; Alessandro Giovannelli
  17. Unraveling S&P500 stock volatility and networks -- An encoding and decoding approach By Xiaodong Wang; Fushing Hsieh
  18. Peaks, Gaps, and Time Reversibility of Economic Time Series By Tommaso Proietti
  19. The accuracy of asymmetric GARCH model estimation By Olivier Darné; Amélie Charles
  20. Evaluating the Discrimination Ability of Proper Multivariate Scoring Rules By Carol Alexander; Michael Coulon; Yang Han; Xiaochun Meng

  1. By: Umberto Triacca (University of L'Aquila); Olivier Damette (University of Lorraine); Alessandro Giovannelli (University of L'Aquila)
    Abstract: This paper derives a sufficient condition for noncausality at all forecast horizons (infinite-step noncausality). We propose a test procedure for this sufficient condition. Our procedure presents two main advantages. First, our infinite-step Granger causality analysis is conducted in a more general framework than the procedures proposed in the literature. Second, it involves only linear restrictions under the null, which can be tested using standard F statistics. A simulation study shows that the proposed procedure has reasonable size and good power. Typically, one thousand or more observations are required for the test procedure to perform reasonably well; these are typical sample sizes for financial time series applications. Here, we give a first example of possible applications by considering the Mixture Distribution Hypothesis in the Foreign Exchange Market.
    Keywords: Granger causality,Hypothesis testing,Time series,Vector autoregressive Models
    JEL: C12 C22 C58
    Date: 2020–06–18
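    The testable implication described in the abstract — noncausality expressed as linear zero restrictions checked with a standard F statistic — can be illustrated in the familiar finite-order, one-step case. A minimal Python sketch; the lag order, coefficients, and simulated data are illustrative assumptions, not the paper's infinite-step procedure:

```python
import numpy as np
from scipy import stats

# Simulate a bivariate system in which x Granger-causes y by construction
rng = np.random.default_rng(0)
n = 1000
x = rng.standard_normal(n)
y = np.zeros(n)
for t in range(1, n):
    y[t] = 0.5 * y[t - 1] + 0.4 * x[t - 1] + rng.standard_normal()

p = 2  # VAR lag order (an illustrative choice)
Y = y[p:]
# Restricted model: lags of y only; unrestricted model adds lags of x
Xr = np.column_stack([np.ones(n - p)] + [y[p - j:n - j] for j in range(1, p + 1)])
Xu = np.column_stack([Xr] + [x[p - j:n - j] for j in range(1, p + 1)])

def rss(X):
    beta, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return np.sum((Y - X @ beta) ** 2)

# Standard F statistic for the p linear zero restrictions on the lags of x
df2 = n - p - Xu.shape[1]
F = ((rss(Xr) - rss(Xu)) / p) / (rss(Xu) / df2)
pval = stats.f.sf(F, p, df2)
```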
  2. By: Denis Kojevnikov
    Abstract: This paper focuses on the bootstrap for network dependent processes under conditional $\psi$-weak dependence. Such processes are distinct from other forms of random fields studied in the statistics and econometrics literature so that the existing bootstrap methods cannot be applied directly. We propose a block-based approach and a modification of the dependent wild bootstrap for constructing confidence sets for the mean of a network dependent process. In addition, we establish the consistency of these methods for the smooth function model and provide the bootstrap alternatives to the network heteroskedasticity-autocorrelation consistent (HAC) variance estimator. We find that the modified dependent wild bootstrap and the corresponding variance estimator are consistent under weaker conditions relative to the block-based method, which makes the former approach preferable for practical implementation.
    Date: 2021–01
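    As a rough illustration of the dependent-wild-bootstrap idea — demeaned observations multiplied by auxiliary draws whose covariance mimics the dependence structure — here is a hedged sketch. The function name, the `cov` argument, and the i.i.d. example are assumptions for illustration, not Kojevnikov's exact procedure:

```python
import numpy as np

def dependent_wild_bootstrap_ci(x, cov, n_boot=2000, alpha=0.05, seed=0):
    """Sketch of a dependent wild bootstrap CI for the mean: demeaned
    observations are multiplied by auxiliary Gaussian draws whose
    covariance `cov` mimics the (network) dependence neighborhoods."""
    rng = np.random.default_rng(seed)
    n = len(x)
    xm = x - x.mean()
    L = np.linalg.cholesky(cov + 1e-10 * np.eye(n))  # factor for correlated draws
    draws = np.array([x.mean() + (xm * (L @ rng.standard_normal(n))).mean()
                      for _ in range(n_boot)])
    return np.quantile(draws, [alpha / 2, 1 - alpha / 2])

# i.i.d. special case: identity covariance reduces to an ordinary wild bootstrap
x = np.random.default_rng(1).standard_normal(200) + 1.0
lo, hi = dependent_wild_bootstrap_ci(x, np.eye(200))
```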
  3. By: Lea Bottmer; Guido Imbens; Jann Spiess; Merrill Warnick
    Abstract: Since their introduction in Abadie and Gardeazabal (2003), Synthetic Control (SC) methods have quickly become one of the leading methods for estimating causal effects in observational studies with panel data. Formal discussions often motivate SC methods by the assumption that the potential outcomes were generated by a factor model. Here we study SC methods from a design-based perspective, assuming a model for the selection of the treated unit(s), e.g., random selection as guaranteed in a randomized experiment. We show that SC methods offer benefits even in settings with randomized assignment, and that the design perspective offers new insights into SC methods for observational data. A first insight is that the standard SC estimator is not unbiased under random assignment. We propose a simple modification of the SC estimator that guarantees unbiasedness in this setting and derive its exact, randomization-based, finite sample variance. We also propose an unbiased estimator for this variance. We show in settings with real data that under random assignment this Modified Unbiased Synthetic Control (MUSC) estimator can have a root mean-squared error (RMSE) that is substantially lower than that of the difference-in-means estimator. We show that such an improvement is weakly guaranteed if the treated period is similar to the other periods, for example, if the treated period was randomly selected. The improvement is most likely to be substantial if the number of pre-treatment periods is large relative to the number of control units.
    Date: 2021–01
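    For orientation, the standard SC estimator that the paper modifies fits nonnegative weights summing to one so that a weighted average of control units tracks the treated unit's pre-treatment path. A hedged sketch on simulated data; the paper's Modified Unbiased Synthetic Control (MUSC) estimator itself is not shown:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
T0, J = 30, 8                                       # pre-treatment periods, control units
Y0 = rng.standard_normal((T0, J)).cumsum(axis=0)    # control outcome paths (random walks)
w_true = np.array([0.6, 0.4] + [0.0] * (J - 2))
y1 = Y0 @ w_true + 0.1 * rng.standard_normal(T0)    # treated unit, pre-period

# Standard SC: least squares over the simplex (weights >= 0, summing to 1)
res = minimize(lambda w: np.sum((y1 - Y0 @ w) ** 2),
               np.full(J, 1.0 / J),
               bounds=[(0.0, 1.0)] * J,
               constraints={"type": "eq", "fun": lambda w: w.sum() - 1.0})
w = res.x
synthetic = Y0 @ w   # synthetic control path for the pre-period
```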
  4. By: Dan M. Kluger; Art B. Owen
    Abstract: Tie-breaker experimental designs are hybrids of Randomized Control Trials (RCTs) and Regression Discontinuity Designs (RDDs) in which subjects with moderate scores are placed in an RCT while subjects with extreme scores are deterministically assigned to the treatment or control group. The design maintains the benefits of randomization for causal estimation while avoiding the possibility of excluding the most deserving recipients from the treatment group. The causal effect in a tie-breaker design can be estimated by fitting local linear regressions for both the treatment and control group, as is typically done for RDDs. We study the statistical efficiency of such local linear regression-based causal estimators as a function of $\Delta$, the radius of the interval in which treatment randomization occurs. In particular, we determine the efficiency of the estimator as a function of $\Delta$ for a fixed, arbitrary bandwidth under the assumption of a uniform assignment variable. To generalize beyond uniform assignment variables and asymptotic regimes, we also demonstrate on the Angrist and Lavy (1999) classroom size dataset that prior to conducting an experiment, an experimental designer can estimate the efficiency for various experimental radii choices by using Monte Carlo as long as they have access to the distribution of the assignment variable. For both uniform and triangular kernels, we show that increasing the radius of the randomized-experiment interval will increase the efficiency until the radius is the size of the local linear regression bandwidth, after which no additional efficiency benefits are conferred.
    Date: 2021–01
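    The assignment rule of a tie-breaker design — deterministic treatment/control for extreme scores, randomization within radius Δ of the cutoff — can be sketched as follows (the cutoff at 0 and the uniform score distribution are illustrative assumptions):

```python
import numpy as np

def tie_breaker_assign(score, delta, rng):
    """Tie-breaker assignment: randomize within [-delta, delta] of the
    cutoff (here 0); deterministic treatment/control outside."""
    return np.where(score > delta, 1,                        # high scores: treated
            np.where(score < -delta, 0,                      # low scores: control
                     rng.integers(0, 2, size=len(score))))   # moderate scores: RCT

rng = np.random.default_rng(0)
s = rng.uniform(-1, 1, 10_000)        # assignment variable
z = tie_breaker_assign(s, delta=0.3, rng=rng)
```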
  5. By: Aknouche, Abdelhakim; Dimitrakopoulos, Stefanos
    Abstract: We propose a broad class of count time series models, the mixed Poisson integer-valued stochastic intensity models. The proposed specification encompasses a wide range of conditional distributions of counts. We study its probabilistic structure and design Markov chain Monte Carlo algorithms for two cases: the Poisson and the negative binomial distributions. The methodology is applied to simulated data as well as to various data sets. Model comparison using marginal likelihoods and forecast evaluation using point and density forecasts are also considered.
    Keywords: Markov chain Monte Carlo, mixed Poisson process, parameter-driven models, count time series models.
    JEL: C11 C13 C15 C18 C25 C5 C51 C53 C63
    Date: 2020–01–01
  6. By: Ines Wilms; Jacob Bien
    Abstract: High-dimensional graphical models are often estimated using regularization that is aimed at reducing the number of edges in a network. In this work, we show how even simpler networks can be produced by aggregating the nodes of the graphical model. We develop a new convex regularized method, called the tree-aggregated graphical lasso or tag-lasso, that estimates graphical models that are both edge-sparse and node-aggregated. The aggregation is performed in a data-driven fashion by leveraging side information in the form of a tree that encodes node similarity and facilitates the interpretation of the resulting aggregated nodes. We provide an efficient implementation of the tag-lasso by using the locally adaptive alternating direction method of multipliers and illustrate our proposal's practical advantages in simulation and in applications in finance and biology.
    Date: 2021–01
  7. By: Christoph Breunig; Xiaohong Chen
    Abstract: This paper considers adaptive estimation of quadratic functionals in the nonparametric instrumental variables (NPIV) models. Minimax estimation of a quadratic functional of a NPIV is an important problem in optimal estimation of a nonlinear functional of an ill-posed inverse regression with an unknown operator using one random sample. We first show that a leave-one-out, sieve NPIV estimator of the quadratic functional proposed by \cite{BC2020} attains a convergence rate that coincides with the lower bound previously derived by \cite{ChenChristensen2017}. The minimax rate is achieved by the optimal choice of a key tuning parameter (sieve dimension) that depends on unknown NPIV model features. We next propose a data driven choice of the tuning parameter based on Lepski's method. The adaptive estimator attains the minimax optimal rate in the severely ill-posed case and in the regular, mildly ill-posed case, but up to a multiplicative $\sqrt{\log n}$ in the irregular, mildly ill-posed case.
    Date: 2021–01
  8. By: Chen, Yunxiao; Moustaki, Irini; Zhang, H
    Abstract: The likelihood ratio test (LRT) is widely used for comparing the relative fit of nested latent variable models. Following Wilks’ theorem, the LRT is conducted by comparing the LRT statistic with its asymptotic distribution under the restricted model, a χ² distribution with degrees of freedom equal to the difference in the number of free parameters between the two nested models under comparison. For models with latent variables such as factor analysis, structural equation models and random effects models, however, it is often found that the χ² approximation does not hold. In this note, we show how the regularity conditions of Wilks’ theorem may be violated using three examples of models with latent variables. In addition, a more general theory for LRT is given that provides the correct asymptotic theory for these LRTs. This general theory was first established in Chernoff (J R Stat Soc Ser B (Methodol) 45:404–413, 1954) and discussed in both van der Vaart (Asymptotic statistics, Cambridge, Cambridge University Press, 2000) and Drton (Ann Stat 37:979–1012, 2009), but it does not seem to have received enough attention. We illustrate this general theory with the three examples.
    Keywords: Wilks’ theorem; χ²-distribution; latent variable models; random effects models; dimensionality; tangent cone
    JEL: C1
    Date: 2020–12–21
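    A small numerical illustration of why the naive χ² approximation can fail: when the tested parameter lies on the boundary of the parameter space (e.g. a variance component tested against zero), the classical limit is a chi-bar-square mixture rather than χ² with one degree of freedom. The statistic value below is hypothetical:

```python
from scipy.stats import chi2

lr = 3.2   # hypothetical likelihood-ratio statistic, one parameter tested

# Naive Wilks p-value: chi-square with 1 degree of freedom
p_naive = chi2.sf(lr, df=1)

# Boundary case: the limit is the mixture 0.5*chi2(0) + 0.5*chi2(1),
# so half the chi2(1) tail probability (the chi2(0) component is a
# point mass at zero and contributes nothing to the tail)
p_boundary = 0.5 * chi2.sf(lr, df=1)
```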
  9. By: Max-Sebastian Dovì
    Abstract: Limited-information inference on New Keynesian Phillips Curves (NKPCs) and other single-equation macroeconomic relations is characterised by weak and high-dimensional instrumental variables (IVs). Beyond the efficiency concerns previously raised in the literature, I show by simulation that ad-hoc selection procedures can lead to substantial biases in post-selection inference. I propose a Sup Score test that remains valid under dependent data, arbitrarily weak identification, and a number of IVs that increases exponentially with the sample size. Conducting inference on a standard NKPC with 359 IVs and 179 observations, I find substantially wider confidence sets than those commonly found.
    Date: 2021–01
  10. By: Abhimanyu Gupta; Xi Qu
    Abstract: We propose a series-based nonparametric specification test for a regression function when data are spatially dependent, the `space' being of a general economic or social nature. Dependence can be parametric, parametric with increasing dimension, semiparametric or any combination thereof, thus covering a vast variety of settings. These include spatial error models of varying types and levels of complexity. Under a new smooth spatial dependence condition, our test statistic is asymptotically standard normal. To prove the latter property, we establish a central limit theorem for quadratic forms in linear processes in an increasing dimension setting. Finite sample performance is investigated in a simulation study and empirical examples illustrate the test with real-world data.
    Date: 2021–01
  11. By: Georges Sfeir; Filipe Rodrigues; Maya Abou-Zeid
    Abstract: We present a Gaussian Process - Latent Class Choice Model (GP-LCCM) to integrate a non-parametric class of probabilistic machine learning within discrete choice models (DCMs). Gaussian Processes (GPs) are kernel-based algorithms that incorporate expert knowledge by assuming priors over latent functions rather than priors over parameters, which makes them more flexible in addressing nonlinear problems. By integrating a Gaussian Process within a LCCM structure, we aim to improve discrete representations of unobserved heterogeneity. The proposed model assigns individuals probabilistically to behaviorally homogeneous clusters (latent classes) using GPs and simultaneously estimates class-specific choice models by relying on random utility models. Furthermore, we derive and implement an Expectation-Maximization (EM) algorithm to jointly estimate/infer the hyperparameters of the GP kernel function and the class-specific choice parameters by relying on a Laplace approximation and gradient-based numerical optimization methods, respectively. The model is tested on two different mode choice applications and compared against different LCCM benchmarks. Results show that GP-LCCM allows for a more complex and flexible representation of heterogeneity and improves both in-sample fit and out-of-sample predictive power. Moreover, behavioral and economic interpretability is maintained at the class-specific choice model level, while local interpretation of the latent classes can still be achieved, although the non-parametric character of GPs lessens the transparency of the model.
    Date: 2021–01
  12. By: Youssef M. Aboutaleb; Mazen Danaf; Yifei Xie; Moshe Ben-Akiva
    Abstract: This paper discusses capabilities that are essential to models applied in policy analysis settings and the limitations of direct applications of off-the-shelf machine learning methodologies to such settings. Traditional econometric methodologies for building discrete choice models for policy analysis involve combining data with modeling assumptions guided by subject-matter considerations. Such considerations are typically most useful in specifying the systematic component of random utility discrete choice models but are typically of limited aid in determining the form of the random component. We identify an area where machine learning paradigms can be leveraged, namely in specifying and systematically selecting the best specification of the random component of the utility equations. We review two recent novel applications where mixed-integer optimization and cross-validation are used to algorithmically select optimal specifications for the random utility components of nested logit and logit mixture models subject to interpretability constraints.
    Date: 2021–01
  13. By: Takaaki Koike; Shogo Kato; Marius Hofert
    Abstract: A new class of measures of bivariate tail dependence is proposed, which is defined as a limit of a measure of concordance of the underlying copula restricted to the tail region of interest. The proposed tail dependence measures include tail dependence coefficients as special cases, but capture the extremal relationship between random variables not only along the diagonal but also along all the angles weighted by the so-called tail generating measure. As a result, the proposed tail dependence measures overcome the issue that the tail dependence coefficients underestimate the extent of extreme co-movements. We also consider the so-called maximal and minimal tail dependence measures, defined as the maximum and minimum of the tail dependence measures among all tail generating measures for a given copula. It turns out that the minimal tail dependence measure coincides with the tail dependence coefficient, and the maximal tail dependence measure overestimates the degree of extreme co-movements. We investigate properties, representations and examples of the proposed tail dependence measures, and their performance is demonstrated in a series of numerical experiments. For a fair assessment of tail dependence, and for stability of estimation in small samples, we advocate tail dependence measures weighted over all angles rather than the maximal and minimal ones.
    Date: 2021–01
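    For context, the classical tail dependence coefficient that the new measures generalize can be estimated with a simple plug-in along the diagonal; its restriction to the diagonal is exactly the limitation the paper addresses. A hedged sketch with simulated data (the fixed quantile q is an illustrative finite-sample choice):

```python
import numpy as np

def upper_tail_dependence(u, v, q=0.95):
    """Plug-in estimate of the upper tail dependence coefficient
    lambda_U = lim_{q -> 1} P(V > F_V^{-1}(q) | U > F_U^{-1}(q)),
    evaluated at a fixed finite quantile q."""
    cu, cv = np.quantile(u, q), np.quantile(v, q)
    tail = u > cu
    return np.mean(v[tail] > cv)

rng = np.random.default_rng(0)
n = 100_000
z = rng.standard_normal(n)
x = z + 0.1 * rng.standard_normal(n)      # strongly dependent pair
y = z + 0.1 * rng.standard_normal(n)
indep = rng.standard_normal(n)            # independent of x
```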
  14. By: Matthew J. Colbrook; Zdravko I. Botev; Karsten Kuritz; Shev MacNamara
    Abstract: Kernel density estimation on a finite interval poses an outstanding challenge because of the well-recognized bias at the boundaries of the interval. Motivated by an application in cancer research, we consider a boundary constraint linking the values of the unknown target density function at the boundaries. We provide a kernel density estimator (KDE) that successfully incorporates this linked boundary condition, leading to a non-self-adjoint diffusion process and expansions in nonseparable generalized eigenfunctions. The solution is rigorously analyzed through an integral representation given by the unified transform (or Fokas method). The new KDE possesses many desirable properties, such as consistency, asymptotically negligible bias at the boundaries, and an increased rate of approximation, as measured by the AMISE. We apply our method to the motivating example in biology and provide numerical experiments with synthetic data, including comparisons with state-of-the-art KDEs (which currently cannot handle linked boundary constraints). Results suggest that the new method is fast and accurate. Furthermore, we demonstrate how to build statistical estimators of the boundary conditions satisfied by the target function without a priori knowledge. Our analysis can also be extended to more general boundary conditions that may be encountered in applications.
    Keywords: boundary bias; biological cell cycle; density estimation; diffusion; linked boundary conditions; unified transform
    Date: 2020–12–01
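    The boundary bias the abstract refers to is easy to exhibit with a standard KDE, together with the classical reflection fix for a single boundary. This is only a simple illustration; the paper's linked-boundary estimator handles constraints that reflection cannot:

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)
x = rng.exponential(size=5000)     # true density f(0) = 1, support [0, inf)

# Naive KDE leaks kernel mass below 0, roughly halving the estimate at 0
naive = gaussian_kde(x)
# Reflection correction: mirror the sample across the boundary at 0
reflected = gaussian_kde(np.concatenate([x, -x]))

f0_naive = naive(0.0)[0]
f0_reflected = 2.0 * reflected(0.0)[0]
```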
  15. By: Smith, Leonard A.; Du, Hailiang; Higgins, Sarah
    Abstract: Probabilistic forecasting is common in a wide variety of fields including geoscience, social science, and finance. It is sometimes the case that one has multiple probability forecasts for the same target. How is the information in these multiple nonlinear forecast systems best "combined"? Assuming stationarity, in the limit of a very large forecast-outcome archive, each model-based probability density function can be weighted to form a "multimodel forecast" that will, in expectation, provide at least as much information as the most informative single model forecast system. If one of the forecast systems yields a probability distribution that reflects the distribution from which the outcome will be drawn, Bayesian model averaging will identify this forecast system as the preferred system in the limit as the number of forecast-outcome pairs goes to infinity. In many applications, like those of seasonal weather forecasting, data are precious; the archive is often limited to fewer than 26 entries. In addition, no perfect model is in hand. It is shown that in this case forming a single "multimodel probabilistic forecast" can be expected to prove misleading. These issues are investigated in the surrogate model (here a forecast system) regime, where using probabilistic forecasts of a simple mathematical system allows many limiting behaviors of forecast systems to be quantified and compared with those under more realistic conditions.
    Keywords: EP/K013661/1; EP/K03832X/1
    JEL: C1
    Date: 2020–06–01
  16. By: Tommaso Proietti (CEIS & DEF, University of Rome "Tor Vergata"); Alessandro Giovannelli (DEF, University of Rome "Tor Vergata")
    Abstract: Gross domestic product (GDP) is the most comprehensive and authoritative measure of economic activity. The macroeconomic literature has focused on nowcasting and forecasting this measure at the monthly frequency, using related high frequency indicators. We address the issue of estimating monthly gross domestic product using a large dimensional set of monthly indicators, by pooling the disaggregate estimates arising from simple and feasible bivariate models that consider one indicator at a time in conjunction with GDP. Our base model handles mixed frequency data and ragged-edge data structures with any pattern of missingness. Our methodology makes it possible to distill the common component of the available economic indicators, so that the monthly GDP estimates arise from the projection of the quarterly figures on the space spanned by the common component. The weights used for the combination reflect the ability to nowcast quarterly GDP and are obtained as a function of the regularized estimator of the high-dimensional covariance matrix of the nowcasting errors. A recursive nowcasting and forecasting experiment illustrates that the optimal weights adapt to the information set available in real time and vary according to the phase of the business cycle.
    Keywords: Mixed-Frequency Data, Dynamic Factor Models, State Space Models, Shrinkage
    JEL: C32 C52 C53 E37
    Date: 2020–05–12
  17. By: Xiaodong Wang; Fushing Hsieh
    Abstract: We extend the Hierarchical Factor Segmentation (HFS) algorithm to discover the multiple-volatility-state process hidden within each individual S&P500 stock's return time series. We then develop an associative measure to link stocks into directed networks at various scales of association. Such networks shed light on which stocks would likely stimulate or even promote, if not cause, volatility in other linked stocks. Our computations start by encoding events of large returns on the original time axis, transforming the original return time series into a recurrence-time process on a discrete time axis. By adopting BIC and clustering analysis, we identify potential multiple volatility states, and then apply the extended HFS algorithm to the recurrence time series to discover its underlying volatility state process. Our decoding approach compares favorably with Viterbi's in experiments involving both light- and heavy-tailed distributions. After mapping the volatility state process back to the original time axis, we decode and represent the dynamics of each stock. Association is measured through overlapping concurrent volatility states over a chosen window. Consequently, we establish data-driven associative networks for S&P500 stocks to discover their global dependency groupings with respect to various strengths of links.
    Date: 2021–01
  18. By: Tommaso Proietti (DEF and CEIS, Università di Roma "Tor Vergata")
    Abstract: Locating the running maxima and minima of a time series, and measuring the current deviation from them, generates processes that are analytically relevant for the analysis of the business cycle and for characterizing bull and bear phases in financial markets. The measurement of the time distance from the running peak gives rise to a first-order Markov chain, whose characteristics can be used for testing time reversibility of economic dynamics and specific types of asymmetries in financial markets. The paper derives the time series properties of the gap process and other related processes that arise from the same measurement context, and proposes new nonparametric tests of time reversibility. Empirical examples illustrate their uses for characterizing the depth of a recession and the duration of bull and bear markets.
    Keywords: Markov chains, Business cycles, Recession duration.
    JEL: C22 C58 E32
    Date: 2020–06–17
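    The time distance from the running peak described in the abstract is straightforward to compute; a minimal sketch (the reset-at-new-peak convention is an assumption of this illustration):

```python
import numpy as np

def time_since_peak(y):
    """Number of periods since the running maximum was last attained:
    0 whenever a new peak (or a tie with it) is reached, else the
    previous gap plus one."""
    peaks = np.maximum.accumulate(y)
    gap = np.zeros(len(y), dtype=int)
    for t in range(1, len(y)):
        gap[t] = 0 if y[t] >= peaks[t - 1] else gap[t - 1] + 1
    return gap

y = np.array([1.0, 2.0, 1.5, 1.8, 2.5, 2.4])
g = time_since_peak(y)   # new peaks at t = 0, 1, 4; gaps grow in between
```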
  19. By: Olivier Darné (LEMNA - Laboratoire d'économie et de management de Nantes Atlantique - IEMN-IAE Nantes - Institut d'Économie et de Management de Nantes - Institut d'Administration des Entreprises - Nantes - UN - Université de Nantes - IUML - FR 3473 Institut universitaire Mer et Littoral - UBS - Université de Bretagne Sud - UM - Le Mans Université - UA - Université d'Angers - CNRS - Centre National de la Recherche Scientifique - IFREMER - Institut Français de Recherche pour l'Exploitation de la Mer - UN - Université de Nantes - ECN - École Centrale de Nantes); Amélie Charles (Audencia Business School - Audencia Business School)
    Date: 2020–12–28
  20. By: Carol Alexander; Michael Coulon; Yang Han; Xiaochun Meng
    Abstract: Proper scoring rules are commonly applied to quantify the accuracy of distribution forecasts. Given an observation they assign a scalar score to each distribution forecast, with the lowest expected score attributed to the true distribution. The energy and variogram scores are two rules that have recently gained some popularity in multivariate settings because their computation does not require a forecast to have a parametric density function and so they are broadly applicable. Here we conduct a simulation study to compare the discrimination ability of the energy score and three variogram scores. Compared with other studies, our simulation design is more realistic because it is supported by a historical data set containing commodity prices, currencies and interest rates, and our data generating processes include a diverse selection of models with different marginal distributions, dependence structures, and calibration windows. This facilitates a comprehensive comparison of the performance of proper scoring rules in different settings. To compare the scores we use three metrics: the mean relative score, error rate and a generalised discrimination heuristic. Overall, we find that the variogram score with parameter p=0.5 outperforms the energy score and the other two variogram scores.
    Date: 2021–01
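    For reference, sample versions of the two families of scores compared in the paper can be written in a few lines (unit variogram weights and the degenerate test ensembles are illustrative assumptions). The shifted example also shows a known property: the variogram score is blind to a common location shift, while the energy score is not:

```python
import numpy as np

def energy_score(ens, y):
    """Sample energy score: ens is an (m, d) forecast ensemble, y a (d,)
    observation; lower is better."""
    t1 = np.mean(np.linalg.norm(ens - y, axis=1))
    t2 = np.mean(np.linalg.norm(ens[:, None, :] - ens[None, :, :], axis=2))
    return t1 - 0.5 * t2

def variogram_score(ens, y, p=0.5):
    """Sample variogram score of order p with unit weights: compares
    pairwise component differences of the observation with the
    ensemble-mean pairwise differences."""
    dy = np.abs(y[:, None] - y[None, :]) ** p                          # (d, d)
    de = np.mean(np.abs(ens[:, :, None] - ens[:, None, :]) ** p, axis=0)
    return np.sum((dy - de) ** 2)

y = np.zeros(3)
perfect = np.tile(y, (5, 1))   # degenerate ensemble at the outcome
shifted = np.ones((5, 3))      # ensemble mis-located by a common shift
```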

This nep-ecm issue is ©2021 by Sune Karlsson. It is provided as is without any express or implied warranty. It may be freely redistributed in whole or in part for any purpose. If distributed in part, please include this notice.
General information on the NEP project can be found at . For comments please write to the director of NEP, Marco Novarese, at <>. Put “NEP” in the subject, otherwise your mail may be rejected.
NEP’s infrastructure is sponsored by the School of Economics and Finance of Massey University in New Zealand.