nep-ecm New Economics Papers
on Econometrics
Issue of 2021‒06‒28
fourteen papers chosen by
Sune Karlsson
Örebro universitet

  1. Transformed Regression-based Long-Horizon Predictability Tests By Demetrescu, Matei; Rodrigues, Paulo MM; Taylor, AM Robert
  2. Dynamic Factor Copula Models with Estimated Cluster Assignments By Dong Hwan Oh; Andrew J. Patton
  3. Exponential Tilting for Zero-inflated Interval Regression with Applications to Cyber Security Survey Data By Cristian Roner; Claudia Di Caterina; Davide Ferrari
  4. A Note on the Accuracy of Variational Bayes in State Space Models: Inference and Prediction By David T. Frazier; Ruben Loaiza-Maya; Gael M. Martin
  5. Efficient Black-Box Importance Sampling for VaR and CVaR Estimation By Anand Deo; Karthyek Murthy
  6. Sensitivity of LATE Estimates to Violations of the Monotonicity Assumption By Claudia Noack
  7. Forecasting VaR and ES using a joint quantile regression and implications in portfolio allocation By Luca Merlo; Lea Petrella; Valentina Raponi
  8. Long-distance mode choice model estimation using mobile phone network data By Andersson, Angelica; Engelson, Leonid; Börjesson, Maria; Daly, Andrew; Kristoffersson, Ida
  9. Superconsistency of tests in high dimensions By Anders Bredahl Kock; David Preinerstorfer
  10. Constrained Classification and Policy Learning By Toru Kitagawa; Shosei Sakaguchi; Aleksey Tetenov
  11. Parsimony in model selection: tools for assessing fit propensity By Falk, Carl F.; Muthukrishna, Michael
  12. Dynamic Asymmetric Causality Tests with an Application By Abdulnasser Hatemi-J
  13. A Central Limit Theorem, Loss Aversion and Multi-Armed Bandits By Zengjing Chen; Larry G. Epstein; Guodong Zhang
  14. Generative Adversarial Networks in finance: an overview By Florian Eckerli; Joerg Osterrieder

  1. By: Demetrescu, Matei; Rodrigues, Paulo MM; Taylor, AM Robert
    Abstract: We propose new tests for long-horizon predictability based on IVX estimation (see Kostakis et al., 2015) of transformed regressions. These explicitly account for the overlapping nature of the dependent variable which features in a long-horizon predictive regression arising from temporal aggregation. Because we use IVX estimation we can also incorporate the residual augmentation approach recently used in the context of short-horizon predictability testing by Demetrescu and Rodrigues (2020) to improve efficiency. Our proposed tests have a number of advantages for practical use. First, they are simple to compute, making them more appealing for empirical work than, in particular, the Bonferroni-based methods developed in, among others, Valkanov (2003) and Hjalmarsson (2011), which require the computation of confidence intervals for the autoregressive parameter characterising the predictor. Second, unlike some of the available tests, they allow the practitioner to remain agnostic as to whether the predictor is strongly or weakly persistent. Third, the tests are valid under considerably weaker assumptions on the innovations than extant long-horizon predictability tests. In particular, we allow for quite general forms of conditional and unconditional heteroskedasticity in the innovations, neither of which is tied to a parametric model. Fourth, our proposed tests can be easily implemented as either one- or two-sided hypothesis tests, unlike the Bonferroni-based methods, which require the computation of different confidence intervals for the autoregressive parameter depending on whether left- or right-tailed tests are to be conducted (see Hjalmarsson, 2011). Finally, our approach is straightforwardly generalisable to a multi-predictor context. Monte Carlo analysis suggests that our preferred test displays improved finite-sample properties compared to the leading tests available in the literature. We also report an empirical application of the methods we develop to investigate the potential predictive power of real exchange rates for predicting nominal exchange rates and inflation.
    Keywords: long-horizon predictive regression; IVX estimation; (un)conditional heteroskedasticity; unknown regressor persistence; endogeneity; residual augmentation
    Date: 2021–06–18
  2. By: Dong Hwan Oh; Andrew J. Patton
    Abstract: This paper proposes a dynamic multi-factor copula for use in high dimensional time series applications. A novel feature of our model is that the assignment of individual variables to groups is estimated from the data, rather than being pre-assigned using SIC industry codes, market capitalization ranks, or other ad hoc methods. We adapt the k-means clustering algorithm for use in our application and show that it has excellent finite-sample properties. Applying the new model to returns on 110 US equities, we find around 20 clusters to be optimal. In out-of-sample forecasts, we find that a model with as few as five estimated clusters significantly outperforms an otherwise identical model with 21 clusters formed using two-digit SIC codes.
    Keywords: Correlation; Tail risk; Multivariate density forecast
    JEL: C32 C58 C38
    Date: 2021–04–30
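The cluster-assignment step described above can be sketched in a few lines. This is a hypothetical illustration, not the authors' code or their 110-equity data: it simulates twelve assets in three latent groups and runs plain k-means (Lloyd's algorithm with restarts) on each asset's vector of pairwise correlations.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical simulated returns: 12 assets in 3 latent factor groups,
# standing in for the paper's 110 US equities.
T = 500
group = np.repeat([0, 1, 2], 4)                      # true cluster of each asset
factors = rng.normal(size=(T, 3))
returns = factors[:, group] + 0.5 * rng.normal(size=(T, 12))

# Cluster on each asset's vector of correlations with all assets.
corr = np.corrcoef(returns, rowvar=False)

def kmeans(X, k, iters=50, restarts=10, seed=0):
    """Plain k-means (Lloyd's algorithm) on rows of X, with random restarts."""
    r = np.random.default_rng(seed)
    best_labels, best_cost = None, np.inf
    for _ in range(restarts):
        centers = X[r.choice(len(X), size=k, replace=False)].copy()
        for _ in range(iters):
            d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
            labels = d.argmin(axis=1)
            for j in range(k):
                if np.any(labels == j):
                    centers[j] = X[labels == j].mean(axis=0)
        cost = ((X - centers[labels]) ** 2).sum()
        if cost < best_cost:
            best_labels, best_cost = labels, cost
    return best_labels

labels = kmeans(corr, k=3)
```

With well-separated groups, the recovered labels reproduce the latent assignment up to a relabelling of the clusters; the paper estimates the assignment jointly with the copula parameters rather than as a preprocessing step.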
  3. By: Cristian Roner (Free University of Bozen-Bolzano, Italy); Claudia Di Caterina (Free University of Bozen-Bolzano, Italy); Davide Ferrari (Free University of Bozen-Bolzano, Italy)
    Abstract: Non-negative ordered survey data often exhibit an unusually high frequency of zeros in the first interval. Zero-inflated ordered probit models handle the excess of zeros by combining a split probit model and an ordered probit model. In the presence of data violating distributional assumptions, standard inference based on the maximum likelihood method gives biased estimates with large standard errors. In this paper, we consider robust inference for the zero-inflated ordered probit model based on the exponential tilting methodology. Exponential tilting selects unequal weights for the observations in such a way as to maximise the likelihood function subject to moving a given distance from equally weighted scores. As a result, observations that are incompatible with the assumed zero-inflated distribution receive a relatively small weight. Our methodology is motivated by the analysis of survey data on cyber security breaches to study the relationship between investments in cyber defences and costs from cyber breaches. Robust estimates obtained via tilting clearly show an effect of the investments in reducing the amount of the loss from a cyber breach.
    Keywords: Zero-inflation; Exponential tilting; Interval regression; Cyber security; Survey data.
    JEL: C1 C13 C83 D24 D25
    Date: 2021–06
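As a stylised illustration of the tilting idea (not the authors' zero-inflated interval estimator), the sketch below downweights observations that are improbable under a working normal model, so that gross outliers barely move a location estimate. The data, the tilting constant `tau`, and the reweighting rule are all assumptions made for this example.

```python
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical sample: clean N(0, 1) data plus three gross outliers.
x = np.concatenate([rng.normal(0.0, 1.0, 200), np.array([40.0, 50.0, 60.0])])

def tilted_mean(x, tau=0.5, iters=50):
    """Location estimate with exponentially tilted weights: observations that
    are improbable under the working normal model get exponentially small
    weight, so they contribute little to the weighted likelihood."""
    mu = np.median(x)
    sigma = np.median(np.abs(x - mu)) * 1.4826       # robust scale (MAD)
    for _ in range(iters):
        w = np.exp(-tau * ((x - mu) / sigma) ** 2)   # exponential tilt
        w /= w.sum()
        mu = np.sum(w * x)                           # weighted update
    return mu

mu_tilt = tilted_mean(x)   # near 0: outliers receive negligible weight
mu_mle = x.mean()          # equally weighted estimate, dragged by outliers
```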
  4. By: David T. Frazier; Ruben Loaiza-Maya; Gael M. Martin
    Abstract: Using theoretical and numerical results, we document the accuracy of commonly applied variational Bayes methods across a broad range of state space models. The results demonstrate that, in terms of accuracy on fixed parameters, there is a clear hierarchy in terms of the methods, with approaches that do not approximate the states yielding superior accuracy over methods that do. We also document numerically that the inferential discrepancies between the various methods often yield only small discrepancies in predictive accuracy over small out-of-sample evaluation periods. Nevertheless, in certain settings, these predictive discrepancies can become marked over longer out-of-sample periods. This finding indicates that the invariance of predictive results to inferential inaccuracy, which has been an oft-touted point made by practitioners seeking to justify the use of variational inference, is not ubiquitous and must be assessed on a case-by-case basis.
    Date: 2021–06
  5. By: Anand Deo; Karthyek Murthy
    Abstract: This paper considers Importance Sampling (IS) for the estimation of tail risks of a loss defined in terms of a sophisticated object such as a machine learning feature map or a mixed integer linear optimisation formulation. Assuming only black-box access to the loss and the distribution of the underlying random vector, the paper presents an efficient IS algorithm for estimating the Value at Risk and Conditional Value at Risk. The key challenge in any IS procedure, namely, identifying an appropriate change-of-measure, is automated with a self-structuring IS transformation that learns and replicates the concentration properties of the conditional excess from less rare samples. The resulting estimators enjoy asymptotically optimal variance reduction when viewed in the logarithmic scale. Simulation experiments highlight the efficacy and practicality of the proposed scheme.
    Date: 2021–06
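The basic mechanics of importance sampling for a tail probability can be sketched as follows; the loss function, threshold, and mean-shift proposal here are hypothetical stand-ins for the paper's self-structuring transformation.

```python
import numpy as np

rng = np.random.default_rng(2)

# Black-box loss: a simple nonlinear function of a Gaussian input
# (a hypothetical stand-in for a feature map or optimisation formulation).
def loss(z):
    return np.maximum(z, 0.0) ** 2

t = 16.0        # tail threshold: loss(z) > 16 iff z > 4
n = 100_000

# Plain Monte Carlo: almost no samples land in the rare-event region.
z = rng.normal(size=n)
p_mc = np.mean(loss(z) > t)

# Importance sampling: shift the proposal mean into the rare region
# and reweight each sample by the likelihood ratio N(0,1)/N(mu,1).
mu = 4.0
z_is = rng.normal(mu, 1.0, size=n)
lr = np.exp(-mu * z_is + 0.5 * mu ** 2)
p_is = np.mean((loss(z_is) > t) * lr)

p_true = 3.167e-5   # P(Z > 4) for a standard normal, for comparison
```

With the same budget, the IS estimate is accurate to a fraction of a percent, while the plain Monte Carlo estimate rests on a handful of tail hits; the paper automates the choice of the change-of-measure rather than fixing it by hand as done here.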
  6. By: Claudia Noack
    Abstract: In this paper, we develop a method to assess the sensitivity of local average treatment effect estimates to potential violations of the monotonicity assumption of Imbens and Angrist (1994). We parameterize the degree to which monotonicity is violated using two sensitivity parameters: the first one determines the share of defiers in the population, and the second one measures differences in the distributions of outcomes between compliers and defiers. For each pair of values of these sensitivity parameters, we derive sharp bounds on the outcome distributions of compliers in the first-order stochastic dominance sense. We identify the robust region, that is, the set of all values of the sensitivity parameters for which a given empirical conclusion, e.g., that the local average treatment effect is positive, is valid. Researchers can assess the credibility of their conclusion by evaluating whether all the plausible sensitivity parameters lie in the robust region. We obtain confidence sets for the robust region through a bootstrap procedure and illustrate the sensitivity analysis in an empirical application. We also extend this framework to analyze treatment effects for the entire population.
    Date: 2021–06
  7. By: Luca Merlo; Lea Petrella; Valentina Raponi
    Abstract: In this paper we propose a multivariate quantile regression framework to forecast Value at Risk (VaR) and Expected Shortfall (ES) of multiple financial assets simultaneously, extending Taylor (2019). We generalize the Multivariate Asymmetric Laplace (MAL) joint quantile regression of Petrella and Raponi (2019) to a time-varying setting, which allows us to specify a dynamic process for the evolution of both VaR and ES of each asset. The proposed methodology accounts for the dependence structure among asset returns. By exploiting the properties of the MAL distribution, we then propose a new portfolio optimization method that minimizes the portfolio risk and controls for well-known characteristics of financial data. We evaluate the advantages of the proposed approach on both simulated and real data, using weekly returns on three major stock market indices. We show that our method outperforms other existing models and provides more accurate risk measure forecasts compared to univariate ones.
    Date: 2021–06
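At the heart of the approach are the two risk measures themselves. A minimal unconditional sketch (hypothetical simulated returns; the paper's actual model is a dynamic multivariate quantile regression) computes VaR as the alpha-quantile, i.e. the minimiser of the pinball loss, and ES as the mean return beyond it.

```python
import numpy as np

rng = np.random.default_rng(3)
# Hypothetical weekly returns with fat tails (Student-t with 5 df, scaled).
returns = rng.standard_t(df=5, size=50_000) * 0.01

alpha = 0.05
# VaR_alpha: the alpha-quantile of returns (the pinball-loss minimiser).
var_hat = np.quantile(returns, alpha)
# ES_alpha: the expected return conditional on falling at or below VaR.
es_hat = returns[returns <= var_hat].mean()
```

For a t(5) variable scaled by 0.01, the 5% quantile is roughly -0.0202, and ES lies further out in the tail, reflecting the fat-tailed distribution; in the paper both quantities evolve dynamically and jointly across assets.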
  8. By: Andersson, Angelica (Research Programme in Transport Economics); Engelson, Leonid (Research Programme in Transport Economics); Börjesson, Maria (Research Programme in Transport Economics); Daly, Andrew (Research Programme in Transport Economics); Kristoffersson, Ida (Research Programme in Transport Economics)
    Abstract: In this paper we develop two methods for the use of mobile phone data to support the estimation of long-distance mode choice models. Both methods are based on logit formulations in which we define likelihood functions and use maximum likelihood estimation. Mobile phone data consists of information about a sequence of antennae that have detected each phone, so the mode choice is not actually observed. In the first, trip-based method, the mode of each trip is inferred by a separate procedure, and the estimation process is then straightforward. However, since it is usually not possible to determine the mode choice with certainty, this method might give biased results. In our second, antenna-based method we therefore base the likelihood function on the sequences of antennae that have detected the phones. The estimation aims at finding a parameter vector in the mode choice model that best explains the observed sequences. The main challenge with the antenna-based method is its need for data at a detailed resolution: the mobile phone operator might not be willing or able to provide the modeller with the sequences of antennae that have detected the phones. In this paper we derive the two methods, show that they coincide when the chosen mode is known with certainty, and discuss the validity of their assumptions and their respective advantages and disadvantages. Furthermore, we apply the trip-based method to empirical data and compare the results of two different ways of implementing it.
    Keywords: Demand model; Mode choice; Mobile phone network data; Travel behaviour; Long-distance travel
    JEL: C18 C35 R41 R42
    Date: 2021–06–16
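The trip-based method reduces to standard maximum likelihood once each trip's mode has been inferred. A minimal sketch with a binary (say, train vs. car) logit fitted by Newton-Raphson follows; the covariates, coefficients, and sample are invented for illustration and are not the paper's specification.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 5_000
# Hypothetical trip attributes: a cost difference and a rail-access dummy.
X = np.column_stack([rng.normal(size=n), rng.integers(0, 2, n).astype(float)])
beta_true = np.array([-1.0, 0.8])
p = 1 / (1 + np.exp(-(X @ beta_true)))
y = (rng.random(n) < p).astype(float)   # 1 = train chosen (inferred mode)

def logit_mle(X, y, iters=25):
    """Maximum likelihood for a binary logit via Newton-Raphson, as in the
    trip-based method where each trip's mode is inferred beforehand."""
    b = np.zeros(X.shape[1])
    for _ in range(iters):
        mu = 1 / (1 + np.exp(-(X @ b)))
        grad = X.T @ (y - mu)                 # score
        W = mu * (1 - mu)
        H = X.T @ (X * W[:, None])            # observed information
        b += np.linalg.solve(H, grad)
    return b

beta_hat = logit_mle(X, y)
```

The antenna-based method replaces the inferred-mode likelihood with one defined over observed antenna sequences, which avoids the bias from misclassified modes at the cost of requiring much finer data.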
  9. By: Anders Bredahl Kock; David Preinerstorfer
    Abstract: To assess whether there is some signal in a big database, aggregate tests for the global null hypothesis of no effect are routinely applied in practice before more specialized analysis is carried out. Although a plethora of aggregate tests is available, each test has its strengths but also its blind spots. In a Gaussian sequence model, we study whether it is possible to obtain a test with substantially better consistency properties than the likelihood ratio (i.e., Euclidean norm based) test. We establish an impossibility result, showing that in the high-dimensional framework we consider, the set of alternatives for which a test may improve upon the likelihood ratio test -- that is, its superconsistency points -- is always asymptotically negligible in a relative volume sense.
    Date: 2021–06
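The likelihood ratio test in this Gaussian sequence model can be sketched directly; the dimension and the dense alternative below are hypothetical choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(4)
p = 10_000

def lr_stat(x):
    """Standardised likelihood-ratio statistic (squared Euclidean norm) for
    testing theta = 0 when X ~ N(theta, I_p); approximately N(0, 1) under
    the null for large p, by the CLT for the chi-square distribution."""
    return (np.sum(x ** 2) - len(x)) / np.sqrt(2 * len(x))

x_null = rng.normal(size=p)        # no signal anywhere
theta = np.full(p, 0.3)            # dense signal, individually undetectable
x_alt = theta + rng.normal(size=p)

s_null = lr_stat(x_null)           # well within the acceptance region
s_alt = lr_stat(x_alt)             # far beyond the 5% critical value 1.645
```

No single coordinate of `x_alt` stands out against the noise, yet the aggregate norm statistic rejects decisively; the paper's impossibility result concerns alternatives where any competing test could beat this one.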
  10. By: Toru Kitagawa; Shosei Sakaguchi; Aleksey Tetenov
    Abstract: Modern machine learning approaches to classification, including AdaBoost, support vector machines, and deep neural networks, utilize surrogate loss techniques to circumvent the computational complexity of minimizing empirical classification risk. These techniques are also useful for causal policy learning problems, since estimation of individualized treatment rules can be cast as a weighted (cost-sensitive) classification problem. Consistency of the surrogate loss approaches studied in Zhang (2004) and Bartlett et al. (2006) crucially relies on the assumption of correct specification, meaning that the specified set of classifiers is rich enough to contain a first-best classifier. This assumption is, however, less credible when the set of classifiers is constrained by interpretability or fairness, leaving the applicability of surrogate loss based algorithms unknown in such second-best scenarios. This paper studies consistency of surrogate loss procedures under a constrained set of classifiers without assuming correct specification. We show that in the setting where the constraint restricts the classifier's prediction set only, hinge losses (i.e., $\ell_1$-support vector machines) are the only surrogate losses that preserve consistency in second-best scenarios. If the constraint additionally restricts the functional form of the classifier, consistency of a surrogate loss approach is not guaranteed even with hinge loss. We therefore characterize conditions for the constrained set of classifiers that can guarantee consistency of hinge risk minimizing classifiers. Exploiting our theoretical results, we develop robust and computationally attractive hinge loss based procedures for a monotone classification problem.
    Date: 2021–06
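A toy version of hinge-risk minimisation for policy learning, with the treatment decision cast as a weighted (cost-sensitive) classification problem, might look as follows; the gain model, the weights, and the subgradient solver are assumptions made for this example, not the paper's procedure.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 2_000
X = rng.normal(size=(n, 2))

# Hypothetical individual treatment gains: positive when x1 + x2 > 0.
gain = X[:, 0] + X[:, 1] + 0.5 * rng.normal(size=n)
y = np.sign(gain)        # label: who should be treated
wts = np.abs(gain)       # cost-sensitive weight: the stake of misclassifying

def weighted_hinge(X, y, wts, lam=1e-3, epochs=300, lr=0.1):
    """Weighted hinge-loss (linear SVM) classifier fitted by subgradient
    descent on lam/2 ||w||^2 + (1/n) sum_i wts_i * max(0, 1 - y_i w'x_i)."""
    Xb = np.column_stack([X, np.ones(len(X))])     # absorb the intercept
    w = np.zeros(Xb.shape[1])
    for _ in range(epochs):
        active = y * (Xb @ w) < 1                  # margin violations
        grad = lam * w - (wts[active, None] * y[active, None]
                          * Xb[active]).sum(0) / len(X)
        w -= lr * grad
    return w

w = weighted_hinge(X, y, wts)
rule = np.sign(np.column_stack([X, np.ones(n)]) @ w)   # learned treatment rule
agree = np.mean(rule == y)
```

The linear rule recovers the direction of the true decision boundary; the paper's contribution is characterising when such hinge-loss procedures remain consistent once the classifier class is constrained.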
  11. By: Falk, Carl F.; Muthukrishna, Michael
    Abstract: Theories can be represented as statistical models for empirical testing. There is a vast literature on model selection and multimodel inference that focuses on how to assess which statistical model, and therefore which theory, best fits the available data. For example, given some data, one can compare models using various information criteria or other fit statistics. However, what these indices fail to capture is the full range of counterfactuals. That is, some models may fit the given data better not because they represent a more correct theory, but simply because these models have more fit propensity: a tendency to fit a wider range of data, even nonsensical data, better. Current approaches fall short in considering the principle of parsimony (Occam’s Razor), often equating it with the number of model parameters. Here we offer a toolkit for researchers to better study and understand parsimony through the fit propensity of Structural Equation Models. We provide an R package (ockhamSEM) built on the popular lavaan package. To illustrate the importance of evaluating fit propensity, we use ockhamSEM to investigate the factor structure of the Rosenberg Self-Esteem Scale.
    Keywords: fit indices; parsimony; model fit; structural equation modeling; formal theory; SEM
    JEL: C1
    Date: 2021–06–02
  12. By: Abdulnasser Hatemi-J
    Abstract: Testing for causation, defined as the preceding impact of the past values of one variable on the current value of another when all other pertinent information is accounted for, is increasingly utilized in empirical time-series research across scientific disciplines. A relatively recent extension of this approach allows for potentially asymmetric impacts, which, according to Hatemi-J (2012), is consistent with how reality operates in many cases. The current paper maintains that it is also important to account for potential changes in the parameters when asymmetric causality tests are conducted, as there are a number of reasons why the potential causal connection between variables may change across time. The current paper therefore extends the static asymmetric causality tests by making them dynamic via the use of subsamples. An application is also provided consistent with measurable definitions of economic or financial bad as well as good news and their potential interaction across time.
    Date: 2021–06
  13. By: Zengjing Chen; Larry G. Epstein; Guodong Zhang
    Abstract: This paper establishes a central limit theorem under the assumption that conditional variances can vary in a largely unstructured history-dependent way across experiments subject only to the restriction that they lie in a fixed interval. Limits take a novel and tractable form, and are expressed in terms of oscillating Brownian motion. A second contribution is application of this result to a class of multi-armed bandit problems where the decision-maker is loss averse.
    Date: 2021–06
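The flavour of the result can be seen in a small simulation: when the conditional standard deviation is chosen in a history-dependent way within a fixed interval (here a hypothetical rule based on the sign of the running sum; any selection within the interval is allowed), the normalised sum is visibly non-Gaussian. A normal limit centred at zero would put half its mass above zero; this limit does not.

```python
import numpy as np

rng = np.random.default_rng(6)
reps, n = 20_000, 200
sig_lo, sig_hi = 0.5, 1.5   # the fixed interval for conditional volatilities

# History-dependent conditional volatility: the next increment's standard
# deviation depends on the sign of the running sum of past increments.
s = np.zeros(reps)
for _ in range(n):
    sigma = np.where(s < 0, sig_hi, sig_lo)
    s += sigma * rng.normal(size=reps)
s /= np.sqrt(n)

# Each increment has conditional mean zero, so the sample mean is near zero,
# yet the distribution is strongly asymmetric around it.
frac_pos = np.mean(s > 0)
```

Well over half of the normalised sums are positive even though their mean is zero, consistent with a skewed, oscillating-Brownian-motion-type limit rather than a normal one.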
  14. By: Florian Eckerli; Joerg Osterrieder
    Abstract: Modelling in finance is a challenging task: the data often has complex statistical properties and its inner workings are largely unknown. Deep learning algorithms are making progress in the field of data-driven modelling, but the lack of sufficient data to train these models is currently holding back several new applications. Generative Adversarial Networks (GANs) are a neural network architecture family that has achieved good results in image generation and is being successfully applied to generate time series and other types of financial data. The purpose of this study is to present an overview of how these GANs work, their capabilities and limitations in the current state of research with financial data, and present some practical applications in the industry. As a proof of concept, three known GAN architectures were tested on financial time series, and the generated data was evaluated on its statistical properties, yielding solid results. Finally, it was shown that GANs have made considerable progress in their finance applications and can be a solid additional tool for data scientists in this field.
    Date: 2021–06

This nep-ecm issue is ©2021 by Sune Karlsson. It is provided as is without any express or implied warranty. It may be freely redistributed in whole or in part for any purpose. If distributed in part, please include this notice.
General information on the NEP project can be found at . For comments please write to the director of NEP, Marco Novarese at <>. Put “NEP” in the subject, otherwise your mail may be rejected.
NEP’s infrastructure is sponsored by the School of Economics and Finance of Massey University in New Zealand.