
on Econometrics 
By:  Demetrescu, Matei; Rodrigues, Paulo M. M.; Taylor, A. M. Robert 
Abstract:  We propose new tests for long-horizon predictability based on IVX estimation (see Kostakis et al., 2015) of transformed regressions. These explicitly account for the overlapping nature of the dependent variable which features in a long-horizon predictive regression arising from temporal aggregation. Because we use IVX estimation we can also incorporate the residual augmentation approach recently used in the context of short-horizon predictability testing by Demetrescu and Rodrigues (2020) to improve efficiency. Our proposed tests have a number of advantages for practical use. First, they are simple to compute, making them more appealing for empirical work than, in particular, the Bonferroni-based methods developed in, among others, Valkanov (2003) and Hjalmarsson (2011), which require the computation of confidence intervals for the autoregressive parameter characterising the predictor. Second, unlike some of the available tests, they allow the practitioner to remain agnostic as to whether the predictor is strongly or weakly persistent. Third, the tests are valid under considerably weaker assumptions on the innovations than extant long-horizon predictability tests. In particular, we allow for quite general forms of conditional and unconditional heteroskedasticity in the innovations, neither of which is tied to a parametric model. Fourth, our proposed tests can easily be implemented as either one- or two-sided hypothesis tests, unlike the Bonferroni-based methods, which require the computation of different confidence intervals for the autoregressive parameter depending on whether left- or right-tailed tests are to be conducted (see Hjalmarsson, 2011). Finally, our approach generalises straightforwardly to a multi-predictor context. Monte Carlo analysis suggests that our preferred test displays improved finite-sample properties compared to the leading tests available in the literature. 
We also report an empirical application of the methods we develop, investigating the potential predictive power of real exchange rates for nominal exchange rates and inflation. 
Keywords:  long-horizon predictive regression; IVX estimation; (un)conditional heteroskedasticity; unknown regressor persistence; endogeneity; residual augmentation 
Date:  2021–06–18 
URL:  http://d.repec.org/n?u=RePEc:esy:uefcwp:30620&r= 
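The overlap problem the abstract refers to can be fixed in the mind with a small simulation: the h-period dependent variable is a rolling sum of one-period returns, regressed on a persistent predictor. This is a generic sketch with invented coefficients, estimated by plain OLS only; it is not the authors' IVX procedure.

```python
import numpy as np

rng = np.random.default_rng(0)
T, h, rho = 500, 12, 0.95

# Persistent predictor: an AR(1) with autoregressive parameter rho = 0.95
x = np.zeros(T)
for t in range(1, T):
    x[t] = rho * x[t - 1] + rng.normal()

# One-period returns weakly predicted by x (invented slope 0.1)
y = 0.1 * x[:-1] + rng.normal(size=T - 1)

# h-period dependent variable: OVERLAPPING rolling sums of the returns
yh = np.array([y[t:t + h].sum() for t in range(len(y) - h + 1)])

# Plain OLS of the overlapping sums on the predictor; the overlap induces
# serially correlated errors, which is exactly what the paper's tests correct for
X = np.column_stack([np.ones(len(yh)), x[:len(yh)]])
beta, *_ = np.linalg.lstsq(X, yh, rcond=None)
```

The fitted slope roughly accumulates the one-period coefficient over the h horizons, but its naive OLS standard error is unreliable under overlap.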
By:  Dong Hwan Oh; Andrew J. Patton 
Abstract:  This paper proposes a dynamic multi-factor copula for use in high-dimensional time series applications. A novel feature of our model is that the assignment of individual variables to groups is estimated from the data, rather than being pre-assigned using SIC industry codes, market capitalization ranks, or other ad hoc methods. We adapt the k-means clustering algorithm for use in our application and show that it has excellent finite-sample properties. Applying the new model to returns on 110 US equities, we find around 20 clusters to be optimal. In out-of-sample forecasts, we find that a model with as few as five estimated clusters significantly outperforms an otherwise identical model with 21 clusters formed using two-digit SIC codes. 
Keywords:  Correlation; Tail risk; Multivariate density forecast 
JEL:  C32 C58 C38 
Date:  2021–04–30 
URL:  http://d.repec.org/n?u=RePEc:fip:fedgfe:202129&r= 
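The data-driven group assignment step can be illustrated with a plain k-means pass over a toy feature matrix (think of rows as return-based characteristics of each equity). The toy data and the deterministic farthest-point initialisation are our own choices, not the paper's adaptation of the algorithm.

```python
import numpy as np

def kmeans(X, k, n_iter=50):
    """Standard Lloyd's k-means with deterministic farthest-point initialisation."""
    centroids = [X[0]]
    for _ in range(k - 1):
        # next seed: the point farthest from all current seeds
        d = np.min([np.linalg.norm(X - c, axis=1) for c in centroids], axis=0)
        centroids.append(X[d.argmax()])
    centroids = np.array(centroids)
    for _ in range(n_iter):
        dist = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dist.argmin(axis=1)           # assign each point to nearest centroid
        new = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    return labels, centroids

# Two well-separated groups of invented "equity characteristics"
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 0.1, size=(20, 2)),
               rng.normal(1.0, 0.1, size=(20, 2))])
labels, _ = kmeans(X, k=2)
```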
By:  Cristian Roner (Free University of Bozen-Bolzano, Italy); Claudia Di Caterina (Free University of Bozen-Bolzano, Italy); Davide Ferrari (Free University of Bozen-Bolzano, Italy) 
Abstract:  Non-negative ordered survey data often exhibit an unusually high frequency of zeros in the first interval. Zero-inflated ordered probit models handle the excess of zeros by combining a split probit model and an ordered probit model. In the presence of data violating distributional assumptions, standard inference based on the maximum likelihood method gives biased estimates with large standard errors. In this paper, we consider robust inference for the zero-inflated ordered probit model based on the exponential tilting methodology. Exponential tilting selects unequal weights for the observations in such a way as to maximise the likelihood function subject to moving a given distance from equally weighted scores. As a result, observations that are incompatible with the assumed zero-inflated distribution receive a relatively small weight. Our methodology is motivated by the analysis of survey data on cyber security breaches to study the relationship between investments in cyber defences and costs from cyber breaches. Robust estimates obtained via tilting clearly show an effect of the investments in reducing the amount of the loss from a cyber breach. 
Keywords:  Zero-inflation; Exponential tilting; Interval regression; Cyber security; Survey data. 
JEL:  C1 C13 C83 D24 D25 
Date:  2021–06 
URL:  http://d.repec.org/n?u=RePEc:bzn:wpaper:bemps85&r= 
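A stylised version of the tilting idea: weight each observation by an exponential function of its log-likelihood contribution, so that observations incompatible with the assumed model receive small weight. This is only a heuristic sketch, not the authors' estimator; the normal model, the pseudo-data, and the tuning constant `tau` are invented.

```python
import numpy as np

def tilted_weights(loglik, tau=0.5):
    """Exponential-tilting-style weights: w_i proportional to exp(tau * loglik_i)."""
    w = np.exp(tau * (loglik - loglik.max()))  # subtract the max for numerical stability
    return w / w.sum()

# N(0,1) pseudo-sample with one gross outlier at 8.0
x = np.array([-0.4, 0.1, 0.3, -0.2, 0.5, 8.0])
loglik = -0.5 * x**2 - 0.5 * np.log(2 * np.pi)  # N(0,1) log-density at each point
w = tilted_weights(loglik)
```

The outlier, being nearly impossible under the assumed N(0,1) model, receives a weight many orders of magnitude below the others.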
By:  David T. Frazier; Ruben Loaiza-Maya; Gael M. Martin 
Abstract:  Using theoretical and numerical results, we document the accuracy of commonly applied variational Bayes methods across a broad range of state space models. The results demonstrate that, in terms of accuracy on fixed parameters, there is a clear hierarchy among the methods, with approaches that do not approximate the states yielding superior accuracy over methods that do. We also document numerically that the inferential discrepancies between the various methods often yield only small discrepancies in predictive accuracy over small out-of-sample evaluation periods. Nevertheless, in certain settings, these predictive discrepancies can become marked over longer out-of-sample periods. This finding indicates that the invariance of predictive results to inferential inaccuracy, an oft-touted point made by practitioners seeking to justify the use of variational inference, is not ubiquitous and must be assessed on a case-by-case basis. 
Date:  2021–06 
URL:  http://d.repec.org/n?u=RePEc:arx:papers:2106.12262&r= 
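A textbook example of why the inferential accuracy of variational methods matters: for a correlated Gaussian posterior, the mean-field variational solution understates the marginal variances. This is a generic illustration, not taken from the paper.

```python
import numpy as np

# True posterior: a bivariate Gaussian with correlation rho
rho = 0.9
Sigma = np.array([[1.0, rho], [rho, 1.0]])

# For a Gaussian target, the optimal mean-field VB marginal variances are
# 1 / Lambda_ii, with Lambda the precision matrix (strictly below the true
# Sigma_ii whenever the components are correlated)
Lam = np.linalg.inv(Sigma)
vb_var = 1.0 / np.diag(Lam)
true_var = np.diag(Sigma)
```

With rho = 0.9, mean-field VB reports a marginal variance of 1 - rho^2 = 0.19 against a true value of 1.0, a severe understatement of uncertainty.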
By:  Anand Deo; Karthyek Murthy 
Abstract:  This paper considers Importance Sampling (IS) for the estimation of tail risks of a loss defined in terms of a sophisticated object such as a machine learning feature map or a mixed integer linear optimisation formulation. Assuming only black-box access to the loss and the distribution of the underlying random vector, the paper presents an efficient IS algorithm for estimating the Value at Risk and Conditional Value at Risk. The key challenge in any IS procedure, namely, identifying an appropriate change of measure, is automated with a self-structuring IS transformation that learns and replicates the concentration properties of the conditional excess from less rare samples. The resulting estimators enjoy asymptotically optimal variance reduction when viewed on the logarithmic scale. Simulation experiments highlight the efficacy and practicality of the proposed scheme. 
Date:  2021–06 
URL:  http://d.repec.org/n?u=RePEc:arx:papers:2106.10236&r= 
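The classical special case of such a change of measure, a Gaussian tail probability estimated with a mean-shifted proposal, can be sketched as follows. This is a generic textbook example, not the authors' self-structuring transformation.

```python
import math
import numpy as np

def is_tail_prob(c, n=100_000, seed=0):
    """Estimate P(Z > c), Z ~ N(0,1), by importance sampling: draw from the
    mean-shifted proposal N(c, 1) and reweight by the likelihood ratio."""
    rng = np.random.default_rng(seed)
    y = rng.normal(loc=c, size=n)            # proposal draws, centred in the tail
    w = np.exp(-c * y + 0.5 * c**2)          # likelihood ratio phi(y) / phi(y - c)
    return float(np.mean(w * (y > c)))

c = 4.0
est = is_tail_prob(c)
exact = 0.5 * math.erfc(c / math.sqrt(2.0))  # exact Gaussian tail probability
```

Naive Monte Carlo would need on the order of a billion draws to estimate this 3e-5 probability with comparable relative accuracy; the shifted proposal needs only 1e5.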
By:  Claudia Noack 
Abstract:  In this paper, we develop a method to assess the sensitivity of local average treatment effect estimates to potential violations of the monotonicity assumption of Imbens and Angrist (1994). We parameterize the degree to which monotonicity is violated using two sensitivity parameters: the first determines the share of defiers in the population, and the second measures differences in the distributions of outcomes between compliers and defiers. For each pair of values of these sensitivity parameters, we derive sharp bounds on the outcome distributions of compliers in the first-order stochastic dominance sense. We identify the robust region, that is, the set of all values of the sensitivity parameters for which a given empirical conclusion, e.g. that the local average treatment effect is positive, is valid. Researchers can assess the credibility of their conclusion by evaluating whether all plausible values of the sensitivity parameters lie in the robust region. We obtain confidence sets for the robust region through a bootstrap procedure and illustrate the sensitivity analysis in an empirical application. We also extend this framework to analyze treatment effects for the entire population. 
Date:  2021–06 
URL:  http://d.repec.org/n?u=RePEc:arx:papers:2106.06421&r= 
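The starting point of such an analysis, the standard Wald/IV estimand for the local average treatment effect, can be sketched on simulated data with no defiers. The compliance shares and effect size below are invented toy values.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

z = rng.integers(0, 2, size=n)                       # binary instrument
# Compliance types: compliers, always-takers, never-takers (monotonicity holds:
# no defiers); shares are arbitrary toy values
types = rng.choice(["c", "at", "nt"], size=n, p=[0.6, 0.2, 0.2])
d = np.where(types == "at", 1, np.where(types == "nt", 0, z))  # treatment taken
y = 2.0 * d + rng.normal(size=n)                     # outcome; true effect = 2

# Wald estimand: LATE = (E[Y|Z=1]-E[Y|Z=0]) / (E[D|Z=1]-E[D|Z=0])
late = (y[z == 1].mean() - y[z == 0].mean()) / (d[z == 1].mean() - d[z == 0].mean())
```

The paper's sensitivity analysis asks how far this identification survives once a share of defiers is allowed back into the population.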
By:  Luca Merlo; Lea Petrella; Valentina Raponi 
Abstract:  In this paper we propose a multivariate quantile regression framework to forecast Value at Risk (VaR) and Expected Shortfall (ES) of multiple financial assets simultaneously, extending Taylor (2019). We generalize the Multivariate Asymmetric Laplace (MAL) joint quantile regression of Petrella and Raponi (2019) to a time-varying setting, which allows us to specify a dynamic process for the evolution of both the VaR and the ES of each asset. The proposed methodology accounts for the dependence structure among asset returns. By exploiting the properties of the MAL distribution, we then propose a new portfolio optimization method that minimizes portfolio risk and controls for well-known characteristics of financial data. We evaluate the advantages of the proposed approach on both simulated and real data, using weekly returns on three major stock market indices. We show that our method outperforms other existing models and provides more accurate risk measure forecasts than its univariate counterparts. 
Date:  2021–06 
URL:  http://d.repec.org/n?u=RePEc:arx:papers:2106.06518&r= 
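For orientation, the two risk measures being forecast have simple empirical counterparts on a return sample. These are generic definitions, not the authors' MAL-based dynamic model; the heavy-tailed pseudo-returns are invented.

```python
import numpy as np

def var_es(returns, alpha=0.05):
    """Empirical VaR and ES at level alpha (returns convention: losses negative)."""
    q = np.quantile(returns, alpha)          # VaR: the alpha-quantile of returns
    es = returns[returns <= q].mean()        # ES: average return beyond the VaR
    return q, es

rng = np.random.default_rng(0)
r = 0.01 * rng.standard_t(df=5, size=100_000)  # heavy-tailed pseudo-returns
var5, es5 = var_es(r)
```

ES is always at least as extreme as VaR at the same level, which is why joint dynamic modelling of the pair, as in the paper, is natural.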
By:  Andersson, Angelica (Research Programme in Transport Economics); Engelson, Leonid (Research Programme in Transport Economics); Börjesson, Maria (Research Programme in Transport Economics); Daly, Andrew (Research Programme in Transport Economics); Kristoffersson, Ida (Research Programme in Transport Economics) 
Abstract:  In this paper we develop two methods for using mobile phone data to support the estimation of long-distance mode choice models. Both methods are based on logit formulations in which we define likelihood functions and use maximum likelihood estimation. Mobile phone data consist of information about the sequence of antennae that have detected each phone, so the mode choice is not actually observed. In the first, trip-based, method the mode of each trip is inferred by a separate procedure, and the estimation process is then straightforward. However, since it is usually not possible to determine the mode choice with certainty, this method might give biased results. In our second, antenna-based, method we therefore base the likelihood function on the sequences of antennae that have detected the phones. The estimation aims at finding the parameter vector of the mode choice model that best explains the observed sequences. The main challenge with the antenna-based method is its need for data at a detailed resolution: the mobile phone operator might not be willing or able to provide the modeller with the sequences of antennae that have detected the phones. In this paper we derive the two methods, show that they coincide when the chosen mode is known with certainty, and discuss the validity of their assumptions and their respective advantages and disadvantages. Furthermore, we apply the first, trip-based, method to empirical data and compare the results of two different ways of implementing it. 
Keywords:  Demand model; Mode choice; Mobile phone network data; Travel behaviour; Long-distance travel 
JEL:  C18 C35 R41 R42 
Date:  2021–06–16 
URL:  http://d.repec.org/n?u=RePEc:hhs:trnspr:2021_001&r= 
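Once a mode label has been inferred per trip, the trip-based method reduces to an ordinary logit MLE. A minimal binary-logit sketch with one invented cost covariate follows; it is illustrative only, not the paper's specification.

```python
import numpy as np

def fit_logit(X, y, steps=2000, lr=0.5):
    """Binary logit MLE by gradient ascent on the log-likelihood."""
    beta = np.zeros(X.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-X @ beta))   # choice probabilities
        beta += lr * X.T @ (y - p) / len(y)   # score (gradient of the log-likelihood)
    return beta

rng = np.random.default_rng(0)
n = 2000
cost_diff = rng.normal(size=n)                # invented rail-minus-car cost difference
X = np.column_stack([np.ones(n), cost_diff])
true_beta = np.array([0.5, -1.5])             # invented taste parameters
p = 1.0 / (1.0 + np.exp(-X @ true_beta))
y = (rng.uniform(size=n) < p).astype(float)   # 1 = rail inferred as the chosen mode
beta_hat = fit_logit(X, y)
```

The antenna-based method instead replaces the observed label `y` with the probability of the observed antenna sequence under each candidate mode, which is why it needs far more detailed data.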
By:  Anders Bredahl Kock; David Preinerstorfer 
Abstract:  To assess whether there is some signal in a big database, aggregate tests for the global null hypothesis of no effect are routinely applied in practice before more specialized analysis is carried out. Although a plethora of aggregate tests is available, each test has its strengths but also its blind spots. In a Gaussian sequence model, we study whether it is possible to obtain a test with substantially better consistency properties than the likelihood ratio (i.e., Euclidean norm based) test. We establish an impossibility result, showing that in the high-dimensional framework we consider, the set of alternatives for which a test may improve upon the likelihood ratio test (its superconsistency points) is always asymptotically negligible in a relative volume sense. 
Date:  2021–06 
URL:  http://d.repec.org/n?u=RePEc:arx:papers:2106.03700&r= 
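The benchmark here, the Euclidean-norm likelihood ratio test in the Gaussian sequence model, rejects for large ||Y||^2. A standardised version can be sketched as follows; the dimension and the toy signal are our own choices.

```python
import numpy as np

def lr_stat(y):
    """Standardised Euclidean-norm statistic: (||y||^2 - n) / sqrt(2n),
    approximately N(0,1) under the global null mu = 0."""
    n = len(y)
    return (np.sum(y**2) - n) / np.sqrt(2.0 * n)

rng = np.random.default_rng(0)
n = 10_000
y_null = rng.normal(size=n)                  # no signal anywhere
mu = np.zeros(n)
mu[:1000] = 1.0                              # a dense-enough alternative
y_alt = mu + rng.normal(size=n)
t_null, t_alt = lr_stat(y_null), lr_stat(y_alt)
```

The statistic's noncentrality under the alternative is ||mu||^2 / sqrt(2n), which is exactly the kind of aggregate-signal geometry the impossibility result is cast in.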
By:  Toru Kitagawa; Shosei Sakaguchi; Aleksey Tetenov 
Abstract:  Modern machine learning approaches to classification, including AdaBoost, support vector machines, and deep neural networks, utilize surrogate loss techniques to circumvent the computational complexity of minimizing empirical classification risk. These techniques are also useful for causal policy learning problems, since estimation of individualized treatment rules can be cast as a weighted (cost-sensitive) classification problem. Consistency of the surrogate loss approaches studied in Zhang (2004) and Bartlett et al. (2006) crucially relies on the assumption of correct specification, meaning that the specified set of classifiers is rich enough to contain a first-best classifier. This assumption is, however, less credible when the set of classifiers is constrained by interpretability or fairness, leaving the applicability of surrogate-loss-based algorithms unknown in such second-best scenarios. This paper studies consistency of surrogate loss procedures under a constrained set of classifiers without assuming correct specification. We show that in the setting where the constraint restricts the classifier's prediction set only, hinge losses (i.e., $\ell_1$-support vector machines) are the only surrogate losses that preserve consistency in second-best scenarios. If the constraint additionally restricts the functional form of the classifier, consistency of a surrogate loss approach is not guaranteed even with hinge loss. We therefore characterize conditions for the constrained set of classifiers that can guarantee consistency of hinge risk minimizing classifiers. Exploiting our theoretical results, we develop robust and computationally attractive hinge-loss-based procedures for a monotone classification problem. 
Date:  2021–06 
URL:  http://d.repec.org/n?u=RePEc:arx:papers:2106.12886&r= 
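The weighted (cost-sensitive) classification view with a hinge surrogate can be sketched with a subgradient-descent fit; the data, the observation weights, and the tuning constants below are illustrative, not from the paper.

```python
import numpy as np

def weighted_hinge_fit(X, y, w, steps=2000, lr=0.05):
    """Minimise the weighted hinge risk (1/n) sum_i w_i * max(0, 1 - y_i x_i'theta)
    by subgradient descent; y takes values in {-1, +1}."""
    theta = np.zeros(X.shape[1])
    for _ in range(steps):
        margin = y * (X @ theta)
        active = (margin < 1).astype(float)    # subgradient indicator of max(0, 1 - m)
        grad = -(X.T @ (w * active * y)) / len(y)
        theta -= lr * grad
    return theta

rng = np.random.default_rng(0)
n = 1000
x = rng.normal(size=n)
y = np.sign(x + 0.1 * rng.normal(size=n))      # label roughly sign(x)
w = rng.uniform(0.5, 1.5, size=n)              # observation weights (e.g. IPW-style)
X = np.column_stack([np.ones(n), x])
theta = weighted_hinge_fit(X, y, w)
acc = np.mean(np.sign(X @ theta) == y)
```

In policy learning, the weights would encode estimated treatment gains, so misclassifying high-stakes observations is penalised more heavily.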
By:  Falk, Carl F.; Muthukrishna, Michael 
Abstract:  Theories can be represented as statistical models for empirical testing. There is a vast literature on model selection and multimodel inference that focuses on how to assess which statistical model, and therefore which theory, best fits the available data. For example, given some data, one can compare models on various information criteria or other fit statistics. However, what these indices fail to capture is the full range of counterfactuals. That is, some models may fit the given data better not because they represent a more correct theory, but simply because they have more fit propensity: a tendency to fit a wider range of data, even nonsensical data, better. Current approaches fall short in considering the principle of parsimony (Occam's Razor), often equating it with the number of model parameters. Here we offer a toolkit for researchers to better study and understand parsimony through the fit propensity of structural equation models. We provide an R package (ockhamSEM) built on the popular lavaan package. To illustrate the importance of evaluating fit propensity, we use ockhamSEM to investigate the factor structure of the Rosenberg Self-Esteem Scale. 
Keywords:  fit indices; parsimony; model fit; structural equation modeling; formal theory; SEM 
JEL:  C1 
Date:  2021–06–02 
URL:  http://d.repec.org/n?u=RePEc:ehl:lserod:110856&r= 
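Fit propensity can be demonstrated outside SEM with a deliberately simple analogue: a more flexible model attains better fit even on pure-noise data. This is our own toy example; the paper's ockhamSEM package works with lavaan-specified structural equation models instead.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(-1.0, 1.0, 50)
r2_simple, r2_flex = [], []
for _ in range(200):
    y = rng.normal(size=50)                  # nonsensical data: pure noise
    for deg, store in ((1, r2_simple), (5, r2_flex)):
        # fit a polynomial of the given degree and record its in-sample R^2
        resid = y - np.polyval(np.polyfit(x, y, deg), x)
        store.append(1.0 - resid.var() / y.var())
mean_simple, mean_flex = np.mean(r2_simple), np.mean(r2_flex)
```

The degree-5 model "explains" noise better in every replication simply because it nests the degree-1 model, which is the sense in which raw fit rewards flexibility rather than correctness.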
By:  Abdulnasser Hatemi-J 
Abstract:  Testing for causation, defined as the impact of the past values of one variable on the current value of another when all other pertinent information is accounted for, is increasingly utilized in empirical research on time-series data across scientific disciplines. A relatively recent extension of this approach allows for potentially asymmetric impacts, which is consistent with the way reality operates in many cases, according to Hatemi-J (2012). The current paper maintains that it is also important to account for potential changes in the parameters when asymmetric causality tests are conducted, as there are a number of reasons why the potential causal connection between variables might change across time. The current paper therefore extends the static asymmetric causality tests by making them dynamic via the use of subsamples. An application is also provided, consistent with measurable definitions of economic or financial bad as well as good news and their potential interaction across time. 
Date:  2021–06 
URL:  http://d.repec.org/n?u=RePEc:arx:papers:2106.07612&r= 
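The asymmetric components underlying such tests are cumulative sums of the positive and negative shocks of a series, following the decomposition in Hatemi-J (2012); the shock series here is invented.

```python
import numpy as np

def pos_neg_cumsum(shocks):
    """Split a shock series into cumulative 'good news' and 'bad news' components.
    Their sum recovers the ordinary cumulative sum of the shocks."""
    plus = np.cumsum(np.maximum(shocks, 0.0))   # cumulative positive shocks
    minus = np.cumsum(np.minimum(shocks, 0.0))  # cumulative negative shocks
    return plus, minus

shocks = np.array([0.5, -0.2, 0.1, -0.7, 0.3])
plus, minus = pos_neg_cumsum(shocks)
```

Causality tests are then run separately on the positive and negative components; the paper's contribution is to repeat them over rolling subsamples so the causal links may change over time.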
By:  Zengjing Chen; Larry G. Epstein; Guodong Zhang 
Abstract:  This paper establishes a central limit theorem under the assumption that conditional variances can vary in a largely unstructured, history-dependent way across experiments, subject only to the restriction that they lie in a fixed interval. Limits take a novel and tractable form, and are expressed in terms of oscillating Brownian motion. A second contribution is the application of this result to a class of multi-armed bandit problems where the decision-maker is loss averse. 
Date:  2021–06 
URL:  http://d.repec.org/n?u=RePEc:arx:papers:2106.05472&r= 
By:  Florian Eckerli; Joerg Osterrieder 
Abstract:  Modelling in finance is a challenging task: the data often has complex statistical properties and its inner workings are largely unknown. Deep learning algorithms are making progress in the field of data-driven modelling, but the lack of sufficient data to train these models is currently holding back several new applications. Generative Adversarial Networks (GANs) are a neural network architecture family that has achieved good results in image generation and is being successfully applied to generate time series and other types of financial data. The purpose of this study is to present an overview of how these GANs work, their capabilities and limitations in the current state of research with financial data, and some practical applications in the industry. As a proof of concept, three known GAN architectures were tested on financial time series, and the generated data was evaluated on its statistical properties, yielding solid results. Finally, it was shown that GANs have made considerable progress in their finance applications and can be a solid additional tool for data scientists in this field. 
Date:  2021–06 
URL:  http://d.repec.org/n?u=RePEc:arx:papers:2106.06364&r= 