
on Econometrics 
By:  Zhaoxing Gao; Ruey S. Tsay 
Abstract:  This paper proposes a hierarchical approximate-factor approach to analyzing high-dimensional, large-scale heterogeneous time series data using distributed computing. The new method employs a multiple-fold dimension reduction procedure via Principal Component Analysis (PCA) and shows great promise for modeling large-scale data that cannot be stored or analyzed by a single machine. Each computer at the basic level performs a PCA to extract common factors among the time series assigned to it and transfers those factors to one and only one node of the second level. Each second-level computer collects the common factors from its subordinates and performs another PCA to select the second-level common factors. This process is repeated until the central server is reached, which collects common factors from its direct subordinates and performs a final PCA to select the global common factors. The noise terms of the second-level approximate factor model are the unique common factors of the first-level clusters. We focus on the case of two levels in our theoretical derivations, but the idea can easily be generalized to any finite number of hierarchies. We discuss some clustering methods for when the group memberships are unknown and introduce a new diffusion index approach to forecasting. We further extend the analysis to unit-root nonstationary time series. Asymptotic properties of the proposed method are derived for diverging dimension of the data in each computing unit and sample size $T$. We use both simulated data and real examples to assess the performance of the proposed method in finite samples, and compare our method with commonly used ones in the literature concerning the forecastability of extracted factors. 
Date:  2021–03 
URL:  http://d.repec.org/n?u=RePEc:arx:papers:2103.14626&r=all 
By:  Greta Goracci; Simone Giannerini; KungSik Chan; Howell Tong 
Abstract:  We present supremum Lagrange Multiplier tests to compare a linear ARMA specification against its threshold ARMA extension. We derive the asymptotic distribution of the test statistics both under the null hypothesis and under contiguous local alternatives. Moreover, we prove the consistency of the tests. A Monte Carlo study shows that the tests enjoy good finite-sample properties, are robust against model misspecification, and that their performance is not affected if the order of the model is unknown. The tests carry a low computational burden and do not suffer from some of the drawbacks that affect the quasi-likelihood ratio setting. Lastly, we apply our tests to a time series of standardized tree-ring growth indexes, which may lead to new research in climate studies. 
Date:  2021–03 
URL:  http://d.repec.org/n?u=RePEc:arx:papers:2103.13977&r=all 
By:  Torben G. Andersen; Rasmus T. Varneskov 
Abstract:  This paper develops parameter instability and structural change tests within predictive regressions for economic systems governed by persistent vector autoregressive dynamics. Specifically, in a setting where all – or a subset – of the variables may be fractionally integrated and the predictive relation may feature cointegration, we provide sup-Wald break tests that are constructed using the Local speCtruM (LCM) approach. The new tests cover both parameter variation and multiple structural changes with unknown break dates, with the number of breaks being known or unknown. We establish asymptotic limit theory for the tests, showing that it coincides with standard testing procedures. As a consequence, existing critical values for tied-down Bessel processes may be applied without modification. We implement the new structural change tests to explore the stability of the fractionally cointegrating relation between implied and realized volatility (IV and RV). Moreover, we assess the relative efficiency of IV forecasts against a challenging time-series benchmark constructed from high-frequency data. Unlike existing studies, we find evidence that the IV-RV cointegrating relation is unstable and that carefully constructed time-series forecasts are more efficient than IV in capturing low-frequency movements in RV. 
JEL:  G12 G17 
Date:  2021–03 
URL:  http://d.repec.org/n?u=RePEc:nbr:nberwo:28570&r=all 
By:  Torben G. Andersen; Rasmus T. Varneskov 
Abstract:  We study standard predictive regressions in economic systems governed by persistent vector autoregressive dynamics for the state variables. In particular, all – or a subset – of the variables may be fractionally integrated, which induces a spurious regression problem. We propose a new inference and testing procedure – the Local speCtruM (LCM) approach – for the joint significance of the regressors, which is robust against the variables having different integration orders and remains valid regardless of whether the predictors are significant and whether they induce cointegration. Specifically, the LCM procedure is based on fractional filtering and band-spectrum regression using a suitably selected set of frequency ordinates. Contrary to existing procedures, we establish a uniform Gaussian limit theory and a standard χ²-distributed test statistic. Using LCM inference and testing techniques, we explore predictive regressions for the realized return variation. Standard least-squares inference indicates that popular financial and macroeconomic variables convey valuable information about future return volatility. In contrast, we find no significant evidence using our robust LCM procedure. If anything, our tests support a reverse chain of causality: rising financial volatility predates adverse innovations to macroeconomic variables. Simulations illustrate the relevance of the theoretical arguments for finite-sample inference. 
JEL:  G12 G17 
Date:  2021–03 
URL:  http://d.repec.org/n?u=RePEc:nbr:nberwo:28568&r=all 
By:  Javier Alejo; Antonio F. Galvao; Gabriel MontesRojas 
Abstract:  This paper develops inference procedures to evaluate the validity of instruments in instrumental variables (IV) quantile regression (QR) models. We first derive a first-stage regression for the IVQR model, analogous to the least-squares case, which is a weighted least-squares regression. The weights are given by the density function of the conditional distribution of the innovation term in the QR structural model, conditional on the exogenous covariates and the instruments. The first-stage regression is a natural framework to evaluate the instruments, since we can test for their statistical significance. In the QR case, the instruments could be relevant at some quantiles but not at others, or at the mean. Monte Carlo finite-sample experiments show that the tests work as expected in terms of empirical size and power. Two applications illustrate that checking for the statistical significance of the instruments at different quantiles is important. 
Keywords:  quantile regression, instrumental variables, first-stage 
JEL:  C13 C14 C21 C51 C53 
Date:  2020–11 
URL:  http://d.repec.org/n?u=RePEc:aep:anales:4304&r=all 
By:  Patrik Guggenberger; Frank Kleibergen; Sophocles Mavroeidis 
Abstract:  We introduce a new test for a two-sided hypothesis involving a subset of the structural parameter vector in the linear instrumental variables (IVs) model. Guggenberger et al. (2019), GKM19 from now on, introduce a subvector Anderson-Rubin (AR) test with data-dependent critical values that has asymptotic size equal to nominal size for a parameter space that allows for arbitrary strength or weakness of the IVs, and has uniformly non-smaller power than the projected AR test studied in Guggenberger et al. (2012). However, GKM19 imposes the restrictive assumption of conditional homoskedasticity. The main contribution here is to robustify the procedure in GKM19 to arbitrary forms of conditional heteroskedasticity. We first adapt the method in GKM19 to a setup where a certain covariance matrix has an approximate Kronecker product (AKP) structure, which nests conditional homoskedasticity. The new test equals this adaptation when the data are consistent with AKP structure, as decided by a model selection procedure. Otherwise, the test equals the AR/AR test in Andrews (2017), which is fully robust to conditional heteroskedasticity but less powerful than the adapted method. We show theoretically that the new test has asymptotic size bounded by the nominal size and document improved power relative to the AR/AR test in a wide array of Monte Carlo simulations when the covariance matrix is not too far from AKP. 
Date:  2021–03 
URL:  http://d.repec.org/n?u=RePEc:arx:papers:2103.11371&r=all 
By:  Pang Du; Christopher F. Parmeter; Jeffrey S. Racine 
Abstract:  We consider shape-constrained kernel-based probability density function (PDF) and probability mass function (PMF) estimation. Our approach is of widespread potential applicability and includes, separately or simultaneously, constraints on the PDF (PMF) itself, its integral (sum), and its derivatives (finite differences) of any order. We also allow for pointwise upper and lower bounds (i.e., inequality constraints) on the PDF and PMF in addition to the more popular equality constraints, and the approach handles a range of transformations of the PDF and PMF including, for example, logarithmic transformations (which allow for the imposition of log-concave or log-convex constraints that are popular with practitioners). Theoretical underpinnings for the procedures are provided. A simulation-based comparison of our proposed approach with those obtained using Grenander-type methods is favourable to our approach when the DGP is itself smooth. As far as we know, ours is also the only smooth framework that handles PDFs and PMFs in the presence of inequality bounds, equality constraints, and other popular constraints such as those mentioned above. An implementation in R exists that incorporates constraints such as monotonicity (both increasing and decreasing), convexity and concavity, and log-convexity and log-concavity, among others, while respecting finite-support boundaries via explicit use of boundary kernel functions. 
Keywords:  nonparametric; density; restricted estimation 
JEL:  C14 
Date:  2021–03 
URL:  http://d.repec.org/n?u=RePEc:mcm:deptwp:202105&r=all 
By:  del Barrio Castro, Tomás 
Abstract:  Cointegration between Periodically Integrated (PI) processes has been analyzed, among others, by Birchenhall, Bladen-Hovell, Chui, Osborn, and Smith (1989), Boswijk and Franses (1995), Franses and Paap (2004), Kleibergen and Franses (1999), and del Barrio Castro and Osborn (2008). However, so far no method published in an academic journal allows us to determine the cointegration rank between PI processes. This paper fills the gap: a method to determine the cointegration rank between a set of PI processes is proposed, based on the idea of pseudo-demodulation introduced in the context of seasonal cointegration by del Barrio Castro, Cubadda and Osborn (2020). Once a pseudo-demodulated time series is obtained, the Johansen (1995) procedure can be applied to determine the cointegration rank. A Monte Carlo experiment shows that the proposed approach works satisfactorily for small samples. 
Keywords:  Reduced Rank Regression, Periodic Cointegration, Periodically Integrated Processes 
JEL:  C32 
Date:  2021 
URL:  http://d.repec.org/n?u=RePEc:pra:mprapa:106603&r=all 
By:  Tengyuan Liang 
Abstract:  We propose a computationally efficient method to construct nonparametric, heteroskedastic prediction bands for uncertainty quantification, with or without any user-specified predictive model. The data-adaptive prediction band is universally applicable with minimal distributional assumptions, has strong non-asymptotic coverage properties, and is easy to implement using standard convex programs. Our approach can be viewed as a novel variance interpolation with confidence and further leverages techniques from semidefinite programming and sum-of-squares optimization. Theoretical and numerical performance of the proposed approach to uncertainty quantification is analyzed. 
Date:  2021–03 
URL:  http://d.repec.org/n?u=RePEc:arx:papers:2103.17203&r=all 
By:  Dake Li; Mikkel Plagborg-Møller; Christian K. Wolf 
Abstract:  We conduct a simulation study of Local Projection (LP) and Vector Autoregression (VAR) estimators of structural impulse responses across thousands of data generating processes (DGPs), designed to mimic the properties of the universe of U.S. macroeconomic data. Our analysis considers various structural identification schemes and several variants of LP and VAR estimators, and we pay particular attention to the role of the researcher's loss function. A clear bias-variance trade-off emerges: because our DGPs are not exactly finite-order VAR models, LPs have lower bias than VAR estimators; however, the variance of LPs is substantially higher than that of VARs at intermediate or long horizons. Unless researchers are overwhelmingly concerned with bias, shrinkage via Bayesian VARs or penalized LPs is attractive. 
Date:  2021–04 
URL:  http://d.repec.org/n?u=RePEc:arx:papers:2104.00655&r=all 
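The LP estimator the abstract compares against VARs amounts to running one regression per horizon of the future outcome on the identified shock plus lagged controls. The sketch below is a minimal Jordà-style illustration on a simulated AR(1), not the authors' specification; the lag length and DGP are assumptions.

```python
import numpy as np

def local_projection_irf(y, shock, horizons, n_lags=4):
    """Local projections: for each horizon h, regress y_{t+h} on the
    contemporaneous shock and lagged values of y and the shock."""
    irf = []
    T = len(y)
    for h in range(horizons + 1):
        rows_y, rows_x = [], []
        for t in range(n_lags, T - h):
            controls = np.concatenate([y[t - n_lags:t], shock[t - n_lags:t]])
            rows_x.append(np.concatenate([[1.0, shock[t]], controls]))
            rows_y.append(y[t + h])
        X, Y = np.array(rows_x), np.array(rows_y)
        beta = np.linalg.lstsq(X, Y, rcond=None)[0]
        irf.append(beta[1])   # coefficient on the contemporaneous shock
    return np.array(irf)

rng = np.random.default_rng(1)
T = 500
eps = rng.standard_normal(T)
y = np.zeros(T)
for t in range(1, T):
    y[t] = 0.6 * y[t - 1] + eps[t]       # true IRF at horizon h is 0.6**h

irf = local_projection_irf(y, eps, horizons=5)
```

On this AR(1) DGP the horizon-0 response is exactly one, and later horizons track 0.6^h up to estimation noise, which is the sense in which LPs trade bias for variance at longer horizons.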
By:  Yiyan Huang; Cheuk Hang Leung; Xing Yan; Qi Wu 
Abstract:  Most existing studies of the double/debiased machine learning method concentrate on causal parameter estimation based on the first-order orthogonal score function. In this paper, we construct the $k^{\mathrm{th}}$-order orthogonal score function for estimating the average treatment effect (ATE) and present an algorithm for obtaining the debiased estimator recovered from the score function. Such a higher-order orthogonal estimator is more robust to misspecification of the propensity score than the first-order one. Moreover, it has the merit of being applicable with many machine learning methodologies, such as Lasso, Random Forests, and Neural Nets. We also conduct comprehensive experiments to test the power of the estimator constructed from the score function, using both simulated and real datasets. 
Date:  2021–03 
URL:  http://d.repec.org/n?u=RePEc:arx:papers:2103.11869&r=all 
By:  Augusteijn, Hilde Elisabeth Maria (Tilburg University); van Aert, Robbie Cornelis Maria; van Assen, Marcel A. L. M. 
Abstract:  Publication bias remains a great challenge when conducting a meta-analysis. It may result in overestimated effect sizes, an increased frequency of false positives, and over- or underestimation of the effect-size heterogeneity parameter. A new method is introduced, Bayesian Meta-Analytic Snapshot (BMAS), which evaluates both effect size and its heterogeneity and corrects for potential publication bias. It evaluates the probability of the true effect size being zero, small, medium, or large, and the probability of the true heterogeneity being zero, small, medium, or large. This approach, which provides an intuitive evaluation of uncertainty in the assessment of effect size and heterogeneity, is illustrated with a real-data example, a simulation study, and a Shiny web application of BMAS. 
Date:  2021–03–18 
URL:  http://d.repec.org/n?u=RePEc:osf:osfxxx:avkgj&r=all 
By:  Lukas Boer; Helmut Lütkepohl 
Abstract:  A major challenge for proxy vector autoregressive analysis is the construction of a suitable external instrument variable, or proxy, for identifying a shock of interest. Some authors construct sophisticated proxies that account for the dating and size of the shock, while others consider simpler versions that use only the dating and signs of particular shocks. It is shown that such qualitative (sign) proxies can lead to impulse response estimates of the impact effects of the shock of interest that are nearly as efficient as, or even more efficient than, estimators based on more sophisticated quantitative proxies that also reflect the size of the shock. Moreover, the sign proxies tend to provide more precise impulse response estimates than an approach based merely on the higher volatility of the shocks of interest on event dates. 
Keywords:  GMM, heteroskedastic VAR, instrumental variable estimation, proxy VAR, structural vector autoregression 
JEL:  C32 
Date:  2021 
URL:  http://d.repec.org/n?u=RePEc:diw:diwwpp:dp1940&r=all 
By:  Yinchu Zhu 
Abstract:  We consider a setting in which a strong binary instrument is available for a binary treatment. The traditional LATE approach assumes the monotonicity condition, which states that there are no defiers (or no compliers). Since this condition is not always obvious, we investigate its sensitivity and testability. In particular, we focus on the question: does a slight violation of monotonicity lead to a small problem or a big problem? We find a phase transition for the monotonicity condition. On one side of the boundary of the phase transition it is easy to learn the sign of the LATE, while on the other side it is impossible. Unfortunately, the impossible side of the phase transition includes data-generating processes under which the proportion of defiers tends to zero. The boundary of the phase transition is explicitly characterized in the case of binary outcomes. Outside a special case, it is impossible to test whether the data-generating process is on the favorable side of the boundary. However, in the special case that non-compliance is almost one-sided, such a test is possible. We also provide simple alternatives to monotonicity. 
Date:  2021–03 
URL:  http://d.repec.org/n?u=RePEc:arx:papers:2103.13369&r=all 
By:  Riccardo (Jack) Lucchetti (Dipartimento di Scienze Economiche e Sociali (DiSES), Università Politecnica delle Marche); Claudia Pigini (Dipartimento di Scienze Economiche e Sociali (DiSES), Università Politecnica delle Marche) 
Abstract:  Estimation of random-effects dynamic probit models for panel data entails the so-called “initial conditions problem”. We argue that the relative finite-sample performance of the two main competing solutions is driven by the magnitude of the individual unobserved heterogeneity and/or of the state dependence in the data. We investigate our conjecture by means of a comprehensive Monte Carlo experiment and offer useful indications for the practitioner. 
Keywords:  Panel data, dynamic probit, initial conditions 
JEL:  C23 C25 
Date:  2020 
URL:  http://d.repec.org/n?u=RePEc:ven:wpaper:2020:27&r=all 
By:  Torben G. Andersen; Rasmus T. Varneskov 
Abstract:  This paper studies the properties of predictive regressions for asset returns in economic systems governed by persistent vector autoregressive dynamics. In particular, we allow the state variables to be fractionally integrated, potentially of different orders, and the returns to have a latent persistent conditional mean, whose memory is difficult to estimate consistently by standard techniques in finite samples. Moreover, the predictors may be endogenous and “imperfect”. In this setting, we provide a cointegration rank test to determine the predictive model framework as well as the latent persistence of returns. This motivates a rank-augmented Local speCtruM (LCM) procedure, which is consistent and delivers asymptotic Gaussian inference. Simulations illustrate the theoretical arguments. Finally, in an empirical application concerning monthly S&P 500 return prediction, we provide evidence for a fractionally integrated conditional mean component. Moreover, using the rank-augmented LCM procedure, we document significant predictive power for key state variables such as the price-earnings ratio and the default spread. 
JEL:  G12 G17 
Date:  2021–03 
URL:  http://d.repec.org/n?u=RePEc:nbr:nberwo:28569&r=all 
By:  O'Brien, Martin (Central Bank of Ireland); Velasco, Sofia (Central Bank of Ireland) 
Abstract:  This paper develops a multivariate filter based on an unobserved-component trend-cycle model. It incorporates stochastic volatility and relies on specific formulations for the cycle component. We test the performance of this algorithm in a Monte Carlo experiment and apply the decomposition tool to study the evolution of the financial cycle (estimated as the cycle of the credit-to-GDP ratio) for the United States, the United Kingdom and Ireland. We compare our credit-cycle measure to the Basel III credit-to-GDP gap, prominent for its role in informing the setting of countercyclical capital buffers. The Basel gap employs the Hodrick-Prescott filter for trend extraction. Filtering methods reliant on similar duration assumptions suffer from end-point bias or spurious cycles. These shortcomings might bias the shape of the credit cycle and thereby limit the precision of policy assessments that rely on its evolution to target financial distress. Allowing for a flexible law of motion of the variance-covariance matrix and informing the estimation of the cycle via economic fundamentals, we are able to improve the statistical properties and to obtain a more economically meaningful measure of the build-up of cyclical systemic risks. Additionally, we find large heterogeneity in the drivers of the credit cycles across time and countries. This result stresses the relevance for macroprudential policy of flexible approaches that can be tailored to country characteristics, in contrast to standardized indicators. 
Keywords:  Credit imbalances, cyclical systemic risk, financial cycle, macroprudential analysis, multivariate unobserved-components models, stochastic volatility. 
JEL:  C32 E32 E58 G01 G28 
Date:  2020–12 
URL:  http://d.repec.org/n?u=RePEc:cbi:wpaper:09/rt/20&r=all 
By:  Joanna Morais; Christine ThomasAgnan (TSE  Toulouse School of Economics  UT1  Université Toulouse 1 Capitole  EHESS  École des hautes études en sciences sociales  CNRS  Centre National de la Recherche Scientifique  INRAE  Institut National de Recherche pour l’Agriculture, l’Alimentation et l’Environnement) 
Abstract:  In the framework of Compositional Data Analysis, vectors carrying relative information, also called compositional vectors, can appear in regression models either as dependent or as explanatory variables. In some situations, they can be on both sides of the regression equation. Measuring the marginal impacts of covariates in these types of models is not straightforward, since a change in one component of a closed composition automatically affects the rest of the composition. Previous work by the authors has shown how to measure, compute and interpret these marginal impacts in the case of linear regression models with compositions on both sides of the equation. The resulting natural interpretation is in terms of an elasticity, a quantity commonly used in econometrics and marketing applications. They also demonstrate the link between these elasticities and simplicial derivatives. The aim of this contribution is to extend these results to other situations, namely when the compositional vector is on a single side of the regression equation. In these cases, the marginal impact is related to a semi-elasticity and also linked to some simplicial derivative. Moreover, we consider the possibility that a total variable is used as an explanatory variable, with several possible interpretations of this total, and we derive the elasticity formulas in that case. 
Keywords:  compositional regression model, marginal effects, simplicial derivative, elasticity, semi-elasticity 
Date:  2021–01 
URL:  http://d.repec.org/n?u=RePEc:hal:journl:hal03180682&r=all 
By:  Mohammadreza Ghanbari; Mahdi Goldani 
Abstract:  Support vector machine modeling is a machine-learning approach to classification that shows good performance on forecasting problems with small samples and high dimensions. It was later extended to Support Vector Regression (SVR) for regression problems. A big challenge in achieving reliable results is the choice of appropriate parameters. Here, a novel Golden Sine Algorithm (GSA)-based SVR is proposed for proper selection of the parameters. For comparison, the performance of the proposed algorithm is compared with that of eleven other metaheuristic algorithms on historical stock prices of technology companies from the Yahoo Finance website, based on Mean Squared Error and Mean Absolute Percent Error. The results demonstrate that the given algorithm is efficient for tuning the parameters and is competitive in terms of accuracy and computing time. 
Date:  2021–03 
URL:  http://d.repec.org/n?u=RePEc:arx:papers:2103.11459&r=all 
By:  Joachim Freyberger 
Abstract:  An important class of structural models investigates the determinants of skill formation and the optimal timing of interventions. To achieve point identification of the parameters, researchers typically normalize the scale and location of the unobserved skills. This paper shows that these seemingly innocuous restrictions can severely impact the interpretation of the parameters and counterfactual predictions. For example, simply changing the units of measurement of observed variables might yield ineffective investment strategies and misleading policy recommendations. To tackle these problems, this paper provides a new identification analysis, which pools all restrictions of the model, characterizes the identified set of all parameters without normalizations, illustrates which features depend on these normalizations, and introduces a new set of important policy-relevant parameters that are identified under weak assumptions and yield robust conclusions. As a byproduct, this paper also presents a general and formal definition of when restrictions are truly normalizations. 
Date:  2021–04 
URL:  http://d.repec.org/n?u=RePEc:arx:papers:2104.00473&r=all 
By:  Jonas Meier 
Abstract:  This paper introduces multivariate distribution regression (MDR), a semiparametric approach to estimating the joint distribution of outcomes. The method allows studying complex dependence structures and distributional treatment effects without making strong parametric assumptions. I show that the MDR coefficient process converges to a Gaussian process and that the bootstrap is consistent for the asymptotic distribution of the estimator. Methodologically, MDR contributes by offering the analysis of many functionals of the CDF, including, for instance, counterfactual distributions. Compared to copula models, MDR achieves the same accuracy but is (i) more robust to misspecification and (ii) able to condition on many covariates, thus ensuring a high degree of flexibility. Finally, an application analyzes shifts in spousal labor supply in response to a health shock. I find that if low-income individuals receive disability insurance benefits, their spouses respond by increasing their labor supply, whereas the opposite holds for high-income households, likely because they are well insured and can afford to work fewer hours. 
Keywords:  Distribution regression; joint distribution; decomposition analysis; distributional treatment effects 
JEL:  C14 C21 
Date:  2020–12 
URL:  http://d.repec.org/n?u=RePEc:ube:dpvwib:dp2023&r=all 
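The core idea of distribution regression extended to the multivariate case is to model, at each point of a grid, the probability that all outcomes fall below that point as a binary regression. The toy sketch below is one plausible rendering of that idea, not the paper's estimator: the logit link, the grid, the Newton fitter, and the DGP are all assumptions.

```python
import numpy as np

def logit_newton(X, y, n_iter=25):
    """Logistic regression fitted by Newton-Raphson (minimal helper)."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        grad = X.T @ (y - p)
        W = p * (1 - p)
        H = (X * W[:, None]).T @ X
        beta += np.linalg.solve(H + 1e-8 * np.eye(X.shape[1]), grad)
    return beta

def mdr_joint_cdf(Y, X, y_grid):
    """At each grid point (y1, y2), fit a logit of the joint indicator
    1{Y1 <= y1, Y2 <= y2} on an intercept and the covariates."""
    Z = np.column_stack([np.ones(len(X)), X])
    return {pt: logit_newton(Z, ((Y[:, 0] <= pt[0]) & (Y[:, 1] <= pt[1])).astype(float))
            for pt in y_grid}

rng = np.random.default_rng(4)
n = 2000
x = rng.standard_normal(n)
# Both outcomes load positively on the covariate, inducing joint dependence.
Y = np.column_stack([x + rng.standard_normal(n), x + rng.standard_normal(n)])
coefs = mdr_joint_cdf(Y, x[:, None], y_grid=[(0.0, 0.0), (1.0, 1.0)])
```

Since both outcomes increase with the covariate, the fitted slope on x is negative at every grid point: larger x makes the joint event {Y1 ≤ y1, Y2 ≤ y2} less likely.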
By:  V. A. Kalyagin; A. P. Koldanov; P. A. Koldanov 
Abstract:  The maximum spanning tree (MST) is a popular tool in market network analysis. A large number of publications are devoted to MST calculation and its interpretation for particular stock markets. However, much less attention is paid in the literature to the analysis of the uncertainty of the obtained results. In the present paper we suggest a general framework to measure the uncertainty of MST identification. We study uncertainty within the concept of a random variable network (RVN). We consider different correlation-based networks in the large class of elliptical distributions. We show that the true MST is the same in three networks: the Pearson correlation network, the Fechner correlation network, and the Kendall correlation network. We argue that among different measures of uncertainty the FDR (False Discovery Rate) is the most appropriate for MST identification. We investigate the FDR of Kruskal's algorithm for MST identification and show that the reliability of MST identification differs across these three networks. In particular, for the Pearson correlation network the FDR depends essentially on the distribution of stock returns. We prove that for a market network with Fechner correlation the FDR is insensitive to the assumption on the stock-return distribution. Some interesting phenomena are discovered for the Kendall correlation network. Our experiments show that the FDR of Kruskal's algorithm for MST identification in the Kendall correlation network depends only weakly on the distribution, and at the same time its value is almost the best in comparison with MST identification in the other networks. These facts are important in practical applications. 
Date:  2021–03 
URL:  http://d.repec.org/n?u=RePEc:arx:papers:2103.14593&r=all 
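Kruskal's algorithm on a correlation network, as referenced in the abstract, greedily adds the highest-correlation edges that do not create a cycle. The sketch below is an illustrative stand-alone implementation on simulated returns (the data and the union-find helper are assumptions, not the paper's experimental setup).

```python
import numpy as np

def kruskal_max_spanning_tree(corr):
    """Maximum spanning tree of a correlation matrix via Kruskal's algorithm:
    sort edges by correlation, add each edge unless it closes a cycle
    (cycle detection via union-find with path halving)."""
    n = corr.shape[0]
    edges = sorted(((corr[i, j], i, j) for i in range(n) for j in range(i + 1, n)),
                   reverse=True)
    parent = list(range(n))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x
    tree = []
    for w, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            tree.append((i, j, w))
    return tree

rng = np.random.default_rng(3)
returns = rng.standard_normal((250, 6))      # simulated daily returns, 6 stocks
returns[:, 1] += 0.9 * returns[:, 0]         # stocks 0 and 1 strongly linked
mst = kruskal_max_spanning_tree(np.corrcoef(returns, rowvar=False))
```

A spanning tree on 6 nodes always has 5 edges, and the strongly correlated pair (0, 1) carries the largest edge weight, so Kruskal's algorithm includes it first.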
By:  Kilian Huber 
Abstract:  Researchers use (quasi)experimental methods to estimate how shocks affect directly treated firms and households. Such methods typically do not account for general equilibrium spillover effects. I outline a method that estimates spillovers operating among groups of firms and households. I argue that the presence of multiple types of spillovers, measurement error, and nonlinear effects can severely bias estimates. I show how instrumental variables, heterogeneity tests, and flexible functional forms can overcome different sources of bias. The analysis is particularly relevant to the estimation of spillovers following largescale financial and business cycle shocks. 
Keywords:  general equilibrium effects, spillovers, estimation, macroeconomic shocks, financial shocks 
Date:  2021 
URL:  http://d.repec.org/n?u=RePEc:ces:ceswps:_8955&r=all 
By:  Wenyang Huang; Huiwen Wang; Shanshan Wang 
Abstract:  Open-high-low-close (OHLC) data is the most common data form in finance and the object of various technical analyses. With increasing numbers of OHLC features being collected, the issue of extracting their useful information in a comprehensible way for visualization and easy interpretation must be resolved. The inherent constraints of OHLC data also pose a challenge for this issue. This paper proposes a novel approach to characterize the features of OHLC data in a dataset and then perform dimension reduction, integrating a feature-information extraction method with principal component analysis; we refer to it as the pseudo-PCA method. Specifically, we first propose a new way to represent OHLC data that frees the inherent constraints and provides convenience for further analysis. Moreover, there is a one-to-one match between the original OHLC data and its feature-based representation, which means that the analysis of the feature-based data can be reversed to the original OHLC data. Next, we develop the pseudo-PCA procedure for OHLC data, which can effectively identify important information and perform dimension reduction. Finally, the effectiveness and interpretability of the proposed method are investigated through finite simulations and the spot data of China's agricultural product market. 
Date:  2021–03 
URL:  http://d.repec.org/n?u=RePEc:arx:papers:2103.16908&r=all 
By:  Hanjo Odendaal (Department of Economics, Stellenbosch University) 
Abstract:  This paper aims to offer an alternative to the labour-intensive manual process of constructing a domain-specific lexicon or dictionary through the operationalization of subjective information processing. It builds on the current empirical literature by (a) constructing a domain-specific dictionary for various economic confidence indices, (b) introducing a novel weighting schema for text tokens that accounts for time dependence, and (c) operationalising subjective information processing of text data using machine learning. The results show that sentiment indices constructed from machine-generated dictionaries have a better fit with multiple indicators of economic activity than Loughran and McDonald's (2011) manually constructed dictionary. The analysis shows a lower RMSE for the domain-specific dictionaries in a five-year holdout sample from 2012 to 2017. The results also justify the time-series weighting design used to overcome the p>>n problem commonly found when working with economic time series and text data. 
Keywords:  Sentometrics, Machine learning, Domain-specific dictionaries 
JEL:  C32 C45 C53 C55 
Date:  2021 
URL:  http://d.repec.org/n?u=RePEc:sza:wpaper:wpapers366&r=all 
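The paper's actual weighting schema is not given in the abstract. As a loose sketch of the idea of a machine-generated, time-weighted dictionary, one could score each token by its exponentially time-weighted correlation with a target economic indicator and keep the extremes; everything below (function names, the half-life weighting, the selection rule) is an illustrative assumption:

```python
import numpy as np

def weighted_token_scores(dtm, target, half_life=12.0):
    """Score each token by time-weighted correlation with a target index.

    dtm    : (T, V) document-term matrix, one row per period
    target : (T,) economic indicator aligned with the rows of dtm
    Recent periods receive exponentially larger weights (illustrative choice).
    """
    T = dtm.shape[0]
    w = 0.5 ** ((T - 1 - np.arange(T)) / half_life)
    w = w / w.sum()
    x = dtm - (w @ dtm)              # weighted demeaning of each token column
    y = target - (w @ target)
    cov = (w * y) @ x                # weighted covariance per token
    sd = np.sqrt((w @ (x ** 2)) * (w @ (y ** 2)))
    return np.divide(cov, sd, out=np.zeros_like(cov), where=sd > 0)

def build_dictionary(dtm, target, vocab, k=2):
    """Keep the k tokens most positively/negatively related to the target."""
    s = weighted_token_scores(dtm, target)
    order = np.argsort(s)
    return {"negative": [vocab[i] for i in order[:k]],
            "positive": [vocab[i] for i in order[-k:]]}
```

Tokens uncorrelated with the indicator score zero and drop out, which is one simple way a machine-generated dictionary can adapt to a specific economic domain.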
By:  Yijian Chuan; Chaoyi Zhao; Zhenrui He; Lan Wu 
Abstract:  We develop a novel approach to explain why AdaBoost is a successful classifier. By introducing a measure of the influence of the noise points (ION) in the training data for the binary classification problem, we prove that there is a strong connection between the ION and the test error. We further identify that the ION of AdaBoost decreases as the iteration number or the complexity of the base learners increases. We confirm that it is impossible to obtain a consistent classifier without deep trees as the base learners of AdaBoost in some complicated situations. We apply AdaBoost to portfolio management via empirical studies of the Chinese market, which corroborate our theoretical propositions. 
Date:  2021–03 
URL:  http://d.repec.org/n?u=RePEc:arx:papers:2103.12345&r=all 
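The ION measure itself is not reproducible from the abstract, but the classifier whose base-learner complexity the paper varies is standard AdaBoost. A minimal self-contained sketch with decision stumps as the (shallowest possible) base learners, assuming labels in {-1, +1}:

```python
import numpy as np

def fit_stump(X, y, w):
    """Best single-feature threshold classifier under sample weights w."""
    best = (np.inf, 0, -np.inf, 1)          # (error, feature, threshold, sign)
    for j in range(X.shape[1]):
        for thr in np.unique(X[:, j]):
            for sign in (1, -1):
                pred = sign * np.where(X[:, j] > thr, 1, -1)
                err = w[pred != y].sum()
                if err < best[0]:
                    best = (err, j, thr, sign)
    return best

def adaboost(X, y, n_rounds=20):
    """AdaBoost.M1 with decision stumps; y must take values in {-1, +1}."""
    n = len(y)
    w = np.full(n, 1.0 / n)
    ensemble = []
    for _ in range(n_rounds):
        err, j, thr, sign = fit_stump(X, y, w)
        if err >= 0.5:                       # weak learner no better than chance
            break
        err = max(err, 1e-10)
        alpha = 0.5 * np.log((1 - err) / err)
        pred = sign * np.where(X[:, j] > thr, 1, -1)
        w = w * np.exp(-alpha * y * pred)    # upweight misclassified points
        w = w / w.sum()
        ensemble.append((alpha, j, thr, sign))
    return ensemble

def predict(ensemble, X):
    score = np.zeros(len(X))
    for alpha, j, thr, sign in ensemble:
        score += alpha * sign * np.where(X[:, j] > thr, 1, -1)
    return np.where(score >= 0, 1, -1)
```

Replacing the stump with a deeper tree is the "complexity of the base learners" knob the abstract refers to; the paper argues that in complicated settings consistency requires such deeper base learners.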
By:  Sophocles Mavroeidis 
Abstract:  I show that the Zero Lower Bound (ZLB) on interest rates can be used to identify the causal effects of monetary policy. Identification depends on the extent to which the ZLB limits the efficacy of monetary policy. I develop a general econometric methodology for the identification and estimation of structural vector autoregressions (SVARs) with an occasionally binding constraint. The method provides a simple way to test the efficacy of unconventional policies, modelled via a `shadow rate'. I apply this method to U.S. monetary policy using a three-equation SVAR model of inflation, unemployment and the federal funds rate. I reject the null hypothesis that unconventional monetary policy has no effect at the ZLB, but find some evidence that it is not as effective as conventional monetary policy. 
Date:  2021–03 
URL:  http://d.repec.org/n?u=RePEc:arx:papers:2103.12779&r=all 
By:  S. Borağan Aruoba; Marko Mlikota; Frank Schorfheide; Sergio Villalvazo 
Abstract:  We develop a structural VAR in which an occasionally binding constraint generates censoring of one of the dependent variables. Once the censoring mechanism is triggered, we allow some of the coefficients for the remaining variables to change. We show that a necessary condition for a unique reduced form is that regression functions for the non-censored variables are continuous at the censoring point and that parameters satisfy some mild restrictions. In our application the censored variable is a nominal interest rate constrained by an effective lower bound (ELB). According to our estimates based on U.S. data, once the ELB becomes binding, the coefficients in the inflation equation change significantly, which translates into a change of the inflation responses to (unconventional) monetary policy and demand shocks. Our results suggest that the presence of the ELB is indeed empirically relevant for the propagation of shocks. We also obtain a shadow interest rate that shows a significant accommodation in the early part of the Great Recession, followed by a mild and steady accommodation until liftoff in 2016. 
JEL:  C11 C22 C34 E32 E52 
Date:  2021–03 
URL:  http://d.repec.org/n?u=RePEc:nbr:nberwo:28571&r=all 
By:  Pamela Jakiela 
Abstract:  Difference-in-differences estimation is a widely used method of program evaluation. When treatment is implemented in different places at different times, researchers often use two-way fixed effects to control for location-specific and period-specific shocks. Such estimates can be severely biased when treatment effects change over time within treated units. I review the sources of this bias and propose several simple diagnostics for assessing its likely severity. I illustrate these tools through a case study of free primary education in Sub-Saharan Africa. 
Date:  2021–03 
URL:  http://d.repec.org/n?u=RePEc:arx:papers:2103.13229&r=all 
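The diagnostics themselves are not described in the abstract, but the estimator under scrutiny is easy to state. A minimal sketch of the two-way fixed effects (TWFE) coefficient via double demeaning on a balanced panel with staggered adoption; with a homogeneous, constant treatment effect TWFE recovers the truth exactly, and the bias the paper studies arises precisely when effects instead vary over time within treated units (all names and the simulated design are illustrative):

```python
import numpy as np

def twfe(Y, D):
    """Two-way fixed effects coefficient via double demeaning.

    Y, D are (N, T) arrays of outcomes and treatment indicators.
    On a balanced panel this equals OLS of Y on D plus unit and period dummies.
    """
    def demean(A):
        return A - A.mean(axis=1, keepdims=True) - A.mean(axis=0, keepdims=True) + A.mean()
    y, d = demean(Y).ravel(), demean(D).ravel()
    return (d @ y) / (d @ d)

# Staggered adoption with a homogeneous, constant effect: TWFE is exact here.
N, T, tau = 4, 6, 2.0
rng = np.random.default_rng(0)
alpha = rng.normal(size=(N, 1))          # unit (location-specific) effects
gamma = rng.normal(size=(1, T))          # period-specific shocks
adopt = np.array([2, 3, 4, T])           # first treated period (T = never treated)
D = (np.arange(T)[None, :] >= adopt[:, None]).astype(float)
Y = alpha + gamma + tau * D
```

Making tau a function of time since adoption in this simulation is a quick way to see the dynamic-effects bias the paper diagnoses.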
By:  Huiwen Wang; Wenyang Huang; Shanshan Wang 
Abstract:  Forecasting the open-high-low-close (OHLC) data contained in candlestick charts is of great practical importance, as exemplified by applications in the field of finance. Typically, the inherent constraints in OHLC data pose a great challenge to its prediction; e.g., forecasting models may yield unrealistic values if these constraints are ignored. To address this, a novel transformation approach is proposed to relax these constraints, along with its explicit inverse transformation, which ensures that the forecasting models yield meaningful open-high-low-close values. A flexible and efficient framework for forecasting the OHLC data is also provided. As an example, the detailed procedure of modelling the OHLC data via the vector autoregression (VAR) model and the vector error-correction (VEC) model is given. The new approach has high practical utility on account of its flexibility, simple implementation and straightforward interpretation. Extensive simulation studies are performed to assess the effectiveness and stability of the proposed approach. Three financial data sets from the Chinese stock market (Kweichow Moutai, the CSI 100 index and the 50 ETF) are employed to document the empirical performance of the proposed methodology. 
Date:  2021–03 
URL:  http://d.repec.org/n?u=RePEc:arx:papers:2104.00581&r=all 
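A minimal sketch of the pipeline the abstract describes: transform each OHLC tuple to an unconstrained vector, fit a VAR(1) by OLS in that space, forecast one step ahead, and invert the transformation so the forecast satisfies the OHLC constraints by construction. The specific log/logit map is an illustrative assumption (the paper's exact transformation may differ), but the guarantee mechanism is the same: any point in the unconstrained space inverts to a valid OHLC tuple.

```python
import numpy as np

def logit(p): return np.log(p / (1 - p))
def expit(x): return 1 / (1 + np.exp(-x))

def to_unconstrained(ohlc):
    """Map OHLC rows (l > 0, l < o, c < h) to R^4, with an exact inverse."""
    o, h, l, c = ohlc.T
    r = h - l
    return np.column_stack([np.log(l), np.log(r),
                            logit((o - l) / r), logit((c - l) / r)])

def to_ohlc(y):
    """Inverse map; output always satisfies the OHLC constraints."""
    l = np.exp(y[:, 0])
    h = l + np.exp(y[:, 1])
    o = l + expit(y[:, 2]) * (h - l)
    c = l + expit(y[:, 3]) * (h - l)
    return np.column_stack([o, h, l, c])

def var1_forecast(Y):
    """Fit Y_t = c + A Y_{t-1} + e_t by OLS; return the one-step forecast."""
    X = np.column_stack([np.ones(len(Y) - 1), Y[:-1]])
    B, *_ = np.linalg.lstsq(X, Y[1:], rcond=None)
    return np.concatenate([[1.0], Y[-1]]) @ B
```

A naive VAR fitted directly to raw OHLC levels carries no such guarantee and can forecast, say, a low above the high, which is the "unrealistic values" problem the abstract mentions.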
By:  Apostolos Chalkis; Emmanouil Christoforou; Theodore Dalamagkas; Ioannis Z. Emiris 
Abstract:  We exploit a recent computational framework to model and detect financial crises in stock markets, as well as shock events in cryptocurrency markets, which are characterized by a sudden or severe drop in prices. Our method manages to detect all past crises in the French industrial stock market starting with the crash of 1929, including financial crises after 1990 (e.g. the dot-com bubble burst of 2000 and the stock market downturn of 2002), and all past crashes in the cryptocurrency market, namely in 2018, and also in 2020 due to COVID-19. We leverage copulae clustering, based on the distance between probability distributions, in order to validate the reliability of the framework; we show that clusters contain copulae from similar market states, such as normal states or crises. Moreover, we propose a novel regression model that can successfully detect all past events using less than 10% of the information that the previous framework requires. We train our model on historical data on the industry assets, and we are able to detect all past shock events in the cryptocurrency market. Our tools provide the essential components of our software framework that offers fast and reliable detection, or even prediction, of shock events in stock and cryptocurrency markets of hundreds of assets. 
Date:  2021–03 
URL:  http://d.repec.org/n?u=RePEc:arx:papers:2103.13294&r=all 
By:  Shoya Ishimaru 
Abstract:  This paper shows that a two-way fixed effects (TWFE) estimator is a weighted average of first-difference (FD) estimators with different gaps between periods, generalizing a well-known equivalence theorem in a two-period panel. Exploiting this identity, I clarify the conditions required for the causal interpretation of the TWFE estimator. I highlight several of its limitations and propose a generalized estimator that overcomes them. An empirical application to estimates of minimum-wage effects illustrates that recognizing the numerical equivalence and making use of the generalized estimator enables a more transparent understanding of what we get from the TWFE estimator. 
Date:  2021–03 
URL:  http://d.repec.org/n?u=RePEc:arx:papers:2103.12374&r=all 
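The two-period equivalence theorem the paper generalizes is easy to verify numerically: with T = 2, the TWFE coefficient (double demeaning) coincides exactly with the FD estimator (OLS of the first differences of the outcome on the first differences of treatment, with an intercept for the period effect). A quick self-contained check (the implementation is a sketch, not the paper's code):

```python
import numpy as np

def twfe(Y, D):
    """Two-way fixed effects coefficient via double demeaning (balanced panel)."""
    def demean(A):
        return A - A.mean(axis=1, keepdims=True) - A.mean(axis=0, keepdims=True) + A.mean()
    y, d = demean(Y).ravel(), demean(D).ravel()
    return (d @ y) / (d @ d)

def first_diff(Y, D):
    """FD estimator with a gap of one period: OLS of dY on dD with intercept."""
    dy = np.diff(Y, axis=1).ravel()
    dd = np.diff(D, axis=1).ravel()
    dd_c = dd - dd.mean()               # intercept absorbed by centring dD
    return (dd_c @ dy) / (dd_c @ dd_c)
```

With T > 2 the two estimators generally differ, and the paper's identity expresses TWFE as a weighted average of FD estimators over all gap lengths.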