nep-ecm New Economics Papers
on Econometrics
Issue of 2023‒05‒22
nineteen papers chosen by
Sune Karlsson
Örebro universitet

  1. Determination of the effective cointegration rank in high-dimensional time-series predictive regressions By Puyi Fang; Zhaoxing Gao; Ruey S. Tsay
  2. The Ordinary Least Eigenvalues Estimator By Yassine Sbai Sassi
  3. Cross-temporal Probabilistic Forecast Reconciliation By Daniele Girolimetto; George Athanasopoulos; Tommaso Di Fonzo; Rob J Hyndman
  4. Averaging Impulse Responses Using Prediction Pools By Paul Ho; Thomas A. Lubik; Christian Matthes
  5. Bayesian Predictive Distributions of Oil Returns Using Mixed Data Sampling Volatility Models By Virbickaite, Audrone; Nguyen, Hoang; Tran, Minh-Ngoc
  6. Adaptive Student's t-distribution with method of moments moving estimator for nonstationary time series By Jarek Duda
  7. Modelling physical activity profiles in COPD patients: a new approach to variable-domain functional regression models By Hernandez Amaro, Pavel; Durbán Reguera, María Luz; Aguilera Morillo, Maria Del Carmen; Esteban Gonzalez, Cristobal; Arostegui, Inma
  8. Recurrent neural network based parameter estimation of Hawkes model on high-frequency financial data By Kyungsub Lee
  9. Flexible and Efficient Contextual Bandits with Heterogeneous Treatment Effect Oracles By Carranza, Aldo Gael; Krishnamurthy, Sanath Kumar; Athey, Susan
  10. Parametric models of income distributions integrating misreporting and non-response mechanisms By Mathias Silva
  11. Policy Learning under Biased Sample Selection By Lihua Lei; Roshni Sahoo; Stefan Wager
  12. Error Spotting with Gradient Boosting: A Machine Learning-Based Application for Central Bank Data Quality By Csaba Burger; Mihály Berndt
  13. High-Dimensional Radial Symmetry of Copula Functions: Multiplier Bootstrap vs. Randomization By Monica Billio; Lorenzo Frattarolo; Dominique Guégan
  14. Revisiting the Returns to Higher Education: Heterogeneity by Cognitive and Non-Cognitive Abilities By Oliver Cassagneau-Francis
  15. Goodhart's law and machine learning: a structural perspective By A. Hennessy, Christopher; Goodhart, C. A. E.
  16. Covariance matrix estimation for robust portfolio allocation By Ahmad W. Bitar; Nathan de Carvalho; Valentin Gatignol
  17. Synthetic Controls with Multiple Outcomes: Estimating the Effects of Non-Pharmaceutical Interventions in the COVID-19 Pandemic By Wei Tian; Seojeong Lee; Valentyn Panchenko
  18. Machine Learning for Economics Research: When, What and How? By Ajit Desai
  19. Breaks in the Phillips Curve: Evidence from Panel Data By Simon Smith; Allan Timmermann; Jonathan H. Wright

  1. By: Puyi Fang; Zhaoxing Gao; Ruey S. Tsay
    Abstract: This paper proposes a new approach to identifying the effective cointegration rank in high-dimensional unit-root (HDUR) time series from a prediction perspective using reduced-rank regression. For a HDUR process $\mathbf{x}_t\in \mathbb{R}^N$ and a stationary series $\mathbf{y}_t\in \mathbb{R}^p$ of interest, our goal is to predict future values of $\mathbf{y}_t$ using $\mathbf{x}_t$ and lagged values of $\mathbf{y}_t$. The proposed framework consists of a two-step estimation procedure. First, principal component analysis (PCA) is used to identify all cointegrating vectors of $\mathbf{x}_t$. Second, the cointegrated stationary series are used as regressors, together with some lagged variables of $\mathbf{y}_t$, to predict $\mathbf{y}_t$. The estimated reduced rank is then defined as the effective cointegration rank of $\mathbf{x}_t$. When the autoregressive coefficient matrices are sparse (or of low rank), we apply the Least Absolute Shrinkage and Selection Operator (or reduced-rank techniques) to estimate the autoregressive coefficients when the dimension involved is high. Theoretical properties of the estimators are established as the dimensions $p$ and $N$ and the sample size $T$ tend to infinity. Both simulated and real examples are used to illustrate the proposed framework, and the empirical application suggests that the proposed procedure fares well in predicting stock returns.
    Date: 2023–04
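As a rough illustration of the first (PCA) step described above, the following numpy sketch recovers a stationary combination of simulated unit-root series. The loadings, dimensions and noise level are hypothetical, and this shows only the generic principal-components idea, not the authors' full procedure:

```python
import numpy as np

rng = np.random.default_rng(0)
T, N = 2000, 3

# Three observed unit-root series driven by two common stochastic trends,
# so there is exactly one cointegrating (stationary) linear combination.
trends = np.cumsum(rng.normal(size=(T, 2)), axis=0)
A = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])          # hypothetical loadings
x = trends @ A.T + rng.normal(scale=0.5, size=(T, N))

# PCA step: eigen-decompose the sample covariance; the direction with the
# smallest eigenvalue annihilates the common trends and is (approximately)
# a cointegrating vector.
eigvals, eigvecs = np.linalg.eigh(np.cov(x, rowvar=False))
b = eigvecs[:, 0]                    # smallest-variance direction
z = x @ b                            # candidate stationary combination

# The stationary combination has far smaller variance than the raw series.
var_ratio = z.var() / x.var(axis=0).max()
```

In the paper's setting the second step would then regress $\mathbf{y}_t$ on such combinations (plus lags) with Lasso or reduced-rank penalties; here we only verify that the recovered combination is indeed low-variance.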
  2. By: Yassine Sbai Sassi
    Abstract: We propose a rate optimal estimator for the linear regression model on network data with interacted (unobservable) individual effects. The estimator achieves a faster rate of convergence $N$ compared to the standard estimators' $\sqrt{N}$ rate and is efficient in cases that we discuss. We observe that the individual effects alter the eigenvalue distribution of the data's matrix representation in significant and distinctive ways. We subsequently offer a correction for the \textit{ordinary least squares}' objective function to attenuate the statistical noise that arises due to the individual effects, and in some cases, completely eliminate it. The new estimator is asymptotically normal and we provide a valid estimator for its asymptotic covariance matrix. While this paper only considers models accounting for first-order interactions between individual effects, our estimation procedure is naturally extendable to higher-order interactions and more general specifications of the error terms.
    Date: 2023–04
  3. By: Daniele Girolimetto; George Athanasopoulos; Tommaso Di Fonzo; Rob J Hyndman
    Abstract: Forecast reconciliation is a post-forecasting process that involves transforming a set of incoherent forecasts into coherent forecasts which satisfy a given set of linear constraints for a multivariate time series. In this paper we extend the current state-of-the-art cross-sectional probabilistic forecast reconciliation approach to encompass a cross-temporal framework, where temporal constraints are also applied. Our proposed methodology employs both parametric Gaussian and non-parametric bootstrap approaches to draw samples from an incoherent cross-temporal distribution. To improve the estimation of the forecast error covariance matrix, we propose using multi-step residuals, especially in the time dimension where the usual one-step residuals fail. To address high-dimensionality issues, we present four alternatives for the covariance matrix, where we exploit the twofold nature (cross-sectional and temporal) of the cross-temporal structure, and introduce the idea of overlapping residuals. We evaluate the proposed methods through a detailed simulation study that investigates their theoretical and empirical properties. We further assess the effectiveness of the proposed cross-temporal reconciliation approach by applying it to two empirical forecasting experiments, using the Australian GDP and the Australian Tourism Demand datasets. For both applications, we show that the optimal cross-temporal reconciliation approaches significantly outperform the incoherent base forecasts in terms of the Continuous Ranked Probability Score and the Energy Score. Overall, our study expands and unifies the notation for cross-sectional, temporal and cross-temporal reconciliation, thus extending and deepening the probabilistic cross-temporal framework. The results highlight the potential of the proposed cross-temporal forecast reconciliation methods in improving the accuracy of probabilistic forecasting models.
    Keywords: coherent, GDP, linear constraints, multivariate time series, temporal aggregation, tourism flows
    Date: 2023
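The projection at the heart of any reconciliation step can be sketched in a few lines. The toy below is cross-sectional only, with an identity weight matrix, both simplifying assumptions relative to the paper's cross-temporal, probabilistic machinery:

```python
import numpy as np

# Toy hierarchy: Total = A + B.  S maps bottom-level series to all levels.
S = np.array([[1.0, 1.0],
              [1.0, 0.0],
              [0.0, 1.0]])

def reconcile(y_hat, S, W=None):
    """Map incoherent base forecasts y_hat to coherent ones by a
    W-weighted least-squares projection onto the column space of S."""
    if W is None:
        W = np.eye(len(y_hat))
    Wi = np.linalg.inv(W)
    G = np.linalg.solve(S.T @ Wi @ S, S.T @ Wi)
    return S @ G @ y_hat

y_hat = np.array([10.0, 6.0, 5.0])   # incoherent: 6 + 5 != 10
y_tilde = reconcile(y_hat, S)        # coherent reconciled forecasts
```

Already-coherent forecasts pass through unchanged, since they lie in the column space of S; the choice of W is exactly where the paper's covariance estimators (multi-step and overlapping residuals) enter.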
  4. By: Paul Ho; Thomas A. Lubik; Christian Matthes
    Abstract: Macroeconomists construct impulse responses using many competing time series models and different statistical paradigms (Bayesian or frequentist). We adapt optimal linear prediction pools to efficiently combine impulse response estimators for the effects of the same economic shock from this vast class of possible models. We thus alleviate the need to choose one specific model, obtaining weights that are typically positive for more than one model. Three Monte Carlo simulations and two monetary shock empirical applications illustrate how the weights leverage the strengths of each model by (i) trading off properties of each model depending on variable, horizon, and application and (ii) accounting for the full predictive distribution rather than being restricted to specific moments.
    Keywords: prediction pools; model averaging; impulse responses; misspecification
    JEL: C32 C52
    Date: 2023–02
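A minimal sketch of an optimal linear prediction pool: two hypothetical Gaussian predictive densities, both misspecified, are scored against simulated data, and the pool weight maximizing the realized log predictive score is selected. The data-generating process and model means are illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)
y = rng.normal(loc=0.3, scale=1.0, size=500)   # realized observations

def normal_pdf(y, mu, sigma):
    return np.exp(-0.5 * ((y - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

# Two competing (both misspecified) predictive densities for y.
p1 = normal_pdf(y, 0.0, 1.0)
p2 = normal_pdf(y, 1.0, 1.0)

# Optimal linear pool: pick the weight maximizing the realized log score
# of the mixture w * p1 + (1 - w) * p2.
grid = np.linspace(0.0, 1.0, 1001)
scores = np.array([np.sum(np.log(w * p1 + (1 - w) * p2)) for w in grid])
w_star = grid[int(np.argmax(scores))]
```

With the true mean between the two model means, the optimal weight is interior: both models receive positive weight, as in the paper's finding (ii) that the pool exploits the full predictive distribution rather than committing to one model.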
  5. By: Virbickaite, Audrone (CUNEF Universidad); Nguyen, Hoang (Örebro University School of Business); Tran, Minh-Ngoc (Discipline of Business Analytics, The University of Sydney Business School)
    Abstract: This study explores the benefits of incorporating fat-tailed innovations, asymmetric volatility response, and an extended information set into crude oil return modeling and forecasting. To this end, we utilize standard volatility models such as Generalized Autoregressive Conditional Heteroskedastic (GARCH), Generalized Autoregressive Score (GAS), and Stochastic Volatility (SV), along with Mixed Data Sampling (MIDAS) regressions, which enable us to incorporate the impacts of relevant financial/macroeconomic news into asset price movements. For inference and prediction, we employ an innovative Bayesian estimation approach called the density-tempered sequential Monte Carlo method. Our findings indicate that the inclusion of exogenous variables is beneficial for GARCH-type models while offering only a marginal improvement for GAS and SV-type models. Notably, GAS-family models exhibit superior performance in terms of in-sample fit, out-of-sample forecast accuracy, as well as Value-at-Risk and Expected Shortfall prediction.
    Keywords: ES; GARCH; GAS; log marginal likelihood; MIDAS; SV; VaR
    JEL: C22 C52 C58 G32
    Date: 2023–04–14
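For reference, the conditional-variance recursion underlying the GARCH component can be written down directly. Parameter values below are illustrative; the paper's models add fat-tailed innovations, asymmetry and MIDAS terms on top of recursions of this kind:

```python
import numpy as np

def garch11_variance(returns, omega, alpha, beta):
    """GARCH(1,1) conditional variance recursion:
    sigma2[t] = omega + alpha * r[t-1]**2 + beta * sigma2[t-1],
    initialized at the unconditional variance omega / (1 - alpha - beta)."""
    sigma2 = np.empty(len(returns))
    sigma2[0] = omega / (1.0 - alpha - beta)
    for t in range(1, len(returns)):
        sigma2[t] = omega + alpha * returns[t - 1] ** 2 + beta * sigma2[t - 1]
    return sigma2

rng = np.random.default_rng(2)
r = rng.normal(scale=0.01, size=1000)            # stand-in daily returns
s2 = garch11_variance(r, omega=1e-6, alpha=0.05, beta=0.90)
```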
  6. By: Jarek Duda
    Abstract: Real-life time series are usually nonstationary, raising the difficult question of model adaptation. Classical approaches like GARCH assume an arbitrary type of dependence. To prevent such bias, we will focus on the recently proposed agnostic philosophy of the moving estimator: at time $t$, finding the parameters optimizing an exponentially weighted moving log-likelihood such as $F_t=\sum_{\tau<t}\eta^{t-\tau}\ln(\rho_\theta(x_\tau))$, which allows the estimated parameters to evolve with the observed time series.
    Date: 2023–04
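A simplified sketch of the moving method-of-moments idea: exponentially weighted moments are mapped to the Student's t degrees of freedom through the excess-kurtosis relation nu = 4 + 6/kappa (valid for nu > 4). The update scheme, initial values and floor below are assumptions for illustration, not the paper's exact estimator:

```python
import numpy as np

def moving_t_dof(x, eta=0.99):
    """Exponentially weighted method-of-moments sketch: track moving mean,
    variance and fourth central moment, then map excess kurtosis to the
    Student's t degrees of freedom via nu = 4 + 6 / excess_kurtosis."""
    m1, m2, m4 = x[0], 1.0, 3.0      # Gaussian-consistent initial moments
    nus = np.empty(len(x))
    for t, xt in enumerate(x):
        d = xt - m1
        m1 = eta * m1 + (1.0 - eta) * xt
        m2 = eta * m2 + (1.0 - eta) * d ** 2
        m4 = eta * m4 + (1.0 - eta) * d ** 4
        excess = max(m4 / m2 ** 2 - 3.0, 1e-6)   # floor keeps nu finite
        nus[t] = 4.0 + 6.0 / excess
    return nus

rng = np.random.default_rng(3)
x = rng.standard_t(df=5, size=5000)
nu_path = moving_t_dof(x)            # time-varying estimate of nu
```

The point of the moving estimator is visible here: each moment is an inexpensive exponential moving average, so the fitted tail-heaviness parameter adapts continuously as the series evolves.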
  7. By: Hernandez Amaro, Pavel; Durbán Reguera, María Luz; Aguilera Morillo, Maria Del Carmen; Esteban Gonzalez, Cristobal; Arostegui, Inma
    Abstract: Motivated by increasingly common data-collection technology, such as cellphones and smartwatches, functional data analysis, and with it functional regression models, has been intensively studied in recent decades. However, the majority of functional data methods in general, and functional regression models in particular, assume that the observed data share the same domain. When the data have variable domains, they need to be aligned or registered before being fitted with the usual modelling techniques, which adds computational burden. To avoid this, a model that accommodates the variable-domain nature of the data is needed, but such models are scarce and their estimation methods present some limitations. In this article, we propose a new scalar-on-function regression model for variable-domain functional data that eludes the need for alignment, and a new estimation methodology that we extend to other variable-domain regression models.
    Keywords: Variable Domain Functional Data; B-Splines; Mixed Models; COPD
    Date: 2025–05–05
  8. By: Kyungsub Lee
    Abstract: This study examines the use of a recurrent neural network for estimating the parameters of a Hawkes model based on high-frequency financial data, and subsequently, for computing volatility. Neural networks have shown promising results in various fields, and interest in finance is also growing. Our approach demonstrates significantly faster computational performance compared to traditional maximum likelihood estimation methods while yielding comparable accuracy in both simulation and empirical studies. Furthermore, we demonstrate the application of this method for real-time volatility measurement, enabling the continuous estimation of financial volatility as new price data keeps coming from the market.
    Date: 2023–04
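For reference, the exponential-kernel Hawkes intensity that such estimators target is straightforward to write down; all parameter values below are illustrative:

```python
import numpy as np

def hawkes_intensity(event_times, t, mu, alpha, beta):
    """Exponential-kernel Hawkes intensity at time t:
    lambda(t) = mu + sum over past events t_i < t of
    alpha * exp(-beta * (t - t_i))."""
    past = event_times[event_times < t]
    return mu + np.sum(alpha * np.exp(-beta * (t - past)))

events = np.array([0.1, 0.4, 0.5, 1.2])          # observed event times
lam = hawkes_intensity(events, t=1.5, mu=0.2, alpha=0.8, beta=2.0)
```

Each past event raises the intensity by alpha and the effect decays at rate beta; estimating (mu, alpha, beta) from high-frequency event times is the task the recurrent network takes over from maximum likelihood.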
  9. By: Carranza, Aldo Gael (Stanford U); Krishnamurthy, Sanath Kumar (Stanford U); Athey, Susan (Stanford U)
    Abstract: Contextual bandit algorithms often estimate reward models to inform decision-making. However, true rewards can contain action-independent redundancies that are not relevant for decision-making. We show it is more data-efficient to estimate any function that explains the reward differences between actions, that is, the treatment effects. Motivated by this observation, building on recent work on oracle-based bandit algorithms, we provide the first reduction of contextual bandits to general-purpose heterogeneous treatment effect estimation, and we design a simple and computationally efficient algorithm based on this reduction. Our theoretical and experimental results demonstrate that heterogeneous treatment effect estimation in contextual bandits offers practical advantages over reward estimation, including more efficient model estimation and greater flexibility to model misspecification.
    Date: 2023–02
  10. By: Mathias Silva (Aix-Marseille Univ, CNRS, AMSE, Marseille, France.)
    Abstract: Several representativeness issues affect the data sources available for studying populations' income distributions. High-income under-reporting and non-response have been shown to be particularly significant in the literature, owing to their role in underestimating income growth and inequality. This paper bridges several past parametric modelling attempts to account for high-income data issues when making parametric inference on income distributions at the population level. A unified parametric framework integrating parametric income distribution models with popular data-replacing and reweighting corrections is developed. To exploit this framework for empirical analysis, an Approximate Bayesian Computation approach is developed. This approach updates prior beliefs on the population income distribution and on the high-income data issues presumably affecting the available data by attempting to reproduce the observed income distribution under simulations from the parametric model. Applications to simulated and EU-SILC data illustrate the performance of the approach in studying population-level mean incomes and inequality from data potentially affected by these high-income issues.
    Keywords: 'Missing rich', GB2, Bayesian inference
    JEL: D31 C18 C11
    Date: 2023–05
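The Approximate Bayesian Computation step can be illustrated with a basic rejection sampler on a stand-in lognormal income model. The priors, summary statistics and tolerance here are illustrative assumptions; the paper's simulator additionally encodes the misreporting and non-response mechanisms:

```python
import numpy as np

rng = np.random.default_rng(4)

# Stand-in "observed" incomes and their summary statistics.
observed = rng.lognormal(mean=1.0, sigma=0.5, size=2000)
obs_stats = np.array([np.median(observed), np.mean(observed)])

# ABC rejection: draw parameters from the prior, simulate data, and keep
# draws whose simulated summaries fall close to the observed ones.
kept = []
for _ in range(5000):
    mu = rng.uniform(0.0, 2.0)       # prior on the log-scale location
    sigma = rng.uniform(0.1, 1.0)    # prior on the log-scale spread
    sim = rng.lognormal(mean=mu, sigma=sigma, size=2000)
    sim_stats = np.array([np.median(sim), np.mean(sim)])
    if np.linalg.norm(sim_stats - obs_stats) < 0.5:
        kept.append((mu, sigma))

post = np.array(kept)                # approximate posterior draws
```

The accepted draws concentrate around the parameters that generated the observed data, which is exactly the "update prior beliefs by reproducing the observed distribution" logic described in the abstract.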
  11. By: Lihua Lei; Roshni Sahoo; Stefan Wager
    Abstract: Practitioners often use data from a randomized controlled trial to learn a treatment assignment policy that can be deployed on a target population. A recurring concern in doing so is that, even if the randomized trial was well-executed (i.e., internal validity holds), the study participants may not represent a random sample of the target population (i.e., external validity fails), and this may lead to policies that perform suboptimally on the target population. We consider a model where observable attributes can impact sample selection probabilities arbitrarily but the effect of unobservable attributes is bounded by a constant, and we aim to learn policies with the best possible performance guarantees that hold under any sampling bias of this type. In particular, we derive the partial identification result for the worst-case welfare in the presence of sampling bias and show that the optimal max-min, max-min gain, and minimax regret policies depend on both the conditional average treatment effect (CATE) and the conditional value-at-risk (CVaR) of potential outcomes given covariates. To avoid finite-sample inefficiencies of plug-in estimates, we further provide an end-to-end procedure for learning the optimal max-min and max-min gain policies that does not require the separate estimation of nuisance parameters.
    Date: 2023–04
  12. By: Csaba Burger (Magyar Nemzeti Bank (the Central Bank of Hungary)); Mihály Berndt (Clarity Consulting Kft)
    Abstract: Supervised machine learning methods, applied in settings where no error labels are present, are increasingly popular for identifying potential data errors. Such algorithms rely on the tenet of a ‘ground truth’ in the data, which in other words assumes correctness in the majority of cases. Points deviating from such relationships, outliers, are flagged as potential data errors. This paper implements an outlier-based error-spotting algorithm using gradient boosting and presents a blueprint for the modelling pipeline. More specifically, it underpins three main modelling hypotheses with empirical evidence, relating to (1) missing-value imputation, (2) the choice of loss function and (3) the location of the error. To do so, it takes a cross-sectional view of the loan-to-value field and its related columns in the Credit Registry (Hitelregiszter) of the Central Bank of Hungary (MNB), and introduces a set of synthetic error types to test its hypotheses. The paper shows that gradient boosting is not materially impacted by the choice of imputation method; hence replacement with a constant, the computationally most efficient option, is recommended. Second, the Huber loss function, which is quadratic up to the Huber-slope parameter and linear above it, is better suited to coping with outlier values and is therefore better at capturing data errors. Finally, errors in the target variable are captured best, while errors in the predictors are hardly found at all. These empirical results may generalize to other cases, depending on data specificities, and the modelling pipeline described highlights the significant modelling decisions involved.
    Keywords: data quality, machine learning, gradient boosting, central banking, loss functions, missing values
    JEL: C5 C81 E58
    Date: 2023
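The Huber loss discussed above is easy to state explicitly: quadratic inside a band of width delta and linear outside it, so that very large residuals do not dominate the fit. The sketch below uses a Huber-slope (delta) of 1 for illustration:

```python
import numpy as np

def huber_loss(residual, delta=1.0):
    """Huber loss: 0.5 * r**2 for |r| <= delta, and
    delta * (|r| - 0.5 * delta) beyond it, so candidate data errors
    (large residuals) contribute only linearly."""
    r = np.abs(residual)
    return np.where(r <= delta, 0.5 * r ** 2, delta * (r - 0.5 * delta))

losses = huber_loss(np.array([-0.5, 0.0, 2.0]))
```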
  13. By: Monica Billio (University of Ca’ Foscari [Venice, Italy]); Lorenzo Frattarolo (JRC - European Commission - Joint Research Centre [Ispra]); Dominique Guégan (University of Ca’ Foscari [Venice, Italy], CES - Centre d'économie de la Sorbonne - UP1 - Université Paris 1 Panthéon-Sorbonne - CNRS - Centre National de la Recherche Scientifique, UP1 - Université Paris 1 Panthéon-Sorbonne)
    Abstract: We use a recently proposed fast test of copula radial symmetry based on multiplier bootstrap and obtain an equivalent randomization test. The literature shows the statistical superiority of the randomization approach in the bivariate case. We extend the comparison of statistical performance focusing on the high-dimensional regime in a simulation study. We document radial asymmetry in the joint distribution of the percentage changes of sectorial industrial production indices of the European Union.
    Keywords: copula, reflection symmetry, radial symmetry, empirical process, test
    Date: 2022–01–07
  14. By: Oliver Cassagneau-Francis (ECON - Département d'économie (Sciences Po) - Sciences Po - Sciences Po - CNRS - Centre National de la Recherche Scientifique)
    Abstract: Recent work has highlighted the significant variation in returns to higher education across individuals. We develop a novel methodology, exploiting recent advances in the identification of mixture models, which groups individuals according to their prior ability and estimates the wage returns to a university degree by group. We prove the non-parametric identification of our model. Applying our method to data from a UK cohort study, our findings reflect recent evidence that skills and ability are multidimensional. Our flexible model allows the returns to university to vary across the (multidimensional) ability distribution, a flexibility missing from commonly used additive models but which we show is empirically important. The returns to higher education are 3-4 times larger than the returns to prior cognitive and non-cognitive abilities. Returns are generally increasing in ability for both men and women, but vary non-monotonically across the ability distribution.
    Keywords: Mixture models, Distributions, Treatment effects, Higher education, Wages, Human capital, Cognitive and non-cognitive abilities
    Date: 2022–05–19
  15. By: A. Hennessy, Christopher; Goodhart, C. A. E.
    Abstract: We develop a simple structural model to illustrate how penalized regressions generate Goodhart bias when training data are clean but covariates are manipulated at known cost by future agents. With quadratic (extremely steep) manipulation costs, bias is proportional to Ridge (Lasso) penalization. If costs depend on absolute or percentage manipulation, the following algorithm yields manipulation-proof prediction: Within training data, evaluate candidate coefficients at their respective incentive-compatible manipulation configuration. We derive analytical coefficient adjustments: slopes (intercept) shift downward if costs depend on percentage (absolute) manipulation. Statisticians ignoring manipulation costs select socially suboptimal penalization. Model averaging reduces these manipulation costs.
    JEL: J1
    Date: 2023–03–21
  16. By: Ahmad W. Bitar (UTT - Université de Technologie de Troyes, CentraleSupélec); Nathan de Carvalho (UPCité - Université Paris Cité, CentraleSupélec, Engie Global Markets); Valentin Gatignol (Qube Research and Technologies, CentraleSupélec)
    Abstract: In this technical report, we aim to combine different portfolio allocation techniques with covariance matrix estimators to meet two types of clients' requirements: client A, who wants to invest money wisely, without taking too much risk or paying too much in rebalancing fees; and client B, who wants to make money quickly, benefit from the market's short-term volatility, and is ready to pay rebalancing fees. Four portfolio techniques are considered (mean-variance, robust portfolio, minimum-variance, and equi-risk budgeting), and four covariance estimators are applied (sample covariance, ordinary least squares (OLS) covariance, cross-validated eigenvalue shrinkage covariance, and eigenvalue clipping). Comparisons between the covariance estimators in terms of eigenvalue stability and four metrics (expected risk, gross leverage, Sharpe ratio and effective diversification) exhibit the superiority of the eigenvalue clipping covariance estimator. Experiments on the Russell 1000 dataset show that minimum-variance with eigenvalue clipping is the model suitable for client A, whereas robust portfolio with eigenvalue clipping is the one suitable for client B.
    Keywords: Robust portfolio, minimum-variance, eigenvalue clipping, OLS covariance
    Date: 2023–03–26
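The eigenvalue clipping estimator favoured by the report can be sketched in its textbook form: eigenvalues of the sample correlation matrix below the Marchenko-Pastur upper edge are treated as noise and replaced by their average, preserving the trace. Conventions for the edge and rescaling vary, so this is a generic sketch rather than the authors' exact implementation:

```python
import numpy as np

def clip_covariance(returns):
    """Eigenvalue clipping: eigenvalues of the sample correlation matrix
    below the Marchenko-Pastur upper edge are deemed noise and replaced
    by their average (which preserves the trace), then the variances are
    reapplied to recover a covariance matrix."""
    T, N = returns.shape
    corr = np.corrcoef(returns, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(corr)
    lambda_max = (1.0 + np.sqrt(N / T)) ** 2   # MP edge for pure noise
    noise = eigvals < lambda_max
    if noise.any():
        eigvals[noise] = eigvals[noise].mean()
    cleaned = eigvecs @ np.diag(eigvals) @ eigvecs.T
    stds = returns.std(axis=0)
    return cleaned * np.outer(stds, stds)

rng = np.random.default_rng(5)
R = rng.normal(size=(500, 50))                 # stand-in return panel
sigma = clip_covariance(R)
```

The resulting matrix stays symmetric and positive definite, which is what makes it usable inside the minimum-variance and robust portfolio optimizers the report considers.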
  17. By: Wei Tian (School of Economics, UNSW Business School, UNSW); Seojeong Lee (Department of Economics, Seoul National University); Valentyn Panchenko (School of Economics, UNSW Business School, UNSW)
    Abstract: We propose a generalization of the synthetic control method to a multiple-outcome framework, which improves the reliability of treatment effect estimation. This is done by supplementing the conventional pre-treatment time dimension with the extra dimension of related outcomes in computing the synthetic control weights. Our generalization can be particularly useful for studies evaluating the effect of a treatment on multiple outcome variables. To illustrate our method, we estimate the effects of non-pharmaceutical interventions (NPIs) on various outcomes in Sweden in the first three quarters of 2020. Our results suggest that if Sweden had implemented stricter NPIs, as the other European countries did, by March, then there would have been about 70% fewer cumulative COVID-19 infection cases and deaths by July, and 20% fewer deaths from all causes in early May, whereas the impacts of the NPIs were relatively mild on the labor market and economic outcomes.
    Keywords: Synthetic control, Policy evaluation, Causal inference, Public health
    JEL: C32 C54 I18
    Date: 2023–03
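The weight-selection core of the (single-outcome) synthetic control method is a simplex-constrained least-squares fit on pre-treatment data; the multiple-outcome extension would stack related outcomes alongside the time dimension when forming the pre-treatment matrix. The sketch below solves the single-outcome problem by projected gradient descent on synthetic donor data:

```python
import numpy as np

def simplex_project(v):
    """Euclidean projection of v onto the probability simplex."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    idx = np.arange(1, len(v) + 1)
    rho = np.nonzero(u - (css - 1.0) / idx > 0)[0][-1]
    theta = (css[rho] - 1.0) / (rho + 1)
    return np.maximum(v - theta, 0.0)

def synthetic_control_weights(X_donors, x_treated, steps=5000, lr=0.005):
    """Donor weights on the simplex minimizing the pre-treatment fit
    ||x_treated - X_donors @ w||^2, via projected gradient descent."""
    w = np.full(X_donors.shape[1], 1.0 / X_donors.shape[1])
    for _ in range(steps):
        grad = 2.0 * X_donors.T @ (X_donors @ w - x_treated)
        w = simplex_project(w - lr * grad)
    return w

rng = np.random.default_rng(6)
X = rng.normal(size=(30, 5))                   # 30 pre-periods, 5 donors
w_true = np.array([0.5, 0.3, 0.2, 0.0, 0.0])
x1 = X @ w_true                                # treated unit: exact mixture
w_hat = synthetic_control_weights(X, x1)
```

When the treated unit is an exact convex combination of donors, the fitted weights recover it; in practice the fit is approximate and the weighted donors serve as the counterfactual after treatment.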
  18. By: Ajit Desai
    Abstract: This article provides a curated review of selected papers published in prominent economics journals that use machine learning (ML) tools for research and policy analysis. The review focuses on three key questions: (1) when ML is used in economics, (2) what ML models are commonly preferred, and (3) how they are used for economic applications. The review highlights that ML is particularly used in processing nontraditional and unstructured data, capturing strong nonlinearity, and improving prediction accuracy. Deep learning models are suitable for nontraditional data, whereas ensemble learning models are preferred for traditional datasets. While traditional econometric models may suffice for analyzing low-complexity data, the increasing complexity of economic data due to rapid digitalization and the growing literature suggest that ML is becoming an essential addition to the econometrician's toolbox.
    Date: 2023–03
  19. By: Simon Smith; Allan Timmermann; Jonathan H. Wright
    Abstract: We revisit time-variation in the Phillips curve, applying new Bayesian panel methods with breakpoints to US and European Union disaggregate data. Our approach allows us to accurately estimate both the number and timing of breaks in the Phillips curve. It further allows us to determine the existence of clusters of industries, cities, or countries whose Phillips curves display similar patterns of instability and to examine lead-lag patterns in how individual inflation series change. We find evidence of a marked flattening in the Phillips curves for US sectoral data and among EU countries, particularly poorer ones. Conversely, evidence of a flattening is weaker for MSA-level data and for the wage Phillips curve. US regional data and EU data point to a kink in the price Phillips curve which remains relatively steep when the economy is running hot.
    JEL: C11 C22 E51 E52
    Date: 2023–04

This nep-ecm issue is ©2023 by Sune Karlsson. It is provided as is without any express or implied warranty. It may be freely redistributed in whole or in part for any purpose. If distributed in part, please include this notice.
General information on the NEP project can be found at For comments please write to the director of NEP, Marco Novarese at <>. Put “NEP” in the subject, otherwise your mail may be rejected.
NEP’s infrastructure is sponsored by the School of Economics and Finance of Massey University in New Zealand.