nep-ecm New Economics Papers
on Econometrics
Issue of 2022‒02‒21
25 papers chosen by
Sune Karlsson
Örebro universitet

  1. Test of Neglected Heterogeneity in Dyadic Models By Jinyong Hahn; Hyungsik Roger Moon; Ruoyao Shi
  2. Asymptotic properties of Bayesian inference in linear regression with a structural break By Kenichi Shimizu
  3. Semiparametric Bayesian Estimation of Discrete Choice Models By Andriy Norets; Kenichi Shimizu
  4. A New Test for Multiple Predictive Regression By Ke-Li Xu; Junjie Guo
  5. Determining the number of factors in high-dimensional generalized latent factor models By Chen, Yunxiao; Li, Xiaoou
  6. An Entropy-Based Approach for Nonparametrically Testing Simple Probability Distribution Hypotheses By Ron Mittelhammer; George Judge; Miguel Henry
  7. Stationary GE-Process and its Application in Analyzing Gold Price Data By Debasis Kundu
  8. On Well-posedness and Minimax Optimal Rates of Nonparametric Q-function Estimation in Off-policy Evaluation By Xiaohong Chen; Zhengling Qi
  9. Large-scale generalized linear longitudinal data models with grouped patterns of unobserved heterogeneity By Ando, Tomohiro; Bai, Jushan
  10. Why you should not use the LSV herding measure By Jurkatis, Simon
  11. Reverse matching for ex-ante policy evaluation By George Planiteros
  12. Treatment Effect Risk: Bounds and Inference By Nathan Kallus
  13. Power Approximations for Meta-Analysis of Dependent Effect Sizes By Vembye, Mikkel Helding; Pustejovsky, James E; Pigott, Terri
  14. Optimal stopping and worker selection in crowdsourcing: an adaptive sequential probability ratio test framework By Li, Xiaoou; Chen, Yunxiao; Chen, Xi; Liu, Jingchen; Ying, Zhiliang
  15. The Prior Adaptive Group Lasso and the Factor Zoo By Kristoffer Pons Bertelsen
  16. Cluster Wild Bootstrapping to Handle Dependent Effect Sizes in Meta-Analysis with a Small Number of Studies By Joshi, Megha; Pustejovsky, James E; Beretvas, S. Natasha
  17. Exploiting disagreement between high-dimensional variable selectors for uncertainty visualization By Yuen, Christine; Fryzlewicz, Piotr
  18. Bayesian Estimation of Multivariate Panel Probits with Higher-order Network Interdependence and an Application to Firms' Global Market Participation in Guangdong By Badi H. Baltagi; Peter H. Egger; Michaela Kesina
  19. Multi-cutoff RD designs with observations located at each cutoff: problems and solutions By Margherita Fort; Andrea Ichino; Enrico Rettore; Giulio Zanella
  20. Explaining Machine Learning by Bootstrapping Partial Dependence Functions and Shapley Values By Thomas R. Cook; Greg Gupton; Zach Modig; Nathan M. Palmer
  21. High-dimensional, multiscale online changepoint detection By Chen, Yudong; Wang, Tengyao; Samworth, Richard J.
  22. Convolutional regression for big spatial data By Yasumasa Matsuda; Xin Yuan
  23. The importance of supply and demand for oil prices: evidence from non-Gaussianity By Braun, Robin
  24. A statistical foundation for the measurement of managerial ability By Banker, Rajiv; Park, Han-Up; Sahoo, Biresh
  25. A machine learning search for optimal GARCH parameters By Luke De Clerk; Sergey Savel'ev

  1. By: Jinyong Hahn (UCLA); Hyungsik Roger Moon (USC & Yonsei); Ruoyao Shi (Department of Economics, University of California Riverside)
    Abstract: We develop a Lagrange Multiplier (LM) test of neglected heterogeneity in dyadic models. The test statistic is derived by modifying the test of Breusch and Pagan (1980). We establish the asymptotic distribution of the test statistic under the null using a novel martingale construction. We also consider the power of the LM test in generic panel models. Even though the test is motivated by random effects, we show that it also has power to detect fixed effects. Finally, we examine how the estimation noise of the maximum likelihood estimator affects the asymptotic distribution of the test under the null, and show that such noise may be ignored in large samples.
    Keywords: Lagrange Multiplier test, dyadic regression model, error component panel regression model, fixed effects, local power
    JEL: C12 C23
    Date: 2022–02
    URL: http://d.repec.org/n?u=RePEc:ucr:wpaper:202206&r=
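    Note: For orientation, the building block being modified is the classic Breusch and Pagan (1980) LM test for random effects in a balanced panel. A minimal Python sketch of that classical statistic (illustrative only, not the paper's dyadic version):
      import numpy as np
      from scipy.stats import chi2

      def breusch_pagan_lm(resid):
          # resid: (N, T) array of pooled-OLS residuals from a balanced panel.
          # Under H0 (no individual effects) LM is asymptotically chi-squared(1).
          N, T = resid.shape
          num = (resid.sum(axis=1) ** 2).sum()   # sum_i (sum_t e_it)^2
          den = (resid ** 2).sum()               # sum_i sum_t e_it^2
          lm = N * T / (2 * (T - 1)) * (num / den - 1) ** 2
          return lm, chi2.sf(lm, df=1)

      rng = np.random.default_rng(0)
      lm, pval = breusch_pagan_lm(rng.standard_normal((200, 5)))  # H0 true: should not reject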
  2. By: Kenichi Shimizu
    Abstract: This paper studies large sample properties of a Bayesian approach to inference about slope parameters γ in linear regression models with a structural break. In contrast to the conventional approach to inference about γ that does not take into account the uncertainty of the unknown break location τ, the Bayesian approach that we consider incorporates such uncertainty. Our main theoretical contribution is a Bernstein-von Mises type theorem (Bayesian asymptotic normality) for γ under a wide class of priors, which essentially indicates an asymptotic equivalence between conventional frequentist and Bayesian inference. Consequently, a frequentist researcher could look at credible intervals of γ to check robustness with respect to the uncertainty of τ. Simulation studies show that the conventional confidence intervals of γ tend to undercover in finite samples, whereas the credible intervals offer more reasonable coverage in general. As the sample size increases, the two methods coincide, as predicted by our theoretical results. Using data from Paye and Timmermann (2006) on stock return prediction, we illustrate that the traditional confidence intervals on γ might underrepresent the true sampling uncertainty.
    Keywords: Structural break, Bernstein-von Mises theorem, Sensitivity check, Model averaging
    Date: 2022–02
    URL: http://d.repec.org/n?u=RePEc:gla:glaewp:2022_05&r=
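    Note: A stylized version of the setting is $y_t = x_t'\gamma + x_t'\delta \, 1\{t > \tau\} + \varepsilon_t$ with unknown break date $\tau$ (the break shift $\delta$ is notation added here for illustration). The Bernstein-von Mises result then says that the marginal posterior of $\gamma$, which averages over the posterior uncertainty in $\tau$, is asymptotically normal and centered so that credible intervals for $\gamma$ asymptotically match frequentist confidence intervals.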
  3. By: Andriy Norets; Kenichi Shimizu
    Abstract: We propose a tractable semiparametric estimation method for dynamic discrete choice models. The distribution of additive utility shocks is modeled by location-scale mixtures of extreme value distributions with varying numbers of mixture components. Our approach exploits the analytical tractability of extreme value distributions and the flexibility of the location-scale mixtures. We implement the Bayesian approach to inference using Hamiltonian Monte Carlo and an approximately optimal reversible jump algorithm from Norets (2021). For binary dynamic choice models, our approach delivers estimation results that are consistent with the previous literature. We also apply the proposed method to multinomial choice models, for which the previous literature does not provide tractable estimation methods in general settings without distributional assumptions on the utility shocks. We develop theoretical results on approximations by location-scale mixtures in an appropriate distance and on posterior concentration of the set-identified utility parameters and the distribution of shocks in the model.
    Keywords: Dynamic Discrete choice, Bayesian nonparametrics, set identification, location-scale mixtures, MCMC, Hamiltonian Monte Carlo, reversible jump
    Date: 2022–02
    URL: http://d.repec.org/n?u=RePEc:gla:glaewp:2022_06&r=
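    Note: In stylized form, the shock density is modeled as $p(\epsilon) = \sum_{k=1}^{K} \pi_k \, \sigma_k^{-1} f((\epsilon - \mu_k)/\sigma_k)$, where $f$ is the type-I extreme value (Gumbel) density and $K$ varies; with $K = 1$, $\mu_1 = 0$, $\sigma_1 = 1$ this collapses to the standard logit shocks, which is the source of the analytical tractability.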
  4. By: Ke-Li Xu (Department of Economics, Indiana University); Junjie Guo (School of Finance, Central University of Finance and Economics, Beijing, China)
    Abstract: We consider inference for predictive regressions with multiple predictors. Extant tests for predictability may perform unsatisfactorily and tend to discover spurious predictability as the number of predictors increases. We propose a battery of new instrumental-variables-based tests which involve enforcement or partial enforcement of the null hypothesis in variance estimation. A test based on the few-predictors-at-a-time parsimonious system approach is recommended. Monte Carlo experiments demonstrate remarkable finite-sample performance regardless of the number of predictors and their persistence properties. An empirical application to equity premium predictability is provided.
    Keywords: Curse of dimensionality, Lagrange-multipliers test, persistence, predictive regression, return predictability
    Date: 2021–12
    URL: http://d.repec.org/n?u=RePEc:inu:caeprp:2022001&r=
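    Note: The canonical multi-predictor setting has the form $y_{t+1} = \alpha + \beta' x_t + u_{t+1}$ with $x_t = R x_{t-1} + e_t$, where the predictors may be highly persistent (roots of $R$ near unity); this persistence invalidates standard $t$- and Wald tests of $H_0: \beta = 0$ and motivates instrumental-variables constructions.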
  5. By: Chen, Yunxiao; Li, Xiaoou
    Abstract: As a generalization of the classical linear factor model, generalized latent factor models are useful for analysing multivariate data of different types, including binary choices and counts. This paper proposes an information criterion to determine the number of factors in generalized latent factor models. The consistency of the proposed information criterion is established under a high-dimensional setting, where both the sample size and the number of manifest variables grow to infinity, and data may have many missing values. An error bound is established for the parameter estimates, which plays an important role in establishing the consistency of the proposed information criterion. This error bound improves several existing results and may be of independent theoretical interest. We evaluate the proposed method by a simulation study and an application to Eysenck’s personality questionnaire.
    Keywords: generalized latent factor model; joint maximum likelihood estimator; high-dimensional data; information criteria; selection consistency
    JEL: C1
    Date: 2021–08–23
    URL: http://d.repec.org/n?u=RePEc:ehl:lserod:111574&r=
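    Note: Schematically, such criteria take the familiar form $\hat K = \arg\min_{k \le K_{\max}} \{ -2 \ell_k + k \, \nu(n, p) \}$, where $\ell_k$ is the maximized joint log-likelihood with $k$ factors and the penalty $\nu(n, p)$ grows with both the sample size $n$ and the number of manifest variables $p$; the contribution is a penalty choice whose selection consistency holds in this double-asymptotic, missing-data regime.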
  6. By: Ron Mittelhammer; George Judge; Miguel Henry
    Abstract: In this paper, we introduce a flexible and widely applicable nonparametric entropy-based testing procedure that can be used to assess the validity of simple hypotheses about a specific parametric population distribution. The testing methodology relies on the characteristic function of the population probability distribution being tested and is attractive in that, regardless of the null hypothesis being tested, it provides a unified framework for conducting such tests. The testing procedure is also computationally tractable and relatively straightforward to implement. In contrast to some alternative test statistics, the proposed entropy test is free from user-specified kernel and bandwidth choices, idiosyncratic and complex regularity conditions, and/or choices of evaluation grids. Several simulation exercises document the empirical performance of the proposed test, including a regression example that illustrates how, in some contexts, the approach can be applied to composite hypothesis-testing situations via data transformations. Overall, the testing procedure shows notable promise, with power that increases appreciably with sample size across a range of alternative distributions contrasted with hypothesized null distributions. Possible general extensions of the approach to composite hypothesis-testing contexts, and directions for future work, are also discussed.
    Date: 2022–01
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2201.06647&r=
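    Note: The starting point is that the characteristic function $\varphi_X(t) = E[e^{itX}]$ determines the distribution uniquely, so a simple null $H_0: X \sim F_0$ is equivalent to $\varphi_X = \varphi_{F_0}$; the test contrasts the hypothesized characteristic function with its empirical counterpart $\hat\varphi(t) = n^{-1} \sum_{j=1}^{n} e^{itX_j}$ through an entropy-based criterion.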
  7. By: Debasis Kundu
    Abstract: In this paper we introduce a new discrete time and continuous state space stationary process $\{X_n; n = 1, 2, \ldots \}$, such that $X_n$ follows a two-parameter generalized exponential (GE) distribution. Joint distribution functions, characterization and some dependency properties of this new process have been investigated. The GE-process has three unknown parameters, two shape parameters and one scale parameter, and for this reason it is more flexible than the existing exponential process. In the presence of the scale parameter, if the two shape parameters are equal, then the maximum likelihood estimators of the unknown parameters can be obtained by solving one non-linear equation, and if the two shape parameters are arbitrary, then the maximum likelihood estimators can be obtained by solving a two-dimensional optimization problem. Two synthetic data sets and one real gold-price data set have been analyzed to see the performance of the proposed model in practice. Finally some generalizations are indicated.
    Date: 2021–10
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2201.02568&r=
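    Note: For reference, the two-parameter generalized exponential distribution of Gupta and Kundu has distribution function $F(x; \alpha, \lambda) = (1 - e^{-\lambda x})^{\alpha}$ for $x > 0$, with shape parameter $\alpha > 0$ and scale parameter $\lambda > 0$; the marginals of the proposed stationary process follow this law.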
  8. By: Xiaohong Chen; Zhengling Qi
    Abstract: We study the off-policy evaluation (OPE) problem in an infinite-horizon Markov decision process with continuous states and actions. We recast the $Q$-function estimation into a special form of the nonparametric instrumental variables (NPIV) estimation problem. We first show that under one mild condition the NPIV formulation of $Q$-function estimation is well-posed in the sense of $L^2$-measure of ill-posedness with respect to the data generating distribution, bypassing a strong assumption on the discount factor $\gamma$ imposed in the recent literature for obtaining the $L^2$ convergence rates of various $Q$-function estimators. Thanks to this new well-posed property, we derive the first minimax lower bounds for the convergence rates of nonparametric estimation of $Q$-function and its derivatives in both sup-norm and $L^2$-norm, which are shown to be the same as those for the classical nonparametric regression (Stone, 1982). We then propose a sieve two-stage least squares estimator and establish its rate-optimality in both norms under some mild conditions. Our general results on the well-posedness and the minimax lower bounds are of independent interest to study not only other nonparametric estimators for $Q$-function but also efficient estimation on the value of any target policy in off-policy settings.
    Date: 2022–01
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2201.06169&r=
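    Note: In stylized form, the $Q$-function of a policy $\pi$ solves the conditional moment restriction $E[ R_t + \gamma \int Q(S_{t+1}, a) \, \pi(a \mid S_{t+1}) \, da - Q(S_t, A_t) \mid S_t, A_t ] = 0$, which has exactly the NPIV structure: the unknown function is evaluated at variables ($S_{t+1}$) not measurable with respect to the conditioning variables, so ill-posedness is the central concern.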
  9. By: Ando, Tomohiro; Bai, Jushan
    Abstract: This paper provides methods for flexibly capturing unobservable heterogeneity from longitudinal data in the context of an exponential family of distributions. The group memberships of individual units are left unspecified, and their heterogeneity is influenced by group-specific unobservable structures as well as heterogeneous regression coefficients. We discuss a computationally efficient estimation method and derive the corresponding asymptotic theory, which includes verifying the uniform consistency of the estimated group memberships. To test for heterogeneous regression coefficients within groups, we propose a Swamy-type test that accounts for unobserved heterogeneity. We apply the proposed method to study the market structure of the taxi industry in New York City. Our method reveals important insights from large-scale longitudinal data consisting of over 450 million data points.
    Keywords: Clustering; Factor analysis; Generalized linear models; Longitudinal data; Unobserved heterogeneity.
    JEL: C33 C38 C55
    Date: 2021–12–23
    URL: http://d.repec.org/n?u=RePEc:pra:mprapa:111431&r=
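    Note: A stylized version of the model: conditional on covariates, $y_{it}$ follows an exponential-family distribution with $E[y_{it}] = h(x_{it}' \beta_{g_i} + \lambda_{g_i t})$, where $g_i \in \{1, \dots, G\}$ is the unknown group membership of unit $i$, $\beta_g$ are group-specific coefficients, and $\lambda_{gt}$ is a group-specific time-varying unobserved structure; memberships, coefficients and structures are estimated jointly.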
  10. By: Jurkatis, Simon (Bank of England)
    Abstract: Here are three reasons. (a) This paper proves that the popular investor-level herding measure is a biased estimator of herding. Monte Carlo simulations demonstrate that the measure underestimates herding by 20% to 100% of the estimation target. (b) The bias varies with the number of traders active in an asset, such that regression-type analyses using LSV to understand the causes and consequences of herding are likely to yield inconsistent estimates if controls are not carefully chosen. (c) The measure should be understood purely as a test of binomial overdispersion, and alternative tests have superior size and power properties.
    Keywords: Herding; estimation; market microstructure; overdispersion
    JEL: C13 C58 G14 G40
    Date: 2022–01–07
    URL: http://d.repec.org/n?u=RePEc:boe:boeewp:0959&r=
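    Note: For reference, the LSV measure for asset $i$ in period $t$ is $HM_{it} = |p_{it} - \bar p_t| - E|p_{it} - \bar p_t|$, where $p_{it} = B_{it}/(B_{it} + S_{it})$ is the fraction of buys among the $N_{it}$ traders active in the asset, $\bar p_t$ is the market-wide buy probability, and the adjustment term is computed under the binomial null $B_{it} \sim \mathrm{Bin}(N_{it}, \bar p_t)$ of no herding; the dependence of this adjustment on $N_{it}$ is the source of the bias documented here.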
  11. By: George Planiteros
    Abstract: The paper attacks the central policy evaluation question of forecasting the impact of interventions never previously experienced. It introduces the treatment effects approach into a domain not currently spanned by its methodological arsenal. Existing causal effects bounding analysis is adjusted to the ex-ante program evaluation setting. A Monte Carlo experiment is conducted to test how severely the estimates of the proposed approach deviate from the "real" causal effect in the presence of selection and unobserved heterogeneity. The simulation shows that the approach is valid regarding the formulation of the counterfactual states, given previous knowledge of the program rules and a sufficiently informative treatment probability. It also demonstrates that the width of the bounds is resilient to several deviations from the conditional independence assumption.
    Keywords: Policy evaluation, forecasting, treatment effects, hypothetical treatment group, bounding and sensitivity analysis
    Date: 2022–01–28
    URL: http://d.repec.org/n?u=RePEc:aue:wpaper:2206&r=
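    Note: To illustrate the bounding logic in its classic worst-case (Manski-type) form: if outcomes are bounded, $Y \in [0, 1]$, then without any selection assumptions $E[Y(1)]$ is only known to lie in $[\, E[Y \mid D=1] P(D=1), \; E[Y \mid D=1] P(D=1) + P(D=0) \,]$; knowledge of program rules and an informative treatment probability tighten such bounds, which is the route adapted here to the ex-ante setting.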
  12. By: Nathan Kallus
    Abstract: While the average treatment effect (ATE) measures the change in social welfare, even a positive ATE leaves a risk of negative effects on, say, some 10% of the population. Assessing such risk is difficult, however, because any one individual treatment effect (ITE) is never observed, so the 10% worst-affected cannot be identified, while distributional treatment effects only compare the first deciles within each treatment group, which does not correspond to any 10%-subpopulation. In this paper we consider how to nonetheless assess this important risk measure, formalized as the conditional value at risk (CVaR) of the ITE distribution. We leverage the availability of pre-treatment covariates and characterize the tightest-possible upper and lower bounds on ITE-CVaR given by the covariate-conditional average treatment effect (CATE) function. Some bounds can also be interpreted as summarizing a complex CATE function into a single metric and are of interest independently of being a bound. We then study how to estimate these bounds efficiently from data and construct confidence intervals. This is challenging even in randomized experiments, as it requires understanding the distribution of the unknown CATE function, which can be very complex if we use rich covariates so as to best control for heterogeneity. We develop a debiasing method that overcomes this and prove that it enjoys favorable statistical properties even when CATE and other nuisances are estimated by black-box machine learning or even inconsistently. Studying a hypothetical change to French job-search counseling services, our bounds and inference demonstrate that a small social benefit entails a negative impact on a substantial subpopulation.
    Date: 2022–01
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2201.05893&r=
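    Note: For reference, the risk measure is $\mathrm{CVaR}_{\alpha}(\mathrm{ITE}) = \alpha^{-1} \int_0^{\alpha} F_{\mathrm{ITE}}^{-1}(u) \, du$, the mean over the worst-affected $\alpha$-fraction (e.g. $\alpha = 0.10$ for the 10% worst-affected); since the ITE distribution is not point-identified, the paper bounds this functional in terms of the identified CATE function $\tau(X) = E[Y(1) - Y(0) \mid X]$.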
  13. By: Vembye, Mikkel Helding; Pustejovsky, James E; Pigott, Terri (Georgia State University)
    Abstract: Meta-analytic models for dependent effect sizes have grown increasingly sophisticated over the last few decades, which has created challenges for a priori power calculations. We introduce power approximations for tests of average effect sizes based upon the most common models for handling dependent effect sizes. In a Monte Carlo simulation, we show that the new power formulas can accurately approximate the true power of common meta-analytic models for dependent effect sizes. Lastly, we investigate the Type I error rate and power for several common models, finding that tests using robust variance estimation provide better Type I error calibration than tests with model-based variance estimation. We consider implications for practice with respect to selecting a working model and an inferential approach.
    Date: 2022–01–06
    URL: http://d.repec.org/n?u=RePEc:osf:metaar:6tp9y&r=
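    Note: Schematically, such approximations have the usual Wald-test form: power $\approx \Phi(\mu/\mathrm{SE} - z_{1-\alpha/2}) + \Phi(-\mu/\mathrm{SE} - z_{1-\alpha/2})$ for an average effect $\mu$ with standard error $\mathrm{SE}$ (a stylized illustration, not the paper's formulas); the complications addressed here come from the dependence among effect sizes, the working model chosen for it, and the small-sample degrees-of-freedom corrections used with robust variance estimation.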
  14. By: Li, Xiaoou; Chen, Yunxiao; Chen, Xi; Liu, Jingchen; Ying, Zhiliang
    Abstract: In this study, we solve a class of multiple testing problems under a Bayesian sequential decision framework. Our work is motivated by binary labeling tasks in crowdsourcing, where a requestor needs to simultaneously choose a worker to provide a label and decide when to stop collecting labels, under a certain budget constraint. We begin by using a binary hypothesis testing problem to determine the true label of a single object, and provide an optimal solution by casting it under an adaptive sequential probability ratio test framework. Then, we characterize the structure of the optimal solution, that is, the optimal adaptive sequential design, which minimizes the Bayes risk using a log-likelihood ratio statistic. We also develop a dynamic programming algorithm to efficiently compute the optimal solution. For the multiple testing problem, we propose an empirical Bayes approach for estimating the class priors, and show that the average loss of our method converges to the minimal Bayes risk under the true model. Experiments on both simulated and real data show the robustness of our method, as well as its superiority over existing methods in terms of its labeling accuracy.
    Keywords: Bayesian decision theory; crowdsourcing; empirical Bayes; sequential analysis; sequential probability ratio test
    JEL: R14 J01 J50 C1
    Date: 2021–01–01
    URL: http://d.repec.org/n?u=RePEc:ehl:lserod:100873&r=
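    Note: For orientation, the classical Wald test being adapted works as follows: after $n$ labels, compute the likelihood ratio $\Lambda_n = \prod_{j=1}^{n} p_1(z_j)/p_0(z_j)$ and continue collecting while $A < \Lambda_n < B$, deciding for $H_0$ or $H_1$ at the respective boundary, with Wald's approximations $A \approx \beta/(1-\alpha)$ and $B \approx (1-\beta)/\alpha$ linking the boundaries to the target error rates; the adaptive version additionally chooses which worker (i.e., which likelihood) generates the next observation.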
  15. By: Kristoffer Pons Bertelsen (Aarhus University and CREATES)
    Abstract: This paper develops and presents the prior adaptive group lasso (pag-lasso) for generalized linear models. The pag-lasso is an extension of the prior lasso, which allows for the use of existing information in the lasso estimation. We show that the estimator exhibits properties similar to the adaptive group lasso. The performance of the pag-lasso estimator is illustrated in a Monte Carlo study. The estimator is used to select the set of relevant risk factors in asset pricing models while requiring that the chosen factors must be able to price the test assets as well as the unselected factors. The study shows that the pag-lasso yields a set of factors that explain the time variation in the returns while delivering estimated pricing errors close to zero. We find that canonical low-dimensional factor models from the asset pricing literature are insufficient to price the cross section of the test assets together with the remaining traded factors. The required number of pricing factors to include at any given time is closer to 20.
    Keywords: Asset Pricing, Factor Selection, Factor Zoo, High-Dimensional Modeling, Prior Information, Variable Selection
    JEL: C13 C33 C38 C51 C55 C58 G12
    Date: 2022–01–24
    URL: http://d.repec.org/n?u=RePEc:aah:create:2022-05&r=
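    Note: In stylized form, the estimator minimizes $-\ell(\beta) + \lambda \sum_g w_g \lVert \beta_g \rVert_2$ over coefficient groups $g$, where the adaptive weights $w_g$ are shrunk for groups favored by prior information (here, factors with support in the existing literature), so prior knowledge guides, but does not dictate, which factors are selected.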
  16. By: Joshi, Megha; Pustejovsky, James E; Beretvas, S. Natasha
    Abstract: The most common and well-known meta-regression models work under the assumption that there is only one effect size estimate per study and that the estimates are independent. However, meta-analytic reviews of social science research often include multiple effect size estimates per primary study, leading to dependence in the estimates. Some meta-analyses also include multiple studies conducted by the same lab or investigator, creating another potential source of dependence. An increasingly popular method to handle dependence is robust variance estimation (RVE), but this method can result in inflated Type I error rates when the number of studies is small. Small-sample correction methods for RVE have been shown to control Type I error rates adequately but may be overly conservative, especially for tests of multiple-contrast hypotheses. We evaluated an alternative method for handling dependence, cluster wild bootstrapping, which has been examined in the econometrics literature but not in the context of meta-analysis. Results from two simulation studies indicate that cluster wild bootstrapping maintains adequate Type I error rates and provides more power than extant small sample correction methods, particularly for multiple-contrast hypothesis tests. We recommend using cluster wild bootstrapping to conduct hypothesis tests for meta-analyses with a small number of studies. We have also created an R package that implements such tests.
    Date: 2021–09–24
    URL: http://d.repec.org/n?u=RePEc:osf:metaar:x6uhk&r=
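    Note: A minimal Python sketch of the resampling idea (an OLS-based illustration with Rademacher weights, not the authors' RVE-based implementation or their R package):
      import numpy as np

      def cluster_wild_bootstrap_p(y, X, cluster, coef=1, B=1999, seed=0):
          # Cluster wild bootstrap p-value for H0: beta[coef] = 0 in y = X @ beta + e,
          # with errors dependent within clusters (effect sizes within studies).
          rng = np.random.default_rng(seed)
          ids = np.unique(cluster)

          def tstat(yy):
              beta, *_ = np.linalg.lstsq(X, yy, rcond=None)
              e = yy - X @ beta
              XtX_inv = np.linalg.inv(X.T @ X)           # cluster-robust (CR0) variance
              meat = sum(np.outer(X[cluster == g].T @ e[cluster == g],
                                  X[cluster == g].T @ e[cluster == g]) for g in ids)
              V = XtX_inv @ meat @ XtX_inv
              return beta[coef] / np.sqrt(V[coef, coef])

          Xr = np.delete(X, coef, axis=1)                # null-restricted model
          br, *_ = np.linalg.lstsq(Xr, y, rcond=None)
          e0 = y - Xr @ br
          t_obs = tstat(y)
          t_boot = np.empty(B)
          for b in range(B):
              w = rng.choice([-1.0, 1.0], size=len(ids))  # one weight per cluster
              y_b = Xr @ br + e0 * w[np.searchsorted(ids, cluster)]
              t_boot[b] = tstat(y_b)
          return np.mean(np.abs(t_boot) >= np.abs(t_obs))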
  17. By: Yuen, Christine; Fryzlewicz, Piotr
    Abstract: We propose the combined selection and uncertainty visualizer (CSUV), which visualizes selection uncertainty for covariates in high-dimensional linear regression by exploiting the (dis)agreement among different base selectors. Our proposed method highlights covariates that are selected most frequently by the different base variable selection methods on subsampled data. The method is generic and can be used with different existing variable selection methods. We demonstrate its performance using real and simulated data. The corresponding R package CSUV is available at https://github.com/christineyuen/CSUV, and the graphical tool is also available online via https://csuv.shinyapps.io/csuv.
    Keywords: high-dimensional data; variable selection; uncertainty visualization
    JEL: C1
    Date: 2021–11–17
    URL: http://d.repec.org/n?u=RePEc:ehl:lserod:112480&r=
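    Note: A minimal Python sketch of the raw ingredient CSUV visualizes, namely how often each covariate is selected by different base selectors on subsampled data (lasso and elastic net are illustrative stand-ins for the paper's base selectors):
      import numpy as np
      from sklearn.linear_model import LassoCV, ElasticNetCV

      def selection_frequencies(X, y, n_sub=50, frac=0.5, seed=0):
          # Fraction of (subsample, selector) runs in which each covariate is selected.
          rng = np.random.default_rng(seed)
          n, p = X.shape
          selectors = [LassoCV(cv=5), ElasticNetCV(cv=5)]
          counts = np.zeros(p)
          for _ in range(n_sub):
              idx = rng.choice(n, size=int(frac * n), replace=False)
              for s in selectors:
                  s.fit(X[idx], y[idx])
                  counts += (np.abs(s.coef_) > 1e-8)   # selected = nonzero coefficient
          return counts / (n_sub * len(selectors))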
  18. By: Badi H. Baltagi (Center for Policy Research, Maxwell School, Syracuse University, 426 Eggers Hall, Syracuse, NY 13244); Peter H. Egger (ETH Zürich, CEPR, CESifo, GEP); Michaela Kesina (University of Groningen)
    Abstract: This paper proposes a Bayesian estimation framework for panel data sets with binary dependent variables where a large number of cross-sectional units is observed over a short period of time, and cross-sectional units are interdependent in more than a single network domain. The latter provides a substantial degree of flexibility in modelling the decay function of network neighborliness (e.g., by disentangling the importance of rings of neighbors) or in allowing for several channels of interdependence whose relative importance is unknown ex ante. Besides the flexible parameterization of cross-sectional dependence, the approach allows for simultaneity of the equations. These features should make the approach interesting for applications in a host of contexts involving structural and reduced-form models of multivariate choice problems at micro-, meso-, and macroeconomic levels. The paper outlines the estimation approach, illustrates its suitability by simulation examples, and provides an application studying exporting and foreign ownership among potentially interdependent firms in the specialized and transport machinery sector in the province of Guangdong.
    Keywords: Network Models; Spatial Models; Higher-Order Network Interdependence; Multivariate Panel Probit; Bayesian Estimation; Firm-Level Data; Chinese Firms
    JEL: C11 C31 C35 F14 F23 L22 R10
    Date: 2022–02
    URL: http://d.repec.org/n?u=RePEc:max:cprwps:247&r=
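    Note: A stylized scalar version of one equation: the latent variable $y_i^* = \sum_{m=1}^{M} \rho_m (W_m y^*)_i + x_i' \beta + \varepsilon_i$ with observed $y_i = 1\{y_i^* > 0\}$, where each $W_m$ is a network weight matrix (e.g., the $m$-th ring of neighbors, or a distinct channel of interdependence) and the $\rho_m$ capture the decay or relative importance; the multivariate panel version couples several such equations through correlated errors and simultaneity.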
  19. By: Margherita Fort (University of Bologna, FBK-IRVAPP, CEPR, CESifo and IZA); Andrea Ichino (European University Institute, University of Bologna, CEPR, CESifo and IZA); Enrico Rettore (University of Padova, FBK-IRVAPP and IZA); Giulio Zanella (University of Bologna and IZA)
    Abstract: In RD designs with multiple cutoffs, the identification of an average causal effect across cutoffs may be problematic if a marginally exposed subject is located exactly at each cutoff. This occurs whenever a fixed number of treatment slots is allocated starting from the subject with the highest (or lowest) value of the score, until exhaustion. Exploiting the "within" variability at each cutoff is the safest and likely most efficient option. Alternative strategies exist, but they do not always guarantee identification of a meaningful causal effect and are less precise. To illustrate our findings, we revisit the study of Pop-Eleches and Urquiola (2013).
    Keywords: Regression Discontinuity, multiple cutoffs, Normalizing-and-Pooling
    JEL: C01
    Date: 2022–01
    URL: http://d.repec.org/n?u=RePEc:pad:wpaper:0278&r=
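    Note: For reference, at a single cutoff $c$ the sharp RD estimand is $\tau_c = \lim_{x \downarrow c} E[Y \mid X = x] - \lim_{x \uparrow c} E[Y \mid X = x]$; the concern here is that when a fixed number of slots is rationed by the score, a subject sits exactly at every cutoff, and normalizing-and-pooling the $\tau_c$ across cutoffs may then fail to identify a meaningful average effect.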
  20. By: Thomas R. Cook; Greg Gupton; Zach Modig; Nathan M. Palmer
    Abstract: Machine learning and artificial intelligence methods are often referred to as “black boxes” when compared with traditional regression-based approaches. However, both traditional and machine learning methods are concerned with modeling the joint distribution between endogenous (target) and exogenous (input) variables. Where linear models describe the fitted relationship between the target and input variables via the slope of that relationship (coefficient estimates), the same fitted relationship can be described rigorously for any machine learning model by first-differencing the partial dependence functions. Bootstrapping these first-differenced functionals provides standard errors and confidence intervals for the estimated relationships. We show that this approach replicates the point estimates of OLS coefficients and demonstrate how this generalizes to marginal relationships in machine learning and artificial intelligence models. We further discuss the relationship of partial dependence functions to Shapley value decompositions and explore how they can be used to further explain model outputs.
    Keywords: Machine learning; Artificial intelligence; Explainable machine learning; Shapley values; Model interpretation
    JEL: C14 C15 C18
    Date: 2021–11–15
    URL: http://d.repec.org/n?u=RePEc:fip:fedkrw:93596&r=
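    Note: A minimal Python sketch of the mechanics described in the abstract (a generic illustration, not the authors' code): compute a partial dependence function, first-difference it to obtain slope analogues, and bootstrap the pipeline for confidence intervals.
      import numpy as np

      def pd_function(model, X, j, grid):
          # Partial dependence of predictions on feature j over the grid values.
          pd = np.empty(len(grid))
          for k, v in enumerate(grid):
              Xv = X.copy()
              Xv[:, j] = v
              pd[k] = model.predict(Xv).mean()
          return pd

      def bootstrap_pd_slopes(make_model, X, y, j, grid, B=200, seed=0):
          # make_model() returns a fresh unfitted estimator with fit/predict.
          # Returns a (B, len(grid)-1) array of first-differenced PD (slope) draws.
          rng = np.random.default_rng(seed)
          n = len(y)
          slopes = np.empty((B, len(grid) - 1))
          for b in range(B):
              idx = rng.integers(0, n, size=n)       # resample rows with replacement
              m = make_model().fit(X[idx], y[idx])
              slopes[b] = np.diff(pd_function(m, X[idx], j, grid)) / np.diff(grid)
          return slopes

      # e.g. with a tree ensemble:
      # from sklearn.ensemble import GradientBoostingRegressor
      # s = bootstrap_pd_slopes(lambda: GradientBoostingRegressor(), X, y, 0, np.linspace(0, 1, 11))
      # ci = np.percentile(s, [2.5, 97.5], axis=0)   # pointwise confidence band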
  21. By: Chen, Yudong; Wang, Tengyao; Samworth, Richard J.
    Abstract: We introduce a new method for high-dimensional, online changepoint detection in settings where a p-variate Gaussian data stream may undergo a change in mean. The procedure works by performing likelihood ratio tests against simple alternatives of different scales in each coordinate, and then aggregating test statistics across scales and coordinates. The algorithm is online in the sense that both its storage requirements and worst-case computational complexity per new observation are independent of the number of previous observations; in practice, it may even be significantly faster than this. We prove that the patience, or average run length under the null, of our procedure is at least at the desired nominal level, and provide guarantees on its response delay under the alternative that depend on the sparsity of the vector of mean change. Simulations confirm the practical effectiveness of our proposal, which is implemented in the R package ocd, and we also demonstrate its utility on a seismology data set.
    Keywords: average run length; detection delay; high-dimensional changepoint detection; online algorithm; sequential method
    JEL: C1
    Date: 2022–01–23
    URL: http://d.repec.org/n?u=RePEc:ehl:lserod:113665&r=
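    Note: A minimal Python sketch of the online principle in a single-scale simplification (per-coordinate CUSUMs aggregated by a maximum; the actual ocd procedure aggregates likelihood-ratio statistics across many scales -- see the R package ocd):
      import numpy as np

      class SimpleOnlineDetector:
          # Two-sided CUSUM per coordinate against post-change mean +/- delta,
          # aggregated across coordinates. Memory and per-observation cost are
          # O(p), independent of the number of previous observations.
          def __init__(self, p, delta=1.0, threshold=10.0):
              self.pos = np.zeros(p)   # CUSUM for upward mean shifts
              self.neg = np.zeros(p)   # CUSUM for downward mean shifts
              self.delta, self.threshold = delta, threshold

          def update(self, x):
              # Feed one p-variate observation (pre-change mean 0, unit variance);
              # returns True when a change is declared.
              self.pos = np.maximum(0.0, self.pos + self.delta * (x - self.delta / 2))
              self.neg = np.maximum(0.0, self.neg - self.delta * (x + self.delta / 2))
              return max(self.pos.max(), self.neg.max()) > self.threshold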
  22. By: Yasumasa Matsuda; Xin Yuan
    Abstract: It has recently become common to collect big spatial data on a national or continental scale at discrete time points. This paper develops a regression model for settings where both the dependent and the independent variables are big spatial data. Regarding spatial data as functions over a region, we propose a functional regression based on a parametric convolution kernel, together with least squares estimation in the frequency domain via the Fourier transform. It can handle massive data sets, with asymptotic validation under mixed asymptotics. The regression is applied to weekly new Covid-19 cases and human mobility collected at the city level across Japan, finding that an increase in human mobility is followed by an increase in new Covid-19 cases with a lag of two weeks.
    Date: 2022–02
    URL: http://d.repec.org/n?u=RePEc:toh:dssraa:124&r=
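    Note: In stylized form, the model is $Y(s) = \int k_\theta(s - u) X(u) \, du + \varepsilon(s)$ over locations $s$; by the convolution theorem the Fourier transform turns this into the frequency-domain relation $\mathcal{F}Y(\omega) \approx \hat k_\theta(\omega) \, \mathcal{F}X(\omega)$, so $\theta$ can be estimated by least squares across frequencies, which is what makes national-scale data sets tractable.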
  23. By: Braun, Robin (Bank of England)
    Abstract: Studies quantifying the importance of supply and demand for oil price fluctuations have reported a wide range of estimates. Models identified via a sharp upper bound on the short-run price elasticity of supply find supply shocks to be minor drivers. In turn, when the upper bound is replaced with a fairly uninformative prior, supply shocks turn out to be quite important. In this paper, I revisit the evidence with a model identified by a combination of weakly informative priors and non-Gaussianity. For this purpose, a structural vector autoregressive (SVAR) model is developed in which the distributions of the structural shocks are modelled non-parametrically. The empirical findings indicate that once non-Gaussianity is incorporated into the model, posterior mass of the short-run oil supply elasticity shifts towards zero and oil supply shocks become minor drivers of oil prices. In terms of contributions to the forecast error variance of oil prices, the model arrives at median estimates of just 6% over a 16-month horizon.
    Keywords: Oil market; SVAR; identification by non-Gaussianity; non-parametric Bayesian methods
    JEL: C32 Q43
    Date: 2021–12–17
    URL: http://d.repec.org/n?u=RePEc:boe:boeewp:0957&r=
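    Note: Schematically, the SVAR is $y_t = A_1 y_{t-1} + \dots + A_p y_{t-p} + B \varepsilon_t$ with mutually independent structural shocks $\varepsilon_t$; when at most one shock is Gaussian, $B$ is identified up to column permutation and scaling from the data alone, so non-Gaussianity can substitute for, or here be combined with, prior restrictions such as elasticity bounds.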
  24. By: Banker, Rajiv; Park, Han-Up; Sahoo, Biresh
    Abstract: Demerjian, Lev, and McVay (2012) (DLM) provide a conceptual framework for the measurement of managerial ability using data envelopment analysis (DEA). The DLM approach begins with a first-stage estimation of firm efficiency in transforming inputs into outputs. The second stage removes the impact of contextual variables on firm efficiency, so that the residuals measure the impact of unobserved managerial ability. We leverage the properties of the DEA estimator (Banker and Natarajan 2008) to show that the DLM approach provides a statistically consistent estimator of managerial ability's impact on firm efficiency.
    Keywords: DEA, Efficiency, Managerial Ability, Simulation
    JEL: C63 C67 D24
    Date: 2022–01–08
    URL: http://d.repec.org/n?u=RePEc:pra:mprapa:111832&r=
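    Note: A minimal Python sketch of a first-stage DEA efficiency score (an input-oriented, constant-returns formulation as one common variant; the exact DLM specification may differ):
      import numpy as np
      from scipy.optimize import linprog

      def dea_efficiency(X, Y, o):
          # X: (n, m) inputs, Y: (n, r) outputs for n decision-making units.
          # Solves: min theta s.t. X'lam <= theta * x_o, Y'lam >= y_o, lam >= 0.
          n, m = X.shape
          r = Y.shape[1]
          c = np.r_[1.0, np.zeros(n)]          # variables: [theta, lam_1..lam_n]
          A_in = np.c_[-X[o], X.T]             # sum_j lam_j x_ij - theta x_io <= 0
          A_out = np.c_[np.zeros(r), -Y.T]     # -sum_j lam_j y_rj <= -y_ro
          A_ub = np.r_[A_in, A_out]
          b_ub = np.r_[np.zeros(m), -Y[o]]
          res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None)] * (n + 1))
          return res.fun                       # theta in (0, 1], 1 = efficient

      # scores = [dea_efficiency(X, Y, o) for o in range(len(X))]
      # The second stage then regresses such scores on contextual variables.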
  25. By: Luke De Clerk; Sergey Savel'ev
    Abstract: Here, we use Machine Learning (ML) algorithms to improve the efficiency of fitting GARCH model parameters to empirical data. We employ an Artificial Neural Network (ANN) to predict the parameters of these models. We present a fitting algorithm for GARCH-normal(1,1) models that predicts one of the model's parameters, $\alpha_1$, and then uses the analytical expressions for the fourth-order standardised moment, $\Gamma_4$, and the unconditional second-order moment, $\sigma^2$, to fit the other two parameters, $\beta_1$ and $\alpha_0$, respectively. The speed of the fit and the quick implementation of this approach allow for real-time tracking of GARCH parameters. We further show that different inputs to the ANN, namely higher-order standardised moments and the autocovariance of the time series, can be used for fitting model parameters, but not always with the same level of accuracy.
    Date: 2022–01
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2201.03286&r=
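    Note: For reference, the GARCH-normal(1,1) moment relations this scheme exploits are $\sigma^2 = \alpha_0 / (1 - \alpha_1 - \beta_1)$ and $\Gamma_4 = 3(1 - (\alpha_1 + \beta_1)^2) / (1 - (\alpha_1 + \beta_1)^2 - 2\alpha_1^2)$, the latter finite when its denominator is positive; given the ANN's estimate of $\alpha_1$ and the sample values of $\Gamma_4$ and $\sigma^2$, the second relation can be solved for $\beta_1$ and the first for $\alpha_0$.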

This nep-ecm issue is ©2022 by Sune Karlsson. It is provided as is without any express or implied warranty. It may be freely redistributed in whole or in part for any purpose. If distributed in part, please include this notice.
General information on the NEP project can be found at http://nep.repec.org. For comments please write to the director of NEP, Marco Novarese at <director@nep.repec.org>. Put “NEP” in the subject, otherwise your mail may be rejected.
NEP’s infrastructure is sponsored by the School of Economics and Finance of Massey University in New Zealand.