
on Econometrics 
By:  Sokbae Lee; Yuan Liao; Myung Hwan Seo; Youngki Shin 
Abstract:  While applications of big data analytics have brought many new opportunities to economic research, with datasets containing millions of observations, making usual econometric inferences based on extreme estimators would require computing power and memory that are often not accessible. In this paper, we focus on linear quantile regression employed to analyze "ultra-large" datasets such as U.S. decennial censuses. We develop a new inference framework that runs very fast, based on stochastic subgradient descent (S-subGD) updates. The cross-sectional data are fed sequentially into the inference procedure: (i) the parameter estimate is updated when each "new observation" arrives, (ii) it is aggregated as the Polyak-Ruppert average, and (iii) a pivotal statistic for inference is computed using the solution path only. We leverage insights from time series regression and construct an asymptotically pivotal statistic via random scaling. Our proposed test statistic is computed in a fully online fashion and the critical values are obtained without any resampling methods. We conduct extensive numerical studies to showcase the computational merits of our proposed inference. For inference problems as large as $(n, d) \sim (10^7, 10^3)$, where $n$ is the sample size and $d$ is the number of regressors, our method can generate new insights beyond the computational capabilities of existing inference methods. Specifically, we uncover the trends in the gender gap in the U.S. college wage premium using millions of observations, while controlling for over $10^3$ covariates to mitigate confounding effects. 
Date:  2022–09 
URL:  http://d.repec.org/n?u=RePEc:arx:papers:2209.14502&r= 
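The online updating scheme in the abstract above can be sketched in a few lines. This is a minimal illustration of stochastic subgradient descent for linear quantile regression with Polyak-Ruppert averaging, not the paper's full procedure: the step-size rule is an illustrative choice, and the random-scaling statistic used for inference is omitted.

```python
import numpy as np

def sgd_quantile(X, y, tau=0.5, lr0=0.5):
    """One pass of stochastic subgradient descent for linear quantile
    regression, returning the Polyak-Ruppert averaged iterate.
    A simplified sketch; step sizes and inference are not the paper's."""
    n, d = X.shape
    beta = np.zeros(d)
    beta_bar = np.zeros(d)
    for i in range(n):
        xi, yi = X[i], y[i]
        # subgradient of the check loss rho_tau(y - x'beta) w.r.t. beta
        grad = (float(yi <= xi @ beta) - tau) * xi
        beta -= lr0 / np.sqrt(i + 1) * grad
        # running Polyak-Ruppert average of the iterates
        beta_bar += (beta - beta_bar) / (i + 1)
    return beta_bar

rng = np.random.default_rng(0)
n, d = 50_000, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, d - 1))])
y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(size=n)
print(sgd_quantile(X, y, tau=0.5))  # roughly [1, 2, -1] at the median
```

With symmetric noise, median regression recovers the same coefficients as the conditional mean, so the averaged iterate should land near the true parameter vector after a single pass over the data.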
By:  Miguel A. Delgado; Julius Vainora 
Abstract:  We propose a cross-classification rule for the dependent and explanatory variables resulting in a contingency table such that the classical trinity of chi-square statistics can be used to check for conditional distribution specification. The resulting Pearson statistic is equal to the Lagrange multiplier statistic. We also provide a Chernoff-Lehmann result for the Pearson statistic using the raw data maximum likelihood estimator, which is applied to show that the corresponding limiting distribution of the Wald statistic does not depend on the number of parameters. The asymptotic distribution of the proposed statistics does not change when the grouping is data dependent. An algorithm that allows control of the number of observations per cell is developed. Monte Carlo experiments provide evidence of the excellent size accuracy of the proposed tests and their good power performance, compared to omnibus tests, in high dimensions. 
Date:  2022–10 
URL:  http://d.repec.org/n?u=RePEc:arx:papers:2210.00624&r= 
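As a toy illustration of the Pearson ingredient of the classical trinity, the statistic $\sum_{ij}(O_{ij}-E_{ij})^2/E_{ij}$ can be computed directly from a contingency table. The cell counts below are hypothetical, and the expected counts are taken under simple row-column independence rather than the paper's conditional-distribution specification.

```python
import numpy as np
from scipy.stats import chi2

# Hypothetical 3x3 cross-classification; counts are illustrative only.
observed = np.array([[30, 20, 10],
                     [25, 30, 15],
                     [10, 20, 40]])
# Expected counts under independence of rows and columns
row = observed.sum(axis=1, keepdims=True)
col = observed.sum(axis=0, keepdims=True)
expected = row @ col / observed.sum()

pearson = ((observed - expected) ** 2 / expected).sum()
df = (observed.shape[0] - 1) * (observed.shape[1] - 1)
pvalue = chi2.sf(pearson, df)
print(round(pearson, 2), df)  # Pearson statistic and degrees of freedom
```

In the paper's setting the grouping is data dependent and the cells come from the proposed cross-classification rule, but the arithmetic of the test statistic is the same.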
By:  William C. Horrace (Center for Policy Research, Maxwell School, Syracuse University, 426 Eggers Hall, Syracuse, NY 13244); Hyunseok Jung (Department of Economics, University of Arkansas, Fayetteville, AR 72701); Yi Yang (Amazon) 
Abstract:  We survey formulations of the conditional mode estimator for technical inefficiency in parametric stochastic frontier models with normal errors and introduce new formulations for models with Laplace errors. We prove the conditional mode estimator converges pointwise to the true inefficiency value as the noise variance goes to zero. We also prove that the conditional mode estimator in the normal-exponential model achieves near-minimax optimality. Our minimax theorem implies that the worst-case risk occurs when many firms are nearly efficient, and the conditional mode estimator minimizes estimation risk in this case by estimating these low-inefficiency firms as efficient. Unlike the conditional expectation estimator, the conditional mode estimator produces multiple firms with inefficiency estimates exactly equal to zero, suggesting a rule for selecting a subset of maximally efficient firms. Our simulation results show that this “zero-mode subset” has reasonably high probability of containing the most efficient firm, particularly when inefficiency is exponentially distributed. The rule is easy to apply and interpret for practitioners. We include an empirical example demonstrating the merits of the conditional mode estimator. 
Keywords:  Stochastic Frontier Model, Efficiency Estimation, Laplace Distribution, Minimax Optimality, Ranking and Selection 
JEL:  C14 C23 D24 
Date:  2022–08 
URL:  http://d.repec.org/n?u=RePEc:max:cprwps:249&r= 
By:  Matias D. Cattaneo; Ricardo P. Masini; William G. Underwood 
Abstract:  Yurinskii's coupling is a popular tool for finite-sample distributional approximation in mathematical statistics and applied probability, offering a Gaussian strong approximation for sums of random vectors under easily verified conditions with an explicit rate of approximation. Originally stated for sums of independent random vectors in $\ell^2$-norm, it has recently been extended to the $\ell^p$-norm, where $1 \leq p \leq \infty$, and to vector-valued martingales in $\ell^2$-norm under some rather strong conditions. We provide as our main result a generalization of all of the previous forms of Yurinskii's coupling, giving a Gaussian strong approximation for martingales in $\ell^p$-norm under relatively weak conditions. We apply this result to some areas of statistical theory, including high-dimensional martingale central limit theorems and uniform strong approximations for martingale empirical processes. Finally, we give a few illustrative examples in statistical methodology, applying our results to partitioning-based series estimators for nonparametric regression, distributional approximation of $\ell^p$-norms of high-dimensional martingales, and local polynomial regression estimators. We address issues of feasibility, demonstrating implementable statistical inference procedures in each section. 
Date:  2022–10 
URL:  http://d.repec.org/n?u=RePEc:arx:papers:2210.00362&r= 
By:  Stefan Faridani; Paul Niehaus 
Abstract:  We study the problem of estimating the average causal effect of treating every member of a population, as opposed to none, using an experiment that treats only some. This is the policy-relevant estimand when (for example) deciding whether to scale up an intervention based on the results of an RCT. But it differs from the usual average treatment effect in the presence of spillovers. We study both estimation and experimental design given a bound (parametrized by $\eta$) on the rate of decay of spillovers between units that are "distant" in a general sense, encompassing (for example) spatial settings, panel data, and some directed networks. We show that, over all estimators linear in the outcomes and all cluster-randomized designs, the optimal rate of convergence to the average global effect is $n^{-\frac{1}{2+\frac{1}{\eta}}}$, and provide a generalized "Scaling Clusters" design under which this rate can be achieved. Both of these results are unchanged under the additional assumption (common in applied work) that potential outcomes are linear in population treatment assignments and the estimator is OLS. We also provide methods to improve finite-sample performance, including a shrinkage estimator that takes advantage of additional information about the structure of the spillovers when linearity holds, and an optimized weighting approach when it does not. 
Date:  2022–09 
URL:  http://d.repec.org/n?u=RePEc:arx:papers:2209.14181&r= 
By:  Matthew Read (Reserve Bank of Australia) 
Abstract:  Structural vector autoregressions that are set identified (e.g. with sign restrictions) are typically used to analyse the effects of standard deviation shocks. However, answering questions of economic interest often requires knowing the effects of a 'unit' shock. For example, central bankers want to answer questions like 'what are the effects of a 100 basis point increase in the policy rate?' The problem is that set-identifying restrictions do not always rule out the possibility that a variable does not react contemporaneously to its own shock. As a consequence, identified sets for the impulse responses to unit shocks may be unbounded, which implies that set-identifying restrictions may be extremely uninformative. Simply assuming that responses are nonzero turns out to be an arbitrary and unsatisfactory solution. I argue that it is therefore important to communicate about the extent to which the identified set may be unbounded, since this tells us about the informativeness of the identifying restrictions, and I develop tools to facilitate this. I explain how to draw useful posterior inferences about impulse responses when identified sets are unbounded with positive probability. I illustrate the empirical relevance of these issues by estimating the response of US output to a 100 basis point federal funds rate shock under different sets of identifying restrictions. Some restrictions are very uninformative about the effects of a 100 basis point shock. The output responses I obtain under a rich set of identifying restrictions lie towards the smaller end of the range of existing estimates. 
Keywords:  Bayesian inference; impulse responses; monetary policy; set-identified models; sign restrictions; zero restrictions 
JEL:  C32 E52 
Date:  2022–10 
URL:  http://d.repec.org/n?u=RePEc:rba:rbardp:rdp202204&r= 
By:  Marina Khismatullina; Michael Vogt 
Abstract:  We develop new econometric methods for the comparison of nonparametric time trends. In many applications, practitioners are interested in whether the observed time series all have the same time trend. Moreover, they would often like to know which trends are different and in which time intervals they differ. We design a multiscale test to formally approach these questions. Specifically, we develop a test that allows us to make rigorous confidence statements about which time trends are different and where (that is, in which time intervals) they differ. Based on our multiscale test, we further develop a clustering algorithm that groups the observed time series into clusters with the same trend. We derive asymptotic theory for our test and clustering methods. The theory is complemented by a simulation study and two applications to GDP growth data and house pricing data. 
Date:  2022–09 
URL:  http://d.repec.org/n?u=RePEc:arx:papers:2209.10841&r= 
By:  Yingyao Hu; Yang Liu; Jiaxiong Yao 
Abstract:  Latent variable models are crucial in scientific research, where a key variable, such as effort, ability, or belief, is unobserved in the sample but needs to be identified. This paper proposes a novel method for estimating realizations of a latent variable $X^*$ in a random sample that contains its multiple measurements. With the key assumption that the measurements are independent conditional on $X^*$, we provide sufficient conditions under which realizations of $X^*$ in the sample are locally unique in a class of deviations, which allows us to identify realizations of $X^*$. To the best of our knowledge, this paper is the first to provide such identification at the observation level. We then use the Kullback-Leibler distance between the two probability densities, with and without the conditional independence, as the loss function to train Generative Element Extraction Networks (GEEN) that map from the observed measurements to realizations of $X^*$ in the sample. The simulation results imply that the proposed estimator works quite well, and the estimated values are highly correlated with realizations of $X^*$. Our estimator can be applied to a large class of latent variable models, and we expect it will change how people deal with latent variables. 
Date:  2022–10 
URL:  http://d.repec.org/n?u=RePEc:arx:papers:2210.01300&r= 
By:  Niko Hauzenberger; Florian Huber; Gary Koop; James Mitchell 
Abstract:  In light of widespread evidence of parameter instability in macroeconomic models, many time-varying parameter (TVP) models have been proposed. This paper proposes a nonparametric TVP-VAR model using Bayesian Additive Regression Trees (BART). The novelty of this model is that the law of motion driving the parameters is treated nonparametrically. This leads to great flexibility in the nature and extent of parameter change, both in the conditional mean and in the conditional variance. In contrast to other nonparametric and machine learning methods that are black box, inference using our model is straightforward because, in treating the parameters rather than the variables nonparametrically, the model remains conditionally linear in the mean. Parsimony is achieved through adopting nonparametric factor structures and the use of shrinkage priors. In an application to US macroeconomic data, we illustrate the use of our model in tracking both the evolving nature of the Phillips curve and how the effects of business cycle shocks on inflationary measures vary nonlinearly with movements in uncertainty. 
Date:  2022–09 
URL:  http://d.repec.org/n?u=RePEc:arx:papers:2209.11970&r= 
By:  Hugo Freeman 
Abstract:  This paper studies a linear and additively separable model for multidimensional panel data of three or more dimensions with unobserved interactive fixed effects. Two approaches are considered to account for these unobserved interactive fixed effects when estimating coefficients on the observed covariates. First, the model is embedded within the standard two-dimensional panel framework and restrictions are derived under which the factor structure methods in Bai (2009) lead to consistent estimation of model parameters. The second approach considers group fixed effects and kernel methods that are more robust to the multidimensional nature of the problem. Theoretical results and simulations show the benefit of standard two-dimensional panel methods when the structure of the interactive fixed-effect term is known, but also highlight how the group fixed-effects and kernel methods perform well without knowledge of this structure. The methods are implemented to estimate the demand elasticity for beer under a handful of models for demand. 
Date:  2022–09 
URL:  http://d.repec.org/n?u=RePEc:arx:papers:2209.11691&r= 
By:  Siliang Zeng; Mingyi Hong; Alfredo Garcia 
Abstract:  We consider the task of estimating a structural model of dynamic decisions by a human agent based upon the observable history of implemented actions and visited states. This problem has an inherent nested structure: in the inner problem, an optimal policy for a given reward function is identified, while in the outer problem, a measure of fit is maximized. Several approaches have been proposed to alleviate the computational burden of this nested-loop structure, but these methods still suffer from high complexity when the state space is either discrete with large cardinality or continuous in high dimensions. Other approaches in the inverse reinforcement learning (IRL) literature emphasize policy estimation at the expense of reduced reward estimation accuracy. In this paper, we propose a single-loop estimation algorithm with finite-time guarantees that is equipped to deal with high-dimensional state spaces without compromising reward estimation accuracy. In the proposed algorithm, each policy improvement step is followed by a stochastic gradient step for likelihood maximization. We show that the proposed algorithm converges to a stationary solution with a finite-time guarantee. Further, if the reward is parameterized linearly, we show that the algorithm approximates the maximum likelihood estimator sublinearly. Finally, by using robotics control problems in MuJoCo and their transfer settings, we show that the proposed algorithm achieves superior performance compared with other IRL and imitation learning benchmarks. 
Date:  2022–10 
URL:  http://d.repec.org/n?u=RePEc:arx:papers:2210.01282&r= 
By:  Canavire Bacarreza, Gustavo Javier; Rios Avila, Fernando; Sacco Capurro, Flavia Giannina 
Abstract:  This paper proposes a method to analyze interval-censored data, using multiple imputation based on a heteroskedastic interval regression approach. The proposed model aims to obtain a synthetic data set that can be used for standard analysis, including standard linear regression, quantile regression, or poverty and inequality estimation. The paper presents two applications to show the performance of the method. First, it runs a Monte Carlo simulation to show the method's performance under the assumption of multiplicative heteroskedasticity, with and without conditional normality. Second, it uses the proposed methodology to analyze labor income data in Grenada for 2013–20, where the salary data are interval-censored according to the salary intervals prespecified in the survey questionnaire. The results obtained are consistent across both exercises. 
Date:  2022–08–22 
URL:  http://d.repec.org/n?u=RePEc:wbk:wbrwps:10147&r= 
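The imputation step described above can be sketched with a truncated normal draw: a value known only to lie in a censoring interval is imputed from a normal distribution restricted to that interval. This is a minimal sketch; in the paper the mean and variance would come from a heteroskedastic interval regression fit, whereas here they are hypothetical inputs.

```python
import numpy as np
from scipy.stats import truncnorm

rng = np.random.default_rng(1)

def impute_interval(lo, hi, mu, sigma, m=5, rng=rng):
    """Draw m imputations of a value known only to lie in [lo, hi],
    from a normal(mu, sigma) truncated to that interval. mu and sigma
    are hypothetical here, standing in for interval-regression fits."""
    a, b = (lo - mu) / sigma, (hi - mu) / sigma  # standardized bounds
    return truncnorm.rvs(a, b, loc=mu, scale=sigma, size=m, random_state=rng)

draws = impute_interval(lo=1000.0, hi=1500.0, mu=1300.0, sigma=400.0)
print(draws)  # every draw falls inside the censoring interval
```

Repeating the draw $m$ times per observation yields the multiple imputations that feed the downstream regression or inequality analysis.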
By:  Marzio Di Vece; Diego Garlaschelli; Tiziano Squartini 
Abstract:  In the study of economic networks, econometric approaches interpret the traditional Gravity Model specification as the expected link weight coming from a probability distribution whose functional form can be chosen arbitrarily, while statistical-physics approaches construct maximum-entropy distributions of weighted graphs, constrained to satisfy a given set of measurable network properties. In a recent companion paper, we integrated the two approaches and applied them to the World Trade Web, i.e. the network of international trade among world countries. While the companion paper dealt only with discrete-valued link weights, the present paper extends the theoretical framework to continuous-valued link weights. In particular, we construct two broad classes of maximum-entropy models, namely the integrated and the conditional ones, defined by different criteria to derive and combine the probabilistic rules for placing links and loading them with weights. In the integrated models, both rules follow from a single, constrained optimization of the continuous Kullback-Leibler divergence; in the conditional models, the two rules are disentangled and the functional form of the weight distribution follows from a conditional optimization procedure. After deriving the general functional form of the two classes, we turn each of them into a proper family of econometric models via a suitable identification of the econometric function relating the corresponding expected link weights to macroeconomic factors. After testing the two classes of models on World Trade Web data, we discuss their strengths and weaknesses. 
Date:  2022–10 
URL:  http://d.repec.org/n?u=RePEc:arx:papers:2210.01179&r= 
By:  Cengiz, Doruk; Tekgüç, Hasan 
Abstract:  We extend the scope of the forecast reconciliation literature and use its tools in the context of causal inference. Researchers are interested in both the average treatment effect on the treated and treatment effect heterogeneity. We show that ex post correction of the counterfactual estimates using the aggregation constraints that stem from the hierarchical or grouped structure of the data is likely to yield more accurate estimates. Building on the geometric interpretation of forecast reconciliation, we provide additional insights into the exact factors determining the size of the accuracy improvement due to the reconciliation. We experiment with U.S. GDP and employment data. We find that the reconciled treatment effect estimates tend to be closer to the truth than the original (base) counterfactual estimates even in cases where the aggregation constraints are nonlinear. Consistent with our theoretical expectations, improvement is greater when machine learning methods are used. 
Keywords:  Forecast Reconciliation; Nonlinear Constraints; Causal Machine Learning Methods; Counterfactual Estimation; DifferenceinDifferences 
JEL:  C53 
Date:  2022–06 
URL:  http://d.repec.org/n?u=RePEc:pra:mprapa:114478&r= 
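The ex post correction described in the abstract above has a simple linear-constraint core: base estimates that violate the aggregation structure are projected onto the space of coherent vectors. A minimal sketch with an OLS projection and a two-leaf hierarchy (total = A + B) follows; the paper applies this idea to counterfactual estimates, and also covers nonlinear constraints, which this sketch does not.

```python
import numpy as np

# Summing matrix S maps bottom-level series (A, B) to all series
# (total, A, B) in the hierarchy total = A + B.
S = np.array([[1.0, 1.0],
              [1.0, 0.0],
              [0.0, 1.0]])

def reconcile(base, S):
    """OLS forecast reconciliation: project the base estimates onto
    the column space of S, i.e. onto aggregation-consistent vectors."""
    P = S @ np.linalg.solve(S.T @ S, S.T)  # projection matrix onto col(S)
    return P @ base

base = np.array([10.0, 4.0, 5.0])  # incoherent: 4 + 5 != 10
tilde = reconcile(base, S)
print(tilde)  # reconciled: first entry equals the sum of the other two
```

The geometric interpretation referenced in the abstract is visible here: reconciliation is an orthogonal projection, so the reconciled vector can only be closer to any coherent truth than the base vector is.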
By:  Spark C. Tseung; Ian Weng Chan; Tsz Chai Fung; Andrei L. Badescu; X. Sheldon Lin 
Abstract:  A well-designed framework for risk classification and ratemaking in automobile insurance is key to insurers' profitability and risk management, while also ensuring that policyholders are charged a fair premium according to their risk profile. In this paper, we propose to adapt a flexible regression model, called the Mixed LRMoE, to the problem of a posteriori risk classification and ratemaking, where policyholder-level random effects are incorporated to better infer their risk profile as reflected by the claim history. We also develop a stochastic variational Expectation-Conditional-Maximization algorithm for estimating model parameters and inferring the posterior distribution of random effects, which is numerically efficient and scalable to large insurance portfolios. We then apply the Mixed LRMoE model to a real, multi-year automobile insurance dataset, where the proposed framework is shown to offer a better fit to data and to produce posterior premiums that accurately reflect policyholders' claim history. 
Date:  2022–09 
URL:  http://d.repec.org/n?u=RePEc:arx:papers:2209.15212&r= 
By:  Kozubowski, Tomasz J. (University of Nevada); Mazur, Stepan (Örebro University School of Business); Podgórski, Krzysztof (Lund University) 
Abstract:  There is considerable literature on matrix-variate gamma distributions, also known as Wishart distributions, which are driven by a shape parameter with values in the (Gindikin) set. We provide an extension of this class to the case where the shape parameter may actually take on any positive value. In addition to the well-known singular Wishart as well as nonsingular matrix-variate gamma distributions, the proposed class includes new singular matrix-variate distributions, with the shape parameter outside of the Gindikin set. This singular, non-Wishart case is no longer permutation invariant and derivation of its scaling properties requires special care. Among numerous newly established properties of the extended class are group-like relations with respect to the positive shape parameter. The latter provide a natural substitute for the classical convolution properties that are crucial in the study of infinite divisibility. Our results provide further clarification regarding the lack of infinite divisibility of Wishart distributions, a classical observation of Paul Lévy. In particular, we clarify why the row/column vectors in the off-diagonal blocks are infinitely divisible. A class of matrix-variate Laplace distributions arises naturally in this setup as the distributions of the off-diagonal blocks of random gamma matrices. For the class of Laplace rectangular matrices, we obtain distributional identities that follow from the role they play in the structure of the matrix gamma distributions. We present several elegant and convenient stochastic representations of the discussed classes of matrix-valued distributions. In particular, we show that the matrix-variate gamma distribution is a symmetrization of the triangular Rayleigh distributed matrix, a new class of matrix variables that naturally extends the classical univariate Rayleigh variables. 
Finally, a connection of the matrix-variate gamma distributions to matrix-valued Lévy processes of a vector argument is made. Namely, a Lévy process, termed a matrix gamma Laplace motion, is obtained by the subordination of the triangular Brownian motion of a vector argument to a vector-valued gamma motion of a vector argument. In this context, we introduce a triangular matrix-valued Rayleigh process, which, through symmetrization, leads to a new matrix-variate gamma process. This process, when taken at a properly defined one-dimensional argument, has the matrix gamma marginal distribution with the shape parameter equal to its argument. 
Keywords:  Random matrices; singular random matrices; distribution theory; matrix-variate gamma distribution; Wishart distribution; matrix-variate Laplace distribution; infinitely divisible and stable distributions; matrix-valued Lévy processes; triangular matrix-valued Rayleigh process; matrix-variate gamma process; characterization and structure for multivariate probability distributions 
JEL:  C10 C30 C46 
Date:  2022–10–14 
URL:  http://d.repec.org/n?u=RePEc:hhs:oruesi:2022_012&r= 
By:  Caio Almeida (Princeton University); Gustavo Freire (Erasmus School of Economics); Rafael Azevedo (Getulio Vargas Foundation (FGV)); Kym Ardison (Getulio Vargas Foundation (FGV)) 
Abstract:  We propose a family of nonparametric estimators for an option price that require only the use of underlying return data, but can also easily incorporate information from observed option prices. Each estimator comes from a risk-neutral measure minimizing generalized entropy according to a different Cressie-Read discrepancy. We apply our method to price S&P 500 options and the cross-section of individual equity options, using distinct amounts of option data in the estimation. Estimators incorporating mild nonlinearities produce optimal pricing accuracy within the Cressie-Read family and outperform several benchmarks such as the Black-Scholes and different GARCH option pricing models. Overall, we provide a powerful option pricing technique suitable for scenarios of limited option data availability. 
Keywords:  Risk-Neutral Measure, Option Pricing, Nonparametric Estimation, Generalized Entropy, Cressie-Read Discrepancies 
JEL:  C14 C58 G13 
Date:  2022–05 
URL:  http://d.repec.org/n?u=RePEc:pri:econom:202225&r= 
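To make the idea of an entropy-minimizing risk-neutral measure concrete, the sketch below implements exponential tilting, which corresponds to the Kullback-Leibler member of the Cressie-Read family: sample weights are tilted so the reweighted measure prices the underlying at the risk-free rate, and an option is then priced as a discounted expectation under those weights. The return distribution, risk-free rate, and strike are all hypothetical, and this is one member of the family rather than the paper's full estimator.

```python
import numpy as np
from scipy.optimize import brentq

rng = np.random.default_rng(2)
R = np.exp(rng.normal(0.05, 0.2, size=10_000))  # simulated gross returns
Rf = 1.01                                        # gross risk-free rate

def tilted_weights(theta):
    """Exponentially tilted (KL / Cressie-Read) probability weights."""
    w = np.exp(theta * R)
    return w / w.sum()

def pricing_error(theta):
    # Under the risk-neutral measure, the underlying must earn Rf.
    return tilted_weights(theta) @ R - Rf

theta = brentq(pricing_error, -50.0, 50.0)  # root-find the tilt
q = tilted_weights(theta)

strike = 1.0
call_price = q @ np.maximum(R - strike, 0.0) / Rf  # discounted Q-expectation
print(round(call_price, 4))
```

Other Cressie-Read discrepancies replace the exponential weight function with a power-family counterpart; the abstract's finding is that mildly nonlinear members of that family price best.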
By:  Toshiki Tsuda 
Abstract:  Heckman et al. (2008) examine the identification of the marginal treatment effect (MTE) with multivalued treatments by extending the local instrumental variable (LIV) approach of Heckman and Vytlacil (1999). Lee and Salanié (2018) study the identification of conditional expectations given unobserved heterogeneity; in Section 5.2 of their paper, they analyze the identification of the MTE under the same selection mechanism as in Heckman et al. (2008). We note that the construction of their model in Section 5.2 of Lee and Salanié (2018) is incomplete, and we establish sufficient conditions for the identification of the MTE with an improved model. While we reduce the unordered multiple-choice model to the binary treatment setting as in Heckman et al. (2008), we can identify the MTE defined as a natural extension of the binary-treatment MTE of Heckman and Vytlacil (2005). Further, our results can help identify other parameters such as the marginal distribution of potential outcomes. 
Date:  2022–09 
URL:  http://d.repec.org/n?u=RePEc:arx:papers:2209.11444&r= 
By:  Soren Blomquist; Anil Kumar; CheYuan Liang; Whitney K. Newey 
Abstract:  This paper is about the nonparametric regression of a choice variable on a nonlinear budget set when there is general heterogeneity, i.e., in the random utility model (RUM). We show that utility maximization makes this a three-dimensional regression with piecewise linear, convex budget sets with a more parsimonious specification than previously derived. We show that the regression allows for measurement and/or optimization errors in the outcome variable. We characterize all of the restrictions of utility maximization on the budget set regression and show how to check these restrictions. We formulate nonlinear budget set effects that can be identified by this regression and give automatic debiased machine learners of these effects. We find that in practice nonconvexities in the budget set have little effect on these estimates. We use control variables to allow for endogeneity of budget sets and adjust for productivity growth in taxable income. We apply the results to estimate an elasticity of 0.52 for an overall tax rate change in Sweden. We also find that the restrictions of utility maximization are satisfied at the choices made by nearly all individuals in the data. 
Keywords:  nonlinear budget sets; nonparametric estimation; heterogeneous preferences; taxable income; revealed stochastic preference 
JEL:  C14 C24 H31 J22 
Date:  2022–09–28 
URL:  http://d.repec.org/n?u=RePEc:fip:feddwp:94867&r= 
By:  Yuehao Bai; Meng Hsuan Hsieh; Jizhou Liu; Max Tabord-Meehan 
Abstract:  In this paper we revisit some common recommendations regarding the analysis of matched-pair and stratified experimental designs in the presence of attrition. Our main objective is to clarify a number of well-known claims about the practice of dropping pairs with an attrited unit when analyzing matched-pair designs. Contradictory advice appears in the literature about whether or not dropping pairs is beneficial or harmful, and stratifying into larger groups has been recommended as a resolution to the issue. To address these claims, we derive the estimands obtained from the difference-in-means estimator in a matched-pair design both when the observations from pairs with an attrited unit are retained and when they are dropped. We find limited evidence to support the claims that dropping pairs is beneficial, other than in potentially helping recover a convex weighted average of conditional average treatment effects. We then repeat the same exercise for stratified designs by studying the estimands obtained from a regression of outcomes on treatment with and without strata fixed effects. We do not find compelling evidence to support the claims that stratified designs should be preferred to matched-pair designs in the presence of attrition. 
Date:  2022–09 
URL:  http://d.repec.org/n?u=RePEc:arx:papers:2209.11840&r= 
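The two analysis choices being compared above, dropping pairs with an attrited unit versus keeping all observed units, can be made concrete in a stylized simulation. Here attrition is completely at random, so both difference-in-means estimators recover the treatment effect; the paper's analysis concerns the estimands these choices target when attrition may be systematic, which this sketch does not capture.

```python
import numpy as np

rng = np.random.default_rng(3)
n_pairs = 5_000
# One treated and one control unit per pair; true effect tau = 1.
pair_effect = rng.normal(size=n_pairs)
y_treat = 1.0 + pair_effect + rng.normal(size=n_pairs)
y_ctrl = pair_effect + rng.normal(size=n_pairs)
# Attrition: some control outcomes go missing completely at random.
missing = rng.random(n_pairs) < 0.2

# "Drop pairs": keep only pairs in which both units respond.
keep = ~missing
tau_drop = y_treat[keep].mean() - y_ctrl[keep].mean()
# "Keep units": difference in means over all observed units.
tau_keep = y_treat.mean() - y_ctrl[~missing].mean()
print(round(tau_drop, 3), round(tau_keep, 3))  # both near 1
```

When attrition depends on potential outcomes the two estimators generally target different estimands, which is exactly the comparison the paper formalizes.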
By:  Matthieu Stigler; Apratim Dey; Andrew Hobbs; David Lobell 
Abstract:  New satellite sensors will soon make it possible to estimate field-level crop yields, showing a great potential for agricultural index insurance. This paper identifies an important threat to better insurance from these new technologies: data with many fields and few years can yield downward-biased estimates of basis risk, a fundamental metric in index insurance. To demonstrate this bias, we use state-of-the-art satellite-based data on agricultural yields in the US and in Kenya to estimate and simulate basis risk. We find a substantive downward bias leading to a systematic overestimation of insurance quality. In this paper, we argue that big data in crop insurance can lead to a new situation where the number of variables $N$ largely exceeds the number of observations $T$. In such a situation where $T\ll N$, conventional asymptotics break, as evidenced by the large bias we find in simulations. We show how the high-dimension, low-sample-size (HDLSS) asymptotics, together with the spiked covariance model, provide a more relevant framework for the $T\ll N$ case encountered in index insurance. More precisely, we derive the asymptotic distribution of the relative share of the first eigenvalue of the covariance matrix, a measure of systematic risk in index insurance. Our formula accurately approximates the empirical bias simulated from the satellite data, and provides a useful tool for practitioners to quantify bias in insurance quality. 
Date:  2022–09 
URL:  http://d.repec.org/n?u=RePEc:arx:papers:2209.14611&r= 
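The key quantity in the abstract above, the relative share of the first eigenvalue of the covariance matrix in a $T \ll N$ panel, is easy to compute from simulated data. The sketch below generates a one-spike model (a single common factor plus idiosyncratic noise, with hypothetical dimensions and unit loadings) and computes the share via the small $T \times T$ Gram matrix, which has the same nonzero eigenvalues as the $N \times N$ sample covariance.

```python
import numpy as np

rng = np.random.default_rng(4)
T, N = 10, 500   # few years, many fields: the T << N regime
# Spiked covariance model: one common (systemic) factor plus noise.
factor = rng.normal(size=T)
loadings = np.ones(N)   # hypothetical unit loadings
X = np.outer(factor, loadings) + rng.normal(size=(T, N))

# Share of the first eigenvalue, via the T x T Gram matrix
# (same nonzero spectrum as the N x N sample covariance).
G = X @ X.T / N
eig = np.linalg.eigvalsh(G)        # ascending order
share = eig[-1] / eig.sum()
print(round(share, 3))
```

With only $T = 10$ observations of the factor, this share is a noisy object, which is why the HDLSS asymptotics in the paper are needed to characterize its distribution and the resulting bias in basis-risk estimates.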
By:  Scutari, Marco; Panero, Francesca; Proissl, Manuel 
Abstract:  In this paper, we present a general framework for estimating regression models subject to a user-defined level of fairness. We enforce fairness as a model selection step in which we choose the value of a ridge penalty to control the effect of sensitive attributes. We then estimate the parameters of the model conditional on the chosen penalty value. Our proposal is mathematically simple, with a solution that is partly in closed form, and produces estimates of the regression coefficients that are intuitive to interpret as a function of the level of fairness. Furthermore, it is easily extended to generalised linear models, kernelised regression models and other penalties, and it can accommodate multiple definitions of fairness. We compare our approach with the regression model from Komiyama et al. (in: Proceedings of machine learning research. 35th international conference on machine learning (ICML), vol 80, pp 2737–2746, 2018), which implements a provably optimal linear regression model, and with the fair models from Zafar et al. (J Mach Learn Res 20:1–42, 2019). We evaluate these approaches empirically on six different data sets, and we find that our proposal provides better goodness of fit and better predictive accuracy for the same level of fairness. In addition, we highlight a source of bias in the original experimental evaluation in Komiyama et al. (2018). 
Keywords:  fairness; generalised linear models; linear regression; logistic regression; ridge regression; EPSRC and MRC Centre for Doctoral Training in Statistical Science; University of Oxford (Grant EP/L016710/1) 
JEL:  C1 
Date:  2022–09–18 
URL:  http://d.repec.org/n?u=RePEc:ehl:lserod:116916&r= 
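As a rough sketch of the general idea of trading fairness off against fit via a ridge penalty (not the authors' exact estimator; the data-generating process, the penalty placement, and all numbers below are hypothetical), one can penalise only the coefficients of the sensitive attributes in a generalised ridge regression, which keeps the closed-form solution:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1000
s = rng.standard_normal(n)              # sensitive attribute (hypothetical)
x = 0.8 * s + rng.standard_normal(n)    # ordinary covariate, correlated with s
y = 1.0 * x + 0.5 * s + rng.standard_normal(n)

X = np.column_stack([x, s])
D = np.diag([0.0, 1.0])                 # penalise only the sensitive column

def ridge_fair(X, y, lam, D):
    # Generalised ridge in closed form: (X'X + lam * D)^{-1} X'y
    return np.linalg.solve(X.T @ X + lam * D, X.T @ y)

for lam in (0.0, 100.0, 1e6):
    b = ridge_fair(X, y, lam, D)
    print(f"lam={lam:>9}: beta_x={b[0]: .3f}, beta_s={b[1]: .3f}")
```

As the penalty grows, the coefficient on the sensitive attribute shrinks to zero while the other coefficients adjust, so a single scalar controls the fairness level, which is the intuition behind selecting the penalty to hit a user-defined fairness target.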
By:  Andrew Y. Chen; Tom Zimmermann 
Abstract:  Researchers are more likely to share notable findings. As a result, published findings tend to overstate the magnitude of real-world phenomena. This bias is a natural concern for asset pricing research, which has found hundreds of return predictors and little consensus on their origins. Empirical evidence on publication bias comes from large-scale meta-studies. Meta-studies of cross-sectional return predictability have settled on four stylized facts that demonstrate publication bias is not a dominant factor: (1) almost all findings can be replicated, (2) predictability persists out-of-sample, (3) empirical $t$-statistics are much larger than 2.0, and (4) predictors are weakly correlated. Each of these facts has been demonstrated in at least three meta-studies. Empirical Bayes statistics turn these facts into publication bias corrections. Estimates from three meta-studies find that the average correction (shrinkage) accounts for only 10 to 15 percent of in-sample mean returns and that the risk of inference going in the wrong direction (the false discovery rate) is less than 6%. Meta-studies also find that $t$-statistic hurdles exceed 3.0 in multiple testing algorithms and that returns are 30 to 50 percent weaker in alternative portfolio tests. These facts are easily misinterpreted as evidence of publication bias effects. We clarify these misinterpretations and others, including the conflating of "mostly false findings" with "many insignificant findings," "data snooping" with "liquidity effects," and "failed replications" with "insignificant ad hoc trading strategies." Meta-studies outside of the cross-sectional literature are rare. The four facts from cross-sectional meta-studies provide a framework for future research. We illustrate with a preliminary re-examination of equity premium predictability. 
Date:  2022–09 
URL:  http://d.repec.org/n?u=RePEc:arx:papers:2209.13623&r= 
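A stylized sketch of the empirical Bayes shrinkage logic (all numbers below are hypothetical, not the meta-study estimates): if observed mean returns equal true means plus sampling noise, then the noise share of the cross-predictor variance is the fraction by which in-sample means should be shrunk toward the grand mean.

```python
import numpy as np

rng = np.random.default_rng(2)
K = 300                                  # number of predictors (hypothetical)
true_mu = rng.normal(0.5, 0.4, K)        # true monthly mean returns (%)
se = 0.15                                # sampling standard error
obs = true_mu + rng.normal(0, se, K)     # in-sample estimates

# Empirical Bayes: shrink observed means toward the grand mean by the
# fraction of cross-predictor variation attributable to sampling noise
var_obs = obs.var()
shrink = se**2 / var_obs
eb = obs.mean() + (1 - shrink) * (obs - obs.mean())
print(f"shrinkage factor: {shrink:.1%}")
```

With these made-up inputs the shrinkage factor lands near 10 to 15 percent; the point of the sketch is only that modest shrinkage arises when sampling noise is small relative to the dispersion of true effects, which is how the meta-study facts translate into a bias correction.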
By:  Håvard Hungnes; Terje Skjerpen; Jørn Ivar Hamre; Xiaoming Chen Jansen; Dinh Quang Pham; Ole Sandvik (Statistics Norway) 
Abstract:  The labour force surveys (LFSs) in all Eurostat countries underwent a substantial redesign in January 2021. To ensure coherent labour market time series for the main indicators in the Norwegian LFS, we model the impact of the redesign. We use a state-space model that takes explicit account of the rotating pattern of the LFS. We also include auxiliary variables related to employment and unemployment that are highly correlated with the LFS variables we consider. The results of a parallel run are also included in the model. This paper makes two contributions to the literature on the effects of LFS redesign. First, we suggest a symmetric specification of the process of the wave-specific effects. Second, we account for substantial fluctuations in the labour force estimates during the Covid-19 pandemic by applying time-varying hyperparameters. Likelihood-ratio tests and examination of the auxiliary residuals show the latter to be warranted. 
Keywords:  State-space models; Auxiliary information; Labour market domains; Level shifts; Covid-19; Norway 
JEL:  C32 C51 C83 J21 
Date:  2022–08 
URL:  http://d.repec.org/n?u=RePEc:ssb:dispap:987&r= 
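The role of time-varying hyperparameters can be illustrated with a toy local-level state-space model. This is a minimal univariate sketch, not the authors' multivariate rotating-panel model; the break window and all variances are made up. Letting the state-innovation variance spike during a Covid-like period allows the filtered level to adapt to an abrupt shift.

```python
import numpy as np

def kalman_local_level(y, q, r):
    """Local-level model: a_t = a_{t-1} + eta_t, y_t = a_t + eps_t.
    q may be a scalar or a per-period array (time-varying state variance)."""
    n = len(y)
    q = np.broadcast_to(q, n)
    a, p = y[0], r                      # crude diffuse-ish initialisation
    out = np.empty(n)
    for t in range(n):
        p = p + q[t]                    # predict
        k = p / (p + r)                 # Kalman gain
        a = a + k * (y[t] - a)          # update
        p = (1 - k) * p
        out[t] = a
    return out

rng = np.random.default_rng(3)
level = np.cumsum(rng.normal(0, 0.1, 120))
level[60:] += 2.0                       # abrupt Covid-like level shift
y = level + rng.normal(0, 0.5, 120)

q_tv = np.full(120, 0.01)
q_tv[55:70] = 1.0                       # larger state variance in the break window
smooth_const = kalman_local_level(y, 0.01, 0.25)
smooth_tv = kalman_local_level(y, q_tv, 0.25)
```

With constant hyperparameters the filter is slow to absorb the shift; with the inflated state variance in the break window it tracks the new level almost immediately, which is the intuition for the likelihood-ratio evidence favouring time variation.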
By:  Carsten Chong; Marc Hoffmann; Yanghui Liu; Mathieu Rosenbaum; Grégoire Szymanski 
Abstract:  In recent years, there has been substantive empirical evidence that stochastic volatility is rough. In other words, the local behavior of stochastic volatility is much more irregular than that of semimartingales and resembles that of a fractional Brownian motion with Hurst parameter $H$. 
Date:  2022–10 
URL:  http://d.repec.org/n?u=RePEc:arx:papers:2210.01216&r= 
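The roughness claim is about scaling: for a fractional Brownian motion, $\mathbb{E}\,|X_{t+\Delta}-X_t|^2 \propto \Delta^{2H}$, so $H$ can be read off a log-log regression of mean squared increments on the lag. A self-contained sketch (simulating fBm directly with a hypothetical $H=0.1$; this scaling regression is a textbook diagnostic, not the inference procedure of the paper):

```python
import numpy as np

def fbm(n, H, rng):
    """Fractional Brownian motion on a grid via Cholesky of its covariance."""
    t = np.arange(1, n + 1) / n
    cov = 0.5 * (t[:, None]**(2 * H) + t[None, :]**(2 * H)
                 - np.abs(t[:, None] - t[None, :])**(2 * H))
    return np.linalg.cholesky(cov) @ rng.standard_normal(n)

def hurst(x):
    """Slope of log mean-squared increments against log lag equals 2H."""
    lags = np.arange(1, 11)
    msq = [np.mean((x[k:] - x[:-k]) ** 2) for k in lags]
    return np.polyfit(np.log(lags), np.log(msq), 1)[0] / 2

rng = np.random.default_rng(5)
x = fbm(1000, 0.1, rng)
print(f"estimated Hurst parameter: {hurst(x):.3f}")
```

A value well below 1/2 is what "rougher than a semimartingale" means in this context: squared increments shrink much more slowly than linearly in the lag.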
By:  Luofeng Liao; Yuan Gao; Christian Kroer 
Abstract:  Statistical inference under market equilibrium effects has attracted increasing attention recently. In this paper we focus on the specific case of linear Fisher markets. They have been widely used in fair resource allocation of food/blood donations and in budget management in large-scale Internet ad auctions. In resource allocation, it is crucial to quantify the variability of the resource received by the agents (such as blood banks and food banks) in addition to the fairness and efficiency properties of the systems. For ad auction markets, it is important to establish statistical properties of the platform's revenues in addition to their expected values. To this end, we propose a statistical framework based on the concept of infinite-dimensional Fisher markets. In our framework, we observe a market formed by a finite number of items sampled from an underlying distribution (the "observed market") and aim to infer several important equilibrium quantities of the underlying long-run market. These equilibrium quantities include individual utilities, social welfare, and pacing multipliers. Through the lens of sample average approximation (SAA), we derive a collection of statistical results and show that the observed market provides useful statistical information about the long-run market. In other words, the equilibrium quantities of the observed market converge to the true ones of the long-run market with strong statistical guarantees. These include consistency, finite sample bounds, asymptotics, and confidence intervals. As an extension, we discuss revenue inference in quasilinear Fisher markets. 
Date:  2022–09 
URL:  http://d.repec.org/n?u=RePEc:arx:papers:2209.15422&r= 
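A minimal sketch of the sample-average-approximation idea for linear Fisher markets. Equilibria here are computed by proportional-response dynamics, a standard solver for this market class but not necessarily the paper's method, and the value distribution and market sizes are hypothetical. The per-item equilibrium utilities of a sampled "observed market" should be close to those of the full market:

```python
import numpy as np

def fisher_equilibrium(V, B, iters=1000):
    """Linear Fisher market equilibrium via proportional-response dynamics.
    V[i, j] is buyer i's value for item j; B[i] is buyer i's budget.
    Returns equilibrium utilities and item prices."""
    n, m = V.shape
    bids = np.outer(B, np.full(m, 1.0 / m))       # spread budgets evenly
    for _ in range(iters):
        p = bids.sum(axis=0)                      # prices = total bids per item
        x = bids / p                              # allocation shares
        u = (V * x).sum(axis=1)                   # buyer utilities
        bids = B[:, None] * (V * x) / u[:, None]  # rebid in proportion to value obtained
    return u, bids.sum(axis=0)

rng = np.random.default_rng(6)
n_buyers, m_full, m_obs = 5, 2000, 200
V = rng.uniform(0.5, 1.5, (n_buyers, m_full))     # "long-run" item population
B = np.ones(n_buyers)

u_full, _ = fisher_equilibrium(V, B)
sample = rng.choice(m_full, size=m_obs, replace=False)
u_obs, _ = fisher_equilibrium(V[:, sample], B)
print(np.c_[u_full / m_full, u_obs / m_obs])      # per-item utilities, side by side
```

Dividing utilities by the number of items puts the two markets on a common scale; the agreement of the two columns is the consistency phenomenon the abstract formalizes, here with a 10% sample standing in for the observed market.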