nep-ecm New Economics Papers
on Econometrics
Issue of 2022‒10‒31
twenty-six papers chosen by
Sune Karlsson
Örebro universitet

  1. Fast Inference for Quantile Regression with Millions of Observations By Sokbae Lee; Yuan Liao; Myung Hwan Seo; Youngki Shin
  2. Chi-Square Goodness-of-Fit Tests for Conditional Distributions By Miguel A. Delgado; Julius Vainora
  3. The Conditional Mode in Parametric Frontier Models By William C. Horrace; Hyunseok Jung; Yi Yang
  4. Yurinskii's Coupling for Martingales By Matias D. Cattaneo; Ricardo P. Masini; William G. Underwood
  5. Rate-optimal linear estimation of average global effects By Stefan Faridani; Paul Niehaus
  6. The Unit-effect Normalisation in Set-identified Structural Vector Autoregressions By Matthew Read
  7. Multiscale Comparison of Nonparametric Trend Curves By Marina Khismatullina; Michael Vogt
  8. Revealing Unobservables by Deep Learning: Generative Element Extraction Networks (GEEN) By Yingyao Hu; Yang Liu; Jiaxiong Yao
  9. Bayesian Modeling of Time-varying Parameters Using Regression Trees By Niko Hauzenberger; Florian Huber; Gary Koop; James Mitchell
  10. Multidimensional Interactive Fixed-Effects By Hugo Freeman
  11. Structural Estimation of Markov Decision Processes in High-Dimensional State Space with Finite-Time Guarantees By Siliang Zeng; Mingyi Hong; Alfredo Garcia
  12. Recovering Income Distribution in the Presence of Interval-Censored Data By Canavire Bacarreza, Gustavo Javier; Rios Avila, Fernando; Sacco Capurro, Flavia Giannina
  13. Reconciling econometrics with continuous maximum-entropy models By Marzio Di Vece; Diego Garlaschelli; Tiziano Squartini
  14. Counterfactual Reconciliation: Incorporating Aggregation Constraints For More Accurate Causal Effect Estimates By Cengiz, Doruk; Tekgüç, Hasan
  15. A Posteriori Risk Classification and Ratemaking with Random Effects in the Mixture-of-Experts Model By Spark C. Tseung; Ian Weng Chan; Tsz Chai Fung; Andrei L. Badescu; X. Sheldon Lin
  16. Matrix Gamma Distributions and Related Stochastic Processes By Kozubowski, Tomasz J.; Mazur, Stepan; Podgórski, Krzysztof
  17. Nonparametric Option Pricing with Generalized Entropic Estimators By Caio Almeida; Gustavo Freire; Rafael Azevedo; Kym Ardison
  18. Identification of the Marginal Treatment Effect with Multivalued Treatments By Toshiki Tsuda
  19. Nonlinear Budget Set Regressions for the Random Utility Model By Soren Blomquist; Anil Kumar; Che-Yuan Liang; Whitney K. Newey
  20. Revisiting the Analysis of Matched-Pair and Stratified Experiments in the Presence of Attrition By Yuehao Bai; Meng Hsuan Hsieh; Jizhou Liu; Max Tabord-Meehan
  21. With big data come big problems: pitfalls in measuring basis risk for crop index insurance By Matthieu Stigler; Apratim Dey; Andrew Hobbs; David Lobell
  22. Achieving fairness with a simple ridge penalty By Scutari, Marco; Panero, Francesca; Proissl, Manuel
  23. Publication Bias in Asset Pricing Research By Andrew Y. Chen; Tom Zimmermann
  24. Structural break in the Norwegian LFS due to the 2021 redesign By Håvard Hungnes; Terje Skjerpen; Jørn Ivar Hamre; Xiaoming Chen Jansen; Dinh Quang Pham; Ole Sandvik
  25. Statistical inference for rough volatility: Central limit theorems By Carsten Chong; Marc Hoffmann; Yanghui Liu; Mathieu Rosenbaum; Grégoire Szymanski
  26. Statistical Inference for Fisher Market Equilibrium By Luofeng Liao; Yuan Gao; Christian Kroer

  1. By: Sokbae Lee; Yuan Liao; Myung Hwan Seo; Youngki Shin
    Abstract: While big data analytics has brought many new opportunities to economic research, conventional econometric inference based on extremum estimators would, for datasets containing millions of observations, require huge computing power and memory that are often not accessible. In this paper, we focus on linear quantile regression employed to analyze "ultra-large" datasets such as U.S. decennial censuses. We develop a new inference framework that runs very fast, based on stochastic sub-gradient descent (S-subGD) updates. The cross-sectional data are fed sequentially into the inference procedure: (i) the parameter estimate is updated when each "new observation" arrives, (ii) it is aggregated as the Polyak-Ruppert average, and (iii) a pivotal statistic for inference is computed using the solution path only. We leverage insights from time series regression and construct an asymptotically pivotal statistic via random scaling. Our proposed test statistic is computed in a fully online fashion and the critical values are obtained without any resampling methods. We conduct extensive numerical studies to showcase the computational merits of our proposed inference. For inference problems as large as $(n, d) \sim (10^7, 10^3)$, where $n$ is the sample size and $d$ is the number of regressors, our method can generate new insights beyond the computational capabilities of existing inference methods. Specifically, we uncover the trends in the gender gap in the U.S. college wage premium using millions of observations, while controlling for over $10^3$ covariates to mitigate confounding effects.
    Date: 2022–09
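    A minimal sketch of the S-subGD recursion described above, assuming a step size gamma0 * s^(-a) with a slightly above 1/2 and a random-scaling matrix of the form V = n^(-2) * sum_s s^2 (bar_s - bar_n)(bar_s - bar_n)'. The values of gamma0 and a, and the helper name s_subgd_quantile, are illustrative choices rather than the authors' tuning. Each observation triggers one sub-gradient step on the check loss, an update of the Polyak-Ruppert average, and updates of three running sums from which V is formed without storing the path.

```python
import numpy as np

def s_subgd_quantile(X, y, tau, gamma0=1.0, a=0.501):
    """Hypothetical helper: one pass of stochastic sub-gradient descent for
    linear quantile regression, returning the Polyak-Ruppert average and a
    random-scaling matrix, both updated online (one observation at a time)."""
    n, d = X.shape
    beta = np.zeros(d)      # current S-subGD iterate
    bar = np.zeros(d)       # Polyak-Ruppert average of the iterates
    A = np.zeros((d, d))    # running sum of s^2 * bar_s bar_s'
    b = np.zeros(d)         # running sum of s^2 * bar_s
    c = 0.0                 # running sum of s^2
    for s in range(1, n + 1):
        x, ys = X[s - 1], y[s - 1]
        # A sub-gradient of the check loss rho_tau(y - x'beta) w.r.t. beta is
        # -x * (tau - 1{y < x'beta}); step against it with rate gamma0 * s^(-a).
        beta = beta + gamma0 * s ** (-a) * x * (tau - float(ys < x @ beta))
        bar += (beta - bar) / s
        A += s**2 * np.outer(bar, bar)
        b += s**2 * bar
        c += s**2
    # V_n = n^(-2) * sum_s s^2 (bar_s - bar_n)(bar_s - bar_n)', expanded so
    # that only the running sums above are needed.
    V = (A - np.outer(b, bar) - np.outer(bar, b) + c * np.outer(bar, bar)) / n**2
    return bar, V
```

    A t-statistic for coefficient j is then sqrt(n) * (bar[j] - b_j) / sqrt(V[j, j]). Under random scaling its limit is pivotal but not standard normal, so critical values come from the non-standard distribution tabulated in the random-scaling literature rather than from N(0, 1).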
  2. By: Miguel A. Delgado; Julius Vainora
    Abstract: We propose a cross-classification rule for the dependent and explanatory variables resulting in a contingency table such that the classical trinity of chi-square statistics can be used to check for conditional distribution specification. The resulting Pearson statistic is equal to the Lagrange multiplier statistic. We also provide a Chernoff-Lehmann result for the Pearson statistic using the raw data maximum likelihood estimator, which is applied to show that the corresponding limiting distribution of the Wald statistic does not depend on the number of parameters. The asymptotic distribution of the proposed statistics does not change when the grouping is data dependent. An algorithm that controls the number of observations per cell is also developed. Monte Carlo experiments provide evidence of the excellent size accuracy of the proposed tests and their good power performance, compared to omnibus tests, in high dimensions.
    Date: 2022–10
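    As an illustration of the Pearson member of the chi-square trinity, the sketch below builds a cross-classified table from given cell edges and computes expected counts from a hypothesised conditional CDF. It assumes the model parameters are known; the paper's specific classification rule, its cell-size algorithm and the adjustment for estimated parameters are not reproduced, and the helper names are hypothetical.

```python
import math
import numpy as np

def pearson_cross_classified(y, x, x_edges, y_edges, cond_cdf):
    """Hypothetical helper: Pearson statistic for a table that cross-classifies
    x-cells (rows) against y-cells (columns). Expected counts come from the
    hypothesised conditional CDF, cond_cdf(t, x) = P(Y <= t | X = x)."""
    rows = np.digitize(x, x_edges)          # x-cell index of each observation
    cols = np.digitize(y, y_edges)          # y-cell index of each observation
    J, K = len(x_edges) + 1, len(y_edges) + 1
    observed = np.zeros((J, K))
    np.add.at(observed, (rows, cols), 1.0)
    # Expected count in cell (j, k): sum over observations in x-cell j of the
    # model probability that Y falls in y-cell k (outer edges at +/- infinity).
    cuts = np.concatenate(([-np.inf], y_edges, [np.inf]))
    probs = np.diff(np.stack([cond_cdf(t, x) for t in cuts], axis=1), axis=1)
    expected = np.zeros((J, K))
    np.add.at(expected, rows, probs)
    return float(np.sum((observed - expected) ** 2 / expected))

def gaussian_linear_cdf(a, b):
    """Hypothetical conditional model Y | X = x ~ N(a + b x, 1), returned as a CDF."""
    erf = np.vectorize(math.erf)
    def cdf(t, x):
        return 0.5 * (1.0 + erf((t - (a + b * np.asarray(x))) / math.sqrt(2.0)))
    return cdf
```

    Comparing the statistic under the data-generating model and under a misspecified one (for example gaussian_linear_cdf(0.0, 0.0) on data generated with slope 2) shows the expected ordering; turning the statistic into a test requires the limiting distribution derived in the paper.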
  3. By: William C. Horrace (Center for Policy Research, Maxwell School, Syracuse University, 426 Eggers Hall, Syracuse, NY 13244); Hyunseok Jung (Department of Economics, University of Arkansas, Fayetteville, AR 72701); Yi Yang (Amazon)
    Abstract: We survey formulations of the conditional mode estimator for technical inefficiency in parametric stochastic frontier models with normal errors and introduce new formulations for models with Laplace errors. We prove the conditional mode estimator converges pointwise to the true inefficiency value as the noise variance goes to zero. We also prove that the conditional mode estimator in the normal-exponential model achieves near-minimax optimality. Our minimax theorem implies that the worst-case risk occurs when many firms are nearly efficient, and the conditional mode estimator minimizes estimation risk in this case by classifying these nearly efficient firms as fully efficient. Unlike the conditional expectation estimator, the conditional mode estimator produces multiple firms with inefficiency estimates exactly equal to zero, suggesting a rule for selecting a subset of maximally efficient firms. Our simulation results show that this “zero-mode subset” has reasonably high probability of containing the most efficient firm, particularly when inefficiency is exponentially distributed. The rule is easy to apply and interpret for practitioners. We include an empirical example demonstrating the merits of the conditional mode estimator.
    Keywords: Stochastic Frontier Model, Efficiency Estimation, Laplace Distribution, Minimax Optimality, Ranking and Selection
    JEL: C14 C23 D24
    Date: 2022–08
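    For the normal-exponential case the conditional mode has a closed form. Assuming a production frontier with composed error eps = v - u, v ~ N(0, sigma_v^2) and u exponential with mean sigma_u, completing the square in exp(-(eps + u)^2 / (2 sigma_v^2) - u / sigma_u) shows that u given eps is normal with location mu* = -eps - sigma_v^2 / sigma_u and scale sigma_v, truncated at zero, so its mode is max(mu*, 0). The sketch below (hypothetical helper names; the Laplace-error formulations introduced in the paper are not covered) makes the zero-mode subset explicit.

```python
import numpy as np

def conditional_mode_inefficiency(eps, sigma_v, sigma_u):
    """Conditional mode of u given eps = v - u in the normal-exponential
    production frontier model, v ~ N(0, sigma_v^2), u ~ Exponential(mean sigma_u).
    Given eps, u is normal(mu_star, sigma_v^2) truncated to [0, inf)."""
    mu_star = -np.asarray(eps, dtype=float) - sigma_v**2 / sigma_u
    # The mode of a normal truncated at zero is its location, clipped at zero.
    return np.maximum(mu_star, 0.0)

def zero_mode_subset(eps, sigma_v, sigma_u):
    """Indices of firms whose conditional-mode inefficiency estimate is exactly zero."""
    return np.flatnonzero(conditional_mode_inefficiency(eps, sigma_v, sigma_u) == 0.0)
```

    With sigma_v = 0.5 and sigma_u = 1, residuals [1.0, -2.0, 0.0, -0.2] give modes [0, 1.75, 0, 0], so three of the four firms fall in the zero-mode subset.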
  4. By: Matias D. Cattaneo; Ricardo P. Masini; William G. Underwood
    Abstract: Yurinskii's coupling is a popular tool for finite-sample distributional approximation in mathematical statistics and applied probability, offering a Gaussian strong approximation for sums of random vectors under easily verified conditions with an explicit rate of approximation. Originally stated for sums of independent random vectors in $\ell^2$-norm, it has recently been extended to the $\ell^p$-norm, where $1 \leq p \leq \infty$, and to vector-valued martingales in $\ell^2$-norm under some rather strong conditions. We provide as our main result a generalization of all of the previous forms of Yurinskii's coupling, giving a Gaussian strong approximation for martingales in $\ell^p$-norm under relatively weak conditions. We apply this result to some areas of statistical theory, including high-dimensional martingale central limit theorems and uniform strong approximations for martingale empirical processes. Finally, we give a few illustrative examples in statistical methodology, applying our results to partitioning-based series estimators for nonparametric regression, distributional approximation of $\ell^p$-norms of high-dimensional martingales, and local polynomial regression estimators. We address issues of feasibility, demonstrating implementable statistical inference procedures in each section.
    Date: 2022–10
  5. By: Stefan Faridani; Paul Niehaus
    Abstract: We study the problem of estimating the average causal effect of treating every member of a population, as opposed to none, using an experiment that treats only some. This is the policy-relevant estimand when (for example) deciding whether to scale up an intervention based on the results of an RCT. But it differs from the usual average treatment effect in the presence of spillovers. We study both estimation and experimental design given a bound (parametrized by $\eta$) on the rate of decay of spillovers between units that are "distant" in a general sense, encompassing (for example) spatial settings, panel data, and some directed networks. We show that, over all estimators linear in the outcomes and all cluster-randomized designs, the optimal rate of convergence to the average global effect is $n^{-\frac{1}{2+\frac{1}{\eta}}}$, and provide a generalized "Scaling Clusters" design under which this rate can be achieved. Both of these results are unchanged under the additional assumption (common in applied work) that potential outcomes are linear in population treatment assignments and the estimator is OLS. We also provide methods to improve finite-sample performance, including a shrinkage estimator that takes advantage of additional information about the structure of the spillovers when linearity holds, and an optimized weighting approach when it does not.
    Date: 2022–09
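    Rewriting the exponent of the optimal rate makes its dependence on the spillover-decay bound eta easier to read; the displays below are simple arithmetic on the formula in the abstract.

```latex
n^{-\frac{1}{2+\frac{1}{\eta}}} = n^{-\frac{\eta}{2\eta+1}},
\qquad \eta = 1:\; n^{-1/3},
\qquad \frac{\eta}{2\eta+1} \uparrow \tfrac{1}{2} \text{ as } \eta \to \infty .
```

    As eta grows, the rate therefore approaches, without reaching, the usual parametric rate n^{-1/2}.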
  6. By: Matthew Read (Reserve Bank of Australia)
    Abstract: Structural vector autoregressions that are set identified (e.g. with sign restrictions) are typically used to analyse the effects of standard deviation shocks. However, answering questions of economic interest often requires knowing the effects of a 'unit' shock. For example, central bankers want to answer questions like 'what are the effects of a 100 basis point increase in the policy rate?' The problem is that set-identifying restrictions do not always rule out the possibility that a variable does not react contemporaneously to its own shock. As a consequence, identified sets for the impulse responses to unit shocks may be unbounded, which implies that set-identifying restrictions may be extremely uninformative. Simply assuming that responses are non-zero turns out to be an arbitrary and unsatisfactory solution. I argue that it is therefore important to communicate about the extent to which the identified set may be unbounded, since this tells us about the informativeness of the identifying restrictions, and I develop tools to facilitate this. I explain how to draw useful posterior inferences about impulse responses when identified sets are unbounded with positive probability. I illustrate the empirical relevance of these issues by estimating the response of US output to a 100 basis point federal funds rate shock under different sets of identifying restrictions. Some restrictions are very uninformative about the effects of a 100 basis point shock. The output responses I obtain under a rich set of identifying restrictions lie towards the smaller end of the range of existing estimates.
    Keywords: Bayesian inference; impulse responses; monetary policy; set-identified models; sign restrictions; zero restrictions
    JEL: C32 E52
    Date: 2022–10
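    A one-line calculation shows why the identified set can be unbounded. In illustrative notation (not the paper's), let theta_{i,h} be the horizon-h response of variable i to a one-standard-deviation shock and theta_{1,0} the impact response of the policy rate to its own shock; the response to a unit shock rescales by theta_{1,0}. The numbers in the second part are hypothetical.

```latex
\tilde{\theta}_{i,h} = \frac{\theta_{i,h}}{\theta_{1,0}}, \qquad
\theta_{1,0} \in (0,\, 0.5],\ \theta_{i,h} = -0.2
\;\Longrightarrow\; \tilde{\theta}_{i,h} \in (-\infty,\, -0.4].
```

    Whenever the restrictions allow the own impact response to approach zero while the numerator stays away from zero, the ratio, and hence the identified set for the unit-shock response, is unbounded.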
  7. By: Marina Khismatullina; Michael Vogt
    Abstract: We develop new econometric methods for the comparison of nonparametric time trends. In many applications, practitioners are interested in whether the observed time series all have the same time trend. Moreover, they would often like to know which trends are different and in which time intervals they differ. We design a multiscale test to formally approach these questions. Specifically, we develop a test that allows rigorous confidence statements about which time trends are different and where (that is, in which time intervals) they differ. Based on our multiscale test, we further develop a clustering algorithm that groups the observed time series into clusters with the same trend. We derive asymptotic theory for our test and clustering methods. The theory is complemented by a simulation study and two applications to GDP growth data and house price data.
    Date: 2022–09
  8. By: Yingyao Hu; Yang Liu; Jiaxiong Yao
    Abstract: Latent variable models are crucial in scientific research, where a key variable, such as effort, ability, or belief, is unobserved in the sample but needs to be identified. This paper proposes a novel method for estimating realizations of a latent variable $X^*$ in a random sample that contains its multiple measurements. With the key assumption that the measurements are independent conditional on $X^*$, we provide sufficient conditions under which realizations of $X^*$ in the sample are locally unique in a class of deviations, which allows us to identify realizations of $X^*$. To the best of our knowledge, this paper is the first to provide such identification at the level of individual observations. We then use the Kullback-Leibler distance between the two probability densities with and without the conditional independence as the loss function to train a Generative Element Extraction Network (GEEN) that maps from the observed measurements to realizations of $X^*$ in the sample. Simulation results suggest that the proposed estimator works well and that the estimated values are highly correlated with realizations of $X^*$. Our estimator can be applied to a large class of latent variable models and we expect it will change how people deal with latent variables.
    Date: 2022–10
  9. By: Niko Hauzenberger; Florian Huber; Gary Koop; James Mitchell
    Abstract: In light of widespread evidence of parameter instability in macroeconomic models, many time-varying parameter (TVP) models have been proposed. This paper proposes a nonparametric TVP-VAR model using Bayesian Additive Regression Trees (BART). The novelty of the model is that the law of motion driving the parameters is treated nonparametrically. This leads to great flexibility in the nature and extent of parameter change, both in the conditional mean and in the conditional variance. In contrast to other nonparametric and machine learning methods that are black boxes, inference using our model is straightforward because, in treating the parameters rather than the variables nonparametrically, the model remains conditionally linear in the mean. Parsimony is achieved through adopting nonparametric factor structures and use of shrinkage priors. In an application to US macroeconomic data, we illustrate the use of our model in tracking both the evolving nature of the Phillips curve and how the effects of business cycle shocks on inflationary measures vary nonlinearly with movements in uncertainty.
    Date: 2022–09
  10. By: Hugo Freeman
    Abstract: This paper studies a linear and additively separable model for multidimensional panel data of three or more dimensions with unobserved interactive fixed effects. Two approaches are considered to account for these unobserved interactive fixed-effects when estimating coefficients on the observed covariates. First, the model is embedded within the standard two-dimensional panel framework and restrictions are derived under which the factor structure methods in Bai (2009) lead to consistent estimation of model parameters. The second approach considers group fixed-effects and kernel methods that are more robust to the multidimensional nature of the problem. Theoretical results and simulations show the benefit of standard two-dimensional panel methods when the structure of the interactive fixed-effect term is known, but also highlight how the group fixed-effects and kernel methods perform well without knowledge of this structure. The methods are implemented to estimate the demand elasticity for beer under a handful of models for demand.
    Date: 2022–09
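    The first approach relies on the factor-structure estimator of Bai (2009) for two-dimensional panels. A minimal sketch of a common implementation of that least-squares problem, assuming a balanced N x T panel and a known number of factors r, alternates between pooled OLS for beta and a rank-r truncated SVD of the residuals. The helper name is hypothetical, and the paper's embedding of higher-dimensional panels and its group fixed-effects and kernel methods are not shown.

```python
import numpy as np

def bai_interactive_fe(Y, X, r, n_iter=500, tol=1e-10):
    """Iterated least squares for Y_it = sum_k beta_k X_kit + lambda_i'f_t + e_it
    (two-dimensional panel, r factors).
    Y: (N, T) array; X: (K, N, T) array of covariates."""
    K = X.shape[0]
    Xmat = X.reshape(K, -1).T                                # (N*T, K), rows match Y.ravel()
    beta = np.linalg.lstsq(Xmat, Y.ravel(), rcond=None)[0]   # pooled OLS start
    LF = np.zeros_like(Y)
    for _ in range(n_iter):
        W = Y - np.tensordot(beta, X, axes=1)                # remove covariate effects
        U, s, Vt = np.linalg.svd(W, full_matrices=False)
        LF = (U[:, :r] * s[:r]) @ Vt[:r]                     # best rank-r approximation
        beta_new = np.linalg.lstsq(Xmat, (Y - LF).ravel(), rcond=None)[0]
        converged = np.max(np.abs(beta_new - beta)) < tol
        beta = beta_new
        if converged:
            break
    return beta, LF
```

    When the covariates are correlated with the interactive effects, the pooled OLS starting value is biased; the alternation removes the estimated factor component before each OLS step.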
  11. By: Siliang Zeng; Mingyi Hong; Alfredo Garcia
    Abstract: We consider the task of estimating a structural model of dynamic decisions by a human agent based upon the observable history of implemented actions and visited states. This problem has an inherent nested structure: in the inner problem, an optimal policy for a given reward function is identified while in the outer problem, a measure of fit is maximized. Several approaches have been proposed to alleviate the computational burden of this nested-loop structure, but these methods still suffer from high complexity when the state space is either discrete with large cardinality or continuous in high dimensions. Other approaches in the inverse reinforcement learning (IRL) literature emphasize policy estimation at the expense of reduced reward estimation accuracy. In this paper we propose a single-loop estimation algorithm with finite time guarantees that is equipped to deal with high-dimensional state spaces without compromising reward estimation accuracy. In the proposed algorithm, each policy improvement step is followed by a stochastic gradient step for likelihood maximization. We show that the proposed algorithm converges to a stationary solution with a finite-time guarantee. Further, if the reward is parameterized linearly, we show that the algorithm approximates the maximum likelihood estimator sublinearly. Finally, by using robotics control problems in MuJoCo and their transfer settings, we show that the proposed algorithm achieves superior performance compared with other IRL and imitation learning benchmarks.
    Date: 2022–10
  12. By: Canavire Bacarreza,Gustavo Javier; Rios Avila,Fernando; Sacco Capurro,Flavia Giannina
    Abstract: This paper proposes a method to analyze interval-censored data, using multiple imputation based on a heteroskedastic interval regression approach. The proposed model aims to obtain a synthetic data set that can be used for standard analysis, including standard linear regression, quantile regression, or poverty and inequality estimation. The paper presents two applications to show the performance of the method. First, it runs a Monte Carlo simulation to show the method's performance under the assumption of multiplicative heteroskedasticity, with and without conditional normality. Second, it uses the proposed methodology to analyze labor income data in Grenada for 2013–20, where the salary data are interval-censored according to the salary intervals prespecified in the survey questionnaire. The results obtained are consistent across both exercises.
    Date: 2022–08–22
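    The imputation step can be sketched as follows, assuming the heteroskedastic interval regression has already been fitted, with conditional mean X beta and multiplicative heteroskedasticity sigma = exp(Z gamma). Each value is drawn from the fitted normal truncated to its reported interval by inverting the CDF. The helper name is hypothetical, and a full multiple-imputation scheme would also redraw beta and gamma from their estimated sampling distribution for each synthetic dataset, which this sketch omits.

```python
import numpy as np
from statistics import NormalDist

def impute_interval_censored(X, Z, lo, hi, beta, gamma, rng):
    """Draw one synthetic value per observation from N(X beta, exp(Z gamma)^2)
    truncated to its reported interval [lo, hi]; use -np.inf / np.inf for
    open-ended brackets. beta and gamma are taken as already estimated."""
    std = NormalDist()
    mu = X @ beta
    sigma = np.exp(Z @ gamma)                  # multiplicative heteroskedasticity
    out = np.empty(len(mu))
    for i in range(len(mu)):
        a = std.cdf((lo[i] - mu[i]) / sigma[i])
        b = std.cdf((hi[i] - mu[i]) / sigma[i])
        u = rng.uniform(a, b)                  # uniform draw on the interval's CDF range
        u = min(max(u, 1e-15), 1.0 - 1e-15)    # keep inv_cdf's argument inside (0, 1)
        out[i] = mu[i] + sigma[i] * std.inv_cdf(u)
    return out
```

    Repeating the call with fresh random draws yields the several synthetic datasets that multiple imputation combines.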
  13. By: Marzio Di Vece; Diego Garlaschelli; Tiziano Squartini
    Abstract: In the study of economic networks, econometric approaches interpret the traditional Gravity Model specification as the expected link weight coming from a probability distribution whose functional form can be chosen arbitrarily, while statistical-physics approaches construct maximum-entropy distributions of weighted graphs, constrained to satisfy a given set of measurable network properties. In a recent companion paper, we integrated the two approaches and applied them to the World Trade Web, i.e. the network of international trade among world countries. While the companion paper dealt only with discrete-valued link weights, the present paper extends the theoretical framework to continuous-valued link weights. In particular, we construct two broad classes of maximum-entropy models, namely the integrated and the conditional ones, defined by different criteria to derive and combine the probabilistic rules for placing links and loading them with weights. In the integrated models, both rules follow from a single, constrained optimization of the continuous Kullback-Leibler divergence; in the conditional models, the two rules are disentangled and the functional form of the weight distribution follows from a conditional optimization procedure. After deriving the general functional form of the two classes, we turn each of them into a proper family of econometric models via a suitable identification of the econometric function relating the corresponding expected link weights to macroeconomic factors. After testing the two classes of models on World Trade Web data, we discuss their strengths and weaknesses.
    Date: 2022–10
  14. By: Cengiz, Doruk; Tekgüç, Hasan
    Abstract: We extend the scope of the forecast reconciliation literature and use its tools in the context of causal inference. Researchers are interested in both the average treatment effect on the treated and treatment effect heterogeneity. We show that ex post correction of the counterfactual estimates using the aggregation constraints that stem from the hierarchical or grouped structure of the data is likely to yield more accurate estimates. Building on the geometric interpretation of forecast reconciliation, we provide additional insights into the exact factors determining the size of the accuracy improvement due to the reconciliation. We experiment with U.S. GDP and employment data. We find that the reconciled treatment effect estimates tend to be closer to the truth than the original (base) counterfactual estimates even in cases where the aggregation constraints are non-linear. Consistent with our theoretical expectations, improvement is greater when machine learning methods are used.
    Keywords: Forecast Reconciliation; Non-linear Constraints; Causal Machine Learning Methods; Counterfactual Estimation; Difference-in-Differences
    JEL: C53
    Date: 2022–06
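    The geometric view can be illustrated with the simplest case, assuming linear aggregation constraints and identity weights (OLS reconciliation): base estimates are projected orthogonally onto the subspace of vectors that satisfy the constraints, written as the column space of a summing matrix S. The helper name is hypothetical and the non-linear constraints studied in the paper are not covered.

```python
import numpy as np

def ols_reconcile(y_hat, S):
    """Orthogonally project base estimates y_hat onto the coherent subspace
    spanned by the columns of the summing matrix S (OLS reconciliation)."""
    P = S @ np.linalg.solve(S.T @ S, S.T)   # projection matrix S (S'S)^{-1} S'
    return P @ y_hat
```

    With a two-series hierarchy total = A + B, S = [[1, 1], [1, 0], [0, 1]] maps bottom-level values to all three series; reconciling the incoherent counterfactual estimates [10, 4, 5] gives [29/3, 13/3, 16/3], approximately [9.667, 4.333, 5.333]. Because the projection is orthogonal, the reconciled vector is never farther, in Euclidean distance, from any coherent true vector than the base estimates are.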
  15. By: Spark C. Tseung; Ian Weng Chan; Tsz Chai Fung; Andrei L. Badescu; X. Sheldon Lin
    Abstract: A well-designed framework for risk classification and ratemaking in automobile insurance is key to insurers' profitability and risk management, while also ensuring that policyholders are charged a fair premium according to their risk profile. In this paper, we propose to adapt a flexible regression model, called the Mixed LRMoE, to the problem of a posteriori risk classification and ratemaking, where policyholder-level random effects are incorporated to better infer their risk profile reflected by the claim history. We also develop a stochastic variational Expectation-Conditional-Maximization algorithm for estimating model parameters and inferring the posterior distribution of random effects, which is numerically efficient and scalable to large insurance portfolios. We then apply the Mixed LRMoE model to a real, multiyear automobile insurance dataset, where the proposed framework is shown to offer better fit to data and to produce posterior premiums that accurately reflect policyholders' claim histories.
    Date: 2022–09
  16. By: Kozubowski, Tomasz J. (University of Nevada); Mazur, Stepan (Örebro University School of Business); Podgórski, Krzysztof (Lund University)
    Abstract: There is a considerable literature on matrix-variate gamma distributions, also known as Wishart distributions, which are driven by a shape parameter with values in the Gindikin set. We provide an extension of this class to the case where the shape parameter may take on any positive value. In addition to the well-known singular Wishart as well as non-singular matrix-variate gamma distributions, the proposed class includes new singular matrix-variate distributions, with the shape parameter outside of the Gindikin set. This singular, non-Wishart case is no longer permutation invariant and derivation of its scaling properties requires special care. Among numerous newly established properties of the extended class are group-like relations with respect to the positive shape parameter. The latter provide a natural substitute for the classical convolution properties that are crucial in the study of infinite divisibility. Our results provide further clarification regarding the lack of infinite divisibility of Wishart distributions, a classical observation of Paul Lévy. In particular, we clarify why the row/column vectors in the off-diagonal blocks are infinitely divisible. A class of matrix-variate Laplace distributions arises naturally in this set-up as the distributions of the off-diagonal blocks of random gamma matrices. For the class of Laplace rectangular matrices, we obtain distributional identities that follow from the role they play in the structure of the matrix gamma distributions. We present several elegant and convenient stochastic representations of the discussed classes of matrix-valued distributions. In particular, we show that the matrix-variate gamma distribution is a symmetrization of the triangular Rayleigh distributed matrix, a new class of matrix variables that naturally extends the classical univariate Rayleigh variables.
    Finally, a connection of the matrix-variate gamma distributions to matrix-valued Lévy processes of a vector argument is made. Namely, a Lévy process, termed a matrix gamma-Laplace motion, is obtained by the subordination of the triangular Brownian motion of a vector argument to a vector-valued gamma motion of a vector argument. In this context, we introduce a triangular matrix-valued Rayleigh process, which, through symmetrization, leads to a new matrix-variate gamma process. This process, when taken at a properly defined one-dimensional argument, has the matrix gamma marginal distribution with the shape parameter equal to its argument.
    Keywords: Random matrices; singular random matrices; distribution theory; matrix-variate gamma distribution; Wishart distribution; matrix-variate Laplace distribution; infinitely divisible and stable distributions; matrix-valued Levy processes; triangular matrix-valued Rayleigh process; matrix-variate gamma process; characterization and structure for multivariate probability distributions
    JEL: C10 C30 C46
    Date: 2022–10–14
  17. By: Caio Almeida (Princeton University); Gustavo Freire (Erasmus School of Economics); Rafael Azevedo (Getulio Vargas Foundation (FGV)); Kym Ardison (Getulio Vargas Foundation (FGV))
    Abstract: We propose a family of nonparametric estimators for an option price that require only the use of underlying return data, but can also easily incorporate information from observed option prices. Each estimator comes from a risk-neutral measure minimizing generalized entropy according to a different Cressie-Read discrepancy. We apply our method to price S&P 500 options and the cross-section of individual equity options, using distinct amounts of option data in the estimation. Estimators incorporating mild nonlinearities produce optimal pricing accuracy within the Cressie-Read family and outperform several benchmarks, such as the Black-Scholes model and various GARCH option pricing models. Overall, we provide a powerful option pricing technique suitable for scenarios of limited option data availability.
    Keywords: Risk-Neutral Measure, Option Pricing, Nonparametric Estimation, Generalized Entropy, Cressie-Read Discrepancies
    JEL: C14 C58 G13
    Date: 2022–05
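    To make the entropic idea concrete, the sketch below uses the Kullback-Leibler divergence, a limiting case of the Cressie-Read family, and only underlying return data: it finds the risk-neutral probabilities on a sample of gross returns over the option's horizon that are closest to equal weights subject to the martingale condition, then prices a European call. The helper names, the use of gross returns and the bisection bounds are assumptions of this sketch; the other Cressie-Read members and the incorporation of observed option prices are not shown.

```python
import numpy as np

def entropic_rn_weights(R, Rf, lo=-200.0, hi=200.0, tol=1e-12):
    """Risk-neutral probabilities on historical gross returns R that minimise
    the Kullback-Leibler divergence from equal weights, subject to the
    martingale condition sum_i w_i R_i = Rf. The solution is an exponential
    tilt, w_i proportional to exp(theta (R_i - Rf)). Requires min(R) < Rf < max(R)."""
    ex = np.asarray(R, dtype=float) - Rf
    def tilt(theta):
        z = theta * ex
        w = np.exp(z - z.max())          # subtract the max for numerical stability
        return w / w.sum()
    # g(theta) = sum_i w_i (R_i - Rf) has derivative Var_w(R - Rf) >= 0,
    # so it is increasing in theta; bisect for its root.
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if tilt(mid) @ ex > 0:
            hi = mid
        else:
            lo = mid
        if hi - lo < tol:
            break
    return tilt(0.5 * (lo + hi))

def call_price(S0, K, R, Rf):
    """Hypothetical helper: discounted expected call payoff under the entropic weights."""
    w = entropic_rn_weights(R, Rf)
    payoff = np.maximum(S0 * np.asarray(R, dtype=float) - K, 0.0)
    return float(w @ payoff) / Rf
```

    Because the weights satisfy the martingale condition, the resulting call price respects the no-arbitrage bounds max(S0 - K/Rf, 0) <= price <= S0.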
  18. By: Toshiki Tsuda
    Abstract: Heckman et al. (2008) examine the identification of the marginal treatment effect (MTE) with multivalued treatments by extending the local instrumental variable (LIV) approach of Heckman and Vytlacil (1999). Lee and Salanié (2018) study the identification of conditional expectations given unobserved heterogeneity; in Section 5.2 of their paper, they analyze the identification of MTE under the same selection mechanism as in Heckman et al. (2008). We note that the model constructed in Section 5.2 of Lee and Salanié (2018) is incomplete, and we establish sufficient conditions for the identification of MTE with an improved model. While we reduce the unordered multiple-choice model to the binary treatment setting as in Heckman et al. (2008), we can identify the MTE defined as a natural extension of the binary-treatment MTE defined in Heckman and Vytlacil (2005). Further, our results can help identify other parameters such as the marginal distribution of potential outcomes.
    Date: 2022–09
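    The binary-treatment local instrumental variable idea that the paper builds on can be sketched in a few lines: with the propensity score P and a polynomial approximation to E[Y | P = p] (the polynomial degree is an assumption, and the helper name is hypothetical), the MTE at u is the derivative of that conditional mean at p = u. This covers only the binary case of Heckman and Vytlacil (1999), not the multivalued-treatment identification developed in the paper.

```python
import numpy as np

def liv_mte(y, p, degree, grid):
    """Binary-treatment local IV: fit E[Y | P = p] by a polynomial in the
    propensity score and return its derivative, MTE(u) = dE[Y | P = p]/dp at p = u."""
    coefs = np.polyfit(p, y, degree)
    return np.polyval(np.polyder(coefs), grid)
```

    In a simulated selection model D = 1{U <= P} with U uniform and treatment gain 1 - U, E[Y | P = p] = E[Y0] + p - p^2/2, so a quadratic fit should return an MTE close to 1 - u.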
  19. By: Soren Blomquist; Anil Kumar; Che-Yuan Liang; Whitney K. Newey
    Abstract: This paper is about the nonparametric regression of a choice variable on a nonlinear budget set when there is general heterogeneity, i.e., in the random utility model (RUM). We show that utility maximization makes this a three-dimensional regression with piecewise linear, convex budget sets with a more parsimonious specification than previously derived. We show that the regression allows for measurement and/or optimization errors in the outcome variable. We characterize all of the restrictions of utility maximization on the budget set regression and show how to check these restrictions. We formulate nonlinear budget set effects that can be identified by this regression and give automatic debiased machine learners of these effects. We find that in practice nonconvexities in the budget set have little effect on these estimates. We use control variables to allow for endogeneity of budget sets and adjust for productivity growth in taxable income. We apply the results to estimate an elasticity of 0.52 with respect to an overall tax rate change in Sweden. We also find that the restrictions of utility maximization are satisfied at the choices made by nearly all individuals in the data.
    Keywords: nonlinear budget sets; nonparametric estimation; heterogeneous preferences; taxable income; revealed stochastic preference
    JEL: C14 C24 H31 J22
    Date: 2022–09–28
  20. By: Yuehao Bai; Meng Hsuan Hsieh; Jizhou Liu; Max Tabord-Meehan
    Abstract: In this paper we revisit some common recommendations regarding the analysis of matched-pair and stratified experimental designs in the presence of attrition. Our main objective is to clarify a number of well-known claims about the practice of dropping pairs with an attrited unit when analyzing matched-pair designs. The literature offers contradictory advice about whether dropping pairs is beneficial or harmful, and stratifying into larger groups has been recommended as a resolution to the issue. To address these claims, we derive the estimands obtained from the difference-in-means estimator in a matched-pair design both when the observations from pairs with an attrited unit are retained and when they are dropped. We find limited evidence to support the claims that dropping pairs is beneficial, other than that it may help recover a convex weighted average of conditional average treatment effects. We then repeat the same exercise for stratified designs by studying the estimands obtained from a regression of outcomes on treatment with and without strata fixed effects. We do not find compelling evidence to support the claims that stratified designs should be preferred to matched-pair designs in the presence of attrition.
    Date: 2022–09
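    The two difference-in-means estimators compared above can be written down directly. In this sketch (hypothetical helper; outcomes stored one pair per entry, with np.nan marking an attrited unit), "retain" keeps every observed unit and "drop" discards pairs with an attrited unit; it computes point estimates only and says nothing about the estimands or variances derived in the paper.

```python
import numpy as np

def matched_pair_estimates(y_treated, y_control):
    """Difference-in-means for a matched-pair design with attrition.
    Each array holds one outcome per pair; np.nan marks an attrited unit.
    Returns (retain, drop)."""
    yt = np.asarray(y_treated, dtype=float)
    yc = np.asarray(y_control, dtype=float)
    # Retain: compare all observed treated units with all observed controls.
    retain = np.nanmean(yt) - np.nanmean(yc)
    # Drop: keep only pairs in which both units are observed.
    complete = ~np.isnan(yt) & ~np.isnan(yc)
    drop = yt[complete].mean() - yc[complete].mean()
    return retain, drop
```

    For treated outcomes [3, 5, nan, 6] and control outcomes [1, nan, 2, 4], retaining units gives 14/3 - 7/3 = 7/3, while dropping the second and third pairs gives 4.5 - 2.5 = 2.0.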
  21. By: Matthieu Stigler; Apratim Dey; Andrew Hobbs; David Lobell
    Abstract: New satellite sensors will soon make it possible to estimate field-level crop yields, showing great potential for agricultural index insurance. This paper identifies an important threat to better insurance from these new technologies: data with many fields and few years can yield downward-biased estimates of basis risk, a fundamental metric in index insurance. To demonstrate this bias, we use state-of-the-art satellite-based data on agricultural yields in the US and in Kenya to estimate and simulate basis risk. We find a substantial downward bias leading to a systematic overestimation of insurance quality. In this paper, we argue that big data in crop insurance can lead to a new situation where the number of variables $N$ greatly exceeds the number of observations $T$. In such a situation where $T\ll N$, conventional asymptotics break down, as evidenced by the large bias we find in simulations. We show how high-dimension, low-sample-size (HDLSS) asymptotics, together with the spiked covariance model, provide a more relevant framework for the $T\ll N$ case encountered in index insurance. More precisely, we derive the asymptotic distribution of the relative share of the first eigenvalue of the covariance matrix, a measure of systematic risk in index insurance. Our formula accurately approximates the empirical bias simulated from the satellite data, and provides a useful tool for practitioners to quantify bias in insurance quality.
    Date: 2022–09
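The statistic at the centre of item 21, the relative share of the first eigenvalue of the covariance matrix, can be computed as in the Python sketch below for a T-by-N panel of yields (T years in rows, N fields in columns). The data layout and function name are assumptions, and the sketch does not implement the authors' HDLSS bias formula. It only uses the standard fact that the nonzero eigenvalues of the N-by-N sample covariance coincide with those of a T-by-T matrix, which is cheap when T is much smaller than N.

```python
import numpy as np


def first_eigenvalue_share(yields):
    """Largest eigenvalue of the sample covariance divided by its trace.

    yields: array of shape (T, N), T years (rows) by N fields (columns).
    """
    x = np.asarray(yields, dtype=float)
    x = x - x.mean(axis=0)  # demean each field over time
    t = x.shape[0]
    # X'X/(T-1) (N x N) and XX'/(T-1) (T x T) share their nonzero eigenvalues,
    # so the small T x T matrix suffices when T << N.
    gram = x @ x.T / (t - 1)
    eig = np.linalg.eigvalsh(gram)  # eigenvalues in ascending order
    return eig[-1] / eig.sum()


if __name__ == "__main__":
    # Hypothetical one-factor (spiked) design: every field loads 1 on a common
    # shock and has unit-variance idiosyncratic noise.
    rng = np.random.default_rng(0)
    T, N = 10, 1000
    factor = rng.standard_normal((T, 1))
    noise = rng.standard_normal((T, N))
    yields = factor @ np.ones((1, N)) + noise
    print("sample share:    ", first_eigenvalue_share(yields))
    # Population covariance is 11' + I: eigenvalues N + 1 and 1 (N - 1 times).
    print("population share:", (N + 1) / (2 * N))
```

Repeating the simulation over many draws and comparing the average sample share with the population value is one simple way to look at the finite-T distortion the abstract describes.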
  22. By: Scutari, Marco; Panero, Francesca; Proissl, Manuel
    Abstract: In this paper, we present a general framework for estimating regression models subject to a user-defined level of fairness. We enforce fairness as a model selection step in which we choose the value of a ridge penalty to control the effect of sensitive attributes. We then estimate the parameters of the model conditional on the chosen penalty value. Our proposal is mathematically simple, with a solution that is partly in closed form and produces estimates of the regression coefficients that are intuitive to interpret as a function of the level of fairness. Furthermore, it is easily extended to generalised linear models, kernelised regression models and other penalties, and it can accommodate multiple definitions of fairness. We compare our approach with the regression model from Komiyama et al. (in: Proceedings of machine learning research. 35th international conference on machine learning (ICML), vol 80, pp 2737–2746, 2018), which implements a provably optimal linear regression model and with the fair models from Zafar et al. (J Mach Learn Res 20:1–42, 2019). We evaluate these approaches empirically on six different data sets, and we find that our proposal provides better goodness of fit and better predictive accuracy for the same level of fairness. In addition, we highlight a source of bias in the original experimental evaluation in Komiyama et al. (in: Proceedings of machine learning research. 35th international conference on machine learning (ICML), vol 80, pp 2737–2746, 2018).
    Keywords: fairness; generalised linear models; linear regression; logistic regression; ridge regression; EPSRC and MRC Centre for Doctoral Training in Statistical Science; University of Oxford (Grant EP/L016710/1)
    JEL: C1
    Date: 2022–09–18
  23. By: Andrew Y. Chen; Tom Zimmermann
    Abstract: Researchers are more likely to share notable findings. As a result, published findings tend to overstate the magnitude of real-world phenomena. This bias is a natural concern for asset pricing research, which has found hundreds of return predictors and little consensus on their origins. Empirical evidence on publication bias comes from large-scale meta-studies. Meta-studies of cross-sectional return predictability have settled on four stylized facts that demonstrate publication bias is not a dominant factor: (1) almost all findings can be replicated, (2) predictability persists out-of-sample, (3) empirical $t$-statistics are much larger than 2.0, and (4) predictors are weakly correlated. Each of these facts has been demonstrated in at least three meta-studies. Empirical Bayes statistics turn these facts into publication bias corrections. Estimates from three meta-studies find that the average correction (shrinkage) accounts for only 10 to 15 percent of in-sample mean returns and that the risk of inference going in the wrong direction (the false discovery rate) is less than 6%. Meta-studies also find that $t$-statistic hurdles exceed 3.0 in multiple testing algorithms and that returns are 30 to 50 percent weaker in alternative portfolio tests. These facts are easily misinterpreted as evidence of publication bias effects. We clarify these misinterpretations and others, including the conflating of "mostly false findings" with "many insignificant findings," "data snooping" with "liquidity effects," and "failed replications" with "insignificant ad-hoc trading strategies." Meta-studies outside of the cross-sectional literature are rare. The four facts from cross-sectional meta-studies provide a framework for future research. We illustrate with a preliminary re-examination of equity premium predictability.
    Date: 2022–09
  24. By: Håvard Hungnes; Terje Skjerpen; Jørn Ivar Hamre; Xiaoming Chen Jansen; Dinh Quang Pham; Ole Sandvik (Statistics Norway)
    Abstract: The labour force surveys (LFSs) in all Eurostat countries underwent a substantial redesign in January 2021. To ensure coherent labour market time series for the main indicators in the Norwegian LFS, we model the impact of the redesign. We use a state-space model that takes explicit account of the rotating pattern of the LFS. We also include auxiliary variables related to employment and unemployment that are highly correlated with the LFS variables we consider. The results of a parallel run are also included in the model. This paper makes two contributions to the literature on the effects of LFS redesign. First, we suggest a symmetric specification of the process of the wave-specific effects. Second, we account for substantial fluctuations in the labour force estimates during the Covid-19 pandemic by applying time-varying hyperparameters. Likelihood-ratio tests and examination of the auxiliary residuals show the latter to be warranted.
    Keywords: State-space models; Auxiliary information; Labour market domains; Level shifts; Covid19; Norway
    JEL: C32 C51 C83 J21
    Date: 2022–08
  25. By: Carsten Chong; Marc Hoffmann; Yanghui Liu; Mathieu Rosenbaum; Grégoire Szymanski
    Abstract: In recent years, there has been substantive empirical evidence that stochastic volatility is rough. In other words, the local behavior of stochastic volatility is much more irregular than semimartingales and resembles that of a fractional Brownian motion with Hurst parameter $H
    Date: 2022–10
  26. By: Luofeng Liao; Yuan Gao; Christian Kroer
    Abstract: Statistical inference under market equilibrium effects has attracted increasing attention recently. In this paper we focus on the specific case of linear Fisher markets. They have been widely used in fair resource allocation of food/blood donations and budget management in large-scale Internet ad auctions. In resource allocation, it is crucial to quantify the variability of the resource received by the agents (such as blood banks and food banks) in addition to fairness and efficiency properties of the systems. For ad auction markets, it is important to establish statistical properties of the platform's revenues in addition to their expected values. To this end, we propose a statistical framework based on the concept of infinite-dimensional Fisher markets. In our framework, we observe a market formed by a finite number of items sampled from an underlying distribution (the "observed market") and aim to infer several important equilibrium quantities of the underlying long-run market. These equilibrium quantities include individual utilities, social welfare, and pacing multipliers. Through the lens of sample average approximation (SAA), we derive a collection of statistical results and show that the observed market provides useful statistical information about the long-run market. In other words, the equilibrium quantities of the observed market converge to the true ones of the long-run market with strong statistical guarantees. These include consistency, finite-sample bounds, asymptotics, and confidence intervals. As an extension, we discuss revenue inference in quasilinear Fisher markets.
    Date: 2022–09
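Item 26 concerns inference on equilibrium quantities such as individual utilities. To make those quantities concrete for a small finite market, the Python sketch below computes an approximate equilibrium of a linear Fisher market with proportional response dynamics, a standard iterative method for this market model that is swapped in here purely for illustration; it is not the authors' inference procedure. It assumes each item has unit supply and every valuation is strictly positive (so no price or utility is ever zero); the function name and iteration count are hypothetical choices.

```python
import numpy as np


def proportional_response(valuations, budgets, iters=2000):
    """Approximate linear Fisher market equilibrium (unit supply per item).

    valuations: array (n_buyers, n_items), all entries strictly positive
    budgets: array (n_buyers,)
    Returns (prices, allocation, utilities).
    """
    v = np.asarray(valuations, dtype=float)
    b = np.asarray(budgets, dtype=float)
    n_items = v.shape[1]
    bids = np.outer(b, np.ones(n_items)) / n_items  # start: split budgets evenly
    for _ in range(iters):
        prices = bids.sum(axis=0)  # price of an item = total bids on it
        alloc = bids / prices  # each buyer's share of each item
        gains = v * alloc  # utility each buyer gets from each item
        utils = gains.sum(axis=1)
        # Re-bid the whole budget in proportion to the utility each item gave.
        bids = b[:, None] * gains / utils[:, None]
    prices = bids.sum(axis=0)
    alloc = bids / prices
    utils = (v * alloc).sum(axis=1)
    return prices, alloc, utils


if __name__ == "__main__":
    p, x, u = proportional_response([[2.0, 1.0], [1.0, 2.0]], [1.0, 1.0])
    print("prices:", p)
    print("allocation:\n", x)
    print("utilities:", u)
```

For these valuations each buyer ends up spending essentially its whole budget on the item it values more, giving prices of 1 and utilities of 2. Because every update re-spends each budget in full, total prices always equal total budgets.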

This nep-ecm issue is ©2022 by Sune Karlsson. It is provided as is without any express or implied warranty. It may be freely redistributed in whole or in part for any purpose. If distributed in part, please include this notice.
General information on the NEP project can be found at For comments please write to the director of NEP, Marco Novarese at <>. Put “NEP” in the subject, otherwise your mail may be rejected.
NEP’s infrastructure is sponsored by the School of Economics and Finance of Massey University in New Zealand.