on Econometrics |
By: | Pellatt, Daniel; Sun, Yixiao |
Abstract: | This paper proposes tests of linear hypotheses when the variables may be continuous-time processes with observations collected at a high sampling frequency over a long span. Utilizing series long run variance (LRV) estimation in place of the traditional kernel LRV estimation, we develop easy-to-implement and more accurate F tests in both stationary and nonstationary environments. The nonstationary environment accommodates endogenous regressors that are general semimartingales. The F tests can be implemented in exactly the same way as in the usual discrete-time setting. The F tests are, therefore, robust to the continuous-time or discrete-time nature of the data. Simulations demonstrate the improved size accuracy and competitive power of the F tests relative to existing continuous-time testing procedures and their improved versions. The F tests are of practical interest as recent work by Chang et al. (2018) demonstrates that traditional inference methods can become invalid and produce spurious results when continuous-time processes are observed on finer grids over a long span. |
Keywords: | Social and Behavioral Sciences, continuous time model, F distribution, high frequency regression, long run variance estimation |
Date: | 2020–10–29 |
URL: | http://d.repec.org/n?u=RePEc:cdl:ucsdec:qt19f0d9wz&r=all |
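The Pellatt and Sun abstract above centers on series long run variance estimation and an F approximation. Below is a minimal Python sketch of that general idea: project a moment process onto a few orthonormal basis functions, average the outer products to obtain the LRV, and rescale the Wald statistic for comparison with an F critical value. The cosine basis, the choice of K, and the (K - p + 1)/K scaling follow one common fixed-smoothing convention and are assumptions of this sketch, not the paper's exact construction.

```python
import numpy as np

def series_lrv(v, K):
    """Series LRV estimate for a demeaned moment process v of shape (T, p)."""
    T, p = v.shape
    t = (np.arange(1, T + 1) - 0.5) / T
    omega = np.zeros((p, p))
    for j in range(1, K + 1):
        phi = np.sqrt(2) * np.cos(np.pi * j * t)   # orthonormal cosine basis function
        lam = v.T @ phi / np.sqrt(T)               # basis-weighted partial sum
        omega += np.outer(lam, lam)
    return omega / K

def f_stat(beta_hat, V, R, r, K):
    """Wald-type statistic for R beta = r, rescaled for comparison with F(p, K - p + 1)."""
    p = R.shape[0]
    diff = R @ beta_hat - r
    wald = diff @ np.linalg.solve(R @ V @ R.T, diff)   # V built from the series LRV
    return (wald / p) * (K - p + 1) / K
```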
By: | Joshua C.C. Chan; Rodney W. Strachan |
Abstract: | State space models play an important role in macroeconometric analysis and the Bayesian approach has been shown to have many advantages. This paper outlines recent developments in state space modelling applied to macroeconomics using Bayesian methods. We outline the directions of recent research, specifically the problems being addressed and the solutions proposed. After presenting a general form for the linear Gaussian model, we discuss the interpretations and virtues of alternative estimation routines and their outputs. This discussion includes the Kalman filter and smoother, and precision-based algorithms. As the advantages of using large models have become better understood, a focus has developed on dimension reduction and computational advances to cope with high-dimensional parameter spaces. We give an overview of a number of recent advances in these directions. Many models suggested by economic theory are either non-linear or non-Gaussian, or both. We discuss work on the particle filtering approach to such models as well as other techniques that use various approximations - to either the time t state and measurement equations or to the full posterior for the states - to obtain draws. |
Keywords: | State space model, filter, smoother, non-linear, non-Gaussian, high-dimension, dimension reduction. |
JEL: | C11 C22 E32 |
Date: | 2020–10 |
URL: | http://d.repec.org/n?u=RePEc:een:camaaa:2020-90&r=all |
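Since the Chan and Strachan survey repeatedly refers to the Kalman filter as the basic recursion for the linear Gaussian state space model, a compact sketch may help fix notation. This is the textbook filter for y_t = Z a_t + e_t, a_{t+1} = T a_t + u_t with e_t ~ N(0, H) and u_t ~ N(0, Q); it is illustrative only, not the authors' code, and the matrix names are assumptions of this sketch.

```python
import numpy as np

def kalman_filter(y, Z, H, T, Q, a0, P0):
    """y: (n, k) observations; returns predicted states and the Gaussian log-likelihood."""
    a, P = a0, P0
    loglik, predicted = 0.0, []
    for t in range(len(y)):
        v = y[t] - Z @ a                       # one-step-ahead prediction error
        F = Z @ P @ Z.T + H
        K = T @ P @ Z.T @ np.linalg.inv(F)     # Kalman gain
        loglik += -0.5 * (len(v) * np.log(2 * np.pi)
                          + np.log(np.linalg.det(F))
                          + v @ np.linalg.solve(F, v))
        a = T @ a + K @ v                      # predicted state a_{t+1|t}
        P = T @ P @ T.T + Q - K @ F @ K.T
        predicted.append(a)
    return np.array(predicted), loglik
```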
By: | Harold D. Chiang; Bing Yang Tan |
Abstract: | This paper studies the asymptotic properties of and improved inference methods for kernel density estimation (KDE) for dyadic data. We first establish novel uniform convergence rates for dyadic KDE under general assumptions. As the existing analytic variance estimator is known to behave unreliably in finite samples, we propose a modified jackknife empirical likelihood procedure for inference. The proposed test statistic is self-normalised and no variance estimator is required. In addition, it is asymptotically pivotal regardless of the presence of dyadic clustering. The results are extended to cover the practically relevant case of incomplete dyadic network data. Simulations show that this jackknife empirical likelihood-based inference procedure delivers precise coverage probabilities even under modest sample sizes and with incomplete dyadic data. Finally, we illustrate the method by studying airport congestion. |
Date: | 2020–10 |
URL: | http://d.repec.org/n?u=RePEc:arx:papers:2010.08838&r=all |
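As a point of reference for the Chiang and Tan abstract, a dyadic kernel density estimator simply averages a kernel over all ordered pairs of dyadic outcomes. The sketch below assumes a complete n-by-n array, a Gaussian kernel, and a user-chosen bandwidth; the paper's jackknife empirical likelihood inference and the incomplete-data extension are not reproduced.

```python
import numpy as np

def dyadic_kde(Y, x, h):
    """Y: (n, n) dyadic outcomes (diagonal ignored); x: evaluation points; h: bandwidth."""
    n = Y.shape[0]
    obs = Y[~np.eye(n, dtype=bool)]                     # drop self-pairs i = j
    u = (np.atleast_1d(x)[:, None] - obs[None, :]) / h
    k = np.exp(-0.5 * u ** 2) / np.sqrt(2 * np.pi)      # Gaussian kernel
    return k.mean(axis=1) / h
```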
By: | Martínez-Iriarte, Julian; Sun, Yixiao |
Abstract: | This paper studies the identification and estimation of unconditional policy effects when the treatment is binary and endogenous. We first characterize the asymptotic bias of the unconditional regression estimator that ignores the endogeneity and elaborate on the channels through which the endogeneity can render the unconditional regression estimator inconsistent. We show that even if the treatment status is exogenous, the unconditional regression estimator can still be inconsistent when there are common covariates affecting both the treatment status and the outcome variable. We introduce a new class of marginal treatment effects (MTE) based on the influence function of the functional underlying the policy target. We show that an unconditional policy effect can be represented as a weighted average of the newly defined MTEs over the individuals at the margin of indifference. Point identification is achieved using the local instrumental variable approach. Furthermore, the unconditional policy effects are shown to include the marginal policy-relevant treatment effect in the literature as a special case. Methods of estimation and inference for the unconditional policy effects are provided. In the empirical application, we estimate the effect of changing college enrollment status, induced by a higher tuition subsidy, on the quantiles of the wage distribution. |
Keywords: | Social and Behavioral Sciences, unconditional quantile regressions, unconditional policy effect, selection models, instrumental variables, marginal treatment effect, marginal policy-relevant treatment effect. |
Date: | 2020–10–29 |
URL: | http://d.repec.org/n?u=RePEc:cdl:ucsdec:qt2bc57830&r=all |
By: | Bin Chen; Kenwin Maung |
Abstract: | In this paper, we propose a new nonparametric estimator of time-varying forecast combination weights. When the number of individual forecasts is small, we study the asymptotic properties of the local linear estimator. When the number of candidate forecasts exceeds or diverges with the sample size, we consider penalized local linear estimation with the group SCAD penalty. We show that the estimator exhibits the oracle property and correctly selects relevant forecasts with probability approaching one. Simulations indicate that the proposed estimators outperform existing combination schemes when structural changes exist. Two empirical studies on inflation forecasting and equity premium prediction highlight the merits of our approach relative to other popular methods. |
Date: | 2020–10 |
URL: | http://d.repec.org/n?u=RePEc:arx:papers:2010.10435&r=all |
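To make the Chen and Maung estimator concrete, here is a bare-bones local linear estimate of time-varying combination weights: at a target point r in (0, 1), run kernel-weighted least squares of the realized target on the individual forecasts and their interactions with (t/T - r). The Gaussian kernel, the bandwidth, and the omission of the group SCAD penalty are simplifications of this sketch.

```python
import numpy as np

def local_linear_weights(y, F, r, h):
    """y: (T,) realized target; F: (T, m) individual forecasts; r in (0, 1); h: bandwidth."""
    T, m = F.shape
    s = np.arange(1, T + 1) / T - r
    k = np.exp(-0.5 * (s / h) ** 2)                    # kernel weights around r
    X = np.hstack([F, F * s[:, None]])                 # local levels and local slopes
    sw = np.sqrt(k)
    coef, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    return coef[:m]                                    # estimated weights w(r)
```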
By: | Zongwu Cai (Department of Economics, The University of Kansas, Lawrence, KS 66045, USA); Xiyuan Liu (Department of Economics, The University of Kansas, Lawrence, KS 66045, USA) |
Abstract: | The degree of interdependence among holdings of financial sectors and its varying patterns play important roles in forming systemic risks within a financial system. In this article, we propose a VAR model of conditional quantiles with functional coefficients to construct a novel class of dynamic network systems, in which the interdependences among tail risks such as Value-at-Risk are allowed to vary with a variable of the general economy. Methodologically, we develop an easy-to-implement two-stage procedure to estimate the functionals in the dynamic network system by the local linear smoothing technique. We establish the consistency and the asymptotic normality of the proposed estimator under time series settings. Simulation studies are conducted to show that our new methods work fairly well. The potential of the proposed estimation procedures is demonstrated by an empirical study of constructing and estimating a new type of dynamic financial network. |
Keywords: | Dynamic financial network; Functional coefficient models; Multivariate conditional quantile models; Nonparametric estimation; VAR modeling |
JEL: | C14 C58 C45 G32 |
Date: | 2020–10 |
URL: | http://d.repec.org/n?u=RePEc:kan:wpaper:202017&r=all |
By: | Zhentao Shi; Liangjun Su; Tian Xie |
Abstract: | This paper presents a novel high-dimensional forecast combination estimator in the presence of many forecasts and potential latent group structures. The new algorithm, which we call $\ell_2$-relaxation, minimizes the squared $\ell_2$-norm of the weight vector subject to a relaxed version of the first-order conditions, instead of minimizing the mean squared forecast error as standard optimal forecast combination procedures do. A proper choice of the tuning parameter achieves a bias-variance trade-off and incorporates as special cases the simple average (equal-weight) strategy and the conventional optimal weighting scheme. When the variance-covariance (VC) matrix of the individual forecast errors exhibits latent group structures -- a block equicorrelation matrix plus a VC matrix for idiosyncratic noises -- $\ell_2$-relaxation delivers combined forecasts with roughly equal within-group weights. Asymptotic optimality of the new method is established by exploiting the duality between the sup-norm restriction and the high-dimensional sparse $\ell_1$-norm penalization. Excellent finite-sample performance of our method is demonstrated in Monte Carlo simulations. Its wide applicability is highlighted in three real-data examples from microeconomics, macroeconomics and finance. |
Date: | 2020–10 |
URL: | http://d.repec.org/n?u=RePEc:arx:papers:2010.09477&r=all |
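The optimization in the Shi, Su and Xie abstract can be prototyped with an off-the-shelf convex solver. The sketch below minimizes the squared l2-norm of the weights subject to the adding-up constraint and a sup-norm relaxation of the first-order conditions; the free scalar lam and the exact constraint set reflect my reading of the abstract and may differ from the paper's formulation, and the tuning parameter tau is left to the user.

```python
import cvxpy as cp
import numpy as np

def l2_relaxation(Sigma, tau):
    """Sigma: (m, m) estimated VC matrix of forecast errors; tau: relaxation tuning parameter."""
    m = Sigma.shape[0]
    w = cp.Variable(m)
    lam = cp.Variable()                                  # free multiplier-like scalar
    constraints = [cp.sum(w) == 1,
                   cp.norm_inf(Sigma @ w - lam * np.ones(m)) <= tau]
    cp.Problem(cp.Minimize(cp.sum_squares(w)), constraints).solve()
    return w.value
```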
By: | Lam, Clifford |
Abstract: | Covariance matrix estimation plays an important role in statistical analysis in many fields, including (but not limited to) portfolio allocation and risk management in finance, graphical modeling and clustering for gene discovery in bioinformatics, and Kalman filtering and factor analysis in economics. In this paper, we give a selective review of covariance and precision matrix estimation when the matrix dimension can be diverging with, or even larger than, the sample size. Two broad categories of regularization methods are presented. The first category exploits an assumed structure of the covariance or precision matrix for consistent estimation. The second category shrinks the eigenvalues of a sample covariance matrix, knowing from random matrix theory that such eigenvalues are biased away from their population counterparts when the matrix dimension grows at the same rate as the sample size. This article is categorized under: Statistical and Graphical Methods of Data Analysis > Analysis of High Dimensional Data; Statistical and Graphical Methods of Data Analysis > Multivariate Analysis; Statistical and Graphical Methods of Data Analysis > Nonparametric Methods. |
Keywords: | Structured covariance estimation; sparsity; low rank plus sparse; factor model; shrinkage |
JEL: | C1 |
Date: | 2020–03–01 |
URL: | http://d.repec.org/n?u=RePEc:ehl:lserod:101667&r=all |
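The second category of methods in Lam's review shrinks sample eigenvalues. A minimal linear-shrinkage example, pulling eigenvalues toward their grand mean while keeping the eigenvectors, is sketched below; the intensity alpha is a user choice here, whereas practical estimators pick it from the data (scikit-learn's sklearn.covariance.LedoitWolf implements one well-known data-driven version of this idea).

```python
import numpy as np

def eigenvalue_shrinkage(X, alpha):
    """X: (n, p) data; alpha in [0, 1]: shrinkage intensity toward the mean eigenvalue."""
    S = np.cov(X, rowvar=False)
    vals, vecs = np.linalg.eigh(S)
    shrunk = (1 - alpha) * vals + alpha * vals.mean()   # pull extreme eigenvalues together
    return (vecs * shrunk) @ vecs.T                     # rebuild the shrunk covariance matrix
```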
By: | Chaojun Li; Yan Liu |
Abstract: | This study proves the asymptotic properties of the maximum likelihood estimator (MLE) in a wide range of endogenous regime-switching models. This class of models extends the constant state transition probability in Markov-switching models to a time-varying probability that includes information from observations. A feature of importance in this proof is the mixing rate of the state process conditional on the observations, which is time-varying owing to the time-varying transition probabilities. Consistency and asymptotic normality follow from the almost deterministic, geometrically decaying bound on the mixing rate. Relying on low-level assumptions that have been shown to hold in general, this study provides theoretical foundations for statistical inference in most endogenous regime-switching models in the literature. As an empirical application, an endogenous regime-switching autoregressive conditional heteroscedasticity (ARCH) model is estimated and analyzed with the obtained inferential results. |
Date: | 2020–10 |
URL: | http://d.repec.org/n?u=RePEc:arx:papers:2010.04930&r=all |
By: | Jörg Stoye |
Abstract: | This paper revisits the simple, but empirically salient, problem of inference on a real-valued parameter that is partially identified through upper and lower bounds with asymptotically normal estimators. A simple confidence interval is proposed and is shown to have the following properties: it is never empty or awkwardly short, including when the sample analog of the identified set is empty; it is valid for a well-defined pseudotrue parameter whether or not the model is well-specified; and it involves no tuning parameters and minimal computation. In general, computing the interval requires concentrating out one scalar nuisance parameter. For uncorrelated estimators of bounds -- notably if bounds are estimated from distinct subsamples -- and conventional coverage levels, this step can be skipped. The proposed $95\%$ confidence interval then simplifies to the union of a simple $90\%$ (!) confidence interval for the partially identified parameter and an equally simple $95\%$ confidence interval for a point-identified pseudotrue parameter. This case obtains in the motivating empirical application, in which improvement over existing inference methods is demonstrated. More generally, simulations suggest excellent length and size control properties. |
Date: | 2020–10 |
URL: | http://d.repec.org/n?u=RePEc:arx:papers:2010.10484&r=all |
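For the special case Stoye's abstract highlights (uncorrelated bound estimators, conventional levels), the confidence interval is described as a union of two simple intervals. The sketch below is one plausible reading: a nominal 90% interval for the partially identified parameter combined with a 95% interval for a pseudotrue point, here taken, purely as an illustrative assumption, to be the precision-weighted average of the estimated bounds. The paper's actual pseudotrue parameter and interval construction may differ.

```python
import numpy as np
from scipy.stats import norm

def union_ci(lo, se_lo, hi, se_hi):
    """lo, hi: estimated lower/upper bounds; se_lo, se_hi: their standard errors."""
    z90, z95 = norm.ppf(0.95), norm.ppf(0.975)
    # simple 90% interval for the partially identified parameter
    ci_set = (lo - z90 * se_lo, hi + z90 * se_hi)
    # 95% interval for an illustrative pseudotrue point (precision-weighted bound average)
    w = se_hi ** 2 / (se_lo ** 2 + se_hi ** 2)
    point = w * lo + (1 - w) * hi
    se_pt = np.sqrt(w ** 2 * se_lo ** 2 + (1 - w) ** 2 * se_hi ** 2)
    ci_pt = (point - z95 * se_pt, point + z95 * se_pt)
    # report the convex hull of the union of the two intervals
    return min(ci_set[0], ci_pt[0]), max(ci_set[1], ci_pt[1])
```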
By: | Tu, Yundong; Yao, Qiwei; Zhang, Rongmao |
Abstract: | Cointegration inference often relies on a correct specification of the short-run dynamic vector autoregression. However, this specification is unknown a priori. A lag length that is too small leads to erroneous inference as a result of the misspecification. In contrast, using too many lags leads to a dramatic increase in the number of parameters, especially when the dimension of the time series is high. In this paper, we develop a new methodology which adds an error-correction term for the long-run equilibrium to a latent factor model in order to model the short-run dynamic relationship. The inference uses eigenanalysis-based methods to estimate the cointegration and latent factor processes. The proposed error-correction factor model does not require an explicit specification of the short-run dynamics and is particularly effective for high-dimensional cases, in which the standard error-correction model suffers from overparametrization. In addition, the model improves the predictive performance of the pure factor model. The asymptotic properties of the proposed methods are established when the dimension of the time series is either fixed or diverging slowly as the length of the time series goes to infinity. Lastly, the performance of the model is evaluated using both simulated and real data sets. |
Keywords: | cointegration; eigenanalysis; factor models; nonstationary processes; vector time series |
JEL: | C1 |
Date: | 2020–07–01 |
URL: | http://d.repec.org/n?u=RePEc:ehl:lserod:106994&r=all |
By: | Rahul Singh; Liyuan Xu; Arthur Gretton |
Abstract: | We propose a novel framework for non-parametric policy evaluation in static and dynamic settings. Under the assumption of selection on observables, we consider treatment effects of the population, of sub-populations, and of alternative populations that may have alternative covariate distributions. We further consider the decomposition of a total effect into a direct effect and an indirect effect (as mediated by a particular mechanism). Under the assumption of sequential selection on observables, we consider the effects of sequences of treatments. Across settings, we allow for treatments that may be discrete, continuous, or even text. Across settings, we allow for estimation of not only counterfactual mean outcomes but also counterfactual distributions of outcomes. We unify analyses across settings by showing that all of these causal learning problems reduce to the re-weighting of a prediction, i.e. causal adjustment. We implement the re-weighting as an inner product in a function space called a reproducing kernel Hilbert space (RKHS), with a closed form solution that can be computed in one line of code. We prove uniform consistency and provide finite sample rates of convergence. We evaluate our estimators in simulations devised by other authors. We use our new estimators to evaluate continuous and heterogeneous treatment effects of the US Jobs Corps training program for disadvantaged youth. |
Date: | 2020–10 |
URL: | http://d.repec.org/n?u=RePEc:arx:papers:2010.04855&r=all |
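The "closed form solution that can be computed in one line of code" in the Singh, Xu and Gretton abstract has the flavor of kernel ridge regression. The toy sketch below fits a kernel ridge regression of the outcome on (treatment, covariates) and averages the fitted values at a fixed counterfactual treatment level over the observed covariates; the RBF kernel, the penalty alpha, and this particular estimand are assumptions of the sketch, not the authors' estimator.

```python
import numpy as np
from sklearn.kernel_ridge import KernelRidge

def counterfactual_mean(y, d_obs, X, d_new, alpha=1.0):
    """Average predicted outcome when everyone's treatment is set to d_new."""
    Z = np.column_stack([d_obs, X])
    model = KernelRidge(kernel="rbf", alpha=alpha).fit(Z, y)
    Z_new = np.column_stack([np.full(len(X), d_new), X])   # counterfactual treatment level
    return model.predict(Z_new).mean()
```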
By: | Patrik Guggenberger; Frank Kleibergen; Sophocles Mavroeidis |
Abstract: | We propose a test that a covariance matrix has Kronecker Product Structure (KPS). KPS implies a reduced rank restriction on an invertible transformation of the covariance matrix and the new procedure is an adaptation of the Kleibergen and Paap (2006) reduced rank test. KPS is a generalization of homoscedasticity and allows for more powerful subvector inference in linear Instrumental Variables (IV) regressions than can be achieved under general covariance matrices. Re-examining sixteen highly cited papers conducting IV regressions, we find that KPS is not rejected in 24 of 30 specifications for moderate sample sizes at the 5% nominal size. |
Date: | 2020–10 |
URL: | http://d.repec.org/n?u=RePEc:arx:papers:2010.10961&r=all |
By: | George Gui |
Abstract: | Experiments are the gold standard for causal identification but often have limited scale, while observational datasets are large but often violate standard identification assumptions. To improve the estimation efficiency, we propose a new method that combines experimental and observational data by leveraging covariates that satisfy the first-stage relevance condition in the observational data. We first use the observational data to derive a biased estimate of the causal effect and then correct this bias using the experimental data. We show that this bias-corrected estimator is uncorrelated with the benchmark estimator that only uses the experimental data, and can thus be combined with the benchmark estimator to improve efficiency. Under a common experimental design that randomly assigns units into the experimental group, our method can reduce the variance by up to 50%, so that only half of the experimental sample is required to attain the same accuracy. This accuracy can be further improved if the experimental design accounts for this relevant first-stage covariate and selects the experimental group differently. |
Date: | 2020–10 |
URL: | http://d.repec.org/n?u=RePEc:arx:papers:2010.05117&r=all |
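The combination step Gui describes, an experimental benchmark plus an uncorrelated bias-corrected observational estimate, can be illustrated with inverse-variance weighting. The bias-correction step itself is not shown, and treating the two estimates as uncorrelated and unbiased is the assumption under which this sketch is valid.

```python
def combine_estimates(theta_exp, var_exp, theta_bc, var_bc):
    """Combine the experimental estimate with a bias-corrected observational estimate."""
    w = var_bc / (var_exp + var_bc)                 # weight on the experimental estimate
    theta = w * theta_exp + (1 - w) * theta_bc
    var = 1.0 / (1.0 / var_exp + 1.0 / var_bc)      # variance of the combined estimator
    return theta, var
```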
By: | Ollech, Daniel; Webel, Karsten |
Abstract: | Virtually every seasonal adjustment software package includes an ensemble of seasonality tests for assessing whether a given time series is in fact a candidate for seasonal adjustment. However, such tests are certain to produce either the same result or conflicting results, raising the question of whether there is a method that is capable of identifying the most informative tests in order (1) to eliminate the seemingly non-informative ones in the former case and (2) to reach a final decision in the more severe latter case. We argue that identifying the seasonal status of a given time series is essentially a classification problem and, thus, can be solved with machine learning methods. Using simulated seasonal and non-seasonal ARIMA processes that are representative of the Bundesbank's time series database, we compare certain popular methods with respect to accuracy, interpretability and availability of unbiased variable importance measures and find random forests of conditional inference trees to be the method which best balances these key requirements. Applying this method to the seasonality tests implemented in the seasonal adjustment software JDemetra+ finally reveals that the modified QS and Friedman tests yield by far the most informative results. |
Keywords: | binary classification, conditional inference trees, correlated predictors, JDemetra+, simulation study, supervised machine learning |
JEL: | C12 C14 C22 C45 C63 |
Date: | 2020 |
URL: | http://d.repec.org/n?u=RePEc:zbw:bubdps:552020&r=all |
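The classification exercise in the Ollech and Webel abstract can be mimicked as follows: use the battery of seasonality test outcomes as features, the known (simulated) seasonal status as the label, fit a forest, and rank tests by permutation importance. The paper relies on random forests of conditional inference trees (an R implementation); scikit-learn's RandomForestClassifier is substituted here purely for illustration.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

def rank_seasonality_tests(test_stats, is_seasonal):
    """test_stats: (n_series, n_tests) test outcomes; is_seasonal: 0/1 labels from simulation."""
    X_tr, X_te, y_tr, y_te = train_test_split(test_stats, is_seasonal, random_state=0)
    forest = RandomForestClassifier(n_estimators=500, random_state=0).fit(X_tr, y_tr)
    imp = permutation_importance(forest, X_te, y_te, n_repeats=20, random_state=0)
    return imp.importances_mean            # higher value = more informative seasonality test
```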
By: | Marcin Pitera; Thorsten Schmidt |
Abstract: | While the estimation of risk is an important question in the daily business of banks and insurers, it is surprising that efficient procedures for this task are not well studied. Indeed, many existing plug-in approaches for the estimation of risk suffer from an unnecessary bias which leads to the underestimation of risk and negatively impacts backtesting results, especially in small-sample environments. In this article, we consider efficient estimation of risk in practical situations and provide means to improve the accuracy of risk estimators and their performance in backtesting. In particular, we propose an algorithm for bias correction and show how to apply it to generalized Pareto distributions. Moreover, we propose new estimators for value-at-risk and expected shortfall, respectively, and illustrate the gain in efficiency when heavy tails exist in the data. |
Date: | 2020–10 |
URL: | http://d.repec.org/n?u=RePEc:arx:papers:2010.09937&r=all |
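As context for the Pitera and Schmidt abstract, the standard plug-in peaks-over-threshold estimator of value-at-risk and expected shortfall from a generalized Pareto tail is sketched below; it is exactly the kind of (possibly biased) plug-in the paper seeks to correct, and the bias-correction algorithm itself is not reproduced. The threshold and confidence level are user choices, and the expected shortfall formula assumes a shape parameter strictly between 0 and 1.

```python
import numpy as np
from scipy.stats import genpareto

def gpd_var_es(losses, threshold, level=0.99):
    """Plug-in VaR and ES at the given level from a GPD fit to threshold exceedances."""
    exceed = losses[losses > threshold] - threshold
    n, nu = len(losses), len(exceed)
    xi, _, beta = genpareto.fit(exceed, floc=0)            # shape xi, scale beta
    var = threshold + (beta / xi) * ((n / nu * (1 - level)) ** (-xi) - 1)
    es = (var + beta - xi * threshold) / (1 - xi)          # requires 0 < xi < 1
    return var, es
```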
By: | Obafèmi Philippe Koutchadé; Alain Carpentier; Fabienne Féménia |
Abstract: | Null crop acreages raise pervasive issues when modelling acreage choices with farm data. We revisit these issues and emphasize that null acreage choices arise not only from binding non-negativity constraints but also from crop production fixed costs. Based on this micro-economic background, we present a micro-econometric multi-crop model that consistently handles null acreages and accounts for crop production fixed costs. This multivariate endogenous regime switching model allows for specific crop acreage patterns, such as multiple kinks and jumps in crop acreage responses to economic incentives, that are due to changes in produced crop sets. Currently available micro-econometric multi-crop models, which handle null acreages based on a censored regression approach, cannot represent these patterns. We illustrate the empirical tractability of our modelling framework by estimating a random parameter version of the proposed endogenous regime switching micro-econometric multi-crop model with a panel dataset of French farmers. Our estimation and simulation results support our theoretical analysis, in particular the effects of crop fixed costs and crop set choices on crop acreage choices. More generally, these results suggest that the micro-econometric multi-crop model presented in this article can significantly improve empirical analyses of crop supply based on farm data. |
Keywords: | acreage choice, crop choice, endogenous regime switching, random parameter models |
JEL: | Q12 C13 C15 |
Date: | 2020 |
URL: | http://d.repec.org/n?u=RePEc:rae:wpaper:202009&r=all |
By: | Ben Deaner |
Abstract: | Estimation and inference in dynamic discrete choice models often rely on approximation to lower the computational burden of dynamic programming. Unfortunately, the use of approximation can impart substantial bias in estimation and result in invalid confidence sets. We present a method for set estimation and inference that explicitly accounts for the use of approximation and is thus valid regardless of the approximation error. We show how one can account for the error from approximation at low computational cost. Our methodology allows researchers to assess the estimation error due to the use of approximation and thus more effectively manage the trade-off between bias and computational expedience. We provide simulation evidence to demonstrate the practicality of our approach. |
Date: | 2020–10 |
URL: | http://d.repec.org/n?u=RePEc:arx:papers:2010.11482&r=all |
By: | Christian Glocker; Serguei Kaniovski |
Abstract: | We propose a modelling approach involving a series of small-scale factor models. They are connected to each other within a cluster, whose linkages are derived from Granger-causality tests. GDP forecasts are established across the production, income and expenditure accounts within a disaggregated approach. This method merges the benefits of large-scale macroeconomic and small-scale factor models, rendering our Cluster of Dynamic Factor Models (CDFM) useful for model-consistent forecasting on a large scale. While the CDFM has a simple structure, its forecasts outperform those of a wide range of competing models and of professional forecasters. Moreover, the CDFM allows forecasters to introduce their own judgment and hence produce conditional forecasts. |
Keywords: | Forecasting, Dynamic factor model, Granger causality, Structural modeling |
Date: | 2020–10–27 |
URL: | http://d.repec.org/n?u=RePEc:wfo:wpaper:y:2020:i:614&r=all |
By: | Jonathan Roth; Pedro H. C. Sant'Anna |
Abstract: | This paper assesses when the validity of difference-in-differences and related estimators is dependent on functional form. We provide a novel characterization: the parallel trends assumption holds under all monotonic transformations of the outcome if and only if a stronger "parallel trends"-type assumption holds on the entire distribution of potential outcomes. This assumption necessarily holds when treatment is (as if) randomly assigned, but will often be implausible in settings where randomization fails. We show further that the average treatment effect on the treated (ATT) is identified regardless of functional form if and only if the entire distribution of untreated outcomes is identified for the treated group. It is thus impossible to construct an estimator that is consistent (or unbiased) for the ATT regardless of functional form unless one imposes assumptions that identify the entire counterfactual distribution of untreated potential outcomes. Our results suggest that researchers who wish to point-identify the ATT should justify one of the following: (i) why treatment is randomly assigned, (ii) why the chosen functional form is correct at the exclusion of others, or (iii) a method for inferring the entire counterfactual distribution of untreated potential outcomes. |
Date: | 2020–10 |
URL: | http://d.repec.org/n?u=RePEc:arx:papers:2010.04814&r=all |
By: | Annabelle Doerr; Anthony Strittmatter |
Abstract: | We study the identification of channels of policy reforms with multiple treatments and different types of selection for each treatment. We disentangle reform effects into policy effects, selection effects, and time effects under the assumptions of conditional independence, common trends, and an additional exclusion restriction on the non-treated. Furthermore, we show the identification of direct and indirect policy effects after imposing additional sequential conditional independence assumptions on mediating variables. We illustrate the approach using the German reform of the allocation system of vocational training for unemployed persons. The reform changed the allocation of training from a mandatory system to a voluntary voucher system. Simultaneously, the selection criteria for participants changed, and the reform altered the composition of course types. We consider the course composition as a mediator of the policy reform. We show that the empirical evidence from previous studies reverses when considering the course composition. This has important implications for policy conclusions. |
Date: | 2020–10 |
URL: | http://d.repec.org/n?u=RePEc:arx:papers:2010.05221&r=all |
By: | Edward S. Knotek; Saeed Zaman |
Abstract: | We develop a flexible modeling framework to produce density nowcasts for US inflation at a trading-day frequency. Our framework: (1) combines individual density nowcasts from three classes of parsimonious mixed-frequency models; (2) adopts a novel flexible treatment in the use of the aggregation function; and (3) permits dynamic model averaging via the use of weights that are updated based on learning from past performance. Together these features provide density nowcasts that can accommodate non-Gaussian properties. We document the competitive properties of the nowcasts generated from our framework using high-frequency real-time data over the period 2000-2015. |
Keywords: | mixed-frequency models; inflation; density nowcasts; density combinations |
JEL: | C15 C53 E3 E37 |
Date: | 2020–10–22 |
URL: | http://d.repec.org/n?u=RePEc:fip:fedcwq:88961&r=all |
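One simple way to implement the "weights that are updated based on learning from past performance" step that Knotek and Zaman describe is to make each model's weight proportional to its exponentially discounted past log predictive scores. The discount factor and the use of log scores are assumptions of this sketch; the paper's aggregation function and mixed-frequency model classes are not reproduced.

```python
import numpy as np

def dma_weights(log_scores, delta=0.95):
    """log_scores: (T, m) past log predictive densities for m models; returns current weights."""
    T, _ = log_scores.shape
    discounts = delta ** np.arange(T - 1, -1, -1)     # most recent periods count most
    cum = (discounts[:, None] * log_scores).sum(axis=0)
    w = np.exp(cum - cum.max())                       # subtract max for numerical stability
    return w / w.sum()
```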
By: | Koki Fusejima |
Abstract: | In this paper, we establish sufficient conditions for identifying the treatment effects on continuous outcomes in endogenous and multi-valued discrete treatment settings with unobserved heterogeneity. We employ the monotonicity assumption for multi-valued discrete treatments and instruments, and our identification condition is easy to interpret economically. Our result contrasts with related work by Chernozhukov and Hansen (2005) in this respect. We also establish identification of the local treatment effects in multi-valued treatment settings and derive closed-form expressions for the identified treatment effects. We give examples to verify the usefulness of our result. |
Date: | 2020–10 |
URL: | http://d.repec.org/n?u=RePEc:arx:papers:2010.04385&r=all |
By: | Kang, Natasha; Marmer, Vadim |
Abstract: | Recurrent boom-and-bust cycles are a salient feature of economic and financial history. Cycles found in the data are stochastic, often highly persistent, and span substantial fractions of the sample size. We refer to such cycles as “long”. In this paper, we develop a novel approach to modeling cyclical behavior specifically designed to capture long cycles. We show that existing inferential procedures may produce misleading results in the presence of long cycles, and propose a new econometric procedure for the inference on the cycle length. Our procedure is asymptotically valid regardless of the cycle length. We apply our methodology to a set of macroeconomic and financial variables for the U.S. We find evidence of long stochastic cycles in the standard business cycle variables, as well as in credit and house prices. However, we rule out the presence of stochastic cycles in asset market data. Moreover, according to our results, financial cycles as characterized by credit and house prices tend to be twice as long as business cycles. |
Keywords: | Stochastic cycles, autoregressive processes, local-to-unity asymptotics, confidence sets, business cycle, financial cycle |
JEL: | C12 C22 C5 E32 E44 |
Date: | 2020–10–25 |
URL: | http://d.repec.org/n?u=RePEc:ubc:bricol:vadim_marmer-2020-3&r=all |
By: | Bryan S. Graham |
Abstract: | Consider a bipartite network where N consumers choose to buy or not to buy M different products. This paper considers the properties of the logistic regression of the N × M array of “i-buys-j” purchase decisions, $[Y_{ij}]_{1 \le i \le N, 1 \le j \le M}$, onto known functions of consumer and product attributes under asymptotic sequences where (i) both N and M grow large and (ii) the average number of products purchased per consumer is finite in the limit. This latter assumption implies that the network of purchases is sparse: only a (very) small fraction of all possible purchases are actually made (concordant with many real-world settings). Under sparse network asymptotics, the first and last terms in an extended Hoeffding-type variance decomposition of the score of the logit composite log-likelihood are of equal order. In contrast, under dense network asymptotics, the last term is asymptotically negligible. Asymptotic normality of the logistic regression coefficients is shown using a martingale central limit theorem (CLT) for triangular arrays. Unlike in the dense case, the normality result derived here also holds under degeneracy of the network graphon. Relatedly, when there “happens to be” no dyadic dependence in the dataset in hand, it specializes to recently derived results on the behavior of logistic regression with rare events and iid data. Sparse network asymptotics may lead to better inference in practice since they suggest variance estimators which (i) incorporate additional sources of sampling variation and (ii) are valid under varying degrees of dyadic dependence. |
JEL: | C01 C31 C33 C55 |
Date: | 2020–10 |
URL: | http://d.repec.org/n?u=RePEc:nbr:nberwo:27962&r=all |
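A bare-bones version of the regression Graham studies: stack the N × M purchase indicators into a long vector, build dyad-level regressors from consumer and product attributes, and fit a logit by maximum likelihood. The dyadic-robust variance estimators that are the point of the paper are not reproduced, and the simple concatenation of attributes is an illustrative choice.

```python
import numpy as np
import statsmodels.api as sm

def network_logit(Y, W_consumer, X_product):
    """Y: (N, M) 0/1 purchase array; W_consumer: (N, k) attributes; X_product: (M, l) attributes."""
    N, M = Y.shape
    rows = np.repeat(np.arange(N), M)                    # consumer index of each dyad
    cols = np.tile(np.arange(M), N)                      # product index of each dyad
    X = sm.add_constant(np.hstack([W_consumer[rows], X_product[cols]]))
    return sm.Logit(Y.ravel(), X).fit(disp=0)            # pooled (composite-likelihood) logit
```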
By: | David L. Lee; Justin McCrary; Marcelo J. Moreira; Jack Porter |
Abstract: | In the single IV model, current practice relies on the first-stage F exceeding some threshold (e.g., 10) as a criterion for trusting t-ratio inferences, even though this yields an anti-conservative test. We show that a true 5 percent test instead requires an F greater than 104.7. Maintaining 10 as a threshold requires replacing the critical value 1.96 with 3.43. We re-examine 57 AER papers and find that corrected inference causes half of the initially presumed statistically significant results to be insignificant. We introduce a more powerful test, the tF procedure, which provides F-dependent adjusted t-ratio critical values. |
Date: | 2020–10 |
URL: | http://d.repec.org/n?u=RePEc:arx:papers:2010.05058&r=all |
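The two numbers quoted in the Lee, McCrary, Moreira and Porter abstract can be wired into a toy decision rule: with a single instrument, the usual 1.96 critical value for a nominal 5 percent t-test is justified only when the first-stage F exceeds roughly 104.7, while keeping the customary F > 10 screen requires inflating the critical value to about 3.43. The full tF procedure supplies a schedule of F-dependent critical values; this simplified helper does not reproduce it and reports no cutoff below F = 10.

```python
def t_critical_value(first_stage_F):
    """Toy rule of thumb based on the two thresholds quoted in the abstract."""
    if first_stage_F > 104.7:
        return 1.96          # conventional critical value is approximately justified
    if first_stage_F > 10:
        return 3.43          # adjusted critical value at the customary F > 10 screen
    return None              # outside the range covered by this simplified rule
```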
By: | Yucheng Yang; Zhong Zheng; Weinan E |
Abstract: | The lack of interpretability and transparency is preventing economists from using advanced tools like neural networks in their empirical work. In this paper, we propose a new class of interpretable neural network models that can achieve both high prediction accuracy and interpretability in regression problems with time series cross-sectional data. Our model can essentially be written as a simple function of a limited number of interpretable features. In particular, we incorporate a class of interpretable functions named persistent change filters as part of the neural network. We apply this model to predicting an individual's monthly employment status using high-dimensional administrative data in China. We achieve an accuracy of 94.5% on the out-of-sample test set, which is comparable to the most accurate conventional machine learning methods. Furthermore, the interpretability of the model allows us to understand the mechanism that underlies the ability to predict employment status using administrative data: an individual's employment status is closely related to whether she pays different types of insurance. Our work is a useful step towards overcoming the "black box" problem of neural networks, and provides a promising new tool for economists to study administrative and proprietary big data. |
Date: | 2020–10 |
URL: | http://d.repec.org/n?u=RePEc:arx:papers:2010.05311&r=all |