nep-ecm New Economics Papers
on Econometrics
Issue of 2018‒06‒18
twenty papers chosen by
Sune Karlsson
Örebro universitet

  1. Mixed integer linear programming: a new approach for instrumental variable quantile regressions and related problems By Yinchu Zhu
  2. Time series with interdependent level and second moment: statistical testing and applications with Greek external trade and simulated data By Alexandros E. Milionis; Nikolaos G. Galanopoulos
  3. Least Squares and IVX Limit Theory in Systems of Predictive Regressions with GARCH innovations By Tassos Magdalinos
  4. Mildly explosive autoregression under stationary conditional heteroskedasticity By Stelios Arvanitis; Tassos Magdalinos
  5. Testing productivity change, frontier shift, and efficiency change By Mette Asmild; Dorte Kronborg; Anders Rønn-Nielsen
  6. Log periodogram regression of two-dimensional intrinsic stationary random fields By Yoshihiro Yajima; Yasumasa Matsuda
  7. Meta-learning how to forecast time series By Thiyanga S Talagala; Rob J Hyndman; George Athanasopoulos
  8. Modelling asymmetric conditional heteroskedasticity in financial asset returns: an extension of Nelson’s EGARCH model By Cassim, Lucius
  9. Analysis of Asymmetric GARCH Volatility Models with Applications to Margin Measurement By Elena Goldman; Xiangjin Shen
  10. The Effect of the Conservation Reserve Program on Rural Economies: Deriving a Statistical Verdict from a Null Finding By Brown, Jason; Lambert, Dayton; Wojan, Timothy R.
  11. Quantifier Elimination for Deduction in Econometrics By Casey B. Mulligan
  12. Modeling within-household associations in household panel studies By Steele, Fiona; Clarke, Paul; Kuha, Jouni
  13. Unified estimation of densities on bounded and unbounded domains By Mynbayev, Kairat; Martins-Filho, Carlos
  14. A flexible regime switching model with pairs trading application to the S&P 500 high-frequency stock returns By Endres, Sylvia; Stübinger, Johannes
  15. Latent Volatility Granger Causality and Spillovers in Renewable Energy and Crude Oil ETFs By Chia-Lin Chang; Michael McAleer; Yu-Ann Wang
  16. A correlated random effects spatial Durbin model By Miranda, Karen; Martínez Ibáñez, Oscar; Manjón Antolín, Miguel C.
  17. Publication Bias and the Cross-Section of Stock Returns By Andrew Y. Chen; Thomas Zimmermann
  18. A new procedure for pre-testing the distribution properties of Stock returns By Afees A. Salisu; Ibrahim D. Raheem
  19. Integrated choice and latent variable models: A literature review on mode choice By Hélène Bouscasse
  20. Data Science for Institutional and Organizational Economics By Prüfer, Jens; Prüfer, Patricia

  1. By: Yinchu Zhu
    Abstract: This paper proposes a new framework for estimating instrumental variable (IV) quantile models. Our proposal can be cast as a mixed integer linear program (MILP), which allows us to capitalize on recent progress in mixed integer optimization. The computational advantage of the proposed method makes it an attractive alternative to existing estimators in the presence of multiple endogenous regressors. This situation arises naturally when one endogenous variable is interacted with several other variables in a regression equation. In our simulations, the proposed method using MILP with a random starting point can reliably estimate regressions for a sample size of 1000 with 20 endogenous variables in 90 seconds; for high-dimensional problems, our formulation can deliver decent estimates within minutes for problems with 550 endogenous regressors. We also establish asymptotic theory and provide an inference procedure. In our simulations, the asymptotic theory provides an excellent approximation even if we terminate MILP before a certified global solution is found. This suggests that MILP in our setting can quickly approach the global solution. In addition, we show that MILP can also be used for related problems, including censored regression, censored IV quantile regression and high-dimensional IV quantile regression. As an empirical illustration, we examine the heterogeneous treatment effect of the Job Training Partnership Act (JTPA) using a regression with 13 interaction terms of the treatment variable.
    Date: 2018–05
    URL: http://d.repec.org/n?u=RePEc:arx:papers:1805.06855&r=ecm
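    Illustration: as background for why quantile problems can be cast as mathematical programs, ordinary linear quantile regression with exogenous regressors is already a linear program. The block below gives that standard formulation. It is not the paper's IV quantile MILP, which adds integer variables and is not reproduced here. Here y is the n-vector of outcomes, X is the n×p regressor matrix, τ ∈ (0,1) is the quantile level, and u, v are the positive and negative parts of the residuals.

```latex
\min_{\beta \in \mathbb{R}^{p},\; u, v \in \mathbb{R}^{n}_{\ge 0}}
  \; \tau\, \mathbf{1}^{\top} u + (1-\tau)\, \mathbf{1}^{\top} v
\quad \text{subject to} \quad y = X\beta + u - v,
\qquad \text{equivalently} \qquad
\min_{\beta} \sum_{i=1}^{n} \rho_\tau\!\left(y_i - x_i^{\top}\beta\right),
\quad \rho_\tau(r) = r\,\bigl(\tau - \mathbf{1}\{r < 0\}\bigr).
```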
  2. By: Alexandros E. Milionis (Bank of Greece and University of the Aegean); Nikolaos G. Galanopoulos (University of Athens)
    Abstract: This work aims to fill a gap in the literature on statistical testing for the existence of a time-varying second moment that depends on a non-constant mean level in time series, and on identifying the character of that dependence. To this end, a new statistical testing procedure is introduced that has considerable advantages over existing ones. Among other things, it is argued that the existing statistical tests are insufficient and sometimes lead to biased results. Further, the effect of applying this methodology on crucial elements of time series modelling, such as outlier detection and seasonal adjustment, is examined through comparative case studies that use both the new methodology and an established one. The severe consequences of improperly treating the type of time-varying second moment dealt with in this work are demonstrated and emphasized. The data set comprises monthly external trade statistics for Greece. Overall, the resulting empirical evidence favours the new approach. Further supporting evidence is provided by applying the new methodology to simulated data.
    Keywords: time series transformations; applied time series analysis; seasonal adjustment; detection of outliers; Greek external trade time series
    JEL: C15 C22 C51 F14
    Date: 2018–05
    URL: http://d.repec.org/n?u=RePEc:bog:wpaper:246&r=ecm
  3. By: Tassos Magdalinos (University of Southampton, UK; Rimini Centre for Economic Analysis)
    Abstract: The paper examines the effect of conditional heteroskedasticity on least squares inference in stochastic regression models. We show that a regressor signal of exact order O^e_p(n^{1+\alpha}) for arbitrary \alpha > 0 is sufficient to eliminate stationary GARCH effects from the limit distributions of least squares based estimators and self-normalised test statistics. This order dominates the O^e_p(n) signal of stationary regressors but is dominated by the O^e_p(n^2) signal of I(1) regressors. This shows that the invariance of least squares to GARCH effects is not an exclusively I(1) phenomenon but extends to processes with a degree of persistence arbitrarily close to stationarity. When the innovation sequence of the system is a stationary vec-GARCH process, the theory validates standard inference for self-normalised test statistics based on (i) the OLS estimator when \alpha \in (0,1) and (ii) the IVX estimator (Phillips and Magdalinos, 2009; Kostakis, Magdalinos and Stamatogiannis, 2015a) when \alpha > 0. An adjusted version of the IVX testing procedure is also shown to accommodate stationary regressors and to produce standard chi-squared inference under conditional heteroskedasticity in the innovations across the full range \alpha \geq 0.
    Keywords: Central limit theory, Conditional Heteroskedasticity, Mixed Normality, Wald test
    JEL: C22
    Date: 2018–06
    URL: http://d.repec.org/n?u=RePEc:rim:rimwps:18-24&r=ecm
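    Illustration: the IVX estimator mentioned above instruments a persistent regressor x_t with a mildly integrated series built from its own differences: z_t = ρ_z z_{t-1} + Δx_t, with ρ_z = 1 + c_z/n^β, c_z < 0 and β ∈ (0,1) (Phillips and Magdalinos, 2009). The minimal sketch below handles a single regressor. It builds that instrument and a simple IV slope using demeaned y and lagged x. The defaults c_z = -1 and β = 0.95 are illustrative choices, and the function names are hypothetical. The paper's self-normalised Wald test and its adjustment for stationary regressors are not implemented.

```python
import numpy as np

# Illustrative sketch; function names and default parameter values are hypothetical.

def ivx_instrument(x, c_z=-1.0, beta=0.95):
    """Mildly integrated instrument z_t = rho_z * z_{t-1} + (x_t - x_{t-1}), z_0 = 0,
    with rho_z = 1 + c_z / n**beta (c_z < 0, 0 < beta < 1)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    rho_z = 1.0 + c_z / n ** beta
    dx = np.diff(x, prepend=x[0])  # dx[0] = 0, so the recursion starts at z_0 = 0
    z = np.zeros(n)
    for t in range(1, n):
        z[t] = rho_z * z[t - 1] + dx[t]
    return z

def ivx_slope(y, x, c_z=-1.0, beta=0.95):
    """IV estimate of b in y_t = mu + b * x_{t-1} + u_t, instrumenting x_{t-1} with z_{t-1}."""
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    z = ivx_instrument(x, c_z, beta)
    y_t, x_lag, z_lag = y[1:], x[:-1], z[:-1]
    y_d = y_t - y_t.mean()  # demeaning absorbs the intercept mu
    x_d = x_lag - x_lag.mean()
    return float(np.sum(y_d * z_lag) / np.sum(x_d * z_lag))
```

    Unrolling the recursion gives z_t = Σ_{j=1}^{t} ρ_z^{t-j} Δx_j. The loop form is used because it costs O(n) rather than O(n^2).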
  4. By: Stelios Arvanitis (Athens University of Economics and Business, Greece); Tassos Magdalinos (University of Southampton, UK; Rimini Centre for Economic Analysis)
    Abstract: A limit theory is developed for mildly explosive autoregressions under stationary (weakly or strongly dependent) conditionally heteroskedastic errors. The conditional variance process is allowed to be a stationary, integrable mixingale, thus encompassing general classes of GARCH-type or stochastic volatility models. Neither mixing conditions nor moments of order higher than 2 are assumed for the innovation process. As in Magdalinos (2012), we find that the memory of the innovation process affects the asymptotic behaviour of the sample moments: it changes the form of the limiting distribution and, under long range dependence, the rate of convergence. Conditional heteroskedasticity, by contrast, affects only the asymptotic variance. These effects cancel out in least squares regression theory, so the Cauchy limit theory of Phillips and Magdalinos (2007a) remains invariant to a wide class of stationary conditionally heteroskedastic innovation processes.
    Keywords: Central limit theory, Explosive autoregression, Long Memory, Conditional heteroskedasticity, GARCH, mixingale, Cauchy distribution
    Date: 2018–06
    URL: http://d.repec.org/n?u=RePEc:rim:rimwps:18-25&r=ecm
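    Illustration: a small Monte Carlo can make the invariance claim concrete. Under Phillips and Magdalinos (2007a), with ρ_n = 1 + c/n^α (c > 0, α ∈ (0,1)), the normalised least squares error ρ_n^n/(ρ_n^2 - 1)·(ρ̂_n - ρ_n) is asymptotically standard Cauchy. The sketch below is not from the paper. It simulates that statistic with Gaussian GARCH(1,1) innovations. All parameter values (ω = 0.1, a = 0.1, b = 0.8, c = 1, α = 0.7, x_0 = 0, n = 500) and function names are hypothetical choices.

```python
import numpy as np

# Illustrative sketch; function names and all parameter values are hypothetical.

def garch_innovations(rng, reps, n, omega=0.1, a=0.1, b=0.8):
    """Draw `reps` independent GARCH(1,1) paths e_t = s_t * eta_t with eta_t ~ N(0, 1).

    s2_t = omega + a * e_{t-1}**2 + b * s2_{t-1}, started at the unconditional
    variance omega / (1 - a - b), which is finite because a + b < 1.
    """
    s2 = np.full(reps, omega / (1.0 - a - b))
    e = np.empty((reps, n))
    for t in range(n):
        e[:, t] = np.sqrt(s2) * rng.standard_normal(reps)
        s2 = omega + a * e[:, t] ** 2 + b * s2
    return e

def normalised_ls_error(reps=2000, n=500, c=1.0, alpha=0.7, seed=0):
    """Simulate x_t = rho_n * x_{t-1} + e_t with rho_n = 1 + c / n**alpha and x_0 = 0,
    and return rho_n**n / (rho_n**2 - 1) * (rho_hat - rho_n) for each replication."""
    rng = np.random.default_rng(seed)
    rho = 1.0 + c / n ** alpha
    e = garch_innovations(rng, reps, n)
    x = np.zeros((reps, n + 1))
    for t in range(1, n + 1):
        x[:, t] = rho * x[:, t - 1] + e[:, t - 1]
    # Least squares estimate of rho (no intercept), one per replication.
    rho_hat = np.sum(x[:, 1:] * x[:, :-1], axis=1) / np.sum(x[:, :-1] ** 2, axis=1)
    return rho ** n / (rho ** 2 - 1.0) * (rho_hat - rho)
```

    The standard Cauchy distribution has quartiles at ±1. So if the asymptotic approximation is adequate at this sample size, the median of the absolute simulated values should be close to 1. Replacing the GARCH draws with i.i.d. normal errors gives a point of comparison.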
  5. By: Mette Asmild (Department of Food and Resource Economics, University of Copenhagen); Dorte Kronborg (Center for Statistics, Department of Finance, Copenhagen Business School); Anders Rønn-Nielsen (Center for Statistics, Department of Finance, Copenhagen Business School)
    Abstract: Inference about productivity change over time based on data envelopment analysis (DEA) has focused primarily on the Malmquist index and relies on asymptotic properties of the index. In this paper we propose a novel set of significance tests for DEA-based productivity change measures. The tests are based on permutations and account for the inherent correlations when panel data are observed. They are easily implementable and give exact significance probabilities, as they do not rely on asymptotic properties. Tests are formulated for the geometric means of the Malmquist index and of its components, i.e. the frontier shift index and the efficiency change index. Together, these tests enable analysis not only of the presence of differences but also of whether the productivity change is due to shifts in the frontiers and/or changes in the efficiency distributions. Simulation results show the power of the proposed tests and suggest how to interpret their results. Finally, the tests are illustrated using a data set from the literature.
    Keywords: Malmquist index, frontier shift, efficiency change, Data Envelopment Analysis (DEA), panel data, permutation tests, inference.
    JEL: C12 C14 C44 C46 C61 D24
    Date: 2018–06
    URL: http://d.repec.org/n?u=RePEc:foi:wpaper:2018_07&r=ecm
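    Illustration: the idea behind an exact permutation test can be shown with a much simpler stand-in than the paper's procedure. The sketch below tests whether the geometric mean of unit-level Malmquist indices equals 1 by flipping the signs of the log indices. It assumes the indices are already computed, independent across units and, under the null, symmetric about 1 on the log scale. It does not re-estimate DEA frontiers, treat the frontier shift and efficiency change components, or model the panel correlations that the paper accounts for. The function name is hypothetical.

```python
import itertools
import math

# Illustrative stand-in, not the paper's test; the function name is hypothetical.

def sign_flip_test(indices):
    """Exact two-sided sign-flip test of H0: geometric mean of the indices equals 1.

    Assumes the log indices are independent across units and symmetric about 0
    under H0, so every pattern of sign flips is equally likely. All 2**n patterns
    are enumerated, so this is only practical for roughly n <= 20 units.
    Returns (geometric mean, p-value).
    """
    logs = [math.log(m) for m in indices]
    observed = abs(sum(logs))
    extreme = 0
    patterns = 0
    for signs in itertools.product((1.0, -1.0), repeat=len(logs)):
        flipped = abs(sum(s * v for s, v in zip(signs, logs)))
        if flipped >= observed - 1e-12:  # small tolerance so rounding does not break ties
            extreme += 1
        patterns += 1
    geometric_mean = math.exp(sum(logs) / len(logs))
    return geometric_mean, extreme / patterns
```

    For six units that all show productivity growth, only the observed pattern and its full mirror image are as extreme. The exact p-value is therefore 2/64 ≈ 0.031, with no asymptotic approximation involved.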
  6. By: Yoshihiro Yajima; Yasumasa Matsuda
    Abstract: We propose a new estimator for a semiparametric two-dimensional intrinsically stationary random field model observed on a regular grid and derive its asymptotic properties. This random field is nonstationary and includes the fractional Brownian field, a Gaussian random field used to model many physical processes in space. We first calculate tapered bivariate discrete Fourier transforms and periodograms of the data observed on the grid. We then apply a log-periodogram regression, which was originally proposed to estimate the long-memory parameter of semiparametric time series models. We prove that, for a nonstationary two-dimensional random field, the estimator is still consistent and has a limiting normal distribution as the sample size goes to infinity. We conduct a simulation study to compare its performance with that of estimators proposed by other authors.
    Date: 2018–05
    URL: http://d.repec.org/n?u=RePEc:toh:dssraa:85&r=ecm
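    Illustration: the log-periodogram regression that the paper adapts to two dimensions was originally a time-series tool. A minimal one-dimensional sketch, without the tapering the paper uses, regresses the log periodogram at the first m Fourier frequencies λ_j = 2πj/n on -2 log(2 sin(λ_j/2)). The slope estimates the memory parameter d. The bandwidth m = ⌊n^0.5⌋ is an illustrative choice, and the function name is hypothetical.

```python
import numpy as np

# Illustrative one-dimensional, untapered sketch; the function name and bandwidth are hypothetical.

def gph_estimate(x, power=0.5):
    """Log-periodogram estimate of the memory parameter d of a time series.

    Regresses log I(lambda_j) on -2 * log(2 * sin(lambda_j / 2)) for the first
    m = floor(n**power) Fourier frequencies; the fitted slope estimates d.
    """
    x = np.asarray(x, dtype=float)
    n = len(x)
    m = int(n ** power)
    j = np.arange(1, m + 1)
    lam = 2.0 * np.pi * j / n
    dft = np.fft.fft(x - x.mean())[1 : m + 1]  # skip frequency 0
    periodogram = np.abs(dft) ** 2 / (2.0 * np.pi * n)
    regressor = -2.0 * np.log(2.0 * np.sin(lam / 2.0))
    slope, _intercept = np.polyfit(regressor, np.log(periodogram), 1)
    return float(slope)
```

    For white noise the estimate should be near 0. For a random walk (the cumulative sum of white noise) it should be roughly 1.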
  7. By: Thiyanga S Talagala; Rob J Hyndman; George Athanasopoulos
    Abstract: A crucial task in time series forecasting is the identification of the most suitable forecasting method. We present a general framework for forecast-model selection using meta-learning. A random forest is used to identify the best forecasting method using only time series features. The framework is evaluated using time series from the M1 and M3 competitions and is shown to yield accurate forecasts comparable to several benchmarks and other commonly used automated approaches of time series forecasting. A key advantage of our proposed framework is that the time-consuming process of building a classifier is handled in advance of the forecasting task at hand.
    Keywords: FFORMS (Feature-based FORecast-model Selection), time series features, random forest, algorithm selection problem, classification.
    JEL: C10 C14 C22
    Date: 2018
    URL: http://d.repec.org/n?u=RePEc:msh:ebswps:2018-6&r=ecm
  8. By: Cassim, Lucius
    Abstract: Volatility modeling has recently been a very active and extensive research area in empirical finance and time series econometrics, for both academics and practitioners. GARCH models have been the most widely used in this regard. However, GARCH models have been found to have serious empirical limitations, including, but not limited to, failure to take into account the leverage effect in financial asset returns. Many models have therefore been proposed to address this limitation, two of which are the EGARCH and TARCH models. The EGARCH model is the most widely used. It nevertheless has limitations of its own, including, but not limited to, the following: stability conditions in general, and the existence of unconditional moments in particular, depend on the conditional density; it fails to capture the leverage effect when the parameters have the same sign; it assumes independent innovations; and its estimators lack an asymptotic theory. This paper therefore aims to extend the EGARCH model by taking these empirical limitations into account. Its main objective is to develop a volatility model that solves the problems faced by the exponential GARCH model. Using quasi-maximum likelihood estimation coupled with martingale techniques, and relaxing the independence assumption on the innovations, the paper shows that the proposed asymmetric volatility model provides estimators that are not only strongly consistent but also asymptotically efficient.
    Keywords: GARCH, TARCH, EGARCH, Quasi Maximum Likelihood Estimation, Martingale
    JEL: C58
    Date: 2018–05–05
    URL: http://d.repec.org/n?u=RePEc:pra:mprapa:86615&r=ecm
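    Illustration: the leverage effect that EGARCH was designed to capture is visible in its log-variance recursion. The sketch below uses one common parametrization of Nelson's EGARCH(1,1): ln σ²_t = ω + β ln σ²_{t-1} + α(|z_{t-1}| - E|z|) + γ z_{t-1}, with z_{t-1} = r_{t-1}/σ_{t-1} and standard normal z, so that E|z| = √(2/π). This is the baseline model, not the extension proposed in the paper, and the function name is hypothetical.

```python
import math

# Baseline EGARCH(1,1) filter under one common parametrization; the function name is hypothetical.

def egarch_log_variance(returns, omega, alpha, gamma, beta, log_var0):
    """Return ln s2_t for t = 0..len(returns), starting from log_var0.

    ln s2_t = omega + beta * ln s2_{t-1} + alpha * (|z_{t-1}| - E|z|) + gamma * z_{t-1},
    where z_{t-1} = r_{t-1} / s_{t-1} and E|z| = sqrt(2/pi) for standard normal z.
    """
    mean_abs_z = math.sqrt(2.0 / math.pi)
    log_var = [log_var0]
    for r in returns:
        z = r / math.exp(0.5 * log_var[-1])  # standardized return
        log_var.append(omega + beta * log_var[-1] + alpha * (abs(z) - mean_abs_z) + gamma * z)
    return log_var
```

    With γ < 0, a negative return raises the next log variance more than a positive return of the same size. For example, starting from a variance of 1 with returns of -2 and +2, the two outcomes differ by -4γ.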
  9. By: Elena Goldman; Xiangjin Shen
    Abstract: We explore properties of asymmetric generalized autoregressive conditional heteroscedasticity (GARCH) models in the threshold GARCH (GTARCH) family and propose a more general Spline-GTARCH model. This model captures high-frequency return volatility, low-frequency macroeconomic volatility, and an asymmetric response to past negative news in both the autoregressive conditional heteroscedasticity (ARCH) and GARCH terms. Based on maximum likelihood estimation of S&P 500 returns, S&P/TSX returns and a Monte Carlo numerical example, we find that the proposed, more general asymmetric volatility model has a better fit, higher persistence of negative news, a higher degree of risk aversion and significant effects of macroeconomic variables on the low-frequency volatility component. We then apply a variety of volatility models in setting initial margin requirements for a central clearing counterparty (CCP). Finally, we show how to mitigate procyclicality of initial margins using a three-regime threshold autoregressive model.
    Keywords: Econometric and statistical models; Payment clearing and settlement systems
    JEL: E41 C31 C36
    Date: 2018
    URL: http://d.repec.org/n?u=RePEc:bca:bocawp:18-21&r=ecm
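    Illustration: in threshold GARCH models, the asymmetric response to negative news comes from an indicator on the sign of the previous return. The minimal sketch below assumes a GJR-type GARCH(1,1) recursion with an optional extra coefficient δ that lets negative news also shift the GARCH term. This is an illustrative choice loosely mirroring the abstract's description, not the paper's Spline-GTARCH model, whose low-frequency spline component is omitted. The function name is hypothetical.

```python
# Illustrative threshold GARCH(1,1) recursion; the function name and the delta term are hypothetical.

def threshold_garch_variance(returns, omega, alpha, gamma, beta, delta=0.0, var0=1.0):
    """Return conditional variances s2_t for t = 0..len(returns), starting from var0.

    s2_t = omega + (alpha + gamma * I_{t-1}) * r_{t-1}**2 + (beta + delta * I_{t-1}) * s2_{t-1},
    where I_{t-1} = 1 if r_{t-1} < 0, else 0. With delta = 0 this is GJR-GARCH(1,1).
    """
    variances = [var0]
    for r in returns:
        neg = 1.0 if r < 0 else 0.0  # negative-news indicator
        variances.append(omega + (alpha + gamma * neg) * r * r + (beta + delta * neg) * variances[-1])
    return variances
```

    With γ > 0 (and δ ≥ 0), a negative return raises next period's variance more than a positive return of the same size.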
  10. By: Brown, Jason (Federal Reserve Bank of Kansas City); Lambert, Dayton; Wojan, Timothy R.
    Abstract: This article suggests two methods for deriving a statistical verdict from a null finding, allowing economists to conclude more confidently when “not significant” can in fact be interpreted as “no substantive effect.” The proposed methodology can be extended to a variety of empirical contexts where size and power matter. The method is demonstrated using the Economic Research Service's 2004 Report to Congress, which was charged with statistically identifying any unintended negative employment consequences of the Conservation Reserve Program (the Program). The report failed to identify a statistically significant negative long-term effect of the Program on employment growth. However, the authors correctly cautioned that the verdict of “no negative employment effect” was only valid if the econometric test was statistically powerful. We replicate the 2004 analysis and use new methods of statistical inference to resolve the two critical deficiencies that preclude estimation of statistical power by economists: 1) positing a compelling effect size, and 2) providing an estimate of the variability of an unobserved alternative distribution using simulation methods. We conclude that the test used in the report had high power for detecting employment effects of -1 percent or lower resulting from the Program, equivalent to job losses reducing a conservative estimate of environmental benefits by a third.
    Keywords: Power analysis; Monte Carlo simulation; Hypothesis testing
    JEL: C12 Q42 R11
    Date: 2018–05–01
    URL: http://d.repec.org/n?u=RePEc:fip:fedkrw:rwp18-04&r=ecm
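    Illustration: a simulation-based power analysis works in three steps. Posit an effect size and the variability under the alternative, simulate many data sets, and count how often the test rejects. The sketch below does this for a two-sided z-test of a zero mean with known standard deviation, and checks the result against the closed-form power. The test, sample size, effect sizes and function names are hypothetical stand-ins, not the model used in the 2004 report or in the paper.

```python
import math
import random
from statistics import NormalDist

# Generic illustration of Monte Carlo power; names and the z-test setting are hypothetical.

def simulated_power(effect, sigma, n, reps=4000, level=0.05, seed=0):
    """Share of `reps` simulated samples (size n, drawn from N(effect, sigma**2)) in which
    a two-sided z-test of H0: mean = 0 with known sigma rejects at `level`."""
    rng = random.Random(seed)
    crit = NormalDist().inv_cdf(1.0 - level / 2.0)
    rejections = 0
    for _ in range(reps):
        mean = sum(rng.gauss(effect, sigma) for _ in range(n)) / n
        if abs(mean) / (sigma / math.sqrt(n)) > crit:
            rejections += 1
    return rejections / reps

def analytic_power(effect, sigma, n, level=0.05):
    """Closed-form power of the same z-test, used to check the simulation."""
    z = NormalDist()
    crit = z.inv_cdf(1.0 - level / 2.0)
    shift = effect * math.sqrt(n) / sigma
    return z.cdf(-crit + shift) + z.cdf(-crit - shift)
```

    At effect = 0 the simulated rejection rate should be close to the nominal 5%, and power should rise with the posited effect size. In applied work, the known-σ assumption is replaced by an estimate of the variability under the alternative, which is the second deficiency the abstract addresses.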
  11. By: Casey B. Mulligan
    Abstract: When combined with the logical notion of partially interpreted functions, many nonparametric results in econometrics and statistics can be understood as statements about semi-algebraic sets. Tarski’s quantifier elimination (QE) theorem therefore guarantees that a universal algorithm exists for deducing such results from their assumptions. This paper presents the general framework and then applies QE algorithms to Jensen’s inequality, omitted variable bias, partial identification of the classical measurement error model, point identification in discrete choice models, and comparative statics in the nonparametric Roy model. This paper also discusses the computational complexity of real QE and its implementation in software used for program verification, logic, and computer algebra. I expect that automation will become as routine for abstract econometric reasoning as it already is for numerical matrix inversion.
    JEL: C01 C63 C65
    Date: 2018–05
    URL: http://d.repec.org/n?u=RePEc:nbr:nberwo:24601&r=ecm
  12. By: Steele, Fiona; Clarke, Paul; Kuha, Jouni
    Abstract: Household panel data provide valuable information about the extent of similarity in coresidents' attitudes and behaviours. However, existing analysis approaches do not allow for the complex association structures that arise due to changes in household composition over time. We propose a flexible marginal modeling approach where the changing correlation structure between individuals is modeled directly and the parameters estimated using second-order generalized estimating equations (GEE2). A key component of our correlation model specification is the 'superhousehold', a form of social network in which pairs of observations from different individuals are connected (directly or indirectly) by coresidence. These superhouseholds partition observations into clusters with nonstandard and highly variable correlation structures. We thus conduct a simulation study to evaluate the accuracy and stability of GEE2 for these models. Our approach is then applied in an analysis of individuals' attitudes towards gender roles using British Household Panel Survey data. We find strong evidence of between-individual correlation before, during and after coresidence, with large differences among spouses, parent-child, other family, and unrelated pairs. Our results suggest that these dependencies are due to a combination of non-random sorting and causal effects of coresidence.
    Keywords: household effects; household correlation; longitudinal households; homophily; multiple membership multilevel model; marginal model; generalised estimating equations
    JEL: C1
    Date: 2018–05–29
    URL: http://d.repec.org/n?u=RePEc:ehl:lserod:88162&r=ecm
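    Illustration: a "superhousehold", as described above, groups individuals linked directly or indirectly by coresidence. Finding these groups is a connected-components problem. The sketch below assumes the panel is available as (person, household, wave) records with comparable person identifiers, and finds the groups with a union-find structure. The record layout and function name are hypothetical, and the GEE2 estimation itself is not shown.

```python
from collections import defaultdict

# Illustrative sketch; the record layout and function name are hypothetical.

def superhouseholds(records):
    """Map each person to a superhousehold label (the smallest person id in the group).

    records: iterable of (person_id, household_id, wave). Two people are directly
    linked if they share a household in the same wave; links are followed transitively.
    """
    parent = {}

    def find(p):
        # Find the root of p's group, halving the path as we go.
        while parent[p] != p:
            parent[p] = parent[parent[p]]
            p = parent[p]
        return p

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            if rb < ra:  # keep the smaller id as root so labels are deterministic
                ra, rb = rb, ra
            parent[rb] = ra

    members = defaultdict(list)
    for person, household, wave in records:
        parent.setdefault(person, person)
        members[(household, wave)].append(person)
    for people in members.values():
        for other in people[1:]:
            union(people[0], other)
    return {p: find(p) for p in parent}
```

    Consider superhouseholds([(1, "A", 1), (2, "A", 1), (3, "B", 1), (2, "C", 2), (3, "C", 2), (4, "D", 2)]). It places persons 1, 2 and 3 in one group labelled 1 and leaves person 4 alone. Persons 1 and 3 never coreside, but both coreside with person 2 in different waves.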
  13. By: Mynbayev, Kairat; Martins-Filho, Carlos
    Abstract: Kernel density estimation in domains with boundaries is known to suffer from undesirable boundary effects. We show that, for smooth densities, a general and elegant approach is to estimate an extension of the density. The resulting estimators in domains with boundaries have biases and variances expressed in terms of density extensions and extension parameters, and as a result they achieve the same rates at boundary and interior points of the domain. Contrary to the extant literature, our estimators require no kernel modification near the boundary, and kernels commonly used for estimation on the real line can be applied. Densities defined on the half-axis and on the unit interval are considered. The results are applied to the estimation of densities that are discontinuous or have discontinuous derivatives, where they yield the same rates of convergence as for smooth densities on R.
    Keywords: Nonparametric density estimation; Hestenes’ extension; estimation in bounded domains; estimation of discontinuous densities
    JEL: C14
    Date: 2017–07
    URL: http://d.repec.org/n?u=RePEc:pra:mprapa:87044&r=ecm
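    Illustration: the idea of estimating an extension can be sketched for densities on [0, ∞). Hestenes' extension continues f to negative arguments as f(t) = Σ_i c_i f(-λ_i t), with Σ_i c_i (-λ_i)^j = 1 for j = 0, ..., k, so that derivatives up to order k match at 0. Smoothing the extended density with an ordinary kernel and changing variables gives the estimator in the docstring below. This is our own derivation under those assumptions, using a Gaussian kernel. It is not necessarily the authors' exact estimator, and the function names are hypothetical.

```python
import math

# Illustrative derivation-based sketch; function names and the Gaussian kernel are hypothetical choices.

def gaussian_kernel(u):
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def extension_kde(x, data, h, coeffs=((1.0, 1.0),)):
    """Kernel density estimate at x >= 0 for data on [0, inf), via an extension of f.

    With f(t) = sum_i c_i * f(-lam_i * t) for t < 0 and coeffs = ((c_i, lam_i), ...),
    smoothing the extended density gives
    (1/(n h)) * sum_j [K((x - X_j)/h) + sum_i (c_i/lam_i) * K((x + X_j/lam_i)/h)].
    """
    total = 0.0
    for xj in data:
        total += gaussian_kernel((x - xj) / h)  # ordinary kernel term
        for c, lam in coeffs:
            total += (c / lam) * gaussian_kernel((x + xj / lam) / h)  # mass from the extension
    return total / (len(data) * h)
```

    The default coefficients ((1, 1),) reduce to plain reflection (k = 0), and ((3, 1), (-2, 2)) satisfies the conditions for k = 1. With no extension term (coeffs=()), the estimate at the boundary is roughly halved for a smooth density, which is the boundary effect described in the abstract.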
  14. By: Endres, Sylvia; Stübinger, Johannes
    Abstract: This paper develops a regime classification algorithm and applies it within a full-fledged pairs trading framework on minute-by-minute data of the S&P 500 constituents from 1998 to 2015. Specifically, the highly flexible algorithm automatically determines the number of regimes for any stochastic process and provides a complete set of parameter estimates. We demonstrate its performance in a simulation study, in which the algorithm achieves promising results for the general class of Lévy-driven Ornstein-Uhlenbeck processes with regime switches. In our empirical back-testing study, we apply our regime classification algorithm to propose a high-frequency pair selection and trading strategy. The results show statistically and economically significant returns, with an annualized Sharpe ratio of 3.92 after transaction costs; results remain stable even in recent years. We compare our strategy with existing quantitative trading frameworks and find its results to be superior in terms of risk and return characteristics. The algorithm takes full advantage of its flexibility and identifies various regime patterns over time that are well documented in the literature.
    Keywords: Finance, Pairs trading, Statistical arbitrage, Markov regime switching, Lévy-driven Ornstein-Uhlenbeck process, High-frequency data
    Date: 2018
    URL: http://d.repec.org/n?u=RePEc:zbw:iwqwdp:072018&r=ecm
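    Illustration: the building block of the models studied here is the Ornstein-Uhlenbeck process. The sketch below covers only the single-regime Gaussian case, with no Lévy jumps, no regime switching, and none of the paper's classification algorithm. It simulates an OU path exactly on a regular grid with step Δ. It then recovers (θ, μ, σ) from the AR(1) form x_{t+1} = a + b x_t + e_t, using b = e^{-θΔ} and a = μ(1 - b). Parameter values and function names are hypothetical.

```python
import numpy as np

# Single-regime Gaussian OU illustration; function names and parameter values are hypothetical.

def simulate_ou(theta, mu, sigma, dt, n, x0, seed=0):
    """Exact simulation on a regular grid: x_{t+1} = mu + (x_t - mu) * exp(-theta*dt) + noise,
    where the noise variance is sigma**2 * (1 - exp(-2*theta*dt)) / (2*theta)."""
    rng = np.random.default_rng(seed)
    b = np.exp(-theta * dt)
    sd = sigma * np.sqrt((1.0 - b * b) / (2.0 * theta))
    shocks = rng.standard_normal(n)
    x = np.empty(n)
    x[0] = x0
    for t in range(1, n):
        x[t] = mu + (x[t - 1] - mu) * b + sd * shocks[t]
    return x

def fit_ou(x, dt):
    """Estimate (theta, mu, sigma) by OLS on x_{t+1} = a + b * x_t + e_t.

    Requires a mean-reverting fit, i.e. 0 < b_hat < 1.
    """
    x = np.asarray(x, dtype=float)
    b, a = np.polyfit(x[:-1], x[1:], 1)  # slope first, then intercept
    resid = x[1:] - (a + b * x[:-1])
    theta = -np.log(b) / dt
    mu = a / (1.0 - b)
    sigma = np.sqrt(resid.var() * 2.0 * theta / (1.0 - b * b))
    return float(theta), float(mu), float(sigma)
```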
  15. By: Chia-Lin Chang (National Chung Hsing University); Michael McAleer (Asia University, University of Sydney Business School, EUR); Yu-Ann Wang (National Chung Hsing University)
    Abstract: The purpose of the paper is to examine latent volatility Granger causality for four renewable energy Exchange Traded Funds (ETFs), namely solar (TAN), wind (FAN), water (PIO) and nuclear (NLR), and a crude oil ETF (USO). Data on the renewable energy and crude oil ETFs are from 18 June 2008 to 20 March 2017. From the underlying stochastic process of a vector random coefficient autoregressive (VRCAR) process for the shocks of returns, we derive latent volatility Granger causality from the Diagonal BEKK multivariate conditional volatility model. We follow the definition of co-volatility spillovers of shocks in Chang et al. (2015), which calculates the delayed effect of a returns shock in one asset on the subsequent volatility or co-volatility in another asset. We extend the effects of co-volatility spillovers of shocks to the effects of co-volatility spillovers of squared shocks. The empirical results show significant positive latent volatility Granger causality relationships between the solar (TAN), wind (FAN), nuclear (NLR) and crude oil (USO) ETFs. Specifically, there are significant volatility spillovers of shocks from the solar ETF on the subsequent co-volatility of the wind ETF with the solar ETF, and from the wind ETF on the subsequent co-volatility of the solar ETF with the wind ETF. Interestingly, there are significant volatility spillovers of squared shocks for the renewable energy ETFs, but not for the crude oil ETF.
    Keywords: Renewable Energy; Latent Volatility; Granger Causality; Co-volatility Spillovers; Solar; Wind; Water; Nuclear Power.
    JEL: C32 C58 G12 G15 Q42
    Date: 2018–05–25
    URL: http://d.repec.org/n?u=RePEc:tin:wpaper:20180052&r=ecm
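    Illustration: in a diagonal BEKK(1,1) model, H_t = CC' + A ε_{t-1} ε_{t-1}' A + B H_{t-1} B with diagonal A and B. The covariance H_{ij,t} therefore depends on the lagged shocks only through a_ii a_jj ε_{i,t-1} ε_{j,t-1}. Its derivative with respect to ε_{j,t-1}, namely a_ii a_jj ε_{i,t-1}, is one way to express the delayed effect of a shock in asset j on the subsequent co-volatility of assets i and j. We assume this matches the Chang et al. (2015) definition, which the abstract does not restate. The bivariate sketch below uses hypothetical function names and parameter values, and omits the squared-shock extension.

```python
import numpy as np

# Illustrative diagonal BEKK(1,1) sketch; function names and inputs are hypothetical.

def diagonal_bekk_step(h_prev, eps_prev, c, a, b):
    """One step of H_t = C C' + A e e' A + B H_{t-1} B with A = diag(a), B = diag(b).

    c is a lower-triangular matrix; eps_prev is the previous shock vector e.
    """
    A = np.diag(a)
    B = np.diag(b)
    e = np.asarray(eps_prev, dtype=float).reshape(-1, 1)
    return c @ c.T + A @ e @ e.T @ A + B @ h_prev @ B

def covolatility_spillover(eps_prev, a, i, j):
    """Derivative of H_ij,t with respect to e_j,t-1 for i != j: a_i * a_j * e_i,t-1."""
    return a[i] * a[j] * eps_prev[i]
```

    For example, with a = (0.3, 0.2) and ε_{t-1} = (0.5, -1.0), the spillover of a shock in the second asset on the covariance of the two assets is 0.3 × 0.2 × 0.5 = 0.03.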
  16. By: Miranda, Karen; Martínez Ibáñez, Oscar; Manjón Antolín, Miguel C.
    Abstract: We consider a correlated random effects specification of the spatial Durbin (dynamic) panel model with an error term containing individual effects and their spatial spillovers. We derive the likelihood function of the model and the asymptotic properties of the quasi-maximum likelihood estimator. We also provide illustrative evidence from a growth-initial level equation and the country dataset analysed by Lee and Yu (2016). While largely replicating their estimates, our results indicate the existence of spatial contagion in the individual effects. In particular, estimated spill-in/out effects reveal the existence of groups of countries with common patterns in their spillovers.
    Keywords: correlated random effects, Durbin model, spatial dynamic panel data; Spatial analysis (Statistics), Panel data analysis, 33 - Economics
    JEL: C23
    Date: 2018
    URL: http://d.repec.org/n?u=RePEc:urv:wpaper:2072/313840&r=ecm
  17. By: Andrew Y. Chen; Thomas Zimmermann
    Abstract: We develop an estimator for publication bias and apply it to 156 hedge portfolios based on published cross-sectional return predictors. Publication bias adjusted returns are only 12% smaller than in-sample returns. The small bias comes from the dispersion of returns across predictors, which is too large to be accounted for by data-mined noise. Among predictors that can survive journal review, a low t-stat hurdle of 1.8 controls for multiple testing using statistics recommended by Harvey, Liu, and Zhu (2015). The estimated bias is too small to account for the deterioration in returns after publication, suggesting an important role for mispricing.
    Keywords: Data mining ; Mispricing ; Publication bias ; Stock return anomalies
    JEL: G10 G12
    Date: 2018–05–11
    URL: http://d.repec.org/n?u=RePEc:fip:fedgfe:2018-33&r=ecm
  18. By: Afees A. Salisu (Centre for Econometric and Allied Research, University of Ibadan); Ibrahim D. Raheem (School of Economics, University of Kent, Canterbury, UK)
    Abstract: The study offers a new procedure that helps determine the best distribution prior to modeling stock returns with GARCH-type models. Specifically, it demonstrates that pre-testing the residuals of stock returns for the best distribution can help to identify the appropriate GARCH error distribution regardless of the choice of GARCH-type model. This approach is robust to alternative data frequencies and to different stock markets, such as those of the G7 countries.
    Keywords: Stock returns; GARCH-type models; Error distributions
    JEL: C52 C53 G11 G14 G17
    Date: 2018–06
    URL: http://d.repec.org/n?u=RePEc:cui:wpaper:0057&r=ecm
  19. By: Hélène Bouscasse (GAEL - Laboratoire d'Economie Appliquée de Grenoble - Grenoble INP - Institut polytechnique de Grenoble - Grenoble Institute of Technology - INRA - Institut National de la Recherche Agronomique - CNRS - Centre National de la Recherche Scientifique - UGA - Université Grenoble Alpes)
    Abstract: Mode choice depends on observable characteristics of the transport modes and of the decision maker, but also on unobservable characteristics, known as latent variables. By means of an integrated choice and latent variable (ICLV) model, which combines a structural equation model and a discrete choice model, it is theoretically possible to integrate both types of variables in a psychologically and economically sound mode choice model. Achieving this goal requires a clear position on the four dimensions covered by ICLV models: survey methods, econometrics, psychology and economics. This article presents a comprehensive survey of the ICLV literature applied to mode choice modelling. I review how latent variables are measured and incorporated in ICLV models, how they contribute to explaining mode choice and how they are used to derive economic outputs. The main results are: 1) the latent variables used to explain mode choice are linked to individual mental states, perceptions of transport modes, or actually performed behaviour; 2) the richness of structural equation models still needs to be explored to fully embody the psychological theories explaining mode choice; 3) the integration of latent variables helps to improve our understanding of mode choice and to adapt public policies.
    Keywords: Mode choice, Survey, Integrated choice and latent variable model, Structural equation modelling, Behavioural theories, Economic outputs
    Date: 2018–05–02
    URL: http://d.repec.org/n?u=RePEc:hal:wpaper:hal-01795630&r=ecm
  20. By: Prüfer, Jens (Tilburg University, Center For Economic Research); Prüfer, Patricia (Tilburg University, Center For Economic Research)
    Abstract: To what extent can data science methods, such as machine learning, text analysis, or sentiment analysis, push the research frontier in the social sciences? This essay briefly describes the most prominent data science techniques that lend themselves to analyses of institutional and organizational governance structures. We elaborate on several examples applying data science to analyze legal, political, and social institutions. We also sketch how specific data science techniques can be used to study important research questions that could not (to the same extent) be studied without these techniques. We conclude by comparing the main strengths and limitations of computational social science with traditional empirical research methods and discussing its relation to theory.
    Keywords: data science; machine learning; institutions; text analysis
    JEL: C50 C53 C87 D02 K0
    Date: 2018
    URL: http://d.repec.org/n?u=RePEc:tiu:tiucen:6d04f0fe-0bcd-4cf4-86f6-f2e0a86fa575&r=ecm

This nep-ecm issue is ©2018 by Sune Karlsson. It is provided as is without any express or implied warranty. It may be freely redistributed in whole or in part for any purpose. If distributed in part, please include this notice.
General information on the NEP project can be found at http://nep.repec.org. For comments please write to the director of NEP, Marco Novarese at <director@nep.repec.org>. Put “NEP” in the subject, otherwise your mail may be rejected.
NEP’s infrastructure is sponsored by the School of Economics and Finance of Massey University in New Zealand.