nep-big New Economics Papers
on Big Data
Issue of 2019‒02‒11
eleven papers chosen by
Tom Coupé
University of Canterbury

  1. Taking the Fed at its Word: Direct Estimation of Central Bank Objectives using Text Analytics By Shapiro, Adam Hale; Wilson, Daniel J.
  2. The Welfare Effects of Social Media By Hunt Allcott; Luca Braghieri; Sarah Eichmeyer; Matthew Gentzkow
  3. Investigating Limit Order Book Characteristics for Short Term Price Prediction: a Machine Learning Approach By Faisal I Qureshi
  4. Local cost for global benefit: The case of wind turbines By Frondel, Manuel; Kussel, Gerhard; Sommer, Stephan; Vance, Colin
  5. A Study on Neural Network Architecture Applied to the Prediction of Brazilian Stock Returns By Leonardo Felizardo; Afonso Pinto
  6. It is not only size that matters: How unique is the Estonian e-governance success story? By Stephany, Fabian
  7. Pricing options and computing implied volatilities using neural networks By Shuaiqiang Liu; Cornelis W. Oosterlee; Sander M. Bohte
  8. Factor Investing: Hierarchical Ensemble Learning By Guanhao Feng; Jingyu He
  9. Learning Choice Functions By Karlson Pfannschmidt; Pritha Gupta; Eyke Hüllermeier
  10. lassopack: Model Selection and Prediction with Regularized Regression in Stata By Ahrens, Achim; Hansen, Christian B.; Schaffer, Mark E
  11. Big Data, Data Mining, Machine Learning und Predictive Analytics: Ein konzeptioneller Überblick [Big Data, Data Mining, Machine Learning and Predictive Analytics: A Conceptual Overview] By Brühl, Volker

  1. By: Shapiro, Adam Hale (Federal Reserve Bank of San Francisco); Wilson, Daniel J. (Federal Reserve Bank of San Francisco)
    Abstract: There is an extensive literature that studies optimal monetary policy with an assumed central bank loss function, yet there has been very little study of what central bank preferences are in practice. We directly estimate the Federal Open Market Committee's (FOMC) loss function, including the implicit inflation target, from the tone of the language used in FOMC transcripts, minutes, and members' speeches. Direct estimation is advantageous because it requires no knowledge of the underlying macroeconomic structure nor observation of central bank actions. We find that the FOMC had an implicit inflation target of approximately 1 1/2 percent on average over our baseline 2000-2012 sample period. We also find that the FOMC's loss depends strongly on output growth and stock market performance and less so on their perception of current slack.
    Date: 2019–01–18
  2. By: Hunt Allcott; Luca Braghieri; Sarah Eichmeyer; Matthew Gentzkow
    Abstract: The rise of social media has provoked both optimism about potential societal benefits and concern about harms such as addiction, depression, and political polarization. We present a randomized evaluation of the welfare effects of Facebook, focusing on US users in the run-up to the 2018 midterm election. We measured the willingness-to-accept of 2,844 Facebook users to deactivate their Facebook accounts for four weeks, then randomly assigned a subset to actually do so in a way that we verified. Using a suite of outcomes from both surveys and direct measurement, we show that Facebook deactivation (i) reduced online activity, including other social media, while increasing offline activities such as watching TV alone and socializing with family and friends; (ii) reduced both factual news knowledge and political polarization; (iii) increased subjective well-being; and (iv) caused a large persistent reduction in Facebook use after the experiment. We use participants' pre-experiment and post-experiment Facebook valuations to quantify the extent to which factors such as projection bias might cause people to overvalue Facebook, finding that the magnitude of any such biases is likely minor relative to the large consumer surplus that Facebook generates.
    JEL: D12 D90 I31 L86 O33
    Date: 2019–01
  3. By: Faisal I Qureshi
    Abstract: With the proliferation of algorithmic high-frequency trading in financial markets, the Limit Order Book has generated increased research interest. Research is still at an early stage and there is much we do not understand about the dynamics of Limit Order Books. In this paper, we employ a machine learning approach to investigate Limit Order Book features and their potential to predict short term price movements. This is an initial broad-based investigation that results in some novel observations about LOB dynamics and identifies several promising directions for further research. Furthermore, we obtain prediction results that are significantly superior to a baseline predictor.
    Date: 2018–12
  4. By: Frondel, Manuel; Kussel, Gerhard; Sommer, Stephan; Vance, Colin
    Abstract: Given the rapid expansion of wind power capacities in Germany, this paper estimates the effects of wind turbines on house prices using real estate price data from Germany's leading online broker. Employing a hedonic price model whose specification is informed by machine learning techniques, our methodological approach provides insights into the sources of heterogeneity in treatment effects. We estimate an average treatment effect (ATE) of up to -7.1% for houses within a one-kilometer radius of a wind turbine, an effect that fades to zero at a distance of 8 to 9 km. Old houses and those in rural areas are affected the most, while home prices in urban areas are hardly affected. These results highlight that substantial local externalities are associated with wind power plants.
    Keywords: wind power, hedonic price model
    JEL: Q21 D12 R31
    Date: 2018
  5. By: Leonardo Felizardo; Afonso Pinto
    Abstract: In this paper we present a statistical analysis of the characteristics that we expect to influence the performance of neural networks, in terms of assertiveness, in the prediction of Brazilian stock returns. We created a population of architectures for analysis and extracted the sample that had the best assertive performance. We then verified how the characteristics of this sample stand out and affect the neural networks. In addition, we make inferences about what kind of influence the different architectures have on the performance of neural networks. In the study, we predict the return of a Brazilian stock traded on the São Paulo stock exchange to measure the error committed by the different neural network architectures constructed. The results are promising and indicate that some aspects of the neural network architecture have a significant impact on the assertiveness of the model.
    Date: 2019–01
  6. By: Stephany, Fabian
    Abstract: User data fuel the digital economy, while individual privacy is at stake. Governments react differently to this challenge. Estonia, a small Baltic state, has become a role model for the renewal of the social contract in times of big data. The Estonian example suggests that online governance is most successful in a small state with a young population, trustworthy institutions and a need for technological renewal. This work examines the development of e-governance usage during the last decade in Europe from a comprehensive cross-country perspective: size, age and trust are relevant for the development of digital governance in Europe. However, the quality of past communication infrastructure is not related to e-governance popularity.
    Keywords: E-Governance, Europe, Estonia, Random Model, Trust
    Date: 2018
  7. By: Shuaiqiang Liu; Cornelis W. Oosterlee; Sander M. Bohte
    Abstract: This paper proposes a data-driven approach, by means of an Artificial Neural Network (ANN), to value financial options and to calculate implied volatilities with the aim of accelerating the corresponding numerical methods. With ANNs being universal function approximators, this method trains an optimized ANN on a data set generated by a sophisticated financial model, and runs the trained ANN as an agent of the original solver in a fast and efficient way. We test this approach on three different types of solvers, including the analytic solution for the Black-Scholes equation, the COS method for the Heston stochastic volatility model and Brent's iterative root-finding method for the calculation of implied volatilities. The numerical results show that the ANN solver can reduce the computing time significantly.
    Date: 2019–01
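As a point of reference for two of the solvers named in the abstract above, the following is a minimal sketch of the analytic Black-Scholes price of a European call and of an implied-volatility solver that inverts it. It is not the authors' code. The function names and parameters (`bs_call`, `implied_vol`, the bracket `lo`/`hi`) are hypothetical, and plain bisection is swapped in for Brent's method to keep the example dependency-free.

```python
import math


def norm_cdf(x: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def bs_call(spot: float, strike: float, rate: float,
            sigma: float, maturity: float) -> float:
    """Black-Scholes price of a European call on a non-dividend-paying asset."""
    vol_sqrt_t = sigma * math.sqrt(maturity)
    d1 = (math.log(spot / strike)
          + (rate + 0.5 * sigma ** 2) * maturity) / vol_sqrt_t
    d2 = d1 - vol_sqrt_t
    return spot * norm_cdf(d1) - strike * math.exp(-rate * maturity) * norm_cdf(d2)


def implied_vol(price: float, spot: float, strike: float, rate: float,
                maturity: float, lo: float = 1e-6, hi: float = 5.0,
                tol: float = 1e-10) -> float:
    """Volatility at which bs_call reproduces `price`.

    Uses bisection (a stand-in for Brent's method); it relies on the call
    price being increasing in sigma. The bracket [lo, hi] is an assumption.
    """
    if not bs_call(spot, strike, rate, lo, maturity) <= price \
            <= bs_call(spot, strike, rate, hi, maturity):
        raise ValueError("price is outside the range attainable on [lo, hi]")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if bs_call(spot, strike, rate, mid, maturity) < price:
            lo = mid  # model price too low: volatility must be higher
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

Pricing a call with `bs_call(100.0, 100.0, 0.05, 0.2, 1.0)` and passing the result to `implied_vol` should recover a volatility close to 0.2. An iterative inversion of this kind is the sort of step the paper proposes to replace with a trained network.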
  8. By: Guanhao Feng; Jingyu He
    Abstract: We present a Bayesian hierarchical framework for both cross-sectional and time-series return prediction. Our approach builds on a market-timing predictive system that jointly allows for time-varying coefficients driven by fundamental characteristics. With a Bayesian formulation for ensemble learning, we examine the joint predictability as well as portfolio efficiency via predictive distribution. In the empirical analysis of asset-sector allocation, our hierarchical ensemble learning portfolio achieves 500% cumulative returns in the period 1998-2017, and outperforms most workhorse benchmarks as well as the passive investing index. Our Bayesian inference for model selection identifies useful macro predictors (long-term yield, inflation, and stock market variance) and asset characteristics (dividend yield, accrual, and gross profit). Using the selected model for predicting sector evolution, an equally weighted long-short portfolio on winners over losers achieves a 46% Sharpe ratio with a significant Jensen's alpha. Finally, we explore an underexploited connection between classical Bayesian forecasting and modern ensemble learning.
    Date: 2019–02
  9. By: Karlson Pfannschmidt; Pritha Gupta; Eyke Hüllermeier
    Abstract: We study the problem of learning choice functions, which play an important role in various domains of application, most notably in the field of economics. Formally, a choice function is a mapping from sets to sets: Given a set of choice alternatives as input, a choice function identifies a subset of most preferred elements. Learning choice functions from suitable training data comes with a number of challenges. For example, the sets provided as input and the subsets produced as output can be of any size. Moreover, since the order in which alternatives are presented is irrelevant, a choice function should be symmetric. Perhaps most importantly, choice functions are naturally context-dependent, in the sense that the preference in favor of an alternative may depend on what other options are available. We formalize the problem of learning choice functions and present two general approaches based on two representations of context-dependent utility functions. Both approaches are instantiated by means of appropriate neural network architectures, and their performance is demonstrated on suitable benchmark tasks.
    Date: 2019–01
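To make the definition in the abstract above concrete, here is a small sketch of a choice function as a mapping from a set of alternatives to a subset of it. It is only an illustration under stated assumptions, not the authors' model: the name `choose`, the pairwise scoring function `pairwise` and the `threshold` are all hypothetical. Each alternative is scored by its average pairwise score against the other available options, so whether it is chosen depends on the context, and because the input is treated as a set, presentation order cannot matter.

```python
from typing import Callable, FrozenSet, Hashable, Iterable


def choose(alternatives: Iterable[Hashable],
           pairwise: Callable[[Hashable, Hashable], float],
           threshold: float = 0.5) -> FrozenSet[Hashable]:
    """Return the subset of alternatives whose context-dependent score
    exceeds `threshold`.

    The score of x is the mean of pairwise(x, y) over all other options y
    in the same set; a lone alternative is always chosen.
    """
    options = frozenset(alternatives)  # discard order and duplicates
    chosen = set()
    for x in options:
        others = options - {x}
        if others:
            score = sum(pairwise(x, y) for y in others) / len(others)
        else:
            score = 1.0  # nothing to compare against
        if score > threshold:
            chosen.add(x)
    return frozenset(chosen)
```

With a hypothetical `pairwise` that returns 1.0 when `x > y` and 0.0 otherwise, the option 2 is chosen from `[1, 2]` but not from `[1, 2, 3]`, which illustrates how the preference for an alternative can depend on what else is available.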
  10. By: Ahrens, Achim (Economic and Social Research Institute, Dublin); Hansen, Christian B. (University of Chicago); Schaffer, Mark E (Heriot-Watt University, Edinburgh)
    Abstract: This article introduces lassopack, a suite of programs for regularized regression in Stata. lassopack implements lasso, square-root lasso, elastic net, ridge regression, adaptive lasso and post-estimation OLS. The methods are suitable for the high-dimensional setting where the number of predictors p may be large and possibly greater than the number of observations, n. We offer three different approaches for selecting the penalization ('tuning') parameters: information criteria (implemented in lasso2), K-fold cross-validation and h-step ahead rolling cross-validation for cross-section, panel and time-series data (cvlasso), and theory-driven ('rigorous') penalization for the lasso and square-root lasso for cross-section and panel data (rlasso). We discuss the theoretical framework and practical considerations for each approach. We also present Monte Carlo results to compare the performance of the penalization approaches.
    Keywords: lasso2, cvlasso, rlasso, lasso, elastic net, square-root lasso, cross-validation
    JEL: C53 C87
    Date: 2019–01
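For readers unfamiliar with the lasso estimator mentioned in the abstract above, the following is a minimal sketch of lasso regression fitted by cyclic coordinate descent with soft-thresholding. It is written in Python for illustration and is not lassopack's implementation or syntax. The objective scaling, (1/(2n)) * RSS + lam * sum(|beta_j|), the absence of an intercept (data assumed centered) and the names `soft_threshold` and `lasso_cd` are all assumptions of this sketch.

```python
from typing import List


def soft_threshold(z: float, gamma: float) -> float:
    """Shrink z toward zero by gamma; values within [-gamma, gamma] become 0."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def lasso_cd(X: List[List[float]], y: List[float], lam: float,
             n_iter: int = 200) -> List[float]:
    """Minimise (1/(2n)) * sum((y - X beta)^2) + lam * sum(|beta_j|).

    Assumes centered data (no intercept) and no all-zero column in X.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            rho = 0.0  # correlation of column j with the partial residual
            zj = 0.0   # squared norm of column j
            for i in range(n):
                fit_others = sum(X[i][k] * beta[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - fit_others)
                zj += X[i][j] ** 2
            # Closed-form update for one coordinate, others held fixed.
            beta[j] = soft_threshold(rho / n, lam) / (zj / n)
    return beta
```

Setting `lam = 0.0` reduces each update to the least-squares coordinate step, while larger values of `lam` set more coefficients exactly to zero. Choosing that penalty level is the task the abstract assigns to information criteria, cross-validation or theory-driven penalization.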
  11. By: Brühl, Volker
    Abstract: With the advancing digitalization of the economy and society, big data analytics, machine learning and artificial intelligence are becoming increasingly important for the analysis and forecasting of economic trends. In economic policy discussions, however, these terms are often used without a clear distinction being drawn between the individual methods and disciplines. This paper therefore provides a conceptual overview of the commonalities, differences and interdependencies of the many terms used in the field of data science. For decision-makers in business and politics in particular, a basic classification of the concepts can facilitate a well-informed discussion of policy choices.
    JEL: A10 C10 D80
    Date: 2019

This nep-big issue is ©2019 by Tom Coupé. It is provided as is without any express or implied warranty. It may be freely redistributed in whole or in part for any purpose. If distributed in part, please include this notice.
For comments please write to the director of NEP, Marco Novarese. Put “NEP” in the subject, otherwise your mail may be rejected.
NEP’s infrastructure is sponsored by the School of Economics and Finance of Massey University in New Zealand.