nep-big New Economics Papers
on Big Data
Issue of 2018‒10‒01
fifteen papers chosen by
Tom Coupé
University of Canterbury

  1. Illuminating Indigenous Economic Development By Donna Feir; Rob Gillezeau; Maggie Jones
  2. Media based sentiment indices as an alternative measure of consumer confidence By Nicolaas Johannes Odendaal; Monique Reid
  3. Predicting Credit Demand with ARMS: A Machine Learning Approach By Ifft, Jennifer E.; Kuhns, Ryan; Patrick, Kevin T.
  4. What are the Culprits Causing Obesity? A Machine Learning Approach in Variable Selection and Parameter Coefficient Inference By Zhu, Manhong; Schmitz, Andrew; Schmitz, Troy G.
  5. Efficient Difference-in-Differences Estimation with High-Dimensional Common Trend Confounding By Michael Zimmert
  6. Determinants of Corporate Failure: The Case of the Johannesburg Stock Exchange By Mabe, Queen Magadi; Lin, Wei
  7. The consequences of cyclone and seasonal crop risks for wealth and technology adoption in rural Mozambique By Larson, D.
  8. Topological recognition of critical transitions in time series of cryptocurrencies By Marian Gidea; Daniel Goldsmith; Yuri Katz; Pablo Roldan; Yonah Shmalo
  9. Artificial Neural Network Based Chaotic Generator Design for The Prediction of Financial Time Series By Lei Zhang
  10. A proof that artificial neural networks overcome the curse of dimensionality in the numerical approximation of Black-Scholes partial differential equations By Philipp Grohs; Fabian Hornung; Arnulf Jentzen; Philippe von Wurstemberger
  11. Valid Simultaneous Inference in High-Dimensional Settings (with the hdm package for R) By Philipp Bach; Victor Chernozhukov; Martin Spindler
  12. Deep Reinforcement Learning in High Frequency Trading By Prakhar Ganesh; Puneet Rakheja
  13. The NEU Meta-Algorithm for Geometric Learning with Applications in Finance By Anastasis Kratsios; Cody B. Hyndman
  14. Semiparametric Panel Data Using Neural Networks By Crane-Droesch, Andrew
  15. Study on Management and Utilization of Data Generated from Industry (Japanese) By WATANABE Toshiya; HIRAI Yuri; AKUTSU Masami; HIOKI Tomomi; NAGAI Norihito

  1. By: Donna Feir (Department of Economics, University of Victoria); Rob Gillezeau (Department of Economics, University of Victoria); Maggie Jones (Department of Economics, University of Victoria)
    Abstract: There are over 1,000 First Nations and Inuit communities in Canada. Only 357 of these communities are consistently included in the most comprehensive public data source on economic activity, the Community Well-Being (CWB) Database. We propose using nighttime light density measured by satellites as an alternative indicator of well-being. We show that nighttime light density is an effective proxy for per capita income in the Canadian context and provide evidence that existing publicly available databases on well-being consist of heavily selected samples that systematically exclude many of the least developed communities. We show that sample selection into the publicly available data can lead to incorrect conclusions based on three applications: (i) the comparison of well-being across community types over time; (ii) an analysis of the historical and geographic determinants of economic activity in Indigenous communities; and (iii) a study of the effects of mining intensity close to Indigenous communities. Based on these applications, we suggest that using nighttime light density overcomes the biased selection of communities into the publicly available samples and, thus, may present a more complete picture of economic activity in Canada for Indigenous peoples.
    Keywords: light density, nighttime light density, Indigenous peoples, economic development, community well-being index
    JEL: I15 J15 J24
    Date: 2018–09–19
  2. By: Nicolaas Johannes Odendaal (Department of Economics and Bureau of Economic Research, Stellenbosch University); Monique Reid (Department of Economics, Stellenbosch University)
    Abstract: The world is currently generating data at an unprecedented rate. Embracing the data revolution, case studies on the construction of alternative consumer confidence indices using large text datasets have started to make their way into the academic literature. These 'sentiment indices' are constructed using text-based analysis, a subfield within computational linguistics. In this paper we consider the feasibility of constructing online sentiment indices using large amounts of media data as an alternative to the conventional survey method in South Africa. A clustering framework is adopted to provide an indication of feasible candidate sentiment indices that best reflect the traditional survey-based consumer confidence index conducted by the BER. The results indicate that the best candidate indices are linked to a single data source with a focus on using specialised financial dictionaries. Finally, composite indices for consumer confidence are constructed using Principal Component Analysis. The resulting indices' high correlation with the traditional consumer confidence index provides motivation for using media data sources to track consumer confidence within an emerging market such as South Africa using sentiment-based techniques.
    Keywords: Big Data, Sentiment Analysis, Consumer Confidence
    JEL: B41 C52 C83
    Date: 2018
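As a rough illustration of the last step described in the abstract of item 2, a composite index can be taken as the first principal component of several standardized candidate indices. The sketch below uses synthetic data and a hypothetical helper, `first_principal_component`; it is not the authors' code, and their dictionaries, media sources, and clustering step are not reproduced.

```python
import numpy as np

def first_principal_component(indices: np.ndarray) -> np.ndarray:
    """Return first-principal-component scores for column-wise index series."""
    # Standardize each candidate index (column) to mean 0 and unit variance.
    z = (indices - indices.mean(axis=0)) / indices.std(axis=0)
    # SVD of the standardized data; rows of vt are the principal directions.
    _, _, vt = np.linalg.svd(z, full_matrices=False)
    weights = vt[0]
    # The sign of a principal direction is arbitrary; make the loadings sum positive.
    if weights.sum() < 0:
        weights = -weights
    return z @ weights

# Synthetic stand-in data (an assumption): one smooth "true" confidence series
# observed through three noisy candidate sentiment indices.
rng = np.random.default_rng(0)
latent = 3.0 * np.sin(np.linspace(0.0, 6.0, 60))
candidates = np.column_stack(
    [latent + rng.normal(scale=0.5, size=60) for _ in range(3)]
)

composite = first_principal_component(candidates)
corr = np.corrcoef(composite, latent)[0, 1]
print(f"correlation of composite with latent series: {corr:.2f}")
```

Standardizing first keeps any one candidate index from dominating the composite purely because of its scale.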
  3. By: Ifft, Jennifer E.; Kuhns, Ryan; Patrick, Kevin T.
    Keywords: Agricultural Finance, Agribusiness, Agricultural and Food Policy
    Date: 2017–06–15
  4. By: Zhu, Manhong; Schmitz, Andrew; Schmitz, Troy G.
    Keywords: Research Methods/Statistical Methods, Food Consumption/Nutrition/Food Safety, Institutional and Behavioral Economics
  5. By: Michael Zimmert
    Abstract: We contribute to the theoretical literature on difference-in-differences estimation for policy evaluation by allowing the common trend assumption to hold conditional on a high-dimensional covariate set. In particular, the covariates can enter the difference-in-differences model in a very flexible form leading to estimation procedures that involve supervised machine learning methods. We derive asymptotic results for semiparametric and parametric estimators for repeated cross-sections and panel data and show desirable statistical properties. Notably, a non-standard semiparametric efficiency bound for difference-in-differences estimation that incorporates the repeated cross-section case is established. Our proposed semiparametric estimator is shown to attain this bound. The usability of the methods is assessed by replicating a study on an employment protection reform. We demonstrate that the notion of high-dimensional common trend confounding has implications for the economic interpretation of policy evaluation results via difference-in-differences.
    Date: 2018–09
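For orientation on item 5, the textbook two-group, two-period difference-in-differences estimate, which the paper extends to high-dimensional covariates, compares the before/after change in the treated group with the change in the control group. A minimal sketch with made-up numbers follows; `did_estimate` is a hypothetical helper, and no covariate adjustment or machine learning step is attempted.

```python
import numpy as np

def did_estimate(y, treated, post):
    """Two-group, two-period difference-in-differences from cell means."""
    y, treated, post = (np.asarray(a) for a in (y, treated, post))

    def cell_mean(t, p):
        # Mean outcome for one (group, period) cell.
        return y[(treated == t) & (post == p)].mean()

    change_treated = cell_mean(1, 1) - cell_mean(1, 0)
    change_control = cell_mean(0, 1) - cell_mean(0, 0)
    return change_treated - change_control

# Made-up outcomes: two observations per (group, period) cell.
y       = [10, 12, 11, 13, 14, 16, 20, 22]
treated = [0, 0, 0, 0, 1, 1, 1, 1]
post    = [0, 0, 1, 1, 0, 0, 1, 1]

effect = did_estimate(y, treated, post)
print(effect)  # (21 - 15) - (12 - 11) = 5.0
```

The estimate is interpretable as a treatment effect only under the common trend assumption, which the paper allows to hold conditional on covariates.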
  6. By: Mabe, Queen Magadi; Lin, Wei
    Abstract: The aim of this paper is to estimate the probability of default for JSE listed companies. Our distinctive contribution is to use the multi-sector approach in estimating corporate failure instead of estimating failure in one sector, as failing companies are faced with the same challenge regardless of the sectors they operate in. The study creates a platform to identify the effect of the Book-value to Market-value ratio on the probability of default, as this variable is often used as a proxy for corporate default in asset pricing models. Moreover, the use of Classification and Regression Trees uncovers other variables as reliable predictors to estimate corporate failure, as the model is designed to choose the covariates with respect to classification ability. Our study also serves to add to the literature on how Logistic model performance compares to Machine Learning methods such as Classification and Regression Trees and Support Vector Machines. The study is the first to apply Support Vector Machines to predict failure of South African listed companies.
    Keywords: Corporate default, Logistic Regression, Support Vector Machines, Classification and Regression Trees.
    JEL: C61 G33
    Date: 2018–08–08
  7. By: Larson, D.
    Abstract: In this paper we examine the consequences of extreme weather events on agricultural livelihood choices and welfare outcomes among rural households in Mozambique. We do so by first building a unique historical record of local (enumeration-area) weather events that we match with household survey data. We build the event history by drawing on daily spatial datasets for rainfall and temperature from 1981 to 2015. We build a spatial history of agricultural droughts in Mozambique that account for regional differences in growing seasons. We also utilize for the first time a dataset that maps the impact of all named tropical storms affecting Mozambique from 1968 to 2015. We use geo-referenced household data from 7,400 households in Mozambique to identify production technology choices and measure asset accumulations. Exploiting spatial cross-sectional variations, we show how weather risks adversely affect household choices about production technologies and input use. We show how past exposure to extreme weather events, including typhoons and droughts, adversely impacts productive stock accumulations and household wealth.
    Keywords: Crop Production/Industries, Research and Development/Tech Change/Emerging Technologies
    Date: 2018–07
  8. By: Marian Gidea; Daniel Goldsmith; Yuri Katz; Pablo Roldan; Yonah Shmalo
    Abstract: We analyze the time series of four major cryptocurrencies (Bitcoin, Ethereum, Litecoin, and Ripple) before the digital market crash at the end of 2017 and the beginning of 2018. We introduce a methodology that combines topological data analysis with a machine learning technique, $k$-means clustering, in order to automatically recognize the emerging chaotic regime in a complex system approaching a critical transition. We first test our methodology on the complex system dynamics of a Lorenz-type attractor, and then we apply it to the four major cryptocurrencies. We find early warning signals for critical transitions in the cryptocurrency markets, even though the relevant time series exhibit highly erratic behavior.
    Date: 2018–09
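The clustering step named in the abstract of item 8 can be sketched with a plain Lloyd-style $k$-means. In the sketch below, the two-dimensional points are hypothetical stand-ins for per-window features; the paper's topological summaries are not computed here, and initializing the centers from the first `k` points is a simplification chosen to keep the run deterministic.

```python
import numpy as np

def k_means(points: np.ndarray, k: int, n_iter: int = 50):
    """Lloyd's algorithm; centers start at the first k points (a simplification)."""
    centers = points[:k].copy()
    for _ in range(n_iter):
        # Distance from every point to every center, shape (n_points, k).
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Move each center to the mean of its points; keep it if the cluster is empty.
        new_centers = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers

# Hypothetical window features: a "calm" group near the origin and a
# "turbulent" group far from it.
features = np.array([
    [0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0],
    [10.0, 10.0], [10.0, 11.0], [11.0, 10.0], [11.0, 11.0],
])

labels, centers = k_means(features, k=2)
print(labels)
print(centers)
```

In an early-warning setting, a change in the cluster label of consecutive windows would be the signal of interest; how such windows and features are built is specific to the paper and is not shown.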
  9. By: Lei Zhang (University of Regina)
    Abstract: series. The ANN architecture is usually designed and optimized based on trial and error using a given training data set. It is generally required to obtain big data for ANN training in order to achieve good training performance. Financial time series are subject to highly complex conditions of external inputs and their dynamic features can change fast and unpredictably. The aim of this research is to design an adaptive ANN architecture, which can be trained in real time with short time series for near-future prediction. An ANN-based chaotic system generator is designed for the simulation and analysis of the dynamic features in financial time series.
    Keywords: Artificial Neural Network (ANN), chaotic generator, financial time series, prediction, optimization
    JEL: C45 C52 C61
    Date: 2018–06
  10. By: Philipp Grohs; Fabian Hornung; Arnulf Jentzen; Philippe von Wurstemberger
    Abstract: Artificial neural networks (ANNs) have very successfully been used in numerical simulations for a series of computational problems ranging from image classification/image recognition, speech recognition, time series analysis, game intelligence, and computational advertising to numerical approximations of partial differential equations (PDEs). Such numerical simulations suggest that ANNs have the capacity to very efficiently approximate high-dimensional functions and, especially, such numerical simulations indicate that ANNs seem to admit the fundamental power to overcome the curse of dimensionality when approximating the high-dimensional functions appearing in the above named computational problems. There are also a series of rigorous mathematical approximation results for ANNs in the scientific literature. Some of these mathematical results prove convergence without convergence rates and some of these mathematical results even rigorously establish convergence rates but there are only a few special cases where mathematical results can rigorously explain the empirical success of ANNs when approximating high-dimensional functions. The key contribution of this article is to disclose that ANNs can efficiently approximate high-dimensional functions in the case of numerical approximations of Black-Scholes PDEs. More precisely, this work reveals that the number of required parameters of an ANN to approximate the solution of the Black-Scholes PDE grows at most polynomially in both the reciprocal of the prescribed approximation accuracy $\varepsilon > 0$ and the PDE dimension $d \in \mathbb{N}$ and we thereby prove, for the first time, that ANNs do indeed overcome the curse of dimensionality in the numerical approximation of Black-Scholes PDEs.
    Date: 2018–09
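The main claim in the abstract of item 10 can be written compactly. In the display below, $\mathcal{N}(d, \varepsilon)$ is notation introduced here for the number of ANN parameters needed to reach accuracy $\varepsilon$ in dimension $d$, and the constants $C, p, q > 0$ are placeholders that do not depend on $d$ or $\varepsilon$; their values are not given in the abstract. Restricting $\varepsilon$ to $(0, 1]$ is an assumption made here so that the bound is meaningful.

```latex
\[
  \mathcal{N}(d, \varepsilon) \;\le\; C \, d^{\,p} \, \varepsilon^{-q}
  \qquad \text{for all } d \in \mathbb{N},\ \varepsilon \in (0, 1].
\]
```

By contrast, a method suffering from the curse of dimensionality would need a parameter count that grows exponentially in $d$, for example on the order of $\varepsilon^{-d}$.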
  11. By: Philipp Bach; Victor Chernozhukov; Martin Spindler
    Abstract: Due to the increasing availability of high-dimensional empirical applications in many research disciplines, valid simultaneous inference becomes more and more important. For instance, high-dimensional settings might arise in economic studies due to very rich data sets with many potential covariates or in the analysis of treatment heterogeneities. Also, the evaluation of potentially more complicated (non-linear) functional forms of the regression relationship leads to many potential variables for which simultaneous inferential statements might be of interest. Here we provide a review of classical and modern methods for simultaneous inference in (high-dimensional) settings and illustrate their use by a case study using the R package hdm. The R package hdm implements valid, powerful, and efficient joint hypothesis tests for a potentially large number of coefficients as well as the construction of simultaneous confidence intervals and, therefore, provides useful methods to perform valid post-selection inference based on the LASSO.
    Date: 2018–09
  12. By: Prakhar Ganesh; Puneet Rakheja
    Abstract: The ability to give a precise and fast prediction for the price movement of stocks is the key to profitability in High Frequency Trading. The main objective of this paper is to propose a novel way of modeling the high frequency trading problem using Deep Reinforcement Learning and to argue why Deep RL can have a lot of potential in the field of High Frequency Trading. We have analyzed the model's performance based on its prediction accuracy as well as prediction speed across full-day trading simulations.
    Date: 2018–09
  13. By: Anastasis Kratsios; Cody B. Hyndman
    Abstract: We introduce a meta-algorithm, called non-Euclidean upgrading (NEU), which learns algorithm-specific geometries to improve the training and validation set performance of a wide class of learning algorithms. Our approach is based on iteratively performing local reconfigurations of the space in which the data lie. These reconfigurations build universal approximation and universal reconfiguration properties into the new algorithm being learned. This allows any set of features to be learned by the new algorithm to arbitrary precision. The training and validation set performance of NEU is investigated through implementations predicting the relationship between select stock prices as well as finding low-dimensional representations of the German Bond yield curve.
    Date: 2018–08
  14. By: Crane-Droesch, Andrew
    Keywords: Research Methods/Statistical Methods, Land Economics/Use, Productivity Analysis
    Date: 2017–06–15
  15. By: WATANABE Toshiya; HIRAI Yuri; AKUTSU Masami; HIOKI Tomomi; NAGAI Norihito
    Abstract: As the Internet of Things (IoT), big data, and artificial intelligence (AI) progress in the Fourth Industrial Revolution, data are expected to bring innovative outcomes. Against this background, a questionnaire survey was conducted on 6,278 firms with the aim of grasping the actual situation of data utilization in Japan and the important factors in obtaining outcomes through its use, with 562 effective responses collected. The results of our analysis reveal that, in order to obtain outcomes by utilizing data, it is important to master the contract model, design data sufficiently, and interact smoothly with stakeholders. In addition, we prepare three model cases of businesses using machine learning. We then examine rational and practical contracts on data utilization to provide commercially useful services, and organize issues.
    Date: 2018–09

This nep-big issue is ©2018 by Tom Coupé. It is provided as is without any express or implied warranty. It may be freely redistributed in whole or in part for any purpose. If distributed in part, please include this notice.
General information on the NEP project can be found on the NEP website. For comments, please write to the director of NEP, Marco Novarese. Put “NEP” in the subject line; otherwise your mail may be rejected.
NEP’s infrastructure is sponsored by the School of Economics and Finance of Massey University in New Zealand.