nep-big New Economics Papers
on Big Data
Issue of 2018‒02‒05
nine papers chosen by
Tom Coupé
University of Canterbury

  1. Macroeconomic Nowcasting and Forecasting with Big Data By Bok, Brandyn; Caratelli, Daniele; Giannone, Domenico; Sbordone, Argia; Tambalotti, Andrea
  2. Planning ahead for better neighborhoods: long run evidence from Tanzania By Michaels, Guy; Nigmatulina, Dzhamilya; Rauch, Ferdinand; Regan, Tanner; Baruah, Neeraj; Dahlstrand-Rudin, Amanda
  3. Man vs. Machine in Predicting Successful Entrepreneurs: Evidence from a Business Plan Competition in Nigeria By McKenzie, David J.; Sansone, Dario
  4. Hedge Fund Return Prediction and Fund Selection: A Machine-Learning Approach By Chen, Jiaqi; Wu, Wenbo; Tindall, Michael
  5. Predicting Psychology Attributes of a Social Network User By Khayrullin, Rustem M.; Makarov, Ilya; Zhukov, Leonid E.
  6. The Right to Money as the Fundamental Right of Individuals in the Coming Digital Economy By Hegadekatti, Kartik
  7. Black-Box Classification Techniques for Demographic Sequences : from Customised SVM to RNN By Muratova, Anna; Sushko, Pavel; Espy, Thomas H.
  8. Using decision tree classifier to predict income levels By Bekena, Sisay Menji
  9. Part 1: Training Sets & ASG Transforms By Rilwan Adewoyin

  1. By: Bok, Brandyn; Caratelli, Daniele; Giannone, Domenico; Sbordone, Argia; Tambalotti, Andrea
    Abstract: Data, data, data ... Economists know their importance well, especially when it comes to monitoring macroeconomic conditions -- the basis for making informed economic and policy decisions. Handling large and complex data sets was a challenge that macroeconomists engaged in real-time analysis faced long before "big data" became pervasive in other disciplines. We review how methods for tracking economic conditions using big data have evolved over time and explain how econometric techniques have advanced to mimic and automate best practices of forecasters on trading desks, at central banks, and in other market-monitoring roles. We present in detail the methodology underlying the New York Fed Staff Nowcast, which employs these innovative techniques to produce early estimates of GDP growth, synthesizing a wide range of macroeconomic data as they become available.
    Keywords: business cycle analysis; high-dimensional data; monitoring economic conditions; real-time data flow
    JEL: C32 C53 E3
    Date: 2018–01
  2. By: Michaels, Guy; Nigmatulina, Dzhamilya; Rauch, Ferdinand; Regan, Tanner; Baruah, Neeraj; Dahlstrand-Rudin, Amanda
    Abstract: What are the long run consequences of planning and providing basic infrastructure in neighborhoods, where people build their own homes? We study "Sites and Services" projects implemented in seven Tanzanian cities during the 1970s and 1980s, half of which provided infrastructure in previously unpopulated areas (de novo neighborhoods), while the other half upgraded squatter settlements. Using satellite images and surveys from the 2010s, we find that de novo neighborhoods developed better housing than adjacent residential areas (control areas) that were also initially unpopulated. Specifically, de novo neighborhoods are more orderly and their buildings have larger footprint areas and are more likely to have multiple stories, as well as connections to electricity and water, basic sanitation and access to roads. And though de novo neighborhoods generally attracted better educated residents than control areas, the educational difference is too small to account for the large difference in residential quality that we find. While we have no natural counterfactual for the upgrading areas, descriptive evidence suggests that they are if anything worse than the control areas.
    Keywords: urban economics; economic development; slums; Africa
    JEL: O18 R14 R31
    Date: 2017–09–01
  3. By: McKenzie, David J.; Sansone, Dario
    Abstract: We compare the relative performance of man and machine in being able to predict outcomes for entrants in a business plan competition in Nigeria. The first human predictions are business plan scores from judges, and the second are simple ad-hoc prediction models used by researchers. We compare these (out-of-sample) performances to those of three machine learning approaches. We find that i) business plan scores from judges are uncorrelated with business survival, employment, sales, or profits three years later; ii) a few key characteristics of entrepreneurs such as gender, age, ability, and business sector do have some predictive power for future outcomes; iii) modern machine learning methods do not offer noticeable improvements; iv) the overall predictive power of all approaches is very low, highlighting the fundamental difficulty of picking winners; and v) our models can do twice as well as random selection in identifying firms in the top tail of performance.
    Keywords: business plans; entrepreneurship; Machine Learning; Nigeria
    JEL: C53 L26 M13 O12
    Date: 2017–12
  4. By: Chen, Jiaqi (Federal Reserve Bank of Dallas); Wu, Wenbo (University of Oregon); Tindall, Michael (Federal Reserve Bank of Dallas)
    Abstract: A machine-learning approach is employed to forecast hedge fund returns and perform individual hedge fund selection within major hedge fund style categories. Hedge fund selection is treated as a cross-sectional supervised learning process based on direct forecasts of future returns. The inputs to the machine-learning models are observed hedge fund characteristics. Various learning processes including the lasso, random forest methods, gradient boosting methods, and deep neural networks are applied to predict fund performance. They all outperform the corresponding style index as well as a benchmark model, which forecasts hedge fund returns using macroeconomic variables. The best results are obtained from machine-learning processes that utilize model averaging, model shrinkage, and nonlinear interactions among the factors.
    Keywords: Hedge fund selection; hedge fund return prediction; machine learning; the lasso; random forest; gradient boosting; deep neural networks
    Date: 2016–11–01
  5. By: Khayrullin, Rustem M.; Makarov, Ilya; Zhukov, Leonid E.
    Abstract: Nowadays, the number of people using social networking sites increases every day. Social networking sites, such as Facebook or Twitter, are sources of human interaction, where users are allowed to create and share their activities and thoughts and place different information about themselves. However, most of this information remains unnoticed. In this work, we propose a machine learning approach to predict Big-Five personality traits using information from users' accounts on the social network. The predictions can be used in different areas such as psychology, business, and marketing.
    Keywords: Social Networks, Machine Learning, Psychology, Big Five Personality, Schwartz Human Values
    JEL: D71 Z13
    Date: 2017–09–17
  6. By: Hegadekatti, Kartik
    Abstract: Poverty has been a common feature in all human societies since the dawn of civilization. Purchasing power of an individual decides her standard of living. In many cases, it even decides whether a person can live or not (e.g., in starvation or malnourishment, victims have no purchasing power to buy calories). As such, the Right to Life philosophy of many National Constitutions comes to naught if the state cannot ensure adequate purchasing power for its people. Thus, an individual should have a Right to Money in order to live with respect and dignity. In this paper, we will explore the concept of the Right to Money and how it is linked to the Right to Life. We will see how the Right to Money concept can ensure continued economic expansion even in a scenario where automation has reached a critical point (i.e., Technological Singularity). The Right to Money can also ensure continued human dominance over Machine Intelligence as and when it arises. Interestingly, the Right to Money leads to another advanced concept, the Right to Machines, which will make certain that there is continued synergy between human and artificial intelligence in the future and that the human race stays relevant. The paper concludes with how human society can best be prepared economically (or otherwise) for a Post-Technological Singularity scenario.
    Keywords: Right to Life, Economic Right, Money, Life
    JEL: D31 H55 O32 O33 P24 P36 P48 Q55
    Date: 2017–04–23
  7. By: Muratova, Anna; Sushko, Pavel; Espy, Thomas H.
    Abstract: Nowadays there is a large amount of demographic data which should be analysed and interpreted. From accumulated demographic data, more useful information can be extracted by applying modern methods of data mining. The aim of this study is to compare methods for classifying demographic data by customising SVM kernels using various similarity measures. Since demographers are interested in sequences without discontinuity, formulas for similarity measures on such sequences were derived. They were then used as kernels in the SVM method, which is the novelty of this study. Recurrent neural network algorithms, such as Simple RNN, GRU and LSTM, are also compared. The best classification result with the SVM method is obtained using a special kernel function that transforms sequences into features, but recurrent neural networks outperform SVM.
    Keywords: data mining, demographics, support vector machines, neural networks, classification, sequences similarity
    JEL: C14 J11
    Date: 2017–09–17
  8. By: Bekena, Sisay Menji
    Abstract: In this study, the Random Forest Classifier machine learning algorithm is applied to predict income levels of individuals based on attributes including education, marital status, gender, occupation, country and others. Income levels are defined as a binary variable 0 for income
    Keywords: random-forest classifier, data science
    JEL: A10 D1 D10
    Date: 2017–07–30
  9. By: Rilwan Adewoyin
    Abstract: In this paper, I discuss a method to tackle the issues arising from the small data-sets available to data-scientists when building price predictive algorithms that use monthly/quarterly macro-financial indicators. I approach this by training separate classifiers on the equivalent dataset from a range of countries. Using these classifiers, a three-level meta learning algorithm (MLA) is developed. I develop a transform, ASG, to create a country-agnostic proxy for the macro-financial indicators. Using these proposed methods, I investigate the degree to which a predictive algorithm for the US 5Y bond price, predominantly using macro-financial indicators, can outperform an identical algorithm which only uses statistics derived from previous prices.
    Date: 2017–12

This nep-big issue is ©2018 by Tom Coupé. It is provided as is without any express or implied warranty. It may be freely redistributed in whole or in part for any purpose. If distributed in part, please include this notice.
General information on the NEP project can be found at For comments please write to the director of NEP, Marco Novarese at <>. Put “NEP” in the subject; otherwise your mail may be rejected.
NEP’s infrastructure is sponsored by the School of Economics and Finance of Massey University in New Zealand.