nep-big New Economics Papers
on Big Data
Issue of 2018‒07‒09
twelve papers chosen by
Tom Coupé
University of Canterbury

  1. A Machine Learning Framework for Stock Selection By XingYu Fu; JinHong Du; YiFeng Guo; MingWen Liu; Tao Dong; XiuWen Duan
  2. Hedonic Recommendations: An Econometric Application on Big Data By Okay Gunes
  3. The Lights of Iraq: Electricity Usage and the Iraqi War-fare Regime By Cerami, Alfio
  4. Textual Sentiment, Option Characteristics, and Stock Return Predictability By Yi-Hsuan Chen, Cathy; Fengler, Matthias; Härdle, Wolfgang Karl; Liu, Yanchu
  5. The Night Lights of North Korea. Prosperity Shining and Public Policy Governance By Cerami, Alfio
  6. Using satellite data to track socio-economic outcomes: a case study of Namibia By Thomas Ferreira
  7. A Deep Learning Based Illegal Insider-Trading Detection and Prediction Technique in Stock Market By Sheikh Rabiul Islam
  8. The Rise of the Robot Reserve Army: Automation and the Future of Economic Development, Work, and Wages in Developing Countries By Lukas Schlogl; Andy Sumner
  9. Women across Subfields in Economics: Relative Performance and Beliefs By P. Beneito; J. E. Boscá; J. Ferri; M. García
  10. On the backtesting of trading strategies By Yen H. Lok
  11. E-commerce Development and Entrepreneurship in the People’s Republic of China By Huang, Bihong; Shaban, Mohamed; Song, Quanyun; Wu, Yu
  12. Geography Dictates, But How? Topography, Spatial Concentration and Sectoral Diversification By Chowdhury, Mohammad Tarequl Hasan; Rahman, Muhammad Habibur; Ulubasoglu, Mehmet Ali

  1. By: XingYu Fu; JinHong Du; YiFeng Guo; MingWen Liu; Tao Dong; XiuWen Duan
    Abstract: This paper demonstrates how to apply machine learning algorithms to distinguish good stocks from bad ones. To this end, we construct 244 technical and fundamental features to characterize each stock, and label stocks according to their ranking with respect to the return-to-volatility ratio. Algorithms ranging from traditional statistical learning methods to recently popular deep learning methods, e.g. Logistic Regression (LR), Random Forest (RF), Deep Neural Network (DNN), and Stacking, are trained to solve the classification task. A Genetic Algorithm (GA) is also used to implement feature selection. The effectiveness of the stock selection strategy is validated in the Chinese stock market in both statistical and practical aspects, showing that: 1) Stacking outperforms the other models, reaching an AUC score of 0.972; 2) the Genetic Algorithm picks a subset of 114 features, and the prediction performance of all models remains almost unchanged after the selection procedure, which suggests some features are indeed redundant; 3) LR and DNN are radical models, RF is a risk-neutral model, and Stacking is somewhere between DNN and RF; 4) the portfolios constructed by our models outperform the market average in back tests.
    Date: 2018–06
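A minimal sketch of the labeling step mentioned in the abstract of paper 1, where stocks are ranked by their return-to-volatility ratio. The abstract does not give the cut-offs or the exact definition of the ratio, so the function names, the 25% top and bottom fractions, the use of the sample standard deviation, and the sample returns below are all illustrative assumptions rather than the authors' procedure.

```python
from statistics import mean, stdev


def return_to_volatility(returns):
    """Mean return divided by the sample standard deviation of returns.

    Assumes at least two observations and non-constant returns.
    """
    return mean(returns) / stdev(returns)


def label_by_rank(returns_by_stock, top_frac=0.25, bottom_frac=0.25):
    """Label top-ranked stocks 1 and bottom-ranked stocks 0.

    Stocks in the middle of the ranking are left unlabeled.
    The fractions are hypothetical defaults, not values from the paper.
    """
    ranked = sorted(
        returns_by_stock,
        key=lambda s: return_to_volatility(returns_by_stock[s]),
        reverse=True,
    )
    n_top = max(1, int(len(ranked) * top_frac))
    n_bottom = max(1, int(len(ranked) * bottom_frac))
    labels = {s: 1 for s in ranked[:n_top]}
    labels.update({s: 0 for s in ranked[-n_bottom:]})
    return labels


# Made-up periodic returns for four hypothetical stocks.
sample = {
    "A": [0.02, 0.03, 0.025],
    "B": [0.01, -0.01, 0.02],
    "C": [-0.02, -0.01, -0.03],
    "D": [0.05, -0.04, 0.01],
}
labels = label_by_rank(sample)
```

Leaving the middle of the ranking unlabeled is one common way to get a cleaner binary classification target; whether the paper does this is not stated in the abstract.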
  2. By: Okay Gunes (CES - Centre d'économie de la Sorbonne - UP1 - Université Panthéon-Sorbonne - CNRS - Centre National de la Recherche Scientifique)
    Abstract: This work demonstrates how economic theory can be applied to big data analysis. To do this, I propose two layers of machine learning that use econometric models introduced into a recommender system. The aim is to challenge traditional recommendation approaches, which are inherently biased because they ignore the final preference order of each individual and under-specify the interaction between the socio-economic characteristics of the participants and the characteristics of the commodities in question. In this respect, our hedonic recommendation approach first corrects the internal preferences with respect to the tastes of each individual, according to the characteristics of the given products. In the second layer, the relative preferences across participants are predicted by socio-economic characteristics. The robustness of the model is tested on the MovieLens 100k data set (943 users and 1682 movies) run by GroupLens. Our methodology shows the importance and the necessity of correcting the data set by using economic theory. This methodology can be applied to all recommender systems using ratings based on consumer decisions.
    Keywords: Big Data, Machine learning, Recommendation Engine, Econometrics, Python, R
    Date: 2017–12
  3. By: Cerami, Alfio
    Abstract: This article explores the lights of Iraq, Iraq's variety of capitalism (VoC) and its system of public and fiscal governance. The first section examines Iraq's VoC, which I define as oil-led state-captured capitalism with an associated oil-led state-captured war-fare regime. In formerly ISIS-occupied territories, war developments turned the system into an insurgent ISIS-captured capitalism with an associated insurgent ISIS-captured war-fare regime. The second section investigates electricity usage. The nighttime lights analysis is based on near real-time big data. It includes high-resolution remote-sensing and satellite imagery from the NASA Earth Observatory. I use the Visible Infrared Imaging Radiometer Suite (VIIRS) sensor on the Suomi NPP satellite. Data on greenhouse gases are obtained from the AQUA and TERRA satellites and are derived from the Atmospheric Infrared Sounder (AIRS) and Moderate-resolution Imaging Spectroradiometer (MODIS) sensors. I also use the AURA satellite with the Ozone Monitoring Instrument (OMI) sensor, as well as the TERRA satellite with the Measurements of Pollution in the Troposphere (MOPITT) sensor. The third part discusses the repercussions of electricity usage for good governance, for good regulatory and fiscal practices, and for development and growth. The concluding part briefly discusses the “taxman approach” and the introduction of a new fiscal contract necessary to resolve negative incentives in oil-led war economies.
    Keywords: Iraq, political economy, ISIS, geo-spatial analysis, night lights, remote-sensing, satellite imagery, public governance, fiscal governance, oil-led state-captured capitalism, oil-led state-captured war-fare regime, state capture, policy capture.
    JEL: C1 O11 O12 P16 P45
    Date: 2018–06–11
  4. By: Yi-Hsuan Chen, Cathy; Fengler, Matthias; Härdle, Wolfgang Karl; Liu, Yanchu
    Abstract: We distill sentiment from a huge assortment of NASDAQ news articles by means of machine learning methods and examine its predictive power in single-stock option markets and equity markets. We provide evidence that single-stock options react to contemporaneous sentiment. Next, examining return predictability, we discover that while option variables indeed predict stock returns, sentiment variables add further informational content. In fact, both in a regression and a trading context, option variables orthogonalized to public and sentimental news are even more informative predictors of stock returns. Distinguishing further between overnight and trading-time news, we find the first to be more informative. From a statistical topic model, we uncover that this is attributable to the differing thematic coverage of the alternate archives. Finally, we show that sentiment disagreement commands a strong positive risk premium above and beyond market volatility and that lagged returns predict future returns in concentrated sentiment environments.
    Keywords: investor disagreement, option markets, overnight information, stock return predictability, textual sentiment, topic model, trading-time information
    JEL: C58 G12 G14
    Date: 2018–06
  5. By: Cerami, Alfio
    Abstract: This article looks into the night lights of North Korea and their relationship to prosperity shining. The first, introductory section discusses the political economy of North Korea and highlights its strengths and shortcomings. The second section introduces new methods of geo-spatial micro- and macro-econometric analysis. The night lights analysis that follows is based on near real-time big data. It includes high-resolution remote-sensing and satellite imagery from the NASA (Earth Observatory) Visible Infrared Imaging Radiometer Suite (VIIRS) sensor on the Suomi NPP satellite. The third and fourth sections address important issues related to North Korea's prices, co-optation and mobilization of anger. The final section deals with problems in public policy administration.
    Keywords: North Korea, Varieties of Capitalism, Night Lights, Satellite Imagery, Political Economy, public policy administration, byungjin line.
    JEL: C1 F1 F5 O11 O12 P2
    Date: 2018–06–11
  6. By: Thomas Ferreira (Department of Economics, Stellenbosch University)
    Abstract: Efforts to improve the livelihoods of the poor in sub-Saharan Africa are hindered by data deficiencies. Surveys on socio-economic outcomes, for example, are generally conducted infrequently and are only statistically representative for relatively large geographic areas. To overcome these data limitations, researchers are increasingly turning to satellites, which capture data for small areas at high frequencies. Night lights satellite data have drawn particular interest, and growth in lights has been shown to be a useful proxy for GDP growth (Henderson et al., 2012). However, in poor agricultural regions, night lights data might be less useful in explaining variation in socio-economic outcomes because such regions are generally under-electrified. Daytime satellite data measuring land use and vegetation quality have been used to model socio-economic outcomes across regions, but no studies have explored whether daytime satellite data can be used to track welfare longitudinally. This paper argues that indicators of vegetation quality can be used to track welfare over time in agriculturally dominant areas. Such indicators are used extensively to predict agricultural yields and thus should correlate with welfare, as agriculture is an important source of income. This paper presents results from a small study in Namibia that explores whether this is the case. First, it is shown, using classification of cropland, that daytime satellite data can identify areas of economic activity where night lights cannot. Second, the relationship between vegetation quality and welfare is studied. Cross-sectionally, increases in vegetation quality correlate negatively with welfare. This is expected, as the poor are more likely to live in rural areas. Within rural areas, however, vegetation quality correlates positively with welfare. This study thus supports the hypothesis that satellite-based indicators of vegetative health can be used to track welfare over time in areas where night lights are not present.
    Keywords: Satellites, Night Lights, Normalised Differenced Vegetation Index, Agriculture, Poverty, Namibia
    JEL: I32 O13 Q56
    Date: 2018
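The keywords of paper 6 name the Normalised Difference Vegetation Index (NDVI) as the vegetation indicator. As a hedged illustration, the standard definition is NDVI = (NIR - Red) / (NIR + Red), computed from near-infrared and red reflectance. The sketch below applies that formula per pixel and averages over an area; the function names, the averaging step, and the reflectance values are assumptions made for illustration, since the abstract does not say how the index is aggregated.

```python
def ndvi(nir, red):
    """Standard NDVI: (NIR - Red) / (NIR + Red).

    Returns None when both reflectances are zero (index undefined).
    """
    total = nir + red
    if total == 0:
        return None
    return (nir - red) / total


def mean_ndvi(pixels):
    """Average NDVI over (nir, red) pairs, skipping undefined pixels.

    Averaging over an area is an illustrative choice, not the paper's method.
    """
    values = [ndvi(nir, red) for nir, red in pixels]
    valid = [v for v in values if v is not None]
    return sum(valid) / len(valid) if valid else None


# Made-up reflectance pairs: dense vegetation, bare soil, a no-data pixel.
area = [(0.5, 0.1), (0.3, 0.3), (0.0, 0.0)]
area_ndvi = mean_ndvi(area)
```

Values close to 1 indicate dense, healthy vegetation, while values near 0 indicate bare ground.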
  7. By: Sheikh Rabiul Islam
    Abstract: The stock market is a nonlinear, nonstationary, dynamic, and complex system. There are several factors that affect stock market conditions, such as news, social media, expert opinion, political transitions, and natural disasters. In addition, the market must also be able to handle illegal insider trading, which impacts the integrity and value of stocks. Illegal insider trading occurs when trading is performed based on non-public (private, leaked, tipped) information (e.g., new product launch, quarterly financial report, acquisition or merger plan) before the information is made public. Preventing illegal insider trading is a priority of the regulatory authorities (e.g., SEC), as it involves billions of dollars and is very difficult to detect. In this work, we present different types of insider trading approaches and techniques, and our proposed approach for detecting and predicting insider trading using a deep-learning based approach combined with discrete signal processing on time series data.
    Date: 2018–07
  8. By: Lukas Schlogl (King’s College London); Andy Sumner (King’s College London)
    Abstract: Employment generation is crucial to spreading the benefits of economic growth broadly and to reducing global poverty. And yet, emerging economies face a contemporary challenge to traditional pathways to employment generation: automation, digitalization, and labor-saving technologies. 1.8 billion jobs—or two-thirds of the current labor force of developing countries—are estimated to be susceptible to automation from today’s technological standpoint. Cumulative advances in industrial automation and labor-saving technologies could further exacerbate this trend. Or will they? In this paper we: (i) discuss the literature on automation; and in doing so (ii) discuss definitions and determinants of automation in the context of theories of economic development; (iii) assess the empirical estimates of employment-related impacts of automation; (iv) characterize the potential public policy responses to automation; and (v) highlight areas for further exploration in terms of employment and economic development strategies in developing countries. In an adaption of the Lewis model of economic development, the paper uses a simple framework in which the potential for automation creates “unlimited supplies of artificial labor” particularly in the agricultural and industrial sectors due to technological feasibility. This is likely to create a push force for labor to move into the service sector, leading to a bloating of service-sector employment and wage stagnation but not to mass unemployment, at least in the short-to-medium term.
    Date: 2018–07–02
  9. By: P. Beneito; J. E. Boscá; J. Ferri; M. García
    Abstract: The relative scarcity of female students enrolling in economics has become entrenched over the last decade. We provide evidence of gender differences in performance and in preferences across subfields of the discipline and explore students’ beliefs about the profession and their opinions on different subjects. The areas where women stand out relative to men are those that seem to be least well known to our students. We work on three fronts. First, using web scraping and machine learning techniques, we document the relative presence of women across subfields in recent AEA annual meetings. Macroeconomics and finance register the greatest scarcity of women. Second, using administrative records for economics students in a large public university in Spain from 2010 to 2014, we find that women outperform men in microeconomics, while men outperform women in macroeconomics, more evidently in the upper tail of the grades distribution. Finally, data gathered through a self-statement survey given to economics majors reveal that (i) they hold a macroeconomics-biased view of the economics profession; (ii) they exhibit gender differences in their perceptions of the interest and difficulty inherent in different subfields (macro vs. microeconomics); and (iii) their interests and performance are influenced differently by their male and female peers in macro and microeconomics subjects. Taken together, these three pieces of evidence provide a plausible explanation as to why women are relatively less attracted than men to economics, and suggest lines of action to redress the imbalance.
    Date: 2018–06
  10. By: Yen H. Lok
    Abstract: The contribution of this paper is two-fold. The first contribution is the development of a filter-combine scheme for trading strategies to diversify model risk. Multiple statistical machine learning models are used to predict the price direction of multiple assets. We demonstrate the effectiveness of model-averaging after under-performing models are removed via a filtering algorithm. The second contribution is the identification of appropriate measures of performance for selecting models. In the literature, different measures are usually designed for different applications and purposes, and it is not always clear whether certain measures are relevant to a particular trading strategy. By identifying relevant measures, one can identify the key drivers underlying well-performing models, and allocate more resources to optimising and improving the appropriate models.
    JEL: C51 C52
    Date: 2018–06–22
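A rough sketch of the filter-combine idea outlined in the abstract of paper 10: drop under-performing models, then average the predictions of the rest. The abstract does not specify the filtering algorithm or the performance measure, so validation accuracy with a 0.5 threshold, the function names, and the toy predictions are assumptions for illustration only. Price direction is encoded as +1 (up) or -1 (down).

```python
def accuracy(predictions, actual):
    """Share of direction predictions that match the realised direction."""
    hits = sum(p == a for p, a in zip(predictions, actual))
    return hits / len(actual)


def filter_models(validation_preds, actual, min_accuracy=0.5):
    """Keep models whose validation accuracy exceeds the threshold.

    The accuracy criterion and threshold are hypothetical stand-ins
    for the paper's filtering algorithm.
    """
    return [
        name
        for name, preds in validation_preds.items()
        if accuracy(preds, actual) > min_accuracy
    ]


def combine(new_preds, kept):
    """Average the +1/-1 votes of the kept models and return the sign.

    Returns 0 on a tie or when no model survived the filter.
    """
    if not kept:
        return 0
    avg = sum(new_preds[name] for name in kept) / len(kept)
    return (avg > 0) - (avg < 0)


# Made-up validation predictions from three hypothetical models.
actual = [1, -1, 1, 1]
validation_preds = {
    "a": [1, -1, 1, -1],
    "b": [-1, 1, -1, 1],
    "c": [1, -1, 1, 1],
}
kept = filter_models(validation_preds, actual)
signal = combine({"a": 1, "b": -1, "c": 1}, kept)
```

Swapping `accuracy` for a different performance measure changes which models survive the filter, which is the kind of sensitivity the paper's second contribution is concerned with.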
  11. By: Huang, Bihong (Asian Development Bank Institute); Shaban, Mohamed (Asian Development Bank Institute); Song, Quanyun (Asian Development Bank Institute); Wu, Yu (Asian Development Bank Institute)
    Abstract: We utilize an e-commerce development indicator in tandem with big data to measure the variations of e-commerce development across counties in the People’s Republic of China and assess its impact on entrepreneurship in both rural and urban areas. We find that households living in counties with higher levels of e-commerce development are more likely to run their own businesses. Further study indicates that e-commerce development not only significantly increases the entry of new startups but also decreases the exit of incumbent businesses. We also find that e-commerce development induces sectoral change in household entrepreneurship. It promotes entrepreneurship in the manufacturing and wholesale sectors, but reduces entrepreneurship in the retail, hotel, and catering sectors. We also show that e-commerce prosperity fuels entrepreneurship by alleviating financial constraints and moderates the reliance of household entrepreneurship on social networks.
    Keywords: e-commerce development; big data; entrepreneurship
    JEL: L81
    Date: 2018–03–22
  12. By: Chowdhury, Mohammad Tarequl Hasan; Rahman, Muhammad Habibur; Ulubasoglu, Mehmet Ali
    Abstract: This study investigates the ways in which terrain ruggedness affects sectoral diversification. A cross-country analysis using data from 142 countries over the period 1970‒2007 documents an inverted U-shaped link between terrain ruggedness and sectoral diversification, which mainly works through the extensive margin of diversification. A within-country analysis based on United States (US) state-level data over the period 1997‒2011 confirms this non-monotonic relationship. The within-country analysis further reveals that an important mechanism through which terrain ruggedness affects sectoral diversification is the spatial concentration of economic activity, as measured by the concentration of satellite-based night lights.
    Keywords: sectoral diversification, spatial concentration, extensive margin, intensive margin, terrain ruggedness.
    JEL: O11 R12
    Date: 2018–05–31

This nep-big issue is ©2018 by Tom Coupé. It is provided as is without any express or implied warranty. It may be freely redistributed in whole or in part for any purpose. If distributed in part, please include this notice.
General information on the NEP project can be found at For comments please write to the director of NEP, Marco Novarese at <>. Put “NEP” in the subject, otherwise your mail may be rejected.
NEP’s infrastructure is sponsored by the School of Economics and Finance of Massey University in New Zealand.