nep-big New Economics Papers
on Big Data
Issue of 2018‒02‒12
ten papers chosen by
Tom Coupé
University of Canterbury

  1. Colonial legacies: shaping African cities By Baruah, Neeraj G.; Henderson, J. Vernon; Peng, Cong
  2. Gender Stereotype in Academia: Evidence from Economics Job Market Rumors Forum By Alice Wu
  3. Machine learning for time series forecasting - a simulation study By Fischer, Thomas; Krauss, Christopher; Treichel, Alex
  4. Using Emotional Markers' Frequencies in Stock Market ARMAX-GARCH Model By Porshnev, Alexander V.; Lakshina, Valeriya V.; Redkin, Ilya E.
  5. Regulating the digital economy: Are we moving towards a 'win-win' or a 'lose-lose'? By Gehl Sampath, Padmashree
  6. Competition and the public interest in the digital market for information By Lombardi, Claudio
  7. Big Data and Machine Learning in Government Projects: Expert Evaluation Case By Nikitinsky, Nikita; Shashev, Sergey; Kachurina, Polina; Bespalov, Aleksander
  8. A portrait of innovative start-ups across countries By Stefano Breschi; Julie Lassébie; Carlo Menon
  9. An exploratory study towards applying and demystifying deep learning classification on behavioral big data By DE CNUDDE, Sofie; MARTENS, David; PROVOST, Foster
  10. Automation Processes and Blockchain Systems By Hegadekatti, Kartik

  1. By: Baruah, Neeraj G.; Henderson, J. Vernon; Peng, Cong
    Abstract: Differential institutions imposed during colonial rule continue to affect the spatial structure and urban interactions in African cities. Based on a sample of 318 cities across 28 countries using satellite data on built cover over time, Anglophone origin cities sprawl compared to Francophone ones. Anglophone cities have less intense land use and more irregular layout in the older colonial portions of cities, and more leapfrog development at the extensive margin. Results are impervious to a border experiment, many robustness tests, measures of sprawl, and sub-samples. Why would colonial origins matter? The British operated under indirect rule and a dual mandate within cities, allowing colonial and native sections to develop without an overall plan and coordination. In contrast, integrated city planning and land allocation mechanisms were a feature of French colonial rule, which was inclined to direct rule. The results also have public policy relevance. From the Demographic and Health Survey, similar households, which are located in areas of the city with more leapfrog development, have poorer connections to piped water, electricity, and landlines, presumably because of higher costs of providing infrastructure with urban sprawl.
    Keywords: colonialism; persistence; Africa; sprawl; urban form; urban planning; leapfrog
    JEL: H7 N97 O1 R5
    Date: 2017–11–01
  2. By: Alice Wu (Princeton University)
    Abstract: This paper examines whether people in academia portray and judge women and men differently in everyday “conversations” that take place online. I combine methods from text mining, machine learning and econometrics to study the existence and extent of gender stereotyping on the Economics Job Market Rumors forum. I first design a propensity score model to infer the gender a post mainly refers to from text, and simultaneously identify the individual words with the strongest association with gender. The words selected provide a direct look into the gender stereotyped language on this forum. Through a topic analysis of the posts, I find that when women are under discussion, the discourse tends to become significantly less academic or professionally oriented, and more about personal information and physical appearance. Moreover, a panel data analysis reveals the state dependence between the content of posts within a thread. In particular, once women are mentioned in a thread, the topic is likely to shift from academic to personal. Finally, I restrict the analysis to discussions about specific economists, and find that high-profile female economists tend to receive more attention on EJMR than their male counterparts.
    JEL: J16 J23 M51 J71 I23
    Date: 2017–09
  3. By: Fischer, Thomas; Krauss, Christopher; Treichel, Alex
    Abstract: We present a comprehensive simulation study to assess and compare the performance of popular machine learning algorithms for time series prediction tasks. Specifically, we consider the following algorithms: multilayer perceptron (MLP), logistic regression, naïve Bayes, k-nearest neighbors, decision trees, random forests, and gradient-boosting trees. These models are applied to time series from eight data generating processes (DGPs) - reflecting different linear and nonlinear dependencies (base case). Additional complexity is introduced by adding discontinuities and varying degrees of noise. Our findings reveal that advanced machine learning models are capable of approximating the optimal forecast very closely in the base case, with nonlinear models in the lead across all DGPs - particularly the MLP. By contrast, logistic regression is remarkably robust in the presence of noise, thus yielding the most favorable accuracy metrics on raw data, prior to preprocessing. When introducing adequate preprocessing techniques, such as first differencing and local outlier factor, the picture is reversed, and the MLP as well as other nonlinear techniques once again become the modeling techniques of choice.
    Date: 2018
  4. By: Porshnev, Alexander V.; Lakshina, Valeriya V.; Redkin, Ilya E.
    Abstract: We analyze the possibility of improving the prediction of stock market indicators by adding information about public mood expressed in Twitter posts. To estimate public mood, we analysed frequencies of 175 emotional markers - words, emoticons, acronyms and abbreviations - in more than two billion tweets collected via Twitter API over a period from 13.02.2013 to 22.04.2015. We explored the Granger causality relations between stock market returns of S&P500, DJIA, Apple, Google, Facebook, Pfizer and Exxon Mobil and emotional markers frequencies. We found that 17 emotional markers out of 175 are Granger causes of changes in returns without reverse effect. These frequencies were tested by Bayes Information Criteria to determine whether they provide additional information to the baseline ARMAX-GARCH model. We found Twitter data can provide additional information and managed to improve prediction as compared to a model based solely on emotional markers.
    Keywords: Twitter, mood, emotional markers, stock market, volatility
    JEL: L86 O16
    Date: 2016–07–18
  5. By: Gehl Sampath, Padmashree (UNCTAD)
    Abstract: The digital economy has been growing by leaps and bounds in recent years, mostly as a result of new digital technologies that are promoting a global transformation to industry 4.0. The resulting expansion of digital trade has sparked off a political and policy controversy on digital economy and e-commerce, where its boundaries stand and how best to regulate it. Policy discussions on the topic however do not take into account the true expanse of digital trade, which encompasses hardware, software, networks, platforms, applications and data as its core elements, and stretches the boundaries of e-commerce policy to trade in goods, services and intellectual property protection. This article focuses on the challenges in regulating the digital economy, with a particular focus on development, and offers a discussion of the interdependency between the economic, social, personal and developmental aspects of digital trade for developing countries. Section II opens with a detailed discussion on key digital technologies and their plausible impacts on employment globally and industrial catch-up of particular importance to developing countries, to highlight the divisive nature of digital technologies. Section III then analyses the unfulfilled promise of a pro-development perspective at the WTO looking at how multilateralism has currently failed e-commerce. In this section, the incoherence between digital realities and the policy debates at the WTO are presented to show how the institution might have become a means to legitimise national policies of industrialised countries on a universal level in this important area of policymaking. Norm-setting through FTAs is also analysed at length in section III of the article, which provides a comprehensive review of the plurilateral and bilateral policy developments in e-commerce. The ramifications for developing countries are discussed in the form of a couple of examples. Section IV presents some options for developing countries for the future at the national and international level.
    Keywords: digital economy, e-commerce, industry 4.0, digital trade, robotics and process automation, artificial intelligence, 3D printing, manufacturing, development, trade, free trade agreements, digital industrial policy
    JEL: L11 L23 L25 L41 L51 L81 L86 O19 O31 O33 O34 O38
    Date: 2018–01–22
  6. By: Lombardi, Claudio
    Abstract: Our behaviour on the internet is continuously monitored and processed through the elaboration of big data. Complex algorithms categorize our choices and personalise our online environment, which is used to propose, inter alia, bespoke news and information. It is in this context, that the competition between sources of information in the "market for ideas", takes place. While these mechanisms bring efficiency benefits, they also have severe downsides that only very recently we have begun to uncover. These drawbacks regard not only deadweight losses caused by market distortions, but also public policy issues, in particular in case of politically relevant news. What are the public and private interest concerns impacted by this practice? Can this algorithm-driven selection of news be captured by competition laws? The digital news market, as constructed around online advertising, presents peculiarities which necessitate a reframing of standard approaches to traditional information markets, and of the creation and distribution of ideas.
    Keywords: competition law, antitrust, marketplace of ideas, online behavioural targeting, public interest, post-truth society, fake news, online environment
    Date: 2017
  7. By: Nikitinsky, Nikita; Shashev, Sergey; Kachurina, Polina; Bespalov, Aleksander
    Abstract: In this paper, we present the Expert Hub System, which was designed to help governmental structures find the best experts in different areas of expertise for better reviewing of the incoming grant proposals. In order to define the areas of expertise with topic modeling and clustering, and then to relate experts to corresponding areas of expertise and rank them according to their proficiency in certain areas of expertise, the Expert Hub approach uses the data from the Directorate of Science and Technology Programmes. Furthermore, the paper discusses the use of Big Data and Machine Learning in the Russian government project.
    Keywords: government project, Big Data, Machine Learning, expert evaluation, clustering
    JEL: O38
    Date: 2016–07–18
  8. By: Stefano Breschi (OECD); Julie Lassébie (OECD); Carlo Menon (OECD)
    Abstract: The report presents new cross-country descriptive evidence on innovative start-ups and related venture capital investments drawing upon Crunchbase, a new dataset that is unprecedented in terms of scope and comprehensiveness. The analysis employs a mix of different statistical techniques (descriptive graphics, econometric analysis, and machine learning) to highlight a number of findings. First, there are significant cross-country differences in the professional and educational background of start-ups’ founders, notably the share of founders with previous academic experience and in the share of “serial entrepreneurs”. Conversely, the founders’ average age is rather constant across countries, but shows a fair degree of variability across sectors. Second, IP assets, and in particular the presence of an inventor in the team of founders, are strongly associated with start-ups’ success. Finally, female founders are less likely to receive funding, receive lower amounts when they do receive financing, and have a lower probability of successful exit, when other factors are controlled for.
    Date: 2018–02–08
  9. By: DE CNUDDE, Sofie; MARTENS, David; PROVOST, Foster
    Abstract: The superior performance of deep learning algorithms in fields such as computer vision and natural language processing has fueled an increased interest towards these algorithms in both research and in practice. Ever since, many studies have applied these algorithms to other machine learning contexts with other types of data in the hope of achieving comparable superior performance. This study departs from the latter motivation and investigates the application of deep learning classification techniques on big behavioral data while comparing its predictive performance with 11 widely-used shallow classifiers. In addition to the application on a new type of data and a structured comparison of its performance with commonly used classifiers, this study attempts to shed light onto when and why deep learning techniques perform better. Regarding the specific characteristics of applying deep learning on this unique class of data, we demonstrate that an unsupervised pretraining step does not improve classification performance and that a tanh nonlinearity achieves the best predictive performance. The results from applying deep learning on 15 big behavioral data sets demonstrate as good as or better results compared to traditionally-used, shallow classifiers. However, no significant performance improvement can be recorded. Investigating when deep learning performs better, we find that worse performance is obtained for data sets with low signal-from-noise separability. In order to gain insight into why deep learning generally performs well on this type of data, we investigate the value of the distributed, hierarchical characteristic of the learning process. The neurons in the distributed representation seem to identify more nuances in the many behavioral features as compared to shallow classifiers. We demonstrate these nuances in an intuitive manner and validate them through comparison with feature engineering techniques. This is the first study to apply and validate the use of nonlinear deep learning classification on fine-grained, human-generated data while proposing efficient configuration settings for its practical implementation. As deep learning classification is often characterized by being a black-box approach, we also provide a first attempt towards the disentanglement regarding when and why these techniques perform well.
    Date: 2018–01
  10. By: Hegadekatti, Kartik
    Abstract: Blockchain Systems and Ubiquitous computing are changing the way we do business and lead our lives. One of the most important applications of Blockchain technology is in automation processes and Internet-of-Things (IoT). Machines have so far been limited in ability primarily because they have restricted capacity to exchange value. Any monetary exchange of value has to be supervised by humans or human-based centralised ledgers. Blockchain technology changes all that. It allows machines to have unique identities and hence a virtual presence. Blockchain technology even allows for automated verification by the network of machines itself. It permits machines to exchange value and introduce the element of discretion in the hands of Machines. This can form the basis for ultimately developing IoT going on to Artificial Intelligence. This paper deals with the various interplays of Blockchain with Automation processes. Firstly, the concept of cryptocurrencies (also referred to as cryptocoins in this paper) is explained. Then the concept of Regulated and Sovereign Backed Cryptocurrencies (RSBCs) is discussed. Later on, I explain how Blockchain systems are related to IoT. Then we discuss the concept of Smart Mining that will lead to advanced Blockchain activity and Machine intelligence. Finally, the paper concludes as to how Blockchain technology will impact automation processes.
    Keywords: blockchain, bitcoin, cryptocurrency, rsbc, nationcoins, K-Y Protocol, automation, singularity
    JEL: L60 O14 O32 O33
    Date: 2017–02–06

This nep-big issue is ©2018 by Tom Coupé. It is provided as is without any express or implied warranty. It may be freely redistributed in whole or in part for any purpose. If distributed in part, please include this notice.
General information on the NEP project can be found at For comments please write to the director of NEP, Marco Novarese at <>. Put “NEP” in the subject, otherwise your mail may be rejected.
NEP’s infrastructure is sponsored by the School of Economics and Finance of Massey University in New Zealand.