nep-big New Economics Papers
on Big Data
Issue of 2018‒11‒26
thirteen papers chosen by
Tom Coupé
University of Canterbury

  1. Predicting and Understanding Initial Play By Drew Fudenberg; Annie Liang
  2. Predicting and Understanding Initial Play By Drew Fudenberg; Annie Liang
  3. An Algorithmic Crystal Ball: Forecasts-based on Machine Learning By Jin-Kyu Jung; Manasa Patnam; Anna Ter-Martirosyan
  4. The impact of commuting time over educational achievement: A machine learning approach By Dante Contreras; Daniel Hojman; Manuel Matas; Patricio Rodríguez; Nicolás Suárez
  5. Predicting Adverse Media Risk using a Heterogeneous Information Network By Ryohei Hisano; Didier Sornette; Takayuki Mizuno
  6. Predicting Adverse Media Risk using a Heterogeneous Information Network By Ryohei Hisano; Didier Sornette; Takayuki Mizuno
  7. The Ensemble Method For Censored Demand Prediction By Evgeniy M. Ozhegov; Daria Teterina
  8. The Analysis of Big Data on Cities and Regions - Some Computational and Statistical Challenges By Schintler, Laurie A.; Fischer, Manfred M.
  9. The Theory is Predictive, but is it Complete? An Application to Human Perception of Randomness By Jon Kleinberg; Annie Liang; Sendhil Mullainathan
  10. The Theory is Predictive, but is it Complete? An Application to Human Perception of Randomness By Jon Kleinberg; Annie Liang; Sendhil Mullainathan
  11. Economic and Social Impacts of Deforestation reduction in Brazil By Ferreira, J.B. De Souza Filho; De Faria, V. Guidotti; Guedes Pinto, L.F.; Sparovek, G.
  12. China’s Response to Nuclear Safety Post-Fukushima: Genuine or Rhetoric? By Lam, J.; Cheung, L.; Han, Y.; Wang, S.
  13. L'IMPACT DE LA DIGITALISATION SUR LE ROLE DU CONTROLEUR DE GESTION By Florence Cavelius; Christoph Endenich; Adrian Zicari

  1. By: Drew Fudenberg (Department of Economics, MIT); Annie Liang (Department of Economics, University of Pennsylvania)
    Abstract: We take a machine learning approach to the problem of predicting initial play in strategic-form games, with the goal of uncovering new regularities in play and improving the predictions of existing theories. The analysis is implemented on data from previous laboratory experiments, and also a new data set of 200 games played on Mechanical Turk. We first use machine learning algorithms to train prediction rules based on a large set of game features. Examination of the games where our algorithm predicts play correctly, but the existing models do not, leads us to introduce a risk aversion parameter that we find significantly improves predictive accuracy. Second, we augment existing empirical models by using play in a set of training games to predict how the models' parameters vary across new games. This modified approach generates better out-of-sample predictions, and provides insight into how and why the parameters vary. These methodologies are not special to the problem of predicting play in games, and may be useful in other contexts.
    Date: 2017–11–14
    URL: http://d.repec.org/n?u=RePEc:pen:papers:18-009&r=big
  2. By: Drew Fudenberg (Department of Economics, MIT); Annie Liang (Department of Economics, University of Pennsylvania)
    Abstract: We take a machine learning approach to the problem of predicting initial play in strategic-form games, with the goal of uncovering new regularities in play and improving the predictions of existing theories. The analysis is implemented on data from previous laboratory experiments, and also a new data set of 200 games played on Mechanical Turk. We use two approaches to uncover new regularities in play and improve the predictions of existing theories. First, we use machine learning algorithms to train prediction rules based on a large set of game features. Examination of the games where our algorithm predicts play correctly, but the existing models do not, leads us to introduce a risk aversion parameter that we find significantly improves predictive accuracy. Second, we augment existing empirical models by using play in a set of training games to predict how the models' parameters vary across new games. This modified approach generates better out-of-sample predictions, and provides insight into how and why the parameters vary. These methodologies are not special to the problem of predicting play in games, and may be useful in other contexts.
    Date: 2017–11–14
    URL: http://d.repec.org/n?u=RePEc:pen:papers:17-026&r=big
  3. By: Jin-Kyu Jung; Manasa Patnam; Anna Ter-Martirosyan
    Abstract: Forecasting macroeconomic variables is key to developing a view on a country's economic outlook. Most traditional forecasting models rely on fitting data to a pre-specified relationship between input and output variables, thereby assuming a specific functional form and stochastic process underlying that relationship. We pursue a new approach to forecasting by employing a number of machine learning algorithms, a method that is data driven and imposes limited restrictions on the nature of the true relationship between input and output variables. We apply the Elastic Net, SuperLearner, and Recurrent Neural Network algorithms to macro data of seven, broadly representative, advanced and emerging economies and find that these algorithms can outperform traditional statistical models, thereby offering a relevant addition to the field of economic forecasting.
    Date: 2018–11–01
    URL: http://d.repec.org/n?u=RePEc:imf:imfwpa:18/230&r=big
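    A minimal numerical sketch of the elastic-net idea named in the abstract above (this is illustrative code, not the authors' implementation): the estimator combines an L1 (lasso) and an L2 (ridge) penalty, and for a single standardized regressor with no intercept the coefficient has a closed form — soft-threshold the OLS moment by the L1 penalty, then shrink by the L2 penalty.

    ```python
    def soft_threshold(z, gamma):
        """Return sign(z) * max(|z| - gamma, 0), the L1 proximal operator."""
        if z > gamma:
            return z - gamma
        if z < -gamma:
            return z + gamma
        return 0.0

    def elastic_net_1d(x, y, lam1, lam2):
        """Closed-form elastic-net coefficient for one regressor, no intercept.

        lam1 is the L1 penalty weight, lam2 the L2 penalty weight.
        """
        xty = sum(xi * yi for xi, yi in zip(x, y))
        xtx = sum(xi * xi for xi in x)
        return soft_threshold(xty, lam1) / (xtx + lam2)

    # With both penalties at zero this reduces to OLS: here y = 2x exactly.
    x = [1.0, 2.0, 3.0]
    y = [2.0, 4.0, 6.0]
    print(elastic_net_1d(x, y, 0.0, 0.0))  # -> 2.0
    print(elastic_net_1d(x, y, 14.0, 0.0))  # L1 penalty shrinks it to 1.0
    ```

    In the multivariate case the same soft-threshold update is applied coordinate by coordinate, which is what production solvers do.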
  4. By: Dante Contreras; Daniel Hojman; Manuel Matas; Patricio Rodríguez; Nicolás Suárez
    Abstract: Taking advantage of georeferenced data on Chilean students, we estimate the impact of commuting time on academic achievement. Since commuting time is an endogenous variable, we use instrumental variables and fixed effects at the school level to overcome this problem. In addition, since we do not observe which mode of transport the students use, we complement our analysis with machine learning methods to predict the transportation mode. Our findings suggest that commuting time has a negative effect on academic performance, but this effect is not always significant.
    Date: 2018–11
    URL: http://d.repec.org/n?u=RePEc:udc:wpaper:wp472&r=big
  5. By: Ryohei Hisano (Social ICT Research Center, Graduate School of Information Science and Technology, The University of Tokyo); Didier Sornette (ETH Zürich, Department of Management Technology and Economics); Takayuki Mizuno (National Institute of Informatics)
    Abstract: The media plays a central role in monitoring powerful institutions and identifying any activities harmful to the public interest. In the investing sphere constituted of 46,583 officially listed domestic firms on the stock exchanges worldwide, there is a growing interest “to do the right thing”, i.e., to put pressure on companies to improve their environmental, social and governance (ESG) practices. However, how can one overcome the sparsity of ESG data from non-reporting firms, and how can one identify the relevant information in the annual reports of this large universe? Here, we construct a vast heterogeneous information network that covers the necessary information surrounding each firm, which is assembled using seven professionally curated datasets and two open datasets, resulting in about 50 million nodes and 400 million edges in total. Exploiting this heterogeneous information network, we propose a model that can learn from past adverse media coverage patterns and predict the occurrence of future adverse media coverage events on the whole universe of firms. Our approach is tested using the adverse media coverage data of more than 35,000 firms worldwide from January 2012 to May 2018. Comparing with state-of-the-art methods with and without the network, we show that the predictive accuracy is substantially improved when using the heterogeneous information network. This work suggests new ways to consolidate the diffuse information contained in big data in order to monitor dominant institutions on a global scale for more socially responsible investment, better risk management, and the surveillance of powerful institutions.
    Date: 2018–11
    URL: http://d.repec.org/n?u=RePEc:cfi:fseres:cf449&r=big
  6. By: Ryohei Hisano (Social ICT Research Center, Graduate School of Information Science and Technology, The University of Tokyo); Didier Sornette (ETH Zurich, Department of Management Technology and Economics); Takayuki Mizuno (National Institute of Informatics)
    Abstract: The media plays a central role in monitoring powerful institutions and identifying any activities harmful to the public interest. In the investing sphere constituted of 46,583 officially listed domestic firms on the stock exchanges worldwide, there is a growing interest “to do the right thing”, i.e., to put pressure on companies to improve their environmental, social and governance (ESG) practices. However, how can one overcome the sparsity of ESG data from non-reporting firms, and how can one identify the relevant information in the annual reports of this large universe? Here, we construct a vast heterogeneous information network that covers the necessary information surrounding each firm, which is assembled using seven professionally curated datasets and two open datasets, resulting in about 50 million nodes and 400 million edges in total. Exploiting this heterogeneous information network, we propose a model that can learn from past adverse media coverage patterns and predict the occurrence of future adverse media coverage events on the whole universe of firms. Our approach is tested using the adverse media coverage data of more than 35,000 firms worldwide from January 2012 to May 2018. Comparing with state-of-the-art methods with and without the network, we show that the predictive accuracy is substantially improved when using the heterogeneous information network. This work suggests new ways to consolidate the diffuse information contained in big data in order to monitor dominant institutions on a global scale for more socially responsible investment, better risk management, and the surveillance of powerful institutions.
    Date: 2018–11
    URL: http://d.repec.org/n?u=RePEc:upd:utmpwp:004&r=big
  7. By: Evgeniy M. Ozhegov (National Research University Higher School of Economics); Daria Teterina (National Research University Higher School of Economics)
    Abstract: Many economic applications, including optimal pricing and inventory management, require predictions of demand based on sales data and the estimation of the reaction of sales to price change. There is a wide range of econometric approaches used to correct biases in the estimates of demand parameters on censored sales data. These approaches can also be applied to various classes of machine learning (ML) models to reduce the prediction error of sales volumes. In this study we construct two ensemble models for demand prediction, with and without accounting for demand censorship. Accounting for sales censorship is based on a censored quantile regression in which the model estimation is split into two separate parts: a) a prediction of zero sales by a classification model; and b) a prediction of non-zero sales by a regression model. Models with and without censorship are based on prediction aggregations of least squares, Ridge and Lasso regressions and the Random Forest model. Having estimated the predictive properties of both models, we show empirically that the model accounting for the censored nature of demand has the best predictive power. We also show that ML with censorship provides bias-corrected estimates of demand sensitivity to price change similar to econometric models.
    Keywords: demand, censorship, machine learning, prediction.
    JEL: D12 C24 C53
    Date: 2018
    URL: http://d.repec.org/n?u=RePEc:hig:wpaper:200/ec/2018&r=big
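    The split described in the abstract above — classify zero vs. non-zero sales, then regress only on the non-censored observations — can be sketched in a few lines. This is a toy stand-in, not the authors' code: the threshold classifier and one-variable OLS below replace their ensemble of least squares, Ridge, Lasso and Random Forest predictors.

    ```python
    def fit_two_part(prices, sales):
        """Fit a two-part ("hurdle") demand model on censored sales data."""
        zero_prices = [p for p, s in zip(prices, sales) if s == 0]
        nonzero = [(p, s) for p, s in zip(prices, sales) if s > 0]
        # Part (a): classify zero vs. non-zero sales with a price cutoff,
        # placed midway between the mean prices of the two groups.
        cutoff = (sum(zero_prices) / len(zero_prices)
                  + sum(p for p, _ in nonzero) / len(nonzero)) / 2
        # Part (b): OLS slope and intercept fit on non-censored observations only.
        n = len(nonzero)
        mp = sum(p for p, _ in nonzero) / n
        ms = sum(s for _, s in nonzero) / n
        slope = (sum((p - mp) * (s - ms) for p, s in nonzero)
                 / sum((p - mp) ** 2 for p, _ in nonzero))
        intercept = ms - slope * mp
        return cutoff, slope, intercept

    def predict(model, price):
        cutoff, slope, intercept = model
        if price >= cutoff:          # classified as a censored (zero-sales) case
            return 0.0
        return intercept + slope * price

    # Toy data: demand falls with price; high prices sell nothing.
    prices = [1.0, 2.0, 3.0, 8.0, 9.0]
    sales = [10.0, 8.0, 6.0, 0.0, 0.0]
    model = fit_two_part(prices, sales)
    ```

    Fitting the regression on the full sample instead would drag the demand curve toward zero at high prices, which is exactly the censoring bias the two-part construction corrects.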
  8. By: Schintler, Laurie A.; Fischer, Manfred M.
    Abstract: Big Data on cities and regions bring new opportunities and challenges to data analysts and city planners. On the one hand, they hold great promise for combining increasingly detailed data on each citizen with critical infrastructures to plan, govern and manage cities and regions, improve their sustainability, optimize processes and maximize the provision of public and private services. On the other hand, the massive sample size and high dimensionality of Big Data and their geo-temporal character introduce unique computational and statistical challenges. This chapter provides an overview of the salient characteristics of Big Data and of how these features drive a paradigm change in data management and analysis, as well as in the computing environment.
    Keywords: massive sample size, high-dimensional data, heterogeneity and incompleteness, data storage, scalability, parallel data processing, visualization, statistical methods
    Date: 2018–10–28
    URL: http://d.repec.org/n?u=RePEc:wiw:wus046:6637&r=big
  9. By: Jon Kleinberg (Department of Computer Science, Cornell University); Annie Liang (Department of Economics, University of Pennsylvania); Sendhil Mullainathan (Department of Economics, Harvard University)
    Abstract: When testing a theory, we should ask not just whether its predictions match what we see in the data, but also about its “completeness”: how much of the predictable variation in the data does the theory capture? Defining completeness is conceptually challenging, but we show how methods based on machine learning can provide tractable measures of completeness. We also identify a model domain, the human perception and generation of randomness, where measures of completeness can be feasibly analyzed; from these measures we discover there is significant structure in the problem that existing theories have yet to capture.
    Date: 2017–08–09
    URL: http://d.repec.org/n?u=RePEc:pen:papers:17-025&r=big
  10. By: Jon Kleinberg (Department of Computer Science, Cornell University); Annie Liang (Department of Economics, University of Pennsylvania); Sendhil Mullainathan (Department of Economics, Harvard University)
    Abstract: When testing a theory, we should ask not just whether its predictions match what we see in the data, but also about its “completeness”: how much of the predictable variation in the data does the theory capture? Defining completeness is conceptually challenging, but we show how methods based on machine learning can provide tractable measures of completeness. We also identify a model domain—the human perception and generation of randomness—where measures of completeness can be feasibly analyzed; from these measures we discover there is significant structure in the problem that existing theories have yet to capture.
    Date: 2017–08–09
    URL: http://d.repec.org/n?u=RePEc:pen:papers:18-010&r=big
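    The completeness idea above can be put into a hedged numerical sketch: completeness is, roughly, the share of the achievable improvement over a naive baseline that a theory captures, where the best achievable error is proxied by a flexible machine-learning predictor. The function and the error numbers below are illustrative, not taken from the paper.

    ```python
    def completeness(naive_error, theory_error, ml_error):
        """Fraction of the naive-to-ML error reduction captured by the theory."""
        achievable = naive_error - ml_error
        if achievable <= 0:
            raise ValueError("ML benchmark must improve on the naive baseline")
        return (naive_error - theory_error) / achievable

    # Illustrative numbers: a theory that closes 60% of the gap between a
    # naive guess (error 0.50) and the ML benchmark (error 0.30).
    print(completeness(0.50, 0.38, 0.30))  # close to 0.6
    ```

    A completeness near 1 says the theory captures almost all the structure a flexible predictor can find; a value near 0 says most of the predictable variation is still unexplained.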
  11. By: Ferreira, J.B. De Souza Filho; De Faria, V. Guidotti; Guedes Pinto, L.F.; Sparovek, G.
    Abstract: In this paper, we analyze the economic and social impacts of different deforestation reduction scenarios in Brazil, using a detailed inter-regional, bottom-up, dynamic general equilibrium model. We build three deforestation scenarios using detailed information on land use in Brazil, drawn from satellite imagery that captures deforestation patterns and land use by state and biome. This information includes the agricultural suitability of soil, by biome and state, as well as the classification of land as private or public. Results show, for the period under consideration, low aggregate economic losses from reducing deforestation in all scenarios, but these losses are much higher in the agricultural frontier states. Reducing deforestation also has a negative impact on welfare (as measured by household consumption), disproportionately affecting the poorest households, both at the national level and particularly in the frontier regions, through both income and expenditure composition effects. We conclude that, although the policy is important from an environmental point of view, these social losses must be taken into account for it to gain general support in Brazil. Acknowledgement: The authors are grateful to Instituto Escolhas for funding this research.
    Keywords: Environmental Economics and Policy
    Date: 2018–07
    URL: http://d.repec.org/n?u=RePEc:ags:iaae18:277084&r=big
  12. By: Lam, J.; Cheung, L.; Han, Y.; Wang, S.
    Abstract: The Fukushima crisis has brought the nuclear safety problem to the world’s attention. China is the most ambitious country in the world in nuclear power development. How China perceives and responds to nuclear safety issues carries significant implications for its citizens’ safety and security. This paper examines the Chinese government’s promised and actual responses to nuclear safety following the Fukushima crisis, based on (1) statistical analysis of newspaper coverage of nuclear energy, and (2) a review of nuclear safety performance and safety governance. Our analysis shows that (i) the Chinese government’s concern over nuclear accidents and safety surged significantly after Fukushima, (ii) China has displayed strengths in reactor technology design and safety operation, and (iii) China’s safety governance has been continuously challenged by institutional fragmentation, inadequate transparency, a shortage of safety professionals, a weak safety culture, and the ambition to increase nuclear capacity three-fold by 2050. We suggest that China should improve its nuclear safety standards as well as its safety management and monitoring, reform institutional arrangements to reduce fragmentation, improve information transparency, public trust and participation, strengthen the safety culture, introduce process-based safety regulations, and promote international collaboration to ensure that China’s response to nuclear safety can be fully implemented in practice.
    Keywords: nuclear safety, media focus, computational text analysis, regulatory governance, safety management
    JEL: C89 Q42 Q48
    Date: 2018–11–05
    URL: http://d.repec.org/n?u=RePEc:cam:camdae:1866&r=big
  13. By: Florence Cavelius (CEROS - Centre d'Etudes et de Recherches sur les Organisations et la Stratégie - UPN - Université Paris Nanterre); Christoph Endenich (LEM - Lille - Economie et Management - CNRS - Centre National de la Recherche Scientifique - UCL - Université catholique de Lille - Université de Lille); Adrian Zicari (ESSEC Business School - Essec Business School)
    Abstract: The digitization of the economic sphere is set to transform organizations’ ways of working in depth. First manifesting itself in technological change, with the Big Data phenomenon, the emergence of new tools based on Artificial Intelligence and the growing use of connected objects, digitization also contributes to a deep transformation of all the firm’s business processes. No function seems to be spared, least of all management control, whose role is both to provide information within the organization and to advise managers. Using the framework of the management controller’s roles, we highlight the tensions the controller faces: a necessary strengthening of the technical-expert role, leading to a temporary recentralization of the function, but also a major role to play alongside operational staff in implementing new methods to steer the firm, thereby recovering an “augmented” business partner role.
    Keywords: management controller’s role,management controller’s activities,management controller’s power,business partner,digitization,5 V,rôle du contrôleur de gestion,activités du contrôleur de gestion,pouvoir du contrôleur de gestion,digitalisation
    Date: 2018–05–16
    URL: http://d.repec.org/n?u=RePEc:hal:journl:hal-01907810&r=big

This nep-big issue is ©2018 by Tom Coupé. It is provided as is without any express or implied warranty. It may be freely redistributed in whole or in part for any purpose. If distributed in part, please include this notice.
General information on the NEP project can be found at http://nep.repec.org. For comments please write to the director of NEP, Marco Novarese at <director@nep.repec.org>. Put “NEP” in the subject, otherwise your mail may be rejected.
NEP’s infrastructure is sponsored by the School of Economics and Finance of Massey University in New Zealand.