nep-big New Economics Papers
on Big Data
Issue of 2018‒10‒22
nine papers chosen by
Tom Coupé
University of Canterbury

  1. Modern industrial organization theory of media markets and competition policy implications By Budzinski, Oliver; Kuchinke, Björn
  2. The Measurement and Macro-Relevance of Corruption: A Big Data Approach By Sandile Hlatshwayo; Anne Oeking; Manuk Ghazanchyan; David Corvino; Ananya Shukla; Lamin Y Leigh
  3. Early Detection of Students at Risk - Predicting Student Dropouts Using Administrative Student Data and Machine Learning Methods By Schneider, Kerstin; Berens, Johannes; Oster, Simon; Burghoff, Julian
  4. A Thick ANN Model for Forecasting Inflation By Muhammad Nadim Hanif; Khurrum S. Mughal; Javed Iqbal
  5. Deep Learning Neural Networks as a Model of Saccadic Generation By Sofia Krasovskaya; Georgiy Zhulikov; Joseph MacInnes
  6. A Complex Model of Consumer Food Acquisitions: Applying Machine Learning and Directed Acyclic Graphs to the National Household Food Acquisition and Purchase Survey (FoodAPS) By Senia, Mark C.; Dharmasena, Senarath; Todd, Jessica E.
  7. An Efficient Approach for Removing Look-ahead Bias in the Least Square Monte Carlo Algorithm: Leave-One-Out By Jaehyuk Choi; Chenru Liu; Jeechul Woo
  8. A Machine Learning-based Recommendation System for Swaptions Strategies By Adriano Soares Koshiyama; Nick Firoozye; Philip Treleaven
  9. Enhancing perceived safety in human–robot collaborative construction using immersive virtual environments By You, Sangseok; Kim, Jeong-Hwan; Lee, SangHyun; Kamat, Vineet; Robert, Lionel

  1. By: Budzinski, Oliver; Kuchinke, Björn
    Abstract: This paper outlines the modern industrial organization theory of media markets including competition policy implications. After recapitulating the fundamentals of industrial organization theory in a non-technical way, the state of the art of (i) modern platform economics, (ii) the economics of the so-called sharing economy, and (iii) the economics of data-based business models and data-driven markets is summarized in detail and illustrated with modern media examples.
    Keywords: industrial organization,media economics,industrial economics,platform economics,sharing economy,digital economy,digitization,big data,economics of privacy,competition policy,antitrust economics
    JEL: L0 L82 L10 A2 K21
    Date: 2018
  2. By: Sandile Hlatshwayo; Anne Oeking; Manuk Ghazanchyan; David Corvino; Ananya Shukla; Lamin Y Leigh
    Abstract: Corruption is macro-relevant for many countries, but is often hidden, making measurement of it—and its effects—inherently difficult. Existing indicators suffer from several weaknesses, including a lack of time variation due to the sticky nature of perception-based measures, reliance on a limited pool of experts, and an inability to distinguish between corruption and institutional capacity gaps. This paper attempts to address these limitations by leveraging news media coverage of corruption. We contribute to the literature by constructing the first big data, cross-country news flow indices of corruption (NIC) and anti-corruption (anti-NIC) by running country-specific search algorithms over more than 665 million international news articles. These indices correlate well with existing measures of corruption but offer additional richness in their time-series variation. Drawing on theory from the corporate finance and behavioral economics literature, we also test to what extent news about corruption and anti-corruption efforts affects economic agents’ assessments of corruption and, in turn, economic outcomes. We find that NIC shocks appear to negatively impact both financial (e.g., stock market returns and yield spreads) and real variables (e.g., growth), albeit with some country heterogeneity. On average, NIC shocks lower real per capita GDP growth by 3 percentage points over a two-year period, illustrating persistence in the effect of such shocks. Conversely, there is suggestive evidence that anti-NIC efforts appear to have a sustained positive macro impact only when paired with meaningful institutional strengthening, proxied by capacity development efforts.
    Keywords: Corruption, anti-corruption, capacity development, big data, text analysis
    Date: 2018–08–31
  3. By: Schneider, Kerstin; Berens, Johannes; Oster, Simon; Burghoff, Julian
    Abstract: High rates of student attrition in tertiary education are a major concern for universities and public policy, as dropout is not only costly for the students but also wastes public funds. To successfully reduce student attrition, it is imperative to understand which students are at risk of dropping out and what the underlying determinants of dropout are. We develop an early detection system (EDS) that uses machine learning and classic regression techniques to predict student success in tertiary education as a basis for a targeted intervention. The method developed in this paper is highly standardized and can be easily implemented in every German institution of higher education, as it uses student performance and demographic data collected, stored, and maintained by legal mandate at all German universities and therefore self-adjusts to the university where it is employed. The EDS uses regression analysis and machine learning methods, such as neural networks, decision trees, and the AdaBoost algorithm, to identify student characteristics which distinguish potential dropouts from graduates. The EDS we present is tested and applied at a medium-sized state university with 23,000 students and a medium-sized private university of applied sciences with 6,700 students. Our results indicate a prediction accuracy at the end of the first semester of 79% for the state university and 85% for the private university of applied sciences. Furthermore, accuracy of the EDS increases with each completed semester as new performance data becomes available. After the fourth semester, the accuracy improves to 90% for the state university and 95% for the private university of applied sciences.
    Keywords: student attrition,early detection,administrative data,higher education,machine learning,AdaBoost
    JEL: I23 C45 H52
    Date: 2018
  4. By: Muhammad Nadim Hanif (State Bank of Pakistan); Khurrum S. Mughal (State Bank of Pakistan); Javed Iqbal (State Bank of Pakistan)
    Abstract: Inflation forecasting is an essential activity at central banks for formulating a forward-looking monetary policy stance. As in other fields, machine learning is finding its way into forecasting, and inflation forecasting is no exception. In machine learning, the most popular tool for forecasting is the artificial neural network (ANN). Researchers have used different performance measures (including RMSE) to optimize the set of characteristics (architecture, training algorithm, and activation function) of an ANN model. However, any chosen ‘optimal’ set may not remain reliable once new data are realized. We suggest using the ‘mode’, or most frequently appearing set, from a simulation-based distribution of optimum ‘sets of characteristics of the ANN model’, selected from a large number of different sets. Even then, re-running this ‘modal’ optimal set may yield a different trained network, since initial weights in the training process are assigned randomly. To overcome this issue, we suggest using ‘thickness’ to produce stable and reliable forecasts from the modal optimal set. Using year-on-year (YoY) inflation data for Pakistan from January 1958 to December 2017, we find that our YoY inflation forecasts (based on the aforementioned multistage forecasting scheme) outperform those from a number of inflation forecasting models of the Pakistan economy.
    Keywords: Artificial Neural Networks, Inflation Forecasting
    JEL: C45 E31 E37
    Date: 2018–10
  5. By: Sofia Krasovskaya (National Research University Higher School of Economics); Georgiy Zhulikov (National Research University Higher School of Economics); Joseph MacInnes (National Research University Higher School of Economics)
    Abstract: Approximately twenty years ago, Laurent Itti and Christof Koch created a model of saliency in visual attention in an attempt to recreate the work of biological pyramidal neurons by mimicking neurons with centre-surround receptive fields. The Saliency Model has launched many studies that contributed to the understanding of layers of vision and the sphere of visual attention. The aim of the current study is to improve this model by using an artificial neural network that generates saccades similar to how humans make saccadic eye movements. The proposed model uses a Leaky Integrate-and-Fire layer for temporal predictions, and replaces parallel feature maps with a deep learning neural network in order to create a generative model that is precise for both spatial and temporal predictions. Our deep neural network was able to predict eye movements based on unsupervised learning from raw image input, as well as supervised learning from fixation maps retrieved during an eye-tracking experiment conducted with 35 participants at later stages in order to train a 2D softmax layer. The results imply that it is possible to match the spatial and temporal distributions of the model to spatial and temporal human distributions.
    Keywords: saccade generation, salience model, deep learning neural network, visual search, leaky integrate and fire
    JEL: Z
    Date: 2018
  6. By: Senia, Mark C.; Dharmasena, Senarath; Todd, Jessica E.
    Abstract: Complex causal relationships among a large set of variables that affect U.S. households’ food acquisition and purchase decisions were estimated using machine learning algorithms and directed acyclic graphs. Asians and Hispanics live in environments with high concentrations of fast-food and non-fast-food restaurants. Obesity is less prevalent among Asians. Being Hispanic makes one more food insecure. Those with higher incomes are food secure, and obesity is less prevalent among them. Being Black is a positive cause of SNAP participation and food insecurity. Fair/poor health and diet status are positive causes of obesity.
    Keywords: Consumer/Household Economics, Food Consumption/Nutrition/Food Safety
    Date: 2018–01–15
  7. By: Jaehyuk Choi; Chenru Liu; Jeechul Woo
    Abstract: The least square Monte Carlo (LSM) algorithm proposed by Longstaff and Schwartz [2001] is the most widely used method for pricing options with early exercise features. The LSM estimator contains look-ahead bias, and the conventional technique of removing it necessitates an independent set of simulations. This study proposes a new approach for efficiently eliminating look-ahead bias by using the leave-one-out method, a well-known cross-validation technique for machine learning applications. The leave-one-out LSM (LOOLSM) method is illustrated with examples, including multi-asset options whose LSM price is biased high. The asymptotic behavior of look-ahead bias is also discussed with the LOOLSM approach.
    Date: 2018–10
  8. By: Adriano Soares Koshiyama; Nick Firoozye; Philip Treleaven
    Abstract: Derivative traders are usually required to scan through hundreds, even thousands, of possible trades on a daily basis. To date, no solution is available to aid them in this job. Hence, this work aims to develop a trading recommendation system and to apply it to the so-called Mid-Curve Calendar Spread (MCCS), an exotic swaption-based derivatives package. In summary, our trading recommendation system follows this pipeline: (i) on a certain trade date, we compute metrics and sensitivities related to an MCCS; (ii) these metrics are fed into a model that can predict its expected return for a given holding period; and, after repeating (i) and (ii) for all trades, we (iii) rank the trades using some dominance criteria. To suggest that such an approach is feasible, we used a list of 35 different types of MCCS, a total of 11 predictive models, and 4 benchmark models. Our results suggest that, in general, linear regression with lasso regularisation compared favourably to other approaches from a predictive and interpretability perspective.
    Date: 2018–10
  9. By: You, Sangseok (HEC Paris); Kim, Jeong-Hwan (University of Michigan at Ann Arbor - Department of Civil and Environmental Engineering); Lee, SangHyun (University of Michigan at Ann Arbor - Department of Civil and Environmental Engineering); Kamat, Vineet (University of Michigan at Ann Arbor - Department of Civil and Environmental Engineering); Robert, Lionel (University of Michigan at Ann Arbor - School of Information)
    Abstract: Advances in robotics now permit humans to work collaboratively with robots. However, humans often feel unsafe working alongside robots. Our knowledge of how to help humans overcome this issue is limited by two challenges. One, it is difficult, expensive and time-consuming to prototype robots and set up various work situations needed to conduct studies in this area. Two, we lack strong theoretical models to predict and explain perceived safety and its influence on human–robot work collaboration (HRWC). To address these issues, we introduce the Robot Acceptance Safety Model (RASM) and employ immersive virtual environments (IVEs) to examine perceived safety of working on tasks alongside a robot. Results from a between-subjects experiment done in an IVE show that separation of work areas between robots and humans increases perceived safety by promoting team identification and trust in the robot. In addition, the more participants felt it was safe to work with the robot, the more willing they were to work alongside the robot in the future.
    Keywords: Human–Robot Work Collaboration (HRWC); Immersive Virtual Environment (IVE); Robot Acceptance Safety Model (RASM); Masonry; Safety; Trust; Team Identification; Intention to Work with Robot
    JEL: J24 O30
    Date: 2018–09–12
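
Entry 7 above describes replacing the independent second simulation set with a leave-one-out correction in the regression step of the least square Monte Carlo algorithm. As a rough illustration only (not the authors' code), the sketch below computes leave-one-out fitted values for an ordinary least-squares regression in one pass, using the standard identity y_i - yhat_(-i) = (y_i - yhat_i) / (1 - h_ii), where h_ii is the i-th diagonal entry of the hat matrix. The function name `loo_fitted_values`, the quadratic basis, and the simulated prices and payoffs are all hypothetical choices made for this example.

```python
import numpy as np


def loo_fitted_values(X, y):
    """Leave-one-out OLS predictions, one per row, without refitting n times."""
    # Thin QR gives an orthonormal basis Q for the column space of X
    # (assumes X has full column rank).
    Q, _ = np.linalg.qr(X)
    fitted = Q @ (Q.T @ y)            # ordinary in-sample fitted values
    leverage = np.sum(Q * Q, axis=1)  # diagonal of the hat matrix H = Q Q^T
    # Standard identity: y_i - yhat_(-i) = (y_i - yhat_i) / (1 - h_ii)
    return y - (y - fitted) / (1.0 - leverage)


# Toy regression step with made-up data (illustrative, not from the paper):
# noisy next-step values regressed on a quadratic basis of simulated prices.
rng = np.random.default_rng(0)
s = rng.lognormal(mean=0.0, sigma=0.2, size=200)
payoff = np.maximum(1.0 - s, 0.0) + 0.05 * rng.standard_normal(200)
X = np.column_stack([np.ones_like(s), s, s**2])

# Continuation-value estimate for each path, fitted without that path's payoff.
cont_loo = loo_fitted_values(X, payoff)
```

Each path's continuation value here comes from a regression that excludes that path's own payoff, so a single simulation set serves both fitting and prediction. How this step fits into a full pricing routine is left to the paper.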

This nep-big issue is ©2018 by Tom Coupé. It is provided as is without any express or implied warranty. It may be freely redistributed in whole or in part for any purpose. If distributed in part, please include this notice.
General information on the NEP project can be found at For comments please write to the director of NEP, Marco Novarese at <>. Put “NEP” in the subject, otherwise your mail may be rejected.
NEP’s infrastructure is sponsored by the School of Economics and Finance of Massey University in New Zealand.