nep-big New Economics Papers
on Big Data
Issue of 2019‒12‒02
nineteen papers chosen by
Tom Coupé
University of Canterbury

  1. Administration by Algorithm? Public Management meets Public Sector Machine Learning By Veale, Michael; Brass, Irina
  2. Big Data-Based Peer-to-Peer Lending Fintech: Surveillance System through Utilization of Google Play Review By Pranata, Nika; Farandy, Alan Ray
  3. The Promise and Pitfalls of Conflict Prediction: Evidence from Colombia and Indonesia By Bazzi, Samuel; Blair, Robert A.; Blattman, Chris; Dube, Oeindrila; Gudgeon, Matthew; Peck, Richard
  4. Central bank tone and the dispersion of views within monetary policy committees By Paul Hubert; Fabien Labondance
  5. Artificial intelligence approach to momentum risk-taking By Ivan Cherednik
  6. Data Markets in Making: The Role of Technology Giants By Koski, Heli; Pantzar, Mika
  7. Forecasting Bitcoin Returns: Is there a Role for the U.S. – China Trade War? By Vasilios Plakandaras; Elie Bouri; Rangan Gupta
  8. News and consumer card payments By Guerino Ardizzi; Simone Emiliozzi; Juri Marcucci; Libero Monteforte
  9. Bayesian regularized artificial neural networks for the estimation of the probability of default By Sariev, Eduard; Germano, Guido
  10. Deep Reinforcement Learning in Cryptocurrency Market Making By Jonathan Sadighian
  11. How Unique is "E-stonia"? A Cross-Country Comparison of E-Services Usage in Europe By Stephany, Fabian
  12. Back to the Future - Changing Job Profiles in the Digital Age By Stephany, Fabian; Lorenz, Hanno
  13. Displaying spatial epistemologies on web GIS: using visual materials from the Chinese local gazetteers as an example By Lin, Nung-yao; Chen, Shih-Pei; Wang, Sean H.; Yeh, Calvin
  14. Imitation in the Imitation Game By Ravi Kashyap
  15. Creación de un nuevo bien común para las cooperativas agrícolas: Big data, TIC e intercambio de datos By Cynthia GIAGNOCAVO; Daniel HERNÁNDEZ CÁCERES
  16. Can Declines in Fertility During Floods Be Explained by Increased Demands on the Farm? By Chen, Joyce; Mueller, Valerie; Thiede, Brian
  17. Coding Together - Coding Alone: The Role of Trust in Collaborative Programming By Stephany, Fabian; Braesemann, Fabian; Graham, Mark
  18. Estimating Treatment Heterogeneity of International Monetary Fund Programs on Child Poverty with Generalized Random Forest By Daoud, Adel; Johansson, Fredrik
  19. Rhetorics of Radicalism By Karell, Daniel; Freedman, Michael Raphael

  1. By: Veale, Michael; Brass, Irina
    Abstract: Public bodies and agencies increasingly seek to use new forms of data analysis in order to provide 'better public services'. These reforms have consisted of digital service transformations generally aimed at 'improving the experience of the citizen', 'making government more efficient' and 'boosting business and the wider economy'. More recently, however, there has been a push to use administrative data to build algorithmic models, often using machine learning, to help make day-to-day operational decisions in the management and delivery of public services rather than providing general policy evidence. This chapter asks several questions relating to this. What are the drivers of these new approaches? Is public sector machine learning a smooth continuation of e-Government, or does it pose a fundamentally different challenge to practices of public administration? And how are public management decisions and practices at different levels enacted when machine learning solutions are implemented in the public sector? Focussing on different levels of government: the macro, the meso, and the 'street-level', we map out and analyse the current efforts to frame and standardise machine learning in the public sector, noting that they raise several concerns around the skills, capacities, processes and practices governments currently employ. The forms of these are likely to have value-laden, political consequences worthy of significant scholarly attention.
    Date: 2019–04–19
  2. By: Pranata, Nika (Asian Development Bank Institute); Farandy, Alan Ray (Asian Development Bank Institute)
    Abstract: Peer-to-peer lending (P2PL) FinTech is growing rapidly in Indonesia. With its flexibility and simplicity, P2PL reduces the financing gap that cannot be fulfilled by banks. However, the rapid development of P2PL also raises a number of problems that burden users, such as unethical debt collection methods and the imposition of excessive interest rates and other costs that potentially threaten national financial system stability. Therefore, by utilizing big data, which in this case is 40,650 reviews from 110 P2PLs obtained from Google Play from March 2016 to August 2018, we build a big data-based P2PL surveillance system based on four aspects: legality, review rating, debt collection methods, and level of interest rates and other costs. Using a relational database, structured query language (SQL), and text analysis, we found that (i) the majority of P2PL in Google Play are unauthorized; (ii) on average, authorized P2PL receives a better review rating; (iii) there are a lot of negative reviews related to unethical debt collection methods and excessive imposition of interest rates; and (iv) four P2PLs required special supervision from the Indonesia Financial Service Authority (OJK). Furthermore, the OJK should not passively wait for official reports to be filed by the public regarding violations of P2PL businesses. Through this big data-based system, the OJK can find these violations proactively because the system can act as an early warning system for the OJK in terms of P2PL surveillance.
    Keywords: fintech; peer to peer lending; big data; review; Google Play
    JEL: G23 G24 G28
    Date: 2019–04–12
  3. By: Bazzi, Samuel; Blair, Robert A.; Blattman, Chris; Dube, Oeindrila; Gudgeon, Matthew; Peck, Richard
    Abstract: Policymakers can take actions to prevent local conflict before it begins, if such violence can be accurately predicted. We examine the two countries with the richest available sub-national data: Colombia and Indonesia. We assemble two decades of fine-grained violence data by type, alongside hundreds of annual risk factors. We predict violence one year ahead with a range of machine learning techniques. Models reliably identify persistent, high-violence hot spots. Violence is not simply autoregressive, as detailed histories of disaggregated violence perform best. Rich socio-economic data also substitute well for these histories. Even with such unusually rich data, however, the models poorly predict new outbreaks or escalations of violence. "Best case" scenarios with panel data fall short of workable early-warning systems.
    Date: 2019–06–11
  4. By: Paul Hubert (Sciences Po - OFCE); Fabien Labondance (Université de Bourgogne Franche-Comté, CRESE)
    Abstract: Does policymakers’ choice of words matter? We explore empirically whether central bank tone conveyed in FOMC statements contains useful information for financial market participants. We quantify central bank tone using computational linguistics and identify exogenous shocks to central bank tone orthogonal to the state of the economy. Using an ARCH model and a high-frequency approach, we find that positive central bank tone increases interest rates at the 1-year maturity. We therefore investigate which potential pieces of information could be revealed by central bank tone. Our tests suggest that it relates to the dispersion of views among FOMC members. This information may be useful to financial markets to understand current and future policy decisions. Finally, we show that central bank tone helps predict future policy decisions.
    Date: 2019–11
  5. By: Ivan Cherednik
    Abstract: We propose a mathematical model of momentum risk-taking, which is real-time risk management, and discuss its implementation: an automated momentum equity trading system. Risk-taking is one of the key components of general decision-making, a challenge for artificial intelligence and machine learning. We begin with a simple continuous model of news impact and then perform its discretization, adjusting it to dealing with discontinuous functions. Stock charts are the main examples for us; stock markets are quite a test for any risk management theories. An entirely automated trading system based on our approach proved to be successful in extensive historical and real-time experiments. Its preimage is a new contract card game presented at the end of the paper.
    Date: 2019–11
  6. By: Koski, Heli; Pantzar, Mika
    Abstract: This paper focuses on the role of large technology companies’ entry and expansion to the data-intensive market areas via their technological development and strategic acquisitions of companies. We analyze the evolvement of personal data related innovation in various data-intensive domains. We find that the ideas related to personal data are increasingly protected by patents. The growth in the numbers of personal data related patents was relatively modest from 2005 to the early 2010s, but it has intensified since 2011. Large technology companies’ entry to various new market areas is reflected in an exponential increase in patent applications particularly in the artificial intelligence domain. Furthermore, we find that the number of artificial intelligence/data analytics companies acquired by the data giants has escalated during the 2010s. Patent and acquisition data further echo technology giants’ intentions to expand their activities into the financial and personal health services. Overall, the data show the data giants’ buyouts are frequently targeted to companies active in the markets outside their core business. Our analysis illustrates how the divergencies in the data giants’ innovation activities and strategic acquisitions have led them to each conquer their specific areas of dominance in the global markets for data.
    Keywords: Data economy, Innovation, Patents, Acquisitions, Technology giants
    JEL: G34 L12 L25 O33
    Date: 2019–11–19
  7. By: Vasilios Plakandaras (Department of Economics, Democritus University of Thrace, University Campus, Komotini, Greece); Elie Bouri (USEK Business School, Holy Spirit University of Kaslik, Jounieh, Lebanon); Rangan Gupta (Department of Economics, University of Pretoria, Pretoria, 0002, South Africa)
    Abstract: Previous studies provide evidence that trade related uncertainty tends to predict an increase in Bitcoin returns. In this paper, we extend the related literature by examining whether the information on the U.S. – China trade war can be used to forecast the future path of Bitcoin returns controlling for various explanatory variables. We apply ordinary least squares (OLS) regression, support vector regression (SVR), and the least absolute shrinkage and selection operator (LASSO) techniques that stem from the field of machine learning, and find weak evidence of the role of the trade war in forecasting Bitcoin returns. Given that out-of-sample tests are more reliable than in-sample tests, our results tend to suggest that future Bitcoin returns are unaffected by trade related uncertainties, and investors can use Bitcoin as a safe haven in this context.
    Keywords: Bitcoin, forecasting, machine learning, U.S. – China trade war
    JEL: C53 G11 G17
    Date: 2019–11
  8. By: Guerino Ardizzi (Bank of Italy); Simone Emiliozzi (Bank of Italy); Juri Marcucci (Bank of Italy); Libero Monteforte (Bank of Italy and Parliamentary Budget Office)
    Abstract: We exploit a unique daily data set on debit card expenditures to study the reaction of consumers to daily news relating to Economic Policy Uncertainty (EPU). Payments with debit cards are a proxy for consumption in the quarterly national accounts. Using big data techniques we construct daily EPU indexes, using either articles from Bloomberg news-wire or tweets from Twitter. Our empirical analysis at high frequency required estimates of daily seasonal components, finding strong patterns both within the week and within the month. Using local projections we find that daily shocks to EPU temporarily reduce debit card purchases, especially during the recent crisis; the main results are confirmed using monthly data and controlling for financial uncertainty and macroeconomic surprises. Furthermore, economic policy uncertainty affects the ratio between ATM withdrawals and debit card purchases, signaling an increase in households' preference for cash.
    Keywords: consumption, payment system, policy uncertainty, big data, daily seasonality, local projections
    JEL: C11 C32 C43 C52 C55 E52 E58
    Date: 2019–10
  9. By: Sariev, Eduard; Germano, Guido
    Abstract: Artificial neural networks (ANN) have been extensively used for classification problems in many areas such as gene, text and image recognition. Although ANN are also popular for estimating the probability of default in credit risk, they have drawbacks; a major one is their tendency to overfit the data. Here we propose an improved Bayesian regularization approach to train ANN and compare it to the classical regularization that relies on the back-propagation algorithm for training feed-forward networks. We investigate different network architectures and test the classification accuracy on three data sets. Profitability, leverage and liquidity emerge as important financial default driver categories.
    Keywords: Artificial neural networks; Bayesian regularization; Credit risk; Probability of default; ES/K002309/1
    JEL: C11 C13
    Date: 2019–10–31
  10. By: Jonathan Sadighian
    Abstract: This paper sets forth a framework for deep reinforcement learning as applied to market making (DRLMM) for cryptocurrencies. Two advanced policy gradient-based algorithms were selected as agents to interact with an environment that represents the observation space through limit order book data and order flow arrival statistics. Within the experiment, a feed-forward neural network is used as the function approximator and two reward functions are compared. The performance of each combination of agent and reward function is evaluated by daily and average trade returns. Using this DRLMM framework, this paper demonstrates the effectiveness of deep reinforcement learning in solving stochastic inventory control challenges market makers face.
    Date: 2019–11
  11. By: Stephany, Fabian
    Abstract: User data fuel the digital economy, while individual privacy is at stake. Governments react differently to this challenge. Estonia, a small Baltic state, has become a role model for the renewal of the social contract in times of big data (hence, often ironically referred to as "E-stonia"). While e-governance usage has been growing in many parts of Europe during the last ten years, some regions are lagging behind. The Estonian example suggests that online governance is most accepted in a small state, with a young population, trustworthy institutions and the need for technological renewal. This work examines the development of e-governance usage (citizens interacting digitally with the government) during the last decade in Europe from a comprehensive cross-country perspective: Size, age and trust are relevant for the usage of digital government services in Europe. However, the quality of past communication infrastructure is not related to e-governance popularity.
    Date: 2019–08–25
  12. By: Stephany, Fabian; Lorenz, Hanno
    Abstract: The uniqueness of human labour is in question in times of smart technologies. The 250-year-old discussion on technological unemployment reawakens. Frey and Osborne (2012) estimate that half of US employment will be automated by algorithms within the next 20 years. Other follow-up studies conclude that only a small fraction of workers will be replaced by digital technologies. The main contribution of our work is to show that the diversity of previous findings regarding the degree of job automation is, to a large extent, driven by model selection and not by controlling for personal characteristics or tasks. For our case study, we consult Austrian experts in machine learning and industry professionals on the susceptibility to digital technologies in the Austrian labour market. Our results indicate that, while clerical computer-based routine jobs are likely to change in the next decade, professional activities, such as the processing of complex information, are less prone to digital change.
    Date: 2019–08–16
  13. By: Lin, Nung-yao; Chen, Shih-Pei; Wang, Sean H.; Yeh, Calvin
    Abstract: In this paper, we introduce a web GIS platform created expressly for exploring and researching a set of 63,497 historical maps and illustrations extracted from 4,000 titles of Chinese local gazetteers. We layer these images with a published, geo-referenced collection of Land Survey Maps of China (1903-1948), which includes the earliest large-scale maps of major cities and regions in China that are produced with modern cartographic techniques. By bringing together historical illustrations depicting spatial configurations of localities and the earliest modern cartographic maps, researchers of Chinese history can study the different spatial epistemologies represented in both collections. We report our workflow for creating this web GIS platform, starting from identifying and extracting visual materials from local gazetteers, tagging them with keywords and categories to facilitate content search, to georeferencing them based on their source locations. We also experimented with neural networks to train a tagger with positive results. Finally, we display them in the web GIS platform with two modes, Images in Map (IIM) and Maps in Map (MIM), and with content- and location-based filtering. These features together enable researchers to explore and compare these two large sets of geospatial and visual materials of China easily and quickly.
    Date: 2019–07–09
  14. By: Ravi Kashyap
    Abstract: We discuss the objectives of automation equipped with non-trivial decision making, or creating artificial intelligence, in the financial markets and provide a possible alternative. Intelligence might be an unintended consequence of curiosity left to roam free, best exemplified by a frolicking infant. For this unintentional yet welcome aftereffect to set in, a foundational list of guiding principles needs to be present. A consideration of these requirements allows us to propose a test of intelligence for trading programs, on the lines of the Turing Test, long the benchmark for intelligent machines. We discuss the application of this methodology to the dilemma in finance, which is whether, when and how much to Buy, Sell or Hold.
    Date: 2019–11
  15. By: Cynthia GIAGNOCAVO (Universidad de Almería (Spain)); Daniel HERNÁNDEZ CÁCERES (Universidad de Almería (Spain))
    Abstract: Creating a new commons for agricultural cooperatives: Big data, ICT and data sharing. The utilisation of Big Data and ICT technologies on a large scale in agriculture is seen to be a solution for dealing with climate change, environmental degradation, land and water constraints, the necessity to optimise resources, reduce costs, and increase traceability and food safety, amongst other compelling arguments. However, it has also resulted in imbalances in power, investment barriers, reduced access to knowledge and the decreasing ability of farmers and SMEs to control and benefit from their agricultural related activities. This paper considers the legal, governance, institutional and economic issues that may arise in developing a data cooperative or other equitable data sharing structures, taking into account public and private sources of data, and multi-stakeholders involved. A review of successful data sharing examples, including cooperatives, is presented and a test case from the cooperatives of Almería, Spain is considered. This research falls within the context of the EU H2020 project Internet of Food and Farm (IoF2020) and the development of innovative data sharing business models. Rather than falling back on classical contracting arrangements for data sharing, as proposed by Copa-Cogeca, amongst others, it is proposed that a “data commons” approach in keeping with Elinor Ostrom’s Social-Ecological Systems Framework be used to frame a cooperative solution to this complex, systems based, challenge. By choosing a cooperative approach, benefits to farmers may go beyond “monetization” of data, and contribute to safeguarding environmental goods.
    Keywords: Data Sharing; Social-Ecological-Technical Systems; Agricultural Cooperatives; Multi-stakeholder cooperatives; Big Data and ICT; Public-private initiatives
    JEL: K22 O13 Q13 Q16
    Date: 2019
  16. By: Chen, Joyce; Mueller, Valerie; Thiede, Brian
    Abstract: Projections of sea-level rise and coastal flooding place Bangladesh as one of the countries most vulnerable to climate change by the end of this century. These changes are expected to have widespread consequences, including for population dynamics. We build upon a growing economic demography literature to estimate the effect of flooding on fertility in rural Bangladesh, using satellite-based measures of flooding and vital registration data on the infant population (2003-2011). We additionally perform parallel analyses of the socio-economic effects of flooding to explore whether prevailing labor market opportunities during a flooding episode shape the decision to conceive. We find that the odds of having a child under age 1 in a household decline by 3 percent when the extent of flooding in a sub-district increases by one standard deviation. There are no differential effects on the sex ratio. Flood-induced declines in fertility coincide with increased labor force participation by men, but maternal health, fetal vulnerability at gestation and/or increased health risks post birth seem to play a larger role. Future research differentiating how climate change affects the opportunity cost of workers’ time versus physiological factors related to human fertility is thus a key component to projecting the future stock of rural workers.
    Keywords: Consumer/Household Economics, Environmental Economics and Policy
    Date: 2019
  17. By: Stephany, Fabian; Braesemann, Fabian; Graham, Mark
    Abstract: In the digital economy, innovation processes increasingly rely on highly specialised know-how and open-source software shared on digital platforms for collaborative programming. The information that feeds into the content on these platforms is provided voluntarily by a vast crowd of knowledgeable users from all over the world. In contributing to the platforms, users invest their time and share knowledge with strangers to add to the rising body of digital knowledge. This requires an open mindset and trust. In this study, we argue that such a mindset is not just an individual asset, but determined by the local communities the users are embedded in. We, therefore, hypothesise that places with higher levels of trust should contribute more to StackOverflow, the world’s largest question-and-answer platform for programming questions. In relating the city-level contributions of 266 OECD metropolitan areas to infrastructure, economic, and trust measures, we find this hypothesis confirmed. In contrast, click rates to the platform are solely driven by infrastructure and economic variables, but not by trust. These findings highlight the importance of societal values in the 21st century knowledge economy: if policy-makers want to develop a lively local digital economy, it is not enough to provide fast Internet access and business opportunities. Instead, it is equally important to establish a trust-building environment that fosters sharing of innovative ideas, collaborations, and knowledge spillovers.
    Date: 2019–05–03
  18. By: Daoud, Adel; Johansson, Fredrik
    Abstract: A flourishing group of scholars of family sociology study how macroeconomic shockwaves propagate via household dynamics and land a blow on children’s living conditions; simultaneously, scholars of political economy unravel impacts of such shockwaves on population outcomes. Since these two strands of literature have evolved independently, little is known about the relative importance of societal and family features moderating this impact on children’s material living conditions. In this article, we synthesize insights from these two strands by examining the effect of economic austerity following International Monetary Fund programs—a type of economic shock—on child poverty across a sample representative of about half the world’s population of mainly the Global South. This article addresses the following fundamental sociological question: to what extent do the pathways of economic austerity propagate through families’ living conditions and societies’ structural and political characteristics? To capture these multiple non-linear heterogeneous relationships between macro and micro traits, we deploy machine learning in the service of policy evaluation. First, our analysis identifies an adverse average treatment effect (ATE) following the implementation of IMF programs on children’s probability of falling into poverty: 0.14, 95% CI 0.03-0.24. Second, our algorithms identify substantial impact heterogeneity distributed about this ATE. Macro constellations moderate about half of the impact variation on children, and families’ capabilities moderate the other half of this variation. We named this finding the 50-50 impact-moderation rule of thumb. Our algorithm identified family wealth, closely followed by governments’ education spending, as the critical moderating factors. IMF programs affect children residing in the middle of the social stratification more than their peers residing at both the top and bottom of this stratification; children residing in societies that have selected into IMF programs and have historically spent most on education are at a higher risk of falling into poverty. These findings identify the value of combining family sociology and political economy perspectives. Scholars will likely cross-fertilize this research further by testing this 50-50 rule of thumb on other types of economic shocks.
    Date: 2019–02–07
  19. By: Karell, Daniel; Freedman, Michael Raphael
    Abstract: What rhetorics run throughout radical discourse, and why do some gain prominence over others? The scholarship on radicalism largely portrays radical discourse as opposition to powerful ideas and enemies, but radicals often evince great interest in personal and local concerns. To shed light on how radicals use and adopt rhetoric, we analyze an original corpus of more than 23,000 pages produced by Afghan radical groups between 1979 and 2001 using a novel computational abductive approach. We first identify how radicalism not only attacks dominant ideas, actors, and institutions using a rhetoric of subversion, but also how it can use a rhetoric of reversion to urge intimate transformations in morals and behavior. Next, we find evidence that radicals’ networks of support affect the rhetorical mixture they espouse, due to social ties drawing radicals into encounters with backers’ social domains. Our study advances a relational understanding of radical discourse, while also showing how a combination of computational and abductive methods can help theorize and analyze discourses of contention.
    Date: 2019–04–17

This nep-big issue is ©2019 by Tom Coupé. It is provided as is without any express or implied warranty. It may be freely redistributed in whole or in part for any purpose. If distributed in part, please include this notice.
General information on the NEP project can be found at For comments please write to the director of NEP, Marco Novarese at <>. Put “NEP” in the subject, otherwise your mail may be rejected.
NEP’s infrastructure is sponsored by the School of Economics and Finance of Massey University in New Zealand.