nep-big New Economics Papers
on Big Data
Issue of 2021‒05‒17
eighteen papers chosen by
Tom Coupé
University of Canterbury

  1. Preaching to Social Media: Turkey’s Friday Khutbas and Their Effects on Twitter By Aksoy, Ozan
  2. A modern take on market efficiency: The impact of Trump's tweets on financial markets By Abdi, Farshid; Kormanyos, Emily; Pelizzon, Loriana; Getmansky, Mila; Simon, Zorka
  3. Artificial Intelligence and Industrial Innovation: Evidence from Firm-Level Data By Christian Rammer; Gastón P Fernández; Dirk Czarnitzki
  4. Answering the Queen: Machine Learning and Financial Crises By FOULIARD, Jeremy; Howell, Michael J.; Rey, Hélène
  5. Applied Algorithmic Machine Learning for Intelligent Project Prediction: Towards an AI Framework of Project Success By Hsu, Ming-Wei; Dacre, Nicholas; Senyo, PK
  6. Bankruptcy Prediction Model Based on Business Risk Reports: Use of Natural Language Processing Techniques By Rasolomanana, Onjaniaina Mianin'Harizo
  7. Safety in Smart, Livable Cities: Acknowledging the Human Factor By Steve J. Bickley; Alison Macintyre; Benno Torgler
  8. Reinforcement Learning with Expert Trajectory For Quantitative Trading By Sihang Chen; Weiqi Luo; Chao Yu
  9. Urban Analytics: History, Trajectory, and Critique By Boeing, Geoff; Batty, Michael; Jiang, Shan; Schweitzer, Lisa
  10. El interés por la innovación financiera en España. Un análisis con Google Trends [Interest in financial innovation in Spain: an analysis with Google Trends] By José Manuel Carbó; Esther Diez García
  11. Using social network analysis to prevent money laundering By A. Fronzetti Colladon; E. Remondi
  12. Endogenous Prediction of Bankruptcy using a Support Vector Machine By Zazueta, Jorge; Heredia, Andrea Chavez; Zazueta-Hernández, Jorge
  13. Measuring Inequality from Above By José García-Montalvo; Marta Reynal-Querol; Juan Carlos Muñoz Mora
  14. Exploring the Antecedents of Consumer Confidence through Semantic Network Analysis of Online News By A. Fronzetti Colladon; F. Grippa; B. Guardabascio; F. Ravazzolo
  15. Algorithmic collusion with imperfect monitoring By Calvano, Emilio; Calzolari, Giacomo; Denicolò, Vincenzo; Pastorello, Sergio
  16. Testing the Change in Correlation Structure across Markets: High-Dimensional Data By Abhilash S Nair; Suresh Kalagnanam
  17. Modeling and predicting agricultural land use in England based on spatially high-resolution data By W. Saart, Patrick; Kim, Namhyun; Bateman, Ian
  18. Sharing News Left and Right: The Effects of Policies Targeting Misinformation on Social Media By Daniel Ershov; Juan S. Morales

  1. By: Aksoy, Ozan
    Abstract: In this study I analyse through machine learning the content of all Friday khutbas (sermons) read to millions of citizens in thousands of mosques in Turkey since 2015. I focus on six non-religious and recurrent topics that feature in the sermons, namely business, family, nationalism, health, trust, and patience. I demonstrate that the content of the sermons responds strongly to events of national importance. I then link the Friday sermons with ~4.8 million tweets on these topics to study whether and how the content of sermons affects social media behaviour. I find generally large effects of the sermons on tweets, but there is also heterogeneity by topic. The effect is strongest for nationalism, patience, and health and weakest for business. Overall, these results show that religious institutions in Turkey are influential in shaping the public’s social media content and that this influence is mainly prevalent on salient issues. More generally, these results show that mass offline religious activity can have strong effects on social media behaviour.
    Date: 2021–05–12
    URL: http://d.repec.org/n?u=RePEc:osf:socarx:ngdrv&r=
  2. By: Abdi, Farshid; Kormanyos, Emily; Pelizzon, Loriana; Getmansky, Mila; Simon, Zorka
    Abstract: We focus on the role of social media as a high-frequency, unfiltered mass information transmission channel and how its use for government communication affects the aggregate stock markets. To measure this effect, we concentrate on one of the most prominent Twitter users, the 45th President of the United States, Donald J. Trump. We analyze around 1,400 of his tweets related to the US economy and classify them by topic and textual sentiment using machine learning algorithms. We investigate whether the tweets contain relevant information for financial markets, i.e. whether they affect market returns, volatility, and trading volumes. Using high-frequency data, we find that Trump's tweets are most often a reaction to pre-existing market trends and therefore do not provide material new information that would influence prices or trading. We show that past market information can help predict Trump's decision to tweet about the economy.
    Keywords: Market efficiency, Social media, Twitter, High-frequency event study, Machine learning, ETFs
    JEL: G10 G14 C58
    Date: 2021
    URL: http://d.repec.org/n?u=RePEc:zbw:safewp:314&r=
  3. By: Christian Rammer; Gastón P Fernández; Dirk Czarnitzki
    Abstract: Artificial Intelligence (AI) represents a set of techniques that enable new ways of innovation and allow firms to offer new features of products and services, to improve production, marketing and administration processes, and to introduce new business models. This paper analyses the extent to which the use of AI contributes to the innovation performance of firms. Based on firm-level data from the German part of the Community Innovation Survey (CIS) 2018, we examine the contribution of different AI methods and applications to product and process innovation outcomes. The representative nature of the survey allows extrapolating the findings to the macroeconomic level. The results show that 5.8% of firms in Germany were actively using AI in their business operations or products and services in 2019. The use of AI generated additional sales with world-first product innovations in these firms of about €16 billion, which corresponds to 18% of total sales of world-first innovations in the German business sector. Firms that developed AI by combining in-house and external resources obtained significantly higher innovation results. The same is true for firms that apply AI in a broad way and already have several years of experience in using AI.
    Keywords: Artificial Intelligence, Innovation, CIS data, Germany
    Date: 2021–04–30
    URL: http://d.repec.org/n?u=RePEc:ete:msiper:674605&r=
  4. By: FOULIARD, Jeremy; Howell, Michael J.; Rey, Hélène
    Abstract: Financial crises cause economic, social and political havoc. Macroprudential policies are gaining traction but are still severely under-researched compared to monetary policy and fiscal policy. We use the general framework of sequential predictions, also called online machine learning, to forecast crises out-of-sample. Our methodology is based on model averaging and is meta-statistic, since we can incorporate any predictive model of crises in our set of experts and test its ability to add information. We are able to predict systemic financial crises twelve quarters ahead out-of-sample with a high signal-to-noise ratio in most cases. We analyse which experts provide the most information for our predictions at each point in time and for each country, allowing us to gain some insights into economic mechanisms underlying the building of risk in economies.
    Date: 2020–12
    URL: http://d.repec.org/n?u=RePEc:cpr:ceprdp:15618&r=
  5. By: Hsu, Ming-Wei; Dacre, Nicholas; Senyo, PK
    Abstract: A growing number of emerging studies have examined the mediating dynamics between intelligent agents, activities, and cost within allocated budgets, in order to predict the outcomes of complex projects, given the significant uncertainty in achieving a successful outcome. For example, prior studies have used machine learning models to calculate and perform predictions. Artificial neural networks are the most frequently used machine learning models, while support vector machines, genetic algorithms, and decision trees are used in some related studies. Furthermore, most machine learning algorithms used in prior studies generally assume that inputs and outputs are independent of each other, which suggests that a project's success is expected to be independent of other projects. As the datasets used for training in prior studies often contain projects from different clients across industries, this theoretical assumption remains tenable. However, in practice projects are often interrelated across several different dimensions, for example through distributed overlapping teams. An ongoing ethnographic study at a leading project management artificial intelligence consultancy, referred to in this research as Company Alpha, suggests that projects within the same portfolio frequently share overlapping characteristics. To capture the emergent inter-project relationships, this study aims to compare the prediction performance of two specific types of artificial neural network: (i) the multilayer perceptron and (ii) recurrent neural networks. The multilayer perceptron has been found to be one of the most widely used artificial neural networks in the project management literature, and recurrent networks are distinguished by the memory they retain from prior inputs, which influences the current input and output. Through this comparison, this research will examine whether recurrent neural networks can capture the potential inter-project relationship and achieve improved performance in contrast to the multilayer perceptron. Our empirical investigation using ethnographic practice-based exploration at Company Alpha will contribute to project management knowledge and support developing an intelligent project prediction AI framework with future applications for project practice.
    Date: 2021–03–15
    URL: http://d.repec.org/n?u=RePEc:osf:socarx:6hfje&r=
  6. By: Rasolomanana, Onjaniaina Mianin'Harizo
    Abstract: The purpose of this study is to assess how useful risk information is in bankruptcy prediction, by performing a sentiment analysis of the texts. The proposed method involves the use of Natural Language Processing (NLP) and machine learning techniques. The results show that neural networks performed better than other classifiers, with a classification accuracy of 96.15% for this particular text classification problem. This work demonstrates that business risk reports carry information that helps predict the likelihood of bankruptcy.
    Keywords: Bankruptcy prediction, Business risk, Natural language processing, NLP, Sentiment analysis, Neural Networks
    Date: 2021–04
    URL: http://d.repec.org/n?u=RePEc:hok:dpaper:358&r=
  7. By: Steve J. Bickley; Alison Macintyre; Benno Torgler
    Abstract: AI and Big Data provide opportunities and challenges with respect to how we achieve safety in livable smart cities. In this contribution, we look at a set of aspects that are important at the city level; namely, how urban analytics and digital technologies can be used; how crime safety is influenced by predictive policing; how city planning and urban development can use real-time data; how complexity is connected to traffic safety; how AI offers opportunities for public health; and what the societal implications are of using, applying, or implementing new technologies. A core argument of the paper is the significance of acknowledging the ‘human factor’ when using smart technologies to design a safe and livable smart city.
    Keywords: Artificial Intelligence; Big Data; Smart City; Sustainability; Human Factors
    Date: 2021–05
    URL: http://d.repec.org/n?u=RePEc:cra:wpaper:2021-17&r=
  8. By: Sihang Chen; Weiqi Luo; Chao Yu
    Abstract: In recent years, quantitative investment methods combined with artificial intelligence have attracted more and more attention from investors and researchers. Existing related methods based on supervised learning are not very suitable for learning problems with long-term goals and delayed rewards in real futures trading. In this paper, therefore, we model the price prediction problem as a Markov decision process (MDP), and optimize it by reinforcement learning with expert trajectory. In the proposed method, we employ more than 100 short-term alpha factors, instead of the price, volume and several technical factors used in existing methods, to describe the states of the MDP. Furthermore, unlike DQN (deep Q-learning) and BC (behavior cloning) in related methods, we introduce expert experience in the training stage, and consider both the expert-environment interaction and the agent-environment interaction to design the temporal difference error, so that the agents are more adaptable to the inevitable noise in financial data. Experimental results evaluated on share price index futures in China, including IF (CSI 300) and IC (CSI 500), show the advantages of the proposed method compared with three typical technical analysis methods and two deep learning based methods.
    Date: 2021–05
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2105.03844&r=
  9. By: Boeing, Geoff (Northeastern University); Batty, Michael; Jiang, Shan; Schweitzer, Lisa
    Abstract: Urban analytics combines spatial analysis, statistics, computer science, and urban planning to understand and shape city futures. While it promises better policymaking insights, concerns exist around its epistemological scope and impacts on privacy, ethics, and social control. This chapter reflects on the history and trajectory of urban analytics as a scholarly and professional discipline. In particular, it considers the direction in which this field is going and whether it improves our collective and individual welfare. It first introduces early theories, models, and deductive methods from which the field originated before shifting toward induction. It then explores urban network analytics that enrich traditional representations of spatial interaction and structure. Next it discusses urban applications of spatiotemporal big data and machine learning. Finally, it argues that privacy and ethical concerns are too often ignored as ubiquitous monitoring and analytics can empower social repression. It concludes with a call for a more critical urban analytics that recognizes its epistemological limits, emphasizes human dignity, and learns from and supports marginalized communities.
    Date: 2021–05–14
    URL: http://d.repec.org/n?u=RePEc:osf:socarx:bwhx2&r=
  10. By: José Manuel Carbó (Banco de España); Esther Diez García (Banco de España)
    Abstract: In this article we study the interest aroused by various concepts related to financial innovation in Spain. We carry out our analysis using Google Trends, a tool that makes it possible to analyse how intensively a given term is searched for in the Google search engine. Although the tool does not show searches in absolute terms, it is useful for understanding relative differences between terms and regions. The results indicate that there are differences between Autonomous Communities (CCAA), although these vary depending on the category of financial innovation. The difference is much larger for terms related to regulation and to new technologies applied to sustainable finance, while it is almost non-existent in the categories of cryptocurrencies and payment methods. As for the factors that may explain these results, we identify income and age as possible determinants of the differences between CCAA. These results may be useful for developing financial education programmes and for better targeting efforts towards the digital regulatory initiatives in which there is greater disparity in interest and knowledge.
    Keywords: financial innovation, fintech firms, artificial intelligence, cryptocurrencies, Google Trends
    JEL: C21 O31 O33
    Date: 2021–04
    URL: http://d.repec.org/n?u=RePEc:bde:opaper:2112&r=
  11. By: A. Fronzetti Colladon; E. Remondi
    Abstract: This research explores the opportunities for the application of network analytic techniques to prevent money laundering. We worked on real-world data by analyzing the central database of a factoring company, mainly operating in Italy, over a period of 19 months. This database contained the financial operations linked to the factoring business, together with other useful information about the company clients. We propose a new approach to sort and map relational data and present predictive models, based on network metrics, to assess risk profiles of clients involved in the factoring business. We find that risk profiles can be predicted by using social network metrics. In our dataset, the most dangerous social actors deal with bigger or more frequent financial operations; they are more peripheral in the transactions network; they mediate transactions across different economic sectors and operate in riskier countries or Italian regions. Finally, to spot potential clusters of criminals, we propose a visual analysis of the tacit links existing among different companies that share the same owner or representative. Our findings show the importance of using a network-based approach when looking for suspicious financial operations and potential criminals.
    Date: 2021–05
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2105.05793&r=
  12. By: Zazueta, Jorge; Heredia, Andrea Chavez; Zazueta-Hernández, Jorge
    Abstract: We build a global bankruptcy prediction model using a support vector machine trained only on firms' endogenous information in the form of financial ratios. The model is tested not only on entirely random unseen data but on samples taken from specific global regions and industries to test for prediction bias, achieving satisfactory prediction performance in all cases. While support vector machines are not easily interpretable, we explore variable importance and find it consistent with economic intuition.
    Date: 2021–05–06
    URL: http://d.repec.org/n?u=RePEc:osf:socarx:ehpt7&r=
  13. By: José García-Montalvo; Marta Reynal-Querol; Juan Carlos Muñoz Mora
    Abstract: Recent research has shown the usefulness of nighttime light (NTL) data as a proxy for growth and economic activity. This paper explores the potential of using luminosity at night, recorded by satellite imagery, to construct measures of inequality. We develop a new methodology to construct a Gini index for each country using the nighttime light per capita over millions of small pixels. To assess the usefulness of our procedure, we check the correlation of our measure with the common factor extracted from the analysis of several Gini indices calculated using traditional data sources. Finally, we show two specific applications of our methodology: the calculation of within and between inequality across regions and ethnic groups.
    Keywords: Inequality, inequality between and within decomposition, nighttime light, development
    JEL: O10
    Date: 2021–05
    URL: http://d.repec.org/n?u=RePEc:bge:wpaper:1252&r=
  14. By: A. Fronzetti Colladon; F. Grippa; B. Guardabascio; F. Ravazzolo
    Abstract: This article studies the impact of online news on social and economic consumer perceptions through the application of semantic network analysis. Using almost 1.3 million online articles on Italian media covering a period of four years, we assessed the incremental predictive power of economic-related keywords on the Consumer Confidence Index. We transformed news into networks of co-occurring words and calculated the semantic importance of specific keywords, to see if words appearing in the articles could anticipate consumers' judgements about the economic situation. Results show that economic-related keywords have a stronger predictive power if we consider the current household and national situation, while their predictive power is less significant with regard to expectations about the future. Our indicator of semantic importance offers a complementary approach to estimate consumer confidence, lessening the limitations of traditional survey-based methods.
    Date: 2021–05
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2105.04900&r=
  15. By: Calvano, Emilio; Calzolari, Giacomo; Denicolò, Vincenzo; Pastorello, Sergio
    Abstract: We show that if they are allowed enough time to complete the learning, Q-learning algorithms can learn to collude in an environment with imperfect monitoring adapted from Green and Porter (1984), without having been instructed to do so, and without communicating with one another. Collusion is sustained by punishments that take the form of "price wars" triggered by the observation of low prices. The punishments have a finite duration, being harsher initially and then gradually fading away. Such punishments are triggered both by deviations and by adverse demand shocks.
    Keywords: artificial intelligence; Collusion; Imperfect Monitoring; Q-Learning
    JEL: D43 D83 L13 L41
    Date: 2021–01
    URL: http://d.repec.org/n?u=RePEc:cpr:ceprdp:15738&r=
  16. By: Abhilash S Nair (Indian Institute of Management Kozhikode); Suresh Kalagnanam (Edwards School of Business, University of Saskatchewan)
    Abstract: The purpose of this study is to test whether CSR engagement provides an insurance-like effect even after the firm has faced an integrity-questioning event. The impact of this integrity-based negative event is measured as the media sentiment in reporting the event. Accordingly, we test whether prior CSR engagement prompts media to give the firm the benefit of the doubt when it is accused of ‘grand corruption’. The study employs techniques of textual analysis combining various dictionaries and multiple media sources to estimate the sentiment score. We analyse 45,710 media reports, covering firms allegedly involved in ‘grand corruption’. The hypotheses are tested following standard panel data analysis techniques. The study finds no evidence of CSR providing an insurance-like effect, particularly in the context of integrity-based negative events. Rather, the results suggest that the media may have viewed the CSR activity of sample firms as a public relations exercise and penalized them for being involved in grand corruption.
    Date: 2021–03
    URL: http://d.repec.org/n?u=RePEc:iik:wpaper:431&r=
  17. By: W. Saart, Patrick (Cardiff Business School); Kim, Namhyun (University of Exeter Business School); Bateman, Ian (University of Exeter Business School)
    Abstract: This paper addresses various statistical and empirical challenges associated with modelling farmers' decision-making processes concerning agricultural land-use. These include (i) use of spatially high-resolution data so that idiosyncratic effects of physical environment drivers, e.g. soil textures, can be explicitly modelled; (ii) modelling land-use shares as censored responses, which enables consistent estimation of the unknown parameters; (iii) incorporating spatial error dependence and heterogeneity, which leads to accurate formulation of the variances for the parameter estimates and more effective statistical inferences; and (iv) reducing the computational burden and improving estimation accuracy by introducing an alternative GMM/QML hybrid estimation procedure. We also provide extensive evidence, which suggests that our approach can construct more accurate land-use predictions than existing methods in the literature. We then apply our method to empirically investigate how the climatic, economic, policy and physical determinants influence the land-use patterns in England over time and space. We are also interested in examining whether environmental schemes and grants have assisted in freeing up land used for arable, rough grazing, temporary and permanent grasslands and converting it to bio-energy crops to help to achieve deep emission reductions and prepare for climate change.
    Keywords: Agro-environmental policy, land-use, multivariate Tobit, system of censored equation, spatial model, error component model.
    JEL: C13 C21 C23 C34 Q15 Q53
    Date: 2021–05
    URL: http://d.repec.org/n?u=RePEc:cdf:wpaper:2021/7&r=
  18. By: Daniel Ershov; Juan S. Morales
    Abstract: We study Facebook’s and Twitter’s policy interventions which aimed to reduce the spread of misinformation during the 2020 US election. Facebook changed its news feed algorithm to reduce the visibility of content, while Twitter changed its user interface, nudging users to be thoughtful about sharing content. Using data on tweets and Facebook posts published by news media outlets, we show both policies significantly reduced news sharing, but the reductions varied by outlets’ factualness and political slant. On Facebook, content sharing fell relatively more for low-factualness outlets. On Twitter, content sharing fell relatively more for left-wing and high-factualness outlets.
    Keywords: social media, news sharing, media slant, fake news, misinformation.
    JEL: D72 L82 L86 O33
    Date: 2021
    URL: http://d.repec.org/n?u=RePEc:cca:wpaper:651&r=
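
Illustrative note on item 13: the abstract describes building a Gini index from nighttime light per capita over many small pixels. The sketch below shows one way such an index could be computed, assuming each pixel comes with a light value and a population count and that pixels are weighted by population along a Lorenz curve. The function name, the inputs, and the weighting scheme are assumptions made for illustration; the abstract does not specify the authors' exact procedure.

```python
from typing import Sequence


def gini_from_pixels(light: Sequence[float], population: Sequence[float]) -> float:
    """Population-weighted Gini index of light per capita across pixels.

    Hypothetical helper: `light` holds total luminosity per pixel and
    `population` the number of people in the same pixel. Pixels with no
    population are skipped because light per capita is undefined there.
    """
    # Keep (light per capita, population, light) for populated pixels only.
    pixels = [(l / p, p, l) for l, p in zip(light, population) if p > 0]
    if not pixels:
        raise ValueError("at least one pixel must have positive population")

    # Order pixels from the dimmest to the brightest per person.
    pixels.sort(key=lambda t: t[0])

    total_pop = sum(p for _, p, _ in pixels)
    total_light = sum(l for _, _, l in pixels)
    if total_light == 0:
        return 0.0  # everyone has the same (zero) light: no inequality

    # Trapezoid rule: `area` accumulates twice the area under the Lorenz curve.
    cum_pop = 0.0
    cum_light = 0.0
    area = 0.0
    for _, p, l in pixels:
        next_pop = cum_pop + p / total_pop
        next_light = cum_light + l / total_light
        area += (next_pop - cum_pop) * (next_light + cum_light)
        cum_pop, cum_light = next_pop, next_light

    return 1.0 - area


if __name__ == "__main__":
    # Four equally populated pixels with identical light: no inequality.
    print(gini_from_pixels([1, 1, 1, 1], [1, 1, 1, 1]))
    # All light concentrated in one of four equally populated pixels.
    print(gini_from_pixels([0, 0, 0, 4], [1, 1, 1, 1]))
```

With equal light per person in every pixel the index is zero, and it rises as light becomes concentrated in fewer people's pixels. A real application would also need to handle sensor saturation and noise in the satellite data, which this sketch ignores.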

This nep-big issue is ©2021 by Tom Coupé. It is provided as is without any express or implied warranty. It may be freely redistributed in whole or in part for any purpose. If distributed in part, please include this notice.
General information on the NEP project can be found at http://nep.repec.org. For comments please write to the director of NEP, Marco Novarese at <director@nep.repec.org>. Put “NEP” in the subject, otherwise your mail may be rejected.
NEP’s infrastructure is sponsored by the School of Economics and Finance of Massey University in New Zealand.