nep-big New Economics Papers
on Big Data
Issue of 2017‒09‒24
ten papers chosen by
Tom Coupé
University of Canterbury

  1. Homeowner Preferences after September 11th, a Microdata Approach By Adam Nowak; Juan Sayago-Gomez
  2. Reading Between the Lines: Prediction of Political Violence Using Newspaper Text By Hannes Mueller; Christopher Rauh
  3. Asset returns, news topics, and media effects By Vegard Høghaug Larsen; Leif Anders Thorsrud
  4. Provision of Personal Information and the Willingness-to-Pay for Receiving Critical Information in Time of an Unprecedented Disaster By Sakurai, Naoko; Otsuka, Tokio; Mitomo, Hitoshi
  5. Combining experimental evidence with machine learning to assess anti-corruption educational campaigns among Russian university students By Denisova-Schmidt, Elena; Huber, Martin; Leontyeva, Elvira; Solovyeva, Anna
  7. The best of two worlds: Balancing model strength and comprehensibility in business failure prediction using spline-rule ensembles By Koen De Bock
  8. Mind the gap: Platform ethics and competition issues By Nicholls, Rob
  9. The impact of Digitalization on Business Models: How IT Artefacts, Social Media, and Big Data Force Firms to Innovate Their Business Model By Bouwman, Harry; de Reuver, Mark; Nikou, Shahrokh
  10. Big Data et conception d'un système d'information d'aide à la décision clinique : vers une gestion sociocognicielle de la responsabilité médicale ? By Christine Sybord

  1. By: Adam Nowak (West Virginia University, Department of Economics); Juan Sayago-Gomez (West Virginia University, Department of Economics)
    Abstract: The existence of homeowner preferences - specifically homeowner preferences for neighbors - is fundamental to economic models of sorting. This paper investigates whether or not the terrorist attacks of September 11, 2001 (9/11) impacted local preferences for Arab neighbors. We test for changes in preferences using a differences-in-differences approach in a hedonic pricing model. Relative to sales before 9/11, we find properties within 0.1 miles of an Arab homeowner sold at a 1.4% discount in the 180 days after 9/11. The results are robust to a number of specifications including time horizon, event date, distance, time, alternative ethnic groups, and the presence of nearby mosques. Previous research has shown price effects at neighborhood levels but has not identified effects at the micro or individual property level, and for good reason: most transaction level data sets do not include ethnic identifiers. Applying methods from the machine learning and biostatistics literature, we develop a binomial classifier using a supervised learning algorithm and identify Arab homeowners based on the name of the buyer. We train the binomial classifier using names from Summer Olympic Rosters for 221 countries during the years 1948-2012. We demonstrate the flexibility of our methodology and perform an interesting counterfactual by identifying Hispanic and Asian homeowners in the data; unlike the statistically significant results for Arab homeowners, we find no meaningful results for Hispanic and Asian homeowners following 9/11.
    Keywords: house prices, ethnicity, homeowner preferences, terrorism, September 11th
    JEL: R21 R23 R31 J15
    Date: 2017–09
  2. By: Hannes Mueller; Christopher Rauh
    Abstract: This article provides a new methodology to predict armed conflict by using newspaper text. Through machine learning, vast quantities of newspaper text are reduced to interpretable topics. These topics are then used in panel regressions to predict the onset of conflict. We propose the use of the within-country variation of these topics to predict the timing of conflict. This allows us to avoid the tendency of predicting conflict only in countries where it occurred before. We show that the within-country variation of topics is a good predictor of conflict and becomes particularly useful when risk in previously peaceful countries arises. Two aspects seem to be responsible for these features. Topics provide depth because they consist of changing, long lists of terms which makes them able to capture the changing context of conflict. At the same time topics provide width because they are summaries of the full text, including stabilizing factors.
    Keywords: Civil War, conflict, early-warning, topic model, forecasting, machine learning, news, prediction, panel regression
    JEL: O11 O43
    Date: 2017–09
  3. By: Vegard Høghaug Larsen; Leif Anders Thorsrud
    Abstract: We decompose the textual data in a daily Norwegian business newspaper into news topics and investigate their predictive and causal role for asset prices. Our three main findings are: (1) a one unit innovation in the news topics predicts roughly a 1 percentage point increase in close-to-open returns and significant continuation patterns peaking at 4 percentage points after 15 business days, with little sign of reversal; (2) simple zero-cost news-based investment strategies yield significant annualized risk-adjusted returns of up to 20 percent; and (3) during a media shortage, due to an exogenous strike, returns for firms particularly exposed to our news measure experience a substantial fall. Our estimates suggest that between 20 and 40 percent of the news topics’ predictive power is due to the causal media effect. Together these findings lend strong support for a rational attention view where the media alleviate information frictions and disseminate fundamental information to a large population of investors.
    Keywords: Stock returns, News, Machine learning, Latent Dirichlet Allocation (LDA)
    Date: 2017–09
  4. By: Sakurai, Naoko; Otsuka, Tokio; Mitomo, Hitoshi
    Abstract: Following the Great East Japan Earthquake, information and communications technology (ICT) is expected to play an important role in future pioneering disaster prevention programs and post-disaster reconstruction. The increase in smartphone users allows big data to be accumulated from such diverse sources as personal information posted on social networking services (SNS), location data, and communication histories. Japan is willing to promote the use of this big data for various business opportunities; however, people remain anxious about personal information being divulged. To use big data during disasters, it is important to conduct research and surveys focused on Internet users’ perspectives. The purpose of this paper is to analyze Internet users’ evaluation of providing personal information and to measure the willingness-to-pay (WTP) for receiving information services at the time of large-scale disasters. In order to quantify Internet users’ evaluation of providing personal information, we drew up a questionnaire and conducted a survey. To obtain more accurate and practical estimation and evaluation, we adopted the Contingent Valuation Method (CVM). We assessed the value of providing personal information at the time of large-scale disasters for ordinary Internet users quantitatively. In the case of location information, the average WTP was found to be JPY2,943. The estimated average WTP for the five cases ranged between JPY2,202 and JPY3,618, in which the highest amount was found to be for "Medical history" and the lowest for "Measurements." In order to identify the factors that impacted the respondents’ WTP, we estimated the marginal contribution of the 21 attributes to the value of WTP.
    Keywords: Unprecedented Disaster, Personal Information, CVM, WTP, Quantitative Assessment
    Date: 2017
  5. By: Denisova-Schmidt, Elena; Huber, Martin; Leontyeva, Elvira; Solovyeva, Anna
    Abstract: This paper examines how anti-corruption educational campaigns affect the attitudes of Russian university students towards corruption and academic integrity. About 2,000 survey participants were randomly assigned to one of four different information materials (brochures or videos) about the negative consequences of corruption or to a control group. Using machine learning to detect effect heterogeneity, we find that various groups of students react to the same information differently. Those who commonly plagiarize, who receive excellent grades, and whose fathers are highly educated develop stronger negative attitudes towards corruption in the aftermath of our intervention. However, some information materials lead to more tolerant views on corruption among those who rarely plagiarize, who receive average or above average grades, and whose fathers are less educated. Therefore, policy makers aiming to implement anti-corruption education at a larger scale should scrutinize the possibility of (undesired) heterogeneous effects across student groups.
    Keywords: Anti-Corruption Campaigns, Experiments, Corruption, Academic Integrity, University, Students, Russia
    JEL: D73 I23 C93
    Date: 2017–09–16
  6. By: Baris Soybilgen (Istanbul Bilgi University)
    Abstract: We propose a factor augmented neural network model to obtain short-term predictions of U.S. business cycle regimes. First, dynamic factors are extracted from a large-scale data set consisting of 122 variables. Then, these dynamic factors are fed into neural network models for predicting recession and expansion periods. We show that the neural network model provides good in-sample and out-of-sample fits compared to the popular Markov switching dynamic factor model. We also perform a pseudo-real-time out-of-sample forecasting exercise and show that neural network models produce accurate short-term predictions of U.S. business cycle phases.
    Keywords: Dynamic Factor Model; Neural Network; Recession
    JEL: E37 E31
    Date: 2017–08
  7. By: Koen De Bock (Audencia Recherche - Audencia Business School)
    Abstract: Numerous organizations and companies rely upon business failure prediction to assess and minimize the risk of initiating business relationships with partners, clients, debtors or suppliers. Advances in research on business failure prediction have been largely dominated by algorithmic development and comparisons led by a focus on improvements in model accuracy. In this context, ensemble learning has recently emerged as a class of particularly well-performing methods, albeit often at the expense of increased model complexity. However, in practice, model choice is rarely based on predictive performance alone. Models should be comprehensible and justifiable to assess their compliance with common sense and business logic, and guarantee their acceptance throughout the organization. Rule ensembles are a promising ensemble classification algorithm that has been shown to reconcile performance and comprehensibility. In this study, an extension entitled spline-rule ensembles is introduced and validated in the domain of business failure prediction. Spline-rule ensembles complement rules and linear terms found in conventional rule ensembles with smooth functions with the aim of better accommodating nonlinear simple effects of individual features on business failure. Experiments on a large selection of 21 datasets of European companies in various sectors and countries (i) demonstrate superior predictive performance of spline-rule ensembles over a set of well-established yet powerful benchmark methods, (ii) show the superiority of spline-rule ensembles over conventional rule ensembles and thus demonstrate the value of the incorporation of smoothing splines, (iii) investigate the impact of alternative term regularization procedures and (iv) illustrate the comprehensibility of the resulting models through a case study. In particular, the ability of the technique to reveal the extent and the way in which predictors impact business failure, and if and how variables interact, is exemplified.
    Keywords: Bankruptcy prediction, business failure prediction, data mining, ensemble learning, model comprehensibility, penalized cubic regression splines, rule ensembles, spline-rule ensembles, risk management
    Date: 2017
  8. By: Nicholls, Rob
    Abstract: The algorithm-driven conduct of platform operators, as the expert handlers of big data, is starting to challenge the way in which competition law needs to be enforced. Businesses, especially platform operators, acquire data and particularly pricing information from other businesses in real-time. This leads to specific potential problems with autonomous actors engaged in algorithmic tacit collusion. These problems are compounded when usual legal tests for collusive price fixing require both a meeting of the minds of the colluding firms and a commitment to the price fixing conduct. It is not clear that bots meet either of these tests. The paper finds that price fixing is unethical using multiple analytical lenses but that the illegality of algorithmic tacit collusion is less clear. By considering the issues associated with concerted practices from a legal and ethical perspective, the paper charts some approaches that might be applied. It uses changes in competition law in Australia to highlight potential ways of dealing with algorithmic tacit collusion, but also highlights the potential unintended consequences associated with such changes.
    Keywords: Algorithmic tacit collusion, bots, business ethics, cartel conduct, concerted practices, price fixing
    Date: 2017
  9. By: Bouwman, Harry; de Reuver, Mark; Nikou, Shahrokh
    Abstract: Digital technology has forced entrepreneurs to reconsider their business models (BMs). Although research on entrepreneurial intention and business models is gaining attention, there is still a large knowledge gap in both fields. In this paper, we specifically address the impact of digitalization on business model innovation (BMI). Based on data collected from 338 European small- to medium-sized enterprises (SMEs) actively using IT artefacts, social media, or big data to innovate their business model, we study antecedents of BM experimentation and BM innovation practices, as well as overall business performance. We carried out four in-depth case studies of companies in which BM innovation is related to IT artefacts and more specifically to social media and big data. The findings from the quantitative study show that BMI is related to IT artefacts, social media, and big data. Use of IT artefacts, social media, and big data is mainly driven by strategic and innovation-related internal motives, although external technology turbulence plays a role too. BM innovation driven by IT artefacts, social media, and big data has an impact on performance, although the case studies show that this is more evident for IT artefacts and big data than for social media.
    Keywords: big data, business model innovation, digitalization, IT artefacts, social media
    Date: 2017
  10. By: Christine Sybord (COACTIS - UL2 - Université Lumière - Lyon 2 - UJM - Université Jean Monnet [Saint-Etienne])
    Abstract: With the challenges of storing, analyzing, and protecting personal health data posed by Big Data, the question of evaluating a Medical Decision Support (information) System (Système d'Aide à la Décision Médicale, SADM) becomes crucial for the physician and the patient alike. Against this background, the article addresses the design of an SADM by questioning the conditions for its integration into the medical practices of clinical decision-making. The first part, after characterizing Big Data from an economic standpoint, presents the cognitive and technical framework of SADMs. With the framework and status of an SADM established, the second part analyzes the legal foundations of the patient – physician – SADM/Big Data triptych. This analysis calls into question a strictly natural- or legal-person view of medical liability, in favor of a reflection on the epistemological and ethical conditions for managing an extended medical liability. The third part thus presents a critical analysis of knowledge systems (Ermine, 1996), with reference to General System Theory (Le Moigne, 1994). This critical analysis yields the theoretical framing for the "sociocogniciel" systemic design of an SADM. This design enables ethical management of extended medical liability and thereby facilitates physician-patient communication. With the theoretical framing in place, the fourth part presents the "sociocogniciel" model of an SADM with reference to a sociocognitive approach. The model instruments the organization of the relationships (physician, patient, SADM) that are co-constructed and come into play in clinical decision-making.
    Keywords: Big Data, Clinical decision, Medical Decision Support System, Medical liability, Complex systems, Knowledge systems, Socio-cognitive approach
    Date: 2016–05

This nep-big issue is ©2017 by Tom Coupé. It is provided as is without any express or implied warranty. It may be freely redistributed in whole or in part for any purpose. If distributed in part, please include this notice.
General information on the NEP project can be found at For comments please write to the director of NEP, Marco Novarese at <>. Put “NEP” in the subject, otherwise your mail may be rejected.
NEP’s infrastructure is sponsored by the School of Economics and Finance of Massey University in New Zealand.