nep-big New Economics Papers
on Big Data
Issue of 2020‒02‒10
27 papers chosen by
Tom Coupé
University of Canterbury

  1. Stock Price Prediction Using Convolutional Neural Networks on a Multivariate Timeseries By Sidra Mehtab; Jaydip Sen
  2. Tax dredger on social networks: new learning algorithms to track fraud By D. Desbois
  3. Mapping the risk terrain for crime using machine learning By Wheeler, Andrew Palmer; Steenbeek, Wouter
  4. The digital layer: How innovative firms relate on the web By Krüger, Miriam; Kinne, Jan; Lenz, David; Resch, Bernd
  5. Extracting Statistical Factors When Betas are Time-Varying By Patrick Gagliardini; Hao Ma
  6. Economic Predictions with Big Data: The Illusion of Sparsity By Giorgio E. Primiceri; Michele Lenza; Domenico Giannone
  7. The mirror for (artificial) intelligence: Working in whose reflection? By Moore, Phoebe V.
  8. Approche d'Intelligence Artificielle (AI) (Machine Learning) pour un Système d'Orientation des Consommateurs des Produits Ecolabel (Smart Eco-Adviser) Vers une Consommation Verte et Intelligente By Chehbi Gamoura Chehbi
  9. Market Efficiency in the Age of Big Data By Ian Martin; Stefan Nagel
  10. Web-based innovation indicators: Which firm website characteristics relate to firm-level innovation activity? By Axenbeck, Janna; Breithaupt, Patrick
  11. Forecasting GDP growth from outer space By Jaqueson K. Galimberti
  12. Approche d'Intelligence Artificielle (AI) et Big Data (une des Technologies 4.0) pour des Ecolabels Tagués (Eco-Tagged label) Vers une Consommation Verte et Intelligente By Chehbi Gamoura Chehbi
  13. The Editor vs. the Algorithm: Returns to Data and Externalities in Online News By Jörg Claussen; Christian Peukert; Ananya Sen
  14. Profit-oriented sales forecasting: a comparison of forecasting techniques from a business perspective By Tine Van Calster; Filip Van den Bossche; Bart Baesens; Wilfried Lemahieu
  15. Demand Shocks, Procurement Policies, and the Nature of Medical Innovation: Evidence from Wartime Prosthetic Device Patents By Jeffrey Clemens; Parker Rogers
  16. The Allocation of Decision Authority to Human and Artificial Intelligence By Susan C. Athey; Kevin A. Bryan; Joshua S. Gans
  17. Using Networks and Partial Differential Equations to Predict Bitcoin Price By Yufang Wang; Haiyan Wang
  18. Urban Street Network Analysis in a Computational Notebook By Boeing, Geoff
  19. China: Challenges and Prospects from an Industrial and Innovation Powerhouse By ALVES DIAS Patricia; AMOROSO Sara; ANNONI Alessandro; ASENSIO BERMEJO Jose Miguel; BELLIA Mario; BLAGOEVA Darina; DE PRATO Giuditta; DOSSO Mafini; FAKO Peter; FIORINI Alessandro; GEORGAKAKI Aliki; GKOTSIS Petros; GOENAGA BELDARRAIN Xabier; GREGORI Wildmer; HRISTOV Hristo; JAEGER-WALDAU Arnulf; JONKERS Koen; LEWIS Adam; MARMIER Alain; MARSCHINSKI Robert; MARTINEZ TUREGANO David; MUNOZ-PINEIRO Maria Amalia; NARDO Michela; NDACYAYISENGA Nathalie; PASIMENI Francesco; PREZIOSI Nadir; RANCAN Michela; RUEDA CANTUCHE Jose; RONDINELLA Vincenzo; TANARRO COLODRON Jorge; TELSNIG Thomas; TESTA Giuseppina; THIEL Christian; TRAVAGNIN Martino; TUEBKE Alexander
  20. Forecasting Realized Volatility of Bitcoin: The Role of the Trade War By Elie Bouri; Konstantinos Gkillas; Rangan Gupta; Christian Pierdzioch
  21. A Neural-embedded Choice Model: TasteNet-MNL Modeling Taste Heterogeneity with Flexibility and Interpretability By Yafei Han; Christopher Zegras; Francisco Camara Pereira; Moshe Ben-Akiva
  22. Mitigating Bias in Big Data for Transportation By Griffin, Greg Phillip; Mulhall, Megan; Simek, Chris; Riggs, William W.
  23. A Bayesian Long Short-Term Memory Model for Value at Risk and Expected Shortfall Joint Forecasting By Zhengkun Li; Minh-Ngoc Tran; Chao Wang; Richard Gerlach; Junbin Gao
  24. Justice is in the Eyes of the Beholder – Eye Tracking Evidence on Balancing Normative Concerns in Torts Cases By Christoph Engel; Rima Maria Rahal
  25. Deep Hedging: Hedging Derivatives Under Generic Market Frictions Using Reinforcement Learning By Hans Buehler; Lukas Gonon; Josef Teichmann; Ben Wood; Baranidharan Mohan; Jonathan Kochems
  26. Event Studies in Merger Analysis: Review and an Application Using U.S. TNIC Data By Timo Klein
  27. The Structure of Economic News By Leland Bybee; Bryan T. Kelly; Asaf Manela; Dacheng Xiu

  1. By: Sidra Mehtab; Jaydip Sen
    Abstract: The prediction of future stock price movements has been the subject of much research. In this work, we propose a hybrid approach to stock price prediction using machine learning and deep learning-based methods. We select the NIFTY 50 index values of the National Stock Exchange of India over the period from January 2015 to December 2019. Based on the NIFTY data during this period, we build various predictive models using machine learning approaches, and then use those models to predict the Close value of the NIFTY 50 for the year 2019 with a forecast horizon of one week. For predicting the NIFTY index movement patterns, we use a number of classification methods, while for forecasting the actual Close values of the NIFTY index, various regression models are built. We then augment the predictive power of the models by building a deep learning-based regression model using a Convolutional Neural Network (CNN) with walk-forward validation. The CNN model is fine-tuned so that the validation loss stabilizes with an increasing number of iterations and the training and validation accuracies converge. We exploit the power of CNNs in forecasting future NIFTY index values using three approaches that differ in the number of variables used in forecasting, the number of sub-models in the overall model, and the size of the input data for training. Extensive results are presented on various metrics for all classification and regression models. The results clearly indicate that the CNN-based multivariate forecasting model is the most effective and accurate in predicting the movement of NIFTY index values with a weekly forecast horizon.
    Date: 2020–01
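    The walk-forward validation mentioned above can be sketched as follows. This is an illustrative toy, not the authors' code: the expanding-window split logic is generic, while the series values and the persistence baseline are hypothetical stand-ins for the NIFTY 50 data and the CNN.

```python
# Walk-forward validation sketch (hypothetical): train on an expanding window,
# forecast the next `horizon` steps, then roll the window forward.

def walk_forward_splits(n_obs, initial_train, horizon):
    """Return (train_indices, test_indices) pairs for walk-forward validation."""
    splits = []
    start = initial_train
    while start + horizon <= n_obs:
        train = list(range(0, start))
        test = list(range(start, start + horizon))
        splits.append((train, test))
        start += horizon
    return splits

def naive_forecast(train_values, horizon):
    # Persistence baseline standing in for the paper's CNN: repeat the last value.
    return [train_values[-1]] * horizon

# Toy series standing in for weekly Close values (hypothetical numbers).
series = [100, 102, 101, 105, 107, 106, 110, 112, 111, 115]

for train_idx, test_idx in walk_forward_splits(len(series), initial_train=6, horizon=2):
    train_vals = [series[i] for i in train_idx]
    preds = naive_forecast(train_vals, len(test_idx))
    actual = [series[i] for i in test_idx]
    print(preds, actual)
```

Each fold forecasts only observations that come strictly after its training window, which is what keeps the evaluation honest for time series.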
  2. By: D. Desbois (ECO-PUB - Economie Publique - INRA - Institut National de la Recherche Agronomique - AgroParisTech)
    Abstract: In France, estimates of tax evasion vary between 2 and 80 billion euros (€bn) according to the parliamentary report of Bénédicte Peyrol. This would explain the injunction addressed by President Emmanuel Macron to the Court of Auditors on April 25 to shed light on this controversial issue in a context of tensions over public finances and declining tax compliance. In a letter sent on May 9 to Didier Migaud, president of that institution, Prime Minister Édouard Philippe indicated that "the time has come to take stock of the scale of tax fraud in the country and to assess the action of state services and the tools that have been put in place." A recent interview with Gérald Darmanin, Minister of Action and Public Accounts, revealed the French government's plan to use machine learning algorithms to better target tax audits based on the information that taxpayers themselves disclose on social networks. Illicit trade and false tax domiciliations are particularly targeted by Article 57 of the 2020 finance bill, adopted by deputies on November 13, which provides for the use of artificial intelligence in the fight against tax fraud. The project plans to strengthen IT resources to improve the targeting of tax audit operations through an investment of 20 million euros by 2022. Artificial intelligence, often invoked in discussions of digital technologies, is a misleading term, because it evokes the capacities of machines fantasized about in the science fiction popularized by cinema. In tax matters, nothing of the sort: among the myriad of observed behaviors, machine learning techniques aim to detect recurrent ones that are specific to certain types of fraud (VAT fraud, money laundering, false domiciliation and illicit optimization). However, the tax administration concedes a weak point: "Today, nearly one audit in four results in only a small recovery."
    Date: 2019–12–12
  3. By: Wheeler, Andrew Palmer (University of Texas at Dallas); Steenbeek, Wouter
    Abstract: Objectives: We illustrate how a machine learning algorithm, Random Forests, can provide accurate long-term predictions of crime at micro places relative to other popular techniques. We also show how recent advances in model summaries can help to open the ‘black box’ of Random Forests, considerably improving their interpretability. Methods: We generate long-term crime forecasts for robberies in Dallas at 200 by 200 feet grid cells that allow spatially varying associations of crime generators and demographic factors across the study area. We then show how using interpretable model summaries facilitates understanding the model’s inner workings. Results: We find that Random Forests greatly outperform Risk Terrain Models and Kernel Density Estimation in terms of forecasting future crimes using different measures of predictive accuracy, but only slightly outperform using prior counts of crime. We find that the factors that predict crime are highly non-linear and vary over space. Conclusions: We show how using black-box machine learning models can provide accurate micro place-based crime predictions, yet still be interpreted in a manner that fosters understanding of why a place is predicted to be risky. Data and code to replicate the results can be downloaded from d6/AAAjqnoMVKjzNQnWP9eu7M1ra?dl=0
    Date: 2020–01–18
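    The spatial unit of analysis above, a 200 by 200 feet grid cell, amounts to binning point events before any model is fit. A minimal sketch of that binning step (the coordinates and counts below are hypothetical, and this is far simpler than the paper's feature construction):

```python
# Bin point events (e.g. robberies, in feet coordinates) into 200 x 200 ft grid cells.

CELL = 200  # cell side length in feet, as in the paper

def to_cell(x, y, cell=CELL):
    """Map a point to its (column, row) grid cell index."""
    return (int(x // cell), int(y // cell))

def count_by_cell(events):
    """Count events per grid cell; the counts become model inputs/targets."""
    counts = {}
    for x, y in events:
        key = to_cell(x, y)
        counts[key] = counts.get(key, 0) + 1
    return counts

# Hypothetical event locations.
robberies = [(50, 30), (150, 180), (250, 30), (90, 399)]
print(count_by_cell(robberies))
```

A Random Forest would then be trained on per-cell features (crime generators, demographics) to predict these per-cell counts.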
  4. By: Krüger, Miriam; Kinne, Jan; Lenz, David; Resch, Bernd
    Abstract: In this paper, we introduce the concept of a Digital Layer to empirically investigate inter-firm relations at any geographical scale of analysis. The Digital Layer is created from large-scale, structured web scraping of firm websites, their textual content and the hyperlinks among them. Using text-based machine learning models, we show that this Digital Layer can be used to derive meaningful characteristics for the over seven million firm-to-firm relations, which we analyze in this case study of 500,000 firms based in Germany. Among other things, we explore three dimensions of relational proximity: (1) Cognitive proximity is measured by the similarity between firms' website texts. (2) Organizational proximity is measured by classifying the nature of the firms' relationships (business vs. non-business) using a text-based machine learning classification model. (3) Geographical proximity is calculated using the exact geographic location of the firms. Finally, we use these variables to explore the differences between innovative and non-innovative firms with regard to their location and relations within the Digital Layer. The firm-level innovation indicators in this study come from traditional sources (survey and patent data) and from a novel deep learning-based approach that harnesses firm website texts. We find that, after controlling for a range of firm-level characteristics, innovative firms maintain more numerous relationships than non-innovative firms and that their partners are more innovative than partners of non-innovative firms. Innovative firms are located in dense areas, yet still maintain relationships that are geographically farther away. Their partners share a common knowledge base and their relationships are business-focused. We conclude that the Digital Layer is a suitable and highly cost-efficient method for conducting large-scale analyses of firm networks that are not constrained to specific sectors, regions, or a particular geographical level of analysis. As such, our approach nicely complements other relational datasets like patents or survey data.
    Keywords: Web Mining,Innovation,Proximity,Network,Natural Language Processing
    JEL: O30 R10 C80
    Date: 2020
  5. By: Patrick Gagliardini (USI Università della Svizzera italiana; Swiss Finance Institute); Hao Ma (USI Università della Svizzera italiana; Swiss Finance Institute)
    Abstract: This paper deals with identification and inference on the unobservable conditional factor space and its dimension in large unbalanced panels of asset returns. The model specification is nonparametric regarding the way the loadings vary in time as functions of common shocks and individual characteristics. The number of active factors can also be time-varying as an effect of the changing macroeconomic environment. The method deploys Instrumental Variables (IV) which have full-rank covariation with the factor betas in the cross-section. It allows for a large dimension of the vector generating the conditioning information by machine learning techniques. In an empirical application, we infer the conditional factor space in the panel of monthly returns of individual stocks in the CRSP dataset between January 1971 and December 2017.
    Keywords: Large Panel, Unobservable Factors, Conditioning Information, Instrumental Variables, Machine Learning, Post-Lasso, Artificial Neural Networks
    JEL: G12
    Date: 2019–07
  6. By: Giorgio E. Primiceri (Northwestern University; Princeton University; National Bureau of Economic Research; Centre for Economic Policy Research (CEPR)); Michele Lenza (University of Minnesota); Domenico Giannone (Solvay Brussels School of Economics and Management; Federal Reserve Bank of New York; La Trobe University; Université Libre de Bruxelles; Libera Università Internazionale degli Studi Sociali; European Central Bank; University of Aston in Birmingham; European Centre for Advanced Research in Economics and Statistics; Centre for Economic Policy Research (CEPR))
    Abstract: The availability of large data sets, combined with advances in the fields of statistics, machine learning, and econometrics, has generated interest in forecasting models that include many possible predictive variables. Are economic data sufficiently informative to warrant selecting a handful of the most useful predictors from this larger pool of variables? This post documents that they usually are not, based on applications in macroeconomics, microeconomics, and finance.
    Keywords: Shrinkage; High Dimensional Data; Model Selection
    JEL: C1 C5
  7. By: Moore, Phoebe V.
    Abstract: 'The mirror for (artificial) intelligence: Working in whose reflection?' sets out the parameters for caution in considering an as-yet relatively under-debated issue in artificial intelligence (AI) research: the concept of 'intelligence' itself. After the AI 'winters' ending in the late 1990s, during which AI development met substantive obstacles, a new AI summer has commenced. What is still missing is a careful consideration of the historical significance of the weight placed on particular aspects of consciousness and on seemingly human-like workplace behaviour, which takes on increasing significance given the interest in machinic autonomous intelligence. The discussion paper argues that a series of machinic and technological inventions and related experiments shows how machines facilitate not only the normalization of what are considered intelligent behaviours, via both human and machinic intelligence, but also the integration of autonomous machines into everyday work and life. Today, ideas of autonomous machinic intelligence, seen in the ways AI-augmented tools and applications in human resources, robotics, and gig work are incorporated into workplaces, facilitate workplace relations via machinic intelligent behaviours that are explicitly assistive, prescriptive, descriptive, collaborative, predictive and affective. The question is, given these now autonomous forms of intelligence attributed to machines, who/what is looking in the mirror at whose/which reflection?
    Keywords: Cybernetics,Artificial Intelligence,Robotics,Autonomous Machines,Workplace Relations,Human-Machine interaction,History of Technology,Kybernetik,Künstliche Intelligenz,Robotik,Autonome Maschinen,Beziehungen am Arbeitsplatz,Mensch-Maschine-Interaktion,Innovationsgeschichte
    JEL: O30 J81 L00 I15
    Date: 2019
  8. By: Chehbi Gamoura Chehbi (EM Strasbourg - Ecole de Management de Strasbourg)
    Abstract: In the ecological consumption of ecolabelled products, guiding consumers toward the products they seek poses considerable complexity for researchers and industry. The spatial layout of stores, the multiplicity of displayed products, and consumers' lack of orientation together form a complex dynamic system that requires advanced exploratory heuristics. In this article, we propose an approach based on machine learning, specifically the class of recommendation algorithms (recommender systems). The objective is to guide consumers of ecolabelled products within a display space (a store or supermarket). The basic idea is to join, in a single exploratory data space, data from profiled consumers (Data-Consumers) and data from ecolabelled products (Data-Products). The intelligent matching between the two is then performed via machine learning by recommendation. Recommender systems are the Artificial Intelligence systems most widely used in applied models in industry, and notably in commerce. We believe this approach could play a key role in guiding the ecolabel consumer's choice. Artificial Intelligence, considered today the new Eldorado of the 4.0 Technologies and currently booming in applications, opens up the potential to find effective solutions for ecological consumption in an approach we initiate and call 'green and intelligent consumption'.
    Keywords: ecolabel, consumer profile, consumer guidance, Artificial Intelligence, ecological consumption, Machine Learning, Technology 4.0, recommender system
    Date: 2019–12–24
  9. By: Ian Martin; Stefan Nagel
    Abstract: Modern investors face a high-dimensional prediction problem: thousands of observable variables are potentially relevant for forecasting. We reassess the conventional wisdom on market efficiency in light of this fact. In our model economy, which resembles a typical machine learning setting, N assets have cash flows that are a linear function of J firm characteristics, but with uncertain coefficients. Risk-neutral Bayesian investors impose shrinkage (ridge regression) or sparsity (Lasso) when they estimate the J coefficients of the model and use them to price assets. When J is comparable in size to N, returns appear cross-sectionally predictable using firm characteristics to an econometrician who analyzes data from the economy ex post. A factor zoo emerges even without p-hacking and data-mining. Standard in-sample tests of market efficiency reject the no-predictability null with high probability, despite the fact that investors optimally use the information available to them in real time. In contrast, out-of-sample tests retain their economic meaning.
    Keywords: Bayesian learning, high-dimensional prediction problems, return predictability, out-of-sample tests
    JEL: G14 G12 C11
    Date: 2019
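    The ridge shrinkage that the Bayesian investors in this model apply can be illustrated in the simplest possible case of a single characteristic. This is a hypothetical one-dimensional sketch, not the paper's J-characteristic, N-asset setting: with penalty lam, the slope estimate is x'y / (x'x + lam), which shrinks toward zero as lam grows.

```python
# One-characteristic ridge regression (hypothetical numbers).

def ridge_coefficient(x, y, lam):
    """Slope estimate shrunk toward zero by penalty lam: (x'y) / (x'x + lam)."""
    xty = sum(a * b for a, b in zip(x, y))
    xtx = sum(a * a for a in x)
    return xty / (xtx + lam)

x = [1.0, 2.0, 3.0]  # a firm characteristic across three assets
y = [2.0, 4.0, 6.0]  # cash flows, here exactly 2x

print(ridge_coefficient(x, y, 0.0))   # lam = 0 recovers OLS: 2.0
print(ridge_coefficient(x, y, 14.0))  # lam = x'x shrinks the slope to 1.0
```

The Bayesian interpretation is that lam encodes a prior that coefficients are near zero; Lasso plays the analogous role when investors impose sparsity instead.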
  10. By: Axenbeck, Janna; Breithaupt, Patrick
    Abstract: Web-based innovation indicators may provide new insights into firm-level innovation activities. However, little is known yet about the accuracy and relevance of web-based information. In this study, we use 4,485 German firms from the Mannheim Innovation Panel (MIP) 2019 to analyze which website characteristics are related to innovation activities at the firm level. Website characteristics are measured by several text mining methods and are used as features in different Random Forest classification models that are compared against each other. Our results show that the most relevant website characteristics are the website's language, the number of subpages, and the total text length. Moreover, our website characteristics show a better performance for the prediction of product innovations and innovation expenditures than for the prediction of process innovations.
    Keywords: text as data,innovation indicators,machine learning
    JEL: C53 C81 C83 O30
    Date: 2019
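    Two of the website characteristics the study finds most relevant, subpage count and total text length, are straightforward to extract once a site has been scraped. A hypothetical sketch of that feature-extraction step (the site content and field names are invented for illustration):

```python
# Extract simple per-site features of the kind fed into the study's
# Random Forest classifiers (hypothetical example data).

def website_features(pages):
    """pages: dict mapping subpage path -> extracted text of that subpage."""
    total_text = " ".join(pages.values())
    return {
        "n_subpages": len(pages),
        "total_text_length": len(total_text),
    }

site = {
    "/": "We build industrial sensors.",
    "/research": "Our lab develops novel photonic measurement methods.",
}
feats = website_features(site)
print(feats)
```

In the study these features, together with the website's language, enter Random Forest models that predict firm-level innovation outcomes.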
  11. By: Jaqueson K. Galimberti (School of Economics, Auckland University of Technology)
    Abstract: We evaluate the usefulness of satellite-based data on night-time lights for forecasting GDP growth across a global sample of countries, proposing innovative location-based indicators to extract new predictive information from the lights data. Our findings are generally favorable to the use of night lights data to improve the accuracy of model-based forecasts. We also find a substantial degree of heterogeneity across countries in the relationship between lights and economic activity: individually-estimated models tend to outperform panel specifications. Key factors underlying the night lights performance include the country’s size and income level, logistics infrastructure, and the quality of national statistics.
    Keywords: night lights, remote sensing, big data, business cycles, leading indicators
    JEL: C55 C82 E01 E37 R12
    Date: 2019–12
  12. By: Chehbi Gamoura Chehbi (EM Strasbourg - Ecole de Management de Strasbourg)
    Abstract: In so-called 'green' or 'responsible' ecological consumption, information systems play an important role in the traceability and optimization of flows. However, the complexity of intersecting flows and the multiplicity of bodies that issue labels make the role of these systems very complex. Scientific research is therefore compelled to find advanced analytics approaches and tools to make this complexity intelligible to managers and analysts. In recent years, one of the most powerful fields serving data analytics has been Artificial Intelligence (AI). Coming from the hard sciences, AI is finding its place in management, but at a slow pace owing to the nature of its tools, drawn from the so-called exact sciences. The same difficulty is observed in the world of consumption, despite the many applications emerging. In this work, we try to find a way out of this difficulty by employing AI jointly with another 4.0 Technology, Big Data, exploiting the product life cycle in the ecolabelling system (Life Cycle Assessment, LCA).
    Keywords: ecological consumption, ecolabel, tag, traceability, Artificial Intelligence, Big Data, Technology 4.0, life cycle of the ecolabelled product, Life Cycle Assessment (LCA)
    Date: 2019–12–21
  13. By: Jörg Claussen; Christian Peukert; Ananya Sen
    Abstract: We run a field experiment to quantify the economic returns to data and informational externalities associated with algorithmic recommendation relative to human curation in the context of online news. Our results show that personalized recommendation can outperform human curation in terms of user engagement, though this crucially depends on the amount of personal data. Limited individual data or breaking news leads the editor to outperform the algorithm. Additional data helps algorithmic performance, but diminishing economic returns set in rapidly. Investigating informational externalities highlights that personalized recommendation reduces consumption diversity. Moreover, users associated with lower levels of digital literacy and more extreme political views engage more with algorithmic recommendations.
    Keywords: field experiment, economics of AI, returns to data, filter bubbles
    JEL: L82 L51 J24
    Date: 2019
  14. By: Tine Van Calster; Filip Van den Bossche; Bart Baesens; Wilfried Lemahieu
    Abstract: Choosing the technique that best forecasts your data is a problem that arises in any forecasting application. Decades of research have resulted in an enormous number of forecasting methods stemming from statistics, econometrics and machine learning (ML), which makes for a very difficult and elaborate choice in any forecasting exercise. This paper aims to facilitate this process for high-level tactical sales forecasts by comparing a large array of techniques on 35 time series consisting of both industry data from the Coca-Cola Company and publicly available datasets. However, instead of focusing solely on the accuracy of the resulting forecasts, this paper introduces a novel and completely automated profit-driven approach that takes into account the expected profit that a technique can create during both the model building and evaluation process. The expected profit function used for this purpose is easy to understand and adaptable to any situation by combining forecasting accuracy with business expertise. Furthermore, we examine the added value of ML techniques, the inclusion of external factors and the use of seasonal models in order to ascertain which type of model works best in tactical sales forecasting. Our findings show that simple seasonal time series models consistently outperform other methodologies and that the profit-driven approach can lead to selecting a different forecasting model.
    Date: 2020–02
  15. By: Jeffrey Clemens; Parker Rogers
    Abstract: We analyze wartime prosthetic device patents to investigate how procurement policy affects the cost, quality, and quantity of medical innovation. Analyzing whether inventions emphasize cost and/or quality requires generating new data. We do this by first hand-coding the economic traits emphasized in 1,200 patent documents. We then train a machine learning algorithm and apply the trained models to a century's worth of medical and mechanical patents that form our analysis sample. In our analysis of these new data, we find that the relatively stingy, fixed-price contracts of the Civil War era led inventors to focus broadly on reducing costs, while the less cost-conscious procurement contracts of World War I did not. We provide a conceptual framework that highlights the economic forces that drive this key finding. We also find that inventors emphasized dimensions of product quality (e.g., a prosthetic's appearance or comfort) that aligned with differences in buyers' preferences across wars. Finally, we find that the Civil War and World War I procurement shocks led to substantial increases in the quantity of prosthetic device patenting relative to patenting in other medical and mechanical technology classes. We conclude that procurement environments can significantly shape the scientific problems with which inventors engage, including the choice to innovate on quality or cost.
    JEL: H57 I1 O31
    Date: 2020–01
  16. By: Susan C. Athey; Kevin A. Bryan; Joshua S. Gans
    Abstract: The allocation of decision authority by a principal to either a human agent or an artificial intelligence (AI) is examined. The principal trades off an AI’s more aligned choice with the need to motivate the human agent to expend effort in learning choice payoffs. When agent effort is desired, it is shown that the principal is more likely to give that agent decision authority, reduce investment in AI reliability and adopt an AI that may be biased. Organizational design considerations are likely to impact how AIs are trained.
    JEL: C7 M54 O32 O33
    Date: 2020–01
  17. By: Yufang Wang; Haiyan Wang
    Abstract: Over the past decade, blockchain technology and the bitcoin cryptocurrency have received considerable attention. Bitcoin has experienced significant price swings in daily and long-term valuations. In this paper, we propose a partial differential equation (PDE) model on the bitcoin transaction network for predicting the bitcoin price. Through analysis of bitcoin subgraphs, or chainlets, the PDE model captures the influence of transaction patterns on the bitcoin price over time and combines the effect of all chainlet clusters. In addition, the Google Trends Index is incorporated into the PDE model to reflect the effect of bitcoin market sentiment. The experiment shows that the average accuracy of daily bitcoin price prediction is 0.82 for 362 consecutive days in 2017. The results demonstrate that the PDE model is capable of predicting the bitcoin price. This paper is the first attempt to apply a PDE model to the bitcoin transaction network for predicting the bitcoin price.
    Date: 2020–01
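    Evolving a quantity over a network with a PDE-style update can be illustrated with a single explicit Euler diffusion step. This is a hypothetical two-node toy, far simpler than the paper's chainlet-based model, showing only the mechanics of propagating values along network edges:

```python
# One explicit Euler step of diffusion on a graph:
# du_i/dt = sum over neighbors j of (u_j - u_i).

def diffusion_step(u, edges, dt=0.1):
    """Advance node values u by one time step dt along undirected edges."""
    new_u = dict(u)
    for i, j in edges:
        flow = dt * (u[j] - u[i])  # flow into i from j (negative = outflow)
        new_u[i] += flow
        new_u[j] -= flow
    return new_u

u0 = {"a": 1.0, "b": 0.0}             # hypothetical initial node values
u1 = diffusion_step(u0, [("a", "b")])  # values move toward each other
print(u1)
```

Repeating the step drives the two node values toward their common mean, the discrete analogue of the smoothing a diffusion PDE performs; the total over nodes is conserved at every step.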
  18. By: Boeing, Geoff (Northeastern University)
    Abstract: Computational notebooks offer researchers, practitioners, students, and educators the ability to interactively conduct analytics and disseminate reproducible workflows that weave together code, visuals, and narratives. This article explores the potential of computational notebooks in urban analytics and planning, demonstrating their utility through a case study of OSMnx and its tutorials repository. OSMnx is a Python package for working with OpenStreetMap data and modeling, analyzing, and visualizing street networks anywhere in the world. Its official demos and tutorials are distributed as open-source Jupyter notebooks on GitHub. This article showcases this resource by documenting the repository and demonstrating OSMnx interactively through a synoptic tutorial adapted from the repository. It illustrates how to download urban data and model street networks for various study sites, compute network indicators, visualize street centrality, calculate routes, and work with other spatial data such as building footprints and points of interest. Computational notebooks help introduce methods to new users and help researchers reach broader audiences interested in learning from, adapting, and remixing their work. Due to their utility and versatility, the ongoing adoption of computational notebooks in urban planning, analytics, and related geocomputation disciplines should continue into the future.
    Date: 2020–01–13
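    OSMnx itself retrieves data from OpenStreetMap and computes such indicators directly; as a dependency-free sketch of one indicator of the kind it reports, streets per node, on an invented edge list:

    ```python
    # Illustrative only: a streets-per-node count, one of the basic street-
    # network indicators the tutorials compute, on a hand-made edge list.
    from collections import defaultdict

    def streets_per_node(edges):
        """Count undirected street segments incident to each intersection."""
        degree = defaultdict(int)
        for u, v in edges:
            degree[u] += 1
            degree[v] += 1
        return dict(degree)

    edges = [("a", "b"), ("b", "c"), ("b", "d"), ("d", "a")]
    spn = streets_per_node(edges)
    avg = sum(spn.values()) / len(spn)
    ```

    In OSMnx the analogous numbers come from the downloaded OpenStreetMap graph rather than a hand-made list.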
    Abstract: China is rapidly becoming a major industrial competitor in high-tech and growth sectors. Its economic success and related industrial policies have received a high degree of attention, especially in light of its capacity to challenge the leading position of advanced economies in several fields. Through the 'Made in China 2025' strategy, China aims to become a world leader in key industrial sectors. In these sectors, it strives to strengthen its domestic innovation capacity and reduce its reliance on foreign technologies while moving up global value chains. This report analyses China's approach to attaining a dominant position in international markets through a combination of industrial, R&I, trade and foreign direct investment policies. It offers an assessment of China's current position compared to the EU and US innovation systems across a range of dimensions. It concludes that China has become a major industrial competitor in several rapidly expanding high-tech sectors, which may well achieve China's goal of becoming an innovation leader in specific areas. In response, the EU will need to boost its industrial and R&I performance and develop a trade policy that can ensure a level playing field for EU companies in China and for Chinese companies in the EU.
    Keywords: China, Global Value Chains, M&As, FDI, Venture Capital, R&I, Genomics, Artificial Intelligence, Robotics, Quantum, Nuclear, New Vehicles, Wind Energy, Solar Photovoltaics, Industrial Leadership
    Date: 2019–05
  20. By: Elie Bouri (USEK Business School, Holy Spirit University of Kaslik, Jounieh, Lebanon); Konstantinos Gkillas (Department of Business Administration, University of Patras – University Campus, Rio, P.O. Box 1391, 26500 Patras, Greece); Rangan Gupta (Department of Economics, University of Pretoria, Pretoria, 0002, South Africa); Christian Pierdzioch (Department of Economics, Helmut Schmidt University, Holstenhofweg 85, P.O.B. 700822, 22008 Hamburg, Germany)
    Abstract: We analyze the role of the US-China trade war in predicting, both in- and out-of-sample, the daily realized volatility of Bitcoin returns. We study intraday data spanning from 1 July 2017 to 30 June 2019. We use the heterogeneous autoregressive realized volatility (HAR-RV) model as the benchmark to capture stylized facts such as heterogeneity and long memory. We then extend the HAR-RV model to include a metric of US-China trade tensions, our primary predictor of interest, which is based on Google Trends. We also control for jumps, realized skewness, and realized kurtosis. For our empirical analysis, we use the machine-learning technique known as random forests. Our findings reveal that US-China trade uncertainty improves forecast accuracy across various configurations of random forests and forecast horizons.
    Keywords: Bitcoin, Realized volatility, Trade war, Random forests
    JEL: G17 Q02 Q47
    Date: 2020–01
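    The HAR-RV benchmark regresses future realized volatility on daily, weekly (5-day), and monthly (22-day) averages of past RV (Corsi's specification). A minimal sketch of those regressors, on fabricated data:

    ```python
    # Sketch of the HAR-RV regressors: daily, weekly (5-day), and monthly
    # (22-day) averages of realized volatility. The series is fabricated.
    def har_features(rv, t):
        """Return (RV_d, RV_w, RV_m) for forecasting RV at t+1; needs t >= 21."""
        daily = rv[t]
        weekly = sum(rv[t - 4:t + 1]) / 5
        monthly = sum(rv[t - 21:t + 1]) / 22
        return daily, weekly, monthly

    rv = [0.01 * (1 + (i % 5)) for i in range(30)]
    d, w, m = har_features(rv, 25)
    ```

    In the paper, these regressors (plus the Google Trends trade-tension metric and the realized-moment controls) feed a random forest instead of the usual linear regression.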
  21. By: Yafei Han; Christopher Zegras; Francisco Camara Pereira; Moshe Ben-Akiva
    Abstract: Discrete choice models (DCMs) and neural networks (NNs) can complement each other. We propose a neural-network-embedded choice model, TasteNet-MNL, to improve flexibility in modeling taste heterogeneity while keeping the model interpretable. The hybrid model consists of a TasteNet module, a feed-forward neural network that learns taste parameters as flexible functions of individual characteristics, and a choice module, a multinomial logit model (MNL) with manually specified utility. TasteNet and MNL are fully integrated and jointly estimated. By embedding a neural network into a DCM, we exploit a neural network's function-approximation capacity to reduce specification bias. Through special structure and parameter constraints, we incorporate expert knowledge to regularize the neural network and maintain interpretability. On synthetic data, we show that TasteNet-MNL can recover the underlying non-linear utility function and provide predictions and interpretations as accurate as the true model, whereas logit or random-coefficient logit models with misspecified utility functions exhibit large parameter bias and low predictability. In a case study of Swissmetro mode choice, TasteNet-MNL outperforms benchmark MNLs in predictive accuracy, discovers a wider spectrum of taste variation within the population, and finds higher values of time on average. This study takes an initial step towards a framework combining theory-based and data-driven approaches to discrete choice modeling.
    Date: 2020–02
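    The choice module is a standard MNL: choice probabilities are a softmax over alternative utilities. A toy sketch in which one taste coefficient varies with an individual characteristic (the linear taste function below merely stands in for the TasteNet network and is invented, as are the coefficients):

    ```python
    # Toy MNL choice module with an individual-varying time coefficient.
    # The linear "taste" function plays the role TasteNet plays in the
    # paper; all numbers here are made up.
    import math

    def mnl_probs(costs, times, income):
        beta_cost = -1.0
        beta_time = -0.5 - 0.1 * income  # taste as a function of income
        utils = [beta_cost * c + beta_time * t for c, t in zip(costs, times)]
        m = max(utils)                   # subtract max for numerical stability
        exps = [math.exp(u - m) for u in utils]
        z = sum(exps)
        return [e / z for e in exps]

    p = mnl_probs(costs=[2.0, 1.0], times=[1.0, 3.0], income=1.0)
    ```

    In TasteNet-MNL, the mapping from characteristics to taste parameters is a jointly estimated neural network rather than this fixed linear rule.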
  22. By: Griffin, Greg Phillip (The University of Texas at San Antonio); Mulhall, Megan; Simek, Chris; Riggs, William W.
    Abstract: Emerging big data resources and practices provide opportunities to improve transportation safety planning and outcomes. However, researchers and practitioners recognise that big data from mobile phones, social media, and on-board vehicle systems include biases in representation and accuracy, related to transportation safety statistics. This study examines both the sources of bias and approaches to mitigate them through a review of published studies and interviews with experts. Coding of qualitative data enabled topical comparisons and reliability metrics. Results identify four categories of bias and mitigation approaches that concern transportation researchers and practitioners: sampling, measurement, demographics, and aggregation. This structure for understanding and working with bias in big data supports research with practical approaches for rapidly evolving transportation data sources.
    Date: 2020–01–18
  23. By: Zhengkun Li; Minh-Ngoc Tran; Chao Wang; Richard Gerlach; Junbin Gao
    Abstract: Value-at-Risk (VaR) and Expected Shortfall (ES) are widely used in the financial sector to measure market risk and manage extreme market movements. The recent link between the quantile score function and the Asymmetric Laplace density has led to a flexible likelihood-based framework for joint modelling of VaR and ES. Capturing the underlying joint dynamics of these two quantities is of high interest in financial applications. We address this problem by developing a hybrid model that is based on the Asymmetric Laplace quasi-likelihood and employs the Long Short-Term Memory (LSTM) time-series modelling technique from machine learning to efficiently capture the underlying dynamics of VaR and ES. We refer to this model as LSTM-AL. We adopt an adaptive Markov chain Monte Carlo (MCMC) algorithm for Bayesian inference in the LSTM-AL model. Empirical results show that the proposed LSTM-AL model improves VaR and ES forecasting accuracy over a range of well-established competing models.
    Date: 2020–01
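    One common Asymmetric Laplace quasi-log-likelihood linking VaR and ES at level alpha follows Taylor (2019); whether the paper uses exactly this form is an assumption, and the data below are fabricated:

    ```python
    # Assumed form of the AL quasi-log-likelihood (after Taylor, 2019):
    # f(y) = ((alpha - 1) / ES) * exp(-(y - VaR)(alpha - 1{y <= VaR}) / (alpha * ES)),
    # with VaR and ES expressed as negative return quantities.
    import math

    def al_loglik(y, var, es, alpha=0.05):
        total = 0.0
        for yt, v, e in zip(y, var, es):
            hit = 1.0 if yt <= v else 0.0
            total += math.log((alpha - 1.0) / e) \
                     - (yt - v) * (alpha - hit) / (alpha * e)
        return total

    ll = al_loglik(y=[0.0], var=[-0.02], es=[-0.04], alpha=0.05)
    ```

    In the LSTM-AL model, an LSTM generates the VaR and ES paths fed into this quasi-likelihood, and the parameters are sampled by adaptive MCMC.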
  24. By: Christoph Engel (Max Planck Institute for Research on Collective Goods); Rima Maria Rahal (Tilburg School of Social and Behavioral Sciences)
    Abstract: Deciding legal cases frequently requires an assessment along multiple, conceptually incompatible dimensions. Often one normative concern calls for one decision, and another normative concern for a different decision. The decision-maker must engage in balancing, with no help from an overarching normative theory. Torts are a typical example: the decision must regularly balance concerns on behalf of the victim, the tortfeasor, and society at large, on both utilitarian and deontological grounds. In this paper we use eye tracking to investigate how laypersons' thought processes react to normative conflict in a set of 16 torts vignettes. When normative conflict is present, participants are less likely to agree with the likely outcome if the case were tried in a German court; they take longer to decide, and they fixate longer on normative concerns presented on a decision screen. Eye movements show that participants indeed consider multiple normative concerns in competition.
    Keywords: torts, fundamental normative relativity, compensation, deterrence, utilitarian and deontological concerns, balancing, eye tracking, machine learning
    JEL: D01 D81 D91 K13 K40
    Date: 2020–01
  25. By: Hans Buehler (JP Morgan); Lukas Gonon (ETH Zurich); Josef Teichmann (ETH Zurich; Swiss Finance Institute); Ben Wood (JP Morgan Chase); Baranidharan Mohan (JP Morgan); Jonathan Kochems (JP Morgan)
    Abstract: This article discusses a new application of reinforcement learning: the problem of hedging a portfolio of “over-the-counter” derivatives under market frictions such as trading costs and liquidity constraints. It is an extended version of our recent work, here using notation more common in the machine learning literature. The objective is to maximize a non-linear risk-adjusted return function by trading in liquid hedging instruments such as equities or listed options. The approach presented here is the first efficient and model-independent algorithm that can be used for such problems at scale.
    Keywords: Reinforcement Learning, Imperfect Hedging, Derivatives Pricing, Derivatives Hedging, Deep Learning
    JEL: C61 C58
    Date: 2019–05
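    The non-linear risk-adjusted objective can be illustrated as mean PnL minus a variance penalty across market scenarios; the liability, scenarios, trading cost, and penalty weight below are all invented, and the paper's actual objective, instruments, and learned policy are richer than this:

    ```python
    # Toy version of a risk-adjusted hedging objective: score a candidate
    # hedge ratio by mean PnL minus a variance penalty over scenarios.
    # All numbers are fabricated for illustration.
    def risk_adjusted(pnls, lam=5.0):
        n = len(pnls)
        mean = sum(pnls) / n
        var = sum((p - mean) ** 2 for p in pnls) / n
        return mean - lam * var

    def hedged_pnl(delta, moves, cost=0.005):
        # Hold `delta` units of the hedge against a unit linear liability,
        # paying a proportional cost on the hedge position.
        return [delta * m - m - cost * abs(delta) for m in moves]

    moves = [-0.1, 0.0, 0.1]
    scores = {d: risk_adjusted(hedged_pnl(d, moves)) for d in (0.0, 0.5, 1.0)}
    ```

    With a large enough risk penalty, the full hedge wins despite its trading cost; in the paper, a reinforcement-learning agent searches over such trade-offs in high dimensions instead of enumerating candidate ratios.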
  26. By: Timo Klein (University of Amsterdam)
    Abstract: There is a growing concern that U.S. merger control may have been too lenient, but empirical evidence remains limited. Event studies have been used as one method to acquire empirical insights into the competitive effects of mergers. However, existing work suffers from strong identifying assumptions, unreliable competitor identification, or small samples. After reviewing the use and challenges of event studies in merger analysis, I use a novel application of Hoberg-Phillips (2010, 2016) Text-Based Network Industry Classification (TNIC) data to readily proxy a ranking of competitors for 1,751 of the largest U.S. mergers between 1997 and 2017. I document that, following a merger announcement, the most likely competitors experience on average an abnormal return of around one percent. These abnormal returns are also associated with concerns of market power, which suggests that the results are at least in part driven by an anticipation of anti-competitive effects, and hence insufficient merger control.
    Keywords: Mergers, Antitrust, Event Studies, Text-Based Network Industry Classification
    JEL: G14 G34 L13 L40
    Date: 2020–01–27
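    Event-study abnormal returns are conventionally computed from a market model estimated over a clean window; the returns and windows below are fabricated, and the paper's exact estimation setup is not specified here:

    ```python
    # Sketch of a market-model event study: fit alpha and beta on a clean
    # estimation window, then measure the abnormal return at announcement.
    # All return numbers are invented.
    def market_model(stock, market):
        n = len(market)
        mx = sum(market) / n
        my = sum(stock) / n
        beta = sum((m - mx) * (s - my) for m, s in zip(market, stock)) / \
               sum((m - mx) ** 2 for m in market)
        alpha = my - beta * mx
        return alpha, beta

    est_market = [0.01, -0.02, 0.015, 0.0, -0.005]
    est_stock = [0.012, -0.018, 0.02, 0.001, -0.004]
    alpha, beta = market_model(est_stock, est_market)
    # Abnormal return on the event day: actual minus model-predicted return.
    ar = 0.014 - (alpha + beta * 0.003)
    ```

    The paper applies this kind of calculation to TNIC-ranked competitors of each merging firm rather than to the merging firms themselves.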
  27. By: Leland Bybee; Bryan T. Kelly; Asaf Manela; Dacheng Xiu
    Abstract: We propose an approach to measuring the state of the economy via textual analysis of business news. From the full text content of 800,000 Wall Street Journal articles for 1984–2017, we estimate a topic model that summarizes business news as easily interpretable topical themes and quantifies the proportion of news attention allocated to each theme at each point in time. We then use our news attention estimates as inputs into statistical models of numerical economic time series. We demonstrate that these text-based inputs accurately track a wide range of economic activity measures and that they have incremental forecasting power for macroeconomic outcomes, above and beyond standard numerical predictors. Finally, we use our model to retrieve the news-based narratives that underlie “shocks” in numerical economic data.
    JEL: C43 C55 C58 C82 E0 E17 E32 G0 G1
    Date: 2020–01
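    The "news attention" series can be illustrated as the average topic weight across the articles in a period; the topic labels and weights below are invented (the paper estimates them with a topic model over the WSJ corpus):

    ```python
    # Toy version of news attention: average each topic's weight across the
    # articles in a period. Labels and weights are fabricated.
    def attention(articles):
        totals = {}
        for doc in articles:
            for topic, w in doc.items():
                totals[topic] = totals.get(topic, 0.0) + w
        n = len(articles)
        return {t: v / n for t, v in totals.items()}

    docs = [{"recession": 0.6, "oil": 0.4}, {"recession": 0.2, "oil": 0.8}]
    att = attention(docs)
    ```

    In the paper, time series of such attention shares become regressors for tracking and forecasting numerical economic activity measures.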

This nep-big issue is ©2020 by Tom Coupé. It is provided as is without any express or implied warranty. It may be freely redistributed in whole or in part for any purpose. If distributed in part, please include this notice.
General information on the NEP project can be found at <>. For comments, please write to the director of NEP, Marco Novarese, at <>. Put “NEP” in the subject, otherwise your mail may be rejected.
NEP’s infrastructure is sponsored by the School of Economics and Finance of Massey University in New Zealand.