nep-big 2021-02-22 papers

on Big Data

Issue of 2021‒02‒22
nineteen papers chosen by
Tom Coupé
University of Canterbury

Big Data meets Causal Survey Research: Understanding Nonresponse in the Recruitment of a Mixed-mode Online Panel By Barbara Felderer; Jannis Kueck; Martin Spindler
On Technical Trading and Social Media Indicators in Cryptocurrencies' Price Classification Through Deep Learning By Marco Ortu; Nicola Uras; Claudio Conversano; Giuseppe Destefanis; Silvia Bartolucci
"Nowcasting and forecasting GDP growth with machine-learning sentiment indicators". By Oscar Claveria; Enric Monte; Salvador Torra
Manifold Learning with Approximate Nearest Neighbors By Fan Cheng; Rob J Hyndman; Anastasios Panagiotelis
Intelligence artificielle et contrôle de gestion : un rapport aux chiffres revisité et des enjeux organisationnels By Nicolas Berland; Christian Moinard
KI in der Finanzbranche: Im Spannungsfeld zwischen technologischer Innovation und regulatorischer Anforderung By Bauer, Kevin; Hinz, Oliver; Weber, Patrick
Stratégie & Intelligence artificielle By Henri Isaac
The Gender Pay Gap Revisited with Big Data: Do Methodological Choices Matter? By Anthony Strittmatter; Conny Wunsch
The corruptive force of AI-generated advice By Margarita Leib; Nils C. Köbis; Rainer Michael Rilke; Marloes Hagens; Bernd Irlenbusch
Uncertainty and Forecastability of Regional Output Growth in the United Kingdom: Evidence from Machine Learning By Mehmet Balcilar; David Gabauer; Rangan Gupta; Christian Pierdzioch
The Future of Healthcare around the World: Four indices integrating Technology, Productivity, Anti-Corruption, Healthcare and Market Financialization By Julia M. Puaschunder; Dirk Beerbaum
A Core of E-Commerce Customer Experience based on Conversational Data using Network Text Methodology By Andry Alamsyah; Nurlisa Laksmiani; Lies Anisa Rahimi
Understanding algorithmic collusion with experience replay By Bingyan Han
Use of Big Data in Transport Modelling By Luis Willumsen
A data-driven approach to measuring epidemiological susceptibility risk around the world By Alessandro Bitetto; Paola Cerchiello; Charilaos Mertzanis
Multi-Horizon Equity Returns Predictability via Machine Learning By Lenka Nechvatalova
Deep Learning for Market by Order Data By Zihao Zhang; Bryan Lim; Stefan Zohren
Artificial Intelligence, Robotics, Work and Productivity: The Role of Firm Heterogeneity By Heyman, Fredrik; Norbäck, Pehr-Johan; Persson, Lars
Common pool resource management and risk perceptions By Can Askan Mavi; Nicolas Querou

Big Data meets Causal Survey Research: Understanding Nonresponse in the Recruitment of a Mixed-mode Online Panel

By:	Barbara Felderer; Jannis Kueck; Martin Spindler
Abstract:	Survey scientists increasingly face the problem of high-dimensionality in their research as digitization makes it much easier to construct high-dimensional (or "big") data sets through tools such as online surveys and mobile applications. Machine learning methods are able to handle such data, and they have been successfully applied to solve predictive problems. However, in many situations, survey statisticians want to learn about causal relationships to draw conclusions and be able to transfer the findings of one survey to another. Standard machine learning methods provide biased estimates of such relationships. We introduce into survey statistics the double machine learning approach, which gives approximately unbiased estimators of causal parameters, and show how it can be used to analyze survey nonresponse in a high-dimensional panel setting.
Date:	2021–02
URL:	http://d.repec.org/n?u=RePEc:arx:papers:2102.08994&r=all

On Technical Trading and Social Media Indicators in Cryptocurrencies' Price Classification Through Deep Learning

By:	Marco Ortu; Nicola Uras; Claudio Conversano; Giuseppe Destefanis; Silvia Bartolucci
Abstract:	This work analyses the predictability of cryptocurrency price movements on both hourly and daily data observed from January 2017 to January 2021, using deep learning algorithms. For our experiments, we used three sets of features: technical, trading and social media indicators, considering a restricted model with only technical indicators and an unrestricted model with technical, trading and social media indicators. We verified whether including trading and social media indicators, along with the classic technical variables (such as price returns), leads to a significant improvement in the prediction of cryptocurrency price changes. We conducted the study on the two largest cryptocurrencies by volume and value (at the time of the study): Bitcoin and Ethereum. We implemented four different machine learning algorithms typically used in time-series classification problems: Multilayer Perceptron (MLP), Convolutional Neural Network (CNN), Long Short Term Memory (LSTM) neural network and Attention Long Short Term Memory (ALSTM). We devised the experiments using the advanced bootstrap technique to account for the variance problem on test samples, which allowed us to obtain a more reliable estimate of the model's performance. Furthermore, the Grid Search technique was used to find the best hyperparameter values for each implemented algorithm. The study shows that, based on the hourly frequency results, the unrestricted model outperforms the restricted one. Adding the trading indicators to the classic technical indicators improves the accuracy of Bitcoin and Ethereum price change prediction, with accuracy rising from a range of 51-55% for the restricted model to 67-84% for the unrestricted model.
Date:	2021–02
URL:	http://d.repec.org/n?u=RePEc:arx:papers:2102.08189&r=all

"Nowcasting and forecasting GDP growth with machine-learning sentiment indicators".

By:	Oscar Claveria (AQR–IREA, Department of Econometrics, Statistics and Applied Economics, University of Barcelona, Diagonal 690, 08034 Barcelona, Spain.); Enric Monte (Department of Signal Theory and Communications, Polytechnic University of Catalunya (UPC).); Salvador Torra (Riskcenter–IREA, Department of Econometrics, Statistics and Applied Economics, University of Barcelona (UB).)
Abstract:	We apply the two-step machine-learning method proposed by Claveria et al. (2021) to generate country-specific sentiment indicators that provide estimates of year-on-year GDP growth rates. In the first step, by means of genetic programming, business and consumer expectations are evolved to derive sentiment indicators for 19 European economies. In the second step, the sentiment indicators are iteratively re-computed and combined each period to forecast yearly growth rates. To assess the performance of the proposed approach, we have designed two out-of-sample experiments: a nowcasting exercise in which we recursively generate estimates of GDP at the end of each quarter using the latest survey data available, and an iterative forecasting exercise for different forecast horizons. We found that forecasts generated with the sentiment indicators outperform those obtained with time series models. These results show the potential of the methodology as a predictive tool.
Keywords:	Forecasting, Economic growth, Business and consumer expectations, Symbolic regression, Evolutionary algorithms, Genetic programming
JEL:	C51 C55 C63 C83 C93
Date:	2021–02
URL:	http://d.repec.org/n?u=RePEc:ira:wpaper:202103&r=all

Manifold Learning with Approximate Nearest Neighbors

By:	Fan Cheng; Rob J Hyndman; Anastasios Panagiotelis
Abstract:	Manifold learning algorithms are valuable tools for the analysis of high-dimensional data, many of which include a step where nearest neighbors of all observations are found. This can present a computational bottleneck when the number of observations is large or when the observations lie in more general metric spaces, such as statistical manifolds, which require all pairwise distances between observations to be computed. We resolve this problem by using a broad range of approximate nearest neighbor algorithms within manifold learning algorithms and evaluating their impact on embedding accuracy. We use approximate nearest neighbors for statistical manifolds by exploiting the connection between Hellinger/Total variation distance for discrete distributions and the L2/L1 norm. Via a thorough empirical investigation based on the benchmark MNIST dataset, it is shown that approximate nearest neighbors lead to substantial improvements in computational time with little to no loss in the accuracy of the embedding produced by a manifold learning algorithm. This result is robust to the use of different manifold learning algorithms, to the use of different approximate nearest neighbor algorithms, and to the use of different measures of embedding accuracy. The proposed method is applied to learning statistical manifolds data on distributions of electricity usage. This application demonstrates how the proposed methods can be used to visualize and identify anomalies and uncover underlying structure within high-dimensional data in a way that is scalable to large datasets.
Keywords:	statistical manifold, dimension reduction, anomaly detection, k-d trees, Hellinger distance, smart meter data
JEL:	C55 C65 C80
Date:	2021
URL:	http://d.repec.org/n?u=RePEc:msh:ebswps:2021-3&r=all
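
Editor's illustration (not taken from the paper): the abstract above relies on the connection between the Hellinger and total variation distances for discrete distributions and the L2 and L1 norms. The minimal sketch below checks the two standard identities numerically on toy distributions: Hellinger distance equals the Euclidean distance between element-wise square roots divided by sqrt(2), and total variation equals half the L1 distance. The function names and the example vectors p and q are assumptions made for illustration only.

```python
import math


def hellinger(p, q):
    """Hellinger distance via the Bhattacharyya coefficient: sqrt(1 - sum(sqrt(p_i * q_i)))."""
    bc = sum(math.sqrt(a * b) for a, b in zip(p, q))
    return math.sqrt(max(0.0, 1.0 - bc))  # clamp guards against tiny negative rounding


def total_variation(p, q):
    """Total variation as the largest difference in probability over any event."""
    return sum(max(a - b, 0.0) for a, b in zip(p, q))


def l2_distance(u, v):
    """Euclidean (L2) distance between two vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def l1_distance(u, v):
    """Manhattan (L1) distance between two vectors."""
    return sum(abs(a - b) for a, b in zip(u, v))


# Toy discrete distributions (made up for this example; each sums to 1).
p = [0.5, 0.3, 0.2]
q = [0.1, 0.6, 0.3]

# Map each distribution to its element-wise square root.
sqrt_p = [math.sqrt(a) for a in p]
sqrt_q = [math.sqrt(b) for b in q]

# Hellinger distance is an L2 distance after the square-root transform.
hellinger_via_l2 = l2_distance(sqrt_p, sqrt_q) / math.sqrt(2)
# Total variation is half the L1 distance on the raw probabilities.
tv_via_l1 = 0.5 * l1_distance(p, q)

print(hellinger(p, q), hellinger_via_l2)
print(total_variation(p, q), tv_via_l1)
```

Because both distances reduce to ordinary vector norms after a simple transform, any nearest-neighbor index built for L2 or L1 distance could in principle be reused on the transformed vectors.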

Intelligence artificielle et contrôle de gestion : un rapport aux chiffres revisité et des enjeux organisationnels

By:	Nicolas Berland (DRM - Dauphine Recherches en Management - Université Paris Dauphine-PSL - PSL - Université Paris sciences et lettres - CNRS - Centre National de la Recherche Scientifique); Christian Moinard (Audencia Recherche - Audencia Business School)
Abstract:	The arrival of the set of technologies sometimes described as "cognitive computing", AI, big data, and the like (Shivam et al., 2018) could upend the way corporate performance is approached. More than a new way of analyzing performance through new indicators, these technologies bring about a new relationship to numerical data. These approaches are not without risk, however. To avoid the dangers of black-box algorithmic models and to respond to the strain placed on interpretive frameworks, a transformation of occupations (chiefly that of management controllers) and of organizations is expected. Because numbers are conventions, that is, social constructs, any change in how they are produced implies changes in the social systems on which they act.
Keywords:	big data, corporate performance
Date:	2020
URL:	http://d.repec.org/n?u=RePEc:hal:journl:hal-03114008&r=all

KI in der Finanzbranche: Im Spannungsfeld zwischen technologischer Innovation und regulatorischer Anforderung

By:	Bauer, Kevin; Hinz, Oliver; Weber, Patrick
Abstract:	Artificial intelligence (AI) is regarded as the foundational technology of the 21st century and is leading, albeit at different speeds, to drastic changes in practically all industries. The financial sector is among the industries that are already most strongly affected by these upheavals. Among other things, classic relationship banking, as well as the classic investment business, is increasingly being displaced by AI-based applications. The sector finds itself in a complex field of tension between the regulatory requirements of data protection and market participants' right to information on the one hand (e.g., through the GDPR) and the pressure for technological innovation on the other. As a result, a number of particularities must be taken into account in the design, development, and integration of AI applications. This white paper provides an overview of the current state, trends, and potential of AI technologies in the financial sector. Particular attention is paid to possible problems and challenges for regulation, especially the black-box problem associated with complex AI applications. Against this background, the paper emphasizes the need for a stronger focus on eXplainable Artificial Intelligence (XAI), which offers a great opportunity to remedy potentially serious problems of today's AI applications while preserving their advantages.
Keywords:	Artificial intelligence, regulation, financial sector, eXplainable Artificial Intelligence, machine learning
Date:	2021
URL:	http://d.repec.org/n?u=RePEc:zbw:safewh:80&r=all

Stratégie & Intelligence artificielle

By:	Henri Isaac (DRM - Dauphine Recherches en Management - Université Paris Dauphine-PSL - PSL - Université Paris sciences et lettres - CNRS - Centre National de la Recherche Scientifique)
Abstract:	The rapid development over the past decade of methods known as artificial intelligence (AI) has raised the question of whether they could take part in a firm's strategic decision-making, or even replace it entirely. Such a view overlooks the particular features of strategic decision-making in firms, which is characterized by a high degree of uncertainty and many ambiguities. The limitations inherent in building such decision tools, which rely massively on data sets, somewhat restrict the possibility of this happening. Although it is therefore unlikely that any AI will one day steer a firm's strategic decisions, its use in corporate strategies is already a reality that is modifying the architecture of resources and skills within firms. This new architecture of value creation requires internal reorganization in order to be deployed within business-line strategies. Given the nature of the decisions that AI automates, it is becoming imperative for firms to set up a governance body that defines the doctrine for using such technologies.
Keywords:	strategic decision-making, innovation management, machine learning, organizational change
Date:	2020
URL:	http://d.repec.org/n?u=RePEc:hal:journl:hal-03068380&r=all

The Gender Pay Gap Revisited with Big Data: Do Methodological Choices Matter?

By:	Anthony Strittmatter; Conny Wunsch
Abstract:	The vast majority of existing studies that estimate the average unexplained gender pay gap use unnecessarily restrictive linear versions of the Blinder-Oaxaca decomposition. Using a notably rich and large data set of 1.7 million employees in Switzerland, we investigate how the methodological improvements made possible by such big data affect estimates of the unexplained gender pay gap. We study the sensitivity of the estimates with regard to i) the availability of observationally comparable men and women, ii) model flexibility when controlling for wage determinants, and iii) the choice of different parametric and semi-parametric estimators, including variants that make use of machine learning methods. We find that these three factors matter greatly. Blinder-Oaxaca estimates of the unexplained gender pay gap decline by up to 39% when we enforce comparability between men and women and use a more flexible specification of the wage equation. Semi-parametric matching yields estimates that, when compared with the Blinder-Oaxaca estimates, are up to 50% smaller and also less sensitive to the way wage determinants are included.
Date:	2021–02
URL:	http://d.repec.org/n?u=RePEc:arx:papers:2102.09207&r=all

The corruptive force of AI-generated advice

By:	Margarita Leib; Nils C. Köbis; Rainer Michael Rilke; Marloes Hagens; Bernd Irlenbusch
Abstract:	Artificial Intelligence (AI) is increasingly becoming a trusted advisor in people's lives. A new concern arises if AI persuades people to break ethical rules for profit. Employing a large-scale behavioural experiment (N = 1,572), we test whether AI-generated advice can corrupt people. We further test whether transparency about AI presence, a commonly proposed policy, mitigates potential harm of AI-generated advice. Using the Natural Language Processing algorithm, GPT-2, we generated honesty-promoting and dishonesty-promoting advice. Participants read one type of advice before engaging in a task in which they could lie for profit. Testing human behaviour in interaction with actual AI outputs, we provide first behavioural insights into the role of AI as an advisor. Results reveal that AI-generated advice corrupts people, even when they know the source of the advice. In fact, AI's corrupting force is as strong as humans'.
Date:	2021–02
URL:	http://d.repec.org/n?u=RePEc:arx:papers:2102.07536&r=all

Uncertainty and Forecastability of Regional Output Growth in the United Kingdom: Evidence from Machine Learning

By:	Mehmet Balcilar (Eastern Mediterranean University, Famagusta, via Mersin 10, Northern Cyprus, Turkey); David Gabauer (Data Analysis Systems, Software Competence Center Hagenberg, Austria); Rangan Gupta (Department of Economics, University of Pretoria, Private Bag X20, Hatfield, 0028, South Africa); Christian Pierdzioch (Department of Economics, Helmut Schmidt University, Holstenhofweg 85, P.O.B. 700822, 22008 Hamburg, Germany)
Abstract:	Utilizing a machine-learning technique known as random forests, we study whether regional output growth uncertainty helps to improve the accuracy of forecasts of regional output growth for twelve regions of the United Kingdom using monthly data for the period from 1970 to 2020. We use a stochastic-volatility model to measure regional output growth uncertainty. We document the importance of interregional stochastic volatility spillovers and the direction of the transmission mechanism. Given this, our empirical results shed light on the contribution to forecast performance of own uncertainty associated with a particular region, output growth uncertainty of other regions, and output growth uncertainty as measured for London as well. We find that output growth uncertainty significantly improves forecast performance in several cases, where we also document cross-regional heterogeneity in this regard.
Keywords:	Regional Output Growth, Uncertainty, United Kingdom, Forecasting, Machine Learning
JEL:	C22 C53 D8 E32 E37 R11
Date:	2021–02
URL:	http://d.repec.org/n?u=RePEc:pre:wpaper:202111&r=all

The Future of Healthcare around the World: Four indices integrating Technology, Productivity, Anti-Corruption, Healthcare and Market Financialization

By:	Julia M. Puaschunder (The New School, USA); Dirk Beerbaum (Frankfurt School of Finance and Management, Frankfurt am Main)
Abstract:	The ongoing COVID-19 crisis has challenged healthcare around the world. The global solution for containing pandemic spread and for providing essential healthcare is likely to feature components of technological advancement and economic productivity as a starting point for finding vital solutions. Anti-corruption is a necessary prerequisite for access to and quality of healthcare provision in the public sphere. Market innovation financialization of a society raises private sector funds for research and development but also funds the market-oriented implementation of healthcare, which appears beneficial and efficient in combating future healthcare crises. Technology-driven growth, corruption-free healthcare and well-funded markets fostering innovation account for the most promising public and private sector remedies for the global COVID-19 crisis. These ingredients differ vastly around the world. This paper innovatively combines the mentioned facets in four indices. Highlighting international differences in economic starting positions as well as public and private sector healthcare provision potential around the world serves as an indicator of where in the world global pandemic medical solutions may thrive in the future. Reflecting the different pandemic crisis alleviation ingredients concurrently makes it possible to capture unknown interaction effects. Pegging remedy credentials to certain regions of the world also holds invaluable insights on which territories of the world should take the lead in different sectors when bundling our common world efforts to overcome the COVID-19 pandemic together. Index 1 highlights the connectedness of Artificial Intelligence (AI), as operationalized by internet connectivity, with economic productivity, measured in Gross Domestic Product (GDP), around the world. Index 2 captures the degree of anti-corruption in its relation with a strong public healthcare sector over an entire world sample. Index 3 integrates internet connectivity with anti-corruption and promising healthcare internationally. Index 4 shows the impact of internet connectivity, GDP, anti-corruption, healthcare in light of market capitalization prospects with special attention to technological innovations in the digital age. In their entirety, the four indices highlight different facets of the future of medical care in order to bundle our common efforts strategically in overcoming COVID-19 and thriving in a healthier and more digitalized world to come.
Keywords:	Access to healthcare, Advancements, AI-GDP Index, Apps, Artificial Intelligence (AI), Coronavirus, Corruption-free maximization of excellence and precision, Corruption Perception
Date:	2020–08
URL:	http://d.repec.org/n?u=RePEc:smo:apaper:021jpmd&r=all

A Core of E-Commerce Customer Experience based on Conversational Data using Network Text Methodology

By:	Andry Alamsyah; Nurlisa Laksmiani; Lies Anisa Rahimi
Abstract:	E-commerce provides an efficient and effective way to exchange goods between sellers and customers. E-commerce has been a popular method for doing business because it makes commerce activity transparently available, including customers' voices and opinions about their own experience. Those experiences can be of great benefit in understanding customer experience comprehensively, both for sellers and future customers. This paper focuses on e-commerce businesses and customers in Indonesia. Many Indonesian customers express their views on open social network services such as Twitter and Facebook, where a large proportion of data is in the form of conversational data. By understanding customer behavior through open social network services, we can describe the level of e-commerce services in Indonesia. This relates to the government's effort to improve the Indonesian digital economy ecosystem. A method for finding core topics in large-scale unstructured internet text data is needed, and the method should be fast but sufficiently accurate. Processing large-scale data is not a straightforward job; it often needs people with special skills and complex software and hardware systems. We propose a fast text mining methodology based on frequently appearing words and their word associations to form a network text methodology. This method is adapted from Social Network Analysis by modeling relationships between words instead of actors.
Date:	2021–02
URL:	http://d.repec.org/n?u=RePEc:arx:papers:2102.09107&r=all
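
Editor's illustration (not the authors' pipeline): the abstract above describes building a network from frequently appearing words and their associations, with words taking the place of actors in Social Network Analysis. The sketch below shows one simple way such a network could be formed, assuming whitespace tokenization, a frequency threshold, and co-occurrence within the same post as the association measure. The function names, the `min_count` parameter, and the toy posts are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def build_word_network(posts, min_count=2):
    """Return edge weights between frequent words that co-occur in the same post."""
    tokenized = [post.lower().split() for post in posts]
    # Count how often each word appears across all posts.
    freq = Counter(word for tokens in tokenized for word in tokens)
    frequent = {word for word, count in freq.items() if count >= min_count}
    edges = Counter()
    for tokens in tokenized:
        # Each unordered pair of distinct frequent words in a post adds one unit of weight.
        present = sorted(set(tokens) & frequent)
        for a, b in combinations(present, 2):
            edges[(a, b)] += 1
    return edges


def weighted_degree(edges):
    """Sum of edge weights per word, a simple proxy for how central a word is."""
    degree = Counter()
    for (a, b), weight in edges.items():
        degree[a] += weight
        degree[b] += weight
    return degree


# Toy conversational data, invented for this example.
posts = [
    "delivery late refund slow",
    "delivery fast seller friendly",
    "refund slow seller",
    "delivery late seller",
]

edges = build_word_network(posts, min_count=2)
degree = weighted_degree(edges)

# Words with the highest weighted degree are candidates for the "core" topics.
for word, score in degree.most_common(3):
    print(word, score)
```

Counting and pairing words this way needs only one pass over the text, which is in keeping with the abstract's emphasis on a fast method; a real application would likely add stop-word removal and a more careful tokenizer.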

Understanding algorithmic collusion with experience replay

By:	Bingyan Han
Abstract:	In an infinitely repeated pricing game, pricing algorithms based on artificial intelligence (Q-learning) may consistently learn to charge supra-competitive prices even without communication. Although concerns about algorithmic collusion have arisen, little is known about the underlying factors. In this work, we experimentally analyze the dynamics of algorithms with three variants of experience replay. Algorithmic collusion still has roots in human preferences. Randomizing experience yields prices close to the static Bertrand equilibrium, and higher prices are easily restored by favoring the latest experience. Moreover, relative performance concerns also stabilize the collusion. Finally, we investigate scenarios with heterogeneous agents and test robustness to various factors.
Date:	2021–02
URL:	http://d.repec.org/n?u=RePEc:arx:papers:2102.09139&r=all

Use of Big Data in Transport Modelling

By:	Luis Willumsen (Nommon Solutions and Technologies)
Abstract:	This paper guides transport planners in making the best use of mobile phone traces, derived either from mobile network data or from smartphone app data. It suggests combining such new data sources with conventional travel surveys whose sample size and cost could ultimately be reduced. In the context of a rapidly evolving mobility landscape, with new modes and new services available, big data can help monitor behaviour change, learn from quasi-experiments and develop next-generation travel demand modelling tools.
Date:	2021–01–27
URL:	http://d.repec.org/n?u=RePEc:oec:itfaab:2021/05-en&r=all

A data-driven approach to measuring epidemiological susceptibility risk around the world

By:	Alessandro Bitetto (University of Pavia); Paola Cerchiello (University of Pavia); Charilaos Mertzanis (University of Pavia)
Abstract:	Epidemic outbreaks are extreme events that are becoming less rare and more severe. They are associated with large social and economic costs. It is therefore important to evaluate whether countries are prepared to manage epidemiological risks. We use a fully data-driven approach to measure epidemiological susceptibility risk at the country level using time-varying and regularly reproduced information that captures the role of demographics, infrastructure, governance and economic activity conditions. Given the nature of the problem, we choose both principal component analysis (PCA) and dynamic factor model (DFM) to deal with the presence of strong cross-section dependence in the data due to unobserved common factors. We conduct extensive in-sample model evaluations of 168 countries covering 17 indicators for the 2010-2019 period. The results show that the robust PCA method accounts for about 90% of total variability, whilst the DFM accounts for about 76% of the total variability. Our framework and index could therefore provide the basis for developing risk assessments of epidemiological risk contagion after the outbreak of an epidemic but also for ongoing monitoring of its spread and social and economic effects. It could also be used by firms to assess the likely economic consequences of epidemics, with useful managerial implications.
Keywords:	Innovative Applications of O.R., Epidemiological risk, Data-driven, Cross-country, Policy framework, Principal Component Analysis, Dynamic Factor Model, Machine learning
JEL:	I18 C55 C38 F68
Date:	2021–02
URL:	http://d.repec.org/n?u=RePEc:pav:demwpp:demwp0200&r=all

Multi-Horizon Equity Returns Predictability via Machine Learning

By:	Lenka Nechvatalova (Institute of Economic Studies, Charles University and Institute of Information Theory and Automation, Czech Academy of Sciences Prague, Czech Republic)
Abstract:	We examine the predictability of expected stock returns across horizons using machine learning. We use neural networks and gradient boosted regression trees on the U.S. and international equity datasets. We find that predictability of returns using neural network models decreases with longer forecasting horizons. We also document the profitability of long-short portfolios, which were created using predictions of cumulative returns at various horizons, before and after accounting for transaction costs. There is a trade-off between higher transaction costs connected to frequent rebalancing and greater returns on shorter horizons. However, we show that increasing the forecasting horizon while matching the rebalancing period increases risk-adjusted returns after transaction costs for the U.S. We combine predictions of expected returns at multiple horizons using double-sorting and buy/hold spread, a turnover-reducing strategy. Using double sorts significantly increases profitability on the U.S. sample. Buy/hold spread portfolios have better risk-adjusted profitability in the U.S.
Keywords:	Machine learning, asset pricing, horizon predictability, anomalies
JEL:	G11 G12 G15 C55
Date:	2021–02
URL:	http://d.repec.org/n?u=RePEc:fau:wpaper:wp2021_02&r=all

Deep Learning for Market by Order Data

By:	Zihao Zhang; Bryan Lim; Stefan Zohren
Abstract:	Market by order (MBO) data - a detailed feed of individual trade instructions for a given stock on an exchange - is arguably one of the most granular sources of microstructure information. While limit order books (LOBs) are implicitly derived from it, MBO data is largely neglected by current academic literature which focuses primarily on LOB modelling. In this paper, we demonstrate the utility of MBO data for forecasting high-frequency price movements, providing an orthogonal source of information to LOB snapshots. We provide the first predictive analysis on MBO data by carefully introducing the data structure and presenting a specific normalisation scheme to consider level information in order books and to allow model training with multiple instruments. Through forecasting experiments using deep neural networks, we show that while MBO-driven and LOB-driven models individually provide similar performance, ensembles of the two can lead to improvements in forecasting accuracy -- indicating that MBO data is additive to LOB-based features.
Date:	2021–02
URL:	http://d.repec.org/n?u=RePEc:arx:papers:2102.08811&r=all

Artificial Intelligence, Robotics, Work and Productivity: The Role of Firm Heterogeneity

By:	Heyman, Fredrik (Research Institute of Industrial Economics (IFN)); Norbäck, Pehr-Johan (Research Institute of Industrial Economics (IFN)); Persson, Lars (Research Institute of Industrial Economics (IFN))
Abstract:	We propose a model with asymmetric firms where new technologies displace workers. We show that both leading (low-cost) firms and laggard (high-cost) firms increase productivity when automating but that only laggard firms hire more automation-susceptible workers. The reason for this asymmetry is that in laggard firms, the lower incentive to invest in new technologies implies a weaker displacement effect and thus that the output-expansion effect on labor demand dominates. Using novel firm-level automation workforce probabilities, which reveal the extent to which a firm's workforce can be replaced by new AI and robotic technology, and a new shift-share instrument to address endogeneity, we find strong empirical evidence for these predictions in Swedish matched employer-employee data.
Keywords:	AI&R Technology; Automation; Job displacement; Firm Heterogeneity; Matched employer-employee data
JEL:	J70 L20 M50
Date:	2021–02–09
URL:	http://d.repec.org/n?u=RePEc:hhs:iuiwop:1382&r=all

Common pool resource management and risk perceptions

By:	Can Askan Mavi (University of Luxembourg); Nicolas Querou (Universite Montpellier, CEEM)
Abstract:	Motivated by recent discussions about the issue of risk perceptions for climate change related events, we introduce a non-cooperative game setting where agents manage a common pool resource under a potential risk, and agents exhibit different risk perception biases. Focusing on the effect of the polarization level and other population features, we show that the type of bias (overestimation versus underestimation biases) and the resource quality level before and after the occurrence of the shift have first-order importance on the qualitative nature of behavioral adjustments and on the pattern of resource conservation. When there are non-uniform biases within the population, the intra-group structure of the population qualitatively affects the degree of resource conservation. Moreover, unbiased agents may react in nonmonotone ways to changes in the polarization level when faced with agents exhibiting different types of bias. The size of the unbiased agents' sub-population does not qualitatively affect how an increase in the polarization level impacts individual behavioral adjustments, even though it affects the magnitude of this change. Finally, it is shown how perception biases affect the comparison between centralized and decentralized management.
Keywords:	Perception bias
JEL:	Q20
Date:	2021–02
URL:	http://d.repec.org/n?u=RePEc:fae:wpaper:2021.02&r=all

This nep-big issue is ©2021 by Tom Coupé. It is provided as is without any express or implied warranty. It may be freely redistributed in whole or in part for any purpose. If distributed in part, please include this notice.

General information on the NEP project can be found at http://nep.repec.org. For comments please write to the director of NEP, Marco Novarese at <director@nep.repec.org>. Put “NEP” in the subject, otherwise your mail may be rejected.

NEP’s infrastructure is sponsored by the School of Economics and Finance of Massey University in New Zealand.