nep-big New Economics Papers
on Big Data
Issue of 2021‒06‒28
thirty-two papers chosen by
Tom Coupé
University of Canterbury

  1. Artificial Intelligence, Ethics, and Diffused Pivotality By Victor Klockmann; Alicia von Schenk; Marie Villeval
  2. Next-Day Bitcoin Price Forecast Based on Artificial Intelligence Methods By Liping Yang
  3. Machine Learning in U.S. Bank Merger Prediction: A Text-Based Approach By Katsafados, Apostolos G.; Leledakis, George N.; Pyrgiotakis, Emmanouil G.; Androutsopoulos, Ion; Fergadiotis, Manos
  4. Using machine learning to predict patent lawsuits By Juranek, Steffen; Otneim, Håkon
  5. A News-based Machine Learning Model for Adaptive Asset Pricing By Liao Zhu; Haoxuan Wu; Martin T. Wells
  6. Adversarial Attacks on Deep Models for Financial Transaction Records By Ivan Fursov; Matvey Morozov; Nina Kaploukhaya; Elizaveta Kovtun; Rodrigo Rivera-Castro; Gleb Gusev; Dmitry Babaev; Ivan Kireev; Alexey Zaytsev; Evgeny Burnaev
  7. Constrained Classification and Policy Learning By Toru Kitagawa; Shosei Sakaguchi; Aleksey Tetenov
  8. Big Data Environments and Decision-Making: Policing during a Major Sporting Event By Jordan Vazquez; Cécile Godé; Jean-Fabrice Lebraty
  9. Generative Adversarial Networks in finance: an overview By Florian Eckerli; Joerg Osterrieder
  10. Learning Multiple Stock Trading Patterns with Temporal Routing Adaptor and Optimal Transport By Hengxu Lin; Dong Zhou; Weiqing Liu; Jiang Bian
  11. 3D Tensor-based Deep Learning Models for Predicting Option Price By Muyang Ge; Shen Zhou; Shijun Luo; Boping Tian
  12. Artificial Intelligence, Ethics, and Intergenerational Responsibility By Victor Klockmann; Alicia von Schenk; Marie Villeval
  13. Fund2Vec: Mutual Funds Similarity using Graph Learning By Vipul Satone; Dhruv Desai; Dhagash Mehta
  14. Measuring the AI content of government-funded R&D projects: A proof of concept for the OECD Fundstat initiative By Izumi Yamashita; Akiyoshi Murakami; Stephanie Cairns; Fernando Galindo-Rueda
  15. AI Watch - National strategies on Artificial Intelligence: A European perspective, 2021 edition By Vincent Van Roy; Fiammetta Rossetti; Karine Perset; Laura Galindo-Romero
  16. Design and Analysis of Robust Deep Learning Models for Stock Price Prediction By Jaydip Sen; Sidra Mehtab
  17. Active labour market policies for the long-term unemployed: New evidence from causal machine learning By Goller, Daniel; Harrer, Tamara; Lechner, Michael; Wolff, Joachim
  18. Efficient Black-Box Importance Sampling for VaR and CVaR Estimation By Anand Deo; Karthyek Murthy
  19. Stock Market Analysis with Text Data: A Review By Kamaladdin Fataliyev; Aneesh Chivukula; Mukesh Prasad; Wei Liu
  20. Tools for trustworthy AI: A framework to compare implementation tools for trustworthy AI systems By OECD
  21. Active labour market policies for the long-term unemployed: New evidence from causal machine learning By Daniel Goller; Tamara Harrer; Michael Lechner; Joachim Wolff
  22. A Two-Step Framework for Arbitrage-Free Prediction of the Implied Volatility Surface By Wenyong Zhang; Lingfei Li; Gongqiu Zhang
  23. The link between Bitcoin and Google Trends attention By Nektarios Aslanidis; Aurelio F. Bariviera; Óscar G. López
  24. Using the Eye of the Storm to Predict the Wave of Covid-19 UI Claims By Daniel Aaronson; Scott A. Brave; R. Andrew Butters; Daniel Sacks; Boyoung Seo
  25. Unbiased Self-Play By Shohei Ohsawa
  26. Citizen-Generated Data and Official Statistics: an application to SDG indicators By Monica Pratesi; Claudio Ceccarelli; Stefano Menghinello
  27. Alternative Microfoundations for Strategic Classification By Meena Jagadeesan; Celestine Mendler-Dünner; Moritz Hardt
  28. The gig economy in Poland: evidence based on mobile big data By Beręsewicz Maciej; Nikulin Dagmara; Szymkowiak Marcin; Wilak Kamil
  29. Credit spread approximation and improvement using random forest regression By Mathieu Mercadier; Jean-Pierre Lardy
  30. The Data Privacy Paradox and Digital Demand By Long Chen; Yadong Huang; Shumiao Ouyang; Wei Xiong
  31. Competition in Pricing Algorithms By Zach Y. Brown; Alexander MacKay
  32. Influencers on Economic Issues in Latin America, Spain and the United States – II By Newland, Carlos; Rosiello, Juan Carlos; Salinas, Roberto

  1. By: Victor Klockmann (Goethe-University Frankfurt am Main, Max Planck Institute for Human Development - Max-Planck-Gesellschaft); Alicia von Schenk (Goethe-University Frankfurt am Main, Max Planck Institute for Human Development - Max-Planck-Gesellschaft); Marie Villeval (GATE Lyon Saint-Étienne - Groupe d'analyse et de théorie économique - CNRS - Centre National de la Recherche Scientifique - Université de Lyon - UJM - Université Jean Monnet [Saint-Étienne] - UCBL - Université Claude Bernard Lyon 1 - Université de Lyon - UL2 - Université Lumière - Lyon 2 - ENS Lyon - École normale supérieure - Lyon, IZA - Forschungsinstitut zur Zukunft der Arbeit - Institute of Labor Economics)
    Abstract: With Big Data, decisions made by machine learning algorithms depend on training data generated by many individuals. In an experiment, we identify the effect of varying individual responsibility for moral choices of an artificially intelligent algorithm. Across treatments, we manipulated the sources of training data and thus the impact of each individual's decisions on the algorithm. Reducing or diffusing pivotality for algorithmic choices increased the share of selfish decisions. Once the generated training data exclusively affected others' payoffs, individuals opted for more egalitarian payoff allocations. These results suggest that Big Data offers a "moral wiggle room" for selfish behavior.
    Keywords: Artificial Intelligence, Pivotality, Ethics, Externalities, Experiment
    Date: 2021
    URL: http://d.repec.org/n?u=RePEc:hal:wpaper:halshs-03237453&r=
  2. By: Liping Yang
    Abstract: In recent years, Bitcoin price prediction has attracted the interest of researchers and investors. However, the accuracy of previous studies is not yet satisfactory. Machine learning and deep learning methods have proved to have strong predictive ability in this area. This paper proposes a method that combines Ensemble Empirical Mode Decomposition (EEMD) with a deep learning method, long short-term memory (LSTM), to address the problem of next-day Bitcoin price forecasting.
    Date: 2021–06
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2106.12961&r=
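    As a rough illustration of the kind of pipeline the abstract describes (not the author's code), the sketch below decomposes a toy price series with EEMD and forecasts each intrinsic mode function with a small LSTM; the PyEMD and TensorFlow packages, the window length and the layer sizes are all assumptions:

      import numpy as np
      import tensorflow as tf
      from PyEMD import EEMD

      prices = 100.0 + np.cumsum(np.random.randn(500))   # placeholder for daily BTC closes
      imfs = EEMD().eemd(prices)                         # intrinsic mode functions + residue

      def windows(series, lag=10):
          # sliding windows: predict the next value from the previous `lag` values
          X = np.array([series[i:i + lag] for i in range(len(series) - lag)])
          return X[..., None], series[lag:]

      forecast = 0.0
      for imf in imfs:                                   # one small LSTM per component
          X, y = windows(imf)
          model = tf.keras.Sequential([tf.keras.layers.LSTM(16), tf.keras.layers.Dense(1)])
          model.compile(optimizer="adam", loss="mse")
          model.fit(X, y, epochs=5, verbose=0)
          forecast += float(model.predict(imf[-10:].reshape(1, 10, 1), verbose=0))
      print("next-day forecast (sum of component forecasts):", forecast)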
  3. By: Katsafados, Apostolos G.; Leledakis, George N.; Pyrgiotakis, Emmanouil G.; Androutsopoulos, Ion; Fergadiotis, Manos
    Abstract: This paper investigates the role of textual information in a U.S. bank merger prediction task. Our intuition behind this approach is that text could reduce bank opacity and allow us to understand better the strategic options of banking firms. We retrieve textual information from bank annual reports using a sample of 9,207 U.S. bank-year observations during the period 1994-2016. To predict bidders and targets, we use textual information along with financial variables as inputs to several machine learning models. Our key findings suggest that: (1) when textual information is used as a single type of input, the predictive accuracy of our models is similar, or even better, compared to the models using only financial variables as inputs, and (2) when we jointly use textual information and financial variables as inputs, the predictive accuracy of our models is substantially improved compared to models using a single type of input. Therefore, our findings highlight the importance of textual information in a bank merger prediction task.
    Keywords: Bank merger prediction; Textual analysis; Natural language processing; Machine learning
    JEL: C38 C45 G1 G2 G21 G3 G34
    Date: 2021–06–12
    URL: http://d.repec.org/n?u=RePEc:pra:mprapa:108272&r=
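    To make the two-input design concrete, here is a minimal, hypothetical sketch (toy data, not the paper's sample) of feeding report text and financial variables, separately and jointly, to one of the classifiers the authors consider:

      import numpy as np
      from scipy.sparse import csr_matrix, hstack
      from sklearn.feature_extraction.text import TfidfVectorizer
      from sklearn.linear_model import LogisticRegression

      reports = ["we expanded our branch network and deposits grew",
                 "loan losses increased and capital ratios declined",
                 "we completed two acquisitions of community banks",
                 "net interest margin remained stable this year",
                 "we are exploring strategic alternatives for the bank",
                 "credit quality improved across all segments"]        # invented annual-report snippets
      financials = np.random.RandomState(0).randn(6, 2)                # e.g. ROE, capital ratio
      is_target = np.array([1, 0, 1, 0, 1, 0])                         # label: bank became a target

      text_X = TfidfVectorizer().fit_transform(reports)
      both_X = hstack([text_X, csr_matrix(financials)])                # joint text + financial inputs

      for name, X in [("text", text_X), ("financial", csr_matrix(financials)), ("both", both_X)]:
          clf = LogisticRegression(max_iter=1000).fit(X, is_target)
          # the paper evaluates out of sample; in-sample fit shown here only for shape
          print(name, clf.score(X, is_target))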
  4. By: Juranek, Steffen (Dept. of Business and Management Science, Norwegian School of Economics); Otneim, Håkon (Dept. of Business and Management Science, Norwegian School of Economics)
    Abstract: We use machine learning methods to predict which patents end up at court, using the population of US patents granted between 2002 and 2005. We analyze how the different dimensions of an empirical analysis affect prediction performance: the number of observations, the number of patent characteristics, and the model choice. We find that extending the set of patent characteristics has the biggest impact on prediction performance. Small samples not only have low predictive performance; their predictions are also particularly unstable. However, only samples of intermediate size are required for reasonably stable performance. The model choice matters, too: more sophisticated machine learning methods can provide additional value over a simple logistic regression. Our results provide practical advice to everyone building patent litigation models, e.g., for litigation insurance or patent management more generally.
    Keywords: Patents; litigation; prediction; machine learning
    JEL: K00 K41 O34
    Date: 2021–06–22
    URL: http://d.repec.org/n?u=RePEc:hhs:nhhfms:2021_006&r=
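    A synthetic-data sketch of the comparison the abstract describes -- varying sample size, feature-set size, and model class -- might look like this (scikit-learn assumed; the data and numbers are illustrative only):

      import numpy as np
      from sklearn.datasets import make_classification
      from sklearn.ensemble import GradientBoostingClassifier
      from sklearn.linear_model import LogisticRegression
      from sklearn.model_selection import cross_val_score

      for n_features in (5, 20):                         # few vs. many patent characteristics
          X, y = make_classification(n_samples=2000, n_features=n_features,
                                     weights=[0.9], random_state=0)   # litigation is the rare class
          for n in (200, 2000):                          # small vs. larger sample
              for model in (LogisticRegression(max_iter=1000), GradientBoostingClassifier()):
                  auc = cross_val_score(model, X[:n], y[:n], cv=3, scoring="roc_auc").mean()
                  print(f"{n_features} features, n={n}, {type(model).__name__}: AUC={auc:.3f}")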
  5. By: Liao Zhu; Haoxuan Wu; Martin T. Wells
    Abstract: The paper proposes a new asset pricing model -- the News Embedding UMAP Selection (NEUS) model -- to explain and predict stock returns based on financial news. Using a combination of various machine learning algorithms, we first derive a company embedding vector for each basis asset from the financial news. Then we obtain a collection of basis assets based on their company embeddings. After that, for each stock, we select the basis assets to explain and predict the stock return with high-dimensional statistical methods. The new model is shown to have significantly better fitting and prediction power than the Fama-French 5-factor model.
    Date: 2021–06
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2106.07103&r=
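    The following sketch compresses the pipeline to its bare bones under strong simplifications: pretend the news-derived embeddings are given, group them after UMAP reduction, and select basis assets for one stock with a lasso (umap-learn and scikit-learn assumed; nothing here reproduces the authors' NEUS estimator):

      import numpy as np
      import umap
      from sklearn.cluster import KMeans
      from sklearn.linear_model import LassoCV

      rng = np.random.default_rng(0)
      embeddings = rng.normal(size=(50, 128))          # stand-in for news-derived company vectors
      low_dim = umap.UMAP(n_components=5, random_state=0).fit_transform(embeddings)
      groups = KMeans(n_clusters=5, n_init=10, random_state=0).fit_predict(low_dim)  # basis groupings

      basis_returns = rng.normal(size=(250, 50))       # daily returns of 50 basis assets
      stock = basis_returns[:, :3] @ np.array([0.5, 0.3, 0.2]) + 0.01 * rng.normal(size=250)

      lasso = LassoCV(cv=5).fit(basis_returns, stock)  # sparse selection of explanatory assets
      print("selected basis assets:", np.flatnonzero(lasso.coef_), "group sizes:", np.bincount(groups))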
  6. By: Ivan Fursov; Matvey Morozov; Nina Kaploukhaya; Elizaveta Kovtun; Rodrigo Rivera-Castro; Gleb Gusev; Dmitry Babaev; Ivan Kireev; Alexey Zaytsev; Evgeny Burnaev
    Abstract: Machine learning models using transaction records as inputs are popular among financial institutions. The most efficient models use deep-learning architectures similar to those in the NLP community, posing a challenge due to their tremendous number of parameters and limited robustness. In particular, deep-learning models are vulnerable to adversarial attacks: a small change in the input can harm the model's output. In this work, we examine adversarial attacks on transaction records data and defences against these attacks. Transaction records data have a different structure than canonical NLP or time series data, as neighbouring records are less connected than words in sentences, and each record consists of both a discrete merchant code and a continuous transaction amount. We consider a black-box attack scenario, where the attacker doesn't know the true decision model, and pay special attention to adding transaction tokens to the end of a sequence. These limitations make the scenario more realistic, and it has not previously been explored in the NLP world. The proposed adversarial attacks and the respective defences demonstrate remarkable performance using relevant datasets from the financial industry. Our results show that a couple of generated transactions are sufficient to fool a deep-learning model. Further, we improve model robustness via adversarial training or separate adversarial examples detection. This work shows that embedding protection from adversarial attacks improves model robustness, allowing wider adoption of deep models for transaction records in banking and finance.
    Date: 2021–06
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2106.08361&r=
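    A toy version of the append-only, black-box attack surface described above (the scoring model is an invented stand-in, not a real bank model):

      import numpy as np

      rng = np.random.default_rng(0)
      token_weights = rng.normal(size=100)              # hidden model: one weight per merchant code

      def score(sequence):                              # attacker has black-box access only
          return 1 / (1 + np.exp(-np.mean(token_weights[sequence])))

      history = list(rng.integers(0, 100, size=30))     # a client's transaction records
      for _ in range(3):                                # greedily append adversarial transactions
          best = min(range(100), key=lambda t: score(history + [t]))
          history.append(best)
          print("appended token", best, "score now", round(score(history), 3))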
  7. By: Toru Kitagawa; Shosei Sakaguchi; Aleksey Tetenov
    Abstract: Modern machine learning approaches to classification, including AdaBoost, support vector machines, and deep neural networks, utilize surrogate loss techniques to circumvent the computational complexity of minimizing empirical classification risk. These techniques are also useful for causal policy learning problems, since estimation of individualized treatment rules can be cast as a weighted (cost-sensitive) classification problem. Consistency of the surrogate loss approaches studied in Zhang (2004) and Bartlett et al. (2006) crucially relies on the assumption of correct specification, meaning that the specified set of classifiers is rich enough to contain a first-best classifier. This assumption is, however, less credible when the set of classifiers is constrained by interpretability or fairness, leaving the applicability of surrogate loss based algorithms unknown in such second-best scenarios. This paper studies consistency of surrogate loss procedures under a constrained set of classifiers without assuming correct specification. We show that in the setting where the constraint restricts the classifier's prediction set only, hinge losses (i.e., $\ell_1$-support vector machines) are the only surrogate losses that preserve consistency in second-best scenarios. If the constraint additionally restricts the functional form of the classifier, consistency of a surrogate loss approach is not guaranteed even with hinge loss. We therefore characterize conditions for the constrained set of classifiers that can guarantee consistency of hinge risk minimizing classifiers. Exploiting our theoretical results, we develop robust and computationally attractive hinge loss based procedures for a monotone classification problem.
    Date: 2021–06
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2106.12886&r=
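    For intuition, estimating a treatment rule can be cast as weighted classification and fit with a hinge loss, e.g. via a linear SVM; this sketch uses synthetic treatment gains and scikit-learn's LinearSVC, illustrating the reduction rather than the paper's procedure:

      import numpy as np
      from sklearn.svm import LinearSVC

      rng = np.random.default_rng(0)
      X = rng.normal(size=(500, 2))                     # individual characteristics
      gain = X[:, 0] - 0.5 * X[:, 1] + rng.normal(scale=0.5, size=500)  # estimated treatment gains

      labels = np.sign(gain)                            # ideal action: treat iff gain > 0
      weights = np.abs(gain)                            # cost-sensitive weights: |gain|

      rule = LinearSVC(loss="hinge")                    # hinge loss, the surrogate studied above
      rule.fit(X, labels, sample_weight=weights)
      print("share assigned to treatment:", (rule.predict(X) == 1).mean())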
  8. By: Jordan Vazquez; Cécile Godé (CRET-LOG - Centre de Recherche sur le Transport et la Logistique - AMU - Aix Marseille Université); Jean-Fabrice Lebraty
    Abstract: Officers of the French Police nationale frequently face unexpected situations that require rapid decision making (Godé and Vazquez, 2017). Big data environments are likely to affect police officers' decision-making processes. The question we ask here is the following: how do public-security experts make decisions in a big data environment? This research focuses on one event in particular: the time-trial stage of the 2017 Tour de France. On 21 July 2017, the city of Marseille hosted the Tour de France riders for a time-trial stage, with up to 300,000 people expected for the event. To coordinate police patrols and the various C.R.S. riot-police companies in the field, the teams of the Centre d'Information et de Commandement (C.I.C.) of the Marseille police could rely on numerous technologies that made up their big data environment. This big data environment enables decision makers to spot situations in a changing context, to reassess unfamiliar situations, and to consider fallback options to secure the actions of teams in the field.
    Keywords: decision making, intuition, big data environment, unexpected events, Police nationale, Tour de France 2017, policing
    Date: 2021–06–09
    URL: http://d.repec.org/n?u=RePEc:hal:journl:hal-03252399&r=
  9. By: Florian Eckerli; Joerg Osterrieder
    Abstract: Modelling in finance is a challenging task: the data often has complex statistical properties and its inner workings are largely unknown. Deep learning algorithms are making progress in the field of data-driven modelling, but the lack of sufficient data to train these models is currently holding back several new applications. Generative Adversarial Networks (GANs) are a neural network architecture family that has achieved good results in image generation and is being successfully applied to generate time series and other types of financial data. The purpose of this study is to present an overview of how these GANs work, their capabilities and limitations in the current state of research with financial data, and to present some practical applications in the industry. As a proof of concept, three known GAN architectures were tested on financial time series, and the generated data were evaluated on their statistical properties, yielding solid results. Finally, it was shown that GANs have made considerable progress in their finance applications and can be a solid additional tool for data scientists in this field.
    Date: 2021–06
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2106.06364&r=
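    To give a flavour of the mechanics the overview covers, here is a deliberately minimal GAN on synthetic heavy-tailed "returns" (TensorFlow assumed; layer sizes and training length are arbitrary, and none of the three architectures tested in the paper is reproduced):

      import numpy as np
      import tensorflow as tf

      real = np.float32(np.random.standard_t(df=4, size=(1024, 1)))    # heavy-tailed toy returns
      G = tf.keras.Sequential([tf.keras.layers.Dense(16, activation="relu"),
                               tf.keras.layers.Dense(1)])              # noise -> fake return
      D = tf.keras.Sequential([tf.keras.layers.Dense(16, activation="relu"),
                               tf.keras.layers.Dense(1)])              # return -> real/fake logit
      bce = tf.keras.losses.BinaryCrossentropy(from_logits=True)
      g_opt, d_opt = tf.keras.optimizers.Adam(1e-3), tf.keras.optimizers.Adam(1e-3)

      for step in range(500):
          z = tf.random.normal((128, 4))
          x = real[np.random.randint(0, len(real), 128)]
          with tf.GradientTape() as dt, tf.GradientTape() as gt:
              fake = G(z)
              d_loss = bce(tf.ones((128, 1)), D(x)) + bce(tf.zeros((128, 1)), D(fake))
              g_loss = bce(tf.ones((128, 1)), D(fake))                 # generator tries to fool D
          d_opt.apply_gradients(zip(dt.gradient(d_loss, D.trainable_variables), D.trainable_variables))
          g_opt.apply_gradients(zip(gt.gradient(g_loss, G.trainable_variables), G.trainable_variables))

      print("std of generated returns:", float(tf.math.reduce_std(G(tf.random.normal((1000, 4))))))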
  10. By: Hengxu Lin; Dong Zhou; Weiqing Liu; Jiang Bian
    Abstract: Successful quantitative investment usually relies on precise predictions of the future movement of the stock price. Recently, machine learning based solutions have shown their capacity to give more accurate stock prediction and become indispensable components in modern quantitative investment systems. However, the i.i.d. assumption behind existing methods is inconsistent with the existence of diverse trading patterns in the stock market, which inevitably limits their ability to achieve better stock prediction performance. In this paper, we propose a novel architecture, Temporal Routing Adaptor (TRA), to empower existing stock prediction models with the ability to model multiple stock trading patterns. Essentially, TRA is a lightweight module that consists of a set of independent predictors for learning multiple patterns as well as a router to dispatch samples to different predictors. Nevertheless, the lack of explicit pattern identifiers makes it quite challenging to train an effective TRA-based model. To tackle this challenge, we further design a learning algorithm based on Optimal Transport (OT) to obtain the optimal sample to predictor assignment and effectively optimize the router with such assignment through an auxiliary loss term. Experiments on the real-world stock ranking task show that compared to the state-of-the-art baselines, e.g., Attention LSTM and Transformer, the proposed method can improve information coefficient (IC) from 0.053 to 0.059 and 0.051 to 0.056 respectively. Our dataset and code used in this work are publicly available: https://github.com/microsoft/qlib.
    Date: 2021–06
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2106.12950&r=
  11. By: Muyang Ge; Shen Zhou; Shijun Luo; Boping Tian
    Abstract: Option pricing is a significant problem for option risk management and trading. In this article, we utilize a framework to represent financial data from different sources. The data is processed and represented in the form of 2D tensors with three channels. Furthermore, we propose two deep learning models that can deal with 3D tensor data. Experiments performed on a Chinese market option dataset demonstrate the practicality of the proposed strategies over commonly used approaches, including the Black-Scholes (B-S) model and a vector-based LSTM.
    Date: 2021–06
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2106.02916&r=
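    A schematic of the input/output shape involved -- a small CNN regressing option price from 2D tensors with three channels -- on random placeholder data (the paper's actual architectures are not reproduced):

      import numpy as np
      import tensorflow as tf

      X = np.random.rand(256, 8, 8, 3).astype("float32")  # [samples, height, width, channels]
      y = np.random.rand(256).astype("float32")           # option prices (toy targets)

      model = tf.keras.Sequential([
          tf.keras.layers.Conv2D(16, 3, activation="relu", input_shape=(8, 8, 3)),
          tf.keras.layers.Flatten(),
          tf.keras.layers.Dense(32, activation="relu"),
          tf.keras.layers.Dense(1),                       # predicted price
      ])
      model.compile(optimizer="adam", loss="mse")
      model.fit(X, y, epochs=3, verbose=0)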
  12. By: Victor Klockmann (Goethe-University Frankfurt am Main, Max Planck Institute for Human Development - Max-Planck-Gesellschaft); Alicia von Schenk (Goethe-University Frankfurt am Main, Max Planck Institute for Human Development - Max-Planck-Gesellschaft); Marie Villeval (GATE Lyon Saint-Étienne - Groupe d'analyse et de théorie économique - CNRS - Centre National de la Recherche Scientifique - Université de Lyon - UJM - Université Jean Monnet [Saint-Étienne] - UCBL - Université Claude Bernard Lyon 1 - Université de Lyon - UL2 - Université Lumière - Lyon 2 - ENS Lyon - École normale supérieure - Lyon, IZA - Forschungsinstitut zur Zukunft der Arbeit - Institute of Labor Economics)
    Abstract: Humans shape the behavior of artificially intelligent algorithms. One mechanism is the training these systems receive through the passive observation of human behavior and the data we constantly generate. In a laboratory experiment with a sequence of dictator games, we let participants' choices train an algorithm. Thereby, they create an externality on the future decision making of an intelligent system that affects future participants. We test how information about training artificial intelligence affects the prosociality and selfishness of human behavior. We find that making individuals aware of the consequences of their training for the well-being of future generations changes behavior, but only when individuals bear the risk of being harmed themselves by future algorithmic choices. Only in that case does the externality of artificial intelligence training induce a significantly higher share of egalitarian decisions in the present.
    Keywords: Artificial Intelligence, Morality, Prosociality, Generations, Externalities
    Date: 2021
    URL: http://d.repec.org/n?u=RePEc:hal:wpaper:halshs-03237437&r=
  13. By: Vipul Satone; Dhruv Desai; Dhagash Mehta
    Abstract: Identifying mutual funds that are similar with respect to their underlying portfolios has found many applications in financial services, ranging from fund recommender systems, competitor analysis, and portfolio analytics to marketing and sales. Traditional methods are either qualitative, and hence prone to biases and often not reproducible, or are known not to capture all the nuances (non-linearities) among the portfolios from the raw data. We propose a radically new approach to identify similar funds based on the weighted bipartite network representation of funds and their underlying assets data, using a sophisticated machine learning method called Node2Vec which learns an embedded low-dimensional representation of the network. We call the embedding Fund2Vec. Ours is the first-ever study of the weighted bipartite network representation of the funds-assets network in its original form that identifies structural similarity among portfolios, as opposed to merely portfolio overlaps.
    Date: 2021–06
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2106.12987&r=
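    A minimal sketch of the idea with the open-source node2vec and networkx packages (fund and asset names invented; the paper's data and tuning are not reproduced):

      import networkx as nx
      from node2vec import Node2Vec

      holdings = {"fundA": {"AAPL": 0.4, "MSFT": 0.6},
                  "fundB": {"AAPL": 0.5, "MSFT": 0.5},
                  "fundC": {"XOM": 0.7, "CVX": 0.3}}
      G = nx.Graph()
      for fund, assets in holdings.items():
          for asset, w in assets.items():
              G.add_edge(fund, asset, weight=w)           # portfolio weight as edge weight

      n2v = Node2Vec(G, dimensions=16, walk_length=10, num_walks=50, weight_key="weight")
      model = n2v.fit(window=3, min_count=1)              # gensim Word2Vec under the hood
      print(model.wv.most_similar("fundA", topn=2))       # nearest funds/assets in embedding space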
  14. By: Izumi Yamashita; Akiyoshi Murakami; Stephanie Cairns; Fernando Galindo-Rueda
    Abstract: This report presents the results of a proof of concept for a new analytical infrastructure (“Fundstat”) for analysing government funding of R&D at the project level, exploiting the wealth of text-based information about funded projects. Reflecting the growth in popularity of artificial intelligence (AI) and the OECD Council Recommendation on AI’s emphasis on R&D investment, the report focuses on analysing government investments into AI-related R&D. Using text mining tools, it documents the creation of a list of key terms used to identify AI-related R&D projects contained in 13 funding databases from eight OECD countries and the EU, provides estimates for the total number and volume of government R&D funding, and characterises their AI funding portfolio. The methods and findings developed in this study also serve as a prototype for a new distributed mechanism capable of measuring and analysing government R&D support across key OECD priority areas and topics.
    Keywords: artificial intelligence, government funding, research and development
    Date: 2021–06–28
    URL: http://d.repec.org/n?u=RePEc:oec:stiaaa:2021/09-en&r=
  15. By: Vincent Van Roy (European Commission - JRC); Fiammetta Rossetti (European Commission - JRC); Karine Perset (Organisation for Economic Co-operation and Development (OECD)); Laura Galindo-Romero (Organisation for Economic Co-operation and Development (OECD))
    Abstract: Artificial intelligence (AI) is transforming the world in many aspects. It is essential for Europe to consider how to make the most of the opportunities from this transformation and to address its challenges. In 2018 the European Commission adopted the Coordinated Plan on Artificial Intelligence that was developed together with the Member States to maximise the impact of investments at European Union (EU) and national levels, and to encourage synergies and cooperation across the EU. One of the key actions towards these aims was an encouragement for the Member States to develop their national AI strategies. The review of national strategies is one of the tasks of AI Watch launched by the European Commission to support the implementation of the Coordinated Plan on Artificial Intelligence. Building on the 2020 AI Watch review of national strategies, this report presents an updated review of national AI strategies from the EU Member States, Norway and Switzerland. By June 2021, 20 Member States and Norway had published national AI strategies, while 7 Member States were in the final drafting phase. Since the 2020 release of the AI Watch report, additional Member States - i.e. Bulgaria, Hungary, Poland, Slovenia, and Spain - published strategies, while Cyprus, Finland and Germany have revised their initial strategies. This report provides an overview of national AI policies according to the following policy areas: Human capital, From the lab to the market, Networking, Regulation, and Infrastructure. These policy areas are consistent with the actions proposed in the Coordinated Plan on Artificial Intelligence and with the policy recommendations to governments contained in the OECD Recommendation on AI. The report also includes a section on AI policies to address the societal challenges of the COVID-19 pandemic and climate change. The collection of AI policies is conducted jointly by the European Commission’s Joint Research Centre (JRC) and the OECD’s Science Technology and Innovation Directorate, while the analyses presented in this report are carried out by the JRC, with contributions from the OECD. Both institutions joined forces to ensure that the information supplied by AI Watch and the OECD AI Policy Observatory is harmonised, consistent and up to date. This report is based on the EC-OECD database of national AI policies, validated by Member States’ representatives, and it demonstrates the importance of working closely with relevant stakeholders to share lessons learned, good practices and challenges when shaping AI policies.
    Keywords: Industrial research and innovation, Financial and economic analysis, Digital Economy, ICT R&D and Innovation
    Date: 2021–06
    URL: http://d.repec.org/n?u=RePEc:ipt:iptwpa:jrc122684&r=
  16. By: Jaydip Sen; Sidra Mehtab
    Abstract: Building predictive models for robust and accurate prediction of stock prices and stock price movement is a challenging research problem. The well-known efficient market hypothesis posits that accurate prediction of future stock prices is impossible in an efficient stock market, as stock prices are assumed to be purely stochastic. However, numerous works have demonstrated that it is possible to predict future stock prices with a high level of precision using sophisticated algorithms, model architectures, and the selection of appropriate variables. This chapter proposes a collection of predictive regression models built on deep learning architectures for robust and precise prediction of the future prices of stocks listed in diversified sectors of the National Stock Exchange (NSE) of India. The Metastock tool is used to download historical stock prices over a period of two years (2013-2014) at 5-minute intervals. The records for the first year are used to train the models, and testing is carried out on the remaining records. The design approaches of all the models and their performance results are presented in detail. The models are also compared based on their execution time and prediction accuracy.
    Date: 2021–06
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2106.09664&r=
  17. By: Goller, Daniel; Harrer, Tamara; Lechner, Michael; Wolff, Joachim
    Abstract: We investigate the effectiveness of three different job-search and training programmes for German long-term unemployed persons. On the basis of an extensive administrative data set, we evaluated the effects of those programmes on various levels of aggregation using causal machine learning. We found that participants benefit from the investigated programmes, with placement services being the most effective. Effects are realised quickly and are long-lasting for every programme. While the effects are rather homogeneous for men, we found differential effects for women across various characteristics. Women benefit in particular when local labour market conditions improve. Regarding the mechanism for allocating the unemployed to the different programmes, we found the observed allocation to be as effective as a random allocation. Therefore, we propose data-driven rules for allocating the unemployed to the respective labour market programmes that would improve on the status quo.
    Keywords: Policy evaluation, Modified Causal Forest (MCF), active labour market programmes, conditional average treatment effect (CATE)
    JEL: J08 J68
    Date: 2021–06
    URL: http://d.repec.org/n?u=RePEc:usg:econwp:2021:08&r=
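    For readers who want the shape of the exercise, here is a synthetic-data sketch using a generic causal forest (econml's CausalForestDML) as a stand-in for the Modified Causal Forest the authors use:

      import numpy as np
      from econml.dml import CausalForestDML

      rng = np.random.default_rng(0)
      X = rng.normal(size=(2000, 5))                    # characteristics of the unemployed
      T = rng.integers(0, 2, size=2000)                 # programme participation (binary)
      Y = 0.5 * T * (X[:, 0] > 0) + X[:, 1] + rng.normal(size=2000)  # effect only for a subgroup

      est = CausalForestDML(discrete_treatment=True, random_state=0)
      est.fit(Y, T, X=X)
      cate = est.effect(X)                              # individual-level effect estimates (CATEs)
      print("mean estimated effect when X0 > 0:", round(cate[X[:, 0] > 0].mean(), 2))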
  18. By: Anand Deo; Karthyek Murthy
    Abstract: This paper considers Importance Sampling (IS) for the estimation of tail risks of a loss defined in terms of a sophisticated object such as a machine learning feature map or a mixed integer linear optimisation formulation. Assuming only black-box access to the loss and the distribution of the underlying random vector, the paper presents an efficient IS algorithm for estimating the Value at Risk and Conditional Value at Risk. The key challenge in any IS procedure, namely, identifying an appropriate change-of-measure, is automated with a self-structuring IS transformation that learns and replicates the concentration properties of the conditional excess from less rare samples. The resulting estimators enjoy asymptotically optimal variance reduction when viewed in the logarithmic scale. Simulation experiments highlight the efficacy and practicality of the proposed scheme.
    Date: 2021–06
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2106.10236&r=
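    A hand-rolled numerical illustration of the underlying idea -- estimating VaR and CVaR of a black-box loss by sampling from an exponentially tilted (mean-shifted) Gaussian and reweighting -- which is far simpler than the paper's self-structuring transformation:

      import numpy as np

      rng = np.random.default_rng(0)
      loss = lambda x: x @ np.ones(5)                   # black-box loss of a 5-dim Gaussian input
      alpha, n, shift = 0.999, 100_000, 1.5

      x = rng.normal(size=(n, 5)) + shift               # proposal: shift every coordinate into the tail
      w = np.exp(-shift * x.sum(axis=1) + 5 * shift**2 / 2)  # likelihood ratio N(0,I) / N(shift,I)
      L = loss(x)

      order = np.argsort(L)
      cum = np.cumsum(w[order]) / w.sum()
      var = L[order][np.searchsorted(cum, alpha)]       # weighted alpha-quantile = VaR
      tail = L >= var
      cvar = np.average(L[tail], weights=w[tail])       # weighted tail mean = CVaR
      print(f"VaR {var:.2f}  CVaR {cvar:.2f}")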
  19. By: Kamaladdin Fataliyev; Aneesh Chivukula; Mukesh Prasad; Wei Liu
    Abstract: Stock market movements are influenced by public and private information shared through news articles, company reports, and social media discussions. Analyzing these vast sources of data can give market participants an edge in making a profit. However, the majority of studies in the literature are based on traditional approaches that fall short in analyzing unstructured, vast textual data. In this study, we review the immense body of existing literature on text-based stock market analysis. We present input data types and cover the main textual data sources and variations. We then present feature representation techniques, cover the analysis techniques, and create a taxonomy of the main stock market forecast models. Importantly, we discuss representative work in each category of the taxonomy, analyzing their respective contributions. Finally, the paper discusses unaddressed open problems and gives suggestions for future work. The aim of this study is to survey the main stock market analysis models and text representation techniques for financial market prediction, to point out the shortcomings of existing techniques, and to propose promising directions for future research.
    Date: 2021–06
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2106.12985&r=
  20. By: OECD
    Abstract: As artificial intelligence (AI) advances across economies and societies, stakeholder communities are actively exploring how best to encourage the design, development, deployment and use of AI that is human-centred and trustworthy. This report presents a framework for comparing tools and practices to implement trustworthy AI systems as set out in the OECD AI Principles. The framework aims to help collect, structure and share information, knowledge and lessons learned to date on tools, practices and approaches for implementing trustworthy AI. As such, it provides a way to compare tools in different use contexts. The framework will serve as the basis for the development of an interactive, publicly available database on the OECD.AI Policy Observatory. This report informs ongoing OECD work towards helping policy makers and other stakeholders implement the OECD AI Principles in practice.
    Date: 2021–06–28
    URL: http://d.repec.org/n?u=RePEc:oec:stiaab:312-en&r=
  21. By: Daniel Goller; Tamara Harrer; Michael Lechner; Joachim Wolff
    Abstract: We investigate the effectiveness of three different job-search and training programmes for German long-term unemployed persons. On the basis of an extensive administrative data set, we evaluated the effects of those programmes on various levels of aggregation using causal machine learning. We found that participants benefit from the investigated programmes, with placement services being the most effective. Effects are realised quickly and are long-lasting for every programme. While the effects are rather homogeneous for men, we found differential effects for women across various characteristics. Women benefit in particular when local labour market conditions improve. Regarding the mechanism for allocating the unemployed to the different programmes, we found the observed allocation to be as effective as a random allocation. Therefore, we propose data-driven rules for allocating the unemployed to the respective labour market programmes that would improve on the status quo.
    Date: 2021–06
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2106.10141&r=
  22. By: Wenyong Zhang; Lingfei Li; Gongqiu Zhang
    Abstract: We propose a two-step framework for predicting the implied volatility surface over time without static arbitrage. In the first step, we select features to represent the surface and predict them over time. In the second step, we use the predicted features to construct the implied volatility surface using a deep neural network (DNN) model by incorporating constraints that prevent static arbitrage. We consider three methods to extract features from the implied volatility data: principal component analysis, variational autoencoder and sampling the surface, and we predict these features using LSTM. Using a long time series of implied volatility data for S&P 500 index options to train our models, we find that sampling the surface with DNN for surface construction achieves the smallest error in out-of-sample prediction. Furthermore, the DNN model for surface construction not only removes static arbitrage, but also significantly reduces the prediction error compared with a standard interpolation method. Our framework can also be used to simulate the dynamics of the implied volatility surface without static arbitrage.
    Date: 2021–06
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2106.07177&r=
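    The first step of the framework can be sketched as follows, with PCA features and an LSTM on synthetic surfaces (scikit-learn and TensorFlow assumed; the arbitrage-free DNN reconstruction step is omitted):

      import numpy as np
      import tensorflow as tf
      from sklearn.decomposition import PCA

      surfaces = np.random.rand(300, 10 * 8)            # 300 days, 10 strikes x 8 maturities, flattened
      pca = PCA(n_components=3)
      scores = pca.fit_transform(surfaces)              # 3 features per day

      lag = 5                                           # predict tomorrow's features from past `lag` days
      X = np.stack([scores[i:i + lag] for i in range(len(scores) - lag)])
      y = scores[lag:]
      model = tf.keras.Sequential([tf.keras.layers.LSTM(16), tf.keras.layers.Dense(3)])
      model.compile(optimizer="adam", loss="mse")
      model.fit(X, y, epochs=5, verbose=0)

      pred_surface = pca.inverse_transform(model.predict(scores[-lag:][None], verbose=0))
      print("predicted surface shape:", pred_surface.shape)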
  23. By: Nektarios Aslanidis; Aurelio F. Bariviera; Óscar G. López
    Abstract: This paper shows that Bitcoin is not correlated to a general uncertainty index as measured by the Google Trends data of Castelnuovo and Tran (2017). Instead, Bitcoin is linked to a Google Trends attention measure specific for the cryptocurrency market. First, we find a bidirectional relationship between Google Trends attention and Bitcoin returns up to six days. Second, information flows from Bitcoin volatility to Google Trends attention seem to be larger than information flows in the other direction. These relations hold across different sub-periods and different compositions of the proposed Google Trends Cryptocurrency index.
    Date: 2021–06
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2106.07104&r=
  24. By: Daniel Aaronson; Scott A. Brave; R. Andrew Butters; Daniel Sacks; Boyoung Seo
    Abstract: We leverage an event-study research design focused on the seven costliest hurricanes to hit the US mainland since 2004 to identify the elasticity of unemployment insurance filings with respect to search intensity. Applying our elasticity estimate to the state-level Google Trends indexes for the topic “unemployment,” we show that out-of-sample forecasts made ahead of the official data releases for March 21 and 28 predicted to a large degree the extent of the Covid-19 related surge in the demand for unemployment insurance. In addition, we provide a robust assessment of the uncertainty surrounding these estimates and demonstrate their use within a broader forecasting framework for US economic activity.
    Keywords: unemployment insurance; Google Trends; hurricanes; search; unemployment; Covid-19
    JEL: C53 H12 J65
    Date: 2020–04–08
    URL: http://d.repec.org/n?u=RePEc:fip:fedhwp:92754&r=
  25. By: Shohei Ohsawa
    Abstract: We present a general optimization framework for emergent belief-state representation without any supervision. We employ the common configuration of multiagent reinforcement learning and communication to improve exploration coverage over an environment by leveraging the knowledge of each agent. In this paper, we show that recurrent neural nets (RNNs) with shared weights are highly biased in partially observable environments because of their noncooperativity. To address this, we design an unbiased version of self-play via mechanism design, also known as reverse game theory, to elicit unbiased knowledge at the Bayesian Nash equilibrium. The key idea is to add imaginary rewards using the peer prediction mechanism, i.e., a mechanism for mutually criticizing information in a decentralized environment. Numerical analyses, including StarCraft exploration tasks with up to 20 agents and off-the-shelf RNNs, demonstrate state-of-the-art performance.
    Date: 2021–06
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2106.03007&r=
  26. By: Monica Pratesi; Claudio Ceccarelli; Stefano Menghinello
    Abstract: Official statistics are collected and produced by national statistical institutions (NSIs) based upon standardized questionnaire forms and an a priori designed survey frame. Although responding to NSIs’ surveys is mandatory for respondent units, increasing disaffection with replying to official surveys is a common trend across many advanced countries. This work explores the possibility of using Citizen-Generated Data (CGD) as a new information source for the compilation of official statistics. CGD represent a unique and still unexploited data source that shares some key characteristics with Big Data, while presenting some specific features in terms of information relevance and data generating process. Given the relevance of CGD for reducing the information gap between the demand and supply of new or more robust Sustainable Development Goals (SDG) indicators, the experimental setting to assess the data quality of CGD refers to different ways of integrating official statistics and CGD. Istat collects CGD within the framework of a pilot survey focused on key SDG indicators, and the appropriate methodological approach to assess data quality for official statistics is defined according to different data integration modalities.
    Keywords: Citizen-Generated Data (CGD), National statistical Institutions (NSIs), Sustainable Development Goals (SDG), Official statistics (OS), Data Science, Latent variables models, civil society organizations (CSOs)
    JEL: C81 C83
    Date: 2021–06–01
    URL: http://d.repec.org/n?u=RePEc:pie:dsedps:2021/274&r=
  27. By: Meena Jagadeesan; Celestine Mendler-Dünner; Moritz Hardt
    Abstract: When reasoning about strategic behavior in a machine learning context it is tempting to combine standard microfoundations of rational agents with the statistical decision theory underlying classification. In this work, we argue that a direct combination of these standard ingredients leads to brittle solution concepts of limited descriptive and prescriptive value. First, we show that rational agents with perfect information produce discontinuities in the aggregate response to a decision rule that we often do not observe empirically. Second, when any positive fraction of agents is not perfectly strategic, desirable stable points -- where the classifier is optimal for the data it entails -- cease to exist. Third, optimal decision rules under standard microfoundations maximize a measure of negative externality known as social burden within a broad class of possible assumptions about agent behavior. Recognizing these limitations we explore alternatives to standard microfoundations for binary classification. We start by describing a set of desiderata that help navigate the space of possible assumptions about how agents respond to a decision rule. In particular, we analyze a natural constraint on feature manipulations, and discuss properties that are sufficient to guarantee the robust existence of stable points. Building on these insights, we then propose the noisy response model. Inspired by smoothed analysis and empirical observations, noisy response incorporates imperfection in the agent responses, which we show mitigates the limitations of standard microfoundations. Our model retains analytical tractability, leads to more robust insights about stable points, and imposes a lower social burden at optimality.
    Date: 2021–06
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2106.12705&r=
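    A tiny simulation of the contrast the abstract draws: under standard microfoundations, every agent within manipulation cost of the cutoff jumps exactly to it, creating a point mass (a discontinuity in the aggregate response), while adding noise to the response smooths it away. All parameters here are invented:

      import numpy as np

      rng = np.random.default_rng(0)
      base = rng.normal(size=100_000)                   # agents' unmanipulated features
      cost, thr = 1.0, 1.5                              # manipulation cost and decision cutoff
      movers = (base < thr) & (thr - base <= cost)      # agents for whom gaming is worth it

      exact = np.where(movers, thr, base)               # rational best response: land exactly on cutoff
      noisy = np.where(movers, thr + 0.3 * rng.normal(size=base.size), base)  # noisy response

      print("point mass at cutoff, standard microfoundations:", (exact == thr).mean())
      print("point mass at cutoff, noisy response:           ", (noisy == thr).mean())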
  28. By: Beręsewicz Maciej; Nikulin Dagmara; Szymkowiak Marcin; Wilak Kamil
    Abstract: In this article we address the question of how to measure the size and characteristics of the platform economy. We propose an approach different from sample surveys, based on smartphone data that are passively collected through programmatic systems as part of online marketing. In particular, our study focuses on two types of services: food delivery (Bolt Courier, Takeaway, Glover, Wolt) and transport services (Bolt Driver, Free Now, iTaxi and Uber). Our results show that the platform economy in Poland is growing. In particular, with respect to food delivery and transportation services performed by means of applications, we observed a growing trend between January 2018 and December 2020. Taking into account the demographic structure of app users, our results confirm findings from past studies: the majority of platform workers are young men, but the age structure of app users differs between the two categories of services. Another surprising finding is that foreigners do not account for the majority of gig workers in Poland. When the number of platform workers is compared with the corresponding working populations, the estimated share of active app users accounts for about 0.5-2% of the working populations in the 9 largest Polish cities.
    Date: 2021–06
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2106.12827&r=
  29. By: Mathieu Mercadier; Jean-Pierre Lardy
    Abstract: Credit Default Swap (CDS) levels provide a market appreciation of companies' default risk. These derivatives are not always available, creating a need for CDS approximations. This paper offers a simple, global and transparent CDS structural approximation, which contrasts with more complex and proprietary approximations currently in use. This Equity-to-Credit formula (E2C), inspired by CreditGrades, obtains better CDS approximations, according to empirical analyses based on a large sample spanning 2016-2018. A random forest regression run with this E2C formula and selected additional financial data results in an 87.3% out-of-sample accuracy in CDS approximations. The transparency property of this algorithm confirms the predominance of the E2C estimate, and the impact of companies' debt rating and size, in predicting their CDS.
    Date: 2021–06
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2106.07358&r=
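    A sketch of the second stage on simulated inputs -- a random forest mapping an E2C-style estimate plus rating and size onto observed CDS spreads (the E2C formula itself is not reproduced here, and all data are invented):

      import numpy as np
      from sklearn.ensemble import RandomForestRegressor
      from sklearn.model_selection import train_test_split

      rng = np.random.default_rng(0)
      e2c = rng.lognormal(mean=4.0, size=2000)          # structural spread estimate (bps)
      rating = rng.integers(1, 10, size=2000)           # coarse debt-rating score
      size = rng.lognormal(mean=2.0, size=2000)         # firm-size proxy
      X = np.column_stack([e2c, rating, size])
      cds = e2c * (1 + 0.05 * rating) + rng.normal(scale=20, size=2000)  # observed spreads (toy)

      X_tr, X_te, y_tr, y_te = train_test_split(X, cds, random_state=0)
      rf = RandomForestRegressor(random_state=0).fit(X_tr, y_tr)
      print("out-of-sample R^2:", round(rf.score(X_te, y_te), 3))
      print("feature importances (E2C, rating, size):", rf.feature_importances_.round(2))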
  30. By: Long Chen; Yadong Huang; Shumiao Ouyang; Wei Xiong
    Abstract: A central issue in privacy governance is understanding how users balance their privacy preferences and data sharing to satisfy service demands. We combine survey and behavioral data of a sample of Alipay users to examine how data privacy preferences affect their data sharing with third-party mini-programs on the Alipay platform. We find that there is no relationship between the respondents’ self-stated privacy concerns and their number of data-sharing authorizations, confirming the puzzling data privacy paradox. Instead of attributing this paradox to the respondents’ unreliable survey responses, resignation from active protection of their data privacy, or behavioral factors in making their data-sharing choices, we show that this phenomenon can be explained by a curious finding that users with stronger privacy concerns tend to benefit more from using mini-programs. This positive relationship between privacy concerns and digital demands further suggests that consumers may develop data privacy concerns as a by-product of the process of using digital applications, not because such concerns are innate.
    JEL: D03 D12 M15
    Date: 2021–05
    URL: http://d.repec.org/n?u=RePEc:nbr:nberwo:28854&r=
  31. By: Zach Y. Brown; Alexander MacKay
    Abstract: Increasingly, retailers have access to better pricing technology, especially in online markets. Using hourly data from five major online retailers, we show that retailers set prices at regular intervals that differ across firms. In addition, faster firms appear to use automated pricing rules that are functions of rivals' prices. These features are inconsistent with the standard assumptions about pricing technology used in the empirical literature. Motivated by these facts, we consider a model of competition in which firms can differ in pricing frequency and choose pricing algorithms rather than prices. We demonstrate that, relative to the standard simultaneous price-setting model, pricing technology with these features can increase prices in Markov perfect equilibrium. A simple counterfactual simulation implies that pricing algorithms lead to meaningful increases in markups in our empirical setting, especially for firms with the fastest pricing technology.
    JEL: D43 L13 L81 L86
    Date: 2021–05
    URL: http://d.repec.org/n?u=RePEc:nbr:nberwo:28860&r=
  32. By: Newland, Carlos (The Johns Hopkins Institute for Applied Economics, Global Health, and the Study of Business Enterprise); Rosiello, Juan Carlos (The Johns Hopkins Institute for Applied Economics, Global Health, and the Study of Business Enterprise); Salinas, Roberto (The Johns Hopkins Institute for Applied Economics, Global Health, and the Study of Business Enterprise)
    Abstract: The technological progress in our modern societies has witnessed the emergence of persons who deploy different means of communication across social networks, seeking to generate an impact among their audiences. These efforts in social media communications attempt to alter consumption preferences and patterns, political choices, as well as reinforce or modify opinions of all sorts and stripes. Individuals who attain greater relevance due to effects they trigger on third parties are characterized as influencers, and one of their preferred means of communication are online platforms or social media. Among them, Twitter stands out as the most conducive space for debates on ideas, political parties, or public policies. This social media platform is a microblogging service that allows a person to send short messages (up to 280 characters) that are displayed on a user’s individual page, and that are replicated on their followers’ pages. In this paper, we aim to identify the most important influencers in Latin America, the United States and Spain, who use this social media network to debate issues primarily related to economics and economic policy. On this subject, there is a very strong discussion about the role that the government should play in economic life, the pros and cons of greater regulation, the problem of income distribution, the impact of inflation, and the nature of free markets and capitalism. We will first describe the methodology we employed, in order to then proceed to illustrate a ranking of the ten most relevant influencers, in terms of number of followers, from Argentina, Brazil, Colombia, Chile, Mexico, Spain, and the United States. We then explore their profiles and present an analysis of the economic issues debated on the relevant Twitter accounts on a per country basis. Based on this analysis, we present a hypothesis on the positioning of influencers in economic matters. Finally, the global reach of the universe of influencers that are considered in this essay is described and measured.
    Date: 2021–06–18
    URL: http://d.repec.org/n?u=RePEc:ris:jhisae:0183&r=

This nep-big issue is ©2021 by Tom Coupé. It is provided as is without any express or implied warranty. It may be freely redistributed in whole or in part for any purpose. If distributed in part, please include this notice.
General information on the NEP project can be found at http://nep.repec.org. For comments please write to the director of NEP, Marco Novarese at <director@nep.repec.org>. Put “NEP” in the subject, otherwise your mail may be rejected.
NEP’s infrastructure is sponsored by the School of Economics and Finance of Massey University in New Zealand.