nep-big 2022-01-10 papers

on Big Data

Issue of 2022‒01‒10
thirty papers chosen by
Tom Coupé
University of Canterbury

Forecasting Social Unrest: A Machine Learning Approach By Sandile Hlatshwayo; Chris Redl
The Determinants of Design Applications in Europe By Leogrande, Angelo; Costantiello, Alberto; Laureti, Lucio; Leogrande, Domenico
Model-based Recursive Partitioning to Estimate Unfair Health Inequalities in the United Kingdom Household Longitudinal Study By Paolo Brunori; Apostolos Davillas; Andrew Jones; Giovanna Scarchilli
Generative Adversarial Network (GAN) and Enhanced Root Mean Square Error (ERMSE): Deep Learning for Stock Price Movement Prediction By Ashish Kumar; Abeer Alsadoon; P. W. C. Prasad; Salma Abdullah; Tarik A. Rashid; Duong Thu Hang Pham; Tran Quoc Vinh Nguyen
Expert Aggregation for Financial Forecasting By Brière Marie; Alasseur Clémence; Joseph Mikael; Carl Remlinger
Forex Trading Volatility Prediction using Neural Network Models By Shujian Liao; Jian Chen; Hao Ni
Fast Sampling from Time-Integrated Bridges using Deep Learning By Leonardo Perotti; Lech A. Grzelak
Improved Method of Stock Trading under Reinforcement Learning Based on DRQN and Sentiment Indicators ARBR By Peng Zhou; Jingling Tang
Optimal Price Targeting By Adam N. Smith; Stephan Seiler; Ishant Aggarwal
Using Polls to Forecast Popular Vote Share for US Presidential Elections 2016 and 2020: An Optimal Forecast Combination Based on Ensemble Empirical Model By Easaw, Joshy; Fang, Yongmei; Heravi, Saeed
Competition analysis on the over-the-counter credit default swap market By Louis Abraham
Development of an Ensemble of Models for Predicting Socio-Economic Indicators of the Russian Federation using IRT-Theory and Bagging Methods By Kitova, Olga; Savinova, Victoria
Recent Advances in Reinforcement Learning in Finance By Ben Hambly; Renyuan Xu; Huining Yang
Intuitive Mathematical Economics Series. General Equilibrium Models and the Gradient Field Method By Tomás Marinozzi; Leandro Nallar; Sergio Pernice
A Review on Graph Neural Network Methods in Financial Applications By Jianian Wang; Sheng Zhang; Yanghua Xiao; Rui Song
State of the art on ethical, legal, and social issues linked to audio- and video-based AAL solutions By Ake-Kob, Alin; Blazeviciene, Aurelija; Colonna, Liane; Cartolovni, Anto; Dantas, Carina; Fedosov, Anton; Florez-Revuelta, Francisco; Fosch-Villaronga, Eduard; He, Zhicheng; Klimczuk, Andrzej; Kuźmicz, Maksymilian; Lukács, Adrienn; Lutz, Christoph; Mekovec, Renata; Miguel, Cristina; Mordini, Emilio; Pajalic, Zada; Pierscionek, Barbara Krystyna; Jose Santofimia Romero, Maria; Ali Salah, Albert; Sobecki, Andrzej; Solanas, Agusti; Tamò-Larrieux, Aurelia
Simple Allocation Rules and Optimal Portfolio Choice Over the Lifecycle By Victor Duarte; Julia Fonseca; Aaron S. Goodman; Jonathan A. Parker
Safe Havens, Machine Learning, and the Sources of Geopolitical Risk: A Forecasting Analysis Using Over a Century of Data By Rangan Gupta; Sayar Karmakar; Christian Pierdzioch
High-Dimensional Stock Portfolio Trading with Deep Reinforcement Learning By Uta Pigorsch; Sebastian Schäfer
Travailleur du savoir et risque d'intolérance sensorielle lié à l'interaction homme-machine intelligente By Emmanuel Okamba
Detecting Edgeworth Cycles By Timothy Holt; Mitsuru Igami; Simon Scheidegger
Labour-saving automation and occupational exposure: a text-similarity measure By Fabio Montobbio; Jacopo Staccioli; Maria Enrica Virgillito; Marco Vivarelli
Omitted Variable Bias in Machine Learned Causal Models By Victor Chernozhukov; Carlos Cinelli; Whitney Newey; Amit Sharma; Vasilis Syrgkanis
Market power and artificial intelligence work on online labour markets By Néstor Duch-Brown; Estrella Gomez-Herrera; Frank Mueller-Langer; Songül Tolan
Digitalization in MedTech: Understanding the Impact on Total Knee Arthroplasty By Lorenz, Max; Reinhard, Patrick; Spring, Thomas
Structural Sieves By Konrad Menzel
Fair learning with bagging By Jean-David Fermanian; Dominique Guégan
Multi-modal Attention Network for Stock Movements Prediction By Shwai He; Shi Gu
Vaccination Policy and Trust By Jelnov, Artyom; Jelnov, Pavel
Privacy Laws and Value of Personal Data By Mehmet Canayaz; Ilja Kantorovitch; Roxana Mihet

Forecasting Social Unrest: A Machine Learning Approach

By:	Sandile Hlatshwayo; Chris Redl
Abstract:	We produce a social unrest risk index for 125 countries covering a period of 1996 to 2020. The risk of social unrest is based on the probability of unrest in the following year derived from a machine learning model drawing on over 340 indicators covering a wide range of macro-financial, socioeconomic, development and political variables. The prediction model correctly forecasts unrest in the following year approximately two-thirds of the time. Shapley values indicate that the key drivers of the predictions include high levels of unrest, food price inflation and mobile phone penetration, which accord with previous findings in the literature.
Keywords:	Social unrest, machine learning.
Date:	2021–11–05
URL:	http://d.repec.org/n?u=RePEc:imf:imfwpa:2021/263&r=

The Determinants of Design Applications in Europe

By:	Leogrande, Angelo; Costantiello, Alberto; Laureti, Lucio; Leogrande, Domenico
Abstract:	In this article we estimate the level of “Design Application” in 37 European countries over the period 2010-2019. We use data from the European Innovation Scoreboard (EIS) of the European Commission. We estimate four econometric models: Pooled OLS, Panel Data with Random Effects, Panel Data with Fixed Effects, and Dynamic Panel. We find that the level of Design Applications is negatively associated with “Enterprise Births”, “Finance and Support”, and “Firm Investments”, and positively associated with “Venture Capital”, “Turnover share large enterprises”, “R&D expenditure public sector”, and “Intellectual Assets”. In addition, we perform a cluster analysis using the k-Means algorithm optimized with the Silhouette Coefficient and find three different clusters. Finally, we compare eight machine learning algorithms for predicting the level of “Design Application” and find that the Tree Ensemble is the best predictor, with a predicted value for the 30% of the dataset analyzed that is expected to decrease by 12.86% on average.
Keywords:	Innovation and Invention: Processes and Incentives; Management of Technological Innovation and R&D; Technological Change: Choices and Consequences; Intellectual Property and Intellectual Capital.
JEL:	O30 O31 O32 O33 O34
Date:	2021–11–04
URL:	http://d.repec.org/n?u=RePEc:pra:mprapa:110836&r=

Model-based Recursive Partitioning to Estimate Unfair Health Inequalities in the United Kingdom Household Longitudinal Study

By:	Paolo Brunori (London School of Economics, International Inequality Institute & University of Bari); Apostolos Davillas (University of East Anglia, Norwich Medical School); Andrew Jones (University of York); Giovanna Scarchilli (University of Trento & University of Modena and Reggio Emilia)
Abstract:	We measure unfair health inequality in the UK using a novel data-driven empirical approach. We explain health variability as the result of circumstances beyond individual control and health-related behaviours. We do this using model-based recursive partitioning, a supervised machine learning algorithm. Unlike usual tree-based algorithms, model-based recursive partitioning not only identifies social groups with different expected levels of health but also unveils the heterogeneity of the relationship linking behaviours and health outcomes across groups. The empirical application is conducted using the UK Household Longitudinal Study. We show that unfair inequality is a substantial fraction of the total explained health variability. This finding holds no matter which exact definition of fairness is adopted: using both the fairness gap and direct unfairness measures, each evaluated at different reference values for circumstances or effort.
Keywords:	Health inequality, machine learning, UK Household Longitudinal Study
JEL:	I14 D63
Date:	2021–12
URL:	http://d.repec.org/n?u=RePEc:inq:inqwps:ecineq2021-596&r=

Generative Adversarial Network (GAN) and Enhanced Root Mean Square Error (ERMSE): Deep Learning for Stock Price Movement Prediction

By:	Ashish Kumar; Abeer Alsadoon; P. W. C. Prasad; Salma Abdullah; Tarik A. Rashid; Duong Thu Hang Pham; Tran Quoc Vinh Nguyen
Abstract:	Predicting the direction of stock price movements is important in both financial circles and academia. Stock prices contain complex, incomplete, and fuzzy information, which makes their development trend extremely difficult to predict. Predicting and analysing financial data is a nonlinear, time-dependent problem. With rapid development in machine learning and deep learning, this task can be performed more effectively by a purposely designed network. This paper aims to improve prediction accuracy and minimize forecasting error loss through a deep learning architecture based on Generative Adversarial Networks. A generic model is proposed, consisting of the Phase-space Reconstruction (PSR) method for reconstructing the price series and a Generative Adversarial Network (GAN) that combines two neural networks, Long Short-Term Memory (LSTM) as the generative model and a Convolutional Neural Network (CNN) as the discriminative model, for adversarial training to forecast the stock market. The LSTM generates new instances based on historical basic indicator information, and the CNN then estimates whether the data are predicted by the LSTM or are real. The GAN performed well on the enhanced root mean square error relative to LSTM: it was 4.35% more accurate in predicting the direction and reduced processing time and RMSE by 78 secs and 0.029, respectively. The proposed system concentrates on minimizing the root mean square error and processing time and improving the direction prediction accuracy, and provides a better result in the accuracy of the stock index.
Date:	2021–11
URL:	http://d.repec.org/n?u=RePEc:arx:papers:2112.03946&r=

Expert Aggregation for Financial Forecasting

By:	Brière Marie; Alasseur Clémence; Joseph Mikael; Carl Remlinger
Abstract:	Machine learning algorithms dedicated to financial time series forecasting have gained a lot of interest over the last few years. One difficulty lies in the choice between several algorithms, as their estimation accuracy may be unstable through time. In this paper, we propose to apply an online aggregation-based forecasting model combining several machine learning techniques to build a portfolio which dynamically adapts itself to market conditions. We apply this aggregation technique to the construction of a long-short-portfolio of individual stocks ranked on their financial characteristics and we demonstrate how aggregation outperforms single algorithms both in terms of performances and of stability.
Date:	2021–11
URL:	http://d.repec.org/n?u=RePEc:arx:papers:2111.15365&r=
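
As a rough illustration of the online aggregation idea in the entry above, the sketch below implements a textbook exponentially weighted average forecaster under squared loss. The function name `ewa_aggregate`, the learning rate `eta`, and the toy experts are assumptions made for illustration; the abstract does not say which aggregation rule the authors use.

```python
import math

def ewa_aggregate(expert_preds, outcomes, eta=0.5):
    """Exponentially weighted average forecaster under squared loss.

    expert_preds: list of rounds, each a list with one forecast per expert.
    outcomes: realised values, one per round.
    Returns (aggregated forecasts, final weights).
    """
    n_experts = len(expert_preds[0])
    weights = [1.0 / n_experts] * n_experts
    aggregated = []
    for preds, y in zip(expert_preds, outcomes):
        # Forecast first, using only the weights learned from past rounds.
        aggregated.append(sum(w * p for w, p in zip(weights, preds)))
        # Then penalise each expert by its squared error on this round.
        weights = [w * math.exp(-eta * (p - y) ** 2)
                   for w, p in zip(weights, preds)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return aggregated, weights

# Toy data (hypothetical): expert 0 is always right, expert 1 is off by one.
outcomes = [0.1 * t for t in range(20)]
expert_preds = [[y, y + 1.0] for y in outcomes]
forecasts, final_weights = ewa_aggregate(expert_preds, outcomes)
```

The weights start equal and shift towards whichever expert has accumulated the smaller loss, which is the sense in which such an aggregate adapts to changing conditions.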

Forex Trading Volatility Prediction using Neural Network Models

By:	Shujian Liao; Jian Chen; Hao Ni
Abstract:	In this paper, we investigate the problem of predicting the future volatility of Forex currency pairs using deep learning techniques. We show step by step how to construct the deep-learning network, guided by the empirical patterns of intra-day volatility. The numerical results show that the multiscale Long Short-Term Memory (LSTM) model with multi-currency pairs as input consistently achieves state-of-the-art accuracy compared with both the conventional baselines, i.e. autoregressive and GARCH models, and the other deep learning models.
Date:	2021–12
URL:	http://d.repec.org/n?u=RePEc:arx:papers:2112.01166&r=

Fast Sampling from Time-Integrated Bridges using Deep Learning

By:	Leonardo Perotti; Lech A. Grzelak
Abstract:	We propose a methodology to sample from time-integrated stochastic bridges, namely random variables defined as $\int_{t_1}^{t_2} f(Y(t))dt$ conditioned on $Y(t_1)\!=\!a$ and $Y(t_2)\!=\!b$, with $a,b\in R$. The Stochastic Collocation Monte Carlo sampler and the Seven-League scheme are applied for this purpose. Notably, the distribution of the time-integrated bridge is approximated utilizing a polynomial chaos expansion built on a suitable set of stochastic collocation points. Furthermore, artificial neural networks are employed to learn the collocation points. The result is a robust, data-driven procedure for the Monte Carlo sampling from conditional time-integrated processes, which guarantees high accuracy and generates thousands of samples in milliseconds. Applications, with a focus on finance, are presented here as well.
Date:	2021–11
URL:	http://d.repec.org/n?u=RePEc:arx:papers:2111.13901&r=
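
To make the object in the entry above concrete, the sketch below draws samples of $\int_{t_1}^{t_2} f(Y(t))dt$ by brute-force path simulation. It assumes, purely for illustration, that $Y$ is a standard Brownian motion pinned at both ends; the function name and grid size are likewise hypothetical. This is the slow baseline that a fast sampler would aim to replace, not the Stochastic Collocation or Seven-League method named in the abstract.

```python
import math
import random

def sample_integrated_bridge(f, a, b, t1, t2, n_steps=200, rng=random):
    """One brute-force draw of the integral of f(Y(t)) over [t1, t2],
    where Y is a Brownian bridge with Y(t1) = a and Y(t2) = b."""
    T = t2 - t1
    dt = T / n_steps
    # Simulate a standard Brownian motion W on the grid.
    w = [0.0]
    for _ in range(n_steps):
        w.append(w[-1] + rng.gauss(0.0, math.sqrt(dt)))
    # Pin the path: Y(s) = a + W(s) - (s / T) * (W(T) - (b - a)).
    y = [a + w[i] - (i / n_steps) * (w[-1] - (b - a))
         for i in range(n_steps + 1)]
    # Trapezoidal rule for the time integral.
    vals = [f(v) for v in y]
    return dt * (sum(vals) - 0.5 * (vals[0] + vals[-1]))

rng = random.Random(0)
draws = [sample_integrated_bridge(lambda v: v, 1.0, 3.0, 0.0, 2.0, rng=rng)
         for _ in range(2000)]
mean_draw = sum(draws) / len(draws)
```

With $f$ the identity, the sample mean should sit close to $(a+b)(t_2-t_1)/2$, which gives a quick sanity check on the pinning step.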

Improved Method of Stock Trading under Reinforcement Learning Based on DRQN and Sentiment Indicators ARBR

By:	Peng Zhou; Jingling Tang
Abstract:	With the application of artificial intelligence in the financial field, quantitative trading is considered to be profitable. However, existing quantitative trading models ignore the impact of irrational investor behavior on the market, so they perform poorly in an inefficient market such as China's stock market. This paper therefore proposes an improved deep recurrent model, DRQN-ARBR. By replacing the fully connected layer in the original model with an LSTM layer and using the sentiment indicator ARBR to construct a trading strategy, the model addresses two problems of the traditional DQN model: limited memory for storing empirical data, and the impact of observable Markov properties on performance. The paper also addresses the original model's small number of stock states by using more technical indicators as model inputs. The experimental results show that the DRQN-ARBR algorithm proposed in this paper can significantly improve the performance of reinforcement learning in stock trading.
Date:	2021–11
URL:	http://d.repec.org/n?u=RePEc:arx:papers:2111.15356&r=

Optimal Price Targeting

By:	Adam N. Smith; Stephan Seiler; Ishant Aggarwal
Abstract:	We examine the profitability of personalized pricing policies that are derived using different specifications of demand in a typical retail setting with consumer-level panel data. We generate pricing policies from a variety of models, including Bayesian hierarchical choice models, regularized regressions, and classification trees using different sets of data inputs. To compare pricing policies, we employ an inverse probability weighted estimator of profits that explicitly takes into account non-random price variation and the panel nature of the data. We find that the performance of machine learning models is highly varied, ranging from a 21% loss to a 17% gain relative to a blanket couponing strategy, and a standard Bayesian hierarchical logit model achieves a 17.5% gain. Across all models purchase histories lead to large improvements in profits, but demographic information only has a small impact. We show that out-of-sample hit probabilities, a standard measure of model performance, are uncorrelated with our profit estimator and provide poor guidance towards model selection.
Keywords:	targeting, personalization, heterogeneity, choice models, machine learning
JEL:	C11 C33 C45 C52 D12 L11 L81
Date:	2021
URL:	http://d.repec.org/n?u=RePEc:ces:ceswps:_9439&r=
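
The inverse probability weighted comparison mentioned in the entry above can be sketched in its simplest cross-sectional form. Everything here is an assumption for illustration: the record layout, the function name `ipw_policy_profit`, and the toy data. The sketch also ignores the panel structure that the authors' estimator explicitly handles.

```python
def ipw_policy_profit(records, policy):
    """Estimate the average profit per consumer under `policy`.

    records: list of dicts with keys 'x' (consumer features), 'price'
    (price actually shown), 'propensity' (probability that this price
    was shown) and 'profit' (realised profit).
    policy: function mapping x to the price the policy would charge.
    """
    total = 0.0
    for r in records:
        # Keep only observations where the logged price matches the
        # policy's choice, re-weighted by how likely that price was.
        if policy(r["x"]) == r["price"]:
            total += r["profit"] / r["propensity"]
    return total / len(records)

# Toy data (hypothetical): segment 1 buys at either price,
# segment 0 buys only at the low price; prices are shown 50/50.
records = []
for x in (0, 1):
    for price in (1, 2):
        buys = (x == 1) or (price == 1)
        records.append({"x": x, "price": price, "propensity": 0.5,
                        "profit": float(price) if buys else 0.0})

targeted = ipw_policy_profit(records, lambda x: 2 if x == 1 else 1)
blanket = ipw_policy_profit(records, lambda x: 2)
```

In this toy setting the targeted policy earns more per consumer than the blanket high price, and the estimate uses only logged outcomes rather than a fitted demand model.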

Using Polls to Forecast Popular Vote Share for US Presidential Elections 2016 and 2020: An Optimal Forecast Combination Based on Ensemble Empirical Model

By:	Easaw, Joshy (Cardiff Business School); Fang, Yongmei (College of Mathematics and Informatics, South China Agricultural University, China); Heravi, Saeed (Cardiff Business School)
Abstract:	This study introduces the Ensemble Empirical Mode Decomposition (EEMD) technique to forecasting popular vote share. The technique is useful when using polling data, which is pertinent when none of the main candidates is the incumbent. Our main interest in this study is short- and long-term forecasting, and thus we consider forecast horizons from one day to three months ahead. The EEMD technique is used to decompose the election data for the two most recent US presidential elections, 2016 and 2020. Three models, Support Vector Machine (SVM), Neural Network (NN) and ARIMA, are then used to predict the decomposition components. The final hybrid model is then constructed by comparing the prediction performance of the decomposition components. The predictive performance of the combination model is compared with the benchmark individual models, SVM, NN, and ARIMA. In addition, it is compared with a single prediction market, the Iowa Electronic Markets. The results indicate that the prediction performance of the EEMD combined model is better than that of the individual models.
Keywords:	Forecasting Popular Votes Shares; Electoral Poll; Forecast combination, Hybrid model; Support Vector Machine
Date:	2021–12
URL:	http://d.repec.org/n?u=RePEc:cdf:wpaper:2021/34&r=

Competition analysis on the over-the-counter credit default swap market

By:	Louis Abraham (ETH Zürich - Eidgenössische Technische Hochschule - Swiss Federal Institute of Technology [Zürich])
Abstract:	We study two questions related to competition on the OTC CDS market using data collected as part of the EMIR regulation. First, we study the competition between central counterparties through collateral requirements. We present models that successfully estimate the initial margin requirements. However, our estimations are not precise enough to use them as input to a predictive model for CCP choice by counterparties in the OTC market. Second, we model counterpart choice on the interdealer market using a novel semi-supervised predictive task. We present our methodology as part of the literature on model interpretability before arguing for the use of conditional entropy as the metric of interest to derive knowledge from data through a model-agnostic approach. In particular, we justify the use of deep neural networks to measure conditional entropy on real-world datasets. We create the $\textit{Razor entropy}$ using the framework of algorithmic information theory and derive an explicit formula that is identical to our semi-supervised training objective. Finally, we borrow concepts from game theory to define $\textit{top-k Shapley values}$. This novel method of payoff distribution satisfies most of the properties of Shapley values, and is of particular interest when the value function is monotone submodular. Unlike classical Shapley values, top-k Shapley values can be computed in quadratic time of the number of features instead of exponential. We implement our methodology and report the results on our particular task of counterpart choice. Finally, we present an improvement to the $\textit{node2vec}$ algorithm that could for example be used to further study intermediation. We show that the neighbor sampling used in the generation of biased walks can be performed in logarithmic time with a quasilinear time pre-computation, unlike the current implementations that do not scale well.
Date:	2021–11–29
URL:	http://d.repec.org/n?u=RePEc:hal:wpaper:hal-03454808&r=
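
For reference, the classical Shapley values that the entry above takes as its starting point can be computed exactly on a small game as sketched below. This is the standard definition, not the top-k variant proposed in the paper, and the coverage game used as the value function is a hypothetical example chosen because it is monotone submodular.

```python
from itertools import permutations

def shapley_values(players, value):
    """Exact Shapley values by averaging marginal contributions over all
    orderings; the cost grows factorially with the number of players."""
    totals = {p: 0.0 for p in players}
    orderings = list(permutations(players))
    for order in orderings:
        coalition = frozenset()
        for p in order:
            with_p = coalition | {p}
            totals[p] += value(with_p) - value(coalition)
            coalition = with_p
    return {p: t / len(orderings) for p, t in totals.items()}

# Hypothetical coverage game: each player covers a set of items and a
# coalition is worth the number of distinct items it covers.
covers = {"a": {1, 2}, "b": {2, 3}, "c": {3}}

def coverage(coalition):
    covered = set()
    for p in coalition:
        covered |= covers[p]
    return float(len(covered))

phi = shapley_values(list(covers), coverage)
```

The values sum to the worth of the full coalition, and the brute-force cost of this enumeration is what motivates cheaper alternatives when the number of features is large.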

Development of an Ensemble of Models for Predicting Socio-Economic Indicators of the Russian Federation using IRT-Theory and Bagging Methods

By:	Kitova, Olga; Savinova, Victoria
Abstract:	This article describes the application of the bagging method to build a forecast model for the socio-economic indicators of the Russian Federation. This task is one of the priorities within the framework of the Federal Project "Strategic Planning", which implies the creation of a unified decision support system capable of predicting socio-economic indicators. This paper considers the relevance of the development of forecasting models, examines and analyzes the work of researchers on this topic. The authors carried out computational experiments for 40 indicators of the socio-economic sphere of the Russian Federation. For each indicator, a linear multiple regression equation was constructed. For the constructed equations, verification was carried out and indicators with the worst accuracy and quality of the forecast were selected. For these indicators, neural network modeling was carried out. Multilayer perceptrons were chosen as the architecture of neural networks. Next, an analysis of the accuracy and quality of neural network models was carried out. Indicators that could not be predicted with a sufficient level of accuracy were selected for the bagging procedure. Bagging was used for weighted averaging of prediction results for neural networks of various configurations. Item Response Theory (IRT) elements were used to determine the weights of the models.
Keywords:	Socio-economic Indicators of the Russian Federation, Forecasting, Bagging, Multiple Linear Regression, Neural Networks, Item Response Theory
JEL:	C45
Date:	2021–11–25
URL:	http://d.repec.org/n?u=RePEc:pra:mprapa:110824&r=

Recent Advances in Reinforcement Learning in Finance

By:	Ben Hambly; Renyuan Xu; Huining Yang
Abstract:	The rapid changes in the finance industry due to the increasing amount of data have revolutionized the techniques for data processing and data analysis and brought new theoretical and computational challenges. In contrast to classical stochastic control theory and other analytical approaches for solving financial decision-making problems that heavily rely on model assumptions, new developments from reinforcement learning (RL) are able to make full use of the large amount of financial data with fewer model assumptions and to improve decisions in complex financial environments. This survey paper aims to review the recent developments and use of RL approaches in finance. We give an introduction to Markov decision processes, which is the setting for many of the commonly used RL approaches. Various algorithms are then introduced with a focus on value and policy based methods that do not require any model assumptions. Connections are made with neural networks to extend the framework to encompass deep RL algorithms. Our survey concludes by discussing the application of these RL algorithms in a variety of decision-making problems in finance, including optimal execution, portfolio optimization, option pricing and hedging, market making, smart order routing, and robo-advising.
Date:	2021–12
URL:	http://d.repec.org/n?u=RePEc:arx:papers:2112.04553&r=
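
As a small illustration of the value-based, model-free methods surveyed in the entry above, the sketch below runs tabular Q-learning on a toy Markov decision process. The chain environment, the function name, and all parameter values are assumptions for illustration and have no financial content.

```python
import random

def q_learning(n_states=4, episodes=500, alpha=0.5, gamma=0.9, seed=0):
    """Tabular Q-learning on a toy chain: action 0 moves left, action 1
    moves right, and reaching the last state pays 1 and ends the episode."""
    rng = random.Random(seed)
    terminal = n_states - 1
    q = [[0.0, 0.0] for _ in range(n_states)]
    for _ in range(episodes):
        s = 0
        while s != terminal:
            a = rng.randrange(2)  # purely exploratory behaviour policy
            s_next = max(s - 1, 0) if a == 0 else s + 1
            reward = 1.0 if s_next == terminal else 0.0
            # The terminal state has no future value.
            future = 0.0 if s_next == terminal else gamma * max(q[s_next])
            q[s][a] += alpha * (reward + future - q[s][a])
            s = s_next
    return q

q = q_learning()
# Greedy action in each non-terminal state after learning.
greedy = [max(range(2), key=lambda a: q[s][a]) for s in range(3)]
```

The update uses only observed transitions and rewards, never the transition probabilities, which is what "no model assumptions" means in this setting.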

Intuitive Mathematical Economics Series. General Equilibrium Models and the Gradient Field Method

By:	Tomás Marinozzi; Leandro Nallar; Sergio Pernice
Abstract:	General equilibrium models are typically presented with mathematical methods, such as the Edgeworth Box, that do not easily generalize to more than two goods and more than two agents. This is fine as a conceptual introduction, but it may be insufficient in the “Big-Data Machine-Learning Era”, with gigantic databases filled with data of extremely high dimensionality that are already changing the practice, and perhaps even the conceptual basis, of economics and other social sciences. In this paper we present what we call the “Gradient Field Method” to solve these problems. It has the advantage of 1) being as intuitive as the Edgeworth Box, 2) generalizing easily to far more complex situations, and 3) meshing nicely with the data-friendly techniques of the new Era. In addition, it provides a unified framework to present both partial equilibrium and general equilibrium problems.
Keywords:	microeconomics, general equilibrium, gradient, gradient field, machine learning.
Date:	2021–12
URL:	http://d.repec.org/n?u=RePEc:cem:doctra:820&r=

A Review on Graph Neural Network Methods in Financial Applications

By:	Jianian Wang; Sheng Zhang; Yanghua Xiao; Rui Song
Abstract:	Keeping the individual features and the complicated relations, graph data are widely utilized and investigated. Being able to capture the structural information by updating and aggregating nodes' representations, graph neural network (GNN) models are gaining popularity. In the financial context, the graph is constructed based on real-world data, which leads to complex graph structure and thus requires sophisticated methodology. In this work, we provide a comprehensive review of GNN models in recent financial contexts. We first categorize the commonly-used financial graphs and summarize the feature processing step for each node. We then summarize the GNN methodology for each graph type and the applications in each area, and propose some potential research areas.
Date:	2021–11
URL:	http://d.repec.org/n?u=RePEc:arx:papers:2111.15367&r=

State of the art on ethical, legal, and social issues linked to audio- and video-based AAL solutions

By:	Ake-Kob, Alin; Blazeviciene, Aurelija; Colonna, Liane; Cartolovni, Anto; Dantas, Carina; Fedosov, Anton; Florez-Revuelta, Francisco; Fosch-Villaronga, Eduard; He, Zhicheng; Klimczuk, Andrzej; Kuźmicz, Maksymilian; Lukács, Adrienn; Lutz, Christoph; Mekovec, Renata; Miguel, Cristina; Mordini, Emilio; Pajalic, Zada; Pierscionek, Barbara Krystyna; Jose Santofimia Romero, Maria; Ali Salah, Albert; Sobecki, Andrzej; Solanas, Agusti; Tamò-Larrieux, Aurelia
Abstract:	Ambient assisted living (AAL) technologies are increasingly presented and sold as essential smart additions to daily life and home environments that will radically transform the healthcare and wellness markets of the future. An ethical approach and a thorough understanding of all ethics in surveillance/monitoring architectures are therefore pressing. AAL poses many ethical challenges raising questions that will affect immediate acceptance and long-term usage. Furthermore, ethical issues emerge from social inequalities and their potential exacerbation by AAL, accentuating the existing access gap between high-income countries (HIC) and low and middle-income countries (LMIC). Legal aspects mainly refer to the adherence to existing legal frameworks and cover issues related to product safety, data protection, cybersecurity, intellectual property, and access to data by public, private, and government bodies. Successful privacy-friendly AAL applications are needed, as the pressure to bring Internet of Things (IoT) devices and ones equipped with artificial intelligence (AI) quickly to market cannot overlook the fact that the environments in which AAL will operate are mostly private (e.g., the home). The social issues focus on the impact of AAL technologies before and after their adoption. Future AAL technologies need to consider all aspects of equality such as gender, race, age and social disadvantages and avoid increasing loneliness and isolation among, e.g. older and frail people. Finally, the current power asymmetries between the target and general populations should not be underestimated nor should the discrepant needs and motivations of the target group and those developing and deploying AAL systems. Whilst AAL technologies provide promising solutions for the health and social care challenges, they are not exempt from ethical, legal and social issues (ELSI). A set of ELSI guidelines is needed to integrate these factors at the research and development stage.
Keywords:	Ethical principles, Privacy, Assistive Living Technologies, Privacy by Design, General Data Protection Regulation, housing
JEL:	O18 D19 R58 M14 O33
Date:	2021
URL:	http://d.repec.org/n?u=RePEc:zbw:esrepo:248470&r=

Simple Allocation Rules and Optimal Portfolio Choice Over the Lifecycle

By:	Victor Duarte; Julia Fonseca; Aaron S. Goodman; Jonathan A. Parker
Abstract:	We develop a machine-learning solution algorithm to solve for optimal portfolio choice in a detailed and quantitatively-accurate lifecycle model that includes many features of reality modelled only separately in previous work. We use the quantitative model to evaluate the consumption-equivalent welfare losses from using simple rules for portfolio allocation across stocks, bonds, and liquid accounts instead of the optimal portfolio choices. We find that the consumption-equivalent losses from using an age-dependent rule as embedded in current target-date/lifecycle funds (TDFs) are substantial, around 2 to 3 percent of consumption, despite the fact that TDF rules mimic average optimal behavior by age closely until shortly before retirement. Our model recommends higher average equity shares in the second half of life than the portfolio of the typical TDF, so that the typical TDF portfolio does not improve on investing an age-independent 2/3 share in equity. Finally, optimal equity shares have substantial heterogeneity, particularly by wealth level, state of the business cycle, and dividend-price ratio, implying substantial gains to further customization of advice or TDFs in these dimensions.
JEL:	C61 D15 E21 G11 G51
Date:	2021–12
URL:	http://d.repec.org/n?u=RePEc:nbr:nberwo:29559&r=

Safe Havens, Machine Learning, and the Sources of Geopolitical Risk: A Forecasting Analysis Using Over a Century of Data

By:	Rangan Gupta (Department of Economics, University of Pretoria, Private Bag X20, Hatfield 0028, South Africa); Sayar Karmakar (Department of Statistics, University of Florida, 230 Newell Drive, Gainesville, FL, 32601, USA); Christian Pierdzioch (Department of Economics, Helmut Schmidt University, Holstenhofweg 85, P.O.B. 700822, 22008 Hamburg, Germany)
Abstract:	We use monthly data covering a century-long sample period (1915-2021) to study whether geopolitical risk helps to forecast subsequent gold returns and gold volatility. We account not only for geopolitical threats and acts, but also for 39 country-specific sources of geopolitical risk. The response of subsequent returns and volatility is heterogeneous across countries and nonlinear. We find that accounting for geopolitical risk at the country level improves forecast accuracy especially when we use random forests to estimate our forecasting models. As an extension, we report empirical evidence on the predictive value of the country-level sources of geopolitical risk for two other candidate safe-haven assets, oil and silver, over the sample periods 1900–2021 and 1915–2021, respectively. Our results have important implications for the portfolio decisions of investors who seek a safe haven in times of heightened geopolitical tensions.
Keywords:	Gold, Geopolitical Risk, Forecasting, Returns, Volatility, Random Forests
JEL:	C22 D80 H56 Q02
Date:	2022–01
URL:	http://d.repec.org/n?u=RePEc:pre:wpaper:202201&r=

High-Dimensional Stock Portfolio Trading with Deep Reinforcement Learning

By:	Uta Pigorsch; Sebastian Schäfer
Abstract:	This paper proposes a Deep Reinforcement Learning algorithm for financial portfolio trading based on Deep Q-learning. The algorithm is capable of trading high-dimensional portfolios from cross-sectional datasets of any size which may include data gaps and non-unique history lengths in the assets. We sequentially set up environments by sampling one asset for each environment while rewarding investments with the resulting asset's return and cash reservation with the average return of the set of assets. This forces the agent to strategically assign capital to assets that it predicts will perform above average. We apply our methodology in an out-of-sample analysis to 48 US stock portfolio setups, varying in the number of stocks from ten up to 500 stocks, in the selection criteria and in the level of transaction costs. The algorithm on average outperforms all considered passive and active benchmark investment strategies by a large margin using only one hyperparameter setup for all portfolios.
Date:	2021–12
URL:	http://d.repec.org/n?u=RePEc:arx:papers:2112.04755&r=

Travailleur du savoir et risque d'intolérance sensorielle lié à l'interaction homme-machine intelligente

By:	Emmanuel Okamba (UPEM - Université Paris-Est Marne-la-Vallée)
Abstract:	The digital transformation of organizations, characterized by the widespread use of artificial intelligence through algorithms and digital machines, frees people from repetitive tasks and creates the knowledge worker. This worker's profile is a source of productivity gains when it reduces the risk of sensory intolerance associated with the digital work to which the worker is exposed, and improves concentration and creativity by establishing more aesthetic than cognitive harmony between the worker and the intelligent machine.
Keywords:	Digital transformation, Intelligent worker, Artificial intelligence, Receptive music therapy, Misophonia
Date:	2021–11–28
URL:	http://d.repec.org/n?u=RePEc:hal:wpaper:hal-03453843&r=

Detecting Edgeworth Cycles

By:	Timothy Holt; Mitsuru Igami; Simon Scheidegger
Abstract:	We propose algorithms to detect "Edgeworth cycles", asymmetric price movements that have caused antitrust concerns in many countries. We formalize four existing methods and propose six new methods based on spectral analysis and machine learning. We evaluate their accuracy in station-level gasoline-price data from Western Australia, New South Wales, and Germany. Most methods achieve high accuracy in the first two, but only a few can detect nuanced cycles in the third. The results suggest that whether researchers find a positive or negative statistical relationship between cycles and markups, and hence the implications for competition policy, depends crucially on the choice of method.
Keywords:	Edgeworth cycles, Fuel prices, Markups, Nonparametric methods
JEL:	C45 C55 L13 L41
Date:	2021–11
URL:	http://d.repec.org/n?u=RePEc:lau:crdeep:21.16&r=

Labour-saving automation and occupational exposure: a text-similarity measure

By:	Fabio Montobbio (Dipartimento di Politica Economica, DISCE, Università Cattolica del Sacro Cuore – BRICK, Collegio Carlo Alberto, Torino – ICRIOS, Bocconi University, Milano); Jacopo Staccioli (Dipartimento di Politica Economica, DISCE, Università Cattolica del Sacro Cuore – Institute of Economics, Scuola Superiore Sant’Anna, Pisa); Maria Enrica Virgillito (Institute of Economics, Scuola Superiore Sant’Anna, Pisa – Dipartimento di Politica Economica, DISCE, Università Cattolica del Sacro Cuore); Marco Vivarelli (Dipartimento di Politica Economica, DISCE, Università Cattolica del Sacro Cuore – UNU-MERIT, Maastricht, The Netherlands – IZA, Bonn, Germany)
Abstract:	This paper represents one of the first attempts at building a direct measure of occupational exposure to robotic labour-saving technologies. After identifying robotic and labour-saving robotic patents retrieved by Montobbio et al. (2022), the underlying 4-digit CPC definitions are employed to detect functions and operations performed by technological artefacts that are more directed at substituting the labour input. This measure yields fine-grained information on tasks and occupations according to their similarity ranking. Occupational exposure by wage and employment dynamics in the United States is then studied, complemented by an investigation of industry and geographical penetration rates.
Keywords:	Labour-Saving Technology, Natural Language Processes, Labour Markets, Technological Unemployment
JEL:	O33 J24
Date:	2021–11
URL:	http://d.repec.org/n?u=RePEc:ctc:serie5:dipe0021&r=

Omitted Variable Bias in Machine Learned Causal Models

By:	Victor Chernozhukov; Carlos Cinelli; Whitney Newey; Amit Sharma; Vasilis Syrgkanis
Abstract:	We derive general, yet simple, sharp bounds on the size of the omitted variable bias for a broad class of causal parameters that can be identified as linear functionals of the conditional expectation function of the outcome. Such functionals encompass many of the traditional targets of investigation in causal inference studies, such as (weighted) averages of potential outcomes, average treatment effects (including subgroup effects, such as the effect on the treated), (weighted) average derivatives, and policy effects from shifts in covariate distribution, all for general, nonparametric causal models. Our construction relies on the Riesz-Fréchet representation of the target functional. Specifically, we show how the bound on the bias depends only on the additional variation that the latent variables create both in the outcome and in the Riesz representer for the parameter of interest. Moreover, in many important cases (e.g., average treatment effects in partially linear models, or in nonseparable models with a binary treatment) the bound is shown to depend on two easily interpretable quantities: the nonparametric partial $R^2$ (Pearson's "correlation ratio") of the unobserved variables with the treatment and with the outcome. Therefore, simple plausibility judgments on the maximum explanatory power of omitted variables (in explaining treatment and outcome variation) are sufficient to place overall bounds on the size of the bias. Finally, leveraging debiased machine learning, we provide flexible and efficient statistical inference methods to estimate the components of the bounds that are identifiable from the observed distribution.
Date:	2021–12
URL:	http://d.repec.org/n?u=RePEc:arx:papers:2112.13398&r=

Market power and artificial intelligence work on online labour markets

By:	Néstor Duch-Brown; Estrella Gomez-Herrera; Frank Mueller-Langer; Songül Tolan
Abstract:	The views are those of the authors and should not be regarded as stating an official position of the European Commission. Frank Mueller-Langer gratefully acknowledges financial support from a research grant of the University of the Bundeswehr Munich. We investigate three alternative but complementary indicators of market power on one of the largest online labour markets (OLMs) in Europe: (1) the elasticity of labour demand, (2) the elasticity of labour...
Date:	2021–12
URL:	http://d.repec.org/n?u=RePEc:bre:wpaper:46376&r=

Digitalization in MedTech: Understanding the Impact on Total Knee Arthroplasty

By:	Lorenz, Max; Reinhard, Patrick; Spring, Thomas
Abstract:	Digital Technologies (DTs) in healthcare are of growing relevance for different actors along the patient journey. This paper breaks down the complex landscape of digitalization by focusing on Total Knee Arthroplasty (TKA). It aims to identify today's technologies and the most promising future trend, assessing the impact on the respective stakeholders. To answer these questions, a structured literature review (SLR) was conducted combining the search term digital AND knee AND replacement with journey OR value OR trend. This resulted in 39 peer-reviewed articles for in-depth analysis. In addition, a qualitative assessment was carried out based on 27 semi-structured interviews (SSIs) with six stakeholder groups (patients, surgeons, physiotherapists, industry experts, insurance representatives, regulators) along the patients' TKA journey. The SLR revealed five clusters (3D Printing, Big Data, Wearables, Virtual Healthcare, Robotics) as the most recurrent DTs within TKA. The SSIs confirmed that all five clusters are relevant and recognised today. Big Data is considered by the stakeholders to be the most promising DT in the future because of its power to interconnect the other technologies and thereby improve health outcomes. The different stakeholder groups perceived the effect of DTs on their individual roles differently. Regulatory hurdles and cost-benefit uncertainties were determined to be the most prominent obstacles to the establishment of DTs. Improved patient outcomes are the principal gain from utilizing DTs throughout the patient journey. However, the benefits of switching to DTs require convincing scientific evidence to promote acceptance by all stakeholders in a value-based healthcare system.
Keywords:	Digital Technologies,Value-Based Healthcare,TKA,Knee Replacement,Big Data,Patient Journey
JEL:	I10 I11 I19
Date:	2021
URL:	http://d.repec.org/n?u=RePEc:zbw:hsgmed:202102&r=

Structural Sieves

By:	Konrad Menzel
Abstract:	This paper explores the use of deep neural networks for semiparametric estimation of economic models of maximizing behavior in production or discrete choice. We argue that certain deep networks are particularly well suited as a nonparametric sieve to approximate regression functions that result from nonlinear latent variable models of continuous or discrete optimization. Multi-stage models of this type will typically generate rich interaction effects between regressors ("inputs") in the regression function, so that there may be no plausible separability restrictions on the "reduced-form" mapping from inputs to outputs to alleviate the curse of dimensionality. Rather, economic shape, sparsity, or separability restrictions, either at a global level or at intermediate stages, are usually stated in terms of the latent variable model. We show that restrictions of this kind are imposed in a more straightforward manner if a sufficiently flexible version of the latent variable model is in fact used to approximate the unknown regression function.
Date:	2021–12
URL:	http://d.repec.org/n?u=RePEc:arx:papers:2112.01377&r=

Fair learning with bagging

By:	Jean-David Fermanian (Ensae-Crest); Dominique Guégan (Université Paris1 Panthéon-Sorbonne, Centre d'Economie de la Sorbonne, - Ca' Foscari University of Venezia)
Abstract:	The central question of this paper is how to enhance supervised learning algorithms with a fairness requirement ensuring that any sensitive input does not "unfairly" influence the outcome of the learning algorithm. To attain this objective, we proceed in three steps. First, after introducing several notions of fairness in a uniform approach, we introduce a more general notion through a conditional fairness definition that encompasses most of the well-known fairness definitions. Second, we use an ensemble of binary and continuous classifiers to get an optimal solution for a fair predictive outcome using a related post-processing procedure, without any transformation of the data or of the training algorithms. Finally, we introduce several tests to verify the fairness of the predictions. Some empirical results are provided to illustrate our approach.
Keywords:	fairness; nonparametric regression; classification; accuracy
JEL:	C10 C38 C53
Date:	2021–11
URL:	http://d.repec.org/n?u=RePEc:mse:cesdoc:21034&r=

Multi-modal Attention Network for Stock Movements Prediction

By:	Shwai He; Shi Gu
Abstract:	Stock prices move as piece-wise trending fluctuations rather than a purely random walk. Traditionally, the prediction of future stock movements is based on the historical trading record. Nowadays, with the development of social media, many active participants in the market choose to publicize their strategies, which provides a window onto the whole market's attitude towards future movements by extracting the semantics behind social media. However, social media contains conflicting information and cannot replace historical records completely. In this work, we propose a multi-modality attention network to reduce conflicts and integrate semantic and numeric features to predict future stock movements comprehensively. Specifically, we first extract semantic information from social media and estimate its credibility based on posters' identity and public reputation. Then we incorporate the semantics from online posts and numeric features from historical records to make the trading strategy. Experimental results show that our approach outperforms previous methods by a significant margin in both prediction accuracy (61.20%) and trading profits (9.13%). It demonstrates that our method improves the performance of stock movements prediction and informs future research on multi-modality fusion towards stock prediction.
Date:	2021–12
URL:	http://d.repec.org/n?u=RePEc:arx:papers:2112.13593&r=

Vaccination Policy and Trust

By:	Jelnov, Artyom; Jelnov, Pavel
Abstract:	We study the relationship between trust and vaccination. We show theoretically that vaccination rates are higher in countries with more transparent and accountable governments. The mechanism that generates this result is the lower probability that a transparent and accountable government promotes an unsafe vaccine. Empirical evidence supports this result. We find that countries perceived as less corrupt and more liberal experience higher vaccination rates. Furthermore, they are less likely to adopt a mandatory vaccination policy. One unit of the Corruption Perception Index (scaled from 0 to 10) is associated with a vaccination rate that is higher by one percentage point (pp) but with a likelihood of compulsory vaccination that is lower by 10 pp. In addition, Google Trends data show that public interest in corruption is correlated with interest in vaccination. The insight from our analysis is that corruption affects not only the supply but also the demand for public services.
Keywords:	vaccination,corruption
JEL:	I18
Date:	2021
URL:	http://d.repec.org/n?u=RePEc:zbw:glodps:1003&r=

Privacy Laws and Value of Personal Data

By:	Mehmet Canayaz (Pennsylvania State University - Smeal College of Business(HEC Lausanne); Swiss Finance Institute); Ilja Kantorovitch (EPFL CFI SFI.LL); Roxana Mihet (Swiss Finance Institute - HEC Lausanne)
Abstract:	We analyze how the adoption of the California Consumer Privacy Act (CCPA), which limits buying or selling consumer data, heterogeneously affects firms with and without previously gathered data on consumers. Exploiting a novel and hand-collected data set of 11,436 conversational-AI firms with rich personal data on identifiable U.S. consumers, we find that the CCPA gives strong protection and an advantage to firms with in-house data on consumers. First, products of these firms experience significant appreciations in customer ratings and are able to collect more customer data relative to their competitors after the adoption of the CCPA. Second, publicly traded firms with in-house data exhibit higher valuations, profitability, and asset utilization, and they invest more after the adoption of the CCPA. Third, earnings of such firms can be more accurately predicted by analysts. To rationalize these empirical findings, we build a general equilibrium model where firms produce final goods using labor and data in the form of intangible capital, which can be traded with other firms subject to an iceberg transportation cost. When the introduction of the CCPA increases the transportation cost, firms without in-house data suffer the most because they cannot adequately substitute the previously externally purchased data, while firms with in-house data expand their market share.
Keywords:	Privacy, Voice Data, In-House Data, Big Data, Intangible Capital
JEL:	D80 G30 G31 G38 L20 O30
Date:	2021–12
URL:	http://d.repec.org/n?u=RePEc:chf:rpseri:rp2192&r=

This nep-big issue is ©2022 by Tom Coupé. It is provided as is without any express or implied warranty. It may be freely redistributed in whole or in part for any purpose. If distributed in part, please include this notice.

General information on the NEP project can be found at http://nep.repec.org. For comments please write to the director of NEP, Marco Novarese at <director@nep.repec.org>. Put “NEP” in the subject, otherwise your mail may be rejected.

NEP’s infrastructure is sponsored by the School of Economics and Finance of Massey University in New Zealand.