nep-big 2022-01-03 papers

on Big Data

Issue of 2022-01-03
fifteen papers chosen by
Tom Coupé, University of Canterbury

Mapping data portability initiatives, opportunities and challenges By OECD
An Investigation of the Impact of COVID-19 Non-Pharmaceutical Interventions and Economic Support Policies on Foreign Exchange Markets with Explainable AI Techniques By Siyuan Liu; Mehmet Orcun Yalcin; Hsuan Fu; Xiuyi Fan
The Fairness of Credit Scoring Models By Christophe HURLIN; Christophe PERIGNON; Sébastien SAURIN
Neural networks-based algorithms for stochastic control and PDEs in finance By Maximilien Germain; Huyên Pham; Xavier Warin
An Improved Reinforcement Learning Model Based on Sentiment Analysis By Yizhuo Li; Peng Zhou; Fangyi Li; Xiao Yang
A Bayesian Spatio-temporal model for predicting passengers' occupancy at Beijing Metro By Cabras, Stefano; Sunhe, Flor
Deep Hedging under Rough Volatility By Blanka Horvath; Josef Teichmann; Zan Zuric
Model-Based Recursive Partitioning to Estimate Unfair Health Inequalities in the United Kingdom Household Longitudinal Study By Brunori, Paolo; Davillas, Apostolos; Jones, Andrew M.; Scarchilli, Giovanna
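Machine Learning Anwendungen in der betrieblichen Praxis: Praktische Empfehlungen zur betrieblichen Mitbestimmung [Machine learning applications in business practice: Practical recommendations on workplace co-determination] By Thieltges, Andree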
Firms going digital: Tapping into the potential of data for innovation By David Gierten; Steffen Viete; Raphaela Andres; Thomas Niebel
Using Text Analysis to Gauge the Reasons for Respondents' Assessment in the Economy Watchers Survey By Tomoaki Mikami; Hiroaki Yamagata; Jouchi Nakajima
Structured Additive Regression and Tree Boosting By Michael Mayer; Steven C. Bourassa; Martin Hoesli; Donato Scognamiglio
Exploration of machine learning algorithms for maritime risk applications By Knapp, S.; van de Velden, M.
Technology Adoption and Skills: A Pilot Study of Kent SMEs By Catherine Robinson; Christian Siegel; Sisi Liao
The impact of transparency policies on local flexibility markets in electrical distribution networks: A case study with artificial neural network forecasts By Erik Heilmann

Mapping data portability initiatives, opportunities and challenges

By:	OECD
Abstract:	Data portability has become an essential tool for enhancing access to and sharing of data across digital services and platforms. This report explores to what extent data portability can empower users (natural and legal persons) to play a more active role in the re-use of their data across digital services and platforms. It also examines how data portability can help increase interoperability and data flows and thus enhance competition and innovation by reducing switching costs and lock-in effects.
Date:	2021-12-20
URL:	https://d.repec.org/n?u=RePEc:oec:stiaab:321-en

An Investigation of the Impact of COVID-19 Non-Pharmaceutical Interventions and Economic Support Policies on Foreign Exchange Markets with Explainable AI Techniques

By:	Siyuan Liu; Mehmet Orcun Yalcin; Hsuan Fu; Xiuyi Fan
Abstract:	Since the onset of the COVID-19 pandemic, many countries across the world have implemented various non-pharmaceutical interventions (NPIs) to contain the spread of the virus, as well as economic support policies (ESPs) to save their economies. The pandemic and the associated NPIs have triggered unprecedented waves of economic shocks to the financial markets, including the foreign exchange (FX) markets. Although some studies have explored the impact of NPIs and ESPs on FX markets, the relative impact of individual NPIs or ESPs has not been studied in a combined framework. In this work, we investigate the relative impact of NPIs and ESPs with Explainable AI (XAI) techniques. Experiments on exchange rate data for G10 currencies from January 1, 2020 to January 13, 2021 suggest that all strict-lockdown measures, such as stay-at-home requirements, workplace closures, international travel controls, and restrictions on internal movement, have strong impacts on exchange rate markets. Yet the impact of individual NPIs and ESPs can vary across currencies. To the best of our knowledge, this is the first work that uses XAI techniques to study the relative impact of NPIs and ESPs on the FX market. The derived insights can guide governments and policymakers in making informed decisions when facing the ongoing pandemic and similar situations in the near future.
Date:	2021-11
URL:	https://d.repec.org/n?u=RePEc:arx:papers:2111.14620

The Fairness of Credit Scoring Models

By:	Christophe HURLIN; Christophe PERIGNON; Sébastien SAURIN
Keywords:	Discrimination, Credit markets, Machine Learning, Artificial intelligence
Date:	2021
URL:	https://d.repec.org/n?u=RePEc:leo:wpaper:2912

Neural networks-based algorithms for stochastic control and PDEs in finance

By:	Maximilien Germain (LPSM (UMR_8001) - Laboratoire de Probabilités, Statistiques et Modélisations, Sorbonne Université, CNRS, Université de Paris; EDF R&D); Huyên Pham (LPSM (UMR_8001) - Laboratoire de Probabilités, Statistiques et Modélisations, Université Paris Diderot - Paris 7, Sorbonne Université, CNRS; FiME Lab - Laboratoire de Finance des Marchés d'Energie, Université Paris Dauphine-PSL, CREST, EDF R&D); Xavier Warin (EDF R&D; FiME Lab - Laboratoire de Finance des Marchés d'Energie, Université Paris Dauphine-PSL, CREST, EDF R&D)
Abstract:	This paper presents machine learning techniques and deep reinforcement learning-based algorithms for the efficient solution of nonlinear partial differential equations and dynamic optimization problems arising in investment decisions and derivative pricing in financial engineering. We survey recent results in the literature, present new developments, notably in the fully nonlinear case, and compare the different schemes, illustrated by numerical tests on various financial applications. We conclude by highlighting some future research directions.
Date:	2021
URL:	https://d.repec.org/n?u=RePEc:hal:journl:hal-03115503

An Improved Reinforcement Learning Model Based on Sentiment Analysis

By:	Yizhuo Li; Peng Zhou; Fangyi Li; Xiao Yang
Abstract:	With the development of artificial intelligence technology, quantitative trading systems based on reinforcement learning have emerged in the stock trading market. The authors combine the deep Q-network (DQN) from reinforcement learning with the sentiment indicator ARBR to build a high-frequency stock trading model for the share market. To improve the model's performance, principal component analysis (PCA) is used to reduce the dimensionality of the feature vector, the influence of market sentiment on the balance of long and short power is incorporated into the state space of the trading model, and an LSTM layer replaces the fully connected layer to address the limited experience-data storage of the traditional DQN model. Model performance is evaluated by cumulative return and the Sharpe ratio, and compared with double moving average and other strategies. The results show that the improved model far outperforms the comparison models in terms of return, achieving a maximum annualized return of 54.5%, which indicates that it can significantly improve reinforcement learning performance in stock trading.
Date:	2021-11
URL:	https://d.repec.org/n?u=RePEc:arx:papers:2111.15354
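
The abstract names ARBR as the sentiment indicator fed into the DQN state. As an illustration only, the sketch below computes AR and BR from OHLC bars using the common textbook definitions; the function name `arbr` and the default 26-bar window are assumptions, not taken from the paper, whose exact variant may differ.

```python
from typing import Sequence


def arbr(high: Sequence[float], low: Sequence[float], open_: Sequence[float],
         close: Sequence[float], n: int = 26) -> tuple[float, float]:
    """Return (AR, BR) computed over the most recent n bars (textbook definitions).

    AR = 100 * sum(high - open) / sum(open - low)
    BR = 100 * sum(max(0, high - prev_close)) / sum(max(0, prev_close - low))
    """
    if n < 1 or len(close) < n + 1:
        raise ValueError("need n >= 1 and at least n + 1 bars (BR uses the previous close)")
    h, l, o = high[-n:], low[-n:], open_[-n:]
    prev_c = close[-n - 1:-1]  # close of the bar before each of the last n bars
    # AR ("popularity"): how far highs rise above opens vs. how far lows fall below opens
    ar = 100 * sum(hi - op for hi, op in zip(h, o)) / sum(op - lo for op, lo in zip(o, l))
    # BR ("willingness"): same idea measured against the previous close, clipped at zero
    br_up = sum(max(0.0, hi - pc) for hi, pc in zip(h, prev_c))
    br_down = sum(max(0.0, pc - lo) for pc, lo in zip(prev_c, l))
    br = 100 * br_up / br_down
    return ar, br
```

Both values sit near 100 when upward and downward pressure balance; a window with no downward range raises ZeroDivisionError. How the paper maps AR and BR into the DQN state is not described in the abstract.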

A Bayesian Spatio-temporal model for predicting passengers' occupancy at Beijing Metro

By:	Cabras, Stefano; Sunhe, Flor
Abstract:	This work focuses on predicting metro passenger flow at Beijing Metro stations and assessing its uncertainty using a Bayesian spatio-temporal model. Forecasting is essential for metro operation management, for example to automatically adjust train operation diagrams or to plan crowd regulation measures. Unlike other approaches, the proposed model provides prediction uncertainty conditional on the available data, a key feature that distinguishes it from standard machine learning prediction algorithms. The Bayesian spatio-temporal model for areal Poisson counts includes random effects for stations and days. On a test set, the fitted model achieves a prediction accuracy that meets the standards of the Beijing Metro enterprise.
Keywords:	Bayesian Modelling; Integrated Nested Laplace Approximation; Spatio-Temporal Modelling; Poisson Counts
Date:	2021-12-16
URL:	https://d.repec.org/n?u=RePEc:cte:wsrepe:33787
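
As a rough guide to the model class described above, a generic Poisson count model with station and day random effects can be written as follows. The covariates x_{st} and the independent Gaussian priors are illustrative assumptions; the abstract does not give the paper's exact specification, and an areal model would typically give u_s a spatially structured prior instead.

```latex
\begin{aligned}
y_{st} \mid \lambda_{st} &\sim \operatorname{Poisson}(\lambda_{st}),\\
\log \lambda_{st} &= \beta_0 + \mathbf{x}_{st}^{\top}\boldsymbol{\beta} + u_s + v_t,\\
u_s &\sim \mathcal{N}(0, \sigma_u^2), \qquad v_t \sim \mathcal{N}(0, \sigma_v^2),
\end{aligned}
```

Here y_{st} is the passenger count at station s on day t, and u_s and v_t are the station and day random effects. Prediction uncertainty comes from the posterior predictive distribution of y_{st}, which Integrated Nested Laplace Approximation (INLA) approximates without MCMC sampling.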

Deep Hedging under Rough Volatility

By:	Blanka Horvath (ETH Zürich - Department of Mathematics); Josef Teichmann (ETH Zurich; Swiss Finance Institute); Zan Zuric (Imperial College London - Department of Mathematics)
Abstract:	We investigate the performance of the Deep Hedging framework under training paths beyond the (finite-dimensional) Markovian setup. In particular, we analyse the hedging performance of the original architecture under rough volatility models with a view to existing theoretical results for those. Furthermore, we suggest parsimonious but suitable network architectures capable of capturing the non-Markovianity of time series. We also analyse the hedging behaviour in these models in terms of P&L distributions and draw comparisons to jump diffusion models when the rebalancing frequency is realistically small.
Keywords:	Imperfect Hedging, Derivatives Pricing, Derivatives Hedging, Deep Learning, Rough Volatility
JEL:	C61 C58 C45 G32
Date:	2021-02
URL:	https://d.repec.org/n?u=RePEc:chf:rpseri:rp2188
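
For readers new to the framework, Deep Hedging judges a strategy by its terminal profit and loss. In a simplified form without transaction costs, and with notation chosen here for illustration, a seller who receives premium p_0 for a claim Z and holds δ_{t_k} units of the underlying S between rebalancing dates t_0 < t_1 < … < t_n ends with:

```latex
\mathrm{PL}(Z, p_0, \delta) = -Z + p_0 + \sum_{k=0}^{n-1} \delta_{t_k}\,\bigl(S_{t_{k+1}} - S_{t_k}\bigr)
```

The hedge δ_{t_k} is produced by a neural network from the information available at t_k, and its parameters are trained to minimise a risk measure of the loss, ρ(−PL). Under rough volatility the dynamics are non-Markovian, so the current state alone may not be sufficient, which is why the abstract stresses architectures that capture path dependence.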

Model-Based Recursive Partitioning to Estimate Unfair Health Inequalities in the United Kingdom Household Longitudinal Study

By:	Brunori, Paolo (London School of Economics); Davillas, Apostolos (University of East Anglia); Jones, Andrew M. (University of York); Scarchilli, Giovanna (University of Trento)
Abstract:	We measure unfair health inequality in the UK using a novel data-driven empirical approach. We explain health variability as the result of circumstances beyond individual control and of health-related behaviours. We do this using model-based recursive partitioning, a supervised machine learning algorithm. Unlike standard tree-based algorithms, model-based recursive partitioning not only identifies social groups with different expected levels of health but also reveals how the relationship between behaviours and health outcomes varies across groups. The empirical application uses the UK Household Longitudinal Study. We show that unfair inequality accounts for a substantial fraction of the total explained health variability. This finding holds regardless of the exact definition of fairness adopted: it holds for both the fairness gap and direct unfairness measures, each evaluated at different reference values for circumstances or effort.
Keywords:	machine learning, health equity, inequality of opportunity, unhealthy lifestyle behaviours
JEL:	I14 D63
Date:	2021-12
URL:	https://d.repec.org/n?u=RePEc:iza:izadps:dp14925

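Machine Learning Anwendungen in der betrieblichen Praxis: Praktische Empfehlungen zur betrieblichen Mitbestimmung [Machine learning applications in business practice: Practical recommendations on workplace co-determination]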

By:	Thieltges, Andree
Abstract:	AI models and machine learning applications are entering the everyday practice of companies and can be subject to co-determination. To take account of and protect employees' interests and rights, the current provisions in workplace IT agreements should be reviewed and tested for their practical suitability. The evaluation "Machine-Learning-Anwendungen in der betrieblichen Praxis" (Machine learning applications in business practice) identifies options for action based on provisions from a total of 29 concluded works and service agreements (Betriebs- und Dienstvereinbarungen). The results were discussed in workshops with works councils and staff councils, and relevant regulatory aspects of AI models and machine learning applications were derived from them. These form part of the recommendations for action presented here.
Keywords:	Data, Data protection, Personality rights, Performance monitoring, Behaviour monitoring, Overfitting, Underfitting, Big Data, Black Box, Data Mining, HR Analytics
Date:	2020
URL:	https://d.repec.org/n?u=RePEc:zbw:imumbp:33

Firms going digital: Tapping into the potential of data for innovation

By:	David Gierten; Steffen Viete; Raphaela Andres; Thomas Niebel
Abstract:	This paper aims to help policy makers understand and improve the conditions for firms to thrive in an increasingly digital economy where data has become an important resource for innovation. The paper: 1) analyses trends in the adoption of information and communication technologies and activities that enable firms to collect, store and use data, including big data analysis (BDA); 2) provides new evidence from micro-econometric analysis of firms’ BDA and innovation in products, processes, marketing and organisation, considering different types of data used for BDA; 3) examines business models of firms that successfully innovate with data; and 4) discusses policies that can help improve the conditions for all firms to go digital and tap into the potential of data for innovation.
Date:	2021-12-20
URL:	https://d.repec.org/n?u=RePEc:oec:stiaab:320-en

Using Text Analysis to Gauge the Reasons for Respondents' Assessment in the Economy Watchers Survey

By:	Tomoaki Mikami (Bank of Japan); Hiroaki Yamagata (Bank of Japan); Jouchi Nakajima (Bank of Japan)
Abstract:	The Economy Watchers Survey, released monthly by the Cabinet Office, provides not only the headline diffusion index of survey respondents' economic assessment (the so-called "economy watchers") but also textual data from respondents' comments giving the reasons for their assessment. Using these data, this article presents an example of text analysis, which has attracted increasing attention in recent years. Following Tsuruga and Okazaki (2017) and Otaka and Kan (2018), we construct co-occurrence network diagrams to explore which issues economy watchers focus on. The co-occurrence network diagrams drawn using data for mid-2021 show that economy watchers mainly focused on the State of Emergency and business restrictions related to COVID-19, developments in the vaccination process, and the shortage of semiconductors for automobile production. Our analysis shows that textual data are useful for assessing the economy, and that continued efforts to improve text analysis methods are important.
Keywords:	Big data; Text analysis; Economy Watchers Survey; Co-occurrence network diagram
Date:	2021-12-20
URL:	https://d.repec.org/n?u=RePEc:boj:bojlab:lab21e02
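
The co-occurrence network diagrams described above start from counts of how often pairs of terms appear in the same comment. The sketch below shows only that counting step, on hypothetical English tokens; real survey comments are in Japanese and would first need morphological segmentation, and the paper's exact weighting (for example a similarity coefficient instead of raw counts) may differ.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(docs: list[list[str]], min_count: int = 1) -> dict[tuple[str, str], int]:
    """Count, for each unordered pair of distinct terms, how many documents contain both."""
    counts: Counter = Counter()
    for tokens in docs:
        # Deduplicate within a comment, then sort so that (a, b) and (b, a) are the same key
        terms = sorted(set(tokens))
        counts.update(combinations(terms, 2))
    # Pairs at or above the threshold become the edges of the network diagram
    return {pair: c for pair, c in counts.items() if c >= min_count}


# Hypothetical, already-tokenized comments
comments = [
    ["emergency", "restrictions", "vaccination"],
    ["vaccination", "emergency"],
    ["semiconductor", "automobile"],
]
edges = cooccurrence_counts(comments, min_count=2)
```

Raising `min_count` prunes rare pairs so the diagram keeps only the strongest links between topics.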

Structured Additive Regression and Tree Boosting

By:	Michael Mayer (Schweizerische Mobiliar Versicherungsgesellschaft); Steven C. Bourassa (Florida Atlantic University); Martin Hoesli (University of Geneva - Geneva School of Economics and Management (GSEM); Swiss Finance Institute; University of Aberdeen - Business School); Donato Scognamiglio (IAZI AG and University of Bern)
Abstract:	Structured additive regression (STAR) models are a rich class of regression models that include the generalized linear model (GLM) and the generalized additive model (GAM). STAR models can be fitted by Bayesian approaches, component-wise gradient boosting, penalized least squares, and deep learning. Using feature interaction constraints, we show that such models can also be implemented with the gradient boosting powerhouses XGBoost and LightGBM, thereby benefiting from their excellent predictive capabilities. Furthermore, we show how STAR models can be used for supervised dimension reduction and explain under what circumstances covariate effects of such models can be described in a transparent way. We illustrate the methodology with case studies on house price modeling, with very encouraging results regarding both interpretability and predictive performance.
Keywords:	machine learning, structured additive regression, gradient boosting, interpretability, transparency
JEL:	C13 C21 C45 C51 C52 C55 R31
Date:	2021-09
URL:	https://d.repec.org/n?u=RePEc:chf:rpseri:rp2183
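
For readers unfamiliar with the model class, a STAR predictor can be written, in illustrative notation, as a sum of low-dimensional components:

```latex
g\bigl(\mathbb{E}[y \mid \mathbf{x}]\bigr) = \beta_0 + \sum_{j \in M} f_j(x_j) + \sum_{(j,k) \in I} f_{jk}(x_j, x_k)
```

Here g is a link function, M indexes main effects and I is a small set of permitted pairwise interactions. Roughly, interaction constraints in a tree booster (both XGBoost and LightGBM expose an `interaction_constraints` parameter) restrict which features may appear together along a tree branch. If the allowed feature groups are disjoint, for example {x_1}, {x_2} and {x_3, x_4}, every tree uses features from a single group, so the boosted sum has the form f_1(x_1) + f_2(x_2) + f_{34}(x_3, x_4). The constraint sets actually used in the paper are not given in the abstract.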

Exploration of machine learning algorithms for maritime risk applications

By:	Knapp, S.; van de Velden, M.
Abstract:	For maritime stakeholders to manage and pre-empt incident risks effectively, predicted incident probabilities at ship level have several applications, such as better targeting of ship inspections, improved domain awareness, and improved risk exposure assessments for strategic planning and asset allocation. Using a unique and comprehensive global dataset of 1.2 million observations from 2014 to 2020, this study explores 144 machine learning model variants (18 random forest variants for each of 8 incident endpoints of interest) with the aim of enhancing prediction capabilities for maritime applications. A further aim is to determine and highlight the relative importance of over 500 evaluated covariates. The results differ across endpoints and confirm that random forest methods improve prediction capabilities, based on a full year of out-of-sample evaluation. Targeting the 10% riskiest vessels would improve predictions by a factor of 2.7 to 4.9 compared with random selection. Balanced random forests and random forests with balanced training variants outperform regular random forests, while the final choice of variant also depends on the aggregation type and on how the probabilities are used in the application areas of interest. The covariate groups most important for predicting incident risk relate to beneficial ownership, the safety management company, and the size and age of the vessel, and the importance of these factors is similar across the endpoints considered here.
Keywords:	ship specific risk, safety quality, reducing false negative events, risk exposure estimation, machine learning, case weighting, subsampling, random forest, sampling, evaluation metrics, top decile lift, variable importance
Date:	2021-12-13
URL:	https://d.repec.org/n?u=RePEc:ems:eureir:137081
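
The improvement from targeting the riskiest 10% of vessels corresponds to the "top decile lift" listed in the keywords. One common way to compute it is sketched below: the incident rate among the highest-scored 10% of units divided by the overall incident rate, which is what random selection would achieve on average. The function name and the rounding of the cut-off are assumptions, and the paper's aggregation may differ.

```python
def top_decile_lift(scores: list[float], outcomes: list[int], frac: float = 0.10) -> float:
    """Incident rate among the top `frac` highest-scored units divided by the overall rate."""
    if not scores or len(scores) != len(outcomes):
        raise ValueError("scores and outcomes must be non-empty and of equal length")
    k = max(1, int(round(frac * len(scores))))  # size of the targeted group
    # Rank units from highest to lowest predicted risk
    ranked = sorted(zip(scores, outcomes), key=lambda pair: pair[0], reverse=True)
    top_rate = sum(y for _, y in ranked[:k]) / k
    base_rate = sum(outcomes) / len(outcomes)  # expected hit rate of random selection
    return top_rate / base_rate
```

A lift of 1 means the model does no better than random targeting; the function raises ZeroDivisionError if there are no incidents at all.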

Technology Adoption and Skills: A Pilot Study of Kent SMEs

By:	Catherine Robinson; Christian Siegel; Sisi Liao
Abstract:	Does the successful deployment of digital technologies require complementary investment in skills? We conducted a pilot survey to investigate. The survey elicited information on whether the firm was adopting one of the three digital technologies of interest (AI, robotics, big data), whether it provided in-house training, and whether it experienced any problems recruiting workers. We find evidence that new technologies require complementary skill investments and that firms deem both new technologies and training of their workforce important for productivity. While there is some heterogeneity across the type of technology introduced (robotics, AI, big data), firms facing difficulties attracting workers with the right skills are more likely to run their own training programmes. This might suggest that there is a skills gap that may be holding back productivity and economic growth. Overall, the findings from our pilot survey demonstrate firms' awareness of the need for skills to complement new technologies in order to realise the productivity benefits in full.
Keywords:	capital-skill complementarity; business performance; technology adoption
JEL:	J24 M53 O33
Date:	2021-12
URL:	https://d.repec.org/n?u=RePEc:ukc:ukcedp:2114

The impact of transparency policies on local flexibility markets in electrical distribution networks: A case study with artificial neural network forecasts

By:	Erik Heilmann (University of Kassel)
Abstract:	The energy transition brings various technical, economic and organizational challenges. One major topic, especially in zonal electricity systems, is the organization of future congestion management. The local flexibility market (LFM) is an often-discussed concept of market-based congestion management. As in the energy system as a whole, the market transparency of LFMs can influence individual bidders' behavior. In this context, this paper investigates how predictable the network status and an LFM's outcome are under a given transparency policy. To this end, forecast models based on artificial neural networks (ANN) are implemented on synthetic network and LFM data. Three defined transparency policies determine the amount of input data used for the models. The results suggest that the transparency policy can influence the predictability of the network status and LFM outcome, but that adequate forecasts are generally feasible. Therefore, the transparency policy should not conceal information but should provide a level playing field for all parties involved. Providing semi-disaggregated data at the network-area level can support bidders' decision making and reduce transaction costs.
Keywords:	Local flexibility markets, Market transparency, Transparency policy, Artificial neural network forecast
JEL:	L94 L98 Q41 Q47
Date:	2021
URL:	https://d.repec.org/n?u=RePEc:mar:magkse:202141

This nep-big issue is ©2022 by Tom Coupé. It is provided as is without any express or implied warranty. It may be freely redistributed in whole or in part for any purpose. If distributed in part, please include this notice.

General information on the NEP project can be found at https://nep.repec.org. For comments please write to the director of NEP, Marco Novarese at <director@nep.repec.org>. Put “NEP” in the subject, otherwise your mail may be rejected.

NEP’s infrastructure is sponsored by the Griffith Business School of Griffith University in Australia.