nep-big New Economics Papers
on Big Data
Issue of 2022‒08‒15
forty-one papers chosen by
Tom Coupé
University of Canterbury

  1. Predicting Economic Welfare with Images on Wealth By Jeonggil Song
  2. The value of scattered greenery in urban areas: A hedonic analysis in Japan By Yuta Kuroda; Takeru Sugasawa
  3. Integrating Prediction and Attribution to Classify News By Nelson P. Rayl; Nitish R. Sinha
  4. Solving barrier options under stochastic volatility using deep learning By Weilong Fu; Ali Hirsa
  5. Machine Learning Adoption based on the TOE Framework: A Quantitative Study By Zöll, Anne; Eitle, Verena; Buxmann, Peter
  6. Deep Partial Least Squares for Empirical Asset Pricing By Matthew F. Dixon; Nicholas G. Polson; Kemen Goicoechea
  7. Promotheus: An End-to-End Machine Learning Framework for Optimizing Markdown in Online Fashion E-commerce By Eleanor Loh; Jalaj Khandelwal; Brian Regan; Duncan A. Little
  8. Automation trends in Portugal: implications in productivity and employment By Marta Candeias; Nuno Boavida; António Brandão Moniz
  9. Tracing the Trends in Consumer Preferences for Eco-labeled Food: A Text Mining and Topic Modeling Approach By Duan, Dinglin; Gao, Zhifeng; Uddin, Md Azhar; Nian, Yefan; Nguyen, Ly
  10. PREDICTING COMPANY INNOVATIVENESS BY ANALYSING THE WEBSITE DATA OF FIRMS: A COMPARISON ACROSS DIFFERENT TYPES OF INNOVATION By Sander Sõna; Jaan Masso; Shakshi Sharma; Priit Vahter; Rajesh Sharma
  11. Generational Differences in Automobility: Comparing America's Millennials and Gen Xers Using Gradient Boosting Decision Trees By Kailai Wang; Xize Wang
  12. AI Watch. European landscape on the use of Artificial Intelligence by the Public Sector By TANGI Luca; VAN NOORDT Colin; COMBETTO Marco; GATTWINKEL Dietmar; PIGNATELLI Francesco
  13. Conditionally Elicitable Dynamic Risk Measures for Deep Reinforcement Learning By Anthony Coache; Sebastian Jaimungal; Álvaro Cartea
  14. q-Learning in Continuous Time By Yanwei Jia; Xun Yu Zhou
  15. Artificial Intelligence and the Rights of the Child: Towards an Integrated Agenda for Research and Policy By CHARISI Vasiliki; CHAUDRON Stephane; DI GIOIA Rosanna; VUORIKARI Riina; ESCOBAR PLANAS Marina; SANCHEZ MARTIN Jose Ignacio; GOMEZ GUTIERREZ Emilia
  16. Accelerating Machine Learning Training Time for Limit Order Book Prediction By Mark Joseph Bennett
  17. AI Watch Road to the adoption of Artificial Intelligence by the Public Sector: A Handbook for Policymakers, Public Administrations and Relevant Stakeholders By MANZONI Marina; MEDAGLIA Rony; TANGI Luca; VAN NOORDT Colin; VACCARI Lorenzino; GATTWINKEL Dietmar
  18. County-level USDA Crop Progress and Condition data, machine learning, and commodity market surprises By Cao, An N.Q.; Gebrekidan, Bisrat Haile; Heckelei, Thomas; Robe, Michel A.
  19. Situational awareness in big data environment: Insights from French Police decision makers By Jordan Vazquez; Cécile Godé; Jean-Fabrice Lebraty
  20. Estimating value at risk: LSTM vs. GARCH By Weronika Ormaniec; Marcin Pitera; Sajad Safarveisi; Thorsten Schmidt
  21. Shai-am: A Machine Learning Platform for Investment Strategies By Jonghun Kwak; Jungyu Ahn; Jinho Lee; Sungwoo Park
  22. A bibliometric analysis on Artificial intelligence in Tourism. State of the art and future research avenues By Martina Nannelli; Francesco Capone; Luciana Lazzeretti
  23. Social Learning and Behavioral Change When Faced with the COVID-19 Pandemic: A big data analysis By OTA Rui; ITO Arata; SATO Masahiro; YANO Makoto
  24. What constitutes a machine-learning-driven business model? A taxonomy of B2B start-ups with machine learning at their core By Vetter, Oliver A.; Hoffmann, Felix; Pumplun, Luisa; Buxmann, Peter
  25. A Random Forest approach of the Evolution of Inequality of Opportunity in Mexico By Thibaut Plassot; Isidro Soloaga; Pedro J. Torres
  26. Facial Emotion Expressions in Human–Robot Interaction: A Survey By Rawal, Niyati; Stock-Homburg, Ruth
  27. Multipurpose synthetic population for policy applications By HRADEC Jiri; CRAGLIA Massimo; DI LEO Margherita; DE NIGRIS Sarah; OSTLAENDER Nicole; NICHOLSON Nicholas
  28. Dynamic Early Warning and Action Model By Mueller, H.; Rauh, C.; Ruggieri, A.
  29. ETF Portfolio Construction via Neural Network trained on Financial Statement Data By Jinho Lee; Sungwoo Park; Jungyu Ahn; Jonghun Kwak
  30. Using Machine Learning to Test the Consistency of Food Insecurity Measures By Aveiga, Alexis H. Villacis; Badruddoza, Syed; Mayorga, Joaquin; Mishra, Ashok K.
  31. Learning Underspecified Models By In-Koo Cho; Jonathan Libgober
  32. On data-driven chance constraint learning for mixed-integer optimization problems By Alcantara Mata, Antonio; Ruiz Mora, Carlos
  33. Deep Learning for Systemic Risk Measures By Yichen Feng; Ming Min; Jean-Pierre Fouque
  34. Regional Convergence in Bangladesh using Night Lights By Syed Abul, Basher; Salim, Rashid; Mohammad Riad, Uddin
  35. Baseline validation of a bias-mitigated loan screening model based on the European Banking Authority's trust elements of Big Data & Advanced Analytics applications using Artificial Intelligence By Alessandro Danovi; Marzio Roma; Davide Meloni; Stefano Olgiati; Fernando Metelli
  36. Identify Arbitrage Using Machine Learning on Multi-stock Pair Trading Price Forecasting By Zhijie Zhang
  37. Market for Artificial Intelligence in Health Care and Compensation for Medical Errors By Chopard, Bertrand; Musy, Olivier
  38. Assessing and Comparing Fixed-Target Forecasts of Arctic Sea Ice: Glide Charts for Feature-Engineered Linear Regression and Machine Learning Models By Francis X. Diebold; Maximilian Goebel; Philippe Goulet Coulombe
  39. DDPG based on multi-scale strokes for financial time series trading strategy By Jun-Cheng Chen; Cong-Xiao Chen; Li-Juan Duan; Zhi Cai
  40. Evaluating a new earnings indicator. Can we improve the timeliness of existing statistics on earnings by using salary information from online job adverts? By Jyldyz Djumalieva; Stef Garasto; Cath Sleeman
  41. Generative Adversarial Networks Applied to Synthetic Financial Scenarios Generation By Christophe Geissler; Nicolas Morizet; Matteo Rizzato; Julien Wallart

  1. By: Jeonggil Song
    Abstract: Using images containing information on wealth, this research shows that pictures can reliably predict the economic prosperity of households. Dispensing with the wealth surveys and hand-made standards of wealth quality that the traditional wealth-based approach relies on, this novel approach uses only images posted on Dollar Street as input data on household wealth across 66 countries and predicts the consumption or income level of each household using the Convolutional Neural Network (CNN) method. The best result predicts the log of consumption level with a root mean squared error of 0.66 and an R-squared of 0.80 in the CNN regression problem. In addition, this simple model also performs well in classifying extreme poverty, with an accuracy of 0.87 and an F-beta score of 0.86. Since the model shows higher performance in extreme poverty classification when different poverty-line thresholds are applied to countries according to their income group, this suggests that the World Bank's decision to define poverty lines differently by income group was valid.
    Date: 2022–06
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2206.14810&r=
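    The CNN-regression setup described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the paper's architecture: the layer sizes, image resolution, and the random dummy batch standing in for Dollar Street photos are all assumptions.

    ```python
    import torch
    import torch.nn as nn

    class WealthCNN(nn.Module):
        """Minimal CNN regressor: household image -> predicted log consumption."""
        def __init__(self):
            super().__init__()
            self.features = nn.Sequential(
                nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
                nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
                nn.AdaptiveAvgPool2d(1),      # global pooling -> (batch, 32, 1, 1)
            )
            self.head = nn.Linear(32, 1)      # scalar regression output

        def forward(self, x):
            z = self.features(x).flatten(1)   # (batch, 32)
            return self.head(z).squeeze(-1)   # (batch,)

    model = WealthCNN()
    images = torch.randn(4, 3, 64, 64)        # dummy batch of 4 RGB household images
    preds = model(images)                     # predicted log-consumption levels
    # MSE against targets; RMSE reported in the abstract is its square root at eval time.
    loss = nn.MSELoss()(preds, torch.zeros(4))
    ```

    The same backbone with a sigmoid head and a binary cross-entropy loss would give the extreme-poverty classifier the abstract evaluates alongside the regression.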
  2. By: Yuta Kuroda; Takeru Sugasawa
    Abstract: This study investigates the impact of scattered greenery (street trees and yard bushes), rather than cohesive greenery (parks and forests), on housing prices. We identify urban greenspace from high-resolution satellite images and combine these data with data on both sales and rentals of condominiums to estimate hedonic pricing models. We find that scattered urban greenery within 100 meters significantly increases housing prices, while more distant scattered greenery does not. Scattered greenery is highly valued near highways but is less valued near the central business district (CBD). Additionally, the prices of inexpensive and small for-sale and of for-rent properties are less affected by scattered greenery. These results indicate that there is significant heterogeneity in urban greenery preferences by property characteristics and location. This heterogeneity in preferences for greenery could lead to environmental gentrification since the number of more expensive properties increases in areas with more green amenities.
    Date: 2022–07
    URL: http://d.repec.org/n?u=RePEc:toh:dssraa:128&r=
  3. By: Nelson P. Rayl; Nitish R. Sinha
    Abstract: Recent modeling developments have created tradeoffs between attribution-based models, models that rely on causal relationships, and "pure prediction models" such as neural networks. While forecasters have historically favored one technology or the other based on comfort or loyalty to a particular paradigm, in domains with many observations and predictors such as textual analysis, the tradeoffs between attribution and prediction have become too large to ignore. We document these tradeoffs in the context of relabeling 27 million Thomson Reuters news articles published between 1996 and 2021 as debt-related or non-debt related. Articles in our dataset were labeled by journalists at the time of publication, but these labels may be inconsistent as labeling standards and the relation between text and label have changed over time. We propose a method for identifying and correcting inconsistent labeling that combines attribution and pure prediction methods and is applicable to any domain with human-labeled data. Implementing our proposed labeling solution returns a debt-related news dataset with 54% more observations than if the original journalist labels had been used and 31% more observations than if our solution had been implemented using attribution-based methods only.
    Keywords: News; Text Analysis; Debt; Labeling; Supervised Learning; DMR
    JEL: C40 C45 C55
    Date: 2022–07–01
    URL: http://d.repec.org/n?u=RePEc:fip:fedgfe:2022-42&r=
  4. By: Weilong Fu; Ali Hirsa
    Abstract: We develop an unsupervised deep learning method to solve the barrier options under the Bergomi model. The neural networks serve as the approximate option surfaces and are trained to satisfy the PDE as well as the boundary conditions. Two singular terms are added to the neural networks to deal with the non-smooth and discontinuous payoff at the strike and barrier levels so that the neural networks can replicate the asymptotic behaviors of barrier options at short maturities. After that, vanilla options and barrier options are priced in a single framework. Also, neural networks are employed to deal with the high dimensionality of the function input in the Bergomi model. Once trained, the neural network solution yields fast and accurate option values.
    Date: 2022–07
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2207.00524&r=
  5. By: Zöll, Anne; Eitle, Verena; Buxmann, Peter
    Abstract: The increasing use of machine learning (ML) in businesses is ubiquitous in research and in practice. Even though ML has become one of the key technologies in recent years, organizations have difficulties adopting ML applications. Implementing ML is a challenging task for organizations due to its new programming paradigm and the significant organizational changes. In order to increase the adoption rate of ML, our study seeks to examine which generic and specific factors of the technological-organizational-environmental (TOE) framework leverage ML adoption. We validate the impact of these factors on ML adoption through a quantitative research design. Our study contributes to research by extending the TOE framework by adding ML specifications and demonstrating a moderator effect of firm size on the relationship between technology competence and ML adoption.
    Date: 2022–07–07
    URL: http://d.repec.org/n?u=RePEc:dar:wpaper:133079&r=
  6. By: Matthew F. Dixon; Nicholas G. Polson; Kemen Goicoechea
    Abstract: We use deep partial least squares (DPLS) to estimate an asset pricing model for individual stock returns that exploits conditioning information in a flexible and dynamic way while attributing excess returns to a small set of statistical risk factors. The novel contribution is to resolve the non-linear factor structure, thus advancing the current paradigm of deep learning in empirical asset pricing which uses linear stochastic discount factors under an assumption of Gaussian asset returns and factors. This non-linear factor structure is extracted by using projected least squares to jointly project firm characteristics and asset returns on to a subspace of latent factors and using deep learning to learn the non-linear map from the factor loadings to the asset returns. The result of capturing this non-linear risk factor structure is to characterize anomalies in asset returns by both linear risk factor exposure and interaction effects. Thus, the well-known ability of deep learning to capture outliers sheds light on the role of convexity and higher-order terms in the latent factor structure on the factor risk premia. On the empirical side, we implement our DPLS factor models and exhibit superior performance relative to LASSO and plain vanilla deep learning models. Furthermore, our network training times are significantly reduced due to the more parsimonious architecture of DPLS. Specifically, using 3290 assets in the Russell 1000 index over the period December 1989 to January 2018, we assess our DPLS factor model and generate information ratios that are approximately 1.2x greater than those of deep learning. DPLS explains variation and pricing errors and identifies the most prominent latent factors and firm characteristics.
    Date: 2022–06
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2206.10014&r=
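    The two-stage idea in the abstract — project characteristics and returns onto latent factors, then learn a non-linear map from loadings to returns — can be sketched with off-the-shelf components. This is a loose analogue, not the authors' DPLS estimator: the synthetic data, factor count, and the PLS-then-MLP pipeline are illustrative assumptions.

    ```python
    import numpy as np
    from sklearn.cross_decomposition import PLSRegression
    from sklearn.neural_network import MLPRegressor

    rng = np.random.default_rng(0)
    n, p, k = 400, 30, 3                    # observations, firm characteristics, latent factors
    X = rng.normal(size=(n, p))             # firm characteristics
    beta = rng.normal(size=p)
    y = np.sin(X @ beta / p) + 0.1 * rng.normal(size=n)   # non-linear "returns"

    # Stage 1: jointly project characteristics and returns onto a low-dim factor subspace.
    pls = PLSRegression(n_components=k)
    scores, _ = pls.fit_transform(X, y)     # latent factor loadings, shape (n, k)

    # Stage 2: learn the non-linear map from factor loadings to returns.
    mlp = MLPRegressor(hidden_layer_sizes=(16,), max_iter=2000, random_state=0)
    mlp.fit(scores, y)
    r2 = mlp.score(scores, y)               # in-sample fit of the non-linear stage
    ```

    Replacing the MLP with a linear regression recovers ordinary PLS, which makes the incremental contribution of the non-linear stage easy to measure.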
  7. By: Eleanor Loh; Jalaj Khandelwal; Brian Regan; Duncan A. Little
    Abstract: Managing discount promotional events ("markdown") is a significant part of running an e-commerce business, and inefficiencies here can significantly hamper a retailer's profitability. Traditional approaches for tackling this problem rely heavily on price elasticity modelling. However, the partial information nature of price elasticity modelling, together with the non-negotiable responsibility for protecting profitability, mean that machine learning practitioners must often go to great lengths to define strategies for measuring offline model quality. In the face of this, many retailers fall back on rule-based methods, thus forgoing significant gains in profitability that can be captured by machine learning. In this paper, we introduce two novel end-to-end markdown management systems for optimising markdown at different stages of a retailer's journey. The first system, "Ithax", enacts a rational supply-side pricing strategy without demand estimation, and can be usefully deployed as a "cold start" solution to collect markdown data while maintaining revenue control. The second system, "Promotheus", presents a full framework for markdown optimization with price elasticity. We describe in detail the specific modelling and validation procedures that, within our experience, have been crucial to building a system that performs robustly in the real world. Both markdown systems achieve superior profitability compared to decisions made by our experienced operations teams in a controlled online test, with improvements of 86% (Promotheus) and 79% (Ithax) relative to manual strategies. These systems have been deployed to manage markdown at ASOS.com, and both systems can be fruitfully deployed for price optimization across a wide variety of retail e-commerce settings.
    Date: 2022–07
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2207.01137&r=
  8. By: Marta Candeias (Universidade Nova de Lisboa); Nuno Boavida (Universidade Nova de Lisboa); António Brandão Moniz (Universidade Nova de Lisboa)
    Abstract: Recent developments in automation and Artificial Intelligence (AI) are leading to a wave of innovation in organizational design and changes in the workplace. Techno-optimists even named it the ‘second machine age’, arguing that it now involves the substitution of the human brain. Other authors see this as just a continuation of previous ICT developments. Potentially, automation and AI can have significant technical, economic, and social implications in firms. The paper answers the question: what are the implications of recent automation trends, including AI, for industrial productivity and employment in the automotive sector in Portugal? Our approach used mixed methods, combining statistical analyses of relevant databases with interviews with experts on R&D projects related to automation and AI implementation. Results suggest that automation can see widespread adoption in the short term in the automotive sector, but AI technologies will take more time to be adopted. Findings show that the adoption of automation and AI increases productivity in firms, while its employment implications are lagged in time. Investments in automation are not substituting operators but rather changing work organization. Thus, fears of technology-driven unemployment were not substantiated by our results.
    Keywords: Artificial Intelligence; automation; productivity; employment; automotive industry
    JEL: L23 L62 O30 O32 O33
    Date: 2022–06
    URL: http://d.repec.org/n?u=RePEc:mde:wpaper:0165&r=
  9. By: Duan, Dinglin; Gao, Zhifeng; Uddin, Md Azhar; Nian, Yefan; Nguyen, Ly
    Keywords: Marketing, Agribusiness, Resource/Energy Economics and Policy
    Date: 2022–08
    URL: http://d.repec.org/n?u=RePEc:ags:aaea22:322419&r=
  10. By: Sander Sõna; Jaan Masso; Shakshi Sharma; Priit Vahter; Rajesh Sharma
    Abstract: This paper investigates which of the core types of innovation can be best predicted based on the website data of firms. In particular, we focus on four distinct key standard types of innovation – product, process, organisational, and marketing innovation in firms. Web-mining of textual data on the websites of firms from Estonia combined with the application of artificial intelligence (AI) methods turned out to be a suitable approach to predict firm-level innovation indicators. The key novel addition to the existing literature is the finding that web-mining is more applicable to predicting marketing innovation than predicting the other three core types of innovation. As AI based models are often black-box in nature, for transparency, we use an explainable AI approach (SHAP - SHapley Additive exPlanations), where we look at the most important words predicting a particular type of innovation. Our models confirm that the marketing innovation indicator from survey data was clearly related to marketing-related terms on the firms' websites. In contrast, the results on the relevant words on websites for other innovation indicators were much less clear. Our analysis concludes that the effectiveness of web-scraping and web-text-based AI approaches in predicting cost-effective, granular and timely firm-level innovation indicators varies according to the type of innovation considered.
    Keywords: Innovation, Marketing Innovation, Community Innovation Survey (CIS), Machine learning, Neural network, Explainable AI, SHAP
    Date: 2022
    URL: http://d.repec.org/n?u=RePEc:mtk:febawb:143&r=
  11. By: Kailai Wang (University of Houston); Xize Wang (National University of Singapore)
    Abstract: Whether the Millennials are less auto-centric than the previous generations has been widely discussed in the literature. Most existing studies use regression models and assume that all factors are linear-additive in contributing to the young adults' driving behaviors. This study relaxes this assumption by applying a non-parametric statistical learning method, namely the gradient boosting decision trees (GBDT). Using U.S. nationwide travel surveys for 2001 and 2017, this study examines the non-linear dose-response effects of lifecycle, socio-demographic and residential factors on daily driving distances of Millennial and Gen-X young adults. Holding all other factors constant, Millennial young adults had shorter predicted daily driving distances than their Gen-X counterparts. Besides, residential and economic factors explain around 50% of young adults' daily driving distances, while the collective contributions for life course events and demographics are about 33%. This study also identifies the density ranges for formulating effective land use policies aiming at reducing automobile travel demand.
    Date: 2022–06
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2206.11056&r=
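    The GBDT approach described in the abstract can be sketched with scikit-learn. This is a toy analogue of the analysis, not the study itself: the predictors, the simulated driving-distance outcome, and the coefficients are all made-up stand-ins for the travel-survey variables.

    ```python
    import numpy as np
    from sklearn.ensemble import GradientBoostingRegressor

    rng = np.random.default_rng(0)
    n = 500
    # Hypothetical predictors standing in for survey variables:
    # residential density, income, age, and a generational dummy (Millennial = 1).
    X = rng.normal(size=(n, 4))
    millennial = (rng.random(n) < 0.5).astype(float)
    X[:, 3] = millennial
    # Simulated daily driving distance with a non-linear density effect.
    y = (20 - 5 * np.tanh(X[:, 0]) + 3 * X[:, 1]
         - 2 * millennial + rng.normal(scale=1.0, size=n))

    gbdt = GradientBoostingRegressor(n_estimators=200, max_depth=3, random_state=0)
    gbdt.fit(X, y)

    # Relative importance of each predictor (sums to 1) -- the kind of
    # decomposition the paper uses to apportion contributions to driving distance.
    importance = gbdt.feature_importances_
    ```

    Partial dependence plots over the density feature would give the non-linear dose-response curves the study uses to identify effective density ranges for land-use policy.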
  12. By: TANGI Luca (European Commission - JRC); VAN NOORDT Colin; COMBETTO Marco (European Commission - JRC); GATTWINKEL Dietmar; PIGNATELLI Francesco (European Commission - JRC)
    Abstract: This report provides the result of the second landscaping study conducted in the context of AI Watch, the European Commission knowledge service to monitor the development, uptake and impact of Artificial Intelligence in Europe. The report presents the results of the mapping of the use of AI in public services. The findings are based on three pillars: (i) an analysis of national strategies by European Member States on AI that focuses on how these strategies describe policy actions to address AI development in the public sector, (ii) an inventory of AI use cases in the public sector to provide an overview of the status of AI implementation in Europe, and (iii) in-depth case studies which describe in detail the factors and consequences crucial for the responsible development and adoption of AI. The findings highlight that the use of AI by public administrations is growing. AI technologies could significantly improve the effectiveness and efficiency of public administrations. However, the diffusion of AI remains unequal, and barriers to AI adoption require significant consideration by policymakers. In particular, ensuring the right balance between public and private sector expertise and capacity, fostering strong collaboration, and enhancing data governance and risk mitigation are among the main ways forward. These results contribute to the existing body of knowledge on the topic by moving from a more theoretical and anecdotal view to a more systematic analysis, based on a large number of concrete examples.
    Keywords: Artificial Intelligence
    Date: 2022–06
    URL: http://d.repec.org/n?u=RePEc:ipt:iptwpa:jrc129301&r=
  13. By: Anthony Coache; Sebastian Jaimungal; Álvaro Cartea
    Abstract: We propose a novel framework to solve risk-sensitive reinforcement learning (RL) problems where the agent optimises time-consistent dynamic spectral risk measures. Based on the notion of conditional elicitability, our methodology constructs (strictly consistent) scoring functions that are used as penalizers in the estimation procedure. Our contribution is threefold: we (i) devise an efficient approach to estimate a class of dynamic spectral risk measures with deep neural networks, (ii) prove that these dynamic spectral risk measures may be approximated to any arbitrary accuracy using deep neural networks, and (iii) develop a risk-sensitive actor-critic algorithm that uses full episodes and does not require any additional nested transitions. We compare our conceptually improved reinforcement learning algorithm with the nested simulation approach and illustrate its performance in two settings: statistical arbitrage and portfolio allocation on both simulated and real data.
    Date: 2022–06
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2206.14666&r=
  14. By: Yanwei Jia; Xun Yu Zhou
    Abstract: We study the continuous-time counterpart of Q-learning for reinforcement learning (RL) under the entropy-regularized, exploratory diffusion process formulation introduced by Wang et al. (2020). As the conventional (big) Q-function collapses in continuous time, we consider its first-order approximation and coin the term "(little) q-function". This function is related to the instantaneous advantage rate function as well as the Hamiltonian. We develop a "q-learning" theory around the q-function that is independent of time discretization. Given a stochastic policy, we jointly characterize the associated q-function and value function by martingale conditions of certain stochastic processes. We then apply the theory to devise different actor-critic algorithms for solving underlying RL problems, depending on whether or not the density function of the Gibbs measure generated from the q-function can be computed explicitly. One of our algorithms interprets the well-known Q-learning algorithm SARSA, and another recovers a policy gradient (PG) based continuous-time algorithm proposed in Jia and Zhou (2021). Finally, we conduct simulation experiments to compare the performance of our algorithms with those of PG-based algorithms in Jia and Zhou (2021) and time-discretized conventional Q-learning algorithms.
    Date: 2022–07
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2207.00713&r=
  15. By: CHARISI Vasiliki (European Commission - JRC); CHAUDRON Stephane (European Commission - JRC); DI GIOIA Rosanna (European Commission - JRC); VUORIKARI Riina (European Commission - JRC); ESCOBAR PLANAS Marina (European Commission - JRC); SANCHEZ MARTIN Jose Ignacio (European Commission - JRC); GOMEZ GUTIERREZ Emilia (European Commission - JRC)
    Abstract: This report proposes a set of science-for-policy future directions for AI and children's rights. It connects research and policy to gain insights from the interplay among different stakeholders and to move beyond the identification of ethical guidelines towards methods for practical future implementation. For the formulation of the proposed directions, we considered the current relevant policy initiatives by major international organizations and the recent coordinated actions on AI by the European Commission, as well as the state of the art of the scientific work on AI-based technologies for children, with a focus on three applications: conversational agents, recommender systems and robotic systems. In addition, we took into consideration the results of two workshops with young people and three workshops with experts and policymakers that contributed to the formulation of a set of requirements, methods and knowledge gaps as an integrated agenda for research and policy on AI and the rights of the child.
    Keywords: Child's rights, Artificial intelligence, science for policy, education, recommender systems, social robots, conversational agents, trustworthy AI, ethics
    Date: 2022–06
    URL: http://d.repec.org/n?u=RePEc:ipt:iptwpa:jrc127564&r=
  16. By: Mark Joseph Bennett
    Abstract: Financial firms are interested in simulation to discover whether a given algorithm involving financial machine learning will operate profitably. While many versions of this type of algorithm have been published recently by researchers, the focus herein is on a particular machine learning training project due to its explainable nature and the availability of high-frequency market data. For this task, hardware acceleration is expected to reduce the time required for the financial machine learning researcher to obtain results. As the majority of the time can be spent in classifier training, there is interest in faster training steps. A published Limit Order Book algorithm for predicting stock market direction is our subject, and the machine learning training process can be time-intensive, especially considering the iterative nature of model development. To remedy this, we deploy Graphics Processing Units (GPUs) produced by NVIDIA available in the data center, where the computer architecture is geared to parallel high-speed arithmetic operations. In the studied configuration, this leads to significantly faster training times, allowing more efficient and extensive model development.
    Date: 2022–06
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2206.09041&r=
  17. By: MANZONI Marina (European Commission - JRC); MEDAGLIA Rony; TANGI Luca (European Commission - JRC); VAN NOORDT Colin; VACCARI Lorenzino; GATTWINKEL Dietmar
    Abstract: This handbook is developed in the context of Artificial Intelligence (AI) Watch, as the first endeavour at the European level outlining actionable guidelines to promote the adoption of safe, lawful, inclusive and trustworthy AI by public sector administrations in the EU. The purpose of this handbook is threefold: i) Present an updated state of play of AI approaches applied by the public sector in Europe, including encountered benefits and criticalities; ii) Identify key common issues to be addressed by the relevant stakeholders, both at policy and operational levels and at different governance levels, from international organizations to national, regional and local administrations; iii) Provide policymakers and interested operational parties and practitioners with a set of recommendations to address identified areas of intervention to promote the adoption of AI by the public sector in Europe. The recommendations and actions included in this handbook build upon a two-year analysis of public sector national strategies and approaches throughout Europe, and draw on iterative feedback from stakeholders' representatives. This handbook intends to act as a multi-level and multidimensional actionable plan, providing 16 recommendations clustered in four areas of intervention, accompanied by over fifty actions set out to foster the adoption of citizen-centric AI in the public sector, at different operational levels, as a safe and trustworthy driver to achieve common goals in Europe.
    Keywords: Artificial Intelligence, Digital Transformation, Public Sector, Public Administrations, AI based solutions, AI National strategies and approaches, AI use, AI Impact
    Date: 2022–06
    URL: http://d.repec.org/n?u=RePEc:ipt:iptwpa:jrc129100&r=
  18. By: Cao, An N.Q.; Gebrekidan, Bisrat Haile; Heckelei, Thomas; Robe, Michel A.
    Keywords: Agricultural and Food Policy, Agricultural Finance, Research Methods/Statistical Methods
    Date: 2022–08
    URL: http://d.repec.org/n?u=RePEc:ags:aaea22:322281&r=
  19. By: Jordan Vazquez; Cécile Godé (CRET-LOG - Centre de Recherche sur le Transport et la Logistique - AMU - Aix Marseille Université); Jean-Fabrice Lebraty (Laboratoire de Recherche Magellan - UJML - Université Jean Moulin - Lyon 3 - Université de Lyon - Institut d'Administration des Entreprises (IAE) - Lyon)
    Date: 2022–06–30
    URL: http://d.repec.org/n?u=RePEc:hal:journl:hal-03678829&r=
  20. By: Weronika Ormaniec; Marcin Pitera; Sajad Safarveisi; Thorsten Schmidt
    Abstract: Estimating value-at-risk on time series data with possibly heteroscedastic dynamics is a highly challenging task. Typically, we face a small data problem in combination with a high degree of non-linearity, causing difficulties for both classical and machine-learning estimation algorithms. In this paper, we propose a novel value-at-risk estimator using a long short-term memory (LSTM) neural network and compare its performance to benchmark GARCH estimators. Our results indicate that even for a relatively short time series, the LSTM could be used to refine or monitor risk estimation processes and correctly identify the underlying risk dynamics in a non-parametric fashion. We evaluate the estimator on both simulated and market data with a focus on heteroscedasticity, finding that LSTM exhibits a similar performance to GARCH estimators on simulated data, whereas on real market data it is more sensitive towards increasing or decreasing volatility and outperforms all existing estimators of value-at-risk in terms of exception rate and mean quantile score.
    Date: 2022–07
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2207.10539&r=
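    An LSTM can estimate value-at-risk non-parametrically by being trained as a quantile regressor with the pinball loss, whose minimiser is the target quantile. The sketch below illustrates that idea only; the window length, network size, random dummy returns, and training schedule are assumptions, not the authors' specification.

    ```python
    import torch
    import torch.nn as nn

    ALPHA = 0.05  # VaR level (5% left-tail quantile of next-period returns)

    class LSTMVaR(nn.Module):
        """Minimal LSTM quantile regressor: past return window -> VaR estimate."""
        def __init__(self, hidden=16):
            super().__init__()
            self.lstm = nn.LSTM(input_size=1, hidden_size=hidden, batch_first=True)
            self.head = nn.Linear(hidden, 1)

        def forward(self, returns):                 # returns: (batch, window, 1)
            out, _ = self.lstm(returns)
            return self.head(out[:, -1]).squeeze(-1)  # next-step quantile, (batch,)

    def pinball_loss(pred_q, realized, alpha=ALPHA):
        """Quantile (pinball) loss: minimised when pred_q is the alpha-quantile."""
        diff = realized - pred_q
        return torch.mean(torch.maximum(alpha * diff, (alpha - 1) * diff))

    torch.manual_seed(0)
    windows = torch.randn(32, 20, 1) * 0.01         # dummy windows of past returns
    next_returns = torch.randn(32) * 0.01           # realized next-period returns
    model = LSTMVaR()
    opt = torch.optim.Adam(model.parameters(), lr=1e-2)
    for _ in range(50):                             # a few gradient steps
        opt.zero_grad()
        loss = pinball_loss(model(windows), next_returns)
        loss.backward()
        opt.step()
    var_estimates = model(windows).detach()
    ```

    A GARCH(1,1) benchmark would instead fit a parametric volatility recursion and read VaR off an assumed innovation distribution, which is the comparison the paper's backtests (exception rate, mean quantile score) formalise.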
  21. By: Jonghun Kwak; Jungyu Ahn; Jinho Lee; Sungwoo Park
    Abstract: The finance industry has adopted machine learning (ML) as a form of quantitative research to support better investment decisions, yet there are several challenges often overlooked in practice. (1) ML code tends to be unstructured and ad hoc, which hinders cooperation with others. (2) Resource requirements and dependencies vary depending on which algorithm is used, so a flexible and scalable system is needed. (3) It is difficult for domain experts in traditional finance to apply their experience and knowledge in ML-based strategies unless they acquire expertise in recent technologies. This paper presents Shai-am, an ML platform integrated with our own Python framework. The platform leverages existing modern open-source technologies, managing containerized pipelines for ML-based strategies with unified interfaces to solve the aforementioned issues. Each strategy implements the interface defined in the core framework. The framework is designed to enhance reusability and readability, facilitating collaborative work in quantitative research. Shai-am aims to be a pure AI asset manager for solving various tasks in financial markets.
    Date: 2022–07
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2207.00436&r=
  22. By: Martina Nannelli (University of Florence); Francesco Capone (University of Florence); Luciana Lazzeretti (University of Florence)
    Abstract: The tourism industry has undergone a deep transformation driven by the development of information and communication technologies (ICTs) that has taken hold in recent years thanks to innovations in Artificial Intelligence (AI) and its tools. Although a growing literature is being produced on this topic, scientific research on AI and tourism is still fuzzy and fragmented. The aim of this work is to explore the current state of the art and possible future developments of Artificial Intelligence and its tools in the field of tourism. Different methodologies were combined to achieve this objective. The study first develops a bibliometric analysis using the ISI web database, then applies social network analysis (SNA) to identify the main authors and to explore the intellectual structure of this field through keyword co-occurrence. Finally, a qualitative literature review investigates the main research themes, applications and developments. The findings confirm the industry's direction towards the digitization and robotization of services and identify some of the main research strands: the use of Big Data (BD) for demand forecasting and customer satisfaction measurement; Augmented Reality (AR) and Virtual Reality (VR) experiences for value co-creation processes; the Covid-19 pandemic, healthcare and social distancing issues and service robots; and finally, the smart tourism trends.
    Keywords: artificial intelligence; tourism industry; bibliometric analysis; social network analysis; literature review.
    JEL: L83 Z30 L86
    Date: 2022
    URL: http://d.repec.org/n?u=RePEc:frz:wpmmos:wp2022_03.rdf&r=
  23. By: OTA Rui; ITO Arata; SATO Masahiro; YANO Makoto
    Abstract: At the beginning of the COVID-19 outbreak, knowledge about the disease and its prevention was scarce. For example, there was no scientific evidence that masks could prevent the disease. However, masks were rapidly purchased in large quantities in Japan, resulting in a severe shortage after late January 2020. The purpose of this paper is to clarify what factors caused this change in people's behavior toward infection prevention. To this end, we employ high-resolution consumer panel data and newspaper articles nationally or locally published in Japan to empirically analyze the impact of consumers' information reception on their mask purchasing behavior. Logistic regression results demonstrate that the cumulative number of articles was significantly related to the frequency of mask purchases with respect to any period of the first wave of infections. We found that early information in a pandemic is important and that learning from public information, or social learning, can significantly induce behavioral change.
    Date: 2022–07
    URL: http://d.repec.org/n?u=RePEc:eti:dpaper:22065&r=
  24. By: Vetter, Oliver A.; Hoffmann, Felix; Pumplun, Luisa; Buxmann, Peter
    Date: 2022–06
    URL: http://d.repec.org/n?u=RePEc:dar:wpaper:133080&r=
  25. By: Thibaut Plassot (Universidad Iberoamericana, Mexico City: Department of Economics); Isidro Soloaga (Universidad Iberoamericana, Mexico City: Department of Economics); Pedro J. Torres (Universidad Iberoamericana, Mexico City: Department of Economics)
    Abstract: This work presents the trend of Inequality of Opportunity (IOp) and total inequality in wealth in Mexico for the years 2006, 2011 and 2017, and provides estimations using both an ex-ante and an ex-post compensation criterion. We rely on a data-driven approach, using supervised machine learning models to run regression trees and random forests that consider individuals' circumstances and effort. We find an intensification of both total inequality and IOp between 2006 and 2011, as well as a reduction of both between 2011 and 2017, with absolute IOp slightly higher in 2017 than in 2006. From an ex-ante perspective, the share of IOp within total inequality slightly decreased, although from an ex-post perspective the share remains stable across time. The most important variable in determining IOp is the household's wealth at age 14, followed by both the father's and the mother's education. Other variables, such as the parents' ability to speak an indigenous language, proved to have had a lower impact over time.
    Keywords: Inequality Of Opportunity, Mexico, Shapley Decomposition, Random Forests
    JEL: C14 C81 D31 D63
    Date: 2022–06
    URL: http://d.repec.org/n?u=RePEc:inq:inqwps:ecineq2022-614&r=
  26. By: Rawal, Niyati; Stock-Homburg, Ruth
    Abstract: Facial expressions are an ideal means of communicating one's emotions or intentions to others. This overview focuses on human facial expression recognition as well as robotic facial expression generation. In the case of human facial expression recognition, both facial expression recognition on predefined datasets and recognition in real time are covered. For robotic facial expression generation, both hand-coded and automated methods are covered, i.e., methods in which the facial features of a robot (eyes, mouth) are moved either by hand-coding or automatically using machine learning techniques. There are already plenty of studies that achieve high accuracy for emotion expression recognition on predefined datasets, but the accuracy of facial expression recognition in real time is comparatively lower. In the case of expression generation in robots, while most robots are capable of making basic facial expressions, there are not many studies that enable robots to do so automatically. In this overview, state-of-the-art research in facial emotion expressions during human–robot interaction is discussed, leading to several possible directions for future research.
    Date: 2022–06–24
    URL: http://d.repec.org/n?u=RePEc:dar:wpaper:133073&r=
  27. By: HRADEC Jiri (European Commission - JRC); CRAGLIA Massimo (European Commission - JRC); DI LEO Margherita (European Commission - JRC); DE NIGRIS Sarah (European Commission - JRC); OSTLAENDER Nicole (European Commission - JRC); NICHOLSON Nicholas (European Commission - JRC)
    Abstract: While privacy preservation is a major topic today, until recently the balance between usefulness and detail in data was struck by aggregation on a linear scale. New methods for handling analytics, however, make it possible to close this gap and to preserve both privacy and knowledge. Compared to other privacy-preservation techniques, synthetic data can have the best value/effort performance. Synthetic population models facilitate the application of novel methods for data-driven policy formulation and evaluation, representing a unique opportunity. This report showcases several applications of a structured population, such as population activity-based modelling, knock-on effects of selective lock-downs during the COVID-19 pandemic, investigative analysis of existing policy instrument design in the energy transition domain, and applications for synthetic cancer patient records. The text carefully weighs the pros and cons of synthetic data in these policy applications to provide actionable insights for decision makers on the opportunities and reliability of advice based on synthetic data. Such data can become a unifying bridge between policy-support computational models, provide data hidden in silos, and become the key enabler of artificial intelligence in business and policy applications in Europe. Synthetic data also have the potential to help control unevenness and bias in algorithmic governance and to enable better-targeted policies with a small regulatory footprint.
    Keywords: synthetic data, synthetic populations, data for artificial intelligence, new polices, integration, big data, activity-based modelling
    Date: 2022–06
    URL: http://d.repec.org/n?u=RePEc:ipt:iptwpa:jrc128595&r=
  28. By: Mueller, H.; Rauh, C.; Ruggieri, A.
    Abstract: This document presents the outcome of two modules developed for the UK Foreign, Commonwealth & Development Office (FCDO): 1) a forecast model which uses machine learning and text downloads to predict outbreaks and the intensity of internal armed conflict; 2) a decision-making module that embeds these forecasts into a model of preventing armed conflict damages. The outcome is a quantitative benchmark which should provide a testing ground for internal FCDO debates on both strategic levels (i.e. the process of deciding on country priorities) and operational levels (i.e. identifying critical periods by the country experts). Our method allows the FCDO to simulate policy interventions and changes in its strategic focus. We show, for example, that the FCDO should remain engaged in recently stabilized armed conflicts and re-think its development focus in countries with the highest risks. The total expected economic benefit of reinforced preventive efforts, as defined in this report, would bring monthly savings in expected costs of 26 billion USD, with a monthly gain to the UK of 630 million USD.
    Keywords: dynamic optimisation, forecasting, internal armed conflict, prevention
    Date: 2022–06–14
    URL: http://d.repec.org/n?u=RePEc:cam:camjip:2213&r=
  29. By: Jinho Lee; Sungwoo Park; Jungyu Ahn; Jonghun Kwak
    Abstract: Recently, the application of advanced machine learning methods to asset management has become one of the most intriguing topics. Unfortunately, applying these methods, such as deep neural networks, is difficult due to the data shortage problem. To address this issue, we propose a novel approach using neural networks to construct a portfolio of exchange traded funds (ETFs) based on the financial statement data of their components. Although a number of ETFs and ETF-managed portfolios have emerged in the past few decades, the ability to apply neural networks to manage ETF portfolios is limited, since ETFs are relatively fewer in number and have shorter histories than individual stocks. Therefore, we use the data of individual stocks to train our neural networks to predict the future performance of individual stocks, and use these predictions and the portfolio deposit file (PDF) to construct a portfolio of ETFs. Multiple experiments have been performed, and we have found that our proposed method outperforms the baselines. We believe that our approach can be more beneficial when managing recently listed ETFs, such as thematic ETFs, for which there is relatively limited historical data for training advanced machine learning methods.
    Date: 2022–07
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2207.01187&r=
  30. By: Aveiga, Alexis H. Villacis; Badruddoza, Syed; Mayorga, Joaquin; Mishra, Ashok K.
    Keywords: Food Consumption/Nutrition/Food Safety, International Development, Production Economics
    Date: 2022–08
    URL: http://d.repec.org/n?u=RePEc:ags:aaea22:322472&r=
  31. By: In-Koo Cho; Jonathan Libgober
    Abstract: This paper examines whether one can learn to play an optimal action while only knowing part of the true specification of the environment. We choose the optimal pricing problem as our laboratory, where the monopolist is endowed with an underspecified model of the market demand but can observe market outcomes. In contrast to conventional learning models, where the model specification is complete and exogenously fixed, the monopolist has to learn the specification and the parameters of the demand curve from the data. We formulate the learning dynamics as an algorithm that forecasts the optimal price based on the data, following the machine learning literature (Shalev-Shwartz and Ben-David (2014)). Inspired by PAC learnability, we develop a new notion of learnability by requiring that the algorithm must produce an accurate forecast with a reasonable amount of data, uniformly over the class of models consistent with the known part of the true specification. In addition, we assume that the monopolist has a lexicographic preference over the payoff and the complexity cost of the algorithm, seeking an algorithm with a minimum number of parameters subject to PAC-guaranteeing the optimal solution (Rubinstein (1986)). We show that, for the set of demand curves with strictly decreasing uniformly Lipschitz continuous marginal revenue curves, the optimal algorithm recursively estimates the slope and the intercept of a linear demand curve, even if the actual demand curve is not linear. The monopolist chooses a misspecified model to save computational cost, while learning the true optimal decision uniformly over the set of underspecified demand curves.
    Date: 2022–07
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2207.10140&r=
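The linear-demand estimation at the heart of item 31 can be sketched in a few lines. This is a stylized stand-in (batch OLS on hypothetical noiseless data with zero marginal cost), not the paper's recursive algorithm:

```python
def fit_linear_demand(prices, quantities):
    """OLS fit of the inverse demand q = alpha - beta * p
    (a batch stand-in for the paper's recursive slope/intercept
    estimation of a linear demand curve)."""
    n = len(prices)
    mp = sum(prices) / n
    mq = sum(quantities) / n
    beta = -sum((p - mp) * (q - mq) for p, q in zip(prices, quantities)) \
           / sum((p - mp) ** 2 for p in prices)
    alpha = mq + beta * mp
    return alpha, beta

def monopoly_price(alpha, beta):
    """With demand q = alpha - beta*p and zero marginal cost,
    revenue p*q is maximized at p* = alpha / (2*beta)."""
    return alpha / (2 * beta)

# Hypothetical market outcomes generated by q = 100 - 2p without noise:
prices = [10, 20, 30, 40]
quantities = [80, 60, 40, 20]
a, b = fit_linear_demand(prices, quantities)  # a ≈ 100, b ≈ 2
p_star = monopoly_price(a, b)                 # → 25.0
```

The paper's point is that this linear recursion remains the complexity-optimal algorithm even when the true demand curve is nonlinear, as long as marginal revenue is strictly decreasing and uniformly Lipschitz.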
  32. By: Alcantara Mata, Antonio; Ruiz Mora, Carlos
    Abstract: When dealing with real-world optimization problems, decision-makers usually face high levels of uncertainty associated with partial information, unknown parameters, or complex relationships between these and the problem decision variables. In this work, we develop a novel Chance Constraint Learning (CCL) methodology, with a focus on mixed-integer linear optimization problems, which combines ideas from the chance constraint and constraint learning literature. Chance constraints set a probabilistic confidence level for a single constraint or a set of constraints to be fulfilled, whereas the constraint learning methodology aims to model the functional relationship between the problem variables through predictive models. One of the main issues when establishing a learned constraint arises when we need to set further bounds for its response variable: the fulfillment of these is directly related to the accuracy of the predictive model and its probabilistic behaviour. In this sense, CCL makes use of linearizable machine learning models to estimate conditional quantiles of the learned variables, providing a data-driven solution for chance constraints. Open-access software has been developed for use by practitioners. Furthermore, the benefits of CCL have been tested in two real-world case studies, showing how robustness is added to optimal solutions when probabilistic bounds are set for learned constraints.
    Keywords: Chance Constraint; Constraint Learning; Data-Driven Optimization; Quantile Estimation; Machine Learning
    Date: 2022–07–07
    URL: http://d.repec.org/n?u=RePEc:cte:wsrepe:35425&r=
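The data-driven chance constraint in item 32 can be illustrated with an unconditional empirical quantile. This sketch only conveys the idea; the paper estimates *conditional* quantiles with linearizable machine learning models, and the sample data here are made up:

```python
import math

def empirical_quantile_bound(samples, epsilon):
    """Data-driven stand-in for a chance constraint P(y <= b) >= 1 - epsilon:
    take b as the empirical (1 - epsilon)-quantile of the observed responses."""
    ordered = sorted(samples)
    k = math.ceil((1 - epsilon) * len(ordered))  # smallest rank covering 1-eps mass
    return ordered[k - 1]

y = [3, 7, 1, 9, 4, 6, 2, 8, 5, 10]
b = empirical_quantile_bound(y, 0.2)          # → 8
share = sum(v <= b for v in y) / len(y)       # → 0.8, so P(y <= b) >= 0.8 holds in-sample
```

In the CCL setting, the bound b becomes a function of the decision variables, so the quantile model itself must be embeddable in the mixed-integer program.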
  33. By: Yichen Feng; Ming Min; Jean-Pierre Fouque
    Abstract: The aim of this paper is to study a new methodological framework for systemic risk measures by applying deep learning as a tool to compute the optimal strategy of capital allocations. Under this new framework, systemic risk measures can be interpreted as the minimal amount of cash that secures the aggregated system by allocating capital to the single institutions before aggregating the individual risks. This problem has no explicit solution except in very limited situations. Deep learning is receiving increasing attention in financial modeling and risk management, and we propose deep learning based algorithms to solve both the primal and dual problems of the risk measures, and thus to learn the fair risk allocations. In particular, our method for the dual problem involves a training philosophy inspired by the well-known Generative Adversarial Networks (GAN) approach and a newly designed direct estimation of the Radon-Nikodym derivative. We close the paper with substantial numerical studies of the subject and provide interpretations of the risk allocations associated with the systemic risk measures. In the particular case of exponential preferences, numerical experiments demonstrate excellent performance of the proposed algorithm when compared with the optimal explicit solution as a benchmark.
    Date: 2022–07
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2207.00739&r=
  34. By: Syed Abul, Basher; Salim, Rashid; Mohammad Riad, Uddin
    Abstract: We analyze economic convergence across 64 districts of Bangladesh using newly harmonized satellite night light data over 1992-2018. The growth in night lights—taken as a proxy for regional economic activity—reveals overwhelming evidence of absolute convergence. Regional differences in night light (or income) growth have been shrinking at an annual convergence rate of 4.57%, corresponding to a half-life of 15 years. Net migration plays a relatively prominent role in the regional convergence process.
    Keywords: Night lights, convergence, Bangladesh
    JEL: O47 R11
    Date: 2022–06–14
    URL: http://d.repec.org/n?u=RePEc:pra:mprapa:113394&r=
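The half-life reported in item 34 follows directly from the convergence rate: if regional gaps shrink at a constant rate beta per year, the gap is halved after ln(2) / beta years. A one-line check of the abstract's figures:

```python
import math

# Convergence rate of 4.57% per year, as stated in the abstract.
beta = 0.0457
half_life = math.log(2) / beta  # ≈ 15.2 years, matching the stated half-life of 15
```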
  35. By: Alessandro Danovi; Marzio Roma; Davide Meloni; Stefano Olgiati; Fernando Metelli
    Abstract: The goal of our 4-phase research project was to test whether a machine-learning-based loan screening application (5D) could detect bad loans subject to the following constraints: a) utilize a minimal-optimal number of features unrelated to the credit history, gender, race or ethnicity of the borrower (BiMOPT features); b) comply with the European Banking Authority and EU Commission principles on trustworthy Artificial Intelligence (AI). All datasets have been anonymized and pseudonymized. In Phase 0 we selected a subset of 10 BiMOPT features out of a total of 84 features; in Phase I we trained 5D to detect bad loans in a historical dataset extracted from a mandatory report to the Bank of Italy consisting of 7,289 non-performing loans (NPLs) closed in the period 2010-2021; in Phase II we assessed the baseline performance of 5D on a distinct validation dataset consisting of an active portfolio of 63,763 outstanding loans (performing and non-performing) for a total financed value of over EUR 11.5 billion as of December 31, 2021; in Phase III we will monitor the baseline performance for a period of 5 years (2023-27) to assess the prospective real-world bias mitigation and performance of the 5D system and its utility in credit and fintech institutions. At baseline, 5D correctly detected 1,461 bad loans out of a total of 1,613 (Sensitivity = 0.91, Prevalence = 0.0253, Positive Predictive Value = 0.19), and correctly classified 55,866 out of the other 62,150 exposures (Specificity = 0.90, Negative Predictive Value = 0.997). Our preliminary results support the hypothesis that Big Data & Advanced Analytics applications based on AI can mitigate bias and improve consumer protection in the loan screening process without compromising the efficacy of the credit risk assessment. Further validation is required to assess the prospective performance and utility of 5D in credit and fintech institutions.
    Date: 2022–06
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2206.08938&r=
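The baseline metrics reported in item 35 can be reconstructed from the counts given in the abstract (63,763 outstanding loans, 1,613 bad, 1,461 detected, 55,866 of the remaining 62,150 exposures correctly classified):

```python
tp = 1461           # bad loans correctly detected
fn = 1613 - tp      # 152 bad loans missed
tn = 55866          # good exposures correctly classified
fp = 62150 - tn     # 6,284 false alarms

sensitivity = tp / (tp + fn)     # ≈ 0.91
specificity = tn / (tn + fp)     # ≈ 0.90
ppv = tp / (tp + fp)             # ≈ 0.19
npv = tn / (tn + fn)             # ≈ 0.997
prevalence = (tp + fn) / 63763   # ≈ 0.0253
```

All five values round to the figures stated in the abstract, so the reported confusion matrix is internally consistent.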
  36. By: Zhijie Zhang
    Abstract: Aims: The market-neutral pair-trading strategy of two highly cointegrated stocks can be extended to a higher-dimensional arbitrage algorithm. In this paper, a linear combination of multiple cointegrated stocks is introduced to overcome the limitations of the traditional one-to-one pair trading technique. Methods: First, stocks from diversified industries are pre-partitioned using a clustering algorithm to break industrial boundaries. Then, combinations of cointegrated stocks are formed using the ElasticNet algorithm boosted by the AdaBoost algorithm. Results: All three price-prediction indicators chosen for performance evaluation improved significantly: MSE improved by 32.21% compared to OLS, MAE by 37.06%, and MAPE by 37.73%. (Portfolio return performance is still under construction, with indicators including cumulative return, drawdown and Sharpe ratio; the comparison will be against the Buy-and-Hold strategy, a common benchmark for any portfolio.)
    Date: 2022–07
    URL: http://d.repec.org/n?u=RePEc:toh:dssraa:127&r=
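Once a cointegrated combination of stocks has been found (by whatever method), the trading rule in strategies like item 36 is typically a mean-reversion signal on the residual spread. A minimal, purely illustrative sketch with made-up data; the paper constructs the spread with ElasticNet-weighted stock combinations, which this example does not reproduce:

```python
def zscore_signal(spread, entry=2.0, exit=0.5):
    """Classic mean-reversion rule on a cointegration residual:
    short the spread when its latest z-score is high, long when
    it is very low, and flatten near zero."""
    n = len(spread)
    mean = sum(spread) / n
    std = (sum((s - mean) ** 2 for s in spread) / n) ** 0.5
    z = (spread[-1] - mean) / std
    if z > entry:
        return "short"
    if z < -entry:
        return "long"
    if abs(z) < exit:
        return "flat"
    return "hold"

# Hypothetical residual history ending in a large positive deviation:
history = [0.1, -0.2, 0.0, 0.3, -0.1, 0.2, -0.3, 0.1, 0.0, 1.5]
signal = zscore_signal(history)  # → "short": spread is ~2.8 std above its mean
```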
  37. By: Chopard, Bertrand; Musy, Olivier
    Abstract: We study the market for AI systems that are used to help diagnose and treat diseases, reducing the risk of medical error. Based on a two-firm vertical product differentiation model, we examine how, in the event of patient harm, the amount of the compensation payment and the division of this compensation between physicians and AI system producers affect both price competition between firms and the quality (accuracy) of AI systems. One producer sells products with the best-available accuracy. The second sells a system with strictly lower accuracy at a lower price. Specifically, we show that both producers enjoy a positive market share, so long as some patients are diagnosed by physicians who do not use an AI system. The quality of the system is independent of how any compensation payment to the patient is divided between physicians and producers. However, the magnitude of the compensation payment impacts price competition. Increased malpractice pressure leads to lower vertical differentiation, thus encouraging price competition. We also explore the effect of compensation on firms' profits at equilibrium. We conclude by discussing our results with respect to the evolution of the civil liability regime for AI in healthcare.
    Keywords: Artificial Intelligence, Diagnostic, Duopoly, Liability, Physician, Compensation
    JEL: I11 K13 K41 L13
    Date: 2022–06–09
    URL: http://d.repec.org/n?u=RePEc:pra:mprapa:113328&r=
  38. By: Francis X. Diebold; Maximilian Goebel; Philippe Goulet Coulombe
    Abstract: We use "glide charts" (plots of sequences of root mean squared forecast errors as the target date is approached) to evaluate and compare fixed-target forecasts of Arctic sea ice. We first use them to evaluate the simple feature-engineered linear regression (FELR) forecasts of Diebold and Goebel (2021), and to compare FELR forecasts to naive pure-trend benchmark forecasts. Then we introduce a much more sophisticated feature-engineered machine learning (FEML) model, and we use glide charts to evaluate FEML forecasts and compare them to a FELR benchmark. Our substantive results include the frequent appearance of predictability thresholds, which differ across months, meaning that accuracy initially fails to improve as the target date is approached but then increases progressively once a threshold lead time is crossed. Also, we find that FEML can improve appreciably over FELR when forecasting "turning point" months in the annual cycle at horizons of one to three months ahead.
    Date: 2022–06
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2206.10721&r=
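The glide charts of item 38 are built from root mean squared forecast errors computed separately at each lead time before the target date. A minimal sketch with hypothetical error data (not the authors' Arctic sea ice series):

```python
def glide_curve(errors_by_lead):
    """Glide chart ingredients: RMSE of fixed-target forecast errors at
    each lead time (e.g. months before the target date). Plotted against
    lead time, a flat-then-falling curve reveals the 'predictability
    threshold' the abstract describes."""
    curve = {}
    for lead, errors in errors_by_lead.items():
        curve[lead] = (sum(e * e for e in errors) / len(errors)) ** 0.5
    return curve

# Hypothetical forecast errors at 3-, 2-, and 1-month leads:
errors = {3: [0.5, -0.5, 0.5, -0.5],
          2: [0.5, 0.5, -0.5, -0.5],
          1: [0.1, -0.1, 0.1, -0.1]}
curve = glide_curve(errors)
# RMSE is flat from lead 3 to lead 2 (0.5) and drops at lead 1 (0.1):
# a predictability threshold crossed between two and one months out.
```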
  39. By: Jun-Cheng Chen; Cong-Xiao Chen; Li-Juan Duan; Zhi Cai
    Abstract: With the development of artificial intelligence, more and more financial practitioners apply deep reinforcement learning to financial trading strategies. However, it is difficult to extract accurate features due to the considerable noise, high non-stationarity, and non-linearity of single-scale time series, which makes it hard to obtain high returns. In this paper, we extract a multi-scale feature matrix on multiple time scales of financial time series, according to the classic financial theory of Chan Theory, and put forward a multi-scale stroke deep deterministic policy gradient reinforcement learning model (MSSDDPG) to search for the optimal trading strategy. We carried out experiments on datasets for the Dow Jones and S&P 500 for U.S. stocks, and China's CSI 300 and SSE Composite, evaluating the performance of our approach against the turtle trading strategy, the Deep Q-learning (DQN) reinforcement learning strategy, and the deep deterministic policy gradient (DDPG) reinforcement learning strategy. The results show that our approach achieves the best performance on China's CSI 300 and SSE Composite, and an outstanding result on the Dow Jones and S&P 500.
    Date: 2022–06
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2207.10071&r=
  40. By: Jyldyz Djumalieva; Stef Garasto; Cath Sleeman
    Abstract: This paper examines how the salary information from online job adverts might be used to improve the timeliness of official statistics on earnings. The unique dataset underpinning the analysis contains over 51 million adverts for UK positions, collected between January 2012 and September 2018. The data was sourced from Burning Glass Technologies, a leading labour market intelligence company. We trial a mixture of forecasting approaches, including traditional econometric models and the relatively newer recurrent neural networks. For 2 out of 13 industries and for 5 out of 6 occupation groups, salaries from online job adverts are shown to improve the accuracy of earnings forecasts over and above official data on its own. More broadly, this paper provides a detailed methodology for evaluating a novel data source, such as salaries from job adverts, to inform an official statistical series, such as earnings.
    Keywords: arima models, earnings, forecasting, neural networks, online job adverts
    JEL: C18 C45 C53 J30
    Date: 2020–12
    URL: http://d.repec.org/n?u=RePEc:nsr:escoed:escoe-dp-2020-19&r=
  41. By: Christophe Geissler (Advestis); Nicolas Morizet (Advestis); Matteo Rizzato (Advestis); Julien Wallart (Fujitsu Systems Europe)
    Abstract: The finance industry is producing an increasing number of datasets that investment professionals can consider to be influential on the price of financial assets. These datasets were initially mainly limited to exchange data, namely price, capitalization and volume. Their coverage has now considerably expanded to include, for example, macroeconomic data, supply and demand of commodities, balance sheet data and, more recently, extra-financial data such as ESG scores. This broadening of the factors retained as influential constitutes a serious challenge for statistical modeling. Indeed, the instability of the correlations between these factors makes it practically impossible to identify the joint laws needed to construct scenarios. Fortunately, spectacular advances in Deep Learning in recent years have given rise to GANs. GANs are a type of generative machine learning model that produces new data samples with the same characteristics as a training data distribution in an unsupervised way, avoiding data assumptions and human-induced biases. In this work, we explore the use of GANs for the generation of synthetic financial scenarios. This pilot study is the result of a collaboration between Fujitsu and Advestis, and it will be followed by a thorough exploration of the use cases that can benefit from the proposed solution. We propose a GAN-based algorithm that allows the replication of multivariate data representing several properties (including, but not limited to, price, market capitalization, ESG score, controversy score, ...) of a set of stocks. This approach differs from examples in the financial literature, which are mainly focused on the reproduction of temporal asset price scenarios. We also propose several metrics to evaluate the quality of the data generated by the GANs. This approach is well suited to the generation of scenarios, the time direction simply arising as a subsequent (possibly conditioned) generation of data points drawn from the learned distribution. Our method will allow the simulation of high-dimensional scenarios (compared to the ≲ 10 features currently employed in most recent use cases), where network complexity is reduced thanks to wisely performed feature engineering and selection. Complete results will be presented in a forthcoming study.
    Keywords: Generative Adversarial Networks,Data Augmentation,Financial Scenarios,Risk Management
    Date: 2022–07–11
    URL: http://d.repec.org/n?u=RePEc:hal:wpaper:hal-03716692&r=

This nep-big issue is ©2022 by Tom Coupé. It is provided as is without any express or implied warranty. It may be freely redistributed in whole or in part for any purpose. If distributed in part, please include this notice.
General information on the NEP project can be found at http://nep.repec.org. For comments please write to the director of NEP, Marco Novarese at <director@nep.repec.org>. Put “NEP” in the subject, otherwise your mail may be rejected.
NEP’s infrastructure is sponsored by the School of Economics and Finance of Massey University in New Zealand.