nep-big New Economics Papers
on Big Data
Issue of 2020‒04‒20
24 papers chosen by
Tom Coupé
University of Canterbury

  1. A new multilayer network construction via Tensor learning By Giuseppe Brandi; T. Di Matteo
  2. A Machine Learning Approach for Flagging Incomplete Bid-rigging Cartels By Hannes Wallimann; David Imhof; Martin Huber
  3. Economic Black Holes and Labor Singularities in the Presence of Self-replicating Artificial Intelligence By YANO Makoto; FURUKAWA Yuichi
  4. A Deep Reinforcement Learning Framework for Continuous Intraday Market Bidding By Ioannis Boukas; Damien Ernst; Thibaut Th\'eate; Adrien Bolland; Alexandre Huynen; Martin Buchwald; Christelle Wynants; Bertrand Corn\'elusse
  5. Financial Market Trend Forecasting and Performance Analysis Using LSTM By Jonghyeon Min
  6. Google It Up! A Google Trends-based analysis of COVID-19 outbreak in Iran By Mohammad Reza Farzanegan; Mehdi Feizi; Saeed Malek Sadati
  7. Machine Learning Algorithms for Financial Asset Price Forecasting By Philip Ndikum
  8. Robotisation, Employment and Industrial Growth Intertwined Across Global Value Chains By Mahdi Ghodsi; Oliver Reiter; Robert Stehrer; Roman Stöllinger
  9. Company classification using machine learning By Sven Husmann; Antoniya Shivarova; Rick Steinert
  10. The Middle-Income Trap 2.0: The Increasing Role of Human Capital in the Age of Automation and Implications for Developing Asia By Wagner, Helmut; Glawe, Linda
  11. Deep Probabilistic Modelling of Price Movements for High-Frequency Trading By Ye-Sheen Lim; Denise Gorse
  12. Deep learning for Stock Market Prediction By Mojtaba Nabipour; Pooyan Nayyeri; Hamed Jabani; Amir Mosavi
  13. Deep Recurrent Modelling of Stationary Bitcoin Price Formation Using the Order Flow By Ye-Sheen Lim; Denise Gorse
  14. Information Token Driven Machine Learning for Electronic Markets: Performance Effects in Behavioral Financial Big Data Analytics By Jim Samuel
  15. Linkage of Patent and Design Right Data: Analysis of Industrial Design Activities in Companies at the Creator Level By IKEUCHI Kenta; MOTOHASHI Kazuyuki
  16. Machine learning for multiple yield curve markets: fast calibration in the Gaussian affine framework By Sandrine G\"umbel; Thorsten Schmidt
  17. An Application of Deep Reinforcement Learning to Algorithmic Trading By Thibaut Th\'eate; Damien Ernst
  18. Theorization of Institutional Change in the Rise of Artificial Intelligence By Masashi Goto
  19. The Economic Consequences of Data Privacy Regulation: Empirical Evidence from GDPR By Guy Aridor; Yeon-Koo Che; Tobias Salz
  20. Extending Deep Reinforcement Learning Frameworks in Cryptocurrency Market Making By Jonathan Sadighian
  21. Manipulation-Proof Machine Learning By Daniel Bj\"orkegren; Joshua E. Blumenstock; Samsun Knight
  22. Predicting Labor Shortages from Labor Demand and Labor Supply Data: A Machine Learning Approach By Nikolas Dawson; Marian-Andrei Rizoiu; Benjamin Johnston; Mary-Anne Williams
  23. What do online listings tell us about the housing market? By Michele Loberto; Andrea Luciani; Marco Pangallo
  24. Heterogeneous Relationships between Automation Technologies and Skilled Labor: Evidence from a Firm Survey By MORIKAWA Masayuki

  1. By: Giuseppe Brandi; T. Di Matteo
    Abstract: Multilayer networks have proved suitable for extracting and providing dependency information about different complex systems. The construction of these networks is difficult and is mostly done with a static approach, neglecting time-delayed interdependencies. Tensors are objects that naturally represent multilayer networks, and in this paper we propose a new methodology based on Tucker tensor autoregression to build a multilayer network directly from data. This methodology captures within- and between-layer connections and uses a filtering procedure to extract relevant information and improve visualization. We show the application of this methodology to different stationary, fractionally differenced financial data. We argue that our result is useful for understanding the dependencies across three different aspects of financial risk, namely market risk, liquidity risk, and volatility risk. Indeed, we show how the resulting visualization is a useful tool for risk managers, depicting dependency asymmetries between different risk factors and accounting for delayed cross-dependencies. The constructed multilayer network shows a strong interconnection between the volume and price layers across all the stocks considered, while a lower number of interconnections between the uncertainty measures is identified.
    Date: 2020–04
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2004.05367&r=all
  2. By: Hannes Wallimann; David Imhof; Martin Huber
    Abstract: We propose a new method for flagging bid rigging, which is particularly useful for detecting incomplete bid-rigging cartels. Our approach combines screens, i.e. statistics derived from the distribution of bids in a tender, with machine learning to predict the probability of collusion. As a methodological innovation, we calculate such screens for all possible subgroups of three or four bids within a tender and use summary statistics like the mean, median, maximum, and minimum of each screen as predictors in the machine learning algorithm. This approach tackles the issue that competitive bids in incomplete cartels distort the statistical signals produced by bid rigging. We demonstrate that our algorithm outperforms previously suggested methods in applications to incomplete cartels based on empirical data from Switzerland.
    Date: 2020–04
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2004.05629&r=all
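    As an illustration of the screening idea described in the abstract above, the sketch below (not the authors' code; the screen choice, bid values, and subgroup sizes are assumptions for illustration) computes one common screen, the coefficient of variation of bids, over every subgroup of three or four bids within a tender and summarises the subgroup screens as candidate predictors:

```python
# Illustrative sketch only: compute a bid-rigging screen (the coefficient of
# variation of bids) for all subgroups of three or four bids in a tender,
# then summarise the subgroup screens with mean/median/max/min, in the
# spirit of the approach described in the abstract.
from itertools import combinations
from statistics import mean, median, pstdev

def cv(bids):
    """Coefficient of variation: population std / mean of a bid subgroup."""
    return pstdev(bids) / mean(bids)

def subgroup_screen_features(bids, sizes=(3, 4)):
    """Summary statistics of the CV screen over all subgroups of given sizes."""
    screens = [cv(sub) for k in sizes for sub in combinations(bids, k)]
    return {
        "cv_mean": mean(screens),
        "cv_median": median(screens),
        "cv_max": max(screens),
        "cv_min": min(screens),
    }

tender = [100.0, 101.5, 102.0, 130.0]  # hypothetical bids in one tender
features = subgroup_screen_features(tender)
```

    In the paper these summary statistics would feed a machine learning classifier predicting collusion; here they are simply returned as a feature dictionary.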
  3. By: YANO Makoto; FURUKAWA Yuichi
    Abstract: This study is motivated by the widely-held view that self-replicating artificial intelligence may approach "some essential singularity . . . beyond which human affairs, as we know them, could not continue" (von Neumann). It investigates what state this process would lead to in an economy with frictionless markets. We demonstrate that if the production technologies, too, are frictionless, all workers will eventually be pulled into the most labor-friendly sector (an economic black hole). If, instead, production is subject to a friction created by congestion, this will give rise, within a finite span of time, to a state in which all workers are unemployed (total job destruction).
    Date: 2020–02
    URL: http://d.repec.org/n?u=RePEc:eti:dpaper:20009&r=all
  4. By: Ioannis Boukas; Damien Ernst; Thibaut Th\'eate; Adrien Bolland; Alexandre Huynen; Martin Buchwald; Christelle Wynants; Bertrand Corn\'elusse
    Abstract: The large-scale integration of variable energy resources is expected to shift a large part of energy exchanges closer to real time, where more accurate forecasts are available. In this context, short-term electricity markets, and in particular the intraday market, are considered a suitable trading floor for these exchanges to occur. A key component of successful renewable energy integration is the use of energy storage. In this paper, we propose a novel modelling framework for the strategic participation of energy storage in the European continuous intraday market, where exchanges occur through a centralized order book. The goal of the storage device operator is to maximize the profits received over the entire trading horizon while taking into account the operational constraints of the unit. The sequential decision-making problem of trading in the intraday market is modelled as a Markov Decision Process. An asynchronous distributed version of the fitted Q iteration algorithm is chosen to solve this problem due to its sample efficiency. The large and variable number of existing orders in the order book motivates the use of high-level actions and an alternative state representation. Historical data are used to generate a large number of artificial trajectories in order to address exploration issues during the learning process. The resulting policy is back-tested and compared against a benchmark strategy that is the current industrial standard. Results indicate that the agent converges to a policy that achieves, on average, higher total revenues than the benchmark strategy.
    Date: 2020–04
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2004.05940&r=all
  5. By: Jonghyeon Min
    Abstract: Financial market trend forecasting is emerging as a hot topic in financial markets today. Many challenges remain, and related research has been actively conducted. In particular, recent research on neural-network-based financial market trend prediction has attracted much attention. However, previous studies do not address financial market forecasting based on LSTM, which performs well on time series data, and there is a lack of comparative analysis between neural-network-based and traditional prediction techniques. In this paper, we propose a financial market trend forecasting method using LSTM and analyze its performance against existing forecasting methods through experiments. The method prepares the input data set through a preprocessing step that reflects all of the fundamental, technical, and qualitative data used in financial data analysis, and performs a comprehensive financial market analysis through LSTM. We experimentally compare the performance of existing financial market trend forecasting models, as well as performance across different financial market environments. In addition, we implement the proposed method using open-source tools and platforms and forecast financial market trends using various financial data indicators.
    Date: 2020–03
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2004.01502&r=all
  6. By: Mohammad Reza Farzanegan (Philipps-University Marburg); Mehdi Feizi (Ferdowsi University of Mashhad); Saeed Malek Sadati (Ferdowsi University of Mashhad)
    Abstract: Soon after the first identified COVID-19 cases in Iran, the spread of the new coronavirus affected almost all of its provinces. In the absence of credible data on people's unfiltered concerns and needs, especially in developing countries, Google search data is a reliable source that captures public sentiment. This study examines within-province changes in confirmed COVID-19 cases across Iranian provinces from 19 February 2020 to 9 March 2020. Using real-time Google Trends data, panel fixed effects, and GMM regression estimations, we show a robust negative association between the past intensity of searches for disinfection methods and materials and current confirmed cases of COVID-19. In addition, we find a positive and robust association between the intensity of searches for Corona symptoms and the number of confirmed cases within the Iranian provinces. These findings are robust to controls for province and period fixed effects, province-specific time trends, and lagged confirmed cases. Our results show not only how prevention can hinder infection during an epidemic, but also how prophecies, shaped by individual concerns and reflected in Google search queries, might not be self-fulfilling.
    Keywords: Google Trends, COVID-19, Iran, epidemic disease
    Date: 2020
    URL: http://d.repec.org/n?u=RePEc:mar:magkse:202017&r=all
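    The panel fixed-effects estimation mentioned in the abstract above can be sketched with the standard "within" transformation: demean the outcome and the regressor within each province, then fit OLS on the demeaned data. This is a minimal illustration, not the authors' code; the toy numbers and variable names are invented.

```python
# Minimal within-transformation (fixed-effects) estimator sketch:
# demean x and y within each province, then compute the pooled OLS slope
# on the demeaned data. All numbers below are made up for illustration.
from statistics import mean

def within_slope(panel):
    """panel: {province: [(search_intensity, cases), ...]} -> FE slope."""
    xs, ys = [], []
    for obs in panel.values():
        x_bar = mean(x for x, _ in obs)
        y_bar = mean(y for _, y in obs)
        for x, y in obs:
            xs.append(x - x_bar)  # demeaned regressor
            ys.append(y - y_bar)  # demeaned outcome
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    return sxy / sxx

panel = {
    "A": [(1.0, 5.0), (2.0, 4.0), (3.0, 3.0)],   # hypothetical province data
    "B": [(2.0, 10.0), (3.0, 9.0), (4.0, 8.0)],
}
slope = within_slope(panel)  # negative in this toy data, echoing the
                             # disinfection-search finding in the abstract
```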
  7. By: Philip Ndikum
    Abstract: This research paper explores the performance of Machine Learning (ML) algorithms and techniques that can be used for financial asset price forecasting. The prediction and forecasting of asset prices and returns remains one of the most challenging and exciting problems for quantitative finance researchers and practitioners alike. The massive increase in data generated and captured in recent years presents an opportunity to leverage Machine Learning algorithms. This study directly compares and contrasts state-of-the-art implementations of modern Machine Learning algorithms on high performance computing (HPC) infrastructures against the traditional and highly popular Capital Asset Pricing Model (CAPM) on U.S. equities data. The implemented Machine Learning models, trained on time series data for an entire stock universe (in addition to exogenous macroeconomic variables), significantly outperform the CAPM on out-of-sample (OOS) test data.
    Date: 2020–03
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2004.01504&r=all
  8. By: Mahdi Ghodsi (The Vienna Institute for International Economic Studies, wiiw); Oliver Reiter (The Vienna Institute for International Economic Studies, wiiw); Robert Stehrer (The Vienna Institute for International Economic Studies, wiiw); Roman Stöllinger (The Vienna Institute for International Economic Studies, wiiw)
    Abstract: The global economy is currently experiencing a new wave of technological change involving new technologies, especially in the realm of artificial intelligence and robotics, but not limited to it. One key concern in this context is the consequences of these new technologies for the labour market. This paper provides a comprehensive analysis of the direct and indirect effects of the rise of industrial robots and productivity via international value chains on various industrial indicators, including employment and real value added. The paper thereby adds to the existing empirical work on the relationship between technological change, employment and industrial growth by adding data on industrial robots while controlling for other technological advancements measured by total factor productivity (TFP). The results indicate that the installation of new robots had no statistically significant effect on the growth of industrial employment during the period 2000–2014, while the overall impact on the real value added growth of industries worldwide was positive and significant. The methodology also allows for a differentiation of the impact of robots across industries and countries from the two perspectives of source and destination industries in global value chains. Disclaimer: This is a background paper for the UNIDO Industrial Development Report 2020, Industrializing in the Digital Age.
    Keywords: Robotisation, digitalisation, global value chains, total factor productivity, industrial growth, employment, value added
    JEL: D57 J21 L16 O14
    Date: 2020–04
    URL: http://d.repec.org/n?u=RePEc:wii:wpaper:177&r=all
  9. By: Sven Husmann; Antoniya Shivarova; Rick Steinert
    Abstract: The recent advancements in computational power and machine learning algorithms have led to vast improvements in manifold areas of research. Especially in finance, the application of machine learning enables researchers to gain new insights into well-studied areas. In our paper, we demonstrate that unsupervised machine learning algorithms can be used to visualize and classify company data in an economically meaningful and effective way. In particular, we implement the t-distributed stochastic neighbor embedding (t-SNE) algorithm due to its beneficial properties as a data-driven dimension reduction and visualization tool in combination with spectral clustering to perform company classification. The resulting groups can then be implemented by experts in the field for empirical analysis and optimal decision making. By providing an exemplary out-of-sample study within a portfolio optimization framework, we show that meaningful grouping of stock data improves the overall portfolio performance. We, therefore, introduce the t-SNE algorithm to the financial community as a valuable technique both for researchers and practitioners.
    Date: 2020–03
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2004.01496&r=all
  10. By: Wagner, Helmut; Glawe, Linda
    Abstract: We modify the concept of the middle-income trap (MIT) against the background of the Fourth Industrial Revolution and the (future) challenges of automation (creating the concept of the "MIT 2.0") and discuss the implications for developing Asia. In particular, we analyze the impacts of automation, artificial intelligence, and digitalization on the growth drivers of emerging market economies and the MIT mechanism. Our findings suggest that improving human capital accumulation, particularly the upgrading of skills needed with the rapid advance of automation, will be key success factors for overcoming the MIT 2.0.
    Keywords: automation,AI,human capital,middle-income trap,developing Asia,economic development,economic growth,employment
    JEL: J24 O10 O11 O15 O33 O47 O53
    Date: 2020
    URL: http://d.repec.org/n?u=RePEc:zbw:ceames:152018&r=all
  11. By: Ye-Sheen Lim; Denise Gorse
    Abstract: In this paper we propose a deep recurrent architecture for the probabilistic modelling of high-frequency market prices, important for the risk management of automated trading systems. Our proposed architecture incorporates probabilistic mixture models into deep recurrent neural networks. The resulting deep mixture models simultaneously address several practical challenges important in the development of automated high-frequency trading strategies that were previously neglected in the literature: 1) probabilistic forecasting of the price movements; 2) single objective prediction of both the direction and size of the price movements. We train our models on high-frequency Bitcoin market data and evaluate them against benchmark models obtained from the literature. We show that our model outperforms the benchmark models in both a metric-based test and in a simulated trading scenario.
    Date: 2020–03
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2004.01498&r=all
  12. By: Mojtaba Nabipour; Pooyan Nayyeri; Hamed Jabani; Amir Mosavi
    Abstract: Predicting the values of stock groups has always been attractive and challenging for shareholders. This paper concentrates on predicting the future values of stock market groups. Four groups from the Tehran stock exchange (diversified financials, petroleum, non-metallic minerals, and basic metals) are chosen for experimental evaluation. Data are collected for the groups based on ten years of historical records, and value predictions are made for 1, 2, 5, 10, 15, 20 and 30 days in advance. Machine learning algorithms are employed to predict the future values of the stock market groups: Decision Tree, Bagging, Random Forest, Adaptive Boosting (AdaBoost), Gradient Boosting, eXtreme Gradient Boosting (XGBoost), Artificial Neural Network (ANN), Recurrent Neural Network (RNN), and Long Short-Term Memory (LSTM). Ten technical indicators are selected as inputs to each of the prediction models. Finally, the prediction results are presented for each technique based on three metrics. Among all the algorithms used in this paper, LSTM shows the most accurate results with the highest model-fitting ability. Also, among the tree-based models, there is often intense competition between AdaBoost, Gradient Boosting, and XGBoost.
    Date: 2020–03
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2004.01497&r=all
  13. By: Ye-Sheen Lim; Denise Gorse
    Abstract: In this paper we propose a deep recurrent model based on the order flow for stationary modelling of high-frequency directional price movements. The order flow is the microsecond stream of orders arriving at the exchange, driving the formation of prices seen on the price chart of a stock or currency. To test the stationarity of our proposed model, we train it on data before the 2017 Bitcoin bubble period and test it during and after the bubble. We show that, without any retraining, the proposed model remains temporally stable even as Bitcoin trading shifts into an extremely volatile "bubble trouble" period. The significance of the result is shown by benchmarking against existing state-of-the-art models in the literature for modelling price formation using deep learning.
    Date: 2020–03
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2004.01499&r=all
  14. By: Jim Samuel
    Abstract: In conjunction with the universal acceleration of information growth, financial services have been immersed in an evolution of information dynamics. It is not just the dramatic increase in volumes of data, but the speed, complexity and unpredictability of big-data phenomena that have compounded the challenges faced by researchers and practitioners in financial services. Mathematics, statistics and technology have been leveraged creatively to create analytical solutions. Given the many unique characteristics of financial big data (FBD), it is necessary to gain insights into strategies and models that can be used to create FBD-specific solutions. Behavioral finance data, a subset of FBD, is seeing exponential growth, and this presents an unprecedented opportunity to study behavioral finance using big data analytics methodologies. The present study maps machine learning (ML) techniques onto behavioral finance categories to explore the potential for using ML techniques to address behavioral aspects of FBD. The ontological feasibility of such an approach is presented, and the primary proposition of this study is that ML-based behavioral models can effectively estimate performance in FBD. A simple machine learning algorithm is successfully employed to study behavioral performance in an artificial stock market to validate the proposition.
    Keywords: Information, Big Data, Electronic Markets, Analytics, Behavior
    Date: 2020–03
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2004.06642&r=all
  15. By: IKEUCHI Kenta; MOTOHASHI Kazuyuki
    Abstract: In addition to technological superiority (functional value), attention to design superiority (semantic value) is increasing as a source of competitiveness in product markets. In this research, we create a linked dataset of patent and design right information from the Japan Patent Office to evaluate design right data as a source for understanding design innovation. First, a classification model was trained via machine learning to disambiguate identical inventors and creators across patent and design right applications, using application data from the Japan Patent Office. By interconnecting the inventor and creator identifiers estimated by the trained classification model, we identify design creators who also created patented inventions. Next, an empirical analysis is conducted to characterize the designs created by patent inventors. About half of design rights are found to be created by the same individuals involved in the relevant patents. However, the division of labor between designers (creators of design rights) and engineers (inventors of patents) is advancing, particularly in large firms.
    Date: 2020–01
    URL: http://d.repec.org/n?u=RePEc:eti:dpaper:20005&r=all
  16. By: Sandrine G\"umbel; Thorsten Schmidt
    Abstract: Calibration is a highly challenging task, in particular in multiple yield curve markets. This paper is a first attempt to study the chances and challenges of applying machine learning techniques to this task. We employ Gaussian process regression, a machine learning methodology with many similarities to extended Kalman filtering, a technique which has been applied many times to interest rate markets and term structure models. We find very good results for single-curve markets and many challenges for multi-curve markets in a Vasicek framework. The Gaussian process regression is implemented with the Adam optimizer and the non-linear conjugate gradient method, of which the latter performs best. We also point towards future research.
    Date: 2020–04
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2004.07736&r=all
  17. By: Thibaut Th\'eate; Damien Ernst
    Abstract: This paper presents an innovative approach based on deep reinforcement learning (DRL) to the algorithmic trading problem of determining the optimal trading position at any point in time during a trading activity in stock markets. It proposes a novel DRL trading strategy designed to maximise the resulting Sharpe ratio performance indicator on a broad range of stock markets. Named the Trading Deep Q-Network algorithm (TDQN), this new trading strategy is inspired by the popular DQN algorithm and significantly adapted to the specific algorithmic trading problem at hand. The training of the resulting reinforcement learning (RL) agent is entirely based on the generation of artificial trajectories from a limited set of stock market historical data. In order to objectively assess the performance of trading strategies, the paper also proposes a novel, more rigorous performance assessment methodology. Following this new performance assessment approach, promising results are reported for the TDQN strategy.
    Date: 2020–04
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2004.06627&r=all
  18. By: Masashi Goto (Research Institute for Economics and Business Administration, Kobe University, Japan)
    Abstract: This study explores how professional institutional change is theorized in the context of the emergence of disruptive technology as a precipitating jolt. I conducted a case study of two Big Four accounting firms in Japan and their initiatives to apply artificial intelligence (AI) to their core audit services between 2015 and 2017. The data show how incumbent dominant organizations collaborate and develop social perceptions about the changing but continuing relevance of their profession. The analysis suggests that retheorization can advance even without concrete alternative templates when disruptive technology is perceived to have overwhelming influence, following multi-level steps progressing from internal to external theorization. This article proposes a grounded theory model of the process of professional institutional change: (1) theorizing change internally at the field level; (2) developing solutions through experimentation in organizations; (3) exploring solutions driven by individuals in organizations; and (4) theorizing change externally by organizations. It contributes to the professions and institutional scholarship by expanding our knowledge of the diversity of professional institutional field change processes in an age of increasing technological influence on organizations.
    Keywords: Institutional change; Professions; Artificial intelligence; Qualitative research; Grounded theory
    Date: 2020–03
    URL: http://d.repec.org/n?u=RePEc:kob:dpaper:dp2020-12&r=all
  19. By: Guy Aridor; Yeon-Koo Che; Tobias Salz
    Abstract: This paper studies the effects of the EU’s General Data Protection Regulation (GDPR) on the ability of firms to collect consumer data, identify consumers over time, accrue revenue via online advertising, and predict consumer behavior. Utilizing a novel dataset from an intermediary that spans much of the online travel industry, we perform a difference-in-differences analysis that exploits the geographic reach of GDPR. We find a 12.5% drop in the number of intermediary-observed consumers as a result of GDPR’s new opt-in requirement. At the same time, the remaining consumers are observable for a longer period of time. We provide evidence that this pattern is consistent with the hypothesis that privacy-conscious consumers substitute away from less efficient privacy protection (e.g., cookie deletion) to explicit opt-out, a process that would reduce the number of artificially short consumer histories. Further, in keeping with this hypothesis, we observe that the average value of the remaining consumers to advertisers has increased, offsetting most of the losses from consumers who opt out. Finally, we find that the ability of the intermediary’s proprietary machine learning algorithm to predict consumer behavior does not significantly worsen as a result of the changes induced by GDPR. Our results highlight the externalities that consumer privacy decisions have both for other consumers and for firms.
    JEL: L0 L5 L81
    Date: 2020–03
    URL: http://d.repec.org/n?u=RePEc:nbr:nberwo:26900&r=all
  20. By: Jonathan Sadighian
    Abstract: There has been a recent surge of interest in applying artificial intelligence to automated trading. Reinforcement learning has been applied to single- and multi-instrument use cases, such as market making or portfolio management. This paper proposes a new approach to framing cryptocurrency market making as a reinforcement learning challenge by introducing an event-based environment, wherein an event is defined as a price change, in either direction, greater than a given threshold, as opposed to tick- or time-based events (e.g., every minute, hour, or day). Two policy-based agents are trained to learn a market making trading strategy using eight days of training data, and their performance is evaluated using 30 days of testing data. Limit order book data recorded from the BitMEX exchange are used to validate this approach, which demonstrates improved profit and stability compared to a time-based approach for both agents when using a simple multi-layer perceptron neural network for function approximation and seven different reward functions.
    Date: 2020–04
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2004.06985&r=all
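    The event-based sampling idea described in the abstract above can be sketched as follows. This is a hedged illustration, not the paper's implementation; the prices and threshold are made up.

```python
# Sketch of event-based sampling: emit an observation only when the price
# has moved, in either direction, by more than a threshold since the last
# event, instead of sampling at fixed time intervals.
def price_events(prices, threshold):
    """Yield (index, price) each time |price - last_event_price| > threshold."""
    last = prices[0]
    yield 0, last
    for i, p in enumerate(prices[1:], start=1):
        if abs(p - last) > threshold:
            yield i, p
            last = p

ticks = [100.0, 100.2, 100.1, 101.5, 101.4, 99.0, 99.1]  # hypothetical ticks
events = list(price_events(ticks, threshold=1.0))
# events keeps only the significant moves: indices 0, 3 and 5 in this toy data
```

    An agent observing `events` rather than `ticks` acts only on significant price moves, which is the contrast with time-based environments that the abstract draws.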
  21. By: Daniel Bj\"orkegren; Joshua E. Blumenstock; Samsun Knight
    Abstract: An increasing number of decisions are guided by machine learning algorithms. In many settings, from consumer credit to criminal justice, those decisions are made by applying an estimator to data on an individual's observed behavior. But when consequential decisions are encoded in rules, individuals may strategically alter their behavior to achieve desired outcomes. This paper develops a new class of estimator that is stable under manipulation, even when the decision rule is fully transparent. We explicitly model the costs of manipulating different behaviors, and identify decision rules that are stable in equilibrium. Through a large field experiment in Kenya, we show that decision rules estimated with our strategy-robust method outperform those based on standard supervised learning approaches.
    Date: 2020–04
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2004.03865&r=all
  22. By: Nikolas Dawson; Marian-Andrei Rizoiu; Benjamin Johnston; Mary-Anne Williams
    Abstract: This research develops a Machine Learning approach to predict labor shortages for occupations. We compile a unique dataset that incorporates both Labor Demand and Labor Supply occupational data in Australia from 2012 to 2018. This includes data from 1.3 million job advertisements (ads) and 20 official labor force measures. We use these data as explanatory variables and leverage the XGBoost classifier to predict yearly labor shortage classifications for 132 standardized occupations. The models we construct achieve macro-F1 average performance scores of up to 86 per cent. However, the more significant findings concern the class of features that is most predictive of labor shortage changes. Our results show that job ads data were the most predictive features for year-to-year changes in labor shortages for occupations. These findings are significant because they highlight the predictive value of job ads data when used as proxies for Labor Demand and incorporated into labor market prediction models. This research provides a robust framework for predicting labor shortages and their changes, and has the potential to assist policy-makers and businesses responsible for preparing labor markets for the future of work.
    Date: 2020–04
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2004.01311&r=all
  23. By: Michele Loberto; Andrea Luciani; Marco Pangallo
    Abstract: Traditional data sources for the analysis of housing markets have several limitations that have recently started to be overcome using data from housing sales advertisement (ad) websites. In this paper, using a large dataset of ads in Italy, we provide the first comprehensive analysis of the problems and potential of these data. The main problem is that multiple ads ("duplicates") can correspond to the same housing unit. We show that this issue is mainly caused by sellers' attempts to increase the visibility of their listings. Duplicates lead to misrepresentation of the volume and composition of housing supply, but this bias can be corrected by identifying duplicates with machine learning tools. We then focus on the potential of these data. We show that their timeliness, granularity, and online nature allow monitoring of housing demand, supply, and liquidity, and that the (asking) prices posted on the website can be more informative than transaction prices.
    Date: 2020–04
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2004.02706&r=all
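    The paper identifies duplicate listings with machine learning tools; as a much simpler rule-based stand-in, the sketch below flags two ads as likely duplicates when their descriptions are highly similar and their surface areas match. The field names, threshold, and example ads are assumptions for illustration only.

```python
# Illustrative duplicate-ad detector (not the authors' method): two ads are
# flagged as likely duplicates when they report the same floor area and
# their free-text descriptions are nearly identical.
from difflib import SequenceMatcher

def likely_duplicates(ad_a, ad_b, text_threshold=0.9):
    """Flag ads that plausibly describe the same housing unit."""
    same_size = ad_a["sqm"] == ad_b["sqm"]
    text_sim = SequenceMatcher(
        None, ad_a["description"], ad_b["description"]
    ).ratio()
    return same_size and text_sim >= text_threshold

ad1 = {"sqm": 80, "description": "Bright two-bedroom flat near the station"}
ad2 = {"sqm": 80, "description": "Bright two-bedroom flat near the station!"}
ad3 = {"sqm": 55, "description": "Studio in the city centre"}
```

    Correcting the supply statistics then amounts to counting clusters of mutually duplicated ads once, rather than counting every ad.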
  24. By: MORIKAWA Masayuki
    Abstract: Based on an original survey of Japanese firms, this study presents evidence of the use of recent automation technologies—artificial intelligence (AI), big data analytics, and robotics—and discusses the relationship between these technologies and skilled employees at the firm-level. The result indicates that while the number of firms already using these technologies is small, the number of firms interested in using them is large. The use of AI and big data is positively associated with the share of highly educated employees, particularly those with a postgraduate degree; however, such a relationship is absent in the case of the use of industrial robots in the manufacturing industry. Studies have not distinguished between robotics and other automation technologies, such as AI, but the result suggests a heterogeneous complementarity with high-skilled employees for each type of automation technology.
    Date: 2020–01
    URL: http://d.repec.org/n?u=RePEc:eti:dpaper:20004&r=all

This nep-big issue is ©2020 by Tom Coupé. It is provided as is without any express or implied warranty. It may be freely redistributed in whole or in part for any purpose. If distributed in part, please include this notice.
General information on the NEP project can be found at http://nep.repec.org. For comments please write to the director of NEP, Marco Novarese at <director@nep.repec.org>. Put “NEP” in the subject, otherwise your mail may be rejected.
NEP’s infrastructure is sponsored by the School of Economics and Finance of Massey University in New Zealand.