Big Data
http://lists.repec.org/mailman/listinfo/nep-big
2019-03-11
Using Artificial Intelligence to Recapture Norms: Did #metoo change gender norms in Sweden?
http://d.repec.org/n?u=RePEc:arx:papers:1903.00690&r=big
Norms are challenging to define and measure, but this paper takes advantage of text data and recent developments in machine learning to create an encompassing measure of norms. An LSTM neural network is trained to detect gendered language. The network serves as a tool to create a measure of how gender norms change in relation to the Metoo movement on Swedish Twitter. This paper shows that gender norms are, on average, less salient half a year after the first appearance of the hashtag #Metoo. Previous literature suggests that gender norms change over generations, but the current result suggests that norms can change in the short run.
Sara Moricz
2019-03
Artificial Intelligence: The Ambiguous Labor Market Impact of Automating Prediction
http://d.repec.org/n?u=RePEc:nbr:nberwo:25619&r=big
Recent advances in artificial intelligence are primarily driven by machine learning, a prediction technology. Prediction is useful because it is an input into decision-making. In order to appreciate the impact of artificial intelligence on jobs, it is important to understand the relative roles of prediction and decision tasks. We describe and provide examples of how artificial intelligence will affect labor, emphasizing differences between when automating prediction leads to automating decisions versus enhancing decision-making by humans.
Ajay Agrawal
Joshua S. Gans
Avi Goldfarb
2019-02
Metrics for Measuring the Performance of Machine Learning Prediction Models: An Application to the Housing Market
http://d.repec.org/n?u=RePEc:grz:wpaper:2019-02&r=big
With the rapid growth of machine learning (ML) methods and datasets to which they can be applied, the question of how one can compare the predictive performance of competing models is becoming an issue of high importance. The existing literature is interdisciplinary, making it hard for users to locate and evaluate the set of available metrics. In this article we collect a number of such metrics from various sources. We classify them by type and then evaluate them with respect to two novel symmetry conditions. While none of these metrics satisfies both conditions, we propose a number of new metrics that do. In total we consider a portfolio of 56 performance metrics. To illustrate the problem of choosing between them, we provide an application in which five ML methods are used to predict apartment prices. We show that the most popular metrics for evaluating performance in the automated valuation model (AVM) literature generate misleading results. A different picture emerges when the full set of metrics is considered, and especially when we focus on the class of metrics with the best symmetry properties. We conclude by recommending four key metrics for evaluating model predictive performance.
Miriam Steurer
Robert Hill
Machine learning; Performance metric; Prediction error; Automated valuation model
2019-02
Liquidity Management of Canadian Corporate Bond Mutual Funds: A Machine Learning Approach
http://d.repec.org/n?u=RePEc:bca:bocsan:19-7&r=big
How do Canadian corporate bond mutual funds meet investor redemptions? We revisit this question using decision tree and random forest algorithms. We uncover new patterns in the decisions made by fund managers: the interaction between a larger, market-wide term spread and relatively less-liquid holdings increases the probability that a fund manager will sell less-liquid assets (corporate bonds) to meet redemptions. The evidence also shows that machine learning algorithms can extract new knowledge that is not apparent using a classical linear modelling approach.
Rohan Arora
Chen Fan
Guillaume Ouellet Leblanc
Financial markets; Financial stability
2019
Syria in the Dark: Estimating the Economic Consequences of the Civil War through Satellite-Derived Night Time Lights
http://d.repec.org/n?u=RePEc:frz:wpaper:wp2019_05.rdf&r=big
The Syrian Civil War began in 2011 and is still wreaking enormous damage on the country's economy, with a heavy toll measured in deaths, migration, and the destruction of Syria's historical heritage and physical infrastructure. This paper examines the impact of the war on Syria's economy from the perspective of outer space, to bypass the lack of data caused by the inaccessibility of the war-ravaged territory. The estimates obtained in this way are more pessimistic than those reported by international organisations. Starting from our estimates, we provide long-term projections for the country's economy and estimate the window for GDP to recover to pre-war levels. We discuss geo-political developments that could prevent our projections from materialising.
Giorgia Giovannetti
Elena Perra
Syria, War, GDP estimates, Night-Lights
2019
Forecasting Economics and Financial Time Series: ARIMA vs. LSTM
http://d.repec.org/n?u=RePEc:arx:papers:1803.06386&r=big
Forecasting time series data is an important subject in economics, business, and finance. Traditionally, several techniques have been used to forecast the next lag of time series data, such as univariate Autoregressive (AR), univariate Moving Average (MA), Simple Exponential Smoothing (SES), and, most notably, Autoregressive Integrated Moving Average (ARIMA) with its many variations. In particular, the ARIMA model has demonstrated strong precision and accuracy in predicting the next lags of time series. With recent advances in computational power and, more importantly, the development of more advanced machine learning approaches such as deep learning, new algorithms have been developed to forecast time series data. The research question investigated in this article is whether and how newly developed deep learning-based algorithms for forecasting time series data, such as Long Short-Term Memory (LSTM), are superior to the traditional algorithms. The empirical studies conducted and reported in this article show that deep learning-based algorithms such as LSTM outperform traditional algorithms such as the ARIMA model. More specifically, the average reduction in error rates obtained by LSTM is between 84 and 87 percent when compared to ARIMA, indicating the superiority of LSTM over ARIMA. Furthermore, the number of training passes, known as "epochs" in deep learning, had no effect on the performance of the trained forecast model, which exhibited truly random behavior.
Sima Siami-Namini
Akbar Siami Namin
2018-03
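As a rough illustration of what an "84 to 87 percent reduction in error rates" means in the ARIMA vs. LSTM abstract above, the sketch below computes a percentage reduction between two error values. It assumes root-mean-square error (RMSE) as the error measure, which the abstract does not state, and the numbers are made up for illustration; they are not results from the paper.

```python
import math


def rmse(actual, predicted):
    """Root-mean-square error between two equal-length sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def percent_reduction(baseline_error, new_error):
    """Percentage by which new_error improves on baseline_error."""
    if baseline_error <= 0:
        raise ValueError("baseline error must be positive")
    return 100.0 * (baseline_error - new_error) / baseline_error


# Hypothetical error values, NOT taken from the paper.
baseline_rmse = 10.0   # stand-in for an ARIMA forecast error
candidate_rmse = 1.5   # stand-in for an LSTM forecast error
reduction = percent_reduction(baseline_rmse, candidate_rmse)
```

With these illustrative inputs the candidate's error is 85 percent lower than the baseline's, which falls inside the range the abstract reports.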
Big Data et pratiques de GRH
http://d.repec.org/n?u=RePEc:hal:journl:halshs-01961214&r=big
Big Data is a phenomenon that now permeates many fields: marketing, biology, justice… The common definition of Big Data, drawn from Gartner's 2001 report, rests mainly on the characteristics of the data involved: volume, heterogeneity of sources and of degree of structure, real-time updating of the data… This definition may seem restrictive, but other, more recent and more encompassing definitions make it possible to identify several tools that bring Big Data into HR. Drawing on the notions of HRM tools and HRM practices, and focusing on three tools that use HR data, we seek to characterise the changes to HRM practices that HR Big Data tools aim to bring about, centred on personalisation and prediction.
Clotilde Coron
Big Data, HR, HRM tools, HRM practices
2019
Identifying Bid Leakage In Procurement Auctions: Machine Learning Approach
http://d.repec.org/n?u=RePEc:arx:papers:1903.00261&r=big
We propose a novel machine-learning-based approach to detect bid leakage in first-price sealed-bid auctions. We extract and analyze the data on more than 1.4 million Russian procurement auctions between 2014 and 2018. As bid leakage in each particular auction is tacit, the direct classification is impossible. Instead, we reduce the problem of bid leakage detection to Positive-Unlabeled Classification. The key idea is to regard the losing participants as fair and the winners as possibly corrupted. This allows us to estimate the prior probability of bid leakage in the sample, as well as the posterior probability of bid leakage for each specific auction. We find that at least 16% of auctions are exposed to bid leakage. Bid leakage is more likely in auctions with a higher reserve price, lower number of bidders and lower price fall, and where the winning bid is received in the last hour before the deadline.
Dmitry I. Ivanov
Alexander S. Nesterov
2019-03
Conditional Density Estimation with Neural Networks: Best Practices and Benchmarks
http://d.repec.org/n?u=RePEc:arx:papers:1903.00954&r=big
Given a set of empirical observations, conditional density estimation aims to capture the statistical relationship between a conditional variable $\mathbf{x}$ and a dependent variable $\mathbf{y}$ by modeling their conditional probability $p(\mathbf{y}|\mathbf{x})$. The paper develops best practices for conditional density estimation for finance applications with neural networks, grounded in mathematical insights and empirical evaluations. In particular, we introduce a noise regularization and data normalization scheme, alleviating problems with over-fitting, initialization and hyper-parameter sensitivity of such estimators. We compare our proposed methodology with popular semi- and non-parametric density estimators, demonstrate its effectiveness in various benchmarks on simulated and Euro Stoxx 50 data, and show its superior performance. Our methodology makes it possible to obtain high-quality estimators for statistical expectations of higher moments, quantiles and non-linear return transformations, with very few assumptions about the return dynamics.
Jonas Rothfuss
Fabio Ferreira
Simon Walther
Maxim Ulrich
2019-03
Narratives About Technology-Induced Job Degradation Then and Now
http://d.repec.org/n?u=RePEc:cwl:cwldpp:2168&r=big
Concerns that technological progress degrades job opportunities have been expressed over much of the last two centuries by both professional economists and the general public. These concerns can be seen in narratives both in scholarly publications and in the news media. Part of the expressed concern about jobs has been about the potential for increased economic inequality. But another part of the concern has been about a perceived decline in job quality in terms of its effects on monotony vs creativity of work, individual sense of identity, power to act independently, and meaning of life. Public policy should take account of both of these concerns, inequality and job quality.
Robert J. Shiller
Labor-saving machines, Artificial intelligence, History of thought, Division of labor, Unemployment, Automation, Robotics
2019-02
Model Selection in Utility-Maximizing Binary Prediction
http://d.repec.org/n?u=RePEc:arx:papers:1903.00716&r=big
The semiparametric maximum utility estimation proposed by Elliott and Lieli (2013) can be viewed as cost-sensitive binary classification; thus, its in-sample overfitting issue is similar to that of perceptron learning in the machine learning literature. Based on structural risk minimization, a utility-maximizing prediction rule (UMPR) is constructed to alleviate the in-sample overfitting of the maximum utility estimation. We establish non-asymptotic upper bounds on the difference between the maximal expected utility and the generalized expected utility of the UMPR. Simulation results show that the UMPR with an appropriate data-dependent penalty outperforms some common estimators in binary classification if the conditional probability of the binary outcome is misspecified, or a decision maker's preference is ignored.
Jiun-Hua Su
2019-03
Gaussian Process Regression for Pricing Variable Annuities with Stochastic Volatility and Interest Rate
http://d.repec.org/n?u=RePEc:arx:papers:1903.00369&r=big
In this paper we develop an efficient approach based on a Machine Learning technique which allows one to quickly evaluate insurance products under stochastic volatility and interest rates. Specifically, following De Spiegeleer et al., we apply Gaussian Process Regression to compute the price and the Greeks of a GMWB Variable Annuity. Starting from observed prices previously computed by means of a Hybrid Tree PDE approach for some known combinations of model parameters, it is possible to approximate the whole target function on a bounded domain. The regression algorithm consists of two main steps: algorithm training and evaluation. The training step is the most time-consuming but needs to be performed only once, while the prediction step is very fast and is performed only when the function is evaluated. In addition to computing prices and Greeks, the method can be used to compute the no-arbitrage fee, a common practice in the Variable Annuities sector. We consider three models of increasing complexity, namely the Black-Scholes, the Heston and the Heston Hull-White models, which progressively extend the sources of randomness to include both stochastic volatility and stochastic interest rates. Numerical experiments show that the accuracy of the estimated values is high, while the computational cost is much lower than that required by a direct calculation with standard approaches. Finally, we stress that the analysis is carried out for a GMWB annuity but could be generalized to other insurance products. Machine Learning seems to be a very promising and interesting tool for insurance risk management.
Ludovic Goudenège
Andrea Molent
Antonino Zanette
2019-03
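The "train once, then evaluate quickly" pattern described in the Gaussian Process Regression abstract above can be sketched in a few lines. This is a minimal illustration and not the authors' method: it assumes a zero-mean process with a squared-exponential kernel and a fixed length scale, works on a single scalar input, and uses a hypothetical `slow_pricer` function (here just a quadratic) as a stand-in for an expensive pricing routine such as the Hybrid Tree PDE approach.

```python
import math


def rbf_kernel(a, b, length_scale):
    """Squared-exponential kernel between two scalar inputs."""
    return math.exp(-0.5 * ((a - b) / length_scale) ** 2)


def cholesky(matrix):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            partial = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                lower[i][j] = math.sqrt(matrix[i][i] - partial)
            else:
                lower[i][j] = (matrix[i][j] - partial) / lower[j][j]
    return lower


def cholesky_solve(lower, rhs):
    """Solve (L L^T) w = rhs by forward then backward substitution."""
    n = len(rhs)
    z = [0.0] * n
    for i in range(n):
        z[i] = (rhs[i] - sum(lower[i][k] * z[k] for k in range(i))) / lower[i][i]
    w = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(lower[k][i] * w[k] for k in range(i + 1, n))
        w[i] = (z[i] - tail) / lower[i][i]
    return w


def train(xs, ys, length_scale=0.5, jitter=1e-8):
    """Costly step, done once: solve (K + jitter * I) weights = ys."""
    gram = [[rbf_kernel(a, b, length_scale) + (jitter if i == j else 0.0)
             for j, b in enumerate(xs)] for i, a in enumerate(xs)]
    return cholesky_solve(cholesky(gram), ys)


def predict(x_new, xs, weights, length_scale=0.5):
    """Cheap step: posterior mean is a weighted sum of kernel values."""
    return sum(w * rbf_kernel(x_new, x, length_scale) for x, w in zip(xs, weights))


def slow_pricer(x):
    """Hypothetical stand-in for an expensive pricing routine."""
    return x ** 2


train_x = [0.0, 0.5, 1.0, 1.5, 2.0]
train_y = [slow_pricer(x) for x in train_x]
weights = train(train_x, train_y)          # performed once
approx = predict(1.25, train_x, weights)   # fast, repeatable evaluation
```

Once `weights` is computed, each call to `predict` costs only one kernel evaluation per training point, which is the source of the speed-up the abstract describes. A real application would work over several model parameters at once and tune the kernel hyper-parameters rather than fixing them.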
On binscatter
http://d.repec.org/n?u=RePEc:fip:fednsr:881&r=big
Binscatter is very popular in applied microeconomics. It provides a flexible, yet parsimonious way of visualizing and summarizing “big data” in regression settings, and it is often used for informal testing of substantive hypotheses such as linearity or monotonicity of the regression function. This paper presents a foundational, thorough analysis of binscatter: We give an array of theoretical and practical results that aid both in understanding current practices (that is, their validity or lack thereof) and in offering theory-based guidance for future applications. Our main results include principled selection of the number of bins, confidence intervals and bands, hypothesis tests for parametric and shape restrictions of the regression function, and several other new methods, applicable to canonical binscatter as well as higher-order polynomial, covariate-adjusted, and smoothness-restricted extensions thereof. In particular, we highlight important methodological problems related to covariate adjustment methods used in current practice. We also discuss extensions to clustered data. Our results are illustrated with simulated and real data throughout. Companion general-purpose software packages for Stata and R are provided. Finally, from a technical perspective, new theoretical results for partitioning-based series estimation are obtained that may be of independent interest.
Cattaneo, Matias D.
Crump, Richard K.
Farrell, Max H.
Feng, Yingjie
binned scatter plot; regressogram; piecewise polynomials; splines; partitioning estimators; nonparametric regression; robust bias correction; uniform inference; binning selection
2019-02-01
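For readers unfamiliar with the plot discussed in the binscatter abstract above, the canonical construction can be sketched as follows: sort the observations by x, split them into bins with roughly equal counts, and plot the mean of y against the mean of x within each bin. This sketch assumes equal-count bins with a user-supplied number of bins; it does not implement the paper's bin selection, confidence bands, or covariate adjustment, and it does not use the companion Stata or R packages. The function name `binscatter_points` is hypothetical.

```python
from statistics import mean


def binscatter_points(x, y, n_bins):
    """Return one (mean x, mean y) pair per equal-count bin of x."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    if not 1 <= n_bins <= len(x):
        raise ValueError("n_bins must be between 1 and the sample size")
    pairs = sorted(zip(x, y))  # order observations by x (ties broken by y)
    n = len(pairs)
    points = []
    for b in range(n_bins):
        # Integer arithmetic spreads any remainder across the bins.
        chunk = pairs[b * n // n_bins:(b + 1) * n // n_bins]
        points.append((mean([p[0] for p in chunk]),
                       mean([p[1] for p in chunk])))
    return points


# Toy data on an exact line, y = 2x, for illustration only.
xs = list(range(10))
ys = [2 * v for v in xs]
points = binscatter_points(xs, ys, n_bins=2)
```

On this toy data the two bins cover x = 0..4 and x = 5..9, so the plotted points lie on the same line as the raw data. With noisy data the bin means smooth out the scatter, which is why the choice of the number of bins matters.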