nep-big 2021-07-19 papers

on Big Data

Issue of 2021‒07‒19
27 papers chosen by
Tom Coupé
University of Canterbury

Role of the Media in the Inflation Expectation Formation Process By Tetiana Yukhymenko
Big Data is Decision Science: the Case of Covid-19 Vaccination By Jacques Bughin; Michele Cincera; Dorota Reykowska; Rafal Ohme
Words Matter: Gender, Jobs and Applicant Behavior By Chaturvedi, Sugat; Mahajan, Kanika; Siddique, Zahra
SUITCEYES Scoping Report on Law and Policy on Deafblindness, Disability and New Technologies: United Kingdom By Woodin, Sarah L.
A Neural Frequency-Severity Model and Its Application to Insurance Claims By Dong-Young Lim
Predicting Exporters with Machine Learning By Francesca Micocci; Armando Rungi
Catching The Drivers of Inclusive Growth In Sub-Saharan Africa: An Application of Machine Learning By Ofori, Isaac K
Estimation and Machine Learning Prediction of Imports of Goods in European Countries in the Period 2010-2019 By Costantiello, Alberto; Laureti, Lucio; Leogrande, Angelo
Expl(AI)ned: The impact of explainable artificial intelligence on cognitive processes By Bauer, Kevin; von Zahn, Moritz; Hinz, Oliver
Does Saving Cause Borrowing? By Paolina C. Medina; Michaela Pagel
Short-term electricity price forecasting models comparative analysis: Machine Learning vs. Econometrics By Antoine Ferré; Guillaume de Certaines; Jérôme Cazelles; Tancrède Cohet; Arash Farnoosh; Frédéric Lantz
Neural network regression for Bermudan option pricing By Bernard Lapeyre; Jérôme Lelong
Heterogeneous Treatment Effects in Regression Discontinuity Designs By Ágoston Reguly
Clustering and attention model based for Intelligent Trading By Mimansa Rana; Nanxiang Mao; Ming Ao; Xiaohui Wu; Poning Liang; Matloob Khushi
Credit scoring using neural networks and SURE posterior probability calibration By Matthieu Garcin; Samuel Stéphan
Causal inference in case-control studies By Sung Jae Jun; Sokbae (Simon) Lee
Image Content, Complexity, and the Market Value of Art By Stephen Sheppard
Shapes as Product Differentiation: Neural Network Embedding in the Analysis of Markets for Fonts By Sukjin Han; Eric H. Schulman; Kristen Grauman; Santhosh Ramakrishnan
Measuring Financial Time Series Similarity With a View to Identifying Profitable Stock Market Opportunities By Rian Dolphin; Barry Smyth; Yang Xu; Ruihai Dong
Big Data Information and Nowcasting: Consumption and Investment from Bank Transactions in Turkey By Ali B. Barlas; Seda Guler Mert; Berk Orkun Isa; Alvaro Ortiz; Tomasa Rodrigo; Baris Soybilgen; Ege Yazgan
Exploiting Symmetry in High-Dimensional Dynamic Programming By Mahdi Ebrahimi Kahou; Jesús Fernández-Villaverde; Jesse Perla; Arnav Sood
End-to-End Risk Budgeting Portfolio Optimization with Neural Networks By Ayse Sinem Uysal; Xiaoyue Li; John M. Mulvey
Exploiting Symmetry in High-Dimensional Dynamic Programming By Mahdi Ebrahimi Kahou; Jesús Fernández-Villaverde; Jesse Perla; Arnav Sood
Mapping urban living standards in developing countries with energy consumption data By Agyemang, Felix; Fox, Sean; Memon, Rashid
Trust predicts compliance to Covid-19 containment policies: evidence from ten countries using big data By Francesco Sarracino; Talita Greyling; Kelsey J. O'Connor; Chiara Peroni; Stephanie Rossouw
The Demand for Executive Skills By Stephen Hansen; Tejas Ramdas; Raffaella Sadun; Joe Fuller
Price change prediction of ultra high frequency financial data based on temporal convolutional network By Wei Dai; Yuan An; Wen Long

Role of the Media in the Inflation Expectation Formation Process

By:	Tetiana Yukhymenko (National Bank of Ukraine)
Abstract:	This research highlights the role played by the media in the inflation expectations formation process of different types of respondents in Ukraine. Using a large news corpus and machine learning techniques I constructed news-based measures transforming text into quantitative indicators, which reflect news topics relevant to inflation expectations. As such, I found evidence that the different news topics have an impact on inflation expectations and can explain part of their variance. Thus, my results can help understand inflation expectations, especially as anchoring inflation expectations remains a key challenge for central banks.
Keywords:	Inflation expectations; natural language processing; textual data; machine learning
JEL:	C55 C82 D84 E31 E58
Date:	2021–06–30
URL:	http://d.repec.org/n?u=RePEc:gii:giihei:heidwp13-2021&r=

Big Data is Decision Science: the Case of Covid-19 Vaccination

By:	Jacques Bughin; Michele Cincera; Dorota Reykowska; Rafal Ohme
Abstract:	Data science has been proven to be an important asset to support better decision-making in a variety of settings, whether it is for a scientist to better predict climate change, for a company to better predict sales, or for a government to anticipate voting preferences. In this research, we leverage Random Forest (RF) as one of the most effective machine learning techniques using big data to predict vaccine intent in five European countries. The findings support the idea that outside of vaccine features, building adequate perception of the risk of contamination, as well as securing institutional and peer trust, are key nudges to convert skeptics to get vaccinated against Covid-19. What machine learning techniques further add beyond traditional regression techniques is some extra granularity in factors affecting vaccine preferences (twice as many factors as logistic regression). Other factors that emerge as predictors of vaccine intent are compliance appetite with non-pharmaceutical protective measures, as well as perception of the crisis duration.
Keywords:	Attitudes, Big data, Covid-19, iCode™, Machine learning techniques, Random Forest, Response time, Vaccination
Date:	2021–07
URL:	http://d.repec.org/n?u=RePEc:ict:wpaper:2013/327150&r=

Words Matter: Gender, Jobs and Applicant Behavior

By:	Chaturvedi, Sugat (Indian Statistical Institute); Mahajan, Kanika (Ashoka University); Siddique, Zahra (University of Bristol)
Abstract:	We examine employer preferences for hiring men vs women using 160,000 job ads posted on an online job portal in India, linked with more than 6 million applications. We apply machine learning algorithms on text contained in job ads to predict an employer's gender preference. We find that advertised wages are lowest in jobs where employers prefer women, even when this preference is implicitly retrieved through the text analysis, and that these jobs also attract a larger share of female applicants. We then systematically uncover what lies beneath these relationships by retrieving words that are predictive of an explicit gender preference, or gendered words, and assigning them to the categories of hard and soft skills, personality traits, and flexibility. We find that skills-related female-gendered words have low returns but attract a higher share of female applicants, while male-gendered words indicating decreased flexibility (e.g., frequent travel or unusual working hours) have high returns but result in a smaller share of female applicants. This contributes to a gender earnings gap. Our findings illustrate how gender preferences are partly driven by stereotypes and statistical discrimination.
Keywords:	gender, job portal, machine learning
JEL:	J16 J63 J71
Date:	2021–06
URL:	http://d.repec.org/n?u=RePEc:iza:izadps:dp14497&r=

SUITCEYES Scoping Report on Law and Policy on Deafblindness, Disability and New Technologies: United Kingdom

By:	Woodin, Sarah L.
Abstract:	This report discusses law and policy on new technologies: artificial intelligence (AI), machine learning and the Internet of Things (IoT) in relation to disabled people and people with deafblindness in the UK. Written as part of the SUITCEYES project, it provides a broad overview of formal rights and the extent to which disabled people can access new technologies in practice. The field is fast moving and volatile, with judgements regularly made and overturned in the courts and frequent new initiatives. The UK government emphasises the importance of investing in new technologies as a means of strengthening the economy. The opportunities represented by technological developments have been largely welcomed by disabled people but questions remain about how the technology might be used and developed by disabled people and people with deafblindness themselves and the need for safeguards against exploitation.
Date:	2020–05–31
URL:	http://d.repec.org/n?u=RePEc:osf:socarx:uv5fe&r=

A Neural Frequency-Severity Model and Its Application to Insurance Claims

By:	Dong-Young Lim
Abstract:	This paper proposes a flexible and analytically tractable class of frequency-severity models based on neural networks to parsimoniously capture important empirical observations. In the proposed two-part model, mean functions of frequency and severity distributions are characterized by neural networks to incorporate the non-linearity of input variables. Furthermore, it is assumed that the mean function of the severity distribution is an affine function of the frequency variable to account for a potential linkage between frequency and severity. We provide explicit closed-form formulas for the mean and variance of the aggregate loss within our modelling framework. Components of the proposed model including parameters of neural networks and distribution parameters can be estimated by minimizing the associated negative log-likelihood functionals with neural network architectures. Furthermore, we leverage the Shapley value and recent developments in machine learning to interpret the outputs of the model. Applications to a synthetic dataset and insurance claims data illustrate that our method outperforms the existing methods in terms of interpretability and predictive accuracy.
Date:	2021–06
URL:	http://d.repec.org/n?u=RePEc:arx:papers:2106.10770&r=
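
Illustrative note: the abstract mentions closed-form formulas for the mean and variance of the aggregate loss but does not state them. As a point of reference only, the sketch below computes the textbook moments for the classical case in which severities are i.i.d. and independent of the claim count. It is not the paper's formula, which additionally lets the severity mean depend on frequency. The function and argument names are assumptions made for this example.

```python
def aggregate_loss_moments(mean_n, var_n, mean_y, var_y):
    """Mean and variance of S = Y_1 + ... + Y_N.

    Assumes (for illustration only) that the severities Y_i are i.i.d.
    and independent of the claim count N.
    """
    mean_s = mean_n * mean_y
    # Law of total variance: E[N] * Var(Y) + Var(N) * E[Y]^2
    var_s = mean_n * var_y + var_n * mean_y ** 2
    return mean_s, var_s


# Hypothetical inputs: Poisson claim count with rate 2 (mean = variance = 2),
# severity with mean 3 and variance 4.
mean_s, var_s = aggregate_loss_moments(2.0, 2.0, 3.0, 4.0)
```

Comparing such independent-case moments with those of a dependent model is one way to see how much the frequency-severity linkage matters.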

Predicting Exporters with Machine Learning

By:	Francesca Micocci; Armando Rungi
Abstract:	In this contribution, we exploit machine learning techniques to predict out-of-sample firms' ability to export based on the financial accounts of both exporters and non-exporters. Therefore, we show how forecasts can be used as exporting scores, i.e., to measure the distance of non-exporters from export status. For our purpose, we train and test various algorithms on the financial reports of 57,021 manufacturing firms in France in 2010-2018. We find that a Bayesian Additive Regression Tree with Missingness In Attributes (BART-MIA) performs better than other techniques with a prediction accuracy of up to 0.90. Predictions are robust to changes in definitions of exporters and in the presence of discontinuous exporters. Eventually, we argue that exporting scores can be helpful for trade promotion, trade credit, and to assess firms' competitiveness. For example, back-of-the-envelope estimates show that a representative firm with just below-average exporting scores needs up to 44% more cash resources and up to 2.5 times more capital expenses to reach full export status.
Date:	2021–07
URL:	http://d.repec.org/n?u=RePEc:arx:papers:2107.02512&r=

Catching The Drivers of Inclusive Growth In Sub-Saharan Africa: An Application of Machine Learning

By:	Ofori, Isaac K
Abstract:	A conspicuous lacuna in the literature on Sub-Saharan Africa (SSA) is the lack of clarity on variables key for driving and predicting inclusive growth. To address this, I train the machine learning algorithms for the Standard lasso, the Minimum Schwarz Bayesian Information Criterion (Minimum BIC) lasso, and the Adaptive lasso to study patterns in a dataset comprising 97 covariates of inclusive growth for 43 SSA countries. First, the regularization results show that only 13 variables are key for driving inclusive growth in SSA. Further, the results show that out of the 13, the poverty headcount (US$1.90) matters most. Second, the findings reveal that ‘Minimum BIC lasso’ is best for predicting inclusive growth in SSA. Policy recommendations are provided in line with the region’s green agenda and the coming into force of the African Continental Free Trade Area.
Keywords:	Clean Fuel; Economic Growth; Machine Learning; Lasso; Sub-Saharan Africa; Regularization; Poverty.
JEL:	C52 C53 C55 C63 C87 F6 O1 O55
Date:	2021
URL:	http://d.repec.org/n?u=RePEc:pra:mprapa:108622&r=
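
Illustrative note: the variable selection reported in the abstract relies on the lasso penalty setting some coefficients exactly to zero. A minimal sketch of that mechanism, using plain coordinate descent with soft-thresholding on centred data without an intercept, might look as follows. The toy data, the penalty value and the function names are assumptions; the paper's own estimation (including the Minimum BIC and adaptive variants) is not reproduced here.

```python
def soft_threshold(rho, lam):
    """Shrink rho toward zero by lam; return 0.0 inside [-lam, lam]."""
    if rho > lam:
        return rho - lam
    if rho < -lam:
        return rho + lam
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_sweeps=100):
    """Minimise (1/(2n)) * ||y - X b||^2 + lam * ||b||_1 (no intercept).

    Assumes every column of X has at least one non-zero entry.
    """
    n, p = len(X), len(X[0])
    b = [0.0] * p
    for _ in range(n_sweeps):
        for j in range(p):
            rho = 0.0
            z = 0.0
            for i in range(n):
                # Fit from all features except j (partial residual idea).
                fit_without_j = sum(X[i][k] * b[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - fit_without_j)
                z += X[i][j] ** 2
            b[j] = soft_threshold(rho / n, lam) / (z / n)
    return b


# Toy data: the outcome depends on the first column only.
X = [[1.0, 1.0], [-1.0, 1.0], [1.0, -1.0], [-1.0, -1.0]]
y = [2.0, -2.0, 2.0, -2.0]
coefficients = lasso_coordinate_descent(X, y, lam=0.5)
```

In this toy case the irrelevant second coefficient is set exactly to zero while the relevant one is shrunk from 2 toward zero, which is the behaviour that reduces a long covariate list to a short one.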

Estimation and Machine Learning Prediction of Imports of Goods in European Countries in the Period 2010-2019

By:	Costantiello, Alberto; Laureti, Lucio; Leogrande, Angelo
Abstract:	In this article we estimate the imports of goods in European countries in the period 2010-2019 for 28 countries. We use Panel Data with Fixed Effects, Panel Data with Random Effects, Pooled OLS, WLS. Our results show that “Imports of Goods” is negatively associated with “Private Consumption Expenditure at Current Prices”, “Consumption of Fixed Capital”, and “Gross Domestic Product” and positively associated with “Harmonised consumer price index” and “Gross Operating Surplus: Total Economy”. Finally, we compare a set of predictive models based on different machine learning techniques using RapidMiner, and we find that “Gradient Boosted Trees”, “Random Forest”, and “Decision Tree” are more efficient than “Deep Learning”, “Generalized Linear Model” and “Support Vector Machine”, in the sense of error minimization, to forecast the degree of “Imports of Goods”.
Keywords:	General Trade, Global Outlook, International Economic Order and Integration, Empirical Studies of Trade, Trade Forecasting and Simulation.
JEL:	F00 F01 F02 F14 F17
Date:	2021–07–05
URL:	http://d.repec.org/n?u=RePEc:pra:mprapa:108663&r=

Expl(AI)ned: The impact of explainable artificial intelligence on cognitive processes

By:	Bauer, Kevin; von Zahn, Moritz; Hinz, Oliver
Abstract:	This paper explores the interplay of feature-based explainable AI (XAI) techniques, information processing, and human beliefs. Using a novel experimental protocol, we study the impact of providing users with explanations about how an AI system weighs inputted information to produce individual predictions (LIME) on users' weighting of information and beliefs about the task-relevance of information. On the one hand, we find that feature-based explanations cause users to alter their mental weighting of available information according to observed explanations. On the other hand, explanations lead to asymmetric belief adjustments that we interpret as a manifestation of the confirmation bias. Trust in the prediction accuracy plays an important moderating role for XAI-enabled belief adjustments. Our results show that feature-based XAI does not only superficially influence decisions but actually changes internal cognitive processes, bearing the potential to manipulate human beliefs and reinforce stereotypes. Hence, the current regulatory efforts that aim at enhancing algorithmic transparency may benefit from going hand in hand with measures ensuring the exclusion of sensitive personal information in XAI systems. Overall, our findings put assertions that XAI is the silver bullet solving all of AI systems' (black box) problems into perspective.
Keywords:	XAI, explainable machine learning, Information Processing, Belief updating, algorithmic transparency
Date:	2021
URL:	http://d.repec.org/n?u=RePEc:zbw:safewp:315&r=

Does Saving Cause Borrowing?

By:	Paolina C. Medina; Michaela Pagel
Abstract:	We study whether savings nudges have the unintended consequence of additional borrowing in high-interest credit. We use data from a pre-registered experiment that encouraged 3.1 million bank customers to save via SMS messages and train a machine learning algorithm to predict individual-level treatment effects. We then focus on individuals who are predicted to save most in response to the intervention and hold credit card debt. We find that these individuals save 5.7% more (61.84 USD per month) but do not change their borrowing: for every additional dollar saved, we can rule out increases of more than two cents in interest expenses.
JEL:	D14 G5
Date:	2021–06
URL:	http://d.repec.org/n?u=RePEc:nbr:nberwo:28956&r=

Short-term electricity price forecasting models comparative analysis: Machine Learning vs. Econometrics

By:	Antoine Ferré (IFPEN - IFP Energies nouvelles, IFP School); Guillaume de Certaines (IFPEN - IFP Energies nouvelles, IFP School); Jérôme Cazelles (IFPEN - IFP Energies nouvelles, IFP School); Tancrède Cohet (IFPEN - IFP Energies nouvelles, IFP School); Arash Farnoosh (IFPEN - IFP Energies nouvelles, IFP School); Frédéric Lantz (IFPEN - IFP Energies nouvelles, IFP School)
Abstract:	This paper gives an overview of several models applied to forecast the day-ahead prices of the German electricity market between 2014 and 2015 using hourly wind and solar production as well as load. Four econometric models were built: SARIMA, SARIMAX, Holt-Winters and Monte Carlo Markov Chain Switching Regimes. Two machine learning approaches were also studied: a Gaussian mixture classification coupled with a random forest (RF) and an LSTM algorithm. The best performances were obtained using the SARIMAX and LSTM models. The SARIMAX model makes good predictions and has the advantage of better capturing price volatility through its explanatory variables. The addition of other explanatory variables could improve the prediction of the models presented. The RF exhibits good results and allows a confidence interval to be built. The LSTM model provides excellent results, but the precise understanding of the functioning of this model is much more complex.
Keywords:	Energy Markets, Renewable Energy, Econometric modelling, Bootstrap Method, Merit-Order effect
Date:	2021–05
URL:	http://d.repec.org/n?u=RePEc:hal:wpaper:hal-03262208&r=

Neural network regression for Bermudan option pricing

By:	Bernard Lapeyre (CERMICS - Centre d'Enseignement et de Recherche en Mathématiques et Calcul Scientifique - ENPC - École des Ponts ParisTech, MATHRISK - Mathematical Risk Handling - UPEM - Université Paris-Est Marne-la-Vallée - ENPC - École des Ponts ParisTech - Inria de Paris - Inria - Institut National de Recherche en Informatique et en Automatique); Jérôme Lelong (DAO - Données, Apprentissage et Optimisation - LJK - Laboratoire Jean Kuntzmann - Inria - Institut National de Recherche en Informatique et en Automatique - CNRS - Centre National de la Recherche Scientifique - UGA - Université Grenoble Alpes - Grenoble INP - Institut polytechnique de Grenoble - Grenoble Institute of Technology - UGA - Université Grenoble Alpes)
Abstract:	The pricing of Bermudan options amounts to solving a dynamic programming principle, in which the main difficulty, especially in high dimension, comes from the conditional expectation involved in the computation of the continuation value. These conditional expectations are classically computed by regression techniques on a finite dimensional vector space. In this work, we study neural network approximations of conditional expectations. We prove the convergence of the well-known Longstaff and Schwartz algorithm when the standard least-squares regression is replaced by a neural network approximation. We illustrate the numerical efficiency of neural networks as an alternative to standard regression methods for approximating conditional expectations on several numerical examples.
Keywords:	Deep learning, Bermudan options, Regression methods, Optimal stopping, Neural networks
Date:	2021
URL:	http://d.repec.org/n?u=RePEc:hal:journl:hal-02183587&r=

Heterogeneous Treatment Effects in Regression Discontinuity Designs

By:	Ágoston Reguly
Abstract:	The paper proposes a supervised machine learning algorithm to uncover treatment effect heterogeneity in classical regression discontinuity (RD) designs. Extending Athey and Imbens (2016), I develop a criterion for building an honest "regression discontinuity tree", where each leaf of the tree contains the RD estimate of a treatment (assigned by a common cutoff rule) conditional on the values of some pre-treatment covariates. It is a priori unknown which covariates are relevant for capturing treatment effect heterogeneity, and it is the task of the algorithm to discover them, without invalidating inference. I study the performance of the method through Monte Carlo simulations and apply it to the data set compiled by Pop-Eleches and Urquiola (2013) to uncover various sources of heterogeneity in the impact of attending a better secondary school in Romania.
Date:	2021–06
URL:	http://d.repec.org/n?u=RePEc:arx:papers:2106.11640&r=
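
Illustrative note: each leaf of the tree described in the abstract holds an RD estimate for one subgroup. A minimal sketch of such a leaf-level estimate for a sharp design is given below, assuming a local linear fit with a uniform kernel on each side of the cutoff. The bandwidth, the toy data and the function names are assumptions, and the tree-building criterion and honest sample splitting of the paper are not reproduced.

```python
def ols_intercept(xs, ys):
    """Intercept of a simple least-squares line fitted to (xs, ys)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x


def sharp_rd_estimate(running, outcome, cutoff, bandwidth):
    """Jump in the fitted outcome at the cutoff (right limit minus left limit)."""
    left_x, left_y, right_x, right_y = [], [], [], []
    for r, y in zip(running, outcome):
        centred = r - cutoff  # centre so the intercept is the value at the cutoff
        if -bandwidth <= centred < 0:
            left_x.append(centred)
            left_y.append(y)
        elif 0 <= centred <= bandwidth:
            right_x.append(centred)
            right_y.append(y)
    return ols_intercept(right_x, right_y) - ols_intercept(left_x, left_y)


# Toy subgroup: outcome is 1 + 2 * r, plus a jump of 3 at the cutoff r = 0.
running = [-0.4, -0.3, -0.2, -0.1, 0.0, 0.1, 0.2, 0.3]
outcome = [1 + 2 * r + (3 if r >= 0 else 0) for r in running]
effect = sharp_rd_estimate(running, outcome, cutoff=0.0, bandwidth=0.5)
```

Running the same function on subsamples defined by pre-treatment covariates gives one estimate per subgroup, which is the quantity a tree of this kind would compare across leaves.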

Clustering and attention model based for Intelligent Trading

By:	Mimansa Rana; Nanxiang Mao; Ming Ao; Xiaohui Wu; Poning Liang; Matloob Khushi
Abstract:	The foreign exchange market has taken an important role in the global financial market. While foreign exchange trading brings high-yield opportunities to investors, it also brings certain risks. Since the establishment of the foreign exchange market in the 20th century, foreign exchange rate forecasting has become a hot issue studied by scholars from all over the world. Due to the complexity and number of factors affecting the foreign exchange market, technical analysis cannot respond to administrative intervention or unexpected events. Our team chose historical data for several foreign currency pairs, together with derived technical indicators, from 2005 to 2021 as the dataset and established different machine learning models for event-driven price prediction in oversold scenarios.
Date:	2021–07
URL:	http://d.repec.org/n?u=RePEc:arx:papers:2107.06782&r=

Credit scoring using neural networks and SURE posterior probability calibration

By:	Matthieu Garcin; Samuel Stéphan
Abstract:	In this article we compare the performances of a logistic regression and a feed-forward neural network for credit scoring purposes. Our results show that the logistic regression gives quite good results on the dataset and that the neural network can slightly improve the performance. We also consider different sets of features in order to assess their importance in terms of prediction accuracy. We found that temporal features (i.e. repeated measures over time) can be an important source of information resulting in an increase in the overall model accuracy. Finally, we introduce a new technique for the calibration of predicted probabilities based on Stein's unbiased risk estimate (SURE). This calibration technique can be applied to very general calibration functions. In particular, we detail this method for the sigmoid function as well as for the Kumaraswamy function, which includes the identity as a particular case. We show that stacking the SURE calibration technique with the classical Platt method can improve the calibration of predicted probabilities.
Date:	2021–07
URL:	http://d.repec.org/n?u=RePEc:arx:papers:2107.07206&r=
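
Illustrative note: the two calibration functions named in the abstract can be written down directly. The sketch below shows a Platt-style sigmoid map and the Kumaraswamy distribution function, and checks that the latter reduces to the identity when both shape parameters equal 1. The parameter names and example values are assumptions, and the SURE-based fitting of these parameters is not reproduced.

```python
import math


def sigmoid_calibration(score, a, b):
    """Platt-style sigmoid map from a raw score to a probability."""
    return 1.0 / (1.0 + math.exp(a * score + b))


def kumaraswamy_calibration(p, a, b):
    """Kumaraswamy distribution function on [0, 1]; a = b = 1 is the identity."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    return 1.0 - (1.0 - p ** a) ** b


raw_probability = 0.3
unchanged = kumaraswamy_calibration(raw_probability, 1.0, 1.0)  # identity case
reshaped = kumaraswamy_calibration(raw_probability, 2.0, 1.0)   # equals p squared
midpoint = sigmoid_calibration(0.0, -1.0, 0.0)                  # score 0 maps to 0.5
```

Because the identity is a special case, a fitted Kumaraswamy map can leave already well-calibrated probabilities untouched, which is presumably why the abstract highlights that property.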

Causal inference in case-control studies

By:	Sung Jae Jun (Institute for Fiscal Studies and Pennsylvania State University); Sokbae (Simon) Lee (Institute for Fiscal Studies and Columbia University)
Abstract:	We investigate identification of causal parameters in case-control and related studies. The odds ratio in the sample is our main estimand of interest and we articulate its relationship with causal parameters under various scenarios. It turns out that the odds ratio is generally a sharp upper bound for counterfactual relative risk under some monotonicity assumptions, without resorting to strong ignorability, nor to the rare-disease assumption. Further, we propose semiparametrically efficient, easy-to-implement, machine-learning-friendly estimators of the aggregated (log) odds ratio by exploiting an explicit form of the efficient influence function. Using our new estimators, we develop methods for causal inference and illustrate the usefulness of our methods by a real-data example.
Date:	2020–05–04
URL:	http://d.repec.org/n?u=RePEc:ifs:cemmap:19/20&r=
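
Illustrative note: the two quantities compared in the abstract can be computed from a 2x2 table. The sketch below uses a hypothetical cohort-style table, where both the odds ratio and the relative risk are observable, to show the odds ratio exceeding the relative risk when exposure raises risk. This is descriptive arithmetic only; it is neither the paper's causal bound nor its estimators, and the counts and function names are assumptions.

```python
def odds_ratio(exposed_cases, exposed_noncases, unexposed_cases, unexposed_noncases):
    """Cross-product ratio of a 2x2 exposure-by-outcome table."""
    return (exposed_cases * unexposed_noncases) / (exposed_noncases * unexposed_cases)


def relative_risk(exposed_cases, exposed_noncases, unexposed_cases, unexposed_noncases):
    """Risk among the exposed divided by risk among the unexposed.

    Only meaningful when the table reflects population risks (cohort-style
    counts), not when sampling is conditioned on the outcome.
    """
    risk_exposed = exposed_cases / (exposed_cases + exposed_noncases)
    risk_unexposed = unexposed_cases / (unexposed_cases + unexposed_noncases)
    return risk_exposed / risk_unexposed


# Hypothetical counts: 20 of 100 exposed and 10 of 100 unexposed are cases.
table = (20, 80, 10, 90)
or_value = odds_ratio(*table)
rr_value = relative_risk(*table)
```

In a case-control sample only the odds ratio is directly estimable, which is why its relationship to the relative risk matters for the identification results the abstract describes.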

Image Content, Complexity, and the Market Value of Art

By:	Stephen Sheppard (Williams College)
Abstract:	This paper presents an approach to measuring the complexity and content of art images that is based on information theory and can be replicated using widely-available analytic tools. The approach is combined with other machine learning algorithms to produce image content measurements for a sample of over 313,000 works offered for sale at auction over the past four decades. The work was produced by 1090 artists employing a variety of styles and using a variety of media and support. Drawing on approaches from economics, mathematics, computer science and psychology, models are estimated to measure the association of image complexity and other image characteristics with the auction price for which the painting was sold. The results support the hypothesis that art buyers have a preference for image complexity and are willing to pay for it. A one standard error increase in the entropy of the image is estimated to be associated with an increased market value of 138%, other factors held equal. We also examine and estimate the impact of faces, likelihood of the image containing racy or adult content, and other content measures. While these don't have as large an estimated impact as image complexity, many of them have large impacts that suggest such measures should be more widely applied in understanding the determinants of the market values of art.
Keywords:	Art Market, Image Processing, Information, Complexity
JEL:	Z11 C81
Date:	2021–07–01
URL:	http://d.repec.org/n?u=RePEc:wil:wileco:2021-08&r=
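
Illustrative note: the abstract refers to the entropy of an image without defining it. One common choice, assumed here, is the Shannon entropy of the greyscale intensity histogram, sketched below. The function name and the toy pixel lists are assumptions, and the paper's actual measure may differ.

```python
import math
from collections import Counter


def shannon_entropy(pixels):
    """Shannon entropy (in bits) of the empirical distribution of pixel values."""
    counts = Counter(pixels)
    n = len(pixels)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


# A flat image carries no information; four equally frequent grey levels carry 2 bits.
flat_image = [128, 128, 128, 128]
varied_image = [0, 85, 170, 255]
flat_entropy = shannon_entropy(flat_image)
varied_entropy = shannon_entropy(varied_image)
```

Under this definition, images that use many intensity levels in similar proportions score as more complex than images dominated by a few tones.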

Shapes as Product Differentiation: Neural Network Embedding in the Analysis of Markets for Fonts

By:	Sukjin Han; Eric H. Schulman; Kristen Grauman; Santhosh Ramakrishnan
Abstract:	Many differentiated products have key attributes that are unstructured and thus high-dimensional (e.g., design, text). Instead of treating unstructured attributes as unobservables in economic models, quantifying them can be important to answer interesting economic questions. To propose an analytical framework for this type of product, this paper considers one of the simplest design products -- fonts -- and investigates merger and product differentiation using an original dataset from the world's largest online marketplace for fonts. We quantify font shapes by constructing embeddings from a deep convolutional neural network. Each embedding maps a font's shape onto a low-dimensional vector. In the resulting product space, designers are assumed to engage in Hotelling-type spatial competition. From the image embeddings, we construct two alternative measures that capture the degree of design differentiation. We then study the causal effects of a merger on the merging firm's creative decisions using the constructed measures in a synthetic control method. We find that the merger causes the merging firm to increase the visual variety of font design. Notably, such effects are not captured when using traditional measures for product offerings (e.g., specifications and the number of products) constructed from structured data.
Date:	2021–07
URL:	http://d.repec.org/n?u=RePEc:arx:papers:2107.02739&r=

Measuring Financial Time Series Similarity With a View to Identifying Profitable Stock Market Opportunities

By:	Rian Dolphin; Barry Smyth; Yang Xu; Ruihai Dong
Abstract:	Forecasting stock returns is a challenging problem due to the highly stochastic nature of the market and the vast array of factors and events that can influence trading volume and prices. Nevertheless it has proven to be an attractive target for machine learning research because of the potential for even modest levels of prediction accuracy to deliver significant benefits. In this paper, we describe a case-based reasoning approach to predicting stock market returns using only historical pricing data. We argue that one of the impediments for case-based stock prediction has been the lack of a suitable similarity metric when it comes to identifying similar pricing histories as the basis for a future prediction -- traditional Euclidean and correlation based approaches are not effective for a variety of reasons -- and in this regard, a key contribution of this work is the development of a novel similarity metric for comparing historical pricing data. We demonstrate the benefits of this metric and the case-based approach in a real-world application in comparison to a variety of conventional benchmarks.
Date:	2021–07
URL:	http://d.repec.org/n?u=RePEc:arx:papers:2107.03926&r=

Big Data Information and Nowcasting: Consumption and Investment from Bank Transactions in Turkey

By:	Ali B. Barlas (BBVA Research); Seda Guler Mert (BBVA Research); Berk Orkun Isa (BBVA Research); Alvaro Ortiz (BBVA Research); Tomasa Rodrigo (BBVA Research); Baris Soybilgen (Bilgi University); Ege Yazgan (Bilgi University)
Abstract:	We use the aggregate information from individual-to-firm and firm-to-firm transactions at Garanti BBVA Bank to mimic domestic private demand. In particular, we replicate the quarterly national accounts aggregate consumption and investment (gross fixed capital formation) and its larger components (Machinery and Equipment and Construction) in real time for the case of Turkey. In order to validate the usefulness of the information derived from these indicators, we test the ability of both indicators to nowcast Turkish GDP using different nowcasting models. The results are successful and confirm the usefulness of Consumption and Investment Banking transactions for nowcasting purposes. The value of the Big data information is more relevant at the beginning of the nowcasting process, when the traditional hard data information is scarce. This makes the information especially relevant for countries where statistical release lags are longer, such as the Emerging Markets.
Date:	2021–07
URL:	http://d.repec.org/n?u=RePEc:arx:papers:2107.03299&r=

Exploiting Symmetry in High-Dimensional Dynamic Programming

By:	Mahdi Ebrahimi Kahou; Jesús Fernández-Villaverde; Jesse Perla; Arnav Sood
Abstract:	We propose a new method for solving high-dimensional dynamic programming problems and recursive competitive equilibria with a large (but finite) number of heterogeneous agents using deep learning. The "curse of dimensionality" is avoided due to four complementary techniques: (1) exploiting symmetry in the approximate law of motion and the value function; (2) constructing a concentration of measure to calculate high-dimensional expectations using a single Monte Carlo draw from the distribution of idiosyncratic shocks; (3) sampling methods to ensure the model fits along manifolds of interest; and (4) selecting the most generalizable over-parameterized deep learning approximation without calculating the stationary distribution or applying a transversality condition. As an application, we solve a global solution of a multi-firm version of the classic Lucas and Prescott (1971) model of "investment under uncertainty." First, we compare the solution against a linear-quadratic Gaussian version for validation and benchmarking. Next, we solve nonlinear versions with aggregate shocks. Finally, we describe how our approach applies to a large class of models in economics.
Keywords:	dynamic programming, deep learning, breaking the curse of dimensionality
JEL:	C45 C60 C63
Date:	2021
URL:	http://d.repec.org/n?u=RePEc:ces:ceswps:_9161&r=
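
Illustrative note: technique (1) in the abstract exploits symmetry across agents, but the abstract gives no functional form. One common way to encode such symmetry, assumed here, is to pool a per-agent feature over the other agents before combining it with the agent's own state, so that the result cannot depend on how the other agents are ordered. The functions `phi` and `rho` below are hypothetical stand-ins; in a deep learning setting they would typically be neural networks.

```python
def symmetric_value(own_state, other_states, phi, rho):
    """Evaluate rho(own_state, mean of phi over the other agents' states)."""
    pooled = sum(phi(s) for s in other_states) / len(other_states)
    return rho(own_state, pooled)


def phi(state):
    """Hypothetical per-agent feature."""
    return state * state


def rho(own_state, pooled_feature):
    """Hypothetical outer function combining own state and pooled feature."""
    return own_state + 0.5 * pooled_feature


# Reordering the other agents leaves the value unchanged.
value_a = symmetric_value(1.0, [1.0, 2.0, 3.0], phi, rho)
value_b = symmetric_value(1.0, [3.0, 1.0, 2.0], phi, rho)
```

Because the pooled feature has a fixed size regardless of the number of agents, a construction of this kind keeps the input dimension of the outer function constant as the number of agents grows.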

End-to-End Risk Budgeting Portfolio Optimization with Neural Networks

By:	Ayse Sinem Uysal; Xiaoyue Li; John M. Mulvey
Abstract:	Portfolio optimization has been a central problem in finance, often approached with two steps: calibrating the parameters and then solving an optimization problem. Yet, the two-step procedure sometimes encounters the "error maximization" problem where inaccuracy in parameter estimation translates to unwise allocation decisions. In this paper, we combine the prediction and optimization tasks in a single feed-forward neural network and implement an end-to-end approach, where we learn the portfolio allocation directly from the input features. Two end-to-end portfolio constructions are included: a model-free network and a model-based network. The model-free approach is seen as a black-box, whereas in the model-based approach, we learn the optimal risk contribution on the assets and solve the allocation with an implicit optimization layer embedded in the neural network. The model-based end-to-end framework provides robust performance in the out-of-sample (2017-2021) tests when maximizing Sharpe ratio is used as the training objective function, achieving a Sharpe ratio of 1.16 when nominal risk parity yields 0.79 and equal-weight fix-mix yields 0.83. Noticing that risk-based portfolios can be sensitive to the underlying asset universe, we develop an asset selection mechanism embedded in the neural network with stochastic gates, in order to prevent the portfolio from being hurt by the low-volatility assets with low returns. The gated end-to-end with filter outperforms the nominal risk-parity benchmarks with a naive filtering mechanism, boosting the Sharpe ratio of the out-of-sample period (2017-2021) to 1.24 in the market data.
Date:	2021–07
URL:	http://d.repec.org/n?u=RePEc:arx:papers:2107.04636&r=
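
As a rough illustration of the "risk contribution" idea behind the nominal risk-parity benchmark named in the abstract above, the sketch below computes each asset's share of portfolio variance. It is not the paper's neural network or its optimization layer. The covariance matrix is made up, and the inverse-volatility rule only equalizes risk contributions here because the example assumes uncorrelated assets.

```python
import math


def portfolio_variance(weights, cov):
    """Return w' * Sigma * w for a list of weights and a nested-list covariance."""
    n = len(weights)
    return sum(weights[i] * cov[i][j] * weights[j]
               for i in range(n) for j in range(n))


def risk_contributions(weights, cov):
    """Fraction of total portfolio variance attributable to each asset."""
    n = len(weights)
    var = portfolio_variance(weights, cov)
    # Marginal risk of asset i is the i-th entry of Sigma * w.
    marginal = [sum(cov[i][j] * weights[j] for j in range(n)) for i in range(n)]
    return [weights[i] * marginal[i] / var for i in range(n)]


def inverse_vol_weights(cov):
    """Weights proportional to 1/volatility (risk parity only if assets are uncorrelated)."""
    inv = [1.0 / math.sqrt(cov[i][i]) for i in range(len(cov))]
    total = sum(inv)
    return [x / total for x in inv]


# Hypothetical two-asset universe: volatilities 20% and 10%, zero correlation.
cov = [[0.04, 0.0],
       [0.0, 0.01]]
w = inverse_vol_weights(cov)
rc = risk_contributions(w, cov)
print("weights:", w)
print("risk contributions:", rc)
```

With correlated assets the equal-risk weights generally have no closed form and must be found numerically, which is the kind of step an embedded optimization layer would handle.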

Exploiting Symmetry in High-Dimensional Dynamic Programming

By:	Mahdi Ebrahimi Kahou; Jesús Fernández-Villaverde; Jesse Perla; Arnav Sood
Abstract:	We propose a new method for solving high-dimensional dynamic programming problems and recursive competitive equilibria with a large (but finite) number of heterogeneous agents using deep learning. The "curse of dimensionality" is avoided due to four complementary techniques: (1) exploiting symmetry in the approximate law of motion and the value function; (2) constructing a concentration of measure to calculate high-dimensional expectations using a single Monte Carlo draw from the distribution of idiosyncratic shocks; (3) sampling methods to ensure the model fits along manifolds of interest; and (4) selecting the most generalizable over-parameterized deep learning approximation without calculating the stationary distribution or applying a transversality condition. As an application, we compute a global solution of a multi-firm version of the classic Lucas and Prescott (1971) model of "investment under uncertainty." First, we compare the solution against a linear-quadratic Gaussian version for validation and benchmarking. Next, we solve nonlinear versions with aggregate shocks. Finally, we describe how our approach applies to a large class of models in economics.
JEL:	C02 E00
Date:	2021–07
URL:	http://d.repec.org/n?u=RePEc:nbr:nberwo:28981&r=
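
To make the symmetry idea in technique (1) of the abstract above concrete, here is a minimal sketch of a function that depends on the other agents' states only through an average of per-agent features, so reordering the agents cannot change its value. The feature map `phi`, the coefficients, and the function name `symmetric_value` are all hypothetical. The abstract does not describe the authors' architecture, so this should not be read as their method.

```python
import math
import random


def phi(state):
    """Hypothetical per-agent feature map applied to each other agent's state."""
    return (state, state * state)


def symmetric_value(own_state, other_states):
    """Toy value function that sees other agents only through averaged features."""
    feats = [phi(s) for s in other_states]
    n = len(feats)
    mean_feats = [sum(f[k] for f in feats) / n for k in range(2)]
    # Made-up coefficients; a learned network would replace this line.
    return math.tanh(0.5 * own_state + 0.3 * mean_feats[0] - 0.1 * mean_feats[1])


random.seed(0)
others = [random.gauss(0.0, 1.0) for _ in range(100)]
shuffled = others[:]
random.shuffle(shuffled)

v_original = symmetric_value(1.0, others)
v_shuffled = symmetric_value(1.0, shuffled)
print(v_original, v_shuffled)  # equal up to floating-point rounding
```

Because the averaged features have a fixed length, the input size of such a function does not grow with the number of agents.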

Mapping urban living standards in developing countries with energy consumption data

By:	Agyemang, Felix; Fox, Sean; Memon, Rashid
Abstract:	Data deficits in developing countries impede evidence-based urban planning and policy, as well as fundamental research. We show that residential electricity consumption data can be used to partially address this challenge by serving as a proxy for relative living standards at the block or neighbourhood scale. We illustrate this potential by combining infrastructure and land use data from OpenStreetMap with georeferenced data from ~2 million residential electricity meters in the megacity of Karachi, Pakistan to map median electricity consumption at block level. Equivalent areal estimates of economic activity derived from high-resolution night lights data (VIIRS) are shown to be a poor predictor of intraurban variation in living standards by comparison. We argue that electricity data are an underutilised source of information that could be used to address empirical questions related to urban poverty and development at relatively high spatial and temporal resolution. Given near universal access to electricity in urban areas globally, this potential is significant.
Date:	2021–06–20
URL:	http://d.repec.org/n?u=RePEc:osf:socarx:razb2&r=

Trust predicts compliance to Covid-19 containment policies: evidence from ten countries using big data

By:	Francesco Sarracino; Talita Greyling; Kelsey J. O'Connor; Chiara Peroni; Stephanie Rossouw
Abstract:	Previous evidence indicates that trust is an important correlate of compliance with Covid-19 containment policies. However, this conclusion hinges on two crucial assumptions: first, that compliance does not change over time, and second, that mobility and self-reported measures are good proxies for compliance. We demonstrate that compliance changes over the period March 2020 to January 2021, in ten mostly European countries, and that increasing (decreasing) trust in others predicts increasing (decreasing) compliance. We develop the first time-varying measure of compliance, which is calculated as the association between containment policies and people’s mobility behavior using data from Oxford Policy Tracker and Google. We also develop new measures of both trust in others and national institutions by applying sentiment analysis to Twitter data. We test the predictive role of trust using a variety of dynamic panel regression techniques. This evidence indicates compliance should not be taken for granted and confirms the importance of cultivating social trust.
Keywords:	compliance; covid-19; trust; big data; Twitter.
JEL:	D91 I18 H12
Date:	2021–06
URL:	http://d.repec.org/n?u=RePEc:usi:wpaper:858&r=

The Demand for Executive Skills

By:	Stephen Hansen; Tejas Ramdas; Raffaella Sadun; Joe Fuller
Abstract:	We use a unique corpus of job descriptions for C-suite positions to document skills requirements in top managerial occupations across a large sample of firms. A novel algorithm maps the text of each executive search into six separate skill clusters reflecting cognitive, interpersonal, and operational dimensions. The data show an increasing relevance of social skills in top managerial occupations, and a greater emphasis on social skills in larger and more information-intensive organizations. The results suggest the need for training, search and governance mechanisms able to facilitate the match between firms and top executives along multiple and imperfectly observable skills.
JEL:	J23 J24 M12
Date:	2021–06
URL:	http://d.repec.org/n?u=RePEc:nbr:nberwo:28959&r=

Price change prediction of ultra high frequency financial data based on temporal convolutional network

By:	Wei Dai; Yuan An; Wen Long
Abstract:	Through in-depth analysis of ultra high frequency (UHF) stock price change data, more reasonable discrete dynamic distribution models are constructed in this paper. First, we classify the price changes into several categories. Then, a temporal convolutional network (TCN) is utilized to predict the conditional probability for each category. Furthermore, an attention mechanism is added to the TCN architecture to model the time-varying distribution of stock price change data. Empirical research on constituent stocks of the Chinese Shenzhen Stock Exchange 100 Index (SZSE 100) found that the TCN framework model and the TCN (attention) framework have a better overall performance than GARCH family models and the long short-term memory (LSTM) framework model for the description of the dynamic process of the UHF stock price change sequence. In addition, the scale of the dataset reached nearly 10 million; to the best of our knowledge, there has been no previous attempt to apply TCN to such a large-scale UHF transaction price dataset in the Chinese stock market.
Date:	2021–07
URL:	http://d.repec.org/n?u=RePEc:arx:papers:2107.00261&r=
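
The building block usually associated with a temporal convolutional network is a causal, dilated convolution, followed here by a softmax that turns scores into category probabilities. The sketch below shows both in plain Python as a reading aid for the abstract above. The kernel values, the scores, and the three category labels are invented for illustration, and the abstract does not give the authors' actual architecture or its attention variant.

```python
import math


def causal_dilated_conv(x, kernel, dilation=1):
    """Compute y[t] = sum_k kernel[k] * x[t - k * dilation], with x treated as 0 before t = 0."""
    out = []
    for t in range(len(x)):
        acc = 0.0
        for k, weight in enumerate(kernel):
            idx = t - k * dilation
            if idx >= 0:  # only current and past values are used (causality)
                acc += weight * x[idx]
        out.append(acc)
    return out


def softmax(scores):
    """Map arbitrary real scores to probabilities that sum to one."""
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical price-change series and a two-tap kernel with dilation 2.
series = [1.0, 2.0, 3.0, 4.0]
y = causal_dilated_conv(series, kernel=[1.0, 1.0], dilation=2)

# Made-up scores for three illustrative categories: down, unchanged, up.
probs = softmax([0.2, 1.5, -0.3])
print(y)
print(probs)
```

Stacking such layers with growing dilation widens the span of past observations each output can see, without ever using future values.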

This nep-big issue is ©2021 by Tom Coupé. It is provided as is without any express or implied warranty. It may be freely redistributed in whole or in part for any purpose. If distributed in part, please include this notice.

General information on the NEP project can be found at http://nep.repec.org. For comments please write to the director of NEP, Marco Novarese at <director@nep.repec.org>. Put “NEP” in the subject, otherwise your mail may be rejected.

NEP’s infrastructure is sponsored by the School of Economics and Finance of Massey University in New Zealand.