nep-big 2020-08-24 papers

on Big Data

Issue of 2020‒08‒24
25 papers chosen by
Tom Coupé
University of Canterbury

Social Capital, Networks, and Economic Wellbeing By Hellerstein, Judith K.; Neumark, David
Which bills are lobbied? Predicting and interpreting lobbying activity in the US. By Ivan Slobozhan; Peter Ormosi; Rajesh Sharma
A Novel Ensemble Deep Learning Model for Stock Prediction Based on Stock Prices and News By Yang Li; Yi Pan
Du concept à la mise en œuvre du machine learning dans les entreprises : L'expérience de Datapred By Henri Bourdeau; Corentin Petit; Christophe Midler
China's Missing Pigs: Correcting China's Hog Inventory Data Using a Machine Learning Approach By Yongtong Shao; Minghao Li; Dermot J. Hayes; Wendong Zhang; Tao Xiong; Wei Xie
Nvidia’s stock returns prediction using machine learning techniques for time series forecasting problem By Marcin Chlebus; Michał Dyczko; Michał Woźniak
A Note on the Interpretability of Machine Learning Algorithms By Dominique Guégan
A Note on the Interpretability of Machine Learning Algorithms By Dominique Guégan
AI in FinTech: A Research Agenda By Longbing Cao
Machine Learning approach for Credit Scoring By A. R. Provenzano; D. Trifirò; A. Datteo; L. Giada; N. Jean; A. Riciputi; G. Le Pera; M. Spadaccino; L. Massaron; C. Nordio
Misogynistic and xenophobic hate language online: a matter of anonymity By von Essen, Emma; Jansson, Joakim
Data Mining and Machine Learning Techniques for Cyber Security Intrusion Detection By K., Sai Manoj; Aithal, Sreeramana
Macroeconomic Data Transformations Matter By Philippe Goulet Coulombe; Maxime Leroux; Dalibor Stevanovic; Stéphane Surprenant
The Mode Treatment Effect By Neng-Chieh Chang
Data-Driven Option Pricing using Single and Multi-Asset Supervised Learning By Anindya Goswami; Sharan Rajani; Atharva Tanksale
Local mortality estimates during the COVID-19 pandemic in Italy By Augusto Cerqua; Roberta Di Stefano; Marco Letta; Sara Miccoli
Google Correlate y Google Trends como herramientas para realizar un nowcast de las ventas minoristas By María Florencia Camusso; Ramiro Emmanuel Jorge
Competing Models By Montiel Olea, José Luis; Ortoleva, Pietro; Pai, Mallesh; Prat, Andrea
Hybrid ARFIMA Wavelet Artificial Neural Network Model for DJIA Index Forecasting By Heni Boubaker; Giorgio Canarella; Rangan Gupta; Stephen M. Miller
HRP performance comparison in portfolio optimization under various codependence and distance metrics By Illya Barziy; Marcin Chlebus
Multi-stream RNN for Merchant Transaction Prediction By Zhongfang Zhuang; Chin-Chia Michael Yeh; Liang Wang; Wei Zhang; Junpeng Wang
Relative wealth concerns with partial information and heterogeneous priors By Chao Deng; Xizhi Su; Chao Zhou
Big Data for Sampling Design : The Venezuelan Migration Crisis in Ecuador By Munoz,Juan Eduardo; Gallegos Munoz,Jose Victor; Olivieri,Sergio Daniel
Weighted Accuracy Algorithmic Approach In Counteracting Fake News And Disinformation By Kwadwo Osei Bonsu
Deep xVA solver - A neural network based counterparty credit risk management framework By Alessandro Gnoatto; Athena Picarelli; Christoph Reisinger

Social Capital, Networks, and Economic Wellbeing

By:	Hellerstein, Judith K. (University of Maryland); Neumark, David (University of California, Irvine)
Abstract:	One definition of social capital is the "networks of relationships among people who live and work in a particular society, enabling that society to function effectively". This definition of social capital highlights two key features. First, it refers to connections between people, shifting our focus from characteristics of individuals and families to the ties between them. Second, it emphasizes that social capital is present not simply because individuals are connected, but rather when these network relationships lead to productive social outcomes. In that sense, social capital is productive capital, in the same way that economists think of physical capital or human capital as productive capital. Social capital, under this definition, is still very broad. Networks can be formed along many dimensions of society in which people interact – neighborhoods, workplaces, extended families, schools, etc. We focus on networks whose existence fosters social capital in one specific way: by facilitating the transfer of information that helps improve the economic wellbeing of network members, especially via better labor market outcomes. We review evidence showing that networks play this important role in labor market outcomes, as well as in other outcomes related to economic wellbeing, paying particular attention to evidence of how networks can help less-skilled individuals. We also discuss the measurement of social capital, including new empirical methods in machine learning that might provide new evidence on the underlying connections that do – or might – lead to productive networks. Throughout, we discuss the policy implications of what we know so far about networks and social capital.
Keywords:	social capital, networks
JEL:	J1 J8
Date:	2020–06
URL:	http://d.repec.org/n?u=RePEc:iza:izadps:dp13413&r=all

Which bills are lobbied? Predicting and interpreting lobbying activity in the US.

By:	Ivan Slobozhan (Institute of Computer Science, University of Tartu); Peter Ormosi (Centre for Competition Policy and Norwich Business School, University of East Anglia); Rajesh Sharma (Institute of Computer Science, University of Tartu)
Abstract:	Using lobbying data from OpenSecrets.org, we offer several experiments applying machine learning techniques to predict whether a piece of legislation (US bill) has been subjected to lobbying activities. We also investigate how the intensity of the lobbying activity affects how discernible a lobbied bill is from one that was not subject to lobbying. We compare the performance of a number of different models (logistic regression, random forest, CNN and LSTM) and text embedding representations (BOW, TF-IDF, GloVe, Law2Vec). We report ROC AUC scores above 0.85 and 78% accuracy. Model performance improves significantly (0.95 ROC AUC and 88% accuracy) when only bills with higher lobbying intensity are considered. We also propose a method that could be used for unlabelled data. Through this we show that there is a considerably large number of previously unlabelled US bills where our predictions suggest that some lobbying activity took place. We believe our method could potentially contribute to the enforcement of the US Lobbying Disclosure Act (LDA) by indicating the bills that were likely to have been affected by lobbying but were not filed as such.
Keywords:	lobbying; rent seeking; text classification; US bills
Date:	2020–01–01
URL:	http://d.repec.org/n?u=RePEc:uea:ueaccp:2020_03&r=all
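
The two metrics quoted in the abstract above measure different things: accuracy depends on a classification threshold, while ROC AUC is the probability that a randomly chosen lobbied bill receives a higher score than a randomly chosen non-lobbied one, so it lies between 0 and 1. The following is a minimal sketch of that definition. The labels and scores are made up for illustration and are not taken from the paper, and the 0.5 threshold is an assumption.

```python
def roc_auc(labels, scores):
    """Share of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def accuracy(labels, scores, threshold=0.5):
    """Share of items whose thresholded score matches the label."""
    hits = sum(1 for l, s in zip(labels, scores) if (s >= threshold) == (l == 1))
    return hits / len(labels)


# Hypothetical data: 1 = lobbied bill, 0 = not lobbied.
labels = [1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]

auc = roc_auc(labels, scores)
acc = accuracy(labels, scores)
print(auc, acc)
```

Because ROC AUC only uses the ranking of the scores, it does not change when the threshold moves, whereas accuracy does.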

A Novel Ensemble Deep Learning Model for Stock Prediction Based on Stock Prices and News

By:	Yang Li; Yi Pan
Abstract:	In recent years, machine learning and deep learning have become popular methods for financial data analysis, including financial textual data, numerical data, and graphical data. This paper proposes to use sentiment analysis to extract useful information from multiple textual data sources and a blending ensemble deep learning model to predict future stock movement. The blending ensemble model contains two levels. The first level contains two Recurrent Neural Networks (RNNs), one Long-Short Term Memory network (LSTM) and one Gated Recurrent Units network (GRU), followed by a fully connected neural network as the second level model. The RNNs, LSTM, and GRU models can effectively capture the time-series events in the input data, and the fully connected neural network is used to ensemble several individual prediction results to further improve the prediction accuracy. The purpose of this work is to explain our design philosophy and show that ensemble deep learning technologies can truly predict future stock price trends more effectively and can better assist investors in making the right investment decision than other traditional methods.
Date:	2020–07
URL:	http://d.repec.org/n?u=RePEc:arx:papers:2007.12620&r=all

Du concept à la mise en œuvre du machine learning dans les entreprises : L'expérience de Datapred

By:	Henri Bourdeau (DataPred); Corentin Petit (DataPred); Christophe Midler (i3-CRG - Centre de recherche en gestion i3 - X - École polytechnique - Université Paris-Saclay - CNRS - Centre National de la Recherche Scientifique)
Abstract:	Artificial Intelligence is the subject of exceptional enthusiasm in industry, but in practice there is little concrete data to attest to the conditions and results of its implementation. The PIC Master's project made it possible to experiment with the concrete deployment of a machine learning offer at Datapred, a leading company in the field of time-series data processing. The lessons learned are threefold. First, the importance of the Proof of Concept (POC) phase in the implementation of these new technologies: it is a key step in establishing the credibility of their concrete effectiveness for customers, and a major step for data analysis experts in learning the business issues addressed. Second, the magnitude of the distance between the promise of a successful POC and the sale of finalized AI software, which is due to the complex decision-making processes at large corporate customers as well as to the coordination of the work of POC engineers and product developers in the software design company. Finally, the need, in order to cross this distance, to make changes in the strategy and organization of the AI company, relating to its software design, the organization of its R&D and its business model.
Keywords:	Machine Learning, Proof of Concept, Digital transition, Growth Strategy, Artificial Intelligence, POC, software design
Date:	2019–06
URL:	http://d.repec.org/n?u=RePEc:hal:journl:hal-02873935&r=all

China's Missing Pigs: Correcting China's Hog Inventory Data Using a Machine Learning Approach

By:	Yongtong Shao; Minghao Li; Dermot J. Hayes (Center for Agricultural and Rural Development (CARD)); Wendong Zhang (Center for Agricultural and Rural Development (CARD)); Tao Xiong; Wei Xie
Abstract:	Small sample size often limits forecasting tasks such as the prediction of production, yield, and consumption of agricultural products. Machine learning offers an appealing alternative to traditional forecasting methods. In particular, Support Vector Regression has superior forecasting performance in small sample applications. In this article, we introduce Support Vector Regression via an application to China's hog market. Since 2014, China's hog inventory data has experienced an abnormal decline that contradicts price and consumption trends. We use Support Vector Regression to predict the true inventory based on the price-inventory relationship before 2014. We show that, in this application with a small sample size, Support Vector Regression out-performs neural networks, random forest, and linear regression. Predicted hog inventory decreased by 3.9% from November 2013 to September 2017, instead of the 25.4% decrease in the reported data.
Date:	2020–08
URL:	http://d.repec.org/n?u=RePEc:ias:cpaper:20-wp607&r=all

Nvidia’s stock returns prediction using machine learning techniques for time series forecasting problem

By:	Marcin Chlebus (Faculty of Economic Sciences, University of Warsaw); Michał Dyczko (Faculty of Mathematics and Computer Science, Warsaw University of Technology); Michał Woźniak (Faculty of Economic Sciences, University of Warsaw)
Abstract:	The main aim of this paper is to predict the daily stock returns of Nvidia Corporation, which is quoted on the Nasdaq Stock Market. The main difficulties are the statistical properties of returns (the series may turn out to be white noise) and the need to apply many atypical machine learning methods to handle the influence of the time factor. The study period covers 07/2012 - 12/2018. The models used are SVR, KNN, XGBoost, LightGBM, LSTM, ARIMA, and ARIMAX. The features come from the following classes: technical analysis, fundamental analysis, Google Trends entries, and markets related to Nvidia. The results show empirically that it is possible to construct a prediction model of Nvidia daily returns that outperforms a simple naive model. The best performance was obtained by SVR based on stationary attributes. In general, models based on stationary variables perform better than models based on both stationary and non-stationary variables. An ensemble approach designed especially for time series failed to improve forecast precision. Overall, machine learning models applied to time series problems with various classes of explanatory variables appear to bring good results.
Keywords:	nvidia, stock returns, machine learning, technical analysis, fundamental analysis, google trends, stationarity, ensembling
JEL:	C32 C38 C44 C51 C52 C61 C65 G11 G15
Date:	2020
URL:	http://d.repec.org/n?u=RePEc:war:wpaper:2020-22&r=all

A Note on the Interpretability of Machine Learning Algorithms

By:	Dominique Guégan (UP1 - Université Panthéon-Sorbonne, CES - Centre d'économie de la Sorbonne - UP1 - Université Panthéon-Sorbonne - CNRS - Centre National de la Recherche Scientifique, University of Ca’ Foscari [Venice, Italy])
Abstract:	We are interested in the analysis of the concept of interpretability associated with an ML algorithm. We distinguish between the "How", i.e., how a black box or a very complex algorithm works, and the "Why", i.e., why an algorithm produces such a result. These questions are of interest to many actors: users, professions, and regulators, among others. Using a formal standardized framework, we indicate the solutions that exist by specifying which elements of the supply chain are impacted when we provide answers to the previous questions. By standardizing the notation, this presentation makes it possible to compare the different approaches and to highlight the specificities of each of them, both their objective and their process. The study is not exhaustive and the subject is far from being closed.
Keywords:	Interpretability,Counterfactual approach,Artificial Intelligence,Agnostic models,LIME method,Machine learning
Date:	2020–07
URL:	http://d.repec.org/n?u=RePEc:hal:cesptp:halshs-02900929&r=all

A Note on the Interpretability of Machine Learning Algorithms

By:	Dominique Guégan (Université Paris1 Panthéon-Sorbonne, Centre d'Economie de la Sorbonne, - Ca' Foscari University of Venezia)
Abstract:	We are interested in the analysis of the concept of interpretability associated with an ML algorithm. We distinguish between the "How", i.e., how a black box or a very complex algorithm works, and the "Why", i.e., why an algorithm produces such a result. These questions are of interest to many actors: users, professions, and regulators, among others. Using a formal standardized framework, we indicate the solutions that exist by specifying which elements of the supply chain are impacted when we provide answers to the previous questions. By standardizing the notation, this presentation makes it possible to compare the different approaches and to highlight the specificities of each of them, both their objective and their process. The study is not exhaustive and the subject is far from being closed.
Keywords:	Agnostic models; Artificial Intelligence; Counterfactual approach; Interpretability; LIME method; Machine learning
JEL:	C K
Date:	2020–07
URL:	http://d.repec.org/n?u=RePEc:mse:cesdoc:20012&r=all

AI in FinTech: A Research Agenda

By:	Longbing Cao
Abstract:	Smart FinTech has emerged as a new area that synthesizes and transforms AI and finance, and broadly data science, machine learning, economics, etc. Smart FinTech also transforms and drives new economic and financial businesses, services and systems, and plays an increasingly important role in economy, technology and society transformation. This article presents a highly summarized research overview of smart FinTech, including FinTech businesses and challenges, various FinTech-associated data and repositories, FinTech-driven business decision and optimization, areas in smart FinTech, and research methods and techniques for smart FinTech.
Date:	2020–07
URL:	http://d.repec.org/n?u=RePEc:arx:papers:2007.12681&r=all

Machine Learning approach for Credit Scoring

By:	A. R. Provenzano; D. Trifirò; A. Datteo; L. Giada; N. Jean; A. Riciputi; G. Le Pera; M. Spadaccino; L. Massaron; C. Nordio
Abstract:	In this work we build a stack of machine learning models aimed at composing a state-of-the-art credit rating and default prediction system, obtaining excellent out-of-sample performance. Our approach is an excursion through the most recent ML / AI concepts. It starts from natural language processing (NLP) applied to (textual) descriptions of economic sectors using embeddings and autoencoders (AE). It then classifies defaultable firms on the basis of a wide range of economic features using gradient boosting machines (GBM) and calibrates their probabilities, paying due attention to the treatment of unbalanced samples. Finally, we assign credit ratings through genetic algorithms (differential evolution, DE). Model interpretability is achieved by implementing recent techniques such as SHAP and LIME, which explain predictions locally in feature space.
Date:	2020–07
URL:	http://d.repec.org/n?u=RePEc:arx:papers:2008.01687&r=all

Misogynistic and xenophobic hate language online: a matter of anonymity

By:	von Essen, Emma (Swedish Institute for Social Research, Stockholm University); Jansson, Joakim (Linnaeus University)
Abstract:	In this paper, we quantify hateful content in online civic discussions of politics and estimate the causal link between hateful content and writer anonymity. To measure hate, we first develop a supervised machine-learning model that predicts hate against foreign residents and hate against women on a dominant Swedish Internet discussion forum. We find that an exogenous decrease in writer anonymity leads to less hate against foreign residents but an increase in hate against women. We conjecture that the mechanisms behind the changes comprise a combination of users decreasing the amount of their hateful writing and a substitution of hate against foreign residents for hate against women. The discussion of the results highlights the role of social repercussions in discouraging antisocial and criminal activities.
Keywords:	online hate; anonymity; discussion forum; machine learning; big data
JEL:	C55 D00 D80 D90
Date:	2020–08–20
URL:	http://d.repec.org/n?u=RePEc:hhs:sofiwp:2020_007&r=all

Data Mining and Machine Learning Techniques for Cyber Security Intrusion Detection

By:	K., Sai Manoj; Aithal, Sreeramana
Abstract:	An intrusion detection system is software that monitors a single computer or a network of computers for malicious activities aimed at stealing or censoring information or corrupting network protocols. Most techniques used in present-day intrusion detection systems are unable to deal with the dynamic and complex nature of cyber attacks on computer networks. Effective adaptive methods, such as various machine learning techniques, can however achieve higher detection rates, lower false alarm rates, and reasonable computation and communication costs. The use of data mining enables frequent pattern mining, classification, clustering, and mini data streams. This survey paper describes a focused literature review of machine learning and data mining methods for cyber analytics in support of intrusion detection. Based on the number of citations or the relevance of an emerging method, papers representing each method were identified, read, and summarized. Because data are so important in machine learning and data mining approaches, some well-known cyber data sets used in machine learning and data mining are described, and some recommendations on when to use a given method are provided.
Keywords:	Cloud Computing, Data mining, Block Chain, Machine Learning, Cyber Security, Attacks, ADS, SMV.
JEL:	G0 K3 K32
Date:	2020–02–25
URL:	http://d.repec.org/n?u=RePEc:pra:mprapa:101753&r=all

Macroeconomic Data Transformations Matter

By:	Philippe Goulet Coulombe; Maxime Leroux; Dalibor Stevanovic; Stéphane Surprenant
Abstract:	From a purely predictive standpoint, rotating the predictors' matrix in a low-dimensional linear regression setup does not alter predictions. However, when the forecasting technology either uses shrinkage or is non-linear, it does. This is precisely the fabric of the machine learning (ML) macroeconomic forecasting environment. Pre-processing of the data translates to an alteration of the regularization -- explicit or implicit -- embedded in ML algorithms. We review old transformations and propose new ones, then empirically evaluate their merits in a substantial pseudo-out-of-sample exercise. It is found that traditional factors should almost always be included in the feature matrix and that moving average rotations of the data can provide important gains for various forecasting targets.
Date:	2020–08
URL:	http://d.repec.org/n?u=RePEc:arx:papers:2008.01714&r=all
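
The opening claim of the abstract above can be checked on a toy example. The sketch below is not taken from the paper: the data, the matrix R, and the penalty value are made up for illustration. It fits a two-predictor regression without intercept, once by ordinary least squares and once with a ridge penalty, before and after replacing the predictor matrix X with XR for an invertible R. R is deliberately not orthogonal, since a ridge penalty is itself unaffected by orthogonal rotations.

```python
def fit(X, y, lam=0.0):
    """Solve (X'X + lam*I) b = X'y for two predictors, no intercept."""
    a = sum(r[0] * r[0] for r in X) + lam
    b = sum(r[0] * r[1] for r in X)
    d = sum(r[1] * r[1] for r in X) + lam
    u = sum(r[0] * t for r, t in zip(X, y))
    v = sum(r[1] * t for r, t in zip(X, y))
    det = a * d - b * b
    return ((d * u - b * v) / det, (a * v - b * u) / det)


def predict(X, beta):
    return [r[0] * beta[0] + r[1] * beta[1] for r in X]


def transform(X, R):
    """Return the matrix product X R for a 2x2 matrix R."""
    return [(r[0] * R[0][0] + r[1] * R[1][0],
             r[0] * R[0][1] + r[1] * R[1][1]) for r in X]


def max_gap(p, q):
    return max(abs(a - b) for a, b in zip(p, q))


# Hypothetical data and transformation.
X = [(1.0, 2.0), (2.0, 1.0), (3.0, 4.0), (4.0, 3.0), (5.0, 7.0)]
y = [1.0, 2.0, 2.5, 4.0, 4.5]
R = [[1.0, 0.5], [0.0, 2.0]]  # invertible, not orthogonal
Z = transform(X, R)

ols_gap = max_gap(predict(X, fit(X, y)), predict(Z, fit(Z, y)))
ridge_gap = max_gap(predict(X, fit(X, y, 5.0)), predict(Z, fit(Z, y, 5.0)))
print(ols_gap, ridge_gap)
```

The least-squares fits agree up to rounding because X and XR span the same column space. The ridge fits differ because penalizing the coefficients of XR with the identity amounts to penalizing the original coefficients with a different matrix.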

The Mode Treatment Effect

By:	Neng-Chieh Chang
Abstract:	Mean, median, and mode are three essential measures of the centrality of probability distributions. In program evaluation, the average treatment effect (mean) and the quantile treatment effect (median) have been intensively studied in the past decades. The mode treatment effect, however, has long been neglected in program evaluation. This paper fills the gap by discussing both the estimation and inference of the mode treatment effect. I propose both traditional kernel and machine learning methods to estimate the mode treatment effect. I also derive the asymptotic properties of the proposed estimators and find that both estimators are asymptotically normal, but with a rate of convergence slower than the regular rate $\sqrt{N}$, which differs from the rates of the classical average and quantile treatment effect estimators.
Date:	2020–07
URL:	http://d.repec.org/n?u=RePEc:arx:papers:2007.11606&r=all
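
To make the idea of a kernel-based mode estimate concrete, here is a minimal sketch. It is not the estimator proposed in the paper: it assumes treatment was randomly assigned, uses a fixed, hand-picked bandwidth, and locates the mode by a simple grid search over a Gaussian kernel density estimate. The two samples are hypothetical.

```python
import math


def kde(x, sample, h):
    """Gaussian kernel density estimate at x with bandwidth h."""
    n = len(sample)
    total = sum(math.exp(-0.5 * ((x - s) / h) ** 2) for s in sample)
    return total / (n * h * math.sqrt(2.0 * math.pi))


def kernel_mode(sample, h, grid_size=2001):
    """Grid point between the sample min and max that maximizes the KDE."""
    lo, hi = min(sample), max(sample)
    grid = [lo + (hi - lo) * i / (grid_size - 1) for i in range(grid_size)]
    return max(grid, key=lambda x: kde(x, sample, h))


# Hypothetical outcomes; each group has a tight cluster plus two outliers.
control = [0.8, 1.0, 1.1, 1.2, 0.9, 1.0, 4.0, 5.5]
treated = [1.9, 2.0, 2.1, 2.2, 1.8, 2.0, 4.0, 9.0]
h = 0.3  # assumed bandwidth

mode_effect = kernel_mode(treated, h) - kernel_mode(control, h)
mean_effect = sum(treated) / len(treated) - sum(control) / len(control)
print(mode_effect, mean_effect)
```

In this made-up sample the outliers pull the difference in means away from the shift in the most likely outcome, which is the kind of situation where a mode-based effect is informative.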

Data-Driven Option Pricing using Single and Multi-Asset Supervised Learning

By:	Anindya Goswami; Sharan Rajani; Atharva Tanksale
Abstract:	We propose three different data-driven approaches for pricing European-style call options using supervised machine-learning algorithms. The proposed approaches are tested on two stock market indices, NIFTY50 and BANKNIFTY, from the Indian equity market. Although neither historical nor implied volatility is used as an input, the results show that the trained models capture the option pricing mechanism as well as or better than the Black-Scholes formula in all the experiments. Our choice of scale-free I/O allows us to train models using combined data of multiple different assets from a financial market. This not only allows the models to achieve far better generalization and predictive capability, but also solves the problem of paucity of data, the primary limitation of using machine learning techniques. We also illustrate the performance of the trained models in the period leading up to the 2020 Stock Market Crash, Jan 2019 to April 2020.
Date:	2020–08
URL:	http://d.repec.org/n?u=RePEc:arx:papers:2008.00462&r=all
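
For reference, the Black-Scholes benchmark mentioned in the abstract above can be written in a few lines. This is the textbook formula for a European call on a non-dividend-paying asset, not code from the paper, and the input values are hypothetical. The last lines illustrate one possible reading of "scale-free I/O", namely expressing spot and price relative to the strike. That reading is an assumption, since the abstract does not specify the normalization.

```python
import math


def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def bs_call(spot, strike, rate, vol, maturity):
    """Black-Scholes price of a European call, no dividends."""
    sd = vol * math.sqrt(maturity)
    d1 = (math.log(spot / strike) + (rate + 0.5 * vol ** 2) * maturity) / sd
    d2 = d1 - sd
    return spot * norm_cdf(d1) - strike * math.exp(-rate * maturity) * norm_cdf(d2)


# Hypothetical inputs: at-the-money call, one year to maturity.
price = bs_call(spot=100.0, strike=100.0, rate=0.05, vol=0.2, maturity=1.0)

# The formula is homogeneous of degree one in (spot, strike), so the
# price per unit of strike depends only on the ratio spot / strike.
scaled = bs_call(spot=250.0, strike=200.0, rate=0.05, vol=0.2, maturity=1.0) / 200.0
unit = bs_call(spot=1.25, strike=1.0, rate=0.05, vol=0.2, maturity=1.0)
print(price, scaled, unit)
```

Under that assumed normalization, options on assets with very different price levels map to comparable inputs and outputs, which is what would allow pooling data across assets.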

Local mortality estimates during the COVID-19 pandemic in Italy

By:	Augusto Cerqua (Department of Social Sciences and Economics, Sapienza University of Rome); Roberta Di Stefano (Department of Methods and Models for Economics, Territory and Finance, Sapienza University of Rome); Marco Letta (Department of Social Sciences and Economics, Sapienza University of Rome); Sara Miccoli (Department of Methods and Models for Economics, Territory and Finance, Sapienza University of Rome)
Abstract:	Estimates of the real death toll of the COVID-19 pandemic have proven to be problematic in many countries, and Italy is no exception. Mortality estimates at the local level are even more uncertain as they require strict conditions, such as granularity and accuracy of the data at hand, which are rarely met. The ‘official’ approach adopted by public institutions to estimate the ‘excess of mortality’ during the pandemic is based on a comparison between observed all-cause mortality data for 2020 with an average of mortality figures in the past years for the same period. In this paper, we show that more sophisticated approaches such as counterfactual and machine learning techniques outperform the official method by improving prediction accuracy by up to 18%, thus providing a more realistic picture of local excess mortality. The predictive gain is particularly sizable for small- and medium-sized municipalities. After showing the superiority of data-driven statistical methods, we apply the best-performing algorithms to generate a municipality-level dataset of local excess mortality estimates during the COVID-19 pandemic. This dataset is publicly shared and will be periodically updated as new data become available.
Keywords:	COVID-19, coronavirus, mortality estimates, Italy, municipalities
JEL:	C21 C52 I10 J11
Date:	2020–07
URL:	http://d.repec.org/n?u=RePEc:saq:wpaper:14/20&r=all
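
The "official" baseline described in the abstract above is simple enough to state in code. The sketch below follows that description only. The death counts are hypothetical and the use of five past years is an assumption, since the abstract does not say how many years enter the average.

```python
def official_excess(observed_2020, past_years):
    """Observed deaths minus the average for the same period in past years."""
    baseline = sum(past_years) / len(past_years)
    return observed_2020 - baseline


# Hypothetical municipality: deaths in the same period of five earlier years.
past = [40, 52, 47, 45, 51]
excess = official_excess(63, past)
print(excess)
```

For a small municipality the past counts fluctuate a lot from year to year, so this average is a noisy counterfactual, which is consistent with the abstract's finding that the gains from better predictors are largest for small and medium-sized municipalities.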

Google Correlate y Google Trends como herramientas para realizar un nowcast de las ventas minoristas

By:	María Florencia Camusso; Ramiro Emmanuel Jorge
Abstract:	This paper incorporates information from the Google Trends and Google Correlate tools in order to nowcast supermarket sales in the Province of Santa Fe, an indicator that is published with a lag of several months. First, a set of proxy variables with high predictive power is identified; then an aggregation method is proposed to incorporate the search patterns into the target series. The estimates obtained with the model are compared with actual data for the target series (ex post) and with the forecasts produced by X13-ARIMA-SEATS. The results indicate that the tools and the procedure adopted yield consistent estimates and a gain in timeliness relative to official publications.
Keywords:	Cycles, nowcast, big data, Google tools Argentino
JEL:	E27 E32
Date:	2019–11
URL:	http://d.repec.org/n?u=RePEc:aep:anales:4127&r=all

Competing Models

By:	Montiel Olea, José Luis; Ortoleva, Pietro; Pai, Mallesh; Prat, Andrea
Abstract:	We develop a model in which different agents compete to predict a variable of interest. This variable is related to observables via an unknown data generating process. All agents are Bayesian, but may have 'misspecified models' of the world, i.e., they consider different subsets of observables to make their prediction. After observing a common dataset, who has the highest confidence in her predictive ability? We characterize it and show that it crucially depends on the size of the dataset. With big data, we show it is typically 'large dimensional,' possibly using more variables than the true model. With small data, we show (under additional assumptions) that it is an agent using a model that is 'small-dimensional,' in the sense of considering fewer covariates than the true data generating process. The theory is applied to auctions of assets where bidders observe the same information but hold different priors.
Date:	2019–10
URL:	http://d.repec.org/n?u=RePEc:cpr:ceprdp:14066&r=all

Hybrid ARFIMA Wavelet Artificial Neural Network Model for DJIA Index Forecasting

By:	Heni Boubaker (International University of Rabat); Giorgio Canarella (University of Nevada, Las Vegas); Rangan Gupta (University of Pretoria); Stephen M. Miller (University of Nevada, Las Vegas)
Abstract:	This paper proposes a hybrid modelling approach for forecasting returns and volatilities of the stock market. The model, called ARFIMA-WLLWNN model, integrates the advantages of the ARFIMA model, the wavelet decomposition technique (namely, the discrete MODWT with Daubechies least asymmetric wavelet filter) and artificial neural network (namely, the LLWNN neural network). The model develops through a two-phase approach. In phase one, a wavelet decomposition improves the forecasting accuracy of the LLWNN neural network, resulting in the Wavelet Local Linear Wavelet Neural Network (WLLWNN) model. The Back Propagation (BP) and Particle Swarm Optimization (PSO) learning algorithms optimize the WLLWNN structure. In phase two, the residuals of an ARFIMA model of the conditional mean become the input to the WLLWNN model. The hybrid ARFIMA-WLLWNN model is evaluated using daily closing prices for the Dow Jones Industrial Average (DJIA) index over 01/01/2010 to 02/11/2020. The experimental results indicate that the PSO-optimized version of the hybrid ARFIMA-WLLWNN outperforms the LLWNN, WLLWNN, ARFIMA-LLWNN, and the ARFIMA-HYAPARCH models and provides more accurate out-of-sample forecasts over validation horizons of one, five and twenty-two days.
Keywords:	Wavelet decomposition, WLLWNN, Neural network, ARFIMA, HYGARCH
JEL:	C45 C58 G17
Date:	2020–08
URL:	http://d.repec.org/n?u=RePEc:uct:uconnp:2020-10&r=all

HRP performance comparison in portfolio optimization under various codependence and distance metrics

By:	Illya Barziy; Marcin Chlebus (Faculty of Economic Sciences, University of Warsaw)
Abstract:	The problem of portfolio optimization was formulated almost 70 years ago in the works of Harry Markowitz. However, possible optimization methods are still being studied in order to obtain better asset allocation results using empirical approximations of the codependence between assets. In this work, various codependence and distance metrics are tested in the Hierarchical Risk Parity (HRP) algorithm to determine whether the results obtained are superior to those of the standard Pearson correlation as a measure of codependence. In order to compare how HRP uses the information from alternative codependence metrics, the MV, IVP, and CLA optimization algorithms were applied to the same data. The dataset used for comparison consisted of 32 ETFs representing equity of different regions and sectors as well as bonds and commodities. The time period tested was 01.01.2007-20.12.2019. The results show that alternative codependence metrics perform worse in terms of Sharpe ratios and maximum drawdowns than the standard Pearson correlation for each optimization method used. The added value of this work lies in applying alternative codependence and distance metrics to real data and in including transaction costs to determine their impact on the result of each algorithm.
Keywords:	Hierarchical Risk Parity, portfolio optimization, ETF, hierarchical structure, clustering, backtesting, distance metrics, risk management, machine learning
JEL:	C32 C38 C44 C51 C52 C61 C65 G11 G15
Date:	2020
URL:	http://d.repec.org/n?u=RePEc:war:wpaper:2020-21&r=all
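
Editor's illustration (not taken from the paper): the abstract's baseline is the Pearson correlation, which HRP-style methods usually turn into a distance before clustering, and one of the benchmark allocators is IVP (inverse-variance portfolio). The sketch below shows a commonly used correlation-to-distance transform, d = sqrt((1 - rho) / 2), and the inverse-variance weights. Whether the authors use exactly this transform is an assumption; the function names are invented for this example, and the clustering and recursive bisection steps of HRP are omitted.

```python
import math


def pearson(x, y):
    """Sample Pearson correlation of two equally long return series."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def correlation_distance(rho):
    """Map a correlation in [-1, 1] to a distance in [0, 1].
    The max() guards against tiny negative values from rounding."""
    return math.sqrt(max(0.0, 0.5 * (1.0 - rho)))


def inverse_variance_weights(variances):
    """IVP weights: each asset is weighted by 1 / variance, normalised to sum to 1."""
    inverse = [1.0 / v for v in variances]
    total = sum(inverse)
    return [w / total for w in inverse]
```

Perfectly correlated assets get distance 0 and perfectly anti-correlated assets get distance 1, so swapping in an alternative codependence measure amounts to replacing `pearson` while keeping the rest of the pipeline fixed.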

Multi-stream RNN for Merchant Transaction Prediction

By:	Zhongfang Zhuang; Chin-Chia Michael Yeh; Liang Wang; Wei Zhang; Junpeng Wang
Abstract:	Recently, digital payment systems have significantly changed people's lifestyles. New challenges have surfaced in monitoring and guaranteeing the integrity of payment processing systems. One important task is to predict the future transaction statistics of each merchant. These predictions can then be used to steer other tasks, ranging from fraud detection to recommendation. This problem is challenging because we need to predict not only multivariate time series but also multiple steps into the future. In this work, we propose a multi-stream RNN model for multi-step merchant transaction predictions tailored to these requirements. The proposed multi-stream RNN summarizes transaction data at different granularities and makes predictions for multiple steps into the future. Our extensive experimental results demonstrate that the proposed model is capable of outperforming existing state-of-the-art methods.
Date:	2020–07
URL:	http://d.repec.org/n?u=RePEc:arx:papers:2008.01670&r=all

Relative wealth concerns with partial information and heterogeneous priors

By:	Chao Deng; Xizhi Su; Chao Zhou
Abstract:	We establish a Nash equilibrium in a market with $N$ agents with the performance criteria of relative wealth level when the market return is unobservable. Each investor has a random prior belief on the return rate of the risky asset. The investors can be heterogeneous in both the mean and variance of the prior. By a separation result and a martingale argument, we show that the optimal investment strategy under a stochastic return rate model can be characterized by a fully-coupled linear FBSDE. Two sets of deep neural networks are used for the numerical computation to first find each investor's estimate of the mean return rate and then solve the FBSDEs. We establish the existence and uniqueness result for the class of FBSDEs with stochastic coefficients and solve the utility game under partial information using deep neural network function approximators. We demonstrate the efficiency and accuracy by a base-case comparison with the solution from the finite difference scheme in the linear case and apply the algorithm to the general case of a nonlinear hidden variable process. Simulations of investment strategies show a herd effect in which investors trade more aggressively under relativeness concerns. Statistical properties of the investment strategies and the portfolio performance, including the Sharpe ratios and the Variance Risk ratios (VRRs), are examined. We observe that the agent with the most accurate prior estimate is likely to lead the herd, and the effect of competition on heterogeneous agents varies more with market characteristics compared to the homogeneous case.
Date:	2020–07
URL:	http://d.repec.org/n?u=RePEc:arx:papers:2007.11781&r=all

Big Data for Sampling Design : The Venezuelan Migration Crisis in Ecuador

By:	Munoz, Juan Eduardo; Gallegos Munoz, Jose Victor; Olivieri, Sergio Daniel
Abstract:	The worsening of Ecuador's socioeconomic conditions and the rapid inflow of Venezuelan migrants demand a rapid government response. Representative information on the migration and host communities is vital for evidence-based policy design. This study presents an innovative methodology based on the use of big data for sampling design of a representative survey of migrants and host communities' populations. This approach tackles the difficulties posed by the lack of information on the total number of Venezuelan migrants (regular and irregular) and their geographical location in the country. The total estimated population represents about 3 percent of the total Ecuadoran population. Venezuelans settled across urban areas, mainly in Quito, Guayaquil, and Manta (Portoviejo). The strategy implemented may be useful in designing similar exercises in countries with limited information (that is, lack of a recent census or migratory registry) and scarce resources for rapidly gathering socioeconomic data on migrants and host communities for policy design.
Keywords:	ICT Applications, Public Sector Administrative & Civil Service Reform, Economics and Finance of Public Institution Development, Democratic Government, State Owned Enterprise Reform, Public Sector Administrative and Civil Service Reform, De Facto Governments, Telecommunications Infrastructure, Inequality
Date:	2020–07–22
URL:	http://d.repec.org/n?u=RePEc:wbk:wbrwps:9329&r=all

Weighted Accuracy Algorithmic Approach In Counteracting Fake News And Disinformation

By:	Kwadwo Osei Bonsu
Abstract:	As the world becomes more dependent on the internet for information exchange, some overzealous journalists, hackers, bloggers, individuals and organizations tend to abuse the gift of a free information environment by polluting it with fake news, disinformation and pretentious content for their own agenda. Hence, there is a need to address the issue of fake news and disinformation with the utmost seriousness. This paper proposes a methodology for fake news detection and reporting through a constraint mechanism that utilizes the combined weighted accuracies of four machine learning algorithms.
Date:	2020–07
URL:	http://d.repec.org/n?u=RePEc:arx:papers:2008.01535&r=all
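
Editor's illustration (not taken from the paper): the abstract refers to combining the "weighted accuracies of four machine learning algorithms" without giving the formula. One plausible reading, sketched below, is a vote in which each classifier's binary prediction is weighted by its validation accuracy. The normalisation, the 0.5 threshold, the function names and the example accuracy figures are all assumptions made for this example and may differ from the author's constraint mechanism.

```python
def accuracy_weights(accuracies):
    """Turn per-classifier accuracies into weights that sum to 1."""
    total = sum(accuracies)
    return [a / total for a in accuracies]


def weighted_vote(predictions, weights, threshold=0.5):
    """Combine binary predictions (1 = fake, 0 = genuine).
    Returns the ensemble label and the weighted score behind it."""
    score = sum(w * p for w, p in zip(weights, predictions))
    label = 1 if score >= threshold else 0
    return label, score


# Hypothetical validation accuracies for four classifiers.
weights = accuracy_weights([0.9, 0.8, 0.7, 0.6])
# The two most accurate classifiers flag the article, the other two do not.
label, score = weighted_vote([1, 1, 0, 0], weights)
```

In this reading a two-against-two split is resolved in favour of the more accurate pair, which is the main practical difference from an unweighted majority vote.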

Deep xVA solver - A neural network based counterparty credit risk management framework

By:	Alessandro Gnoatto (Department of Economics (University of Verona)); Athena Picarelli (Department of Economics (University of Verona)); Christoph Reisinger (University of Oxford)
Abstract:	In this paper, we present a novel computational framework for portfolio-wide risk management problems where the presence of a potentially large number of risk factors makes traditional numerical techniques ineffective. The new method utilises a coupled system of BSDEs for the valuation adjustments (xVA) and solves these by a recursive application of a neural network based BSDE solver. This not only makes the computation of xVA for high-dimensional problems feasible, but also produces hedge ratios and dynamic risk measures for xVA, and allows simulations of the collateral account.
Keywords:	CVA, DVA, FVA, ColVA, xVA, EPE, Collateral, xVA hedging, Deep BSDE Solver
JEL:	G12 G13 C63
Date:	2020–05
URL:	http://d.repec.org/n?u=RePEc:ver:wpaper:07/2020&r=all

This nep-big issue is ©2020 by Tom Coupé. It is provided as is without any express or implied warranty. It may be freely redistributed in whole or in part for any purpose. If distributed in part, please include this notice.

General information on the NEP project can be found at http://nep.repec.org. For comments please write to the director of NEP, Marco Novarese at <director@nep.repec.org>. Put “NEP” in the subject, otherwise your mail may be rejected.

NEP’s infrastructure is sponsored by the School of Economics and Finance of Massey University in New Zealand.