nep-big New Economics Papers
on Big Data
Issue of 2020‒10‒19
twenty-six papers chosen by
Tom Coupé
University of Canterbury

  1. Using Satellite Imagery to Understand and Promote Sustainable Development By Marshall Burke; Anne Driscoll; David Lobell; Stefano Ermon
  2. DoubleEnsemble: A New Ensemble Method Based on Sample Reweighting and Feature Selection for Financial Data Analysis By Chuheng Zhang; Yuanqi Li; Xi Chen; Yifei Jin; Pingzhong Tang; Jian Li
  3. Deep Learning algorithms for solving high dimensional nonlinear Backward Stochastic Differential Equations By Lorenc Kapllani; Long Teng
  4. Artificial Intelligence and High-Skilled Work: Evidence from Analysts By Jillian Grennan; Roni Michaely
  5. Classification of monetary and fiscal dominance regimes using machine learning techniques By Hinterlang, Natascha; Hollmayr, Josef
  6. What factors determine unequal suburbanisation? New evidence from Warsaw, Poland By Honorata Bogusz; Szymon Winnicki; Piotr Wójcik
  7. Prediction intervals for Deep Neural Networks By Tullio Mancini; Hector Calvo-Pardo; Jose Olmo
  8. Street Network Models and Indicators for Every Urban Area in the World By Boeing, Geoff
  9. Transparency, Auditability and eXplainability of Machine Learning Models in Credit Scoring By Michael Bücker; Gero Szepannek; Alicja Gosiewska; Przemyslaw Biecek
  10. Fractional differentiation and its use in machine learning By Janusz Gajda; Rafał Walasek
  11. Who benefits from health insurance? Uncovering heterogeneous policy impacts using causal machine learning By Noemi Kreif; Andrew Mirelman; Rodrigo Moreno-Serra; Taufik Hidayat; Karla DiazOrdaz; Marc Suhrcke
  12. Intangible capital indicators based on web scraping of social media By Breithaupt, Patrick; Kesler, Reinhold; Niebel, Thomas; Rammer, Christian
  13. Predicting Non Farm Employment By Tarun Bhatia
  14. An AI approach to measuring financial risk By Lining Yu; Wolfgang Karl Härdle; Lukas Borke; Thijs Benschop
  15. Modelling and mapping the intra-urban spatial distribution of Plasmodium falciparum parasite rate using very-high-resolution satellite derived indicators By Stefanos Georganos; Oscar Brousse; Sébastien Dujardin; Catherine Linard; Daniel Casey; Marco Milliones; Benoit Parmentier; Nicole P M Van Lipzig; Matthias Demuzere; Taïs Grippa; Sabine Vanhuysse; Nicholus Mboga; Verónica Andreo; Robert W. Snow; Moritz Lennert
  16. Firm-Level Risk Exposures and Stock Returns in the Wake of COVID-19 By Steven J. Davis; Stephen Hansen; Cristhian Seminario-Amez
  17. Forecasting impacts of Agricultural Production on Global Maize Price By Rotem Zelingher; David Makowski; Thierry Brunelle
  18. Comprehensive Review of Deep Reinforcement Learning Methods and Applications in Economics By Mosavi, Amir; Faghan, Yaser; Ghamisi, Pedram; Duan, Puhong; Ardabili, Sina Faizollahzadeh; Hassan, Salwana; Band, Shahab S.
  19. Deep Distributional Time Series Models and the Probabilistic Forecasting of Intraday Electricity Prices By Nadja Klein; Michael Stanley Smith; David J. Nott
  20. Learning Classifiers under Delayed Feedback with a Time Window Assumption By Masahiro Kato; Shota Yasui
  21. Deep Learning for Digital Asset Limit Order Books By Rakshit Jha; Mattijs De Paepe; Samuel Holt; James West; Shaun Ng
  22. By Abramov, Dimitri Marques
  23. Violencias basadas en género en tiempos de Covid-19 By Susana Martínez-Restrepo; Lina Tafur Marín; Juan Guillermo Osio; Pablo Cortés
  24. Heterogeneous effects of waste pricing policies By Marica Valente
  25. Transfer Payment Systems and Financial Distress: Insights from Health Insurance Premium Subsidies By Schmid, Christian P. R.; Schreiner, Nicolas; Stutzer, Alois
  26. Disagreement among ESG rating agencies: shall we be worried? By Lopez, Claude; Contreras, Oscar; Bendix, Joseph

  1. By: Marshall Burke; Anne Driscoll; David Lobell; Stefano Ermon
    Abstract: Accurate and comprehensive measurements of a range of sustainable development outcomes are fundamental inputs into both research and policy. We synthesize the growing literature that uses satellite imagery to understand these outcomes, with a focus on approaches that combine imagery with machine learning. We quantify the paucity of ground data on key human-related outcomes and the growing abundance and resolution (spatial, temporal, and spectral) of satellite imagery. We then review recent machine learning approaches to model-building in the context of scarce and noisy training data, highlighting how this noise often leads to incorrect assessment of models’ predictive performance. We quantify recent model performance across multiple sustainable development domains, discuss research and policy applications, explore constraints to future progress, and highlight key research directions for the field.
    JEL: C45 C55 O1
    Date: 2020–10
  2. By: Chuheng Zhang; Yuanqi Li; Xi Chen; Yifei Jin; Pingzhong Tang; Jian Li
    Abstract: Modern machine learning models (such as deep neural networks and boosting decision tree models) have become increasingly popular in financial market prediction, due to their superior capacity to extract complex non-linear patterns. However, since financial datasets have very low signal-to-noise ratio and are non-stationary, complex models are often very prone to overfitting and suffer from instability issues. Moreover, as various machine learning and data mining tools become more widely used in quantitative trading, many trading firms have been producing an increasing number of features (aka factors). Therefore, how to automatically select effective features becomes an imminent problem. To address these issues, we propose DoubleEnsemble, an ensemble framework leveraging learning trajectory based sample reweighting and shuffling based feature selection. Specifically, we identify the key samples based on the training dynamics on each sample and elicit key features based on the ablation impact of each feature via shuffling. Our model is applicable to a wide range of base models, capable of extracting complex patterns, while mitigating the overfitting and instability issues for financial market prediction. We conduct extensive experiments, including price prediction for cryptocurrencies and stock trading, using both DNN and gradient boosting decision tree as base models. Our experiment results demonstrate that DoubleEnsemble achieves a superior performance compared with several baseline methods.
    Date: 2020–10
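The shuffling-based feature selection that DoubleEnsemble builds on can be illustrated with a minimal numpy sketch (generic permutation importance, not the authors' implementation): a feature matters if shuffling it and refitting noticeably worsens the loss.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
X = rng.normal(size=(n, 3))
# Only feature 0 carries signal; features 1 and 2 are pure noise.
y = 2.0 * X[:, 0] + 0.1 * rng.normal(size=n)

def refit_mse(X, y):
    """Least-squares fit; return in-sample mean squared error."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return float(np.mean((y - X @ beta) ** 2))

base = refit_mse(X, y)
importance = []
for j in range(X.shape[1]):
    Xs = X.copy()
    Xs[:, j] = rng.permutation(Xs[:, j])        # destroy feature j's alignment with y
    importance.append(refit_mse(Xs, y) - base)  # loss increase = importance
```

The noise features barely move the loss when shuffled, while shuffling the signal feature increases it sharply; thresholding such scores gives a simple feature-selection rule.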
  3. By: Lorenc Kapllani; Long Teng
    Abstract: We study deep learning-based schemes for solving high-dimensional nonlinear backward stochastic differential equations (BSDEs). First, we show how to improve the performance of the scheme proposed in [W. E, J. Han and A. Jentzen, Commun. Math. Stat., 5 (2017), pp. 349-380], regarding computational time and stability of numerical convergence, by using an advanced neural network architecture instead of stacked deep neural networks. Furthermore, the scheme proposed in that work can become stuck in local minima, especially for a complex solution structure and a longer terminal time. To address this problem, we reformulate the problem to include local losses and exploit Long Short-Term Memory (LSTM) networks, a type of recurrent neural network (RNN). Finally, in order to study numerical convergence and illustrate the improved performance of the proposed methods, we provide numerical results for several 100-dimensional nonlinear BSDEs, including a nonlinear pricing problem in finance.
    Date: 2020–10
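For context, the E-Han-Jentzen scheme that this paper improves upon simulates the forward process and steps the value process forward in time, with a network parametrizing the gradient at each step; a standard statement of that discretization (notation assumed, not taken from the abstract) is:

```latex
% Euler discretization underlying deep BSDE solvers: step Y forward along
% simulated paths of X, with Z_{t_n} parametrized by a neural network.
Y_{t_{n+1}} \approx Y_{t_n}
  - f\bigl(t_n, X_{t_n}, Y_{t_n}, Z_{t_n}\bigr)\,\Delta t_n
  + Z_{t_n}\,\Delta W_n ,
\qquad
\text{training minimizes } \; \mathbb{E}\bigl|\, Y_{t_N} - g(X_{t_N}) \,\bigr|^2 .
```

The local losses mentioned in the abstract replace this single terminal loss with per-step penalties, which is what helps the LSTM variant avoid local minima.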
  4. By: Jillian Grennan (Duke University - Fuqua School of Business; Duke Innovation & Entrepreneurship Initiative); Roni Michaely (University of Geneva - Geneva Finance Research Institute (GFRI); Swiss Finance Institute)
    Abstract: Policymakers fear artificial intelligence (AI) will disrupt labor markets, especially for high-skilled workers. We investigate this concern using novel, task-specific data for security analysts. Exploiting variation in AI's power across stocks, we show analysts with portfolios that are more exposed to AI are more likely to reallocate efforts to soft skills, shift coverage towards low-AI stocks, and even leave the profession. Analyst departures disproportionately occur among highly accurate analysts, who leave for non-research jobs. Reallocating efforts toward tasks that rely on social skills improves consensus forecasts. However, increased exposure to AI reduces the novelty of analysts' research, which in turn lowers compensation.
    Keywords: artificial intelligence, big data, technology, automation, sell-side analysts, job displacement, labor and finance, social skills, non-cognitive skills, tasks, skill premium, skill-biased technological change, compensation
    JEL: G17 G24 J23 J24 J31 O33
    Date: 2020–08
  5. By: Hinterlang, Natascha; Hollmayr, Josef
    Abstract: This paper identifies U.S. monetary and fiscal dominance regimes using machine learning techniques. The algorithms are trained and verified by employing simulated data from Markov-switching DSGE models, before they classify regimes from 1968-2017 using actual U.S. data. All machine learning methods outperform a standard logistic regression on the simulated data. Among those, the Boosted Ensemble Trees classifier yields the best results. We find clear evidence of fiscal dominance before Volcker. Monetary dominance is detected between 1984-1988, before a fiscally led regime emerges around the stock market crash and lasts until 1994. Monetary dominance is then established until the beginning of the new century, while the more recent evidence following the financial crisis is mixed, with a tendency towards fiscal dominance.
    Keywords: Monetary-fiscal interaction,Machine Learning,Classification,Markov-switching DSGE
    JEL: C38 E31 E63
    Date: 2020
  6. By: Honorata Bogusz (Faculty of Economic Sciences, University of Warsaw and Labfam); Szymon Winnicki (Faculty of Economic Sciences, University of Warsaw); Piotr Wójcik (Faculty of Economic Sciences, Data Science Lab WNE UW, University of Warsaw)
    Abstract: This article investigates the causes of spatially uneven migration from Warsaw to its suburban boroughs. The method is based on the gravity model of migration, extended by additional measures of possible pull factors. We report a novel approach to modelling suburbanisation: several linear and non-linear predictive models are estimated, and explainable AI methods are used to interpret the shape of the relationships between the dependent variable and the most important regressors. We confirm that migrants choose boroughs with better amenities and a shorter distance to Warsaw's city centre.
    Keywords: suburbanisation, gravity model of migration, machine learning models, explainable artificial intelligence
    JEL: R23 P25 C14 C51 C52
    Date: 2020
  7. By: Tullio Mancini; Hector Calvo-Pardo; Jose Olmo
    Abstract: The aim of this paper is to propose a suitable method for constructing prediction intervals for the output of neural network models. To do this, we adapt the extremely randomized trees method originally developed for random forests to construct ensembles of neural networks. The extra-randomness introduced in the ensemble reduces the variance of the predictions and yields gains in out-of-sample accuracy. An extensive Monte Carlo simulation exercise shows the good performance of this novel method for constructing prediction intervals in terms of coverage probability and mean square prediction error. This approach is superior to state-of-the-art methods extant in the literature such as the widely used MC dropout and bootstrap procedures. The out-of-sample accuracy of the novel algorithm is further evaluated using experimental settings already adopted in the literature.
    Date: 2020–10
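The extra-randomized-ensemble idea can be sketched in a few lines of numpy, with bootstrap resampling plus random feature subsets of a simple linear learner standing in for the neural networks (a hypothetical sketch, not the authors' algorithm); the spread of the ensemble's predictions then yields a prediction interval.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, B = 300, 4, 50
X = rng.normal(size=(n, p))
y = X @ np.array([1.0, -2.0, 0.5, 0.0]) + rng.normal(scale=0.3, size=n)
X_new = rng.normal(size=(20, p))

preds = np.empty((B, len(X_new)))
for b in range(B):
    idx = rng.integers(0, n, size=n)              # bootstrap resample
    feats = rng.choice(p, size=3, replace=False)  # extra randomness: feature subset
    beta, *_ = np.linalg.lstsq(X[idx][:, feats], y[idx], rcond=None)
    preds[b] = X_new[:, feats] @ beta

lo, hi = np.percentile(preds, [5, 95], axis=0)    # 90% ensemble interval
point = preds.mean(axis=0)
```

In the paper the base learners are neural networks and the interval's coverage probability is evaluated by Monte Carlo; the ensemble-percentile construction is the same.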
  8. By: Boeing, Geoff (Northeastern University)
    Abstract: Cities worldwide exhibit a variety of street network patterns and configurations that shape human mobility, equity, health, and livelihoods. This study models and analyzes the street networks of each urban area in the world, using boundaries derived from the Global Human Settlement Layer. Street network data are acquired and modeled using the open-source OSMnx software and OpenStreetMap. In total, this study models over 150 million OpenStreetMap street network nodes and over 300 million edges across 9,000 urban areas in 178 countries. This paper presents the study's reproducible computational workflow, introduces two new open data repositories of processed global street network models and calculated indicators, and reports summary descriptive findings on street network form worldwide. It makes four contributions. First, it reports the methodological advances of using this open-source tool in spatial network modeling and analyses with open big data. Second, it produces an open data repository containing street network models for each of these urban areas, in various file formats, for public reuse. Third, it analyzes these models to produce an open data repository containing dozens of street network form indicators for each urban area. No such global urban street network indicator data set has previously existed. Fourth, it presents an aggregate summary descriptive analysis of global street network form at the scale of the urban area, reporting the first such worldwide results in the literature.
    Date: 2020–09–18
  9. By: Michael Bücker; Gero Szepannek; Alicja Gosiewska; Przemyslaw Biecek
    Abstract: A major requirement for credit scoring models is to provide a maximally accurate risk prediction. Additionally, regulators demand these models to be transparent and auditable. Thus, in credit scoring, very simple predictive models such as logistic regression or decision trees are still widely used and the superior predictive power of modern machine learning algorithms cannot be fully leveraged. Significant potential is therefore missed, leading to higher reserves or more credit defaults. This paper works out different dimensions that have to be considered for making credit scoring models understandable and presents a framework for making "black box" machine learning models transparent, auditable and explainable. Following this framework, we present an overview of techniques, demonstrate how they can be applied in credit scoring and how results compare to the interpretability of score cards. A real world case study shows that a comparable degree of interpretability can be achieved while machine learning techniques keep their ability to improve predictive power.
    Date: 2020–09
  10. By: Janusz Gajda (Faculty of Economic Sciences, University of Warsaw); Rafał Walasek (Faculty of Economic Sciences, University of Warsaw)
    Abstract: This article covers the implementation of fractional (non-integer order) differentiation on four datasets based on the stock prices of major international stock indexes: WIG 20, S&P 500, DAX, and Nikkei 225. This concept was proposed by Lopez de Prado to find the most appropriate balance between zero differentiation and a fully differentiated time series. The aim is to make the time series stationary while preserving its memory and predictive power. The paper also compares fractional and classical differentiation in terms of the effectiveness of artificial neural networks, from two viewpoints: Root Mean Square Error (RMSE) and Mean Absolute Error (MAE). The study concludes that fractionally differentiated time series performed better in the trained ANNs.
    Keywords: fractional differentiation, financial time series, stock exchange, artificial neural networks
    JEL: C22 C32 G10
    Date: 2020
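De Prado's fixed-window fractional differencing is compact enough to sketch directly (window length and the example value of d are assumptions): the weights of the operator (1 - B)^d are built recursively and convolved with the most recent observations.

```python
import numpy as np

def frac_weights(d, size):
    """Truncated weights of the fractional difference operator (1 - B)^d."""
    w = [1.0]
    for k in range(1, size):
        w.append(-w[-1] * (d - k + 1) / k)
    return np.array(w)

def frac_diff(x, d, window=10):
    """Fixed-window fractional differencing of a 1-D series."""
    w = frac_weights(d, window)
    out = np.full(len(x), np.nan)
    for t in range(window - 1, len(x)):
        # Dot the weights with the window of most recent values, newest first.
        out[t] = w @ x[t - window + 1:t + 1][::-1]
    return out

x = np.arange(30.0) ** 2
d1 = frac_diff(x, 1.0, window=5)   # d = 1 reduces to an ordinary first difference
```

Non-integer d between 0 and 1 interpolates between the raw series (d = 0, full memory) and the first difference (d = 1, stationary but memoryless), which is the trade-off the paper exploits.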
  11. By: Noemi Kreif (Centre for Health Economics, University of York, York, UK); Andrew Mirelman (World Health Organization, Geneva, Switzerland); Rodrigo Moreno-Serra (Centre for Health Economics, University of York, York, UK); Taufik Hidayat (Center for Health Economics and Policy Studies (CHEPS), Faculty of Public Health, Universitas Indonesia, Depok, Indonesia); Karla DiazOrdaz (Department of Medical Statistics, Faculty of Epidemiology and Population Health, London School of Hygiene & Tropical Medicine, London, UK); Marc Suhrcke (Centre for Health Economics, University of York, UK and Luxembourg Institute of Socio-economic Research, Luxembourg)
    Abstract: To be able to target health policies more efficiently, policymakers require knowledge about which individuals benefit most from a particular programme. While traditional approaches for subgroup analyses are constrained to consider only a small number of arbitrarily set, pre-defined subgroups, recently proposed causal machine learning (CML) approaches help explore treatment-effect heterogeneity in a more flexible yet principled way. This paper illustrates one such approach – ‘causal forests’ – in evaluating the effect of mothers’ health insurance enrolment in Indonesia. Contrasting two health insurance schemes (subsidised and contributory) to no insurance, we find beneficial average impacts of enrolment in contributory health insurance on maternal health care utilisation and infant mortality. For subsidised health insurance, however, both effects were smaller and not statistically significant. The causal forest algorithm identified significant heterogeneity in the impacts of the contributory insurance scheme: disadvantaged mothers (i.e. those in lower wealth quintiles, with less education, or living in rural areas) benefit the most in terms of increased health care utilisation. No significant heterogeneity was found for the subsidised scheme, even though this programme targeted vulnerable populations. Our study demonstrates the power of CML approaches to uncover the heterogeneity in programme impacts, hence providing policymakers with valuable information for programme design.
    Keywords: policy evaluation;machine learning;heterogeneous treatment effects;health insurance
    Date: 2020–10
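Causal forests themselves require a dedicated implementation (e.g. the grf or EconML packages); as a much simpler stand-in, the kind of heterogeneity the authors report can be illustrated with a subgroup difference-in-means on synthetic data (all numbers and the covariate story are invented for illustration).

```python
import numpy as np

rng = np.random.default_rng(2)
n = 4000
x = rng.normal(size=n)                 # covariate, e.g. a wealth index
t = rng.integers(0, 2, size=n)         # randomized "enrolment" indicator
effect = np.where(x < 0, 2.0, 0.5)     # true effect is larger for disadvantaged units
y = 1.0 + effect * t + rng.normal(size=n)

def subgroup_cate(mask):
    """Treated-minus-control mean outcome within a subgroup."""
    return float(y[mask & (t == 1)].mean() - y[mask & (t == 0)].mean())

cate_low, cate_high = subgroup_cate(x < 0), subgroup_cate(x >= 0)
```

A causal forest automates what this sketch hard-codes: it searches over covariate splits to find where the treatment effect differs, with honest sample-splitting to keep the estimates valid.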
  12. By: Breithaupt, Patrick; Kesler, Reinhold; Niebel, Thomas; Rammer, Christian
    Abstract: Knowledge-based capital is a key factor for productivity growth. Over the past 15 years, it has been increasingly recognised that knowledge-based capital comprises much more than technological knowledge and that these other components are essential for understanding the productivity developments and competitiveness of both firms and economies. We develop selected indicators for knowledge-based capital, often denoted as intangible capital, on the basis of publicly available data from online platforms. These indicators, based on data from Facebook and the employer branding and review platform Kununu, are compared via OLS regressions with firm-level survey data from the Mannheim Innovation Panel (MIP). All regressions show a positive and significant relationship between survey-based firm-level expenditures for marketing and on-the-job training and the respective information stemming from the online platforms. We therefore explore the possibility of predicting brand equity and firm-specific human capital with machine learning methods.
    Keywords: Web Scraping,Knowledge-Based Capital,Intangibles
    JEL: C81 E22 O30
    Date: 2020
  13. By: Tarun Bhatia
    Abstract: U.S. nonfarm employment is considered one of the key indicators for assessing the state of the labor market. Considerable deviations from expectations can have market-moving impacts. In this paper, total U.S. nonfarm payroll employment is predicted before the release of the BLS employment report. The content herein outlines the process for extracting predictive features from the aggregated payroll data and training machine learning models to make accurate predictions. The publicly available revised employment report by the BLS is used as a benchmark. Trained models show excellent behaviour, with an R2 of 0.9985 and 99.99% directional accuracy on out-of-sample periods from January 2012 to March 2020.
    Keywords: Machine Learning, Economic Indicators, Ensembling, Regression, Total Nonfarm Payroll
    Date: 2020–09
  14. By: Lining Yu; Wolfgang Karl Härdle; Lukas Borke; Thijs Benschop
    Abstract: Artificial intelligence (AI) brings about new quantitative techniques to assess the state of an economy. Here we describe a new measure for systemic risk: the Financial Risk Meter (FRM). This measure is based on the penalization parameter (lambda) of a linear quantile lasso regression. The FRM is calculated by taking the average of the penalization parameters over the 100 largest US publicly traded financial institutions. We demonstrate the suitability of this AI-based risk measure by comparing the proposed FRM to other measures for systemic risk, such as VIX, SRISK and Google Trends. We find that mutual Granger causality exists between the FRM and these measures, which indicates the validity of the FRM as a systemic risk measure. The implementation of this project is carried out using parallel computing; the codes are published on with keyword FRM. The R package RiskAnalytics is another tool with the purpose of integrating and facilitating the research, calculation and analysis methods around the FRM project. The visualization and the up-to-date FRM can be found on
    Date: 2020–09
  15. By: Stefanos Georganos; Oscar Brousse; Sébastien Dujardin; Catherine Linard; Daniel Casey; Marco Milliones; Benoit Parmentier; Nicole P M Van Lipzig; Matthias Demuzere; Taïs Grippa; Sabine Vanhuysse; Nicholus Mboga; Verónica Andreo; Robert W. Snow; Moritz Lennert
    Abstract: BACKGROUND: The rapid and often uncontrolled rural-urban migration in Sub-Saharan Africa is transforming urban landscapes expected to provide shelter for more than 50% of Africa's population by 2030. Consequently, the burden of malaria is increasingly affecting the urban population, while socio-economic inequalities within the urban settings are intensified. Few studies, relying mostly on moderate to high resolution datasets and standard predictive variables such as building and vegetation density, have tackled the topic of modeling intra-urban malaria at the city extent. In this research, we investigate the contribution of very-high-resolution satellite-derived land-use, land-cover and population information for modeling the spatial distribution of urban malaria prevalence across large spatial extents. As case studies, we apply our methods to two Sub-Saharan African cities, Kampala and Dar es Salaam. METHODS: Openly accessible land-cover, land-use, population and OpenStreetMap data were employed to spatially model Plasmodium falciparum parasite rate standardized to the age group 2-10 years (PfPR2-10) in the two cities through the use of a Random Forest (RF) regressor. The RF models integrated physical and socio-economic information to predict PfPR2-10 across the urban landscape. Intra-urban population distribution maps were used to adjust the estimates according to the underlying population. RESULTS: The results suggest that the spatial distribution of PfPR2-10 in both cities is diverse and highly variable across the urban fabric. Dense informal settlements exhibit a positive relationship with PfPR2-10 and hotspots of malaria prevalence were found near suitable vector breeding sites such as wetlands, marshes and riparian vegetation. In both cities, there is a clear separation of higher risk in informal settlements and lower risk in the more affluent neighborhoods. Additionally, areas associated with urban agriculture exhibit higher malaria prevalence values. CONCLUSIONS: The outcome of this research highlights that populations living in informal settlements show higher malaria prevalence compared to those in planned residential neighborhoods. This is due to (i) increased human exposure to vectors, (ii) increased vector density and (iii) a reduced capacity to cope with malaria burden. Since informal settlements are rapidly expanding every year and often house large parts of the urban population, this emphasizes the need for systematic and consistent malaria surveys in such areas. Finally, this study demonstrates the importance of remote sensing as an epidemiological tool for mapping urban malaria variations at large spatial extents, and for promoting evidence-based policy making and control efforts.
    Keywords: Dar es Salaam; Kampala; Population; Random forest; Remote sensing; Urban malaria
    Date: 2020–09
  16. By: Steven J. Davis; Stephen Hansen; Cristhian Seminario-Amez
    Abstract: Firm-level stock returns differ enormously in reaction to COVID-19 news. We characterize these reactions using the Risk Factors discussions in pre-pandemic 10-K filings and two text-analytic approaches: expert-curated dictionaries and supervised machine learning (ML). Bad COVID-19 news lowers returns for firms with high exposures to travel, traditional retail, aircraft production and energy supply -- directly and via downstream demand linkages -- and raises them for firms with high exposures to healthcare policy, e-commerce, web services, drug trials and materials that feed into supply chains for semiconductors, cloud computing and telecommunications. Monetary and fiscal policy responses to the pandemic strongly impact firm-level returns as well, but differently than pandemic news. Despite methodological differences, dictionary and ML approaches yield remarkably congruent return predictions. Importantly though, ML operates on a vastly larger feature space, yielding richer characterizations of risk exposures and outperforming the dictionary approach in goodness-of-fit. By integrating elements of both approaches, we uncover new risk factors and sharpen our explanations for firm-level returns. To illustrate the broader utility of our methods, we also apply them to explain firm-level returns in reaction to the March 2020 Super Tuesday election results.
    JEL: E44 G12 G14 G18
    Date: 2020–09
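The expert-dictionary side of the approach is easy to illustrate: score each firm's Risk Factors text by the share of its tokens matching a category's term list. The mini-dictionaries below are invented for illustration and are not the authors' curated lists.

```python
import re
from collections import Counter

# Hypothetical mini-dictionaries (illustrative terms only).
dictionaries = {
    "travel":    {"travel", "airline", "hotel", "passenger"},
    "ecommerce": {"online", "e-commerce", "shipping", "delivery"},
}

def exposure_scores(risk_factor_text):
    """Share of each category's dictionary terms among all tokens."""
    tokens = re.findall(r"[a-z\-]+", risk_factor_text.lower())
    counts = Counter(tokens)
    total = sum(counts.values())
    return {cat: sum(counts[t] for t in terms) / total
            for cat, terms in dictionaries.items()}

scores = exposure_scores(
    "Our airline depends on passenger travel demand; online shipping is minor."
)
```

The supervised-ML approach in the paper replaces the fixed term lists with weights learned over a far larger vocabulary, which is why it yields richer exposure characterizations at the cost of interpretability.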
  17. By: Rotem Zelingher (ECO-PUB - Economie Publique - AgroParisTech - Université Paris-Saclay - INRAE - Institut National de Recherche pour l’Agriculture, l’Alimentation et l’Environnement); David Makowski; Thierry Brunelle (CIRED - Centre international de recherche sur l'environnement et le développement - Cirad - Centre de Coopération Internationale en Recherche Agronomique pour le Développement - AgroParisTech - EHESS - École des hautes études en sciences sociales - ENPC - École des Ponts ParisTech - CNRS - Centre National de la Recherche Scientifique)
    Abstract: Agricultural price shocks strongly affect farmers' income and food security. It is therefore important to understand the origin of these shocks and anticipate their occurrence. In this study, we explore the possibility of predicting the global price of one of the world's main agricultural commodities - maize - based on variations in regional production. We examine the performance of several machine-learning (ML) methods and compare them with a powerful time series model (TBATS) trained on 56 years of price data. Our results show that, out of nineteen regions, global maize prices are mostly influenced by Northern America. More specifically, small positive production changes relative to the previous year in Northern America negatively impact the world price, while the production of other regions has weak or no influence. We find that TBATS is the most accurate method for a forecast horizon of three months or less. For longer forecasting horizons, ML techniques based on bagging and gradient boosting perform better but require yearly input data on regional maize production. Our results highlight the value of ML for predicting the global prices of major commodities and reveal the strong sensitivity of the global maize price to small variations in maize production in Northern America.
    Keywords: Food-security,Maize,Agricultural commodity prices,Regional production,Machine learning
    Date: 2020–09–22
  18. By: Mosavi, Amir; Faghan, Yaser; Ghamisi, Pedram; Duan, Puhong; Ardabili, Sina Faizollahzadeh; Hassan, Salwana; Band, Shahab S.
    Abstract: The popularity of deep reinforcement learning (DRL) applications in economics has increased exponentially. DRL, through a wide range of capabilities from reinforcement learning (RL) to deep learning (DL), offers vast opportunities for handling sophisticated dynamic economics systems. DRL is characterized by scalability with the potential to be applied to high-dimensional problems in conjunction with noisy and nonlinear patterns of economic data. In this paper, we initially consider a brief review of DL, RL, and deep RL methods in diverse applications in economics, providing an in-depth insight into the state-of-the-art. Furthermore, the architecture of DRL applied to economic applications is investigated in order to highlight the complexity, robustness, accuracy, performance, computational tasks, risk constraints, and profitability. The survey results indicate that DRL can provide better performance and higher efficiency as compared to the traditional algorithms while facing real economic problems in the presence of risk parameters and the ever-increasing uncertainties.
    Date: 2020–09–01
  19. By: Nadja Klein; Michael Stanley Smith; David J. Nott
    Abstract: Recurrent neural networks (RNNs) with rich feature vectors of past values can provide accurate point forecasts for series that exhibit complex serial dependence. We propose two approaches to constructing deep time series probabilistic models based on a variant of RNN called an echo state network (ESN). The first is where the output layer of the ESN has stochastic disturbances and a shrinkage prior for additional regularization. The second approach employs the implicit copula of an ESN with Gaussian disturbances, which is a deep copula process on the feature space. Combining this copula with a non-parametrically estimated marginal distribution produces a deep distributional time series model. The resulting probabilistic forecasts are deep functions of the feature vector and also marginally calibrated. In both approaches, Bayesian Markov chain Monte Carlo methods are used to estimate the models and compute forecasts. The proposed deep time series models are suitable for the complex task of forecasting intraday electricity prices. Using data from the Australian National Electricity Market, we show that our models provide accurate probabilistic price forecasts. Moreover, the models provide a flexible framework for incorporating probabilistic forecasts of electricity demand as additional features. We demonstrate that doing so, in the deep distributional time series model in particular, increases price forecast accuracy substantially.
    Date: 2020–10
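The echo state network at the core of both proposed models is simple to sketch in numpy: the recurrent weights are random and fixed, and only a linear readout is trained. The toy sizes, the sine "price" series, and the ridge readout are assumptions of this sketch; the paper's readout is Bayesian rather than ridge.

```python
import numpy as np

rng = np.random.default_rng(4)
n_res, T = 50, 500

# Fixed random reservoir, rescaled below spectral radius 1 (echo state property).
W = rng.normal(size=(n_res, n_res))
W *= 0.9 / np.max(np.abs(np.linalg.eigvals(W)))
W_in = rng.normal(scale=0.5, size=n_res)

u = np.sin(np.linspace(0.0, 20 * np.pi, T + 1))  # toy "price" series
h = np.zeros(n_res)
states = np.empty((T, n_res))
for t in range(T):
    h = np.tanh(W @ h + W_in * u[t])             # reservoir update (never trained)
    states[t] = h

# Train only the linear readout (ridge regression, one-step-ahead target).
ridge = 1e-4
A = states.T @ states + ridge * np.eye(n_res)
w_out = np.linalg.solve(A, states.T @ u[1:])
pred = states @ w_out
mse = float(np.mean((pred - u[1:]) ** 2))
```

Because only the readout is estimated, adding stochastic disturbances or a shrinkage prior to that output layer (as in the paper's first approach) keeps the model cheap to fit with MCMC.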
  20. By: Masahiro Kato; Shota Yasui
    Abstract: We consider training a binary classifier under delayed feedback (DF Learning). In DF Learning, we first receive negative samples; subsequently, some samples turn positive. This problem arises in various real-world applications such as online advertisements, where the user action takes place long after the first click. Owing to the delayed feedback, simply separating the positive and negative data causes a sample selection bias. One solution is to assume that a long time window after first observing a sample reduces the sample selection bias. However, existing studies report that using only a portion of all samples based on the time window assumption yields suboptimal performance, while using all samples along with the time window assumption improves empirical performance. Extending these existing studies, we propose a method with an unbiased and convex empirical risk constructed from all samples under the time window assumption. We provide experimental results to demonstrate the effectiveness of the proposed method using a real traffic log dataset.
    Date: 2020–09
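The time-window assumption is easy to state in code: a sample's negative label is only trusted once its window has elapsed, whereas naive labeling calls every not-yet-converted sample negative. A synthetic sketch of the resulting selection bias (all distributions are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(5)
n = 1000
clicked_at = rng.uniform(0.0, 100.0, size=n)   # time of the first click
converts = rng.random(n) < 0.3                 # whether a conversion ever happens
delay = rng.exponential(5.0, size=n)           # delay until conversion
convert_at = np.where(converts, clicked_at + delay, np.inf)

now, window = 100.0, 20.0
observed_positive = convert_at <= now          # conversions observed so far
matured = clicked_at <= now - window           # window assumption satisfied

naive_neg = ~observed_positive                 # naive: all unconverted are negative
trusted_neg = ~observed_positive & matured     # window: trust only matured negatives

biased_rate = float(observed_positive.mean())  # underestimates the true rate
true_rate = float(converts.mean())
```

Using only the matured samples discards recent data, which is the suboptimality the abstract mentions; the paper's contribution is an unbiased, convex risk that still uses all samples under this assumption.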
  21. By: Rakshit Jha; Mattijs De Paepe; Samuel Holt; James West; Shaun Ng
    Abstract: This paper shows that temporal CNNs accurately predict bitcoin spot price movements from limit order book data. On a 2-second prediction time horizon we achieve 71% walk-forward accuracy on the popular cryptocurrency exchange Coinbase. Our model can be trained in less than a day on commodity GPUs, which could be installed into colocation centers, allowing for model sync with existing faster orderbook prediction models. We provide source code and data at rderbook.
    Date: 2020–10
  22. By: Abramov, Dimitri Marques
    Abstract: Although the market economy is now widely regarded as a complex adaptive system, it has no collective feedback mechanism providing the system with long-range stability and complexity. In this scenario, the logical prediction is a long-term economic collapse driven by positive feedback loops. In this work, I outline the fundamental idea of a floating taxation system as a feedback mechanism to prevent market collapse through asymmetrical company overgrowth and extreme reduction of system complexity; this paradigm would promote the long-term stability of the economic system. I implemented a generic computational neural network with 5,000 virtual companies whose initial states (i.e., capital) and connection weights (trading network) were normally distributed. A negative feedback loop was implemented with different weights. Market complexity was measured in terms of joint entropy, using an algorithm for calculating neural complexity in networks. Without feedback, some companies grew explosively, annihilating all others until the whole system collapsed. With feedback loops, complexity remained stable while many companies disappeared (negative selection), and both the variance of capital (from 10 units in the initial conditions to roughly 2,000 times that) and complexity (an increase on the order of 10^4) rose substantially. These results support a theory of dynamic feedback mechanisms for market self-regulation based on floating taxes, maintaining homeostasis alongside complexity, capital growth, and competitive balance.
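    The floating-tax feedback loop described above can be caricatured in a few lines: firm capital grows multiplicatively, and the tax skims growth from firms above the mean, damping runaway concentration. All constants below (firm count, growth rates, tax weight) are illustrative choices for a toy run, not the paper's 5,000-firm network or its entropy measure:

```python
import random

random.seed(1)

def simulate(n=200, steps=100, tax_feedback=0.0):
    """Toy market: normally distributed initial capital, random
    multiplicative growth, and an optional floating tax proportional
    to each firm's excess over the mean (the negative feedback loop)."""
    capital = [max(0.01, random.gauss(1.0, 0.2)) for _ in range(n)]
    for _ in range(steps):
        mean = sum(capital) / n
        for i in range(n):
            capital[i] *= 1 + random.gauss(0.01, 0.05)
            if tax_feedback and capital[i] > mean:
                # Floating tax: skim a fraction of the excess over the mean.
                capital[i] -= tax_feedback * (capital[i] - mean)
            capital[i] = max(capital[i], 0.0)  # no negative capital
    return capital

no_fb = simulate(tax_feedback=0.0)
with_fb = simulate(tax_feedback=0.1)
```

    In runs of this kind, the untaxed economy tends toward winner-take-all concentration, while the feedback run keeps a broader distribution of surviving firms — the qualitative pattern the abstract reports.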
    Date: 2020–09–15
  23. By: Susana Martínez-Restrepo; Lina Tafur Marín; Juan Guillermo Osio; Pablo Cortés
    Abstract: This is the fifth brief in CoreWoman's #GéneroYCovid series, which monitors how COVID-19 is affecting women in differentiated ways and proposes solutions for a gender-focused economic recovery.
    Keywords: Violencia, Violencia Basada en Género, Violencia Doméstica, Género, Mujeres, COVID-19, Big Data, Colombia
    JEL: J12 J16
    Date: 2020–09–17
  24. By: Marica Valente
    Abstract: Using machine learning methods in a quasi-experimental setting, I study the heterogeneous effects of waste prices -- unit prices on household unsorted waste disposal -- on waste demands and social welfare. First, using a unique panel of Italian municipalities with large variation in prices and observables, I show that waste demands are nonlinear. I find evidence of nudge effects at low prices, and increasing elasticities at high prices driven by income effects and waste habits before policy. Second, I combine municipal level price effects on unsorted and recycling waste with their impacts on municipal and pollution costs. I estimate overall welfare benefits after three years of adoption, when waste prices cause significant waste avoidance. As waste avoidance is highest at low prices, this implies that even low prices can substantially change waste behaviors and improve welfare.
    Date: 2020–10
  25. By: Schmid, Christian P. R. (CSS Institute for Empirical Health Economics); Schreiner, Nicolas (University of Basel); Stutzer, Alois (University of Basel)
    Abstract: How should payment systems for means-tested benefits be designed to improve the financial situation of needy recipients most effectively? We study this question in the context of mandatory health insurance in Switzerland, where recipients initially received either a cash transfer or subsidized insurance premiums (a form of in-kind transfer). A federal reform in 2014 forced cantons (i.e., states) to switch universally to in-kind provision. We exploit this setting in a difference-in-differences design, analyzing rich individual-level accounting data and applying a machine learning approach to identify cash recipients prior to the reform. We find that switching from cash to in-kind transfers reduces the likelihood of late premium payments by about 20% and of government debt collection for long-term missed payments by approximately 16%. There is no evidence of a negative spillover effect on the timely payment of non-subsidized coinsurance bills for health services after the regime change.
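    In its simplest 2x2 form, the difference-in-differences design used here subtracts the control group's before-after change from the treated group's, so that common time trends cancel under the parallel-trends assumption. A sketch with made-up numbers (the paper's actual estimates come from rich individual-level data, not group means like these):

```python
def did_estimate(treated_pre, treated_post, control_pre, control_post):
    """Canonical 2x2 difference-in-differences estimator: the change
    in the treated group's mean outcome minus the change in the
    control group's."""
    mean = lambda xs: sum(xs) / len(xs)
    return ((mean(treated_post) - mean(treated_pre))
            - (mean(control_post) - mean(control_pre)))

# Share of households with late premium payments (illustrative data).
effect = did_estimate(
    treated_pre=[0.30, 0.32, 0.28],   # cash cantons, before the 2014 reform
    treated_post=[0.24, 0.25, 0.23],  # cash cantons, after the switch
    control_pre=[0.29, 0.31, 0.30],   # always-in-kind cantons, before
    control_post=[0.28, 0.30, 0.29],  # always-in-kind cantons, after
)
# effect is -0.05: a 5 percentage-point drop attributable to the switch.
```

    The machine learning step the abstract mentions feeds into this design by classifying which recipients were on cash transfers before the reform, i.e., by constructing the treatment indicator.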
    Keywords: health insurance, transfers, cash subsidies, in-kind transfers, financial distress, debt collection
    JEL: D14 H24 I13
    Date: 2020–10
  26. By: Lopez, Claude; Contreras, Oscar; Bendix, Joseph
    Abstract: In this study, we show that using a common set of variables would partially resolve the inconsistencies and lack of comparability across rating providers that often confuse investors. Furthermore, we dissociate the impact of the rating agencies' different emphases on "E", "S", or "G" from that of using different data. The former, if properly disclosed, can be useful, as it allows investors to choose the rating most in line with their preferences; the latter necessarily requires harmonization of the data collected.
    Keywords: ESG ratings, machine learning, environmental, social and governance
    JEL: C14 C5 G10 G11 G14 G30
    Date: 2020–09–20

This nep-big issue is ©2020 by Tom Coupé. It is provided as is without any express or implied warranty. It may be freely redistributed in whole or in part for any purpose. If distributed in part, please include this notice.
General information on the NEP project can be found at . For comments, please write to the director of NEP, Marco Novarese, at <>. Put “NEP” in the subject, otherwise your mail may be rejected.
NEP’s infrastructure is sponsored by the School of Economics and Finance of Massey University in New Zealand.