nep-big New Economics Papers
on Big Data
Issue of 2020‒03‒16
25 papers chosen by
Tom Coupé
University of Canterbury

  1. Industrial Growth in Sub-Saharan Africa: Evidence from Machine Learning with Insights from Nightlight Satellite Images By Christian S. Otchia; Simplice A. Asongu
  2. Identifying Urban Areas by Combining Human Judgment and Machine Learning : An Application to India By Galdo,Virgilio; Li,Yue-000316086; Rama,Martin G.
  3. Ascertaining price formation in cryptocurrency markets with Deep Learning By Fan Fang; Waichung Chung; Carmine Ventre; Michail Basios; Leslie Kanthan; Lingbo Li; Fan Wu
  4. A New Tool for Robust Estimation and Identification of Unusual Data Points By Christian Garciga; Randal Verbrugge
  5. Deep neural networks algorithms for stochastic control problems on finite horizon: numerical applications By Achref Bachouch; Côme Huré; Nicolas Langrené; Huyen Pham
  6. A deep learning approach for computations of exposure profiles for high-dimensional Bermudan options By Kristoffer Andersson; Cornelis Oosterlee
  7. Proxying Economic Activity with Daytime Satellite Imagery: Filling Data Gaps Across Time and Space By Patrick Lehnert; Michael Niederberger; Uschi Backes-Gellner
  8. Binary Classification Problems in Economics and 136 Different Ways to Solve Them By Anton Gerunov
  9. Causal mediation analysis with double machine learning By Helmut Farbmacher; Martin Huber; Henrika Langen; Martin Spindler
  10. Estimating Small Area Population Density Using Survey Data and Satellite Imagery : An Application to Sri Lanka By Engstrom,Ryan; Newhouse,David Locke; Soundararajan,Vidhya
  11. Estimating the Effect of Central Bank Independence on Inflation Using Longitudinal Targeted Maximum Likelihood Estimation By Philipp Baumann; Michael Schomaker; Enzo Rossi
  12. Double Machine Learning based Program Evaluation under Unconfoundedness By Knaus, Michael C.
  13. A Forward Guidance Indicator For The South African Reserve Bank: Implementing A Text Analysis Algorithm By Ruan Erasmus; Hylton Hollander
  14. The Evolution of Inequality of Opportunity in Germany: A Machine Learning Approach By Paolo Brunori; Guido Neidhofer
  15. Which Model for Poverty Predictions? By Paolo Verme
  16. Estimation of the ex ante Distribution of Returns for a Portfolio of U.S. Treasury Securities via Deep Learning By Foresti,Andrea
  17. Fast Lower and Upper Estimates for the Price of Constrained Multiple Exercise American Options by Single Pass Lookahead Search and Nearest-Neighbor Martingale By Nicolas Essis-Breton; Patrice Gaillardetz
  18. Is Chinese Growth Overstated? By Maximo Camacho; Hunter L. Clark; Xavier X. Sala-i-Martin
  19. Machine Learning Portfolio Allocation By Michael Pinelis; David Ruppert
  20. Network Competition and Team Chemistry in the NBA By William C. Horrace; Hyunseok Jung; Shane Sanders
  21. Labor Market Impacts and Responses : The Economic Consequences of a Marine Environmental Disaster By Hoang,Trung Xuan; Le,Duong Trung; Nguyen,Ha Minh; Vuong,Nguyen Dinh Tuan
  22. Recessions as Breadwinner for Forecasters: State-Dependent Evaluation of Predictive Ability: Evidence from Big Macroeconomic US Data By Boriss Siliverstovs; Daniel Wochner
  23. Does e-commerce reduce traffic congestion? Evidence from Alibaba single day shopping event By Peng, Cong
  24. Efficient Targeting in Childhood Interventions By Paul, Alexander; Bleses, Dorthe; Rosholm, Michael
  25. A Text Mining Analysis of Central Bank Monetary Policy Communication in Nigeria By Omotosho, Babatunde S.

  1. By: Christian S. Otchia (Hyogo, Japan); Simplice A. Asongu (Yaoundé, Cameroon)
    Abstract: This study uses nightlight time data and machine learning techniques to predict industrial development in Africa. The results provide the first evidence on how machine learning techniques and nightlight data can be used to predict economic development in places where subnational data are missing or imprecise. Taken together, the research confirms four groups of important determinants of industrial growth: natural resources, agricultural growth, institutions, and manufacturing imports. Our findings indicate that Africa should pursue a more multisector approach to development, putting natural resources and agricultural productivity growth at the forefront.
    Keywords: Industrial growth; Machine learning; Africa
    JEL: I32 O15 O40 O55
    Date: 2019–01
    URL: http://d.repec.org/n?u=RePEc:abh:wpaper:19/046&r=all
  2. By: Galdo,Virgilio; Li,Yue-000316086; Rama,Martin G.
    Abstract: This paper proposes a methodology for identifying urban areas that combines subjective assessments with machine learning, and applies it to India, a country where several studies see the official urbanization rate as an under-estimate. For a representative sample of cities, towns and villages, as administratively defined, human judgment of Google images is used to determine whether they are urban or rural in practice. Judgments are collected across four groups of assessors, differing in their familiarity with India and with urban issues, following two different protocols. The judgment-based classification is then combined with data from the population census and from satellite imagery to predict the urban status of the sample. Logit, LASSO, and random forest methods are applied. These approaches are then used to decide whether each of the out-of-sample administrative units in India is urban or rural in practice. The analysis does not find that India is substantially more urban than officially claimed. However, there are important differences at more disaggregated levels, with "other towns" and "census towns" being more rural, and some southern states more urban, than is officially claimed. The consistency of human judgment across assessors and protocols, the easy availability of crowd-sourcing, and the stability of predictions across approaches, suggest that the proposed methodology is a promising avenue for studying urban issues.
    Date: 2020–02–24
    URL: http://d.repec.org/n?u=RePEc:wbk:wbrwps:0160&r=all
  3. By: Fan Fang; Waichung Chung; Carmine Ventre; Michail Basios; Leslie Kanthan; Lingbo Li; Fan Wu
    Abstract: The cryptocurrency market is amongst the fastest-growing of all the financial markets in the world. Unlike traditional markets, such as equities, foreign exchange and commodities, the cryptocurrency market is considered to be more volatile and less liquid. This paper is inspired by the recent success of using deep learning for stock market prediction. In this work, we analyze and present the characteristics of the cryptocurrency market in a high-frequency setting. In particular, we apply a deep learning approach to predict the direction of the mid-price change on the upcoming tick. We monitored live tick-level data from 8 cryptocurrency pairs and applied both statistical and machine learning techniques to provide a live prediction. We reveal that promising results are possible for cryptocurrencies, and in particular, we achieve a consistent 78% accuracy in predicting the direction of the mid-price movement on the live Bitcoin/US dollar exchange rate.
    Date: 2020–02
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2003.00803&r=all
  4. By: Christian Garciga; Randal Verbrugge (Virginia Polytechnic Institute and State University)
    Abstract: Most consistent estimators are what Müller (2007) terms “highly fragile”: prone to total breakdown in the presence of a handful of unusual data points. This compromises inference. Robust estimation is a (seldom-used) solution, but commonly used methods have drawbacks. In this paper, building on methods that are relatively unknown in economics, we provide a new tool for robust estimates of mean and covariance, useful both for robust estimation and for detection of unusual data points. It is relatively fast and useful for large data sets. Our performance testing indicates that our baseline method performs on par with, or better than, two of the currently best available methods, and that it works well on benchmark data sets. We also demonstrate that the issues we discuss are not merely hypothetical, by re-examining a prominent economic study and demonstrating that its central results are driven by a set of unusual points.
    Keywords: big data; machine learning; outlier identification; fragility; robust estimation; detMCD; RMVN
    JEL: C3 C4 C5
    Date: 2020–03–05
    URL: http://d.repec.org/n?u=RePEc:fip:fedcwq:87580&r=all
  5. By: Achref Bachouch (UiO - University of Oslo); Côme Huré (LPSM (UMR_8001) - Laboratoire de Probabilités, Statistique et Modélisation - UPD7 - Université Paris Diderot - Paris 7 - SU - Sorbonne Université - CNRS - Centre National de la Recherche Scientifique); Nicolas Langrené (CSIRO - Data61 [Canberra] - ANU - Australian National University - CSIRO - Commonwealth Scientific and Industrial Research Organisation [Canberra]); Huyen Pham (LPSM (UMR_8001) - Laboratoire de Probabilités, Statistique et Modélisation - UPD7 - Université Paris Diderot - Paris 7 - SU - Sorbonne Université - CNRS - Centre National de la Recherche Scientifique)
    Abstract: This paper presents several numerical applications of deep learning-based algorithms that have been introduced in [HPBL18]. Numerical and comparative tests using TensorFlow illustrate the performance of our different algorithms, namely control learning by performance iteration (algorithms NNcontPI and ClassifPI) and control learning by hybrid iteration (algorithms Hybrid-Now and Hybrid-LaterQ), on the 100-dimensional nonlinear PDE examples from [EHJ17] and on quadratic backward stochastic differential equations as in [CR16]. We also perform tests on low-dimensional control problems such as an option hedging problem in finance, as well as energy storage problems arising in the valuation of gas storage and in microgrid management. Numerical results and comparisons to the quantization-type algorithm Qknn, an efficient algorithm for numerically solving low-dimensional control problems, are also provided; some corresponding code is available at https://github.com/comeh/.
    Keywords: value iteration,Policy iteration algorithm,reinforcement learning,quantization,Deep learning
    Date: 2020
    URL: http://d.repec.org/n?u=RePEc:hal:journl:hal-01949221&r=all
  6. By: Kristoffer Andersson; Cornelis Oosterlee
    Abstract: In this paper, we propose a neural network-based method for approximating expected exposures and potential future exposures of Bermudan options. In a first phase, the method relies on the Deep Optimal Stopping (DOS) algorithm, which learns the optimal stopping rule from Monte Carlo samples of the underlying risk factors. Cashflow paths are then created by applying the learned stopping strategy to a new set of realizations of the risk factors. In a second phase, the risk factors are regressed against the cashflow paths to obtain approximations of pathwise option values. The regression step is carried out by ordinary least squares as well as by neural networks, and it is shown that the latter yields more accurate approximations. The expected exposure is formulated both in terms of the cashflow paths and in terms of the pathwise option values, and it is shown that a simple Monte Carlo average yields accurate approximations in both cases. The potential future exposure is estimated by the empirical α-percentile. Finally, it is shown that the expected exposures, as well as the potential future exposures, can be computed under either the risk-neutral measure or the real-world measure without having to re-train the neural networks.
    Date: 2020–03
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2003.01977&r=all
  7. By: Patrick Lehnert; Michael Niederberger; Uschi Backes-Gellner
    Abstract: This paper develops a novel procedure for proxying economic activity across time periods and spatial units, for which other data is not available. In developing this proxy, we apply machine-learning techniques to a unique historical time series of daytime satellite imagery dating back to 1984. Compared to night lights intensity, a satellite-based proxy that economists commonly use, our proxy has the advantages of more precisely predicting economic activity over a longer time series and at smaller regional levels. We demonstrate the proxy's usefulness for the example of Germany, where data on economic activity is otherwise unavailable, in particular for the regions belonging to the former German Democratic Republic. However, our procedure is generalizable to other regions and countries alike, and thus yields great potential for analyzing historical developments, evaluating local policy reforms, and controlling for economic activity at highly disaggregated regional levels in econometric applications.
    JEL: E01 E23 O18 R11 R14
    Date: 2020–03
    URL: http://d.repec.org/n?u=RePEc:iso:educat:0165&r=all
  8. By: Anton Gerunov (Faculty of Economics and Business Administration, Sofia University "St. Kliment Ohridski")
    Abstract: This article investigates the performance of 136 different classification algorithms for economic problems of binary choice. They are applied to model five different choice situations: consumer acceptance during a direct marketing campaign, predicting default on credit card debt, credit scoring, forecasting firm insolvency, and modelling online consumer purchases. Algorithms are trained to generate class predictions of a given binary target variable, which are then used to measure forecast accuracy via the area under an ROC curve. Results show that algorithms of the Random Forest family consistently outperform alternative methods and may thus be suitable for modelling a wide range of discrete choice situations.
    Keywords: discrete choice, classification, machine learning algorithms, modelling decisions.
    JEL: C35 C44 C45 D81
    Date: 2020–03
    URL: http://d.repec.org/n?u=RePEc:sko:wpaper:bep-2020-02&r=all
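The area-under-the-ROC-curve criterion used in the abstract above has a simple rank interpretation: it equals the probability that a randomly chosen positive case is scored above a randomly chosen negative one. A minimal, stdlib-only Python sketch on invented synthetic data (the two toy "classifiers" and their score distributions are illustrative assumptions, not the paper's models):

```python
import random

def roc_auc(labels, scores):
    """AUC via the Mann-Whitney U statistic: the probability that a
    randomly chosen positive is scored above a randomly chosen negative
    (ties count half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy comparison of two "classifiers" on synthetic binary-choice data.
random.seed(0)
labels = [int(random.random() < 0.4) for _ in range(500)]
# Classifier A: informative scores; Classifier B: essentially random scores.
score_a = [y * 0.6 + random.random() * 0.7 for y in labels]
score_b = [random.random() for _ in labels]
print(round(roc_auc(labels, score_a), 3))  # well above 0.5
print(round(roc_auc(labels, score_b), 3))  # near 0.5
```

Ranking 136 algorithms then amounts to computing this statistic once per fitted model on a held-out sample.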
  9. By: Helmut Farbmacher; Martin Huber; Henrika Langen; Martin Spindler
    Abstract: This paper combines causal mediation analysis with double machine learning to control for observed confounders in a data-driven way under a selection-on-observables assumption in a high-dimensional setting. We consider the average indirect effect of a binary treatment operating through an intermediate variable (or mediator) on the causal path between the treatment and the outcome, as well as the unmediated direct effect. Estimation is based on efficient score functions, which possess a multiple robustness property w.r.t. misspecifications of the outcome, mediator, and treatment models. This property is key for selecting these models by double machine learning, which is combined with data splitting to prevent overfitting in the estimation of the effects of interest. We demonstrate that the direct and indirect effect estimators are asymptotically normal and root-n consistent under specific regularity conditions and investigate the finite sample properties of the suggested methods in a simulation study when considering the lasso as the machine learner. We also provide an empirical application to the U.S. National Longitudinal Survey of Youth, assessing the indirect effect of health insurance coverage on general health operating via routine checkups as mediator, as well as the direct effect. We find a moderate short-term effect of health insurance coverage on general health which is, however, not mediated by routine checkups.
    Date: 2020–02
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2002.12710&r=all
  10. By: Engstrom,Ryan; Newhouse,David Locke; Soundararajan,Vidhya
    Abstract: Country-level census data are typically collected once every 10 years. However, conflict, migration, urbanization, and natural disasters can cause rapid shifts in local population patterns. This study uses Sri Lankan data to demonstrate the feasibility of a bottom-up method that combines household survey data with contemporaneous satellite imagery to track frequent changes in local population density. A Poisson regression model based on indicators derived from satellite data, selected using the least absolute shrinkage and selection operator, accurately predicts village-level population density. The model is estimated in villages sampled in the 2012/13 Household Income and Expenditure Survey to obtain out-of-sample density predictions in the nonsurveyed villages. The predictions approximate the 2012 census density well and are more accurate than other bottom-up studies based on lower-resolution satellite data. The predictions are also more accurate than most publicly available population products, which rely on areal interpolation of census data to redistribute population at the local level. The accuracies are similar when estimated using a random forest model, and when density estimates are expressed in terms of population counts. The collective evidence suggests that combining surveys with satellite data is a cost-effective method to track local population changes at more frequent intervals.
    Date: 2019–03–12
    URL: http://d.repec.org/n?u=RePEc:wbk:wbrwps:8776&r=all
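The core of the bottom-up approach described above is a Poisson regression of local population counts on satellite-derived indicators. The following self-contained Python sketch fits a one-covariate log-link Poisson model by Newton-Raphson on synthetic "village" data; the covariate, true coefficients, and data-generating process are invented for illustration, and the paper's LASSO step for selecting among many indicators is omitted:

```python
import math
import random

def fit_poisson(x, y, iters=25):
    """Fit y ~ Poisson(exp(a + b*x)) by Newton-Raphson on the log-likelihood."""
    a, b = math.log(sum(y) / len(y) + 1e-9), 0.0  # start at the mean rate
    for _ in range(iters):
        mu = [math.exp(a + b * xi) for xi in x]
        g0 = sum(yi - mi for yi, mi in zip(y, mu))                # score, intercept
        g1 = sum((yi - mi) * xi for yi, mi, xi in zip(y, mu, x))  # score, slope
        h00 = sum(mu)                                             # Fisher information
        h01 = sum(mi * xi for mi, xi in zip(mu, x))
        h11 = sum(mi * xi * xi for mi, xi in zip(mu, x))
        det = h00 * h11 - h01 * h01
        a += (h11 * g0 - h01 * g1) / det                          # Newton step
        b += (h00 * g1 - h01 * g0) / det
    return a, b

def draw_poisson(lam):
    """Knuth's method for sampling a Poisson variate (fine for small rates)."""
    k, p, target = 0, 1.0, math.exp(-lam)
    while p > target:
        k += 1
        p *= random.random()
    return k - 1

# Synthetic villages: x is a satellite-derived indicator (say, built-up share);
# population counts are drawn with true rate exp(1.0 + 2.0 * x).
random.seed(1)
x = [random.random() for _ in range(400)]
y = [draw_poisson(math.exp(1.0 + 2.0 * xi)) for xi in x]
a_hat, b_hat = fit_poisson(x, y)
print(round(a_hat, 2), round(b_hat, 2))  # should land near the true (1.0, 2.0)
```

Out-of-sample density predictions for non-surveyed villages would then be `exp(a_hat + b_hat * x_new)`.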
  11. By: Philipp Baumann; Michael Schomaker; Enzo Rossi
    Abstract: Whether a country's central bank independence (CBI) status has a lowering effect on inflation is a controversial hypothesis. To date, this question could not be answered satisfactorily because the complex macroeconomic structure that gives rise to the data has not been adequately incorporated into statistical analyses. We have developed a causal model that summarizes the economic process of inflation. Based on this causal model and recent data, we discuss and identify the assumptions under which the effect of CBI on inflation can be identified and estimated. Given these and alternative assumptions we estimate this effect using modern doubly robust effect estimators, i.e. longitudinal targeted maximum likelihood estimators. The estimation procedure incorporated machine learning algorithms and was tailored to address the challenges that come with complex longitudinal macroeconomic data. We could not find strong support for the hypothesis that a central bank that is independent over a long period of time necessarily lowers inflation. Simulation studies evaluate the sensitivity of the proposed methods in complex settings when assumptions are violated, and highlight the importance of working with appropriate learning algorithms for estimation.
    Date: 2020–03
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2003.02208&r=all
  12. By: Knaus, Michael C.
    Abstract: This paper consolidates recent methodological developments based on Double Machine Learning (DML) with a focus on program evaluation under unconfoundedness. DML based methods leverage flexible prediction methods to control for confounding in the estimation of (i) standard average effects, (ii) different forms of heterogeneous effects, and (iii) optimal treatment assignment rules. We emphasize that these estimators all build on the same doubly robust score, which makes it possible to exploit computational synergies. An evaluation of multiple programs of the Swiss Active Labor Market Policy shows how DML based methods enable a comprehensive policy analysis. However, we find evidence that estimates of individualized heterogeneous effects can become unstable.
    Keywords: Causal machine learning, conditional average treatment effects, optimal policy learning, individualized treatment rules, multiple treatments
    JEL: C21
    Date: 2020–03
    URL: http://d.repec.org/n?u=RePEc:usg:econwp:2020:04&r=all
  13. By: Ruan Erasmus (Department of Economics, Stellenbosch University); Hylton Hollander (Department of Economics, Stellenbosch University)
    Abstract: The expansion of central bank communications and the increased use thereof as a policy tool to manage expectations have led to an area of research, semantic modelling, that analyses the words and phrases used by central banks. We use text-mining and text-analysis techniques on South African Reserve Bank monetary policy committee statements to construct an index measuring the stance of monetary policy: a forward guidance indicator (FGI). We show that, after controlling for market expectations, FGIs provide significant predictive power for future changes in the repurchase interest rate (the primary monetary policy instrument). Furthermore, we show that FGIs are primarily driven by inflation expectations, which highlights the strong link between the SARB's communication strategy and its inflation targeting mandate. In fact, we observe a systematic anti-inflation bias in the communicated stance of monetary policy---both absolutely and asymmetrically. The results are, however, sensitive to the selection of the dictionary used to analyse the text.
    Keywords: Monetary policy, Text analysis, Forward guidance, Inflation targeting
    JEL: C43 C53 E42 E47 E52 E58
    Date: 2020
    URL: http://d.repec.org/n?u=RePEc:sza:wpaper:wpapers339&r=all
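Dictionary-based scoring is the simplest form of the text analysis the abstract above describes, and its final caveat (sensitivity to the chosen dictionary) is visible directly in a sketch like this. The word lists and the example sentence below are invented for illustration; real forward guidance indicators use curated dictionaries and full MPC statements:

```python
# Hypothetical mini-dictionary; actual studies use curated hawkish/dovish
# word lists, and results depend on this choice.
HAWKISH = {"tighten", "tightening", "hike", "raise", "inflationary", "upside"}
DOVISH = {"ease", "easing", "cut", "lower", "accommodative", "downside"}

def fgi_score(statement):
    """Net hawkishness in [-1, 1]: (hawkish - dovish) / (hawkish + dovish)."""
    words = [w.strip(".,;:").lower() for w in statement.split()]
    h = sum(w in HAWKISH for w in words)
    d = sum(w in DOVISH for w in words)
    return 0.0 if h + d == 0 else (h - d) / (h + d)

mpc = ("The committee judges that upside inflationary risks warrant a "
       "tightening bias, although it stands ready to ease should downside "
       "risks materialise.")
print(round(fgi_score(mpc), 2))  # 3 hawkish vs 2 dovish hits -> 0.2
```

A time series of such scores, one per MPC statement, is then what gets tested for predictive power over policy-rate changes.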
  14. By: Paolo Brunori (University of Florence); Guido Neidhofer (ZEW)
    Abstract: We show that measures of inequality of opportunity (IOP) fully consistent with Roemer (1998)'s IOP theory can be straightforwardly estimated by adopting a machine learning approach, and apply our novel method to analyse the development of IOP in Germany during the last three decades. Hereby, we take advantage of information contained in 25 waves of the Socio-Economic Panel. Our analysis shows that in Germany IOP declined immediately after reunification, increased in the first decade of the century, and slightly declined again after 2010. Over the entire period, at the top of the distribution we always find individuals that resided in West-Germany before the fall of the Berlin Wall, whose fathers had a high occupational position, and whose mothers had a high educational degree. East-German residents in 1989, with low educated parents, persistently qualify at the bottom.
    Keywords: Inequality, opportunity, SOEP, Germany.
    JEL: D63 D30 D31
    Date: 2020–01
    URL: http://d.repec.org/n?u=RePEc:inq:inqwps:ecineq2020-514&r=all
  15. By: Paolo Verme (World Bank)
    Abstract: OLS models are the predominant choice for poverty predictions in a variety of contexts such as proxy-means tests, poverty mapping or cross-survey imputations. This paper compares the performance of econometric and machine learning models in predicting poverty using alternative objective functions and stochastic dominance analysis based on coverage curves. It finds that the choice of an optimal model largely depends on the distribution of incomes and the poverty line. Comparing the performance of different econometric and machine learning models is therefore an important step in the process of optimizing poverty predictions and targeting ratios.
    Keywords: Welfare Modelling; Income Distributions; Poverty Predictions; Imputations.
    JEL: D31 D63 E64 O15
    Date: 2020–02
    URL: http://d.repec.org/n?u=RePEc:inq:inqwps:ecineq2020-521&r=all
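The coverage curves this abstract refers to plot, for each targeting ratio, the share of the truly poor captured when households are ranked by a model's predicted welfare. A minimal Python sketch on synthetic incomes (the lognormal data-generating process and the noise level of the "model prediction" are invented for illustration):

```python
import random

def coverage_at(share, income_true, income_pred, povline):
    """Share of truly poor households captured when the programme targets
    the fraction `share` of households with the lowest predicted income."""
    n_target = int(share * len(income_pred))
    ranked = sorted(range(len(income_pred)), key=lambda i: income_pred[i])
    targeted = set(ranked[:n_target])
    poor = [i for i, inc in enumerate(income_true) if inc < povline]
    return sum(i in targeted for i in poor) / len(poor)

random.seed(3)
income_true = [random.lognormvariate(0, 0.8) for _ in range(3000)]
# A noisy model prediction of income (multiplicative lognormal error).
income_pred = [inc * random.lognormvariate(0, 0.4) for inc in income_true]
povline = sorted(income_true)[len(income_true) // 5]  # ~20% poverty rate
for s in (0.1, 0.2, 0.4):
    print(round(coverage_at(s, income_true, income_pred, povline), 2))
# Coverage rises with the targeting ratio; a model whose curve dominates
# another's at every ratio is preferred in the stochastic dominance sense.
```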
  16. By: Foresti,Andrea
    Abstract: This paper presents different deep neural network architectures designed to forecast the distribution of returns on a portfolio of U.S. Treasury securities. A long short-term memory model and a convolutional neural network are tested as the main building blocks of each architecture. The models are then augmented by cross-sectional data and the portfolio's empirical distribution. The paper also presents the fit and generalization potential of each approach.
    Date: 2019–03–21
    URL: http://d.repec.org/n?u=RePEc:wbk:wbrwps:8790&r=all
  17. By: Nicolas Essis-Breton; Patrice Gaillardetz
    Abstract: This article presents fast lower and upper estimates for a large class of options: the class of constrained multiple exercise American options. Typical options in this class are swing options with volume and timing constraints, and passport options with multiple lookback rights. The lower estimate algorithm uses the artificial intelligence method of lookahead search. The upper estimate algorithm uses the dual approach to option pricing on a nearest-neighbor basis for the martingale space. Probabilistic convergence guarantees are provided. Several numerical examples illustrate the approaches including a swing option with four constraints, and a passport option with 16 constraints.
    Date: 2020–02
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2002.11258&r=all
  18. By: Maximo Camacho; Hunter L. Clark (Research and Statistics Group); Xavier X. Sala-i-Martin
    Abstract: For analysts of the Chinese economy, questions about the accuracy of the country's official GDP data are a frequent source of angst, leading many to seek guidance from alternative indicators. These nonofficial gauges often suggest Beijing's growth figures are exaggerated, but that conclusion is not supported by our analysis, which draws upon satellite measurements of the intensity of China's nighttime light emissions, a good proxy for GDP growth that is presumably not subject to whatever measurement errors may affect the country's official economic statistics.
    Keywords: nighttime lights; GDP; China
    JEL: F00
    URL: http://d.repec.org/n?u=RePEc:fip:fednls:87191&r=all
  19. By: Michael Pinelis; David Ruppert
    Abstract: We find economically and statistically significant gains from using machine learning to dynamically allocate between the market index and the risk-free asset. We model the market price of risk to determine the optimal weights in the portfolio: reward-risk market timing. This involves forecasting the direction of next month's excess return, which gives the reward, and constructing a dynamic volatility estimator that is optimized with a machine learning model, which gives the risk. Reward-risk timing with machine learning provides substantial improvements in investor utility, alphas, Sharpe ratios, and maximum drawdowns, after accounting for transaction costs and leverage constraints, and on a new out-of-sample test set. This paper provides a unifying framework for machine learning applied to both return- and volatility-timing.
    Date: 2020–03
    URL: http://d.repec.org/n?u=RePEc:arx:papers:2003.00656&r=all
  20. By: William C. Horrace (Center for Policy Research, Maxwell School, Syracuse University, 426 Eggers Hall, Syracuse, NY 13244); Hyunseok Jung (Department of Economics, University of Arkansas); Shane Sanders (Department of Sports Management, Syracuse University)
    Abstract: We consider a heterogeneous social interaction model where agents interact with peers within their own network but also interact with agents across other (non-peer) networks. To address potential endogeneity in the networks, we assume that each network has a central planner who makes strategic network decisions based on observable and unobservable characteristics of the peers in her charge. The model forms a simultaneous equation system that can be estimated by Quasi-Maximum Likelihood. We apply a restricted version of our model to data on National Basketball Association games, where agents are players, networks are individual teams organized by coaches, and competition is head-to-head. That is, at any time a player only interacts with two networks: their team and the opposing team. We find significant positive within-team peer-effects and both negative and positive opposing-team competitor-effects in NBA games. The former are interpretable as "team chemistries" which enhance the individual performances of players on the same team. The latter are interpretable as "team rivalries," which can either enhance or diminish the individual performance of opposing players.
    Keywords: Spatial Analysis, Peer Effects, Endogeneity, Machine Learning
    JEL: C13 C31 D24
    Date: 2020–03
    URL: http://d.repec.org/n?u=RePEc:max:cprwps:226&r=all
  21. By: Hoang,Trung Xuan; Le,Duong Trung; Nguyen,Ha Minh; Vuong,Nguyen Dinh Tuan
    Abstract: This paper examines the labor market impacts of a large-scale marine environmental crisis caused by toxic chemical contamination in Vietnam's central coast in 2016. Combining labor force surveys with satellite data on fishing-boat detection, the analysis finds negative and heterogeneous impacts on fishery incomes and employment and uncovers interesting coping patterns. Satellite data suggest that upstream fishers traveled to safe fishing grounds, and thus bore lower income damage. Downstream fishers, instead, endured severe impacts and were more likely to shift working hours from fishing to secondary jobs. The paper also finds evidence of a recovery in fishing intensity and fishery income, and a positive labor market spillover to freshwater fishery.
    Date: 2019–04–22
    URL: http://d.repec.org/n?u=RePEc:wbk:wbrwps:8827&r=all
  22. By: Boriss Siliverstovs (Bank of Latvia); Daniel Wochner (ETH Zurich)
    Abstract: This paper re-examines the findings of Stock and Watson (2012b) who assessed the predictive performance of DFMs over AR benchmarks for hundreds of target variables by focusing on possible business cycle performance asymmetries in the spirit of Chauvet and Potter (2013) and Siliverstovs (2017a; 2017b; 2020). Our forecasting experiment is based on a novel big macroeconomic dataset (FRED-QD) comprising over 200 quarterly indicators for almost 60 years (1960–2018; see, e.g. McCracken and Ng (2019b)). Our results are consistent with this nascent state-dependent evaluation literature and generalize their relevance to a large number of indicators. We document systematic model performance differences across business cycles (longitudinal) as well as variable groups (cross-sectional). While the absolute size of prediction errors tends to be larger in busts than in booms for both DFMs and ARs, DFMs' relative improvement over ARs is typically large and statistically significant during recessions but not during expansions (see, e.g. Chauvet and Potter (2013)). Our findings further suggest that the widespread practice of relying on full sample forecast evaluation metrics may not be ideal, i.e. for at least two thirds of all 216 macroeconomic indicators full sample rRMSFEs systematically over-estimate performance in expansionary subsamples and under-estimate it in recessionary subsamples (see, e.g. Siliverstovs (2017a; 2020)). These findings are robust to several alternative specifications and have high practical relevance for both consumers and producers of model-based economic forecasts.
    Keywords: forecast evaluation, dynamic factor models, business cycle asymmetries, big macroeconomic datasets, US
    JEL: C32 C45 C52 E17
    Date: 2020–02–11
    URL: http://d.repec.org/n?u=RePEc:ltv:wpaper:202002&r=all
  23. By: Peng, Cong
    Abstract: Traditional retail involves traffic both from warehouses to stores and from consumers to stores. E-commerce cuts intermediate traffic by delivering goods directly from the warehouses to the consumers. Although plenty of evidence has shown that vans servicing e-commerce are a growing contributor to traffic and congestion, consumers are also making fewer shopping trips using vehicles. This poses the question of whether e-commerce reduces traffic congestion. The paper exploits the exogenous shock of an influential online shopping retail discount event in China (similar to Cyber Monday), to investigate how the rapid growth of e-commerce affects urban traffic congestion. Portraying e-commerce as trade across cities, I specified a CES demand system with heterogeneous consumers to model consumption, vehicle demand and traffic congestion. I tracked hourly traffic congestion data in 94 Chinese cities in one week before and two weeks after the event. In the week after the event, intra-city traffic congestion dropped by 1.7% during peaks and 1% during non-peak hours. Using Baidu Index (similar to Google Trends) as a proxy for online shopping, I find that online shopping increased by about 1.6 times during the event. Based on the model, I find evidence for a 10% increase in online shopping causing a 1.4% reduction in traffic congestion, with the effect most salient from 9am to 11am and from 7pm to midnight. A welfare analysis conducted for Beijing suggests that the congestion relief effect has a monetary value of around 239 million dollars a year. The finding suggests that online shopping is more traffic-efficient than offline shopping, along with sizable knock-on welfare gains.
    Keywords: e-commerce; traffic congestion; heterogeneous consumers; shopping vehicle demand; air pollution
    JEL: R40 O30
    Date: 2019–08
    URL: http://d.repec.org/n?u=RePEc:ehl:lserod:103411&r=all
  24. By: Paul, Alexander (Aarhus University); Bleses, Dorthe (Aarhus University); Rosholm, Michael (Aarhus University)
    Abstract: Many targeted childhood interventions such as the Perry Preschool Project select eligible children based on a risk score. The variables entering the risk score and their corresponding weights are usually chosen ad hoc and are unlikely to be optimal. This paper develops a simple economic model and exploits Danish administrative data to address the issue of efficient targeting in childhood interventions. We define children to be in need of an intervention if they suffer from a socially undesirable outcome, such as criminal behavior, at around age 30. Because interventions are most effective very early in life, we then test whether and to what extent indicators available at birth can predict the emergence of these outcomes. We find fair to good levels of prediction accuracy for many outcomes, especially educational attainment, criminal behavior, and placement in foster care, as well as combinations of these outcomes. Logistic regressions perform as well as other machine learning methods. A parsimonious set of indicators consisting of sex, parental education, and parental income predicts almost as accurately as the full set of predictors. Finally, we derive optimal weights for the construction of risk scores. Unlike the ad hoc weights used in typical childhood interventions, we find that optimal weights vary with the outcome of interest, differ between father and mother for the same predictor, and should be disproportionately large when parents are at the bottom of the education and income distribution.
    Keywords: targeting, early childhood intervention, machine learning
    JEL: I18 I28 I38
    Date: 2020–02
    URL: http://d.repec.org/n?u=RePEc:iza:izadps:dp12989&r=all
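The data-driven alternative to ad hoc risk-score weights that the abstract describes can be sketched with a plain logistic regression. Everything below is synthetic and illustrative: the three standardised columns merely stand in for birth indicators such as sex, parental education, and parental income, and the data-generating weights are assumptions, not the paper's estimates:

```python
# Illustrative sketch: fitting logistic-regression weights on synthetic
# "birth indicators" to build a risk score, instead of fixing weights ad hoc.
import numpy as np

rng = np.random.default_rng(1)
n = 5000
# Three standardised indicators (stand-ins for sex, parental education,
# parental income; purely hypothetical).
X = rng.normal(0.0, 1.0, (n, 3))
true_w = np.array([0.8, -0.3, -0.5])          # assumed data-generating weights
y = rng.random(n) < 1 / (1 + np.exp(-(X @ true_w - 1.0)))

# Plain gradient-descent logistic regression with an intercept term.
Xb = np.column_stack([np.ones(n), X])
w = np.zeros(4)
for _ in range(3000):
    p = 1 / (1 + np.exp(-(Xb @ w)))
    w -= 0.1 * Xb.T @ (p - y) / n             # negative log-likelihood gradient

risk = 1 / (1 + np.exp(-(Xb @ w)))            # fitted risk scores in [0, 1]
```

Children with the highest fitted `risk` would be the ones selected for an intervention; the fitted coefficients `w[1:]` play the role of the outcome-specific weights that the paper estimates, rather than weights fixed in advance.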
  25. By: Omotosho, Babatunde S.
    Abstract: This paper employs text-mining techniques to analyse the communication strategy of the Central Bank of Nigeria (CBN) during the period 2004-2019. Since the policy communique released after each meeting of the CBN’s monetary policy committee (MPC) represents an important tool of central bank communication, we construct a corpus based on 87 policy communiques with a total of 123,353 words. Having processed the textual data into a form suitable for analysis, we examine the readability, sentiment, and topics of the policy documents. While the CBN’s communication has increased substantially over the years, implying greater monetary policy transparency, the computed Coleman–Liau readability index shows that the word and sentence structures of the policy communiques have become more complex, reducing their readability. In terms of monetary policy sentiment, we find an average net score of -10.5 per cent, reflecting the level of policy uncertainty faced by the MPC over the sample period. In addition, our results indicate that the topics driving the linguistic content of the communiques were influenced by the Bank’s policy objectives as well as the nature of the shocks hitting the economy in each period.
    Keywords: Central bank communication, Text mining, Monetary policy
    JEL: E02 E32 E52 E58 E61
    Date: 2019
    URL: http://d.repec.org/n?u=RePEc:pra:mprapa:98850&r=all
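The Coleman–Liau index used in the abstract is a standard readability formula, CLI = 0.0588·L − 0.296·S − 15.8, where L is letters per 100 words and S is sentences per 100 words. A minimal sketch (applied to short hypothetical sentences, not the CBN communiques, and using a simplified word/sentence tokenizer):

```python
# Illustrative sketch of the Coleman–Liau readability index.
import re

def coleman_liau(text):
    """Coleman–Liau index: higher values indicate harder-to-read text
    (roughly a US school grade level)."""
    letters = sum(ch.isalpha() for ch in text)
    words = len(text.split())
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    L = 100.0 * letters / words       # letters per 100 words
    S = 100.0 * sentences / words     # sentences per 100 words
    return 0.0588 * L - 0.296 * S - 15.8

simple = coleman_liau("The cat sat on the mat. It was fun.")
complex_ = coleman_liau(
    "Macroeconomic considerations necessitated countercyclical interventions."
)
print(f"simple: {simple:.1f}  complex: {complex_:.1f}")
```

Because the index rises with average word length and falls with sentence frequency, longer words and longer sentences push the score up, which is how the paper detects the communiques becoming harder to read over time.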

This nep-big issue is ©2020 by Tom Coupé. It is provided as is without any express or implied warranty. It may be freely redistributed in whole or in part for any purpose. If distributed in part, please include this notice.
General information on the NEP project can be found at http://nep.repec.org. For comments please write to the director of NEP, Marco Novarese at <director@nep.repec.org>. Put “NEP” in the subject, otherwise your mail may be rejected.
NEP’s infrastructure is sponsored by the School of Economics and Finance of Massey University in New Zealand.