nep-big New Economics Papers
on Big Data
Issue of 2018‒08‒27
ten papers chosen by
Tom Coupé
University of Canterbury

  1. Wilderness Conservation and the Reach of the State: Evidence from National Borders in the Amazon By Robin Burgess; Francisco J.M. Costa; Benjamin A. Olken
  2. The Bigger Picture: Combining Econometrics with Analytics Improve Forecasts of Movie Success By Steven F. Lehrer; Tian Xie
  3. Using online job vacancies to understand the UK labour market from the bottom-up By Turrell, Arthur; Thurgood, James; Djumalieva, Jyldyz; Copple, David; Speigner, Bradley
  4. Compilation of Experimental Price Indices Using Big Data and Machine Learning: A Comparative Analysis and Validity Verification of Quality Adjustments By Nobuhiro Abe; Kimiaki Shinozaki
  5. A novel machine learning approach for identifying the drivers of domestic electricity users’ price responsiveness By Guo, P.; Lam, J.; Li, V.
  6. Big Data & Macroeconomic Nowcasting: Methodological Review By George Kapetanios; Fotis Papailias
  7. The Simple Empirics of Optimal Online Auctions By Dominic Coey; Bradley Larsen; Kane Sweeney; Caio Waisman
  8. American Put Option pricing using Least squares Monte Carlo method under Bakshi, Cao and Chen Model Framework (1997) and comparison to alternative regression techniques in Monte Carlo By Anurag Sodhi
  9. Regional Market Integration and City Growth in East Africa: Local but no Regional Effects? By Andreas Eberhard-Ruiz; Alexander Moradi
  10. Reflexiones sobre la evaluación de desempeño en la IV Revolución Industrial (Reflections on Performance Evaluation in the Fourth Industrial Revolution) By Lago, José Luis

  1. By: Robin Burgess; Francisco J.M. Costa; Benjamin A. Olken
    Abstract: Preserving wilderness ecosystems in developing countries is challenging because their remote location places them far from state control. We investigate this using 30x30 meter satellite data to determine how Amazonian deforestation changes discretely at the Brazilian international border. In 2000, Brazilian pixels were 30 percent more likely to be deforested, and between 2001 and 2005 annual Brazilian deforestation was more than 3 times the rate observed across the border. In 2006, just after Brazil introduced policies to reduce illegal deforestation, these differences disappeared. These results demonstrate the power of the state to affect whether wilderness ecosystems are conserved or exploited.
    JEL: O13 Q23
    Date: 2018–07
  2. By: Steven F. Lehrer; Tian Xie
    Abstract: There exists significant hype regarding how much machine learning and incorporating social media data can improve forecast accuracy in commercial applications. To assess if the hype is warranted, we use data from the film industry in simulation experiments that contrast econometric approaches with tools from the predictive analytics literature. Further, we propose new strategies that combine elements from each literature in a bid to capture richer patterns of heterogeneity in the underlying relationship governing revenue. Our results demonstrate the importance of social media data and value from hybrid strategies that combine econometrics and machine learning when conducting forecasts with new big data sources. Specifically, while recursive partitioning strategies greatly outperform dimension reduction strategies and traditional econometric approaches in forecast accuracy, there are further significant gains from using hybrid approaches. Further, Monte Carlo experiments demonstrate that these benefits arise from the significant heterogeneity in how social media measures and other film characteristics influence box office outcomes.
    JEL: C52 C53
    Date: 2018–06
  3. By: Turrell, Arthur (Bank of England); Thurgood, James (Bank of England); Djumalieva, Jyldyz (Nesta); Copple, David (Bank of England); Speigner, Bradley (Bank of England)
    Abstract: What type of disaggregation should be used to analyse heterogeneous labour markets? How granular should that disaggregation be? Economic theory does not currently tell us; perhaps data can. Analyses typically split labour markets according to top-down classification schema such as sector or occupation. But these may be slow-moving or inaccurate relative to the structure of the labour market as perceived by firms and workers. Using a dataset of 15 million job adverts posted online between 2008 and 2016, we create an empirically driven, ‘bottom-up’ segmentation of the labour market which cuts across wage, sector, and occupation. Our segmentation is based upon applying machine learning techniques to the demand expressed in the text of job descriptions. This segmentation automatically identifies traditional job roles but also surfaces sub-markets not apparent in current classifications. We show that the segmentation has explanatory power for offered wages. The methodology developed could be deployed to create data-driven taxonomies in conditions of rapidly changing labour markets and demonstrates the potential of unsupervised machine learning in economics.
    Keywords: Vacancies; classification; disaggregation
    JEL: J42
    Date: 2018–07–27
  4. By: Nobuhiro Abe (Bank of Japan); Kimiaki Shinozaki (Bank of Japan)
    Abstract: This paper compiles experimental price indices for 20 home electrical appliances and digital consumer electronic products using big data obtained from the largest price comparison website in Japan and a machine-learning algorithm that pairs legacy and successor products with high precision. In so doing, the authors examine the validity of quality adjustment methods by comparing the effects these methods have on price indices. The findings are as follows. Indices compiled with the Webscraped Prices Comparison Method, the quality adjustment method newly developed and introduced by the Bank of Japan, are more cost-effective than those compiled with the Hedonic Regression Method, which is known to be highly accurate in index creation. Indices compiled with the Matched-Model Method, which is frequently applied to price indices using big data, are unable to precisely reflect the price increases intended to ensure profitability that are often seen in home electronics at the time of product turnover. This indicates a significant downward bias in these price indices. These findings once again highlight the importance of selecting the appropriate quality adjustment method when compiling price indices.
    Keywords: price index; quality adjustment method; hedonic approach; support vector machine
    JEL: C43 C45 E31
    Date: 2018–08–20
  5. By: Guo, P.; Lam, J.; Li, V.
    Abstract: Time-based pricing programs for domestic electricity users have been effective in reducing peak demand and facilitating renewables integration. Nevertheless, high cost, price non-responsiveness and adverse selection may pose challenges. To overcome these challenges, it can be fruitful to identify the ‘high-potential’ users, who are more responsive to price changes, and apply time-based pricing to these users. Few studies have investigated how to identify which users are more price-responsive. We aim to fill this gap by comprehensively identifying the drivers of domestic users’ price responsiveness, in order to facilitate the selection of the high-potential users. We adopt a novel data-driven approach: first, a feed-forward neural network model accurately determines the baseline monthly peak consumption of individual households; then an integrated machine-learning variable selection methodology identifies the drivers of price responsiveness. The approach is applied to Irish smart meter data collected in 2009-10 as part of a national Time of Use trial. This methodology substantially outperforms traditional variable selection methods by combining three advanced machine-learning techniques. Our results show that the response of energy users to price change is affected by a number of factors, ranging from demographic and dwelling characteristics, psychological factors and historical electricity consumption to appliance ownership. In particular, historical electricity consumption, income, the number of occupants, perceived behavioural control, and adoption of specific appliances, including immersion water heater and dishwasher, are found to be significant drivers of price responsiveness. We also observe that continual price increase within a moderate range does not drive additional peak demand reduction, and that there is an intention-behaviour gap, whereby stated intention does not lead to actual peak reduction behaviour. Based on our findings, we have conducted scenario analysis to demonstrate the feasibility of selecting the high-potential users to achieve significant peak reduction.
    Keywords: Time-based electricity pricing, price responsiveness, high-potential users, variable selection, Time of Use, machine learning
    JEL: Q41
    Date: 2018–08–16
  6. By: George Kapetanios; Fotis Papailias
    Abstract: This paper introduces big data that could potentially be used in nowcasting UK GDP and other key macroeconomic variables. We discuss various big data classifications and review some indicative studies in the big data and macroeconomic nowcasting literature. We also provide a detailed discussion of big data methodologies, focusing in particular on sparse regressions, heuristic optimisation of information criteria, factor methods and textual-data methods.
    Keywords: Big Data, Machine Learning, Sparse Regressions, Factor Models
    JEL: C32 C53
    Date: 2018–07
  7. By: Dominic Coey; Bradley Larsen; Kane Sweeney; Caio Waisman
    Abstract: We study reserve prices computed to maximize the expected profit of the seller based on historical observations of incomplete bid data typically available to the auction designer in online auctions for advertising or e-commerce. This direct approach to computing reserve prices circumvents the need to fully recover distributions of bidder valuations. We derive asymptotic results and also provide a new bound, based on the empirical Rademacher complexity, for the number of historical auction observations needed in order for revenue under the estimated reserve price to approximate revenue under the optimal reserve arbitrarily closely. This simple approach to estimating reserves may be particularly useful for auction design in Big Data settings, where traditional empirical auctions methods may be costly to implement. We illustrate the approach with e-commerce auction data from eBay. We also demonstrate how this idea can be extended to estimate all objects necessary to implement the Myerson (1981) optimal auction.
    JEL: C10 D44 L10
    Date: 2018–06
  8. By: Anurag Sodhi
    Abstract: This paper explores alternative regression techniques for pricing American put options and compares them with the least-squares Monte Carlo method (LSM) of Longstaff and Schwartz (2001), which uses least squares to estimate the conditional expected payoff to the option holder from continuation. The pricing is done under the general model framework of Bakshi, Cao and Chen (1997), which incorporates stochastic volatility, stochastic interest rates and jumps. The alternative regression techniques used are Artificial Neural Networks (ANN) and Gradient Boosted Machine (GBM) trees. Model calibration is done on American put options on SPY using these three techniques, and results are compared on out-of-sample data.
    Date: 2018–08
  9. By: Andreas Eberhard-Ruiz; Alexander Moradi
    Abstract: We investigate changes in the spatial concentration of economic activities after the establishment of a regional economic community between Kenya, Tanzania, and Uganda in 2001. Measuring city growth using satellite imagery of lights emanated out to space at night, we demonstrate that cities close to the community’s internal borders expanded more than other cities further away. The growth effect is temporary and also highly localized: only cities less than 90 minutes of travel from the border experience an acceleration in growth rates; after four years growth rates revert to their pre-treatment level. We show that this is consistent with an asymmetric reduction in trade costs for two types of trade modalities that co-exist in many parts of sub-Saharan Africa, local small-scale trade and regional large-scale trade, with a larger reduction in costs of the former. Yet, while local effects are relatively large, equivalent to a 5.6% higher GDP for cities near the EAC’s internal borders, they do not imply a large reorganisation of economic activity across space nor a substantial alteration of countries’ urban systems.
    Keywords: Market Integration; Trade; Cross-Border Trade; City Growth; Periphery; Africa
    JEL: F1 F14 F15 O17 O18 O55 R12
    Date: 2018
  10. By: Lago, José Luis
    Abstract: In a digital world, the complementarity between the operation of new technologies and the needs of users is what, broadly speaking, will increasingly characterise work in an organisation. In this context, the performance evaluation process gains relevance by providing indicators that legitimise the substance of human work in the face of the sustained advance of artificial intelligence and robotics in the workplace. This essay only outlines the foundations on which to create (or adapt) evaluation instruments that identify its presence and measure its evolution. To that end, it starts from the relevant considerations of the performance evaluation process, seeking not only to identify the changes that the Fourth Industrial Revolution brings to the work of an organisation's members, but also the advantages of human action over artificial intelligence.
    Keywords: Performance Evaluation; Artificial Intelligence; Industrial Revolution
    Date: 2018
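As an illustration of the regression step summarised in item 8, the following is a minimal sketch of least-squares Monte Carlo pricing for an American put. It is not the paper's implementation: it assumes plain geometric Brownian motion with constant volatility and interest rate rather than the Bakshi, Cao and Chen framework, and uses a quadratic polynomial basis. All function names and parameter values are hypothetical.

```python
import math
import random


def solve_linear(a, b):
    """Solve a small dense system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def basis(s):
    """Quadratic polynomial basis (an assumption; other bases are possible)."""
    return [1.0, s, s * s]


def lsm_american_put(s0, strike, r, sigma, t, steps, paths, seed=0):
    """Estimate an American put price by least-squares Monte Carlo.

    Assumes geometric Brownian motion; exercise is allowed at each of the
    `steps` simulated dates but not at time 0.
    """
    rng = random.Random(seed)
    dt = t / steps
    disc = math.exp(-r * dt)
    drift = (r - 0.5 * sigma ** 2) * dt
    vol = sigma * math.sqrt(dt)

    # Simulate price paths; sims[i][k] is the price of path i at time (k + 1) * dt.
    sims = []
    for _ in range(paths):
        s, path = s0, []
        for _ in range(steps):
            s *= math.exp(drift + vol * rng.gauss(0.0, 1.0))
            path.append(s)
        sims.append(path)

    # Cash flows along each path, starting with the payoff at maturity.
    cash = [max(strike - p[-1], 0.0) for p in sims]

    for k in range(steps - 2, -1, -1):
        cash = [c * disc for c in cash]  # discount future cash flows back one step
        itm = [i for i, p in enumerate(sims) if strike - p[k] > 0.0]
        if len(itm) < 3:
            continue  # too few in-the-money paths to fit three coefficients
        # Normal equations for regressing discounted cash flows on the basis.
        xtx = [[0.0] * 3 for _ in range(3)]
        xty = [0.0] * 3
        for i in itm:
            phi = basis(sims[i][k] / strike)
            for a in range(3):
                xty[a] += phi[a] * cash[i]
                for b in range(3):
                    xtx[a][b] += phi[a] * phi[b]
        beta = solve_linear(xtx, xty)
        # Exercise where the immediate payoff exceeds the fitted continuation value.
        for i in itm:
            exercise = strike - sims[i][k]
            phi = basis(sims[i][k] / strike)
            continuation = sum(bj * fj for bj, fj in zip(beta, phi))
            if exercise > continuation:
                cash[i] = exercise

    return disc * sum(cash) / paths


if __name__ == "__main__":
    print(lsm_american_put(36.0, 40.0, 0.06, 0.2, 1.0, steps=50, paths=4000, seed=1))
```

Replacing the least-squares fit inside the backward loop with another regressor, such as a neural network or boosted trees, is the kind of substitution the abstract describes; the simulation and exercise logic would stay the same.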

This nep-big issue is ©2018 by Tom Coupé. It is provided as is without any express or implied warranty. It may be freely redistributed in whole or in part for any purpose. If distributed in part, please include this notice.
General information on the NEP project can be found on the NEP website. For comments, please write to the director of NEP, Marco Novarese. Put “NEP” in the subject, otherwise your mail may be rejected.
NEP’s infrastructure is sponsored by the School of Economics and Finance of Massey University in New Zealand.