nep-big New Economics Papers
on Big Data
Issue of 2018‒03‒26
eight papers chosen by
Tom Coupé
University of Canterbury

  1. The Effects of Temporal Aggregation on Search Engine Data By Tierney, Heather L.R.; Kim, Jiyoon (June); Nazarov, Zafar
  2. The Political Economy of Collective Memories: Evidence from Russian Politics By Alessandro Belmonte; Michael Rochlitz
  3. Synthetic Control Methods and Big Data By Daniel Kinn
  4. Deep Learning for Causal Inference By Vikas Ramachandra
  5. Double/De-Biased Machine Learning Using Regularized Riesz Representers By Victor Chernozhukov; Whitney Newey; James Robins
  6. Blinded by the Light? Heterogeneity in the Luminosity-Growth Nexus and the African Growth Miracle By Lionel Roger
  7. Mapping the Radical Innovations in Food Industry: A Text Mining Study By Ilya Kuzminov; Pavel Bakhtin; Elena Khabirova; Maxim Kotsemir; Alina Lavrynenko
  8. Computation of optimal transport and related hedging problems via penalization and neural networks By Stephan Eckstein; Michael Kupper

  1. By: Tierney, Heather L.R.; Kim, Jiyoon (June); Nazarov, Zafar
    Abstract: Using structured machine learning, this paper examines the effect that temporal aggregation has on big data from Google Analytics and Google Trends. Specifically, daily and weekly data from the Charleston Area Convention and Visitors Bureau (CACVB) website from January 2008 to March 2009 via Google Analytics and weekly, monthly, and quarterly data from Google Trends for seven economic variables from 2004 to 2011 are examined. The CDFs and the estimated regression results are compared across the different levels of aggregation. The Kolmogorov-Smirnov test rejects the null of equivalent data distributions in the vast majority of cases for the CACVB data, but this is not the case for the economic variables. Through data mining, this paper also finds that aggregation can affect the level of integration and the regression results for both the CACVB data and the seven economic variables.
    Keywords: Big Data, Machine Learning, Data Mining, Aggregation, Unit roots, Scaling Effects, Normalization Effects
    JEL: C19 C43
    Date: 2018–01–30
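    The paper's distributional comparison can be illustrated with a two-sample Kolmogorov-Smirnov test. A minimal sketch on synthetic data (not the CACVB or Google Trends series):

```python
# Sketch: comparing a daily series with its weekly aggregate via the
# two-sample Kolmogorov-Smirnov test, as in the paper's CDF comparisons.
# The data here are synthetic; the paper uses CACVB and Google Trends series.
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)
daily = rng.lognormal(mean=3.0, sigma=0.5, size=364)   # ~1 year of daily hits
weekly = daily.reshape(52, 7).sum(axis=1)              # aggregate to weekly totals

stat, pvalue = ks_2samp(daily, weekly)
# A small p-value rejects the null of equivalent distributions,
# i.e. aggregation has changed the distribution of the data.
print(stat, pvalue)
```

    Here aggregation shifts both the location and the spread of the series, so the test rejects; the paper applies the same logic to the actual daily, weekly, monthly, and quarterly series.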
  2. By: Alessandro Belmonte (IMT Institute for Advanced Studies); Michael Rochlitz (National Research University Higher School of Economics)
    Abstract: How do political elites exploit salient historical events to reactivate collective memories and entrench their power? We study this question using data from the Russian Federation under Putin. We document a substantial recollection campaign of the traumatic transition the Russian population experienced during the 1990s, starting with the year 2003. We combine this time discontinuity in the recollection of negative collective memories with regional-level information about traumatic experiences of the 1990s. Our results show that Russians vote more for the government, and less for the liberal political opposition, in regions that suffered more during the transition period, once memories from the period are recalled on state-controlled media. We then provide additional evidence on the mechanism and find, using a text analysis of local newspapers, that in those regions where local newspapers more intensively recall the chaotic 1990s, electoral support for the government is higher. Finally, we show that in regions in which the media is less independent from the state, this recollection campaign is more effective.
    Keywords: collective memory, recollection of the past, voting, Russia.
    JEL: D74 D83 P16 Z13
    Date: 2018
  3. By: Daniel Kinn
    Abstract: Many macroeconomic policy questions may be assessed in a case study framework, where the time series of a treated unit is compared to a counterfactual constructed from a large pool of control units. I provide a general framework for this setting, tailored to predict the counterfactual by minimizing a tradeoff between underfitting (bias) and overfitting (variance). The framework nests recently proposed structural and reduced form machine learning approaches as special cases. Furthermore, difference-in-differences with matching and the original synthetic control are restrictive cases of the framework, in general not minimizing the bias-variance objective. Using simulation studies I find that machine learning methods outperform traditional methods when the number of potential controls is large or the treated unit is substantially different from the controls. Equipped with a toolbox of approaches, I revisit a study on the effect of economic liberalisation on economic growth. I find effects for several countries where no effect was found in the original study. Furthermore, I inspect how a systemically important bank responds to increasing capital requirements by using a large pool of banks to estimate the counterfactual. Finally, I assess the effect of a changing product price on product sales using a novel scanner dataset.
    Date: 2018–02
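    The bias-variance tradeoff at the core of the framework can be sketched with a ridge-penalized synthetic control on simulated data (the variable names and the penalty choice are illustrative, not taken from the paper):

```python
# Sketch of the bias-variance idea behind machine-learning synthetic controls:
# fit the treated unit's pre-treatment path on a pool of controls with a ridge
# penalty (regularization trades off overfitting against underfitting).
# All data are simulated; this is not the paper's implementation.
import numpy as np

rng = np.random.default_rng(1)
T_pre, T_post, J = 40, 10, 25          # pre/post periods, number of controls
controls = rng.normal(size=(T_pre + T_post, J))
true_w = np.zeros(J); true_w[:3] = [0.5, 0.3, 0.2]
treated = controls @ true_w + rng.normal(scale=0.1, size=T_pre + T_post)
treated[T_pre:] += 1.0                 # treatment effect of +1 after T_pre

lam = 0.1                              # penalty strength (tuned by CV in practice)
X, y = controls[:T_pre], treated[:T_pre]
w = np.linalg.solve(X.T @ X + lam * np.eye(J), X.T @ y)  # ridge weights

counterfactual = controls[T_pre:] @ w
effect = np.mean(treated[T_pre:] - counterfactual)
print(round(effect, 2))                # should be close to the true effect of 1
```

    Setting lam = 0 recovers an unpenalized least-squares fit that overfits the pre-treatment period when the control pool is large; increasing lam trades that variance for bias, which is the tradeoff the framework tunes.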
  4. By: Vikas Ramachandra
    Abstract: In this paper, we propose deep learning techniques for econometrics, specifically for causal inference and for estimating individual as well as average treatment effects. The contribution of this paper is twofold: 1. For generalized neighbor matching to estimate individual and average treatment effects, we analyze the use of autoencoders for dimensionality reduction while maintaining the local neighborhood structure among the data points in the embedding space. This deep-learning-based technique is shown to perform better than simple k-nearest-neighbor matching for estimating treatment effects, especially when the data points have several features/covariates but reside in a low dimensional manifold in high dimensional space. We also observe better performance than manifold learning methods for neighbor matching. 2. Propensity score matching is one specific and popular way to perform matching in order to estimate average and individual treatment effects. We propose the use of deep neural networks (DNNs) for propensity score matching, and present a network called PropensityNet for this. This is a generalization of the logistic regression technique traditionally used to estimate propensity scores, and we show empirically that DNNs perform better than logistic regression at propensity score matching. Code for both methods will be made available shortly on Github at:
    Date: 2018–02
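    The logistic-regression baseline that PropensityNet generalizes can be sketched as follows (simulated data and scikit-learn, not the paper's code):

```python
# Sketch of the logistic-regression baseline that PropensityNet generalizes:
# estimate propensity scores P(treated | X) and match each treated unit to the
# control with the nearest score. Simulated data; not the paper's implementation.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(2)
n, d = 500, 5
X = rng.normal(size=(n, d))
true_ps = 1 / (1 + np.exp(-(X[:, 0] - 0.5 * X[:, 1])))
treat = rng.binomial(1, true_ps)
y = X[:, 0] + 2.0 * treat + rng.normal(scale=0.5, size=n)  # true effect = 2

ps = LogisticRegression().fit(X, treat).predict_proba(X)[:, 1]
treated_idx = np.where(treat == 1)[0]
control_idx = np.where(treat == 0)[0]

# 1-nearest-neighbour matching on the estimated propensity score
matches = control_idx[np.abs(ps[control_idx][None, :] -
                             ps[treated_idx][:, None]).argmin(axis=1)]
att = np.mean(y[treated_idx] - y[matches])
print(round(att, 2))  # should be close to the true treatment effect of 2
```

    In the paper's setup, the logistic model in the fit step would be replaced by a deep network; the matching step is unchanged.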
  5. By: Victor Chernozhukov; Whitney Newey; James Robins
    Abstract: We provide adaptive inference methods for linear functionals of sparse linear approximations to the conditional expectation function. Examples of such functionals include average derivatives, policy effects, average treatment effects, and many others. The construction relies on building Neyman-orthogonal equations that are approximately invariant to perturbations of the nuisance parameters, including the Riesz representer for the linear functionals. We use L1-regularized methods to learn approximations to the regression function and the Riesz representer, and construct the estimator for the linear functionals as the solution to the orthogonal estimating equations. We establish that under weak assumptions the estimator concentrates in a 1/√n neighborhood of the target with deviations controlled by the normal laws, and the estimator attains the semi-parametric efficiency bound in many cases. In particular, either the approximation to the regression function or the approximation to the Riesz representer can be "dense" as long as one of them is sufficiently "sparse". Our main results are non-asymptotic and imply asymptotic uniform validity over large classes of models.
    Date: 2018–02
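    The orthogonal-moment construction in the abstract can be written compactly using standard notation from the debiased-ML literature (the symbols here are generic, not the paper's exact display): for a linear functional $\theta_0 = E[m(W; g_0)]$ of the regression $g_0(X) = E[Y \mid X]$, the Riesz representer $\alpha_0$ satisfies $E[m(W; g)] = E[\alpha_0(X)\, g(X)]$ for all admissible $g$, and the Neyman-orthogonal score is

```latex
\psi(W; \theta, g, \alpha) = m(W; g) + \alpha(X)\,\bigl(Y - g(X)\bigr) - \theta ,
```

    whose expectation is insensitive, to first order, to perturbations of $g$ and $\alpha$ at the truth. Plugging in L1-regularized estimates $\hat g$ and $\hat\alpha$ and solving $\sum_i \psi(W_i; \theta, \hat g, \hat\alpha) = 0$ yields the estimator described above. For the average treatment effect, $m(W; g) = g(1, X) - g(0, X)$ and $\alpha_0(D, X) = D/\pi(X) - (1 - D)/(1 - \pi(X))$, the familiar doubly robust score.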
  6. By: Lionel Roger
    Abstract: Night-time light emissions are a popular proxy for growth in circumstances where official data are deemed unreliable. We show that the underlying relationship varies substantially across countries, undermining the imposition of a single slope common in the literature. We propose a two-step method to improve country-specific growth estimates informed by night-light data, making use of a machine-learning algorithm to discern factors driving differences in the luminosity-growth elasticity across countries. The improved performance of this strategy over existing approaches is established in a number of simulation exercises. Applied to African data between 1992 and 2013 we find little evidence of an `African Growth Miracle' undetected by official statistics, as suggested by Young (2012); instead, we observe that countries which recently revised their GDP figures tend to report substantially inflated growth rates over recent years, in line with Jerven's (2014) hypothesis of purely `statistical growth'.
    Keywords: night lights, economic growth, African growth miracle
    Date: 2018
  7. By: Ilya Kuzminov (National Research University Higher School of Economics); Pavel Bakhtin (National Research University Higher School of Economics); Elena Khabirova (National Research University Higher School of Economics); Maxim Kotsemir (National Research University Higher School of Economics); Alina Lavrynenko (National Research University Higher School of Economics)
    Abstract: The article presents the results of a study of radical innovations in the global food industry, obtained through semantic analysis of heterogeneous unstructured text data sources using an innovative big data text-mining system. The approach allows rapid yet comprehensive aggregation of the whole polyphony of existing knowledge on technology development in any sector for traditional foresight, future-oriented technology analysis, and horizon-scanning studies. The sources for the analysis include research papers, patent applications with both full-text data and additional structured metadata, analytical reports by major international organizations and national key players, and various media and news resources, including all the major technology innovation, disruption, and venture capital news websites. Processing these sources with the introduced approach for trend and technology mapping helps to identify ongoing and emerging technology-related trends and weak signals of possible scientific breakthroughs in the global food industry, including the most promising startup strategies and food-innovation controversies. This kind of analysis can be performed on a regular basis owing to the constant accumulation of textual data, and can serve as a framework for continuous science and technology (S&T) monitoring, providing early warning of changes in the technology landscape and their implications for agriculture and food markets.
    Keywords: radical innovations, trends, weak signals, big data, text mining, food industry
    JEL: O1 O3
    Date: 2018
  8. By: Stephan Eckstein; Michael Kupper
    Abstract: This paper presents a widely applicable approach to solving (multi-marginal, martingale) optimal transport and related problems via neural networks. The core idea is to penalize the optimization problem in its dual formulation and reduce it to a finite dimensional one which corresponds to optimizing a neural network with smooth objective function. We present numerical examples from optimal transport, martingale optimal transport, portfolio optimization under uncertainty and generative adversarial networks that showcase the generality and effectiveness of the approach.
    Date: 2018–02
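    The penalization idea can be illustrated on a small discrete problem. The paper parameterizes the dual potentials as neural networks and penalizes the dual constraint; the sketch below swaps the network for an exact fixed-point solver (Sinkhorn iterations) of the entropically penalized discrete problem, which is the same penalty in its simplest setting:

```python
# Sketch: entropic penalization of discrete optimal transport. The paper
# replaces the hard dual constraint phi(x) + psi(y) <= c(x, y) with a smooth
# penalty and optimizes neural-network potentials; on a discrete grid the
# entropically penalized problem can instead be solved by Sinkhorn iterations.
import numpy as np

n = 50
x = np.linspace(0, 1, n)
mu = np.ones(n) / n                                      # uniform source marginal
nu = np.exp(-((x - 0.5) ** 2) / 0.02); nu /= nu.sum()    # peaked target marginal
C = (x[:, None] - x[None, :]) ** 2                       # quadratic transport cost

eps = 0.01                            # penalty strength (eps -> 0: exact OT)
K = np.exp(-C / eps)
u = np.ones(n)
for _ in range(2000):                 # Sinkhorn fixed-point iterations
    v = nu / (K.T @ u)
    u = mu / (K @ v)

plan = u[:, None] * K * v[None, :]    # approximate optimal transport plan
cost = (plan * C).sum()
print(round(cost, 4))
```

    As eps shrinks, the penalized value approaches the exact optimal transport cost; in the paper's continuous, multi-marginal, and martingale settings the dual potentials are no longer finite vectors, which is where the neural network parameterization comes in.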

This nep-big issue is ©2018 by Tom Coupé. It is provided as is without any express or implied warranty. It may be freely redistributed in whole or in part for any purpose. If distributed in part, please include this notice.
General information on the NEP project can be found at For comments please write to the director of NEP, Marco Novarese at <>. Put “NEP” in the subject, otherwise your mail may be rejected.
NEP’s infrastructure is sponsored by the School of Economics and Finance of Massey University in New Zealand.