nep-big New Economics Papers
on Big Data
Issue of 2017‒12‒03
seven papers chosen by
Tom Coupé
University of Canterbury

  1. Macroeconomic nowcasting and forecasting with big data By Bok, Brandyn; Caratelli, Daniele; Giannone, Domenico; Sbordone, Argia M.; Tambalotti, Andrea
  2. Survey of Big Data Use and Innovation in Japanese Manufacturing Firms By MOTOHASHI Kazuyuki
  3. Artificial Intelligence and the Modern Productivity Paradox: A Clash of Expectations and Statistics By Erik Brynjolfsson; Daniel Rock; Chad Syverson
  4. The geography of city liveliness and consumption: evidence from location-based big data By Wu, Wenjie; Wang, Jianghao; Li, Chengyu; Wang, Mark
  5. Artificial Intelligence as Structural Estimation: Economic Interpretations of Deep Blue, Bonanza, and AlphaGo By Mitsuru Igami
  6. Orthogonal Machine Learning: Power and Limitations By Lester Mackey; Vasilis Syrgkanis; Ilias Zadik
  7. What Works for Whom? A Bayesian Approach to Channeling Big Data Streams for Public Program Evaluation By Mariel McKenzie Finucane; Ignacio Martinez; Scott Cody

  1. By: Bok, Brandyn (Federal Reserve Bank of New York); Caratelli, Daniele (Federal Reserve Bank of New York); Giannone, Domenico (Federal Reserve Bank of New York); Sbordone, Argia M. (Federal Reserve Bank of New York); Tambalotti, Andrea (Federal Reserve Bank of New York)
    Abstract: Data, data, data . . . Economists know it well, especially when it comes to monitoring macroeconomic conditions—the basis for making informed economic and policy decisions. Handling large and complex data sets was a challenge that macroeconomists engaged in real-time analysis faced long before “big data” became pervasive in other disciplines. We review how methods for tracking economic conditions using big data have evolved over time and explain how econometric techniques have advanced to mimic and automate the best practices of forecasters on trading desks, at central banks, and in other market-monitoring roles. We present in detail the methodology underlying the New York Fed Staff Nowcast, which employs these innovative techniques to produce early estimates of GDP growth, synthesizing a wide range of macroeconomic data as they become available.
    Keywords: monitoring economic conditions; business cycle; macroeconomic data; large data sets; high-dimensional data; real-time data flow; factor model; state space models; Kalman filter
    JEL: C32 C53 E32
    Date: 2017–11–01
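    The factor-model nowcasting the abstract describes can be sketched in a few lines. The toy below is illustrative only, not the New York Fed's actual model or code: a single latent factor follows an AR(1), each observed series loads on it, and a Kalman filter skips missing observations so that the estimate absorbs new releases as they arrive (the "ragged edge" of real-time data). All parameter values are made up.

    ```python
    import numpy as np

    rng = np.random.default_rng(0)
    T, n = 60, 3                               # periods, observed series
    phi = 0.8                                  # AR(1) coefficient of the factor
    lam = np.array([1.0, 0.7, 1.3])            # factor loadings
    R = 0.25                                   # measurement noise variance

    # Simulate a latent factor and noisy observed series, then blank out
    # recent observations to mimic an incomplete real-time data flow.
    f = np.zeros(T)
    for t in range(1, T):
        f[t] = phi * f[t - 1] + rng.normal()
    y = lam[None, :] * f[:, None] + np.sqrt(R) * rng.normal(size=(T, n))
    y[-1, 1:] = np.nan                         # only series 0 released for the last period

    # Kalman filter with missing-data handling (sequential scalar updates).
    f_hat, P = 0.0, 1.0
    for t in range(T):
        f_hat, P = phi * f_hat, phi**2 * P + 1.0   # predict step
        for i in range(n):
            if np.isnan(y[t, i]):
                continue                            # unreleased series: no update
            S = lam[i]**2 * P + R                   # innovation variance
            K = P * lam[i] / S                      # Kalman gain
            f_hat += K * (y[t, i] - lam[i] * f_hat)
            P *= (1 - K * lam[i])

    print(f"filtered factor at T: {f_hat:.3f} (true {f[-1]:.3f})")
    ```

    Because the update loop simply skips missing entries, re-running the filter as each series is released mechanically revises the nowcast, which is the behavior the abstract attributes to these models.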
  2. By: MOTOHASHI Kazuyuki
    Abstract: This paper presents the results of a survey on big data use and innovation in Japanese manufacturing firms, conducted in November 2015. The survey investigated (1) firms' organization of big data use, (2) collection and business use of big data by type of data, and (3) use of datasets from outside the firm, with 539 respondents out of 4,000 firms. We divide the entire manufacturing process into three parts, i.e., development, mass production, and after-sales services, and find that big data are widely used in all activities. In addition, firms with a dedicated big data function are more likely to conduct big data activities across various departments, as well as demonstrate a higher performance impact. However, we also find great disparity in usage style, particularly by firm size. For example, more than half of small and mid-sized enterprises (SMEs) responded that they have heard of the Internet of Things (IoT), yet they are unaware of how to respond to the trend. Policy implications based on the results include (1) promoting the diffusion of big data use, particularly among SMEs, (2) supporting human capital development for big data use, and (3) strategic standardization activities for IoT.
    Date: 2017–08
  3. By: Erik Brynjolfsson; Daniel Rock; Chad Syverson
    Abstract: We live in an age of paradox. Systems using artificial intelligence match or surpass human level performance in more and more domains, leveraging rapid advances in other technologies and driving soaring stock prices. Yet measured productivity growth has declined by half over the past decade, and real income has stagnated since the late 1990s for a majority of Americans. We describe four potential explanations for this clash of expectations and statistics: false hopes, mismeasurement, redistribution, and implementation lags. While a case can be made for each, we argue that lags have likely been the biggest contributor to the paradox. The most impressive capabilities of AI, particularly those based on machine learning, have not yet diffused widely. More importantly, like other general purpose technologies, their full effects won’t be realized until waves of complementary innovations are developed and implemented. The required adjustment costs, organizational changes, and new skills can be modeled as a kind of intangible capital. A portion of the value of this intangible capital is already reflected in the market value of firms. However, going forward, national statistics could fail to measure the full benefits of the new technologies and some may even have the wrong sign.
    JEL: D2 O3 O4
    Date: 2017–11
  4. By: Wu, Wenjie; Wang, Jianghao; Li, Chengyu; Wang, Mark
    Abstract: Understanding the complexity of the connection between city liveliness and the spatial configuration of consumptive amenities has been an important but understudied research field in fast-urbanising countries like China. This paper presents a first step towards filling this gap through a location-based big data perspective. City liveliness is measured by aggregated space-time human activity intensities using mobile phone positioning data. Consumptive amenities are identified by point-of-interest data from Dianping, China's Yelp-like review website. The results provide insights into the geographic contextual uncertainties of consumptive amenities in shaping the rise and fall of city liveliness.
    Keywords: big data; local linear estimator; city liveliness; consumption; China
    JEL: Q15
    Date: 2016–11
  5. By: Mitsuru Igami
    Abstract: Artificial intelligence (AI) has achieved superhuman performance in a growing number of tasks, including the classical games of chess, shogi, and Go, but understanding and explaining AI remain challenging. This paper studies the machine-learning algorithms used to develop these game AIs and provides structural interpretations of them. Specifically, chess-playing Deep Blue is a calibrated value function, whereas shogi-playing Bonanza represents an estimated value function via Rust's (1987) nested fixed-point method. AlphaGo's "supervised-learning policy network" is a deep neural network (DNN) version of Hotz and Miller's (1993) conditional choice probability estimates; its "reinforcement-learning value network" is equivalent to Hotz, Miller, Sanders, and Smith's (1994) simulation method for estimating the value function. Their performances suggest that DNNs are a useful functional form when the state space is large and data are sparse. Explicitly incorporating strategic interactions and unobserved heterogeneity in the data-generating process would further improve AIs' explicability.
    Date: 2017–10
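    The Hotz-Miller-Sanders-Smith simulation idea the abstract invokes (a "value network" as a forward-simulated value function) can be illustrated in a toy Rust-style machine-replacement problem. This sketch is not from the paper; the model, policy, and all parameter values are invented for illustration: we fix a simple logit replacement policy and estimate its value at a state by averaging discounted simulated payoffs.

    ```python
    import numpy as np

    rng = np.random.default_rng(2)
    beta, theta_c, theta_r = 0.9, 0.2, 3.0   # discount factor, maintenance cost, replacement cost
    x_max = 9                                # mileage states 0..9

    def flow_u(x, replace):
        """Per-period utility: pay replacement cost or mileage-dependent maintenance."""
        return -theta_r if replace else -theta_c * x

    def policy(x):
        """A fixed myopic logit policy whose value we simulate."""
        p_rep = 1.0 / (1.0 + np.exp(flow_u(x, False) - flow_u(x, True)))
        return rng.random() < p_rep

    def simulate_value(x0, horizon=200, n_sims=500):
        """Forward-simulate the policy and average discounted utility (HMSS-style)."""
        total = 0.0
        for _ in range(n_sims):
            x, disc, v = x0, 1.0, 0.0
            for _ in range(horizon):
                rep = policy(x)
                v += disc * flow_u(x, rep)
                x = 0 if rep else min(x + 1, x_max)   # mileage resets on replacement
                disc *= beta
            total += v
        return total / n_sims

    V0 = simulate_value(0)
    print(f"simulated value at x=0: {V0:.2f}")
    ```

    Replacing the tabular average with a neural network fitted to such simulated returns is, in the abstract's framing, what a game AI's value network does at scale.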
  6. By: Lester Mackey; Vasilis Syrgkanis; Ilias Zadik
    Abstract: Double machine learning provides $\sqrt{n}$-consistent estimates of parameters of interest even when high-dimensional or nonparametric nuisance parameters are estimated at an $n^{-1/4}$ rate. The key is to employ \emph{Neyman-orthogonal} moment equations which are first-order insensitive to perturbations in the nuisance parameters. We show that the $n^{-1/4}$ requirement can be improved to $n^{-1/(2k+2)}$ by employing a $k$-th order notion of orthogonality that grants robustness to more complex or higher-dimensional nuisance parameters. In the partially linear model setting popular in causal inference, we use Stein's lemma to show that we can construct second-order orthogonal moments if and only if the treatment residual is not normally distributed. We conclude by demonstrating the robustness benefits of an explicit doubly-orthogonal estimation procedure for treatment effect.
    Date: 2017–11
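    The first-order orthogonal setup this abstract builds on can be sketched concretely. In the partially linear model Y = theta*D + g(X) + eps with D = m(X) + v, the Neyman-orthogonal moment E[((Y - E[Y|X]) - theta*(D - E[D|X])) * (D - E[D|X])] = 0 yields a residual-on-residual regression with cross-fitting. The sketch below illustrates only this baseline, not the paper's higher-order construction, and the data-generating process and nuisance estimator are made up.

    ```python
    import numpy as np

    rng = np.random.default_rng(1)
    n, theta = 2000, 0.5
    X = rng.normal(size=n)
    D = np.cos(X) + rng.normal(size=n)               # treatment, nuisance m(X) = cos(X)
    Y = theta * D + np.sin(X) + rng.normal(size=n)   # outcome, nuisance g(X) = sin(X)

    def fit_predict(x_tr, y_tr, x_te, deg=5):
        """Polynomial least squares as a stand-in nonparametric nuisance estimator."""
        coef, *_ = np.linalg.lstsq(np.vander(x_tr, deg + 1), y_tr, rcond=None)
        return np.vander(x_te, deg + 1) @ coef

    # Two-fold cross-fitting: nuisances fit on one fold, residuals formed on the other.
    idx = rng.permutation(n)
    folds = [idx[: n // 2], idx[n // 2:]]
    num = den = 0.0
    for tr, te in [(folds[0], folds[1]), (folds[1], folds[0])]:
        v = D[te] - fit_predict(X[tr], D[tr], X[te])   # treatment residual
        u = Y[te] - fit_predict(X[tr], Y[tr], X[te])   # outcome residual
        num += v @ u
        den += v @ v

    theta_hat = num / den
    print(f"theta_hat = {theta_hat:.3f} (true theta = {theta})")
    ```

    Because the moment is first-order insensitive to errors in the two nuisance fits, slow (n^{-1/4}) nuisance estimation still delivers a root-n-consistent theta_hat; the paper's contribution is to weaken even that requirement via higher-order orthogonality.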
  7. By: Mariel McKenzie Finucane; Ignacio Martinez; Scott Cody
    Abstract: In the coming years, public programs will capture even more and richer data than they do now, including data from web-based tools used by participants in employment services, from tablet-based educational curricula, and from electronic health records for Medicaid beneficiaries.
    Keywords: heterogeneous impacts, Bayesian statistics, adaptive design, hierarchical models, randomized control trials
    JEL: I

This nep-big issue is ©2017 by Tom Coupé. It is provided as is without any express or implied warranty. It may be freely redistributed in whole or in part for any purpose. If distributed in part, please include this notice.
General information on the NEP project can be found at . For comments, please write to the director of NEP, Marco Novarese, at <>. Put “NEP” in the subject; otherwise your mail may be rejected.
NEP’s infrastructure is sponsored by the School of Economics and Finance of Massey University in New Zealand.